Data Structures

As with all programming languages, R represents (think: stores) different kinds of data in different objects. It is important to understand, at a high level, what these different data objects allow us to do.

TL;DR

Key takeaway: try to get your data into dataframes. They are flexible objects and can store almost anything we want.

Vectors

Vectors are one-dimensional data objects that can store one type of data.

Aside

In R we use <- as the assignment operator. Use it to store an object in memory. In RStudio you can press Alt + - on Windows or Option + - on macOS to insert this operator automatically. Once an object is assigned it sits in memory, and you will see it in the “Environment Pane” (upper right in the default layout of RStudio).
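
As a minimal sketch of assignment in practice, the lines below store a value and then confirm that it is in memory. The object name my_number is made up for illustration.

```r
# Assign a value to a name with <-
my_number <- 42

# Typing the name prints the stored value
my_number

# ls() lists the objects currently in memory,
# the same ones shown in the Environment pane
ls()

# exists() checks for a single object by name
exists("my_number")
```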

Numeric Vector

Let’s build a numeric vector. A shortcut in R: if you want a sequence of consecutive whole numbers, you can use the colon : to make them, e.g.

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10
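
The colon always steps by one. For other spacings, base R's seq() function is one option; the sketch below uses made-up object names (countdown, odd_numbers, quarters) purely for illustration.

```r
# The colon also counts down when the first number is larger
countdown <- 10:1
countdown

# seq() handles other step sizes; by sets the step
odd_numbers <- seq(from = 1, to = 10, by = 2)
odd_numbers
## [1] 1 3 5 7 9

# length.out asks for a fixed number of evenly spaced values
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters
## [1] 0.00 0.25 0.50 0.75 1.00
```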

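Now we can make a vector. We can use the c() (combine, or concatenate) function to join this sequence into a vector. The sequence 1:10 is already a vector on its own, so c() is not strictly needed here; it becomes necessary once you combine several separate values or sequences.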

# Make the vector
numeric_vector <- c(1:10)

# Print the Vector
numeric_vector
##  [1]  1  2  3  4  5  6  7  8  9 10

We can also examine the class of the vector to see what kind of data exists in it:

class(numeric_vector)
## [1] "integer"
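
The class above is "integer" rather than "numeric" because the colon produces whole numbers. The short sketch below contrasts that with a vector of decimal values; decimal_vector is a made-up name for illustration.

```r
# The colon operator produces whole numbers stored as integers
class(1:10)
## [1] "integer"

# Numbers typed into c() are stored as doubles, which R reports as "numeric"
decimal_vector <- c(1.5, 2, 3.25)
class(decimal_vector)
## [1] "numeric"

# Both kinds count as numeric for calculations
is.numeric(1:10)
## [1] TRUE
is.numeric(decimal_vector)
## [1] TRUE

# An L suffix requests an integer explicitly
class(5L)
## [1] "integer"
```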

Character Vectors

Here we can subset the built-in data set letters, select the first 10 letters, and assign them to a vector. For extra practice, print letters in the console to confirm that all 26 English letters are available.

character_vector <- letters[1:10]

character_vector
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
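
Because a vector stores only one type of data, R converts mixed input to a single common type. The sketch below illustrates this; mixed_vector and number_and_logical are made-up names.

```r
# Mixing numbers, text, and logicals: everything becomes character
mixed_vector <- c(1, "a", TRUE)
mixed_vector
class(mixed_vector)
## [1] "character"

# Mixing numbers and logicals: TRUE becomes 1
number_and_logical <- c(1, TRUE)
number_and_logical
## [1] 1 1
class(number_and_logical)
## [1] "numeric"
```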

Matrices

Matrices are two-dimensional versions of vectors. All of the values (and therefore all of the columns) must be of the same type. Matrices are useful for heavy computational tasks, but in my experience we don’t often use them in social science research. If you are dealing with microarray data, you absolutely will.

numeric_matrix <- matrix(data = 1:10, nrow = 5, ncol = 2, byrow = TRUE)
numeric_matrix
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10

The byrow argument lets you specify whether the matrix fills in by row (byrow = TRUE) or by column (byrow = FALSE, the default).
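
For comparison, here is a sketch of the same data filled by column instead. The name column_filled_matrix is made up for illustration.

```r
# byrow = FALSE is the default: values run down each column in turn
column_filled_matrix <- matrix(data = 1:10, nrow = 5, ncol = 2, byrow = FALSE)
column_filled_matrix
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

# dim() reports the number of rows, then the number of columns
dim(column_filled_matrix)
## [1] 5 2

# Pick out a single value with [row, column]
column_filled_matrix[2, 2]
## [1] 7
```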

Dataframes

Dataframes are the most versatile objects in R and allow you to combine vectors of different types as columns. The important requirement is that all of the vectors must be the same length.

example_dataframe <- data.frame(numeric_vector = numeric_vector,
                                character_vector = character_vector,
                                another_number = c(21:30),
                                another_letter = LETTERS[1:10])

example_dataframe
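
A few base R functions help with inspecting a dataframe once it exists. The sketch below builds its own small example (small_dataframe and length_check are made-up names) so it can be run on its own. It sets stringsAsFactors = FALSE explicitly, on the assumption that you want text columns kept as character; that is the default from R 4.0.0 onward, while earlier versions converted text to factors by default.

```r
# A small dataframe with one integer column and one character column
small_dataframe <- data.frame(id = 1:3,
                              label = c("a", "b", "c"),
                              stringsAsFactors = FALSE)

# str() summarises each column and its type
str(small_dataframe)

# $ pulls out one column as a vector
small_dataframe$label

# nrow() and ncol() give the dimensions
nrow(small_dataframe)
## [1] 3
ncol(small_dataframe)
## [1] 2

# Columns whose lengths do not fit together are rejected with an error;
# tryCatch() turns that error into a message so the script keeps running
length_check <- tryCatch(
  data.frame(id = 1:3, label = c("a", "b")),
  error = function(e) "error: column lengths differ"
)
length_check
```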

Lists

Lists are one-dimensional objects (kind of like vectors) which can store any other objects, even other lists.

test_list <- list(
  example_dataframe = example_dataframe,
  numeric_vector = numeric_vector,
  character_vector = character_vector,
  numeric_matrix = numeric_matrix,
  inner_list = list(1:10, letters)
)

test_list
## $example_dataframe
##    numeric_vector character_vector another_number another_letter
## 1               1                a             21              A
## 2               2                b             22              B
## 3               3                c             23              C
## 4               4                d             24              D
## 5               5                e             25              E
## 6               6                f             26              F
## 7               7                g             27              G
## 8               8                h             28              H
## 9               9                i             29              I
## 10             10                j             30              J
## 
## $numeric_vector
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $character_vector
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
## 
## $numeric_matrix
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10
## 
## $inner_list
## $inner_list[[1]]
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $inner_list[[2]]
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q"
## [18] "r" "s" "t" "u" "v" "w" "x" "y" "z"

You can subset lists using the [[]] syntax by position:

test_list[[1]]

Or by the element name (if it has one):

test_list[["numeric_matrix"]]
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10
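
Two related forms are worth knowing: $ as a shorthand for named elements, and single brackets, which return a list rather than the element itself. The sketch below uses its own small lists (small_list and nested_list are made-up names) so it runs on its own.

```r
small_list <- list(numbers = 1:3, words = c("a", "b"))

# [[ ]] returns the element itself
class(small_list[["numbers"]])
## [1] "integer"

# $ is shorthand for [[ ]] with a name
small_list$words

# [ ] returns a shorter list that still wraps the element
class(small_list["numbers"])
## [1] "list"

# Reach into nested lists by chaining [[ ]]
nested_list <- list(inner_list = list(1:10, letters))
nested_list[["inner_list"]][[2]][1:3]
## [1] "a" "b" "c"
```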


Introduction to R
dewittme.wfu.edu

Office of Institutional Research
309 Reynolda Hall
Winston-Salem, NC 27106

Michael DeWitt


Copyright © 2018 Michael DeWitt. All rights reserved.