Data in R

The vector is the core R object. It is a collection of values of a single type, such as numbers or character strings (but not both in the same vector).

Create vectors:

  1. Use the c() function to combine (concatenate) values:
primes <- c(2,3,5,7)
primes
## [1] 2 3 5 7
colors <- c("red", "orange", "yellow")
colors
## [1] "red"    "orange" "yellow"

Note that RStudio automatically shades color names in vectors for improved readability (this doesn’t change the values of the vector).
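As a quick check on the single-type rule, the sketch below (using a hypothetical vector name, mixed) shows what c() does when numbers and strings are combined. Rather than failing, R converts every element to a character string.

```r
# Hypothetical example: mixing numbers and strings in one call to c()
mixed <- c(1, "two", 3)
mixed
## [1] "1"   "two" "3"

# The whole vector is now of type character, including the former numbers
class(mixed)
## [1] "character"
```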

  1. Use : to create a sequence of integers
counting_numbers <- 0:9
counting_numbers
##  [1] 0 1 2 3 4 5 6 7 8 9
  1. Create a sequence of equally spaced points using seq(). Because both endpoints are included, the vector below holds 101 values, despite its name.
one_hundred_between_0_and_1 <- seq(from = 0, to = 1, by = .01)
one_hundred_between_0_and_1
##   [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
##  [16] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
##  [31] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44
##  [46] 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
##  [61] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74
##  [76] 0.75 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
##  [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
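If a specific number of points matters more than the spacing, seq() also accepts a length.out argument. The sketch below compares the two approaches; the name exactly_one_hundred is a hypothetical choice for illustration.

```r
# Stepping by .01 from 0 to 1 includes both endpoints
length(seq(from = 0, to = 1, by = .01))
## [1] 101

# length.out requests an exact number of equally spaced points instead
exactly_one_hundred <- seq(from = 0, to = 1, length.out = 100)
length(exactly_one_hundred)
## [1] 100
```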
  1. Create a random sample from a particular distribution; for example, rnorm() draws from a normal distribution:
normal_sample <- rnorm(10, mean = 0, sd = 1)
normal_sample
##  [1] -0.6990584 -1.4387050 -0.6291341  0.8715861  0.8281812 -0.1566125
##  [7]  1.2265386  0.1074003 -0.5505666  0.2449443
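Because the sample is random, rerunning rnorm() gives different numbers each time. One common way to make a sample reproducible is to call set.seed() first. In this minimal sketch, the seed value 42 and the names first_draw and second_draw are arbitrary choices, not part of the original notes.

```r
set.seed(42)  # 42 is an arbitrary seed
first_draw <- rnorm(10, mean = 0, sd = 1)

set.seed(42)  # resetting the seed replays the same random numbers
second_draw <- rnorm(10, mean = 0, sd = 1)

identical(first_draw, second_draw)
## [1] TRUE
```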

Functions on Vectors

We can transform vectors into other vectors using functions.

  1. One-to-many functions. If we apply a function defined on a single value to a vector, R applies the function to each element and returns the results as a vector.
squares <- counting_numbers^2
squares
##  [1]  0  1  4  9 16 25 36 49 64 81
my_function <- function(x) {
  x^2 + 5*x + 1
}
my_function(counting_numbers)
##  [1]   1   7  15  25  37  51  67  85 105 127
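To see that the function really is applied element by element, the sketch below compares the direct call with sapply(), which calls the function once per element and collects the results. The names vectorized and one_at_a_time are hypothetical.

```r
counting_numbers <- 0:9
my_function <- function(x) {
  x^2 + 5*x + 1
}

# Apply the function to the whole vector at once
vectorized <- my_function(counting_numbers)

# Apply the function to one element at a time and collect the results
one_at_a_time <- sapply(counting_numbers, my_function)

all.equal(vectorized, one_at_a_time)
## [1] TRUE
```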
  1. Many-to-many functions. If we apply an operation (like + or *) to two vectors, R performs that operation elementwise, pairing up the entries of the two vectors.
count_plus_normal <- counting_numbers + normal_sample
count_plus_normal
##  [1] -0.6990584 -0.4387050  1.3708659  3.8715861  4.8281812  4.8433875
##  [7]  7.2265386  7.1074003  7.4494334  9.2449443

What happens if the vectors have unequal lengths?

new_vec <- 0:5
new_vec + normal_sample
## Warning in new_vec + normal_sample: longer object length is not a multiple of
## shorter object length
##  [1] -0.6990584 -0.4387050  1.3708659  3.8715861  4.8281812  4.8433875
##  [7]  1.2265386  1.1074003  1.4494334  3.2449443
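In the output above, R reused ("recycled") the shorter vector from its first element once it ran out, which is why entries 7 through 10 add 0, 1, 2, 3 again. The warning appears only because 10 is not a multiple of 6. A minimal sketch with hypothetical vectors of lengths 6 and 2 shows recycling without a warning:

```r
six_values <- c(1, 2, 3, 4, 5, 6)
two_values <- c(10, 20)

# two_values is recycled three times: 10 20 10 20 10 20
six_values + two_values
## [1] 11 22 13 24 15 26
```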
  1. Many-to-one functions. Some functions take a vector as input and return a single number.
sample_mean <- mean(normal_sample)
sample_mean
## [1] -0.01954261
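mean() is one of several built-in many-to-one functions. As an illustration, here are a few others applied to the primes vector from earlier (redefined here so the example runs on its own):

```r
primes <- c(2, 3, 5, 7)

sum(primes)     # total of all elements
## [1] 17
length(primes)  # number of elements
## [1] 4
max(primes)     # largest element
## [1] 7
mean(primes)    # same as sum(primes) / length(primes)
## [1] 4.25
```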
  1. We can concatenate two vectors into a longer vector using c()
long_vector <- c(counting_numbers, normal_sample)
long_vector
##  [1]  0.0000000  1.0000000  2.0000000  3.0000000  4.0000000  5.0000000
##  [7]  6.0000000  7.0000000  8.0000000  9.0000000 -0.6990584 -1.4387050
## [13] -0.6291341  0.8715861  0.8281812 -0.1566125  1.2265386  0.1074003
## [19] -0.5505666  0.2449443

Data Frames

We can “combine” vectors (of the same length) into a table known as a data frame, with one column per vector.

To do so, use the data.frame() function.

my_data <- data.frame(counting_numbers, normal_sample)
my_data
##    counting_numbers normal_sample
## 1                 0    -0.6990584
## 2                 1    -1.4387050
## 3                 2    -0.6291341
## 4                 3     0.8715861
## 5                 4     0.8281812
## 6                 5    -0.1566125
## 7                 6     1.2265386
## 8                 7     0.1074003
## 9                 8    -0.5505666
## 10                9     0.2449443

Unlike a vector, a data frame can contain columns of different types.

new_numbers <- 1:3

number_colors <- data.frame(new_numbers, colors)
number_colors
##   new_numbers colors
## 1           1    red
## 2           2 orange
## 3           3 yellow
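A few base R functions are handy for inspecting a data frame's shape and column types. The sketch below rebuilds number_colors so that it runs on its own. The output of str() is not shown because its exact form depends on the R version.

```r
new_numbers <- 1:3
colors <- c("red", "orange", "yellow")
number_colors <- data.frame(new_numbers, colors)

nrow(number_colors)   # number of observations (rows)
## [1] 3
ncol(number_colors)   # number of variables (columns)
## [1] 2
names(number_colors)  # column names, taken from the vector names
## [1] "new_numbers" "colors"
str(number_colors)    # compact summary, including the type of each column
```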

In a data frame, columns are treated as variables, and rows are treated as individual observations.

Consider the following data set called mpg from the ggplot2 package:

library(ggplot2)
data(mpg)
mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # … with 224 more rows
## # ℹ Use `print(n = ...)` to see more rows

To access a particular column of a data frame, use $, as in my_data_frame$variable_name:

mpg$model
##   [1] "a4"                     "a4"                     "a4"                    
##   [4] "a4"                     "a4"                     "a4"                    
##   [7] "a4"                     "a4 quattro"             "a4 quattro"            
##  [10] "a4 quattro"             "a4 quattro"             "a4 quattro"            
##  [13] "a4 quattro"             "a4 quattro"             "a4 quattro"            
##  [16] "a6 quattro"             "a6 quattro"             "a6 quattro"            
##  [19] "c1500 suburban 2wd"     "c1500 suburban 2wd"     "c1500 suburban 2wd"    
##  [22] "c1500 suburban 2wd"     "c1500 suburban 2wd"     "corvette"              
##  [25] "corvette"               "corvette"               "corvette"              
##  [28] "corvette"               "k1500 tahoe 4wd"        "k1500 tahoe 4wd"       
##  [31] "k1500 tahoe 4wd"        "k1500 tahoe 4wd"        "malibu"                
##  [34] "malibu"                 "malibu"                 "malibu"                
##  [37] "malibu"                 "caravan 2wd"            "caravan 2wd"           
##  [40] "caravan 2wd"            "caravan 2wd"            "caravan 2wd"           
##  [43] "caravan 2wd"            "caravan 2wd"            "caravan 2wd"           
##  [46] "caravan 2wd"            "caravan 2wd"            "caravan 2wd"           
##  [49] "dakota pickup 4wd"      "dakota pickup 4wd"      "dakota pickup 4wd"     
##  [52] "dakota pickup 4wd"      "dakota pickup 4wd"      "dakota pickup 4wd"     
##  [55] "dakota pickup 4wd"      "dakota pickup 4wd"      "dakota pickup 4wd"     
##  [58] "durango 4wd"            "durango 4wd"            "durango 4wd"           
##  [61] "durango 4wd"            "durango 4wd"            "durango 4wd"           
##  [64] "durango 4wd"            "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"   
##  [67] "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"   
##  [70] "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"   
##  [73] "ram 1500 pickup 4wd"    "ram 1500 pickup 4wd"    "expedition 2wd"        
##  [76] "expedition 2wd"         "expedition 2wd"         "explorer 4wd"          
##  [79] "explorer 4wd"           "explorer 4wd"           "explorer 4wd"          
##  [82] "explorer 4wd"           "explorer 4wd"           "f150 pickup 4wd"       
##  [85] "f150 pickup 4wd"        "f150 pickup 4wd"        "f150 pickup 4wd"       
##  [88] "f150 pickup 4wd"        "f150 pickup 4wd"        "f150 pickup 4wd"       
##  [91] "mustang"                "mustang"                "mustang"               
##  [94] "mustang"                "mustang"                "mustang"               
##  [97] "mustang"                "mustang"                "mustang"               
## [100] "civic"                  "civic"                  "civic"                 
## [103] "civic"                  "civic"                  "civic"                 
## [106] "civic"                  "civic"                  "civic"                 
## [109] "sonata"                 "sonata"                 "sonata"                
## [112] "sonata"                 "sonata"                 "sonata"                
## [115] "sonata"                 "tiburon"                "tiburon"               
## [118] "tiburon"                "tiburon"                "tiburon"               
## [121] "tiburon"                "tiburon"                "grand cherokee 4wd"    
## [124] "grand cherokee 4wd"     "grand cherokee 4wd"     "grand cherokee 4wd"    
## [127] "grand cherokee 4wd"     "grand cherokee 4wd"     "grand cherokee 4wd"    
## [130] "grand cherokee 4wd"     "range rover"            "range rover"           
## [133] "range rover"            "range rover"            "navigator 2wd"         
## [136] "navigator 2wd"          "navigator 2wd"          "mountaineer 4wd"       
## [139] "mountaineer 4wd"        "mountaineer 4wd"        "mountaineer 4wd"       
## [142] "altima"                 "altima"                 "altima"                
## [145] "altima"                 "altima"                 "altima"                
## [148] "maxima"                 "maxima"                 "maxima"                
## [151] "pathfinder 4wd"         "pathfinder 4wd"         "pathfinder 4wd"        
## [154] "pathfinder 4wd"         "grand prix"             "grand prix"            
## [157] "grand prix"             "grand prix"             "grand prix"            
## [160] "forester awd"           "forester awd"           "forester awd"          
## [163] "forester awd"           "forester awd"           "forester awd"          
## [166] "impreza awd"            "impreza awd"            "impreza awd"           
## [169] "impreza awd"            "impreza awd"            "impreza awd"           
## [172] "impreza awd"            "impreza awd"            "4runner 4wd"           
## [175] "4runner 4wd"            "4runner 4wd"            "4runner 4wd"           
## [178] "4runner 4wd"            "4runner 4wd"            "camry"                 
## [181] "camry"                  "camry"                  "camry"                 
## [184] "camry"                  "camry"                  "camry"                 
## [187] "camry solara"           "camry solara"           "camry solara"          
## [190] "camry solara"           "camry solara"           "camry solara"          
## [193] "camry solara"           "corolla"                "corolla"               
## [196] "corolla"                "corolla"                "corolla"               
## [199] "land cruiser wagon 4wd" "land cruiser wagon 4wd" "toyota tacoma 4wd"     
## [202] "toyota tacoma 4wd"      "toyota tacoma 4wd"      "toyota tacoma 4wd"     
## [205] "toyota tacoma 4wd"      "toyota tacoma 4wd"      "toyota tacoma 4wd"     
## [208] "gti"                    "gti"                    "gti"                   
## [211] "gti"                    "gti"                    "jetta"                 
## [214] "jetta"                  "jetta"                  "jetta"                 
## [217] "jetta"                  "jetta"                  "jetta"                 
## [220] "jetta"                  "jetta"                  "new beetle"            
## [223] "new beetle"             "new beetle"             "new beetle"            
## [226] "new beetle"             "new beetle"             "passat"                
## [229] "passat"                 "passat"                 "passat"                
## [232] "passat"                 "passat"                 "passat"
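A column extracted with $ is an ordinary vector, so the vector functions from earlier apply to it directly. The sketch below uses a small hypothetical data frame, small_data, rather than mpg, so that the full output fits on the page.

```r
small_data <- data.frame(counting = 0:4, squares = (0:4)^2)

small_data$squares        # extract one column as a vector
## [1]  0  1  4  9 16
mean(small_data$squares)  # a many-to-one function applied to a column
## [1] 6
small_data[["squares"]]   # double brackets extract the same column
## [1]  0  1  4  9 16
```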

ggplot2

Before we create graphics in ggplot we need to load the appropriate package:

library(ggplot2)

Common syntax for ggplot

ggplot(data = ---, aes(x =---, y = ---)) +
  geom_---()

Note that in the R Markdown source, the chunk option {r} was replaced with {r eval = F} to prevent the chunk from running when we knit (since the code is incomplete).

Create a histogram

Histogram for highway mpg

ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(color = "white", bins = 10)

By adding color = "white" inside the geom_histogram() layer, we outlined each bar in white, making neighboring bars easier to distinguish.

We also specified the number of bars (bins) using bins = ...
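Instead of fixing the number of bins, geom_histogram() also accepts a binwidth argument that sets how wide each bar is in the units of the x variable. In this sketch, the width of 5 mpg and the name hwy_histogram are arbitrary choices for illustration.

```r
library(ggplot2)

# binwidth = 5 is an arbitrary choice: each bar spans 5 mpg
hwy_histogram <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(color = "white", binwidth = 5)
hwy_histogram
```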

Scatterplots

Create a scatterplot of city vs highway mpg:

ggplot(data = mpg, aes(x = hwy, y = cty )) + 
  geom_point()

ggplot(data = mpg, aes(x = hwy, y = cty )) + 
  geom_point(color = "maroon")

To change the color of points in the plot, we included color = "..." inside the geom_point() layer.
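Other fixed properties of the points can be set in the same place. Since hwy and cty are whole numbers, many cars land on the same spot. One possible remedy is to make the points semi-transparent with alpha, so that denser spots appear darker. The size and alpha values below, and the name styled_scatter, are illustrative assumptions.

```r
library(ggplot2)

# size and alpha are arbitrary illustrative values;
# alpha = 0.3 makes each point 30% opaque, so overlapping points look darker
styled_scatter <- ggplot(data = mpg, aes(x = hwy, y = cty)) +
  geom_point(color = "maroon", size = 3, alpha = 0.3)
styled_scatter
```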

Line Graph

Create the graph of the function \(y = x^2 + 5x + 1\).

First, we create a new data frame consisting of x and y coordinates for our function. Note that earlier, we already specified an R function representing the quadratic above. We’ll apply this function to the x values contained in the vector one_hundred_between_0_and_1.

When we create the data frame, we relabel the columns as X and Y (instead of using their original vector names):

data_for_plotting <- data.frame(X = one_hundred_between_0_and_1, 
                                Y = my_function(one_hundred_between_0_and_1))

Now we plot using geom_line. We use the new names of our variables in the aes() mapping below.

ggplot(data = data_for_plotting, aes(x = X, y= Y )) +
  geom_line()

One final important feature of ggplot: we can plot multiple functions on the same graph by adding new layers.

Let’s make a new data frame:

last_frame <- data.frame(X = one_hundred_between_0_and_1, Y= one_hundred_between_0_and_1^2)

Now we start the same way as before, but this time we add a second geom_line() layer. Inside this layer, we supply the new data frame and its aesthetics. To distinguish between the two curves, we colored the second one “maroon”.

ggplot(data = data_for_plotting, aes(x = X, y= Y )) +
  geom_line() + 
  geom_line(data = last_frame, aes(x = X, y = Y ), color = "maroon")
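As an alternative sketch, and assuming ggplot2 version 3.3.0 or later, geom_function() can draw a curve directly from an R function without building a data frame first. This is not the approach used in these notes, just another route to a similar picture. The name both_curves is hypothetical.

```r
library(ggplot2)

my_function <- function(x) {
  x^2 + 5*x + 1
}

# geom_function() evaluates each function over the x range set by xlim()
both_curves <- ggplot() +
  xlim(0, 1) +
  geom_function(fun = my_function) +
  geom_function(fun = function(x) x^2, color = "maroon")
both_curves
```

The data frame approach remains more general, since it also works for points that do not come from a formula.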