The vector is the core R object. A vector is a collection of values of a single type, most often numbers or character strings (never both in the same vector).
Create vectors:
The c() function combines (concatenates) values into a vector:
primes <- c(2, 3, 5, 7)
primes
## [1] 2 3 5 7
colors <- c("red", "orange", "yellow")
colors
## [1] "red" "orange" "yellow"
Note that RStudio automatically shades color names in vectors for improved readability (this doesn’t change the values of the vector).
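To see why numbers and strings cannot share a vector, it may help to try mixing them. The sketch below uses made-up values (`mixed` and `flags` are illustrative names, not objects from these notes) and shows that R converts everything to a common type rather than raising an error.

```r
# Mixing numbers and strings does not fail; R silently converts
# every element to a character string.
mixed <- c(1, "two", 3)
mixed          # "1" "two" "3"
class(mixed)   # "character"

# Logical values (TRUE/FALSE) are another common vector type.
flags <- c(TRUE, FALSE, TRUE)
class(flags)   # "logical"
```

Checking class() on a vector is a quick way to confirm which type R settled on.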
counting_numbers <- 0:9
counting_numbers
## [1] 0 1 2 3 4 5 6 7 8 9
seq()
one_hundred_between_0_and_1 <- seq(from = 0, to = 1, by = .01)
one_hundred_between_0_and_1
## [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14
## [16] 0.15 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29
## [31] 0.30 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44
## [46] 0.45 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59
## [61] 0.60 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74
## [76] 0.75 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89
## [91] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
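Despite its name, the vector above has 101 entries, because seq() includes both endpoints when stepping by 0.01. The following sketch (with the illustrative names `by_step` and `by_count`) contrasts the by argument with length.out, which fixes the number of values instead of the step size.

```r
# Stepping by 0.01 from 0 to 1 includes both endpoints.
by_step <- seq(from = 0, to = 1, by = 0.01)
length(by_step)    # 101

# length.out asks for an exact number of evenly spaced values.
by_count <- seq(from = 0, to = 1, length.out = 100)
length(by_count)   # 100
```

Which form to use depends on whether the spacing or the number of points matters more for the task.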
normal_sample <- rnorm(10, mean = 0, sd = 1)
normal_sample
## [1] -0.6990584 -1.4387050 -0.6291341 0.8715861 0.8281812 -0.1566125
## [7] 1.2265386 0.1074003 -0.5505666 0.2449443
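Because rnorm() draws random numbers, running the line again produces different values from those printed above. A minimal sketch of how to make the draw repeatable, assuming an arbitrary seed of 42 (any integer works):

```r
# Setting a seed fixes the starting point of the random number generator.
set.seed(42)   # 42 is an arbitrary choice for illustration
first_draw <- rnorm(10, mean = 0, sd = 1)

# Resetting the same seed replays the same sequence of draws.
set.seed(42)
second_draw <- rnorm(10, mean = 0, sd = 1)

identical(first_draw, second_draw)   # TRUE
```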
We can transform vectors into other vectors using arithmetic operators and functions:
squares <- counting_numbers^2
squares
## [1] 0 1 4 9 16 25 36 49 64 81
my_function <- function(x) {
  x^2 + 5*x + 1
}
my_function(counting_numbers)
## [1] 1 7 15 25 37 51 67 85 105 127
count_plus_normal <- counting_numbers + normal_sample
count_plus_normal
## [1] -0.6990584 -0.4387050 1.3708659 3.8715861 4.8281812 4.8433875
## [7] 7.2265386 7.1074003 7.4494334 9.2449443
What happens if the vectors have unequal lengths?
new_vec <- 0:5
new_vec + normal_sample
## Warning in new_vec + normal_sample: longer object length is not a multiple of
## shorter object length
## [1] -0.6990584 -0.4387050 1.3708659 3.8715861 4.8281812 4.8433875
## [7] 1.2265386 1.1074003 1.4494334 3.2449443
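The output above shows R's recycling rule: once the shorter vector (0:5) runs out, R starts again from its first element, so entries 7 through 10 are the sample values plus 0, 1, 2, 3. The warning appears only because 10 is not a multiple of 6. A small sketch with made-up vectors (`short_vec` and `long_vec` are illustrative names):

```r
short_vec <- c(10, 20)
long_vec  <- 1:6

# 6 is a multiple of 2, so short_vec is reused three times
# and no warning is issued.
short_vec + long_vec   # 11 22 13 24 15 26

# A single number is a vector of length 1, recycled across every element.
long_vec * 2           # 2 4 6 8 10 12
```

Recycling is convenient for scalars, but with two longer vectors it can hide a length mismatch, so the warning is worth taking seriously.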
Functions can also reduce a vector to a single number, such as its mean:
sample_mean <- mean(normal_sample)
sample_mean
## [1] -0.01954261
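mean() is one of several base R functions that summarize a vector. The sketch below applies a few others to a small made-up vector (`values` is hypothetical data, chosen so the results are easy to verify by hand).

```r
values <- c(2, 4, 4, 5, 10)   # hypothetical data for illustration

length(values)   # 5   (number of elements)
sum(values)      # 25
mean(values)     # 5
median(values)   # 4
range(values)    # 2 10  (smallest and largest value)
```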
c()
long_vector <- c(counting_numbers, normal_sample)
long_vector
## [1] 0.0000000 1.0000000 2.0000000 3.0000000 4.0000000 5.0000000
## [7] 6.0000000 7.0000000 8.0000000 9.0000000 -0.6990584 -1.4387050
## [13] -0.6291341 0.8715861 0.8281812 -0.1566125 1.2265386 0.1074003
## [19] -0.5505666 0.2449443
We can “combine” vectors (of the same length) into a table called a data frame.
To do so, use the data.frame() function.
my_data <- data.frame(counting_numbers, normal_sample)
my_data
## counting_numbers normal_sample
## 1 0 -0.6990584
## 2 1 -1.4387050
## 3 2 -0.6291341
## 4 3 0.8715861
## 5 4 0.8281812
## 6 5 -0.1566125
## 7 6 1.2265386
## 8 7 0.1074003
## 9 8 -0.5505666
## 10 9 0.2449443
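Once a data frame exists, a few base R functions report its shape and column names. The sketch below builds a small stand-in data frame (`digits`, `squares_of_digits`, and `small_frame` are illustrative names) rather than reusing the random sample above.

```r
digits <- 0:9
squares_of_digits <- digits^2
small_frame <- data.frame(digits, squares_of_digits)

dim(small_frame)     # 10 2  (rows, then columns)
names(small_frame)   # column names are taken from the vector names
small_frame[3, ]     # the third row
```

Note that data.frame() stops with an error if the vectors have lengths that cannot be recycled to a common length, which is why the vectors here have the same length.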
Unlike a vector, data frames can contain columns of different types.
new_numbers <- 1:3
number_colors <- data.frame(new_numbers, colors)
number_colors
## new_numbers colors
## 1 1 red
## 2 2 orange
## 3 3 yellow
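One way to see what "columns of different types" means is to compare a data frame with a base R matrix built by cbind(), which, like a vector, can hold only one type. In this sketch the names `numbers`, `shade_names`, `as_frame`, and `as_matrix` are illustrative; stringsAsFactors = FALSE is passed explicitly so the result does not depend on the R version.

```r
numbers     <- 1:3
shade_names <- c("red", "orange", "yellow")

# A data frame keeps each column's own type.
as_frame <- data.frame(numbers, shade_names, stringsAsFactors = FALSE)
sapply(as_frame, class)   # numbers: "integer", shade_names: "character"

# A matrix must hold a single type, so the numbers become strings.
as_matrix <- cbind(numbers, shade_names)
class(as_matrix[, "numbers"])   # "character"
```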
In a data frame, columns are treated as variables, and rows are treated as individual observations.
Consider the following data set called mpg from the ggplot2 package:
library(ggplot2)
data(mpg)
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # … with 224 more rows
## # ℹ Use `print(n = ...)` to see more rows
To access a particular column from a data frame, use $, as in my_data_frame$variable_name:
mpg$model
## [1] "a4" "a4" "a4"
## [4] "a4" "a4" "a4"
## [7] "a4" "a4 quattro" "a4 quattro"
## [10] "a4 quattro" "a4 quattro" "a4 quattro"
## [13] "a4 quattro" "a4 quattro" "a4 quattro"
## [16] "a6 quattro" "a6 quattro" "a6 quattro"
## [19] "c1500 suburban 2wd" "c1500 suburban 2wd" "c1500 suburban 2wd"
## [22] "c1500 suburban 2wd" "c1500 suburban 2wd" "corvette"
## [25] "corvette" "corvette" "corvette"
## [28] "corvette" "k1500 tahoe 4wd" "k1500 tahoe 4wd"
## [31] "k1500 tahoe 4wd" "k1500 tahoe 4wd" "malibu"
## [34] "malibu" "malibu" "malibu"
## [37] "malibu" "caravan 2wd" "caravan 2wd"
## [40] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [43] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [46] "caravan 2wd" "caravan 2wd" "caravan 2wd"
## [49] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [52] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [55] "dakota pickup 4wd" "dakota pickup 4wd" "dakota pickup 4wd"
## [58] "durango 4wd" "durango 4wd" "durango 4wd"
## [61] "durango 4wd" "durango 4wd" "durango 4wd"
## [64] "durango 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [67] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [70] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "ram 1500 pickup 4wd"
## [73] "ram 1500 pickup 4wd" "ram 1500 pickup 4wd" "expedition 2wd"
## [76] "expedition 2wd" "expedition 2wd" "explorer 4wd"
## [79] "explorer 4wd" "explorer 4wd" "explorer 4wd"
## [82] "explorer 4wd" "explorer 4wd" "f150 pickup 4wd"
## [85] "f150 pickup 4wd" "f150 pickup 4wd" "f150 pickup 4wd"
## [88] "f150 pickup 4wd" "f150 pickup 4wd" "f150 pickup 4wd"
## [91] "mustang" "mustang" "mustang"
## [94] "mustang" "mustang" "mustang"
## [97] "mustang" "mustang" "mustang"
## [100] "civic" "civic" "civic"
## [103] "civic" "civic" "civic"
## [106] "civic" "civic" "civic"
## [109] "sonata" "sonata" "sonata"
## [112] "sonata" "sonata" "sonata"
## [115] "sonata" "tiburon" "tiburon"
## [118] "tiburon" "tiburon" "tiburon"
## [121] "tiburon" "tiburon" "grand cherokee 4wd"
## [124] "grand cherokee 4wd" "grand cherokee 4wd" "grand cherokee 4wd"
## [127] "grand cherokee 4wd" "grand cherokee 4wd" "grand cherokee 4wd"
## [130] "grand cherokee 4wd" "range rover" "range rover"
## [133] "range rover" "range rover" "navigator 2wd"
## [136] "navigator 2wd" "navigator 2wd" "mountaineer 4wd"
## [139] "mountaineer 4wd" "mountaineer 4wd" "mountaineer 4wd"
## [142] "altima" "altima" "altima"
## [145] "altima" "altima" "altima"
## [148] "maxima" "maxima" "maxima"
## [151] "pathfinder 4wd" "pathfinder 4wd" "pathfinder 4wd"
## [154] "pathfinder 4wd" "grand prix" "grand prix"
## [157] "grand prix" "grand prix" "grand prix"
## [160] "forester awd" "forester awd" "forester awd"
## [163] "forester awd" "forester awd" "forester awd"
## [166] "impreza awd" "impreza awd" "impreza awd"
## [169] "impreza awd" "impreza awd" "impreza awd"
## [172] "impreza awd" "impreza awd" "4runner 4wd"
## [175] "4runner 4wd" "4runner 4wd" "4runner 4wd"
## [178] "4runner 4wd" "4runner 4wd" "camry"
## [181] "camry" "camry" "camry"
## [184] "camry" "camry" "camry"
## [187] "camry solara" "camry solara" "camry solara"
## [190] "camry solara" "camry solara" "camry solara"
## [193] "camry solara" "corolla" "corolla"
## [196] "corolla" "corolla" "corolla"
## [199] "land cruiser wagon 4wd" "land cruiser wagon 4wd" "toyota tacoma 4wd"
## [202] "toyota tacoma 4wd" "toyota tacoma 4wd" "toyota tacoma 4wd"
## [205] "toyota tacoma 4wd" "toyota tacoma 4wd" "toyota tacoma 4wd"
## [208] "gti" "gti" "gti"
## [211] "gti" "gti" "jetta"
## [214] "jetta" "jetta" "jetta"
## [217] "jetta" "jetta" "jetta"
## [220] "jetta" "jetta" "new beetle"
## [223] "new beetle" "new beetle" "new beetle"
## [226] "new beetle" "new beetle" "passat"
## [229] "passat" "passat" "passat"
## [232] "passat" "passat" "passat"
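A column extracted with $ is an ordinary vector, so the vector tools from earlier apply to it. The sketch below uses a small made-up data frame (`small_cars` and its values are hypothetical) to keep the output short; head() and unique() are handy when a column is as long as the one printed above.

```r
small_cars <- data.frame(
  model = c("a4", "a4", "civic", "camry"),
  hwy   = c(29, 31, 33, 28),
  stringsAsFactors = FALSE
)

# $ and [[ ]] return the same vector.
identical(small_cars$model, small_cars[["model"]])   # TRUE

head(small_cars$model, 2)   # first two entries: "a4" "a4"
unique(small_cars$model)    # "a4" "civic" "camry"
mean(small_cars$hwy)        # 30.25
```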
Before we create graphics with ggplot, we need to load the appropriate package:
library(ggplot2)
Common syntax for ggplot:
ggplot(data = ---, aes(x = ---, y = ---)) +
  geom_---()
Note that the chunk option {r} was replaced with {r eval = F} to prevent the chunk from running when we knit (since the code is incomplete).
Histogram for highway mpg:
ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(color = "white", bins = 10)
By adding color = "white" inside the geom_histogram() layer, we outlined each bar in white, allowing us to more easily distinguish between bars.
We also specified the number of bars using bins = ...
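Instead of fixing the number of bars, geom_histogram() also accepts a binwidth argument that fixes how wide each bar is on the x axis. A sketch of that variation, where the width of 5 mpg and the axis labels are arbitrary choices for illustration:

```r
library(ggplot2)

# binwidth = 5 makes every bar span 5 mpg; the number of bars follows from the data.
hwy_hist <- ggplot(data = mpg, aes(x = hwy)) +
  geom_histogram(color = "white", binwidth = 5) +
  labs(x = "Highway miles per gallon", y = "Number of cars")

hwy_hist
```

Setting binwidth can make the bars easier to interpret, since each one then covers a round range of values.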
Create a scatterplot of city vs highway mpg:
ggplot(data = mpg, aes(x = hwy, y = cty)) +
  geom_point()
ggplot(data = mpg, aes(x = hwy, y = cty)) +
  geom_point(color = "maroon")
To change the color of points in the plot, we included color = "..." inside the geom_point() layer.
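Writing color = "..." inside geom_point() gives every point the same fixed color. Placing color inside aes() instead ties it to a variable in the data. The sketch below colors points by drv, the drive-train column of mpg (f, r, or 4); the name `by_drive` is illustrative.

```r
library(ggplot2)

# Inside aes(), color is mapped to a column, so each drive type
# gets its own color and ggplot adds a legend automatically.
by_drive <- ggplot(data = mpg, aes(x = hwy, y = cty, color = drv)) +
  geom_point()

by_drive
```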
Create the graph of the function \(y = x^2 + 5x + 1\):
First, we create a new data frame consisting of x and y coordinates for our function. Note that earlier, we already specified an R function (my_function) representing the quadratic above. We’ll apply this function to the x values contained in the vector one_hundred_between_0_and_1.
When we create the data frame, we relabel the columns as X and Y (instead of using their original vector names):
data_for_plotting <- data.frame(X = one_hundred_between_0_and_1,
                                Y = my_function(one_hundred_between_0_and_1))
Now we plot using geom_line(). We use the new names of our variables in the aes() mapping below.
ggplot(data = data_for_plotting, aes(x = X, y = Y)) +
  geom_line()
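As an alternative to computing the coordinates by hand, ggplot2 provides stat_function(), which evaluates a function over the range of the x axis. This is a sketch of that swapped-in approach, not the method used in these notes; `quadratic` and `curve_plot` are illustrative names, and the two-row data frame only sets the x range from 0 to 1.

```r
library(ggplot2)

# The same quadratic as above, redefined so the example is self-contained.
quadratic <- function(x) {
  x^2 + 5 * x + 1
}

# stat_function() evaluates `quadratic` at n evenly spaced x values
# between the limits of the x axis and draws a line through them.
curve_plot <- ggplot(data.frame(x = c(0, 1)), aes(x = x)) +
  stat_function(fun = quadratic, n = 101)

curve_plot
```

Building the data frame explicitly, as the notes do, has the advantage that the plotted values can be inspected or reused elsewhere.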
One final important feature of ggplot: we can plot multiple functions on the same graph by adding new layers.
Let’s make a new data frame:
last_frame <- data.frame(X = one_hundred_between_0_and_1,
                         Y = one_hundred_between_0_and_1^2)
Now, we start the same as before, and this time add a new geom_line() layer. Inside this layer, we supply the new data frame and new aesthetics. To distinguish between the two curves, we colored the second one “maroon”.
ggplot(data = data_for_plotting, aes(x = X, y = Y)) +
  geom_line() +
  geom_line(data = last_frame, aes(x = X, y = Y), color = "maroon")
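Another way to draw several curves, sketched here as an alternative to adding one layer per data frame, is to stack the data into a single data frame with a column that labels each curve, then map that column to color. The names `x_values`, `both_curves`, `curve`, and `legend_plot` are illustrative and do not appear elsewhere in these notes.

```r
library(ggplot2)

x_values <- seq(from = 0, to = 1, by = 0.01)

# Stack the two sets of coordinates and record which curve each row belongs to.
both_curves <- rbind(
  data.frame(X = x_values, Y = x_values^2 + 5 * x_values + 1, curve = "x^2 + 5x + 1"),
  data.frame(X = x_values, Y = x_values^2, curve = "x^2")
)

# Mapping color to `curve` draws one line per label and adds a legend.
legend_plot <- ggplot(data = both_curves, aes(x = X, y = Y, color = curve)) +
  geom_line()

legend_plot
```

With this layout, adding a third curve only requires adding more rows, and the legend updates without any extra layers.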