Fall 2022
October 26, 2022
To learn the differences and use cases for lists and the apply family of functions.
In R, iterating means working through a vector one element at a time.
Vector = c(2, 4, 6, 8, 10)
for(X in Y) { Do Z }
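Here is a minimal runnable sketch of the `for(X in Y) { Do Z }` pattern. The loop walks through the example vector and, as an illustrative choice of "Do Z", collects each element divided by 2. The name `halves` is our own.

```{r}
Vector = c(2, 4, 6, 8, 10)
halves = c()  # start with an empty vector to collect results

# X is each element of Vector in turn
for (X in Vector) {
  # "Do Z": add the current element divided by 2 to the results
  halves = c(halves, X / 2)
}

halves  # [1] 1 2 3 4 5
```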
Useful when:
Lists are kinda like super-vectors (JSON-like).
They can contain anything in their elements. You could have:
Getting the content of lists requires special syntax!
Each list element is accessed using double square brackets: `[[ ]]`.
```{r}
test_list = list("num_vec" = c(1, 2, 3, 4, 5),
"let_vec" = c("a", "b", "c", "c"),
"df" = head(mtcars))
test_list
```
$num_vec
[1] 1 2 3 4 5
$let_vec
[1] "a" "b" "c" "c"
$df
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
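Here is a short sketch of getting content back out of `test_list`, rebuilt so the chunk runs on its own. It uses only base R. It contrasts `[[ ]]` and `$`, which return the element itself, with single brackets `[ ]`, which return a smaller list.

```{r}
# Rebuild the example list from above
test_list = list("num_vec" = c(1, 2, 3, 4, 5),
                 "let_vec" = c("a", "b", "c", "c"),
                 "df" = head(mtcars))

# Double brackets return the element itself (here, a numeric vector)
test_list[["num_vec"]]

# $ is shorthand for [[ ]] with a name
test_list$let_vec

# Reach an element by position, then index into it like any other object
test_list[[3]]$mpg

# Single brackets return a smaller *list*, not the element inside it
class(test_list["num_vec"])    # "list"
class(test_list[["num_vec"]])  # "numeric"
```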
IF
example_vec_1 = c(1, 2, 3)
example_vec_2 = c("a", "b", "c")
AND
example_list = list(example_vec_1, example_vec_2)
THEN
example_list[[1]] is the same as example_vec_1, which is c(1, 2, 3)
example_list[[2]][3] is the same as example_vec_2[3], which is "c"
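The IF/AND/THEN statements above can be checked directly in R. R does not allow chained comparisons such as `a == b == c` (that is a syntax error), so this sketch checks each pair separately, using `identical()` to compare whole vectors.

```{r}
example_vec_1 = c(1, 2, 3)
example_vec_2 = c("a", "b", "c")

example_list = list(example_vec_1, example_vec_2)

# identical() compares whole objects at once
identical(example_list[[1]], example_vec_1)  # TRUE
identical(example_list[[1]], c(1, 2, 3))     # TRUE

# Index into the list first, then into the vector it holds
example_list[[2]][3] == example_vec_2[3]     # TRUE
example_list[[2]][3]                         # "c"
```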
The apply family of functions takes every element of a sequence and does the same thing to each one.
apply(X, FUN = function)
Apply does the same thing to each element (roughly) all at once.
Apply FUN to element 1 in X.
Apply FUN to element 2 in X.
Apply FUN to element 3 in X.
Apply FUN to element 4 in X.
Apply FUN to element 5 in X.
Apply FUN to element 6 in X.
Apply FUN to element 7 in X.
…
Loops iterate through every element of a sequence one element at a time.
This allows dependence: one step can use the result of an earlier step.
Apply functions apply the given function to every element (roughly) at the same time.
This does not allow dependence: each result can only use its own element.
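To make the dependence point concrete, here is a small sketch using the example vector from earlier. The loop computes a running total, where each step needs the previous result. `sapply` halves each element, which needs nothing but the element itself. The names `running_total` and `halves` are illustrative.

```{r}
Vector = c(2, 4, 6, 8, 10)

# Loop: each step uses the running total from the step before (dependence)
running_total = numeric(length(Vector))
running_total[1] = Vector[1]
for (i in 2:length(Vector)) {
  running_total[i] = running_total[i - 1] + Vector[i]
}
running_total  # [1]  2  6 12 20 30

# sapply: each result depends only on its own element (no dependence)
halves = sapply(Vector, function(x) x / 2)
halves         # [1] 1 2 3 4 5
```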
`lapply` returns a list of the same length as `X`, each element of which is the result of applying `FUN` to the corresponding element of `X`.
For every column in `mtcars`, apply the `mean()` function.
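The code for this step is not shown above, so here is a minimal sketch of what it likely looks like. It relies on the fact that a data frame is a list of columns, so `lapply` visits one column at a time. The name `column_means` is our own.

```{r}
# mtcars is a data frame, i.e. a list of columns, so lapply visits each column
column_means = lapply(X = mtcars, FUN = mean)

class(column_means)   # "list"
length(column_means)  # 11, one element per column
column_means$mpg      # about 20.09
```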
`sapply` is similar to `lapply`, but it returns a vector if it can. Be careful, as its results can surprise you!
For every column in `mtcars`, apply the `mean()` function.
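Here is a sketch of both the convenient and the surprising sides of `sapply`, using `mtcars` and a couple of small made-up lists. The last line uses `vapply`, a base R relative that makes you declare the type and length of each result, which turns these surprises into errors instead.

```{r}
# Every result has length 1, so sapply simplifies to a named numeric vector
sapply(X = mtcars, FUN = mean)

# Surprise 1: length-2 results become a matrix (2 rows x 11 columns)
column_ranges = sapply(X = mtcars, FUN = range)
dim(column_ranges)  # 2 11

# Surprise 2: results of different lengths cannot be simplified, so you get a list
uneven = sapply(X = list(a = 1:3, b = 1:2), FUN = function(v) v[v > 1])
class(uneven)       # "list"

# Surprise 3: empty input returns an empty list, not an empty vector
class(sapply(X = list(), FUN = length))  # "list"

# vapply checks that every result is a single number
vapply(X = mtcars, FUN = mean, FUN.VALUE = numeric(1))
```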
`apply` is used for matrices or data frames. You can supply the `MARGIN` argument to make it work over rows (`MARGIN = 1`) or columns (`MARGIN = 2`).
For every column, and then every row, in `mtcars`, apply the `mean()` function.
mpg cyl disp hp drat wt qsec
20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
vs am gear carb
0.437500 0.406250 3.687500 2.812500
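Here is a minimal sketch of the `MARGIN` argument, assuming the goal is the column means shown above plus the matching row means. Note that `apply()` first converts a data frame to a matrix, so it works best when every column has the same type, as in `mtcars`. The names `col_means` and `row_means` are illustrative.

```{r}
# MARGIN = 2: apply mean() down each column (11 results, one per column)
col_means = apply(X = mtcars, MARGIN = 2, FUN = mean)

# MARGIN = 1: apply mean() across each row (32 results, one per car)
row_means = apply(X = mtcars, MARGIN = 1, FUN = mean)

length(col_means)  # 11
length(row_means)  # 32
head(row_means, 3)
```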
You can pass any function to `FUN`, including one you write!
This means you can run almost any calculation over a large collection of data.
```{r}
lapply(X = mtcars, FUN = function(column){
  # each element of mtcars is one column (a numeric vector)
  # get the largest value
  largest = max(column)
  # get the smallest value
  smallest = min(column)
  # get the difference
  result = largest - smallest
  # return the difference
  return(result)
})
```
$mpg
[1] 23.5
$cyl
[1] 4
$disp
[1] 400.9
$hp
[1] 283
$drat
[1] 2.17
$wt
[1] 3.911
$qsec
[1] 8.4
$vs
[1] 1
$am
[1] 1
$gear
[1] 2
$carb
[1] 7
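The same calculation can also be written as a named function and handed to `sapply`, which simplifies the 11 length-1 results into one named vector instead of a list. The function name `value_spread` is hypothetical, chosen for this sketch.

```{r}
# A named, reusable version of the max-minus-min calculation above
value_spread = function(column) {
  max(column) - min(column)
}

# sapply simplifies the 11 single-number results into a named numeric vector
spreads = sapply(X = mtcars, FUN = value_spread)
spreads
```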
The built-in `parallel` package in R offers several tools to run code in parallel.
Mostly, these take the form of apply family functions (for example, `parLapply()` and `mclapply()`).
There can be no dependence between elements.
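Here is a minimal sketch of the earlier max-minus-min calculation run with `parLapply()` from the `parallel` package. The choice of 2 workers is an assumption; adjust it for your machine. `mclapply()` is a simpler alternative on macOS and Linux, but it relies on forking, which Windows does not support.

```{r}
library(parallel)

# Start a local cluster with 2 worker processes (an arbitrary, modest choice)
cl = makeCluster(2)

# Each column is handled independently, so parLapply can split
# the 11 columns of mtcars across the workers
column_spreads = parLapply(cl, mtcars, function(column) max(column) - min(column))

# Shut the workers down once the work is finished
stopCluster(cl)

column_spreads$mpg
```

Because the workers never talk to each other, this only works when no element's result depends on another's.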
Lab 6 & Quiz 2 Open
SDS 192-03: Intro to Data Science