Week 2 - Day 2

Fall 2022

Jared Joseph

September 14, 2022

Overview

Timeline

  • Things to keep in mind
  • Learn about R/R Studio (Posit)
  • R vocab
  • Functions
  • Vectors
  • Dataframes

Goal

To introduce R and R Studio, as well as basic data terms.

Things to keep in mind

  • Coding is in no way “natural”
  • The first step is often hardest
  • Mistakes are part of the process
  • You won’t break anything (yet)
  • There is hardly ever one right way to do things
  • Work with others
  • Try to think like the computer
  • Treat it like a puzzle game

R Studio (Posit) Interface

Relationship between R and R Studio (Posit)

R

R Studio (Posit)

Tour of the Interface

Coding in R

Project Setup

Where Code Lives

Console

Where code does the thing

Scripts

Where you plan to do the thing

Quarto

Where you write about the thing

Key Terms

  • R stores data as objects, think of them like a box you can put things in
  • You put things in a box using an assignment
  • The stuff that goes in objects has various data types, namely:
    • logical - TRUE or FALSE
    • integer - whole numbers like 1, 5, 100 (written 1L, 5L, 100L in R)
    • numeric - any number, including ones with decimal places like 5.25
    • character - text, written in quotes like "purple"
    • factor - for categorical data, numbers with descriptive labels
    • NA - a missing value
  • Ask R to do things to stuff in the boxes using functions (think verbs)
    • count the things in the box
    • empty the box
    • put the box away
  • A function has arguments, which describe how to do the thing
    • lift the box carefully
    • only take red things from the box

Wikimedia - Luisalvaz
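A quick way to check which type R gave a value is `class()`. The sketch below shows one value of each type from the list. Note that typing a whole number like `5` on its own gives `numeric`; you need the `L` suffix (`5L`) to get an `integer`. The last lines show a function acting on a box and an argument changing how it acts; the object name `box` is only an example.

```{r}
# One value of each data type from the list above
class(TRUE)                             # "logical"
class(5L)                               # "integer": the L suffix makes a whole number an integer
class(5)                                # "numeric": plain numbers are numeric by default
class(5.25)                             # "numeric"
class("purple")                         # "character"
class(factor(c("low", "high", "low")))  # "factor"
is.na(NA)                               # TRUE: NA marks a missing value

# A function (verb) acting on a box, and an argument changing how it acts
box <- c(4, 2, 8)                # "box" is just an example object name
length(box)                      # 3: count the things in the box
sort(box)                        # 2 4 8
sort(box, decreasing = TRUE)     # 8 4 2: the decreasing argument changes how sort() works
```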

Code as Language

We can ask R to do things using the language it understands, R code.


Say we want to ask R to:

Take the sum of 5, 5, and 10, and put the result in a box called “total.”

  • function: take the sum
  • argument: 5, 5, and 10
  • assignment: put the result in
  • object: a box called “total”

In R code, that request is:

```{r}
total <- sum(5, 5, 10)
```
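Once the result is in the box, you can look inside by typing the object's name, or reuse the box in later code. A minimal sketch:

```{r}
total <- sum(5, 5, 10)  # store the result in an object called total
total                   # typing the name prints what is in the box: 20
total / 2               # the object can be reused in new calculations: 10
```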

Anatomy of a Function

total <- sum(5, 5, 10)


object <- function(arguments)

You can always learn more about a function by typing ? before its name, for example ?sum.
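Arguments can also be given by name. The help page for `sum()` lists an `na.rm` argument that controls what happens when a value is missing; the sketch below uses it to show an argument changing how the function does its job.

```{r}
sum(5, 5, 10, NA)                # NA: one missing value makes the whole sum unknown
sum(5, 5, 10, NA, na.rm = TRUE)  # 20: na.rm = TRUE removes missing values before adding
```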

Vectors

A vector is an ordered collection of values.

  • The order matters
  • One vector can only hold one type of data
  • You can make vectors in R with c()
```{r}
example_vector <- c(5, 5, 10)
example_vector
```
[1]  5  5 10


```{r}
example_colors <- c("purple", "orange", "periwinkle")
example_colors
```
[1] "purple"     "orange"     "periwinkle"
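Because one vector can only hold one type of data, R quietly converts values when you mix types. In the sketch below (`mixed_vector` is just an example name), the number 5 becomes the text "5":

```{r}
# Mixing a number and text: R converts everything to character
mixed_vector <- c(5, "purple")
mixed_vector         # "5" "purple"
class(mixed_vector)  # "character"

# Order matters: the same values in a different order make a different vector
identical(c(5, 10), c(10, 5))  # FALSE
```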

Data Frames

A dataframe is a rectangular arrangement of vectors of equal length, with each vector becoming a column.

  • Like a spreadsheet
  • Rows are cases, columns are variables
```{r}
example_dataframe <- data.frame(example_vector, example_colors)
example_dataframe
```
  example_vector example_colors
1              5         purple
2              5         orange
3             10     periwinkle
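A few base R functions describe the shape of a dataframe. This sketch rebuilds the example so it runs on its own:

```{r}
example_vector <- c(5, 5, 10)
example_colors <- c("purple", "orange", "periwinkle")
example_dataframe <- data.frame(example_vector, example_colors)

nrow(example_dataframe)   # 3 rows (cases)
ncol(example_dataframe)   # 2 columns (variables)
names(example_dataframe)  # "example_vector" "example_colors"
str(example_dataframe)    # the type and first values of each column
```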

Subsetting

A subset of data is a smaller selection of the total data set.

Learning how to effectively subset is one of the most foundational skills in data science.

Base R

example_vector[<positions>]

example_dataframe$<column>
example_dataframe[<rows>, <columns>]

Tidyverse (dplyr)

example_dataframe %>%
  slice(<rows>) %>%
  select(<columns>)
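Here is the same subset, rows 1 and 2 of the `example_colors` column, written both ways. This sketch assumes the dplyr package (part of the tidyverse) is installed; dplyr also provides `%>%`. One difference to watch: base R drops a single column down to a plain vector, while the dplyr version keeps a one-column dataframe.

```{r}
library(dplyr)  # assumes dplyr is installed, e.g. via install.packages("tidyverse")

example_dataframe <- data.frame(
  example_vector = c(5, 5, 10),
  example_colors = c("purple", "orange", "periwinkle")
)

# Base R: rows 1 and 2 of the example_colors column
base_subset <- example_dataframe[c(1, 2), "example_colors"]

# Tidyverse: the same rows and column, one step at a time
tidy_subset <- example_dataframe %>%
  slice(c(1, 2)) %>%
  select(example_colors)

base_subset  # a character vector: "purple" "orange"
tidy_subset  # a one-column dataframe holding the same values
```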

Subsetting by Position

You can ask for the data at a specific position in a vector by putting the number of that position in square brackets. For example:

vector <- c(1, 3, 5, 7, 9, 11)

  • vector[1]: c(1)
  • vector[2]: c(3)
  • vector[c(1, 2)]: c(1, 3)
  • vector[c(1, 2, 5)]: c(1, 3, 9)
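The same idea as runnable code. Two shortcuts not shown in the list above: `2:4` is R shorthand for `c(2, 3, 4)`, and `length()` gives the position of the last value.

```{r}
vector <- c(1, 3, 5, 7, 9, 11)

vector[1]               # 1
vector[2:4]             # 3 5 7: positions 2, 3, and 4
vector[length(vector)]  # 11: the last position
```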

Subsetting Syntax

Vectors

Ask for a subset of a vector using the following format. In English:

Give me vector, such that position is equal to X.

In R Code:

vector[position]


Dataframes

Ask for a subset of a dataframe using the following format. In English:

Give me dataframe, such that rows are equal to X, and columns are equal to Y.

In R Code:

dataframe[rows, columns] OR dataframe$column
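A few examples of `dataframe[rows, columns]` and `dataframe$column`, using the earlier example dataframe. Leaving the rows or the columns blank means "all of them":

```{r}
example_dataframe <- data.frame(
  example_vector = c(5, 5, 10),
  example_colors = c("purple", "orange", "periwinkle")
)

example_dataframe[2, "example_colors"]  # row 2, column example_colors: "orange"
example_dataframe[c(1, 3), ]            # rows 1 and 3, all columns
example_dataframe[, "example_vector"]   # all rows of one column: 5 5 10
example_dataframe$example_vector        # the same column, using $
```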

Adding to Dataframes

The same tools you use to take parts out of a dataframe can also add to it.

example_dataframe

```{r}
example_dataframe <- data.frame(name = c("Sam", "Frodo", "Pippin", "Merry"), number = c(7, 8, 3, 6))
example_dataframe
```
    name number
1    Sam      7
2  Frodo      8
3 Pippin      3
4  Merry      6

example_vector

```{r}
example_vector <- c("blue", "green", "yellow", "red")
example_vector
```
[1] "blue"   "green"  "yellow" "red"

Combine them!

```{r}
example_dataframe$new_column <- example_vector
example_dataframe
```
    name number new_column
1    Sam      7       blue
2  Frodo      8      green
3 Pippin      3     yellow
4  Merry      6        red
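A new column can also be built from an existing one. The new column needs one value per row (a single value is repeated for every row). The column name `double_number` below is only an example:

```{r}
example_dataframe <- data.frame(
  name = c("Sam", "Frodo", "Pippin", "Merry"),
  number = c(7, 8, 3, 6)
)

# Multiply every value in number by 2 and store it as a new column
example_dataframe$double_number <- example_dataframe$number * 2
example_dataframe
```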

Conditionals

Conditionals let you ask for only the values where a condition is TRUE.

  • == - Equal to
  • != - Not equal to
  • > - Greater than
  • >= - Greater than or equal to
  • < - Less than
  • <= - Less than or equal to

For example:

vector <- c(1, 3, 5, 7, 9, 11)

  • vector[vector > 5]: c(7, 9, 11)
  • vector[vector <= 5]: c(1, 3, 5)
  • vector[vector == 5]: c(5)
  • vector[vector != 5]: c(1, 3, 7, 9, 11)
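Under the hood, a condition like `vector > 5` produces a vector of TRUE and FALSE values, and subsetting keeps the positions marked TRUE. The same trick picks rows of a dataframe. In this sketch, `big_numbers` is just an example name:

```{r}
vector <- c(1, 3, 5, 7, 9, 11)
vector > 5       # FALSE FALSE FALSE TRUE TRUE TRUE
sum(vector > 5)  # 3: each TRUE counts as 1, so this counts matching values

example_dataframe <- data.frame(
  name = c("Sam", "Frodo", "Pippin", "Merry"),
  number = c(7, 8, 3, 6)
)

# Rows where number is greater than 5, all columns
big_numbers <- example_dataframe[example_dataframe$number > 5, ]
big_numbers
```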

Code Along

For Next Time

Topic

Lab 1: Working with R

To-Do

Finish problem set