Lab 6. Loops and Apply

Author

Jared Joseph

Introduction

Access the lab on GitHub Classroom: GitHub Classroom Assignment for Lab 6: Loops and Apply

Both for() loops and the apply family of functions let us repeat an operation across many pieces of data instead of writing each step out by hand. Combined with what we have learned about writing our own functions, these tools greatly expand what we can do in our analyses. Today we will practice working with more flexible data structures, such as lists of dataframes, rather than a single dataframe.
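
To compare the two approaches, the sketch below counts the characters in each element of a small made-up vector twice: once with a for() loop that fills a preallocated vector, and once with sapply(), which handles the iteration and collects the results for us. The `titles` vector is hypothetical example data, not part of the lab dataset. Because nchar() is already vectorized, nchar(titles) would give the same result directly; the point here is only the iteration pattern.

```r
# Hypothetical example data (not from the lab dataset)
titles <- c("Book One", "Another Book", "A Third")

# for() loop: preallocate the output, then fill it one element at a time
n_chars_loop <- numeric(length(titles))
for (i in seq_along(titles)) {
  n_chars_loop[i] <- nchar(titles[i])
}

# sapply(): apply nchar() to each element and simplify the results to a vector
n_chars_apply <- sapply(titles, nchar, USE.NAMES = FALSE)

print(n_chars_loop)
print(n_chars_apply)
```

The loop makes every step explicit, while sapply() hides the bookkeeping (preallocation and indexing) behind a single call.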

The Data

The data we will be using today comes from PEN America’s Book Bans database. This database attempts to catalog all books banned in US schools between July 2021 and June 2022. While it is impossible to know whether it captures every ban, and it does not account for books that were later unbanned, it is still a rich dataset to explore.

I’ve provided a CSV version of the database in the data/ directory of this project, as well as a copy of the data documentation in the docs/ directory. Be sure to refer to the documentation as you work.

Question 1

Load the banned books CSV and assign it to an object named banned_books.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Tidying the Title

Our banned books dataframe contains some untidy data: the Title column holds both the book’s title and, if the book is part of a series, the title of that series. We should fix that.

Question 2

Use either a for() loop or an apply-family function to iterate over the Title column of banned_books and produce a dataframe with columns for:

  • A logical indicating whether the book is part of a series
  • The name of the series, if it has one
  • A clean title for the book

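HINT: You may encounter an error related to parentheses when trying to separate the title from the series. This happens because functions such as strsplit() and sub() treat their pattern as a regular expression by default, and in regular expressions "(" has a special meaning. If this happens, try adding the argument fixed = TRUE to the relevant function so the pattern is matched literally.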
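
A minimal sketch of the difference on a single made-up title. The "Title (Series)" layout used here is an assumption for illustration only; check the data documentation for how the Title column is actually formatted.

```r
# Hypothetical title in a "Title (Series)" layout; the real Title column
# may be formatted differently
x <- "The Example Book (Example Series)"

# Without fixed = TRUE, "(" is parsed as the start of a regex group,
# so strsplit() stops with an invalid regular expression error
res <- tryCatch(
  suppressWarnings(strsplit(x, "(")),
  error = function(e) conditionMessage(e)
)
print(res)

# With fixed = TRUE, "(" is matched as a literal character
parts <- strsplit(x, "(", fixed = TRUE)[[1]]
print(parts)  # "The Example Book " "Example Series)"

# Trim the trailing space from the title and drop the closing parenthesis
title  <- trimws(parts[1])
series <- sub(")", "", parts[2], fixed = TRUE)
```

This handles one string; the question still asks you to repeat the idea across the whole Title column and to handle titles that have no series.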

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 3

Combine our new book series dataframe with our original banned_books dataframe.
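
If the series dataframe was built by going over the Title column in order, its rows line up one-to-one with the rows of banned_books, and binding the columns side by side with cbind() is one option. A minimal sketch with two small made-up dataframes; `books`, `series_info`, and all of their column names are hypothetical stand-ins.

```r
# Hypothetical stand-ins for banned_books and the new series dataframe
books <- data.frame(
  Author = c("Author A", "Author B"),
  Title  = c("Book X (Series S)", "Book Y")
)
series_info <- data.frame(
  in_series   = c(TRUE, FALSE),
  series_name = c("Series S", NA),
  clean_title = c("Book X", "Book Y")
)

# cbind() pastes columns side by side, matching rows by position only,
# so both dataframes must have the same number of rows in the same order
stopifnot(nrow(books) == nrow(series_info))
combined <- cbind(books, series_info)
print(combined)
```

Because cbind() matches by position rather than by any key, reordering or filtering either dataframe before binding would silently misalign the rows.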

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Splitting the Data

We will be running some analyses by state in this lab, so our first step is to split the data into one dataframe per state. Rather than typing out and subsetting each state manually, we will use a for() loop.

Question 4

Use a for() loop to iterate through each state, subsetting banned_books for that state and storing the result as an element of a new list called state_books. You should end up with a list, state_books, that has one element for each state in the dataset, where each element is a dataframe of all the books banned in that state.

HINT: The unique() function will be helpful for building your loop. When given a vector, it returns each distinct value in that vector once.
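
A minimal sketch of the unique()-plus-loop pattern on a made-up dataframe. The `books` dataframe, its `State` column, and the list name `books_by_state` are hypothetical; the real dataset’s column names may differ.

```r
# Hypothetical toy data; the real dataset's column names may differ
books <- data.frame(
  State = c("Texas", "Florida", "Texas", "Pennsylvania"),
  Title = c("Book 1", "Book 2", "Book 3", "Book 4")
)

# unique() returns each distinct value once, in order of first appearance
states <- unique(books$State)
print(states)  # "Texas" "Florida" "Pennsylvania"

# Start with an empty list and add one dataframe per state, named by state
books_by_state <- list()
for (s in states) {
  books_by_state[[s]] <- books[books$State == s, ]
}

print(names(books_by_state))
print(nrow(books_by_state[["Texas"]]))
```

Naming each element after its state makes the list easy to index later. Base R’s split(books, books$State) returns a similar named list in one call (ordered alphabetically rather than by first appearance), but this question asks you to practice the loop.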

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Analysis by State

Now we will manipulate the data by state, using lapply() and a custom function to compute a few summary highlights for each state.

CHALLENGE QUESTION

Use lapply() on our state_books list to apply a custom function to each state’s dataframe. This custom function should take a state’s dataframe as input and return a named vector containing the following:

  • The number of bans in the state, broken down by “Origin of Challenge”
  • The number of books banned in the state that were translated
  • The names of all authors with at least 5 books banned in the state
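
Two base R building blocks may help here: table() counts how often each value occurs, and c() joins named pieces into one named vector. A minimal sketch on a made-up dataframe for a single state; the column names (`origin`, `translated`, `author`) and their values are hypothetical stand-ins, so check the documentation for the real ones. Note that c() coerces everything to a single type, so combining counts with author names produces a character vector.

```r
# Hypothetical toy data for a single state
state_df <- data.frame(
  origin     = c("Administrator", "Formal Challenge", "Administrator"),
  translated = c("Yes", "No", "No"),
  author     = c("Author A", "Author A", "Author B")
)

# Count rows per origin value, then turn the table into a named vector
origin_tab <- table(state_df$origin)
origin_counts <- setNames(as.vector(origin_tab), names(origin_tab))

# Count rows that meet a condition by summing a logical vector
n_translated <- sum(state_df$translated == "Yes")

# Count books per author and keep authors at or above a threshold
# (2 here because the toy data is tiny; the question uses 5)
author_tab <- table(state_df$author)
frequent_authors <- names(author_tab)[author_tab >= 2]

# Join the pieces into one named vector (all values become character)
summary_vec <- c(origin_counts, translated = n_translated, authors = frequent_authors)
print(summary_vec)
```

If keeping the counts numeric matters for later steps, returning a list instead of c() keeps each piece in its own type.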

You can either write this function separately and then pass it to lapply(), or write it directly inside lapply() itself.
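
A minimal sketch of both styles on a small made-up list of dataframes. `toy_list`, its `n_pages` column, and the function name `summarize_pages` are hypothetical and stand in for state_books and your own summary function.

```r
# Hypothetical list of small dataframes, standing in for state_books
toy_list <- list(
  Texas   = data.frame(n_pages = c(100, 250, 300)),
  Florida = data.frame(n_pages = c(120, 80))
)

# Option 1: define the function separately, then pass it by name
summarize_pages <- function(df) {
  c(n_books = nrow(df), mean_pages = mean(df$n_pages))
}
res_named <- lapply(toy_list, summarize_pages)

# Option 2: write an anonymous function directly inside lapply()
res_inline <- lapply(toy_list, function(df) {
  c(n_books = nrow(df), mean_pages = mean(df$n_pages))
})

# lapply() keeps the list's names, so each result is labelled by state
print(res_named)
```

Both options give the same result; a separately defined function is easier to test on one state’s dataframe before running it over the whole list.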

#<REPLACE THIS COMMENT WITH YOUR ANSWER>