# Load our API library
library(rscorecard)
# Set our API key
sc_key("<SCORECARD_KEY>")
Lab 4. Advanced Plotting
Introduction
Click here to access the lab on GitHub Classroom: GitHub Classroom Assignment for Lab 4: Advanced Plotting
Visualizations help us understand our own data and communicate it to other people. Today we will focus on making visuals that we would want to share with others. While there are some good rules of thumb, visualization is an art as well as a science. Take some time to make visualizations you think look good, as long as they remain faithful to the underlying data.
Some things to keep in mind:
- What is the story of your visualization?
- Does your visualization leave any question marks for the viewer?
- Is there anything you can remove from your visualization and still keep your message clear?
- Did you include the data source?
- Have you tried to make your visualization accessible?
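Several of these questions map onto concrete plotting choices. The sketch below is one illustration, assuming the ggplot2 package (an assumption, since this lab has not named a plotting package yet) and a small made-up data frame. The school names and numbers are placeholders, not Scorecard values. The plot states its story in the title, labels the axes in plain language, credits the data source in a caption, and uses the viridis palette, which is designed to remain readable for viewers with common forms of color blindness.

```r
library(ggplot2)

# Made-up placeholder data: NOT real College Scorecard values
toy_schools <- data.frame(
  school   = c("School A", "School B", "School C", "School D"),
  control  = c("Public", "Private nonprofit", "Public", "Private nonprofit"),
  adm_rate = c(0.65, 0.30, 0.80, 0.45),
  cost     = c(25000, 70000, 22000, 60000)
)

p <- ggplot(toy_schools, aes(x = adm_rate, y = cost, color = control)) +
  geom_point(size = 3) +
  scale_color_viridis_d(end = 0.8) +  # colorblind-friendly discrete palette
  labs(
    title   = "Illustrative example: cost versus admission rate",
    x       = "Admission rate",
    y       = "Average annual cost of attendance (USD)",
    color   = "Institution type",
    caption = "Source: U.S. Department of Education College Scorecard (placeholder values)"
  ) +
  theme_minimal()  # removes background clutter that does not carry information

p
```

Each argument to labs() answers one of the questions above, so a quick way to review a plot is to check that none of them has been left at its default.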
The Data
The data we will be using today comes from the U.S. Department of Education College Scorecard. The College Scorecard collects important data about universities across the country, such as cost of attendance, acceptance rates, graduation rates, and the income of students from each institution after graduation. It also gives information on the student body, including how graduates from each major do after graduation. If you are curious, you can see Smith’s page here.
The full scorecard data set is huge. It includes information about over 6,500 institutions in the U.S. and has more than 3,000 columns describing those institutions. Today we will be using the rscorecard package to get a subset of the data.
To use the rscorecard package, you will need to get an Application Programming Interface (API) key. API keys grant you direct access to data that is often otherwise limited. You get to pull data directly into R without needing to download files from the web, and the provider gets to limit the data you can actually get. This arrangement is usually beneficial for everyone.
To get an API key for the college scorecard data, you will need to request one on the data.gov web portal. Once you fill out your information, you should get an email almost immediately with a personalized API key. This key is unique to you, so it is important to keep it safe. You should never include your API keys in a code file, especially those you commit with git. Other people can look through your git history and find your personal key to use for nefarious purposes!
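One common way to keep a key out of your scripts is to store it in an environment variable and read it at run time. The sketch below assumes the key has been saved under the name DATAGOV_API_KEY (for example, in a .Renviron file that git ignores). That name is an assumption here, so check the rscorecard documentation for the variable it expects. The helper read_api_key() is hypothetical and exists only for illustration.

```r
# Hypothetical helper: read an API key from an environment variable
# instead of typing it into a script that may be committed to git.
read_api_key <- function(var = "DATAGOV_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable '", var, "'.", call. = FALSE)
  }
  key
}

# Demonstration with a fake key that is set for this session only
Sys.setenv(DEMO_API_KEY = "not-a-real-key")
demo_key <- read_api_key("DEMO_API_KEY")
nchar(demo_key) > 0
```

With a real key stored this way, a script could pass read_api_key() to the key-setting function without the key itself ever appearing in a file under version control.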
Once you have your key, load the package with library(rscorecard) and run sc_key("<SCORECARD_KEY>") in your console, replacing <SCORECARD_KEY> with your unique key.
This will save your key to your R environment. You will need to re-run sc_key() whenever you restart R, so it may be helpful to save your key somewhere safe. Run the following code to get our data for today:
# Set what variables we want
# School context
scorecard_variables_context <- c("unitid", "instnm", "city", "highdeg", "control",
                                 "hbcu", "annhi", "tribal", "aanapii", "hsi", "nanti")
# Student info
scorecard_variables_students <- c("unitid", "instnm", "ugds", "adm_rate",
                                  "costt4_a", "costt4_p", "pcip27", "pctfloan",
                                  "pctpell", "admcon7", "cdr3")
# Get context data
scorecard_2020_context <- sc_init() |>          # Set up our API 'call'
  sc_year(2020) |>                              # Set the year to only 2020
  sc_filter(stabbr == "MA") |>                  # Ask for only MA data
  sc_select_(scorecard_variables_context) |>    # Set variables
  sc_get()                                      # Get the thing!
scorecard_2017_context <- sc_init() |>
  sc_year(2017) |>
  sc_filter(stabbr == "MA") |>
  sc_select_(scorecard_variables_context) |>
  sc_get()
scorecard_2014_context <- sc_init() |>
  sc_year(2014) |>
  sc_filter(stabbr == "MA") |>
  sc_select_(scorecard_variables_context) |>
  sc_get()
# Get student data
scorecard_2020_student <- sc_init() |>
  sc_year(2020) |>
  sc_filter(stabbr == "MA") |>
  sc_select_(scorecard_variables_students) |>
  sc_get()
scorecard_2017_student <- sc_init() |>
  sc_year(2017) |>
  sc_filter(stabbr == "MA") |>
  sc_select_(scorecard_variables_students) |>
  sc_get()
scorecard_2014_student <- sc_init() |>
  sc_year(2014) |>
  sc_filter(stabbr == "MA") |>
  sc_select_(scorecard_variables_students) |>
  sc_get()
We now have six data frames containing data for MA universities and colleges. You will also need to download the scorecard documentation for this lab to understand the variables. You can find both the Data Dictionary and the Technical Documentation on the scorecard website. I recommend saving both in the docs/ directory within your project folder.
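Because the context and student data frames share the unitid and instnm columns and differ by year, they can be stacked by year and then joined by school. The following is a minimal sketch in base R using tiny made-up tables in place of the downloads. The values are placeholders, and the explicit year column is an assumption added for the example, so check whether your downloaded data already has one.

```r
# Made-up placeholder tables standing in for two years of downloads
context_2020 <- data.frame(unitid = c(1, 2), instnm = c("School A", "School B"),
                           control = c(1, 2), year = 2020)
context_2017 <- data.frame(unitid = c(1, 2), instnm = c("School A", "School B"),
                           control = c(1, 2), year = 2017)
student_2020 <- data.frame(unitid = c(1, 2), instnm = c("School A", "School B"),
                           adm_rate = c(0.6, 0.3), year = 2020)
student_2017 <- data.frame(unitid = c(1, 2), instnm = c("School A", "School B"),
                           adm_rate = c(0.7, 0.4), year = 2017)

# Stack the years of each kind of table on top of one another
context_all <- rbind(context_2020, context_2017)
student_all <- rbind(student_2020, student_2017)

# Join school context to student info: one row per school per year
combined <- merge(context_all, student_all, by = c("unitid", "instnm", "year"))

dim(combined)
```

Joining on year as well as the school identifiers keeps each school's 2017 context from being matched to its 2020 student data. If you prefer dplyr, bind_rows() and left_join() follow the same two steps.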
Once you have downloaded the documentation, take some time to read up about each of the variables we will be using. The search function is helpful here.
#<REPLACE THIS COMMENT WITH YOUR ANSWER>
Exploratory Data Analyses (EDA) & Cleaning
Once you have combined the data and familiarized yourself with the variables, we need to take some time to understand the dataset.
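A first pass at understanding a dataset usually checks its size, column types, value ranges, and missing values. The sketch below uses base R on a small made-up data frame (placeholder values, not Scorecard data). The column names are borrowed from the variable lists above, and the same calls should apply to your combined data.

```r
# Made-up placeholder data: NOT real College Scorecard values
toy <- data.frame(
  unitid   = c(1, 2, 3, 4),
  control  = c(1, 2, 2, 1),
  adm_rate = c(0.65, NA, 0.30, 0.80),
  costt4_a = c(25000, 70000, NA, NA)
)

dim(toy)      # number of rows and columns
str(toy)      # column types and a preview of values
summary(toy)  # ranges, quartiles, and NA counts for numeric columns

# Count missing values in each column
missing_per_column <- colSums(is.na(toy))
missing_per_column

# Tabulate a coded categorical variable
# (look up what each code means in the data dictionary)
control_counts <- table(toy$control)
control_counts
```

Columns with many missing values, or codes that do not appear in the data dictionary, are worth noting before you start plotting.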
#<REPLACE THIS COMMENT WITH YOUR ANSWER>
#<REPLACE THIS COMMENT WITH YOUR ANSWER>
Communicating with Plots
#<REPLACE THIS COMMENT WITH YOUR ANSWER>
#<REPLACE THIS COMMENT WITH YOUR ANSWER>
#<REPLACE THIS COMMENT WITH YOUR ANSWER>