Lab 1. Fundamental R Coding

Author

Jared Joseph

Introduction

Access the lab on GitHub Classroom: GitHub Classroom Assignment for Lab 1: R Coding

This lab will provide an opportunity to practice fundamental R coding skills with real-world data. We will be looking at data from the United Nations (UN) World Population Prospects 2022 [1], the 27th edition of the UN's population estimates for countries around the world. This edition's estimates span 237 countries or areas, with historical data from 1950-2022 and projections out to 2100. Such population estimates play a crucial role in many long-term planning activities, including predicting and preparing for climate crises, anticipating global health needs, and even projecting targets for business expansion.

This lab marks your first step into understanding the world through your own data analyses. The first step is often the hardest. But by the end of this lab you will already have more data skills than the vast majority of the population. With this, take your first step towards understanding, and through understanding towards change.

Understanding the Data Source

Before we start to work with the data, we need to understand it. What do all of the variables (columns) mean? Let’s start by looking at the data itself. In this project repo (repository/folder), navigate to the “data” folder, and open the file named “WPP2022_Demographic_Indicators_Medium.csv”. This is the raw data file from the United Nations. There are a lot of numbers here, and some of the variable names are nonsense. What does “LE15” even mean?!

This is where data documentation comes in. Documentation is the metadata about the data you are looking at. It often explains how the data was made and what each part of it means. Open the file named “WPP2022_Demographic_Indicators_notes.csv” to start looking at some documentation. This file explains what each of the main data columns measures and the units it is measured in. For example, the aforementioned “LE15” stands for “Life Expectancy at Age 15, both sexes,” and is measured in years. That means that in the data file, row 2 and column LE15 tell us that, for the whole world in 1950, a person at 15 years old had a life expectancy of 47.0395 years.
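To make the row-and-column reading concrete, here is a minimal sketch using a made-up table. The object name `toy` and every number other than the 47.0395 quoted above are invented for illustration. It also assumes the CSV keeps its column names in row 1, so that row 2 of the file is the first row of actual data.

```r
# Toy stand-in for the UN file; values other than 47.0395 are invented.
toy <- data.frame(
  Location = c("World", "World"),
  Time     = c(1950, 1951),
  LE15     = c(47.0395, 47.5)
)

# Row 1 of the data frame corresponds to row 2 of the CSV,
# because row 1 of the file holds the column names.
toy[1, "LE15"]

# The same lookup by condition instead of by position.
toy$LE15[toy$Location == "World" & toy$Time == 1950]
```

Looking a value up by condition is usually safer than by position, since it still works if the rows are reordered.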

Question 1

Using the “WPP2022_Demographic_Indicators_notes.csv” file, what variable corresponds to “median age”?

REPLACE THIS TEXT WITH YOUR ANSWER

Question 2

Using the data from “WPP2022_Demographic_Indicators_Medium.csv,” what was the median age in the United States in 2020?

REPLACE THIS TEXT WITH YOUR ANSWER

Using the same data, what was the median age in Japan in 2020?

REPLACE THIS TEXT WITH YOUR ANSWER

We can also look into the sources of the data compiled by the UN. You can find them in the file called “WPP2022_F02_METADATA.XLSX.” The UN relies on a lot of different sources. This can produce impressive-looking results, but may hide some issues.

Question 3

Thinking generally, what problems do you think may arise when combining so many different data sets?

REPLACE THIS TEXT WITH YOUR ANSWER

Question 4

More generally, what, if any, problems do you see with how the data is defined? Do you agree with the definitions of all the variables? Is there anything you think is missing?

REPLACE THIS TEXT WITH YOUR ANSWER

Loading in the Data

Now that we have an idea of what the data is, let’s work on getting it into R. Using a function we learned this week, read the dataframe in the data folder of this project into R.

Tip

We saw the function to read in data using a web URL in the problem sets, but it works with files on your computer too! Local files (on your computer) can also benefit from auto-complete. Start typing the name of the file in quotes within the function as an argument, and press the TAB key. You should get some suggestions.
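As a rough illustration of reading a local file, the sketch below writes a tiny made-up CSV to a temporary location and reads it back. It assumes base R's `read.csv()`; the course may use a different reader function, and the names `toy_path` and `toy` and the file contents are invented for this example.

```r
# Write a tiny made-up CSV to a temporary file so the example is self-contained.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("Location,Time,Value", "A,2000,1.5", "B,2000,2.5"), toy_path)

# read.csv() accepts a local file path just as it accepts a web URL.
toy <- read.csv(toy_path)
toy
```

For a file inside a project folder, the path is written relative to the project, with the folder name, a forward slash, and then the file name.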

Question 5

Read in the UN data file, assigning it to an object called un_pop.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Understanding the Data

Now that we have our UN Data in R, let’s start to explore it a bit.

Question 6

Use a function to examine the dimensions of the un_pop dataframe and get the type for each of the variables.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 7

Open the un_pop dataframe in a viewer window and scroll through it. How are the rows organized? What sort of groupings do you see?

Learning from the Data

Now that we have the data loaded into R, let’s see what we can learn from it.

Question 8

Across all recorded years, for the whole world, what is the mean “Life Expectancy at Birth, both sexes”?

Tip

You may find that you only get an NA here. Check the help file of your function to see if it has any arguments that deal with NAs.
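To see why a single missing value matters, here is a small sketch on an invented vector (`toy_values` is not part of the lab data). It assumes you are using base R's `mean()`, which takes an `na.rm` argument.

```r
# A made-up vector with one missing value.
toy_values <- c(60, 65, NA, 70)

# By default, any NA in the input makes the result NA.
mean(toy_values)

# na.rm = TRUE drops missing values before averaging.
mean(toy_values, na.rm = TRUE)
```

The second call averages only 60, 65, and 70, so it returns 65.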

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 9

Across all recorded years, what is the mean “Life Expectancy at Birth, both sexes” for just the United States of America?

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 10

Across all recorded years, what is the mean “Life Expectancy at Birth, both sexes” for just Japan?

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 11

Create a new dataframe called pre_40_mort, which is a subset of un_pop containing the columns for “Location”, “Time”, “Mortality before Age 40, both sexes”, “Male mortality before Age 40”, and “Female mortality before Age 40”.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 12

Create a new column called mort_diff in our pre_40_mort dataframe which finds the difference between “Male mortality before Age 40” and “Female mortality before Age 40”.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Question 13

Interpret the results in our new mort_diff column. What do those numbers mean?

REPLACE THIS TEXT WITH YOUR ANSWER

CHALLENGE QUESTION

Calculate the difference between the mean “Net Number of Migrants” for the United States of America and Canada for the years 2000-2010.

#<REPLACE THIS COMMENT WITH YOUR ANSWER>

Tip

It is not required, but your code will be a bit easier if you understand Logical Operators, specifically the & sign.
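As a hedged illustration of how `&` combines conditions, the sketch below filters an invented data frame (the names `toy` and `keep` and all of the values are made up and unrelated to the UN data). A row is kept only when every condition joined by `&` is TRUE for that row.

```r
# Made-up data: two locations observed in different years.
toy <- data.frame(
  Location = c("A", "A", "B", "B"),
  Time     = c(1999, 2005, 2005, 2011),
  Value    = c(10, 20, 30, 40)
)

# TRUE only where both conditions hold for the same row.
keep <- toy$Location == "A" & toy$Time >= 2000
toy[keep, ]

# Two conditions on one column select a range of years.
mean(toy$Value[toy$Time >= 2000 & toy$Time <= 2010])
```

Here only the second row satisfies both conditions in `keep`, and the range filter averages the two 2005 values, giving 25.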

Footnotes

  1. United Nations, Department of Economic and Social Affairs, Population Division (2022). World Population Prospects 2022, Online Edition.