Intro to Data Science

Resource Link | Description
Syllabus | Quick access to all important course information.
Moodle | Grades and quizzes will be available on the course Moodle.
Slack | Main communication channel for the course.
Perusall | Perusall page for course readings.
Spinelli Center | The Spinelli Center offers drop-in tutoring hours in Sabin-Reed 301 or on Zoom.
Office Hours | Sign up for a slot in office hours here.

Overview

Info | Value
Who | Dr. Jared Joseph
What | SDS 192-03: Introduction to Data Science
When | Mondays 1:40-2:55pm; Wednesdays/Fridays 1:20-2:35pm
Where | Stoddard G2

Schedule

Below is the tentative schedule for the course. We will try to keep to it, but unanticipated situations (and Mountain Day) may require adjustments. Each row is a class meeting and lists the readings and assignments due that day.

Week | Date | Topic | Readings | Due
1 | 9/5/2022 (Mon) | No Class
1 | 9/7/2022 (Wed) | Introduction
1 | 9/9/2022 (Fri) | What is Data?
  Readings:
    1. Class Syllabus
    2. Kitchin, R., & Lauriault, T. P. (2018). Toward Critical Data Studies: Charting and Unpacking Data Assemblages and Their Work. In J. Thatcher, J. Eckert, & A. Shears (Eds.), Thinking Big Data in Geography: New Regimes, New Research (pp. 3-20). University of Nebraska Press.
  Due:
    1. Welcome Survey
    2. Data Survey
    3. Install Slack and join the class workspace
2 | 9/12/2022 (Mon) | Install Day
2 | 9/14/2022 (Wed) | Intro to R/RStudio (Posit)
  Readings:
    1. Irizarry, R. A. (2022). Chapter 2 R basics | Introduction to Data Science. In Introduction to Data Science.
  Due:
    1. All Software Installed
2 | 9/16/2022 (Fri) | LAB 0 & LAB 1
3 | 9/19/2022 (Mon) | Intro to git/GitHub
  Readings:
    1. Bryan, J. (2018). Excuse Me, Do You Have a Moment to Talk About Version Control? American Statistician, 72(1), 20-27.
3 | 9/21/2022 (Wed) | Exploratory Data Analyses
  Readings:
    1. Irizarry, R. A. (2022). Chapter 12 Robust summaries. In Introduction to Data Science.
    2. [PAGES 3-12] Grant, R. (2019). Why visualize? In Data Visualization: Charts, Maps, and Interactive Graphics. Chapman and Hall/CRC.
    3. Holtz, Y., & Healy, C. (2018). The issue with pie chart. In From data to Viz.
    4. Holtz, Y., & Healy, C. (2018). Venn Diagram. In From data to Viz.
    5. Holtz, Y., & Healy, C. (2018). Line chart. In From data to Viz.
    6. Holtz, Y., & Healy, C. (2018). Barplot. In From data to Viz.
    7. Holtz, Y., & Healy, C. (2018). Scatter plot. In From data to Viz.
    8. Holtz, Y., & Healy, C. (2018). Histogram. In From data to Viz.
    9. Holtz, Y., & Healy, C. (2018). The Boxplot and its pitfalls. In From data to Viz.
  Due:
    1. Lab 0
    2. Lab 1
3 | 9/23/2022 (Fri) | LAB 2
4 | 9/26/2022 (Mon) | Tidy Data/Long-Wide
  Readings:
    1. [Section 6.1-6.3] Baumer, B. S., Kaplan, D. T., & Horton, N. J. (2021). Chapter 6 Tidy data. In Modern Data Science with R. CRC Press.
4 | 9/28/2022 (Wed) | Aggregation and Merging
  Readings:
    1. Ismay, C., & Kim, A. Y. (2022). Chapter 3 Data Wrangling. In Statistical Inference via Data Science. CRC Press.
    2. [PAGES 1701-1731] Ohm, P. (2009). Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization (SSRN Scholarly Paper No. 1450006).
  Due:
    1. Lab 2
4 | 9/30/2022 (Fri) | LAB 3/Quiz 1 Open
5 | 10/3/2022 (Mon) | Advanced Plotting
  Readings:
    1. Reynolds, P. (2021). 5 Principles of Visual Perception. In Principles of Data Visualization.
    2. Irizarry, R. A. (2022). Chapter 8 ggplot2 | Introduction to Data Science. In Introduction to Data Science.
    3. Leo, S. (2019, March 27). Mistakes, we've drawn a few. Medium.
5 | 10/5/2022 (Wed) | Dynamic Plotting
  Readings:
    1. Scroll through Spurious Correlations
    2. Explore U.S. Gun Deaths
    3. Sievert, C. (2019). 1 Preface. In Interactive web-based data visualization with R, plotly, and shiny.
    4. Holtz, Y. (2018). Interactive charts | the R Graph Gallery.
  Due:
    1. Lab 3
5 | 10/7/2022 (Fri) | LAB 4
  Due:
    1. Quiz 1
6 | 10/10/2022 (Mon) | No Class
6 | 10/12/2022 (Wed) | Data Science Ethics
  Readings:
    1. Baumer, B. S., Kaplan, D. T., & Horton, N. J. (2021). Chapter 8 Data science ethics. In Modern Data Science with R. CRC Press.
  Due:
    1. Lab 4
6 | 10/14/2022 (Fri) | Project 1
7 | 10/17/2022 (Mon) | Functions
  Readings:
    1. Wickham, H., & Grolemund, G. (2017). 19 Functions. In R for Data Science. O'Reilly.
    2. Kyle Hill (Director). (2022, August 31). How history's worst software error weaponized a radiation machine.
7 | 10/19/2022 (Wed) | R Debugging & Conditions
  Readings:
    1. Bryan, J., & Hester, J. (2021). Chapter 11 Debugging R code. In What They Forgot to Teach You About R.
7 | 10/21/2022 (Fri) | LAB 5
8 | 10/24/2022 (Mon) | Iteration
  Readings:
    1. Wickham, H., & Grolemund, G. (2017). 21 Iteration. In R for Data Science. O'Reilly.
8 | 10/26/2022 (Wed) | Lists and Apply
  Readings:
    1. Peng, R. D. (2022). 24 Parallel Computation. In R Programming for Data Science.
  Due:
    1. Lab 5
8 | 10/28/2022 (Fri) | LAB 6/Quiz 2 Open
  Due:
    1. Project 1
9 | 10/31/2022 (Mon) | Bash
  Readings:
    1. Irizarry, R. A. (2022). Chapter 39 Organizing with Unix. In Introduction to Data Science.
9 | 11/2/2022 (Wed) | Advanced git/GitHub
  Readings:
    1. Turing Way Community. (2022). Git Branches. In The Turing Way: A handbook for reproducible, ethical and collaborative research. Zenodo.
    2. Turing Way Community. (2022). Merging Branches in Git. In The Turing Way: A handbook for reproducible, ethical and collaborative research. Zenodo.
    3. Turing Way Community. (2022). Retrieving and Comparing Versions. In The Turing Way: A handbook for reproducible, ethical and collaborative research. Zenodo.
  Due:
    1. Lab 6
9 | 11/4/2022 (Fri) | LAB 7
  Due:
    1. Quiz 2
10 | 11/7/2022 (Mon) | Data Cleaning
  Readings:
    1. de Jonge, E., & van der Loo, M. (2013). An introduction to data cleaning with R.
    2. Rue, J., & Hernandez, R. K. (2019). Using OpenRefine to Clean Your Data. Berkeley Advanced Media Institute.
    3. Farivar, C. (2016, August 10). Kansas couple sues IP mapping firm for turning their life into a 'digital hell.' Ars Technica.
10 | 11/9/2022 (Wed) | Recap
  Due:
    1. Lab 7
10 | 11/11/2022 (Fri) | Project 2
11 | 11/14/2022 (Mon) | Web Scraping
  Readings:
    1. Irizarry, R. A. (2022). Chapter 24 Web scraping. In Introduction to Data Science.
    2. Zimmer, M. (2010). 'But the data is already public': On the ethics of research in Facebook. Ethics and Information Technology, 12(4), 313-325.
11 | 11/16/2022 (Wed) | Remote Servers & APIs
  Readings:
    1. TBD
11 | 11/18/2022 (Fri) | LAB 8/Quiz 3 Open
12 | 11/21/2022 (Mon) | Project 2 Day 2
12 | 11/23/2022 (Wed) | No Class
  Due:
    1. Lab 8
12 | 11/25/2022 (Fri) | No Class
  Due:
    1. Project 2 & Quiz 3
13 | 11/28/2022 (Mon) | Finals Planning
  Due:
    1. Final Project Ideas
13 | 11/30/2022 (Wed) | Text as Data
  Readings:
    1. Clark, M. (2018). String Theory. In An Introduction to Text Processing and Analysis with R.
    2. Peng, R. D. (2022). 17 Regular Expressions. In R Programming for Data Science.
13 | 12/2/2022 (Fri) | Networks as Data/Quiz 4 Open
  Readings:
    1. [Chapters 1-2] Kadushin, C. (2012). Understanding Social Networks: Theories, Concepts, and Findings. Oxford University Press.
    2. Berman, G. (2021, November 31). 'Violence Is Contagious': A Conversation with Andrew Papachristos. Harry Frank Guggenheim Foundation.
14 | 12/5/2022 (Mon) | Geospatial Data
  Readings:
    1. Baumer, B. S., Kaplan, D. T., & Horton, N. J. (2021). Chapter 17 Working with geospatial data. In Modern Data Science with R. CRC Press.
    2. de Montjoye, Y.-A., Hidalgo, C. A., Verleysen, M., & Blondel, V. D. (2013). Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3(1), 1376.
    3. Deluca, E., & Nelson, S. (2017). 7. Lying With Maps. In Mapping, Society, and Technology. University of Minnesota Libraries Publishing.
14 | 12/7/2022 (Wed) | Mountain Day Safety Net/Lessons Learned
14 | 12/9/2022 (Fri) | Finals Work
  Due:
    1. Quiz 4
15 | 12/12/2022 (Mon) | Finals Presentations/aRt Gallery
15 | 12/14/2022 (Wed) | No Class
  Due:
    1. Final Project Materials