Week 1 - Day 1

Fall 2022

Dr. Jared Joseph

September 08, 2022

Overview

Timeline

  • Course Overview
  • What is data science and what do data scientists do?
  • Introductions
  • Course Reader/Syllabus

Goal

Set course expectations and explain course policies.

Course Overview

Course Intro

Introduction to Data Science (SDS 192) aims to equip students with the knowledge and tools to understand, critically evaluate, manipulate, and explain data. This is an introductory course, and no prior experience is necessary. Students will learn how to read and write code, but also how to create, organize, and collaborate on coding projects while critically examining each project's goals and data sources. We will primarily use the R language, along with supplemental tools.

What Will We Learn?

  • What is data?
  • How data is made
  • How to code!
  • Coding and collaboration best practices
  • How to visualize and communicate data findings
  • How to critically examine project data and goals

A Note on Learning to Code

  • Learning to code can be intimidating!

  • Coding is often frustrating.

  • Don’t worry about the math for now.

  • More than anything, remember that learning to code is like learning a new language.

If you have only a Chromebook, come talk to me as soon as possible.

Course Roadmap

  1. Infrastructure setup
  2. Basic coding and collaboration
  3. Data manipulation
  4. Data visualization
  5. Ethics
  6. Programming
  7. Command line
  8. Interfacing with the internet
  9. Advanced methods

Course Content Warning

Nothing graphic, but sometimes sad.

  • We will often look at social data
    • Crime
    • Health
    • Poverty
    • Inequality
  • These often show sad realities
  • Fight the existential dread.

Learn how to (hopefully) make things better.

A note about SDS 100

Typical Class Format

Mon + Wed

  • Lecture - 20 mins
  • Code-along - 30 mins
  • Problem sets - 25 mins

Fri

Lab/Project Work

Data Science

What is Data Science?

Credit to Michael Barber

What do Data Scientists do? - Data Vis

What do Data Scientists do? - Create Tools

What do Data Scientists do? - Power Corps

What do Data Scientists do? - Help People

The Power of Data

A little data literacy goes a long way.


I think this is unfair …

I can show how this process impacts people differently.


I think this policy would help …

Models predict this policy would increase X by Y%.


I think this claim is too strong to be true …

The data does not support their conclusion because of X.

And now all the tools are free for everyone!

Data Science in Everyday Life - Rent

Data Science in Everyday Life - Time

Data Science in Everyday Life - Sports

Data Science in Everyday Life - Chance

Credit to AnyDice

Introductions

About Me

Dr. Jared Joseph


  • Ph.D. in Sociology from UC Davis
  • M.A. in Sociology from UC Davis
  • B.A. in Psychology & Japanese from Valparaiso University


  • My research focuses on abuses of power within government
  • I’ve worked with the US and UK governments on machine learning systems

Why a Sociologist for Data Science?

My area is computational social science.

  • I use methods like:
    • Text Analysis
    • Network Analysis
    • Geospatial Methods
    • Machine Learning
  • Data Science is about more than raw data
    • How was the data made?
    • How will the systems using this data impact people?

About You

Meet the Neighbors

Data Science, and this class, are collaborative.

You’ll be working with others often, so take some time to introduce yourselves.


Some suggestions:

  • Name, year, major
  • How did you spend this past summer?
  • Why did you want to take this class?
  • etc.

Class Reader

Attendance

SDS Departmental policy and university expectations do not include remote course options outside of Office of Disability Services (ODS) accommodations.

If you are sick, please stay home.

  • I will not be taking attendance in this course, and you do not need to inform me when you will be absent.
  • If you miss a class, you should contact a peer to discuss what was missed.
  • Class materials will be posted on the course reader.
  • I won’t have the capacity this semester to re-deliver missed material in office hours.

Class Communications

The majority of class communications will take place on Slack, a messaging platform used widely at Smith and beyond. Please install Slack and join the class workspace before the next class.


We will spend some time at the start of next class going over Slack usage and etiquette.

Perusall

Class readings are all posted on the class reader and will direct you to Perusall.


Perusall is a reading platform that lets you collaboratively take notes on readings. You can see each other's highlights, make comments, and ask questions.

Collaboration

You are encouraged to

  • Ask classmates for help with code
  • Go to tutoring and office hours
  • Work with other students on problem sets, labs, and projects
  • Google for coding help

You Cannot

  • Work with other students on quizzes
  • Copy another student’s work and put your name on it
  • Use code from the internet without attribution
  • Have other people code for you

Standards Based Grading

This course will be using a version of standards-based grading.

Rather than tallying up the percentage of questions you answer correctly, I assess your responses against a pre-defined set of course standards and assign a level of proficiency for each.

Standards

  1. Data Importing
  2. Data Cleaning
  3. Data Reshaping
  4. Data Aggregation & Subsetting
  5. Functions
  6. Iteration
  7. Visualization Structure
  8. Visualization Aesthetics
  9. Visualization Context
  10. Data Ethics
  11. Code Style
  12. Git/GitHub

Proficiency Levels

  • Does Not Meet Standard
  • Progressing Toward Standard
  • Meets Standard
  • Exceeds Standard
  • Individual Standard

Standards Matrix

Grading with Points






Mean of A1-A5

Grading with Standards






Max of A1-A5
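
To make the contrast between the two approaches concrete, here is a minimal sketch. It is written in Python purely for illustration (the course itself uses R). The scores for A1-A5 and the 1-4 proficiency scale are made-up assumptions, not the course's actual grading scale.

```python
from statistics import mean

# Hypothetical proficiency scores on one standard across assessments A1-A5.
# The 1-4 scale is an assumption for illustration only.
scores = {"A1": 1, "A2": 2, "A3": 2, "A4": 3, "A5": 4}

# Points-style grading: average every attempt,
# so early struggles pull the final result down.
points_grade = mean(scores.values())

# Standards-style grading: keep the best proficiency you demonstrated,
# so early struggles do not count against you.
standards_grade = max(scores.values())

print(f"Mean of A1-A5: {points_grade}")
print(f"Max of A1-A5: {standards_grade}")
```

In this sketch, a student who starts low and improves ends up with a higher result under the max than under the mean, which is the sense in which early attempts are low-stakes.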

Standards-Based Grading Doesn’t Penalize

Standards-based grading provides:

  • Clear learning targets
  • Low-stakes opportunities to learn
  • Communicable skills
  • Easy monitoring of progress toward comprehension
  • Competition-free learning

Late Work


No late work will be accepted.

You can request extensions per syllabus policy.

Remember, missing something does not harm your grade; you just miss a chance to show proficiency.

For Next Time

Topic

What is Data?

To-Do