Today we will be considering some of the ethical dilemmas data scientists encounter in their work. Failure to consider the ethical implications of our work can have truly disastrous consequences: manipulating the political process[1], criminalizing poor parents and removing their children[2], and skewing society's view of race and crime[3]. Data systems have massive potential for harm. It is thus the duty of every data scientist to think carefully about their work and try to avoid creating the next monster system.
Getting a sense of ethics in data science is not a simple matter. It requires understanding not only the pressures on you during your work, but also the potential consequences of that work. That is not easy even for the most experienced among us. It is also entirely possible for reasonable people to disagree both on what belongs under the umbrella of data science ethics and on the best course of action for resolving any individual problem.
Ethics, broadly defined, is the framework with which we decide what is right and wrong. As you might imagine, there is significant variation between people in their sense of ethics, yet most people can often find some common ground.
To help people think through the potential dangers of their work, several organizations have attempted to create a code of ethics for data scientists. These codes attempt to establish a framework for what is considered ethical within a specific discipline. I will include two such examples here: the National Academies of Sciences Data Science Oath, and the datapractices.org Data Values and Principles Manifesto.
The National Academies of Sciences attempted to create an oath for data scientists similar to the Hippocratic Oath that doctors take. This is my favorite attempt to codify data science ethics, and I encourage you to meditate on it deeply.
I swear to fulfill, to the best of my ability and judgment, this covenant:

I will respect the hard-won scientific gains of those data scientists in whose steps I walk and gladly share such knowledge as is mine with those who follow.

I will apply, for the benefit of society, all measures which are required, avoiding misrepresentations of data and analysis results.

I will remember that there is art to data science as well as science and that consistency, candor, and compassion should outweigh the algorithm’s precision or the interventionist’s influence.

I will not be ashamed to say, “I know not,” nor will I fail to call in my colleagues when the skills of another are needed for solving a problem.

I will respect the privacy of my data subjects, for their data are not disclosed to me that the world may know, so I will tread with care in matters of privacy and security. If it is given to me to do good with my analyses, all thanks. But it may also be within my power to do harm, and this responsibility must be faced with humbleness and awareness of my own limitations.

I will remember that my data are not just numbers without meaning or context, but represent real people and situations, and that my work may lead to unintended societal consequences, such as inequality, poverty, and disparities due to algorithmic bias. My responsibility must consider potential consequences of my extraction of meaning from data and ensure my analyses help make better decisions.

I will perform personalization where appropriate, but I will always look for a path to fair treatment and nondiscrimination.

I will remember that I remain a member of society, with special obligations to all my fellow human beings, those who need help and those who don’t.

If I do not violate this oath, may I enjoy vitality and virtuosity, respected for my contributions and remembered for my leadership thereafter.
May I always act to preserve the finest traditions of my calling and may I long experience the joy of helping those who can benefit from my work.
National Academies of Sciences. (2018). Data Science for Undergraduates: Opportunities and Options.
datapractices.org compiled a list of 12 principles they feel provide strong guidance for working ethically in data science. The principles serve as solid guidance for conscientious work, but I caution against treating them as a “project checklist,” as if a project that matches all the principles earns an “ethical” stamp of approval.
Data Values and Principles Manifesto
We will be going over the following case study as a group. Please read the scenario carefully, and then write down your own reaction to it.
In the United States, most students apply for grants or subsidized loans to finance their college education. Part of this process involves filling in a federal government form called the Free Application for Federal Student Aid (FAFSA). The form asks for information about family income and assets. The form also includes a place for listing the universities to which the information is to be sent. The data collected by FAFSA includes confidential financial information (listing the schools eligible to receive the information is effectively giving permission to share the data with them).

It turns out that the order in which the schools are listed carries important information. Students typically apply to several schools, but can attend only one of them. Until recently, admissions offices at some universities used the information as an important part of their models of whether an admitted student will accept admissions. The earlier in a list a school appears, the more likely the student is to attend that school.

Here’s the catch from the student’s point of view. Some institutions use statistical models to allocate grant aid (a scarce resource) where it is most likely to help ensure that a student enrolls. For these schools, the more likely a student is deemed to accept admissions, the lower the amount of grant aid they are likely to receive.
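The aid-allocation logic described in the scenario can be sketched as a toy model. Everything here is a hypothetical illustration: the function names, the decay formula, and all the numbers are invented for teaching purposes, not the actual models any university uses.

```python
# Toy sketch of the incentive structure in the FAFSA case study.
# Assumption: predicted enrollment probability ("yield") decays with the
# school's position on the student's FAFSA list, and grant aid is steered
# toward students who seem less likely to enroll without it.

def predicted_yield(fafsa_rank: int, base_rate: float = 0.2) -> float:
    """Crude stand-in for an enrollment model: the earlier a school appears
    on the list (rank 1 = first), the higher the predicted probability."""
    return min(1.0, base_rate + 0.6 / fafsa_rank)

def grant_offer(fafsa_rank: int, max_grant: float = 10_000.0) -> float:
    """Allocate scarce grant aid where it is most likely to sway the
    decision: students deemed likely to enroll anyway receive less."""
    return round(max_grant * (1 - predicted_yield(fafsa_rank)), 2)

for rank in (1, 3, 6):
    print(f"rank {rank}: predicted yield {predicted_yield(rank):.2f}, "
          f"grant ${grant_offer(rank):,.2f}")
```

Note the perverse incentive the sketch makes concrete: a student who honestly lists their dream school first is modeled as a likely enrollee and is offered less aid, which is exactly the ethical problem the case study asks you to weigh.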
[1] O’Neil, C. (2017). Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books.
[2] Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s Press.
[3] Noble, S. U. (2018). Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press.