A few weeks ago, Meghan, our Postdoctoral Researcher in Data Interpretation and Public Engagement (and a bit of an #ethics nerd), attended a training session organized by the Data Hazards project. This facilitator training taught how to run sessions with the project’s Data Hazards labels: think of them as warning labels on household cleaning products, or danger signs on electrical boxes. The goal behind the labels is to help researchers be mindful of potential ethical concerns within their data science research, all categorized so that problems become approachable and fixable.
The process is fairly simple, and that simplicity is part of why it’s so effective in practice.
According to the Data Hazards organizers, a group of people sits down together, with one person acting as Facilitator, one as Project Owner, and the rest as Audience Members. The Project Owner describes their research project or proposed output to the Audience Members, who then (with the help of the Facilitator) discuss concerns and questions they have about the project. During this process, the Project Owner doesn’t speak, save to answer factual questions posed to them. The Audience Members use the Data Hazards labels to mark places where the Project Owner needs to be more mindful of ethics, or of data and participant safety issues, in their work.

One of the great things about this approach is that the Audience Members don’t need expertise in the Project Owner’s research field. The Project Owner gives a five-minute (max) talk, sort of like an elevator pitch for their research. Having people from varied backgrounds act as Audience Members means the proposed data science is critiqued from a variety of life experiences and perspectives. This, in turn, can help avoid the ‘white, male, and stale’ problem that often comes up in research with data scientists, many of whom are similar in background and experience.

While the Data Hazards project was initially designed by researchers in data science and epidemiology, the labels and the process translate easily to humanities and social science research (like archaeology…hint, hint).
Have you tried the Data Hazards labels? Do you have similar tools or methods to encourage ethical uses of data in research? Let us know!