
Networking Archaeological Data and Communities (NADAC), our Institute for Advanced Topics in the Digital Humanities funded by the National Endowment for the Humanities (NEH), officially launched on Saturday, January 21, with the debut of the first monthly workshop of its two-year curriculum!
The first Institute workshop, held on Zoom, focused on creating “clean” data with the goals of learning and applying tools like OpenRefine to standardize “messy” archaeological datasets. But wait–what even are messy data, you ask, and why bother tidying them up?
Well, as any excavator or archaeological collections expert can tell you, archaeologists are used to dealing with dirty data–the kind caked in actual sediment, covered in a fine layer of dust, or mired in the mud. Many are even happy to regale others with lessons in the fine art of cleaning dirt (yes, you read that right) by implementing proper troweling and sweeping techniques in the excavation trench (for the uninitiated, and as Paulina and Meghan can tell you: it’s all in the wrist). It’s the next stage of the process–beyond the literal dirt–that renders archaeological data “clean” or “messy” after their recovery in the field. The way that these data are collected, organized, maintained, and documented makes the all-important difference between usable and barely coherent datasets.
As Institute faculty and data management extraordinaires Paulina Przystupa and Leigh Lieberman explained–and bravely demonstrated using real datasets–messy data can take many shapes and forms. Like perfecting a certain brushing technique in fieldwork, learning and applying tools for data polishing is not only weirdly satisfying–it is a process that creates meaning from the mess.
This dynamic duo introduced concepts of data quality and cleaning, focusing on how issues of consistency relate to usable, interoperable, and meaningful datasets. The workshop covered how and why to apply and document rules for data standardization, such as controlled vocabularies.
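For readers curious what a controlled vocabulary looks like in practice, here is a minimal sketch in Python. It is not the workshop's method (that used OpenRefine), and the vocabulary, the field name `material`, and the sample records are all made up for illustration. The idea is simply that every variant spelling maps to one approved term, and anything unrecognized gets flagged for a human to review rather than silently changed.

```python
# Hypothetical controlled vocabulary: each known variant maps to one approved term.
CONTROLLED_VOCAB = {
    "ceramic": "ceramic",
    "pottery": "ceramic",
    "sherd": "ceramic",
    "lithic": "lithic",
    "stone tool": "lithic",
    "bone": "faunal bone",
    "faunal bone": "faunal bone",
}


def standardize(value):
    """Return the approved term for a raw value, or None if it is not recognized."""
    # Trim, lowercase, and collapse repeated spaces before looking the value up.
    key = " ".join(value.strip().lower().split())
    return CONTROLLED_VOCAB.get(key)


def clean_records(records, field="material"):
    """Standardize one field; leave unknown values untouched and report them."""
    cleaned, unmatched = [], []
    for record in records:
        term = standardize(record[field])
        if term is None:
            unmatched.append(record[field])
            cleaned.append(dict(record))
        else:
            cleaned.append({**record, field: term})
    return cleaned, unmatched


# Invented sample data with the usual suspects: stray spaces, mixed case, a typo.
records = [
    {"find_id": "001", "material": " Ceramic"},
    {"find_id": "002", "material": "pottery "},
    {"find_id": "003", "material": "Stone  Tool"},
    {"find_id": "004", "material": "glas"},
]

cleaned, unmatched = clean_records(records)
print([r["material"] for r in cleaned])
print("Needs review:", unmatched)
```

Keeping the mapping in one place doubles as documentation: anyone reusing the dataset can see exactly which variants were merged into which term.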
Beyond the theories and methods behind these principles of data quality, they also led NADAC Scholars through practical exercises aimed at applying these skills. Through guided tutorials, group discussions, and independent hands-on activities, NADAC Scholars identified archaeological datasets on open repositories, gained training in writing documentation that traces data standards, and practiced leveraging open source tools to check for, and create, internally coherent and consistent datasets. Together, these skill sets contribute to the Institute’s goals of building, utilizing, and sharing clean archaeological data–in this case, with the click of a mouse instead of a flick of the wrist.