This past Saturday, March 18th, Networking Archaeological Data and Communities (NADAC), our Institute for Advanced Topics in the Digital Humanities funded by the National Endowment for the Humanities (NEH), hosted the third workshop in its Data Management Lifecycle series. Led by faculty Anne Chen and Leigh Anne Lieberman, this month’s workshop addressed the myriad challenges posed by, and new solutions to, the age-old issue of legacy data.
How do you solve a problem like old data (sung to the classic tune from The Sound of Music)? It takes a village! That is, if we count everyone who participates in the data lifecycle: the hardworking people who produced the original field data years or even decades ago; those who pored over these materials to label, store, and curate them over the years; the unsung heroes who may have digitized the information; and those who are putting in the time and elbow grease to reconcile, rehouse, catalog, and use them today.
Wikidata and other platforms for Linked Open Data (LOD) might be a solution for some legacy data sets. Wikidata, for example, can be leveraged as a tool for thinking like a database (one of the topics covered in February’s Data in the Field workshop), because it requires translating data into a structure of Resource Description Framework (RDF) triples, which Wikidata expresses as item-property-value statements. It supports multilingual querying and editing. Wikidata also has the major benefit of serving as a repository for community-curated content, and it comes equipped with an arsenal of tools to enrich data even beyond the scope and format of the original legacy project. For instance, it provides tools for linking geospatial data with artifacts, site plans, and photographs that were not previously georectified. It also enables annotation and linking of photographs and other visual resources to capture data beyond what a traditional caption can communicate.
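To make “thinking like a database” a little more concrete, here is a minimal sketch of what that translation step can look like: a single legacy catalog entry flattened into item-property-value triples. Everything in it is hypothetical and for illustration only: the record, its field names, the `legacy:` identifier prefix, and the plain-English property labels. An actual Wikidata upload would map each label to a real Wikidata property (for example, P31, “instance of”) and each value to an item where one exists.

```python
from typing import Optional

# Hypothetical legacy record, as it might be transcribed from an old paper register.
# All field names and values are invented for illustration.
legacy_record: dict[str, Optional[str]] = {
    "accession_no": "1968.042",
    "object_type": "amphora",
    "material": "terracotta",
    "findspot": "Trench B",
    "excavation_year": "1968",
    "notes": None,  # never recorded by the original project
}

# Hypothetical mapping from legacy field names to property labels.
# A real project would replace these labels with Wikidata property IDs.
FIELD_TO_PROPERTY: dict[str, str] = {
    "object_type": "instance of",
    "material": "made from material",
    "findspot": "location",
    "excavation_year": "excavation year",
    "notes": "note",
}

Triple = tuple[str, str, str]


def record_to_triples(
    record: dict[str, Optional[str]], id_field: str, mapping: dict[str, str]
) -> list[Triple]:
    """Return one (item, property, value) triple per mapped field that has a value."""
    # The record's own accession number serves as its (hypothetical) unique identifier.
    item = f"legacy:{record[id_field]}"
    triples: list[Triple] = []
    for field, prop in mapping.items():
        value = record.get(field)
        if value:  # skip gaps rather than inventing values for them
            triples.append((item, prop, value))
    return triples


for triple in record_to_triples(legacy_record, "accession_no", FIELD_TO_PROPERTY):
    print(triple)
```

The empty `notes` field simply produces no triple. This reflects one design choice among several: a gap in the original register stays a visible gap instead of being filled with a placeholder value.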
For all its benefits as a tool for translating legacy data into the LOD landscape, NADAC Scholars discussed how Wikidata is no picnic in the Alps. Although Wikidata can equip users to tackle the peaks and valleys of legacy data, they still face an uphill climb filled with obstacles. Legacy data place in high relief all of the data management challenges that have so far been the main focus of the NADAC Institute: unique identifiers (or lack thereof), terminology traditions and controlled vocabularies (or lack thereof), documentation (or lack thereof) of project processes and methods, and the materiality of data curation and its potential for forward-migration, whether in analog or digital forms. Yet old collections may also carry extra baggage. For example, data stakeholders in archaeology might be grappling with how to resolve inherited turf wars over particular data sets, which may have been scattered to the wind across different institutions and specialists, or even lost over time. They may be juggling ethical challenges like curating collections that were created in contexts of colonization, conflict, war, and exploitation. They might also be balancing how to make data openly accessible while respecting the cultural sensitivities and priorities of Indigenous, descendant, and diaspora communities. Talking through the push and pull of these challenges clarified for several participants whether and how LOD might work as a solution for their projects’ data sets, whether old paper records they’ve inherited or new born-digital records they created themselves.
So, how do you solve a problem like old data? Although complete solutions may be elusive, patience, teamwork, and new toolkits and digital platforms can be deployed to enrich data sets, create LOD when appropriate, and enable data producers and consumers to think like a database. These different methods breathe new life into inherited collections. Legacy data, like brown paper packages tied up with string, might just become one of your favorite things.