Open Context is undergoing its fourth major rebuild. Why on earth would we do this? This post explores (hopefully without too much jargon or technobabble) why rebuilds are important and what the 2021 rebuild involves.
Why are rebuilds (“refactoring”) needed?
Since its launch in 2006, Open Context has seen four major waves of development. These were:
- Initial development in 2006-2007, using the PHP language and a MySQL database.
- Addition of Apache Solr in 2010, for faster and richer searching and querying.
- A complete rewrite of Open Context in the Python language, use of a Postgres database, and cloud-based infrastructure in 2014-2015.
- The current (2021) effort to update and streamline code and improve database organization.

Rebuilds involve refactoring (code updates) to make significant improvements to Open Context’s schema (the data’s organizational layout). These rebuilds are necessary for several reasons, the most obvious being to keep up with technological change.
Open Context is built with open source software libraries that evolve. These libraries patch security issues and gain new capabilities, so we need to update them, and sometimes that requires making changes across the whole project.
Rebuilds also help get rid of cruft: leftover, redundant code. Software development involves a continual stream of tweaks and modifications.
These modifications could be new features, bug fixes, or changes made in response to user needs. They’re great because they reflect our growing understanding of how requirements change. However, over time, they accumulate and start to gum things up, making the system difficult to maintain and debug. So, they need to be consolidated and refactored so that Open Context is easier to maintain and understand.
Another key motivation for periodic rebuilds is to improve speed. As we come to better understand our data and the software libraries we use, we learn how to improve performance. So, we’re making several significant efficiency improvements, which reduce the amount of code Open Context uses and make it faster.
What’s new in this rebuild?
We’re implementing more automated software testing to improve reliability. This means that when we change a software component or library, tests run automatically to confirm that Open Context still works. We’re also cultivating rigorous and comprehensive documentation during this process so that others can better understand the code.
These changes make it easier for us to respond to user needs, such as new features or improvements to the user interface based on Open Context user feedback.
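For readers curious what an automated test looks like, here is a minimal sketch in Python using the standard `unittest` module. The `make_slug` helper and its expected outputs are hypothetical, invented purely for illustration; they are not taken from Open Context’s code. The idea is that the tests pin down the behavior we expect, so if a library update changes that behavior, a test fails and we find out right away.

```python
import re
import unittest


def make_slug(label: str) -> str:
    """Hypothetical helper (not Open Context code): make a URL-friendly slug."""
    # Lowercase the label, then replace each run of characters that are not
    # a-z or 0-9 with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower())
    # Drop any hyphens left at the start or end.
    return slug.strip("-")


class MakeSlugRegressionTest(unittest.TestCase):
    """Records the expected behavior so that later changes cannot silently break it."""

    def test_spaces_and_punctuation_become_hyphens(self):
        self.assertEqual(make_slug("Trench 4, Locus 12"), "trench-4-locus-12")

    def test_surrounding_whitespace_is_dropped(self):
        self.assertEqual(make_slug("  Bone  "), "bone")
```

Saved in a file, tests like these can be run with `python -m unittest`, for example after every library upgrade.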
Finally, this rebuild improves our coding style. More consistent style, documentation, and structure make Open Context easier to use and maintain in the future.
Key takeaways from the current rebuild
Look out for a more tech-heavy post about how Open Context’s database schema (organization) is changing. In the meantime, here are some key takeaways from this rebuild:
- Software needs maintenance, and that’s an important sustainability concern for digital humanities and infrastructures. You can’t build it once and call it done. Anything we build needs to be maintained and adapted to stay relevant to the world around it. Rebuilding software is like painting a house or repairing the roof: it keeps the code clean and streamlined. Open Context changes with its context!
- Many changes are invisible to users. Behind the scenes, we’re working to replace sections seamlessly so that interfaces and APIs keep working.
- Over time, we’ve learned more about the diversity of archaeological data, and our understanding of how best to model that data has changed. We also have more data than before, so our scaling needs have changed. These updates accommodate the growth of the collections we publish.