In my last blog entry I detailed the first two layers of a four-layer model for electronic editions and archives. The final two layers are described below:
Level 3: Interface layer
While stacks of multimedia files and transcripts in open repositories would, in some ways, improve the current state of digital libraries, interfaces are required if users are to do anything but simply access content a file at a time. Of course, interfaces can be very expensive to develop and tend to become obsolete very quickly. Unfortunately, the funding for interface development rarely lasts longer than a year or two, so the cost of maintaining a large code base usually falls to the hosting institution, which rarely has the resources to do so adequately. If interfaces are to be developed sustainably, a new system and standard for building them are required.
Code modularization and reusability have long been ideals in software development, but they have been realized only in limited ways in the digital humanities. Several large infrastructure projects, most notably SEASR, seek to provide a sustainable model for interoperable digital humanities tools, but they have yet to achieve wide-scale adoption. Our model will follow the example of SEASR, but because its scope is limited to web-based editions and archives, we may impose constraints on code that more broadly intentioned projects could not (and should not).
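To make that constraint concrete, here is a minimal sketch in TypeScript of the kind of narrow module contract such a framework might enforce. Every name here (RepositoryItem, EditionModule, renderItem) is a hypothetical illustration of the idea, not SEASR's API or that of any existing system:

```typescript
// Hypothetical contract for pluggable interface modules. The restriction
// is deliberate: a module receives content from the repository layer and
// renders it into an element the framework supplies; it never talks to
// storage directly.

/** A single item served up by the repository layer. */
interface RepositoryItem {
  id: string;            // stable identifier within the archive
  mediaType: string;     // e.g. "text/xml", "image/tiff", "audio/mpeg"
  url: string;           // where the file or transcript can be fetched
  metadata: Record<string, string>; // simple key/value descriptive fields
}

/** The contract every interface module must satisfy. */
interface EditionModule {
  /** Media types this module knows how to render. */
  accepts: string[];
  /** Render an item into a host element supplied by the framework. */
  render(item: RepositoryItem, host: HTMLElement): void;
}

/** The framework picks the first registered module that accepts an item. */
function renderItem(
  modules: EditionModule[],
  item: RepositoryItem,
  host: HTMLElement
): void {
  const module = modules.find((m) => m.accepts.includes(item.mediaType));
  if (!module) {
    host.textContent = `No viewer available for ${item.mediaType}`;
    return;
  }
  module.render(item, host);
}
```

Because modules can only render what the framework hands them, an obsolete viewer can be swapped out without touching the repository beneath it, which is precisely the kind of limitation a general-purpose toolkit could not impose.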
It should be noted that we are, in fact, proposing to build something like a content management system at a time when the market for such systems is very crowded. Nonetheless, experience with the major systems (Omeka, Drupal, Joomla, etc.) has convinced us that while a few provide some of the functionality we require, none are suited for managing multimedia scholarly editions. Just as Omeka clearly serves a different purpose and audience than Drupal, so will our system meet the similar yet nonetheless distinct needs of critical editors.
Level 4: User-generated data layer
Many recent web-based editions have made use of “web 2.0” technologies that allow users to generate data connected to the content. In many ways, this is the most volatile data in current digital humanities scholarship, often stored in hurriedly constructed databases on servers where scale and long-term data storage have been considered in only the most cursory fashion. Further, the open nature of these sites means that it is often difficult to separate data generated by inexperienced scholars completing a course assignment from the contributions of experts that represent real advances in scholarship. Our framework proposes the development of repositories of user-generated content, stored in a standard format, which will be maintained and archived. Of course, storing the data of every user who ever used any of the collections in the framework is impossible. We therefore propose that projects launch “sandbox” databases, out of which the best user-generated content may be selected for inclusion and “publication” in larger repositories. In some cases, these repositories may also store scholarly monographs that include content from a set of archives. Subscription fees may be charged for access to these collections to ensure their sustainability.
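As an illustration of what a “standard format” with the necessary provenance might look like, here is a minimal sketch, again in TypeScript. The field names and the three-stage status workflow are hypothetical, meant only to show how contributor provenance and the sandbox-to-publication path could be recorded:

```typescript
// Hypothetical record format for user-generated contributions, with the
// provenance fields needed to separate course-assignment work from expert
// scholarship, and a status field supporting the sandbox workflow above.

type ContributionStatus = "sandbox" | "selected" | "published";

interface UserContribution {
  id: string;
  targetItemId: string;        // the repository item being annotated
  author: {
    name: string;
    affiliation?: string;
    role: "student" | "scholar" | "editor"; // coarse provenance signal
  };
  body: string;                // the annotation or commentary itself
  createdAt: string;           // ISO 8601 timestamp
  status: ContributionStatus;  // every contribution starts in the sandbox
}

/** Editorial promotion of a sandbox record into the published repository. */
function publish(contribution: UserContribution): UserContribution {
  return { ...contribution, status: "published" };
}
```

Recording role and affiliation at the moment of contribution, rather than trying to reconstruct them later, is what would let an archivist decide which sandbox material merits long-term preservation.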
It should be noted that much in the above model is already practiced by some of the best electronic editing projects; these practices, however, have not yet been articulated in a generalized way. Although we feel confident our model is a good one, it would be the height of hubris to call it “best practice” without further vetting from the community. That, dear reader, is where you come in. The comments are open.