Corpora Space


Within Project Bamboo, the Maryland Institute for Technology in the Humanities (MITH) led Corpora Space, which allows scholars to work at the cutting edge of digital humanities and textual analysis research. In Corpora Space, scholars can discover, analyze, and curate digital texts across the 450 years of print culture in English from 1473 until 1923, along with the texts from the Classical world upon which that print culture is based. The initial focus was on the following collections: AUSTLit; Nineteenth-century Scholarship Online (NINES); Google Books; HathiTrust; Oxford Text Archive; Perseus Digital Library; and the Text Creation Partnership editions of Early English Books Online (EEBO) and Eighteenth Century Collections Online (ECCO).

As part of the design phase, Corpora Space held a series of workshops. The first, a meeting between Bamboo and Google, gave Corpora Space partners insight into Google’s design principles and process. The second brought together software developers and scholars for a three-day CorporaCamp at MITH. Over those three days, participants built the Woodchipper, a prototype application for exploring distributed, large-scale collections through visualizations supported by techniques from data mining and natural language processing. The Woodchipper was tested across large subsets (up to several hundred million words) of the HathiTrust Digital Library, EEBO-TCP and ECCO-TCP, and the Perseus Digital Library. It allows users to build collections of texts from across these sources and to explore these collections by mapping them in a two-dimensional thematic space.

MITH held ToolMixer, the third workshop in the Corpora Space design process, on June 6-7, 2011. Tool builders, scholars, and members of Project Bamboo met to work on key issues related to connecting tools and architecture to digital collections. Project Bamboo partners were joined by representatives of the Scholars’ Lab at the University of Virginia, NINES and 18thConnect, HathiTrust Research Center, SEASR, University of Nebraska-Lincoln, and University of California, Riverside to hear presentations from tool builders and plan how tools would connect to the Corpora Space infrastructure. Additionally, we held break-out sessions to discuss how the various tools would link together in a scholarly workflow. This workshop brought us significantly closer to finalizing the core set of tools to be included in the initial implementation phase of Corpora Space, slated for Spring 2012.

Oct 2010 to Apr 2012 | Director: Neil Fraistat | Partners: Australian National University · Northwestern University · Tufts University · University of Chicago · University of Illinois Urbana-Champaign · University of Indiana · University of Michigan, Ann Arbor · Oxford University · University of Wisconsin-Madison