Active OCR: Tightening the Loop in Human Computing for OCR Correction, led by Assistant Director Travis Brown, received a Level 2 start up award in the amount of $41,906.
ActiveOCR proposes a proof-of-concept application that will experiment with the use of active learning and other iterative techniques for the correction of eighteenth-century texts provided by the HathiTrust Digital Library and the 2,231 ECCO text transcriptions released into the public domain by Gale and distributed by the Text Creation Partnership (TCP) and 18thConnect. In an application based on active learning or a similar approach, the user could identify dozens or hundreds of difficult characters that appear in texts from the same time period, and the system would use this new knowledge to improve optical character recognition (OCR) across the entire corpus. A portion of the team’s efforts will focus on the need to incentivize engagement in tasks of this type, whether they are carried out through traditional crowdsourcing or through a more active, iterative process.
ANGLES proposes a bridge between humanities centers that have greater resources to program scholarly software and the scholars who form the core user community for such software through their teaching and research. In conjunction with Hugh Cayless, Doug Reside, and Jon Deering, ANGLES will experiment with a solution to the adoption gap that has developed between scholars with digital materials and the technical developers designing the applications those scholars use in their research. It will do so by combining the model of intensive code development (a.k.a. the “code sprint”) with testing and feedback by domain experts gathered at nationally recognized disciplinary conferences. Announcements of code sprint locations will be released in early Fall 2012.
Topic Modeling for Humanities Research, a one-day workshop directed by Assistant Director Jennifer Guiliano, received a Level 1 start up award in the amount of $24,807.
The workshop will provide a unique opportunity for cross-fertilization, information exchange, and collaboration among humanities scholars and researchers in natural language processing on the subject of topic modeling applications and methods. The workshop will be organized into three primary areas: 1) an overview of how topic modeling is currently being used in the humanities; 2) an inventory of extensions of the LDA model that have particular relevance for humanities research questions; and 3) a discussion of software implementations, toolkits, and interfaces. Calls for participation in this event will be released in Summer 2012.
Additionally, MITH is delighted to announce it will partner with George Williams, Assistant Professor of English at the University of South Carolina Upstate, on his continuing work with Making the Digital Humanities More Open, which received a Level 2 award of $45,959.
Led locally by Assistant Director Jennifer Guiliano, MITH will work with BrailleSC on its second stage of development by designing and deploying a WordPress-based accessibility tool that will create braille content for end-users who are blind or have low vision. Extending Anthologize, a free and open source plug-in for WordPress that currently converts any RSS text into PDF, ePub, HTML, or TEI, MITH and the BrailleSC team will develop braille output for the tool. As a result, BrailleSC will not only make it easy for content creators to convert a text into braille, thereby extending humanities content to hundreds of thousands of visually disabled readers, but will also experiment with making braille available visually through the WordPress interface.