Pattern Recognition: Research at MITH

Matthew Kirschenbaum
Associate Director, MITH, University of Maryland
Carl Stahmer
Associate Director (Acting), MITH, University of Maryland

In our last Digital Dialogue of the semester, MITH's directors share some of their own digital humanities research in progress. The projects and applications to be discussed are nora, White Rabbit, and Indra. All share a general theme of pattern recognition, and a more detailed description of each appears below. Please join us, and watch for our spring semester schedule soon.

The goal of the nora project is to produce software for discovering, visualizing, and exploring significant patterns across large collections of full-text humanities resources in existing digital libraries. In search-and-retrieval, we bring specific queries to collections of text and get back (more or less useful) answers to those queries; by contrast, the goal of data-mining (including text-mining) is to produce new knowledge by exposing unanticipated similarities or differences, clustering or dispersal, co-occurrence and trends. Over the last decade, many millions of dollars have been invested in creating digital library collections: at this point, terabytes of full-text humanities resources are publicly available on the web. Those collections, dispersed across many different institutions, are large enough and rich enough to provide an excellent opportunity for text-mining, and we believe that web-based text-mining tools will make those collections significantly more useful, more informative, and more rewarding for research and teaching. nora (which either refers to a character in a William Gibson novel, or is an acronym for "No One Remembers Acronyms," depending on who in the project you ask) is a two-year project funded by the Andrew W. Mellon Foundation. The project began last October, so we're about one year in. It is multi-institutional (there are researchers at five universities) and multi-disciplinary (our group includes literary subject experts, computer scientists, and library and information scientists). At Maryland, MITH has partnered with the Human-Computer Interaction Lab for the visualization work.

White Rabbit is a non-hierarchical, stand-off markup platform suitable for storing, manipulating, and delivering texts using a variety of overlapping markup schemas. It leverages the searching and sorting power of a SQL database engine while delivering robust and expandable textual markup for both display and web-service accessibility. White Rabbit's tokenized storage system makes it possible to maintain any number of related or independent markup schemas for the same corrected text. For example, multiple users, such as students, can mark up the same text independently, or a single user can describe the same text using multiple markup systems, such as TEI, HTML, or any form of SGML. Additionally, White Rabbit will perform a statistical analysis of the similarities and differences between multiple markups of the same text, providing a scholarly picture of the ways in which multiple users view the structure of the text. Because White Rabbit is driven by a SQL database engine, the platform also offers powerful and robust searching capability. Any user with a standard web browser can perform complete XML searching and browsing of a resource. A user working with a collection of poems could, for example, search for all occurrences of the word "love" that appear in the refrain of a stanza. No special browser or application is needed to expose and use the full depth of a resource's XML coding, because the XML parsing and searching are performed server-side by White Rabbit.

Indra allows users to easily create RDF files for any web-accessible resource regardless of its markup platform. Enter or browse to a URL, and Indra performs a semantic analysis of the resource's content and generates an RDF file using the Jena RDF API. Users specify link-depth penetration at runtime for each root URL. The current version of the software, which is scheduled for release in December 2005, generates one RDF file per root URL. Future versions will perform a more robust link analysis and allow users to control the production of granular, nested RDF files. Indra is an open-source Java application that is being developed as part of the Networked Interface for Nineteenth-Century Electronic Scholarship (NINES) project.
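
For readers curious how the tokenized, stand-off storage described for White Rabbit might work in practice, here is a minimal sketch in Python using the standard sqlite3 module. It is not White Rabbit's actual schema or code: the table names (token, span), the column names, the layer labels student_a and student_b, the sample words, and the Jaccard-style agreement measure are all assumptions made for illustration. The idea shown is that the text is stored once as numbered tokens, each markup layer is stored separately as ranges over those token numbers, and a query such as "love in a refrain" becomes an ordinary SQL join.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory database: one tokenized text, two independent markup layers.

    Every name here (tables, columns, layers) is hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE token (pos INTEGER PRIMARY KEY, word TEXT NOT NULL);
        CREATE TABLE span (
            layer     TEXT NOT NULL,     -- which user or schema made the markup
            tag       TEXT NOT NULL,     -- element name, e.g. 'refrain'
            first_pos INTEGER NOT NULL,  -- first token covered (inclusive)
            last_pos  INTEGER NOT NULL   -- last token covered (inclusive)
        );
        """
    )
    # The text is stored once, as tokens numbered from 0.
    words = "my love is new and love will stay oh love remain".split()
    db.executemany("INSERT INTO token (pos, word) VALUES (?, ?)", enumerate(words))
    # Markup never touches the text; it only points at token ranges, so
    # layers may overlap freely (student_b's refrain cuts across a line).
    db.executemany(
        "INSERT INTO span (layer, tag, first_pos, last_pos) VALUES (?, ?, ?, ?)",
        [
            ("student_a", "stanza", 0, 10),
            ("student_a", "refrain", 8, 10),
            ("student_b", "line", 0, 3),
            ("student_b", "line", 4, 7),
            ("student_b", "refrain", 5, 10),
        ],
    )
    return db


def positions_in(db, tag, layer, word=None):
    """Return token positions covered by `tag` in `layer`, optionally matching `word`."""
    sql = (
        "SELECT DISTINCT t.pos FROM token AS t "
        "JOIN span AS s ON t.pos BETWEEN s.first_pos AND s.last_pos "
        "WHERE s.tag = ? AND s.layer = ?"
    )
    params = [tag, layer]
    if word is not None:
        sql += " AND t.word = ?"
        params.append(word)
    sql += " ORDER BY t.pos"
    return [row[0] for row in db.execute(sql, params)]


def agreement(db, tag, layer_a, layer_b):
    """Jaccard overlap of the tokens two layers assign to `tag` (an assumed measure)."""
    a = set(positions_in(db, tag, layer_a))
    b = set(positions_in(db, tag, layer_b))
    return len(a & b) / len(a | b) if (a | b) else 1.0


if __name__ == "__main__":
    db = build_demo_db()
    print(positions_in(db, "refrain", "student_a", "love"))   # [9]
    print(positions_in(db, "refrain", "student_b", "love"))   # [5, 9]
    print(agreement(db, "refrain", "student_a", "student_b"))  # 0.5
```

Because the two students' refrains are just rows in the same table, comparing them needs no XML parsing at all; the overlap figure is one simple stand-in for the kind of statistical comparison of markups mentioned above.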

Matthew G. Kirschenbaum is Associate Professor in the Department of English at the University of Maryland and Associate Director of the Maryland Institute for Technology in the Humanities (MITH, an applied thinktank for the digital humanities). He is also an affiliated faculty member with the Human-Computer Interaction Lab at Maryland, and a member of the teaching faculty at the University of Virginia’s Rare Book School. Kirschenbaum served as the first director of the new Digital Cultures and Creativity living/learning program in the Honors College at Maryland. A 2011 Guggenheim Fellow, he specializes in digital humanities, electronic literature, virtual worlds, serious games and simulations, textual studies, and postmodern/experimental literature. His first book, Mechanisms: New Media and the Forensic Imagination, was published by the MIT Press in 2008. Mechanisms has won the 2009 Richard J. Finneran Award from the Society for Textual Scholarship (STS), the 2009 George A. and Jean S. DeLong Prize from the Society for the History of Authorship, Reading, and Publishing (SHARP), and the 16th annual Prize for a First Book from the Modern Language Association (MLA). Much of his work now focuses on the critical and scholarly implications of the shift to born-digital textual and cultural production. He was principal investigator for the NEH-funded start-up “Approaches to Managing and Collecting Born-Digital Literary Materials for Scholarly Use” and is also a co-investigator on the NDIIPP- and IMLS-funded project devoted to Preserving Virtual Worlds (2007 to present). In 2010 he co-authored (with Richard Ovenden and Gabriela Redwine) Digital Forensics and Born-Digital Content in Cultural Heritage Collections, a report published by the Council on Library and Information Resources and recognized with a commendation from the Society of American Archivists.
He also oversees work on the Deena Larsen Collection at MITH, a vast personal archive of hardware and software furnishing a cross-section of the electronic writing community during its key formative years, roughly 1985-1995. Kirschenbaum serves on the editorial or advisory boards of a number of projects and publications, including Postmodern Culture, Text Technology, Textual Cultures, MediaCommons, and futureArch. An avid tabletop gamer, he contributes to the group blog Play the Past devoted to meaningful play and cultural heritage. His work has received coverage in the Atlantic, New York Times, National Public Radio, Wired, Boing Boing, Slashdot, and the Chronicle of Higher Education. See www.mkirschenbaum.net for more.

A continuously updated schedule of talks is also available on the Digital Dialogues page.

Unable to attend the events in person? Archived podcasts can be found on the MITH website, and you can follow our Digital Dialogues Twitter account @digdialog as well as the Twitter hashtag #mithdd to keep up with live tweets from our sessions. Viewers can watch the live stream as well.

All talks are free and open to the public. Attendees are welcome to bring their own lunches.

Contact: MITH (mith.umd.edu, mith@umd.edu, 301.405.8927).