Humanities Text Mining in the Digital Library

MONK was a digital environment designed to help humanities scholars discover and analyze patterns in the texts they study. It supported both micro-analyses of the verbal texture of an individual text and macro-analyses that located texts within a large document space of hundreds or thousands of other texts. Shuttling between the "micro" and the "macro" was a distinctive feature of the MONK environment: you could read as closely as you wished, but you could also practice many forms of what Franco Moretti has provocatively called "distant reading."

MONK stands for Metadata Offer New Knowledge, and metadata (data about data) were at the heart of its radical "divide and conquer" strategy. Every document in a MONK environment carried explicitly recorded metadata at three levels: at the top level, bibliographical data; at the mid-level, the discursive organization (chapters, scenes, stanzas, etc.); and at the bottom level, each individual word occurrence (lexical, morphological, and syntactic data). This triple-decker metadata structure organized the MONK inventory of words in a collection. Visualization tools played a critical role in helping scholars both formulate the questions they wanted to ask and interpret the result sets their questions produced.

All code produced by the project is open source and available for download, as are the public domain TEI texts (with MONK-added enhanced markup) used during the project. Both can be downloaded from the MONK download page, which provides a static snapshot of code and data as of the end of the project's active development phase (2010). Be advised that newer versions of both the code and the TEI texts may be available elsewhere.

The MONK Workbench has not been available since fall 2013, but selected algorithms from the Workbench, along with alternative plain-text (OCR'd) versions of some of the works included in MONK, are available through the HathiTrust Research Center (HTRC) Portal. Once you create an account on the HTRC Portal, you can log in and use the MONKmatch workset, or create and analyze your own workset.

The MONK project also maintained a wiki that included project documentation and a number of reports and meeting summaries. A portion of the wiki has been preserved and is available for browsing.

Jan 2007 to Apr 2009 | Director: Matthew Kirschenbaum | Partner: University of Illinois Urbana-Champaign