Teaching Machines to Read Milton

Natural Language Processing Challenges for Literary and Historical Texts

Many popular natural language processing techniques and tools rely on annotated training corpora to learn models that can be used to process new data from a similar domain. We can train a parser on Wall Street Journal text from the Penn Treebank, for example, and expect it to perform reasonably well on recent blog posts or movie reviews, but not necessarily on eighteenth-century conduct manuals. Unfortunately, it's often hard to find or create appropriate training data for specific literary genres or historical periods, even in English. In this talk, Travis Brown, Assistant Director of Research and Development at MITH, will look at some examples of semi-supervised and unsupervised methods that can be used to explore large text collections in domains with little or no available training data.

Speakers

Travis Brown
Research & Development Software Developer, MITH, University of Maryland

Travis Brown is a Research & Development Software Developer at MITH. He holds an M.A. in English from the University of Texas at Austin and is beginning a dissertation on the use of digital tools and methods in literary studies. While at the University of Texas he worked as an editor for the Walt Whitman Archive and was the lead developer of eComma, a web application for collaborative textual annotation. He also participated in a range of projects in UT’s Computational Linguistics Lab, where he developed tools for dependency parsing, semantic role labeling, and toponym resolution. He is particularly interested in using techniques from computational linguistics to aid in the exploration and visualization of large collections of literary and historical texts.