Lessons from the MALACH Project

Applying New Technologies to Improve Intellectual Access to Large Oral History Collections

In this talk I will describe the goals of the MALACH project (Multilingual Access to Large Spoken Archives) and some of our research results. I'll begin by describing the unique characteristics of the oral history collection that we are using, in which Holocaust survivors, witnesses and rescuers were interviewed in several languages. Each interview has been digitized and extensively catalogued by subject matter experts, producing a remarkably rich collection for the application of machine learning techniques. Automatic speech recognition techniques originally developed for conversational telephone speech were adapted to process these interviews with word error rates that are adequate to support interactive search and automated clustering, detection of topic shifts, and topic classification. I will then describe the studies that we conducted to learn what needs our systems should be designed to meet, and I'll summarize key results from our system development activities. I'll conclude with some remarks about possible future directions for research applying new technologies to improve intellectual access to oral history and other spoken word collections.

Speakers

Douglas Oard
Associate Dean for Research, College of Information Studies, University of Maryland

Douglas Oard is Associate Dean for Research in the College of Information Studies at the University of Maryland. An Associate Professor in the College, he holds a joint appointment in the Institute for Advanced Computer Studies (UMIACS) and affiliate appointments in the Computer Science Department and the Applied Mathematics and Scientific Computation Program. Dr. Oard earned his Ph.D. in Electrical Engineering from the University of Maryland, and his research interests center around the use of emerging technologies to support information seeking by end users. Recent work has focused on interactive techniques for cross-language information retrieval, searching conversational media, and leveraging observable behavior to improve user modeling. Additional information is available at www.glue.umd.edu/~oard.