The Maryland Institute for Technology in the Humanities (MITH), the University of Maryland’s digital humanities institute, is seeking a part-time, hourly Summer 2022 Data Curation Assistant for a project to expand curated access to the Digital Dialogues (DD) series, MITH’s signature events program, which features speakers from various scholarly disciplines discussing topics related to work in the digital humanities. Applicants may be graduate students or exceptional undergraduate students. The position is funded for 230 hours of work over roughly 12 weeks between early June and late August 2022, at a rate of $20/hour.

Project Goals/Duties:

This project will build on substantial digital curation and stewardship work already completed on the DD collection, which has been processed as part of a larger retroactive Digital Asset Management Plan. All 181 recorded talks have been transcribed with Otter.ai, following the file naming conventions established by that plan. In this phase, MITH will apply natural language processing (NLP) and topic modeling methods to a testbed corpus of 181 transcripts from the series, map the extracted topics to the Alliance of Digital Humanities Organizations (ADHO) taxonomy, and structure the resulting enhanced metadata so that users can jump to specific, time-stamped sections of any talk in the collection. This work will make the Digital Dialogues searchable alongside the rest of MITH’s research portfolio, and will use linked data principles so that the collection can be connected to external DH resources (conference proceedings, papers, etc.) across the web.

Under the supervision of MITH’s Digital Humanities Archivist, the Data Curation Assistant will:

  • Perform cleanup and enhancement on transcripts in Otter;
  • Download and convert transcripts to WebVTT files;
  • Run a natural language processing (NLP) service on the files to analyze the text and extract metadata (topics);
  • Map the resulting extracted topics to an existing taxonomy;
  • Ingest all resulting terms and metadata into the Airtable base feeding the MITH website.
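
For applicants curious about what the conversion step might involve, here is a minimal Python sketch. It assumes the transcripts have been exported from Otter as SRT caption files; that export format, and the function name `srt_to_webvtt`, are illustrative assumptions and do not describe MITH's actual workflow.

```python
import re

# Matches an SRT timestamp such as 00:01:02,345 (comma before milliseconds).
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text: str) -> str:
    """Convert SRT caption text to WebVTT text (hypothetical helper)."""
    # Normalize line endings and drop a byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.lstrip("\ufeff").strip()

    cues = []
    for block in re.split(r"\n{2,}", text):
        lines = block.split("\n")
        # SRT cues start with a numeric counter; WebVTT does not require it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        # Skip anything that does not begin with a timing line.
        if not lines or "-->" not in lines[0]:
            continue
        # WebVTT uses a period, not a comma, before the milliseconds.
        lines[0] = SRT_TIMESTAMP.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))

    # A WebVTT file begins with the "WEBVTT" signature and a blank line.
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"
```

Calling `srt_to_webvtt` on the text of an SRT file returns a string that can be saved with a `.vtt` extension. Speaker labels and any Otter-specific formatting are left untouched in this sketch and would need separate handling.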

Qualifications:

Minimum Qualifications. This position is open to current undergraduate and graduate students at the University of Maryland. Candidates must be highly organized and deadline-oriented. We are seeking interdisciplinary thinkers who are self-motivated and self-directed. Excellent communication skills are crucial to the success of this position.

Additional Qualifications. This position is ideal for someone who wishes to broaden their experience with data curation and machine learning for online digital collections, and who has an interest in or knowledge of the field of digital humanities. The ideal candidate would have a specialization or dedicated scholarly interest in archives and data curation, data science and analytics, machine learning, natural language processing, or a similar area. Special consideration will be given to candidates with coursework or field work in these areas.

Specific experience and qualifications which will be extremely helpful to highlight in your application include:

  • Working knowledge of ontological concepts in digital libraries and archives such as taxonomies, metadata schemas, and shared vocabularies;
  • Experience using one or more of the software programs used on this project, including Airtable (database hosting), Asana (project management), GitHub (static site hosting), Otter.ai (speech-to-text processing), and IBM® Watson Natural Language Understanding (natural language processing);
  • Working knowledge of the digital humanities as an interdisciplinary field of practice;
  • Experience with data transformations, particularly with the Python or JavaScript programming languages;
  • Familiarity with linked open data concepts;
  • Familiarity with the WebVTT standard, a World Wide Web Consortium (W3C) standard for displaying timed text in connection with the HTML5 <track> element;
  • Experience or familiarity with static website technologies.

About MITH:

MITH is an interdisciplinary group of researchers who collaboratively advance the study of cultural heritage and arts using computational technologies while also training the insights and approaches of the humanities on the computational technologies that shape our world.

As a center within the College of Arts and Humanities at the University of Maryland, College Park, MITH has served as a world-class concentration of expertise for more than 20 years. We also teach courses and host events for campus and public communities in support of our core research mission. You can learn more about MITH’s approach to collaboration, inclusivity, and learning by reading our Values Statement.

To Apply: Email a cover letter, resume, and two references as a single PDF file to MITH Digital Humanities Archivist Stephanie Sapienza, at sapienza@umd.edu. Type “Application for Data Curation Assistant - [Last Name]” in the subject line. All applicants will be notified that their application was received. Selected applicants will be contacted for telephone and/or in-person interviews. For best consideration, apply before Friday, May 20 at 6:00pm. Start and end dates and work days/hours are negotiable as candidates’ schedules require. Work will be conducted both virtually and in person, as mutually decided by the successful candidate and the MITH team, and in line with public health and safety measures outlined by the university and the county/state.