OpenITI AOCP: The Open Islamicate Texts Initiative Arabic-script OCR Catalyst Project

With generous funding from The Andrew W. Mellon Foundation, OpenITI AOCP will create a new digital text production pipeline for Persian and Arabic texts. The project will catalyze the digitization of the Persian and Arabic written traditions by addressing the central technical and organizational impediments that stymie the development of improved OCR for Arabic-script languages. Through a unique interdisciplinary collaboration among humanities scholars, computer scientists, developers, library scientists, and digital humanists, OpenITI AOCP will transform CorpusBuilder 1.0 (an OCR pipeline and post-correction interface) into a user-friendly digital text production pipeline with a wide range of new OCR enhancements and expanded text export functionality. The project will also include a series of workshops, a full corpus development pilot, and a Persian and Arabic typeface inventory, all of which will inform the development of the technical components.

At MITH, Raffaele Viglianti will focus on modeling the textual data in TEI format and on producing software to export the project data into a number of formats.