Later this week I will be attending the 2nd International Culture and Computing Conference at Kyoto University and presenting the paper "Diggable Data, Scalable Reading, and New Humanities Scholarship." Digital Humanities is rapidly gaining a foothold in Japanese academic scholarship, and this conference features a strand devoted to the new methodologies, ideas and outcomes that arise from the application of digital methods.
In this paper, co-written with Neil Fraistat, we address two interrelated gains of the digital turn in humanities scholarship, one political and the other intellectual. The first is the popularizing of humanities scholarship, as digital scholarship opens new opportunities to transmit its sources and outputs. The paper then looks at some approaches to big data in the humanities, critiquing their value and pointing to some of the methodological questions they raise. It goes on to argue for a digital textual scholarship that can move from the massive to the particular, borrowing Martin Mueller's phrase 'scalable reading', and finally explains how Project Bamboo will support both the opening up of scholarship and scalable reading through digital means.
In a 1935 article in the Yale Review the historian Robert C. Binkley wrote, "Micro-copying is a technique that will... give the reader exactly what he wants, and bring it to him wherever he wants to use it." Binkley was an advocate of democratizing scholarship through the application of the new media technologies of the first half of the twentieth century. Similar arguments are often made today by advocates of the digital humanities. There are strong parallels between Binkley's approach and the gains in public humanities that have arisen from the digitization of the artifacts of human culture. The ease of transmission and the relatively low cost of delivery that digitized works allow have a democratizing effect on scholarship, engaging a much broader public in a range of scholarly activities.
The Google Ngram Viewer brought the idea of using computation to study culture to many who had previously been unaware of its potential. The Ngram Viewer is based on a very simple idea: type in two or more words and you get a comparison of their occurrence in the Google Books corpora over time. This opens up a range of questions of interest to humanities scholars: When did a word enter common usage? When did words fall out of favor? What is the historical trajectory of a concept in, for example, nineteenth-century politics? How much were people writing about a literary figure, or a work of fiction?
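To make that simple idea concrete, here is a minimal sketch of the kind of comparison involved: counting how often chosen words occur per year, relative to all the words from that year. It is not how the Ngram Viewer is actually implemented. The toy corpus, the whitespace tokenizer and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (publication year, text) pairs.
# These sentences are invented; they are not Google Books data.
TOY_CORPUS = [
    (1850, "the reform bill divided the liberal party"),
    (1850, "the liberal press welcomed reform"),
    (1900, "the socialist movement challenged the liberal party"),
    (1900, "socialist and liberal candidates debated"),
]


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def relative_frequency_by_year(corpus, words):
    """Return {year: {word: share of that year's tokens}} for the given words."""
    totals = Counter()            # total tokens seen per year
    hits = defaultdict(Counter)   # per-year counts of the words of interest
    for year, text in corpus:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        for token in tokens:
            if token in words:
                hits[year][token] += 1
    return {
        year: {word: hits[year][word] / totals[year] for word in words}
        for year in sorted(totals)
    }


if __name__ == "__main__":
    trend = relative_frequency_by_year(TOY_CORPUS, ["liberal", "socialist"])
    for year, shares in trend.items():
        print(year, shares)
```

Dividing by the yearly total matters because the number of books published changes over time; raw counts alone would mostly track the size of the corpus rather than the fortunes of a word.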
The paper critiques this and several other approaches to the use of big data in the analysis of texts, including Franco Moretti's 'Distant Reading' of literary history, and then builds on these to argue for 'scalable' textual scholarship. Scalability in this context draws on new computational approaches that allow for the interrogation of massive text objects far beyond the capability of the individual reader, while simultaneously allowing for traditional forms of close reading. Rather than only providing the opportunity to abstract from many texts, it should be possible for scholars to investigate closely the component parts that the computer used in arriving at the abstraction. For every step away from the text, the scholar will be provided with the means to step back into the text and see the passage, stanza or phrase that is represented in the abstraction.
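One way to picture this stepping back and forth is an index that never throws away where a word came from. The sketch below is only an illustration under that assumption: it is not Project Bamboo's design, and the toy texts and function names are hypothetical. The aggregate count is the step away from the text, and the passage lookup is the step back in.

```python
from collections import Counter, defaultdict

# Hypothetical toy documents, each a list of lines. The lines are invented.
TEXTS = {
    "doc_a": ["the sea was calm at dawn", "gulls circled over the harbour"],
    "doc_b": ["a storm rose from the sea", "the sea swallowed the light"],
}


def build_index(texts):
    """Map each word to every (document id, line number) where it occurs."""
    index = defaultdict(list)
    for doc_id, lines in texts.items():
        for line_no, line in enumerate(lines):
            for token in line.lower().split():
                index[token].append((doc_id, line_no))
    return index


def count_by_document(index, word):
    """The abstraction: how many times the word occurs in each document."""
    return dict(Counter(doc_id for doc_id, _ in index.get(word, [])))


def passages(index, texts, word):
    """The step back: the distinct lines that stand behind those counts."""
    seen = []
    for doc_id, line_no in index.get(word, []):
        entry = (doc_id, line_no, texts[doc_id][line_no])
        if entry not in seen:  # a line containing the word twice is listed once
            seen.append(entry)
    return seen


if __name__ == "__main__":
    index = build_index(TEXTS)
    print(count_by_document(index, "sea"))
    for doc_id, line_no, line in passages(index, TEXTS, "sea"):
        print(f"{doc_id}, line {line_no}: {line}")
```

Because the index stores locations rather than bare totals, every number in the summary can be traced to the lines that produced it, which is the property the scalable approach asks for.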