We have arrived in the last months of AOR, which is scheduled to come to an end in late January. This means that we are busy getting the next version of our digital research environment ready. Part of the remaining work is very exciting and comprises the creation of a set of contextual documents on John Dee, his books in the AOR corpus, and his library. To that end, a humanities meeting was recently held at Princeton’s Institute for Advanced Study (see Neil’s blog for more information). Necessary, but unfortunately somewhat less exciting, is the work that takes place in the trenches: checking transcriptions, hunting down bugs, and testing the viewer. This blog post addresses a problem which has haunted us over the course of AOR and which we had to face head on last month.
The problem is a rather mundane one: displaying the correct page numbers of a particular book in the AOR viewer. Getting this right proved to be labour intensive, while the problem itself shows the tension between the ways in which a (digitized) book as an object is viewed from distinct humanistic and technical perspectives. The problem emerged from a combination of the particular technical infrastructure inherited from earlier projects conducted at Johns Hopkins University (JHU) and our transcription policy. Based on earlier work on the Roman de la Rose and Christine de Pisan projects, the digital images ingested into the JHU archive are labelled according to the sequence of the manuscript, starting with 1r, followed by 1v, and so forth. This folio numbering system reflects both the objects on which these projects focused, medieval manuscripts, and the conceptualization of objects within IIIF viewers as a sequence of images. However, anyone vaguely acquainted with early modern imprints will know that the page numbering system of these objects is in general much more complex because of the combination of page numbers and signatures, with some books having separate sequences of page numbers and/or signatures for individual sections.
As a result of this internal system of attributing page (i.e. folio) numbers, a mismatch emerged between the page numbers visible on the digital images and the page numbers displayed at the bottom of the viewer, as shown in the image below.
One way to overcome this hurdle was to include information about the page number and signature in the XML transcriptions generated for the project and to display this information in the viewer. We managed to get this working in the new version of the viewer – which currently is under construction and not yet publicly available – but quickly realised that the problem persisted in a number of cases. This was caused by our decision not to create XML files for all digital images, but only for those that contain one or more reader interventions, that is, any visible interaction of a reader with that page. Since at least around half of the pages in the books included in the AOR corpora are not annotated, XML files for these pages do not exist. As a result, for the pages which do not have an XML file associated with them, the viewer reverts to its internal numbering system, creating an extremely awkward combination of page numbers and/or signatures and folio numbers (see below).
Several possible solutions existed, including the creation of an XML transcription for every digital image in the AOR corpora. This would mean having to create several thousand XML transcriptions which, although the transcriptions themselves would be small, would take an inordinate amount of time. The other, more feasible option was to create spreadsheets for every book in the AOR corpora comprising the file name of each digital image and the information about page numbers and/or signatures (in the absence of the former) contained in the XML transcriptions. Luckily the junior programmer working on AOR, John Abrahams, was able to generate these spreadsheets, which meant that I ‘only’ had to enter manually the information regarding the page numbers of the digital images which do not have a transcription associated with them.
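The spreadsheet-matching step described above can be sketched in a few lines of code. Everything here is illustrative: the file names, field names, and the `build_label_column` helper are invented for the example, not taken from the AOR infrastructure.

```python
# Hypothetical sketch: map the archive's internal image file names to the
# page numbers/signatures recorded in the transcriptions, flagging pages
# that lack a transcription so their labels can be entered by hand.

# Page labels pulled from the XML transcriptions, keyed by the file name
# used in the transcriptions themselves (invented examples).
transcription_labels = {
    "BLC120b4.015r.tif": "p. 29",
    "BLC120b4.015v.tif": "sig. B3v",
}

# Rows of the generated spreadsheet: (internal archive name, transcription name).
archive_rows = [
    ("aor.dee.0031.tif", "BLC120b4.015r.tif"),
    ("aor.dee.0032.tif", "BLC120b4.015v.tif"),
    ("aor.dee.0033.tif", None),  # un-annotated page: no XML file exists
]

def build_label_column(rows, labels):
    """Return (internal_name, label) pairs; missing labels are flagged."""
    out = []
    for internal, transcribed in rows:
        label = labels.get(transcribed) if transcribed else None
        out.append((internal, label if label else "TODO: enter manually"))
    return out

for internal, label in build_label_column(archive_rows, transcription_labels):
    print(internal, "->", label)
```

In this toy version the join is automatic wherever a transcription exists; the "TODO" rows correspond to the manual data entry described above.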
Nevertheless, I still had to go through 35 spreadsheets, check the information provided in them, and add or amend data where necessary. This process was further complicated by the mismatch between the file names of the images to which we refer in our transcriptions and the file names that are used internally in the JHU archive. These internal file names, created during the process of ingesting the digital images into the JHU archive, were provided in the spreadsheets. Based on the data in the transcriptions, I had to match the two sets of file names and then add the correct information to the spreadsheets. Motivated by the thought that finishing this work would constitute an important step towards beatification, and spurred on by listening to football shows and a healthy dose of death metal, I did manage to populate all the spreadsheets. Although the work itself was anything but interesting, the final result is all the more pleasing. The information provided in the new version of the AOR viewer now matches the page numbers and/or signatures of the early modern imprints. Apart from avoiding confusion, this makes navigating these annotated books and doing research in the AOR digital environment much easier. Moreover, while going through the spreadsheets I encountered a couple of bugs (e.g. transcriptions being associated with the wrong images), an additional fruit of this work. Apart from being a heroic story of sacrifice, perseverance, and the divine combination of football and metal, this blog post shows the extent to which the spade work, done both by computer engineers and by humanists, forms the rock upon which this digital resource is built.
Two weeks ago, a good portion of the AOR Humanities team convened at the Institute for Advanced Study in Princeton for one final meeting before we all meet in London to celebrate the end of the project! Unlike March’s meeting on Dee and his books, the only complications getting all of us in the room came from a few stray Skype outages, a welcome change from the blizzard-induced panic that marked our first group session. There’s a lot left to do before we roll out the viewer, but the meeting was an excellent chance to take stock of how far the project has come, and how best to encapsulate the many uses that Dee and Harvey found for the small subsets of books we’ve drawn from their respective libraries.
After months of transcription work, the massive amount of data from Dee’s annotated books is nearly ready for analysis. The weeks leading up to the meeting were spent ensuring that we had the best and cleanest possible data set to send out. The culmination of that exercise was an afternoon of scribblemania, as we put our heads together to try and decipher the considerable list of illegible, questionable, or otherwise incomprehensible annotations clinging on for dear life in the margins of Dee’s books and the comment fields of our XML transcriptions.
Some of the astrological marginalia proved the most stubborn of the lot. Cardano’s Libelli Quinque had a number of holdouts, since even knowing the language they were written in wasn’t a guarantee that they’d be comprehensible. Tony Grafton set the gold standard for conundrum cracking on the day when he extracted “Ophiuchi” from this semi-legible note in the bottom corner of Dee’s corrections to Cardano (p.229).
The rest of us then learned that Ophiuchus is a constellation, which we’re now happy to pass on to users of the viewer. Conversations like this one have played out virtually in the comment fields of our transcriptions, but having them in person is always a treat, and a great reminder of how multiple perspectives can enrich the readings of books like these.
Banishing stubborn marginalia wasn’t the only fun that we had during the trip: we also had the chance to look over the Institute’s stellar collection of rare books in the history of science, begun by Lessing J. Rosenwald. Highlights of the many wonderful books that Marcia Tucker, the Historical Studies Librarian, pulled out for us to see included an annotated and expurgated first edition of Copernicus’ De Revolutionibus, along with a newly acquired collection of Spinoza’s works, including his Tractatus theologico-politicus with not one, but three false title pages attached.
We’re now hard at work on the concept groups for the Dee corpus, which should provide a good glimpse into how differently these two men approached their books. We’re also making tweaks to the new version of our viewer, which should be out for beta testing in the next few weeks. Watch this space!
This guest entry comes from Philip Palmer, Head of Research Services at the Clark Library at UCLA. Philip writes about his experience using the AOR schema to encode transcriptions of annotated books held at UCLA.
In July of 2014 I started a CLIR postdoctoral fellowship at UCLA’s Clark Library on the subject of “Manuscript Annotations in Early Modern Printed Books.” Less than a month into my postdoc The Archaeology of Reading in Early Modern Europe (AOR) project was announced, and naturally I was excited about potential discussions and collaborations with Earle Havens and his team. The Clark hosted a symposium on annotated books in December of that year, and both Earle and Matthew Symonds were in attendance. At this symposium an international group of scholars, librarians, and curators discussed various topics related to the study and curation of early modern manuscript marginalia; the symposium also coincided with the beginning of a pilot project to digitize ten annotated books from the Clark Library’s collection (since expanded to 60 books from our ongoing NEH digitization project, about which more below). Since the symposium, the AOR team has been generous enough to meet with me about the XML schema they developed and has encouraged me to adapt it to the annotated books digitized at UCLA.
One of the main differences between the AOR corpus and the annotated books digitized at UCLA is the latter’s more variable range of annotators and annotation types. Several of the annotators are anonymous and most are somewhat obscure, with only one from the original ten books being a canonical writer (the playwright and literary critic John Dennis). None of these readers annotated more than one book in the group of ten, unlike the focus on two specific readers in AOR. The Clark readers also take many different approaches to their annotations. A copy of Sir Thomas Browne’s Pseudodoxia epidemica (2nd ed. of 1650) annotated by a seventeenth-century English lawyer comprises a complex layering of cross-references to work by Browne and other contemporary scientific texts. A copy of the 1603 English translation of Montaigne’s Essayes, one of a handful of copies bearing errata corrected in the hand of John Florio, contains marks and marginalia made by a reader in the 1680s—a reader preoccupied with how Montaigne “talks of himself.”
Another book in the UCLA corpus—Richard Allestree’s The Art of Contentment (1675)—features casting-off marks and marginalia made by a printer or compositor, presumably to plan a new edition of the text (though this new edition never materialized). Also digitized is a copy of Eleazar Albin’s The Natural History of English Song-Birds (1779), annotated in the early nineteenth century by an avian enthusiast named Judith Gowing, who supplemented the printed text with handwritten advice on bird-care (and a bit of taxidermy).
The six other books initially digitized from the Clark’s collection range from polemical critique in a 1724 edition of Confucius to devotional marginalia in a 1708 spiritual autobiography. In other words, there is not a common theme, method, or reader in the annotations digitized from UCLA; rather, these ten books are representative of the characteristic idiosyncrasy that historical readers brought to their material readings of books.
With support from the UCLA Digital Library to digitize these ten original volumes, the next step for our project involved transcribing the annotations. At first I explored the viability of using the Text Encoding Initiative (TEI) standards to transcribe and mark up a test set of transcribed annotations (from Roger Ascham’s A Report and Discourse of 1570). One good reason to use TEI in this case was the existence of an encoded file of the printed text of Ascham’s Report, created through the Text Creation Partnership (TCP) at the University of Michigan. The existence of this file meant all I had to do was add the text of the manuscript annotations to the existing transcription of the printed book and edit the TEI Header (the Oxford TCP website is a good place for finding such files). One big problem, however, is that TEI is not designed to deal well with manuscript marginalia and cannot achieve the desired level of granularity in its encoding.
Serendipitously, it was also around this time that I met with Earle, Jaap, and Matthew to discuss the AOR XML schema and how it might be used for non-AOR projects. I was impressed with the level of detail possible with AOR markup, especially compared to the limitations of TEI for annotation encoding. While I did not plan to make too many changes to the AOR schema, there were a few small tweaks I made to accommodate the idiosyncrasy of the Clark Library annotated books. These tweaks included adding more values for handwriting type and marginalia topic, refining the way internal cross-references are encoded, and creating a new attribute for “marginalia type” within the <marginalia> element.
A month or so later the Clark Library applied for and received a small grant from the Gladys Krieble Delmas Foundation that enabled us to hire three UCLA graduate students to transcribe and encode this original corpus of ten annotated books. Two English students (Samantha Morse and Mark Gallagher) and one History student (Sabrina Smith) spent three months during the Summer of 2016 transcribing and marking-up the annotations in seven of the ten books. (The annotations in two books—Sir Richard Blackmore’s Prince Arthur and Voltaire’s Dictionnaire Philosophique—proved too voluminous for the students to finish.) On day one I offered a three-hour crash course in early modern paleography and XML text-encoding; the session was supplemented by a detailed training manual. To make the encoding process easier the Clark purchased the Oxygen XML editor software for each student.
As the transcription project was intended to pilot workflows and methods for transcribing and encoding annotated books, we were just as interested in learning about process as we were in the product of the transcribed annotations themselves. For each of the student transcribers, the beginning of each book posed difficulties, primarily related to learning an individual’s handwriting quirks. XML encoding presented a challenge as well, though the combination of the training manual, AOR schema, and Oxygen’s auto-complete feature helped our transcribers grow accustomed to the work.
This pilot project also entailed comparing the TEI-encoding of manuscript marginalia with transcriptions made according to the AOR schema. Of the seven books the students completed, two were transcribed with TEI mark-up rather than the AOR schema. In both cases, these books are available as existing TEI files through the Text Creation Partnership. In the end we concluded that the AOR schema was preferable for marking-up text to enable research on manuscript marginalia: it captures much more information than is possible with TEI, including the ability to mark-up non-textual annotations such as underlining and symbols. The one aspect of TEI-encoded annotated books I do like, however, is the ability to mark-up both the printed text and the manuscript annotations. (The AOR schema only captures the annotations themselves, though encoding an entire printed text by hand on top of the annotations is a prohibitively time-consuming enterprise!)
When the three-month transcription phase of our project ended in September 2016 there was still much additional work to be done. First I had to edit the transcriptions for accuracy, which proved to be one of the most time-consuming aspects of the project. Next I had to plan how I would display these transcriptions and digitized annotated books online without having to hire a team of programmers. We at the Clark have been fortunate to partner with the UCLA Digital Library and California Digital Library to publish the digital scans of these annotated books on Calisphere, which is a digital object platform for libraries in California, especially from the University of California campuses. The ten digitized books were published on Calisphere in mid-March 2017. But earlier in the project, when preparing the digitized books for transcription work and developing a website to showcase the transcriptions, it was necessary to upload the scanned pages to the Internet Archive, primarily so we could start exposing these annotated books to a wider audience. And since all Internet Archive digital objects now conform to the International Image Interoperability Framework (IIIF) metadata standards, it was possible for me to display our annotated books using IIIF on a custom website. (Calisphere will be using IIIF in the near future too.)
In fact, the Clark won an NEH grant in 2016 to digitize over 250 early modern annotated books, so the Calisphere collection will grow considerably when the project concludes in October 2018 (60 books currently available). Combined with the Clark’s recently completed CLIR grant to digitize over 300 early modern English manuscripts, the Calisphere collection will become one of the largest digital repositories of early modern English manuscript material when both projects are completed.
During all of these digitization and transcription activities it has been wonderful to work with the AOR team and watch developments in their project. As anyone who has worked on Digital Humanities projects knows, it is never a good idea to “reinvent the wheel,” and the existence/accessibility of the AOR XML schema has ensured that the Clark Library annotated book transcriptions are largely interoperable with those produced for the marginalia of Gabriel Harvey and John Dee. I encourage any other annotated books projects out there to follow our lead and re-use the AOR schema for your transcription work, as Earle, Matthew, and Jaap have been extraordinarily generous in sharing their work with the larger academic community.
Blizzard aside, it was great to take AOR on the road for this year’s Renaissance Society of America (RSA) conference in New Orleans. Some great conversations emerged from discussion of Dee’s books, both in and out of the corpus. During the panel “Paging John Dee” Stephen Clucas pointed out a feature of Dee’s notes in his alchemical manuscripts that had caught my eye in the Geoffrey of Monmouth: a number of annotations bearing the initials “I.d.” Encountering these brought to mind a similar question for both of us: “why did Dee sign some annotations and not others?” and, more generally “what could this practice mean?”
Since mentions of people in our individual annotations have been tagged, a quick search of AOR for “John Dee” (in quotation marks) in the person field of an annotation yielded 46 results across 16 of the 22 Dee books we have transcribed.
This search required a little cleanup, as it captured annotations that included references to Dee or his books, or Dee’s ownership inscriptions on a title page. Eliminating those revealed 28 signed annotations across 9 different books.
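The search-then-cleanup workflow described above can be illustrated with a small sketch. The element and attribute names below are simplified stand-ins, not the actual AOR schema; the sample annotations are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: find annotations whose person field names John Dee,
# then filter out mere mentions/ownership inscriptions to keep only the
# notes Dee actually signed. Schema names here are illustrative.
SAMPLE = """
<page filename="fol15r">
  <marginalia>
    <text>I.d.: correction to Maternus</text>
    <person name="John Dee"/>
    <signed>true</signed>
  </marginalia>
  <marginalia>
    <text>Joannes Dee, his booke (ownership inscription)</text>
    <person name="John Dee"/>
    <signed>false</signed>
  </marginalia>
</page>
"""

root = ET.fromstring(SAMPLE)

# Step 1: every annotation tagged with the person "John Dee".
hits = [m for m in root.iter("marginalia")
        if any(p.get("name") == "John Dee" for p in m.iter("person"))]

# Step 2: manual-style cleanup, keeping only signed annotations.
signed = [m for m in hits if m.findtext("signed") == "true"]

print(len(hits), "mentions,", len(signed), "signed")
```

In the real corpus step 1 returned 46 results and the cleanup of step 2 narrowed them down to 28 signed annotations.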
| Title | Number of Signed Notes |
| --- | --- |
| Monmouth, Historia regum Britanniae | 9 |
| Cardano, Libelli quinque | 6 |
| Paris, Flores historiarum | 4 |
| Walsingham, Ypodigma Neustriae | 2 |
A few observations jump out from just the data. First, some texts appear to have been signed more heavily than others, with the Geoffrey of Monmouth signed the most extensively. Good to know that I noticed these notes after being beaten over the head with them (comparatively speaking) in the Monmouth. More specifically, these appear to be Dee’s way of editorializing and, in particular, problem-solving, as he does here, in explaining a correction to Maternus.
Or here, questioning the lineage of Britain’s earliest inhabitants.
Second, in the books with multiple signed annotations, the signed annotations appear to cluster together. Flyleaves and endpapers are consistent candidates, but within the body of the text, for example, we find a succession of signed notes in the Monmouth (fols 15-19) and the Cardano (fols 56-63), along with two other signatures next to Dee’s work on tables.
However, in the Cardano, some of these readings are dated, a trend observed in other books that Dee read around 1554-55 like the Mathemalogium, which records the date (and location) of Dee’s shared reading of the text in a Harvey-like note.
By contrast, only one signed annotation in the Monmouth is dated, chronicling Dee’s discovery of a corrected manuscript that supports his (earlier) assumption about ancient place names, discussed in this earlier post.
Some cautions here about the value of data alone – the clustering of signed annotations within any particular volume may be reflective of the overall clustering of annotations. Without broader context, we might not know how representative of a certain genre, time period, or topic this type of intervention is. It isn’t possible to locate all the books in Dee’s library catalogue, and even if we could (as I’ll leave for another post) we know that these aren’t the only books that he was able to put his hand on. Because such rich records survive, and because Dee was such a distinct annotator and intellectual figure, we know enough to locate books that aren’t mentioned in the catalogue that have Dee’s annotations.
However, searching the entire corpus cast light on tendencies that I never would have encountered had I just been looking at Dee’s “historical” books (or indeed, just the Geoffrey of Monmouth). His dating of annotations, in particular, seems to have peaked at an earlier period than his notes in the Monmouth volume. It also allows us to observe Dee actively engaging in conversation with his books (clarifying material and posing questions) and, we must assume, the other readers that encountered them in his library. Even if these outputs don’t fit into neat or, for that matter, readily apparent categories, they allow us to ask questions that wouldn’t be askable of one book, or perhaps even one note in one book. Pulled together quickly, this zoomed out view of Dee’s reading helps test initial questions like “why would Dee ‘authorize’ his own notes?” and tie them to new discoveries in the field.
This approach might also help scholars to identify practices to investigate beyond the corpus. On the same RSA panel, Jenny Rampling showed how far Dee’s notes might travel, tracing one out of the margins of an alchemical manuscript and into a printed book, via a fair copy made at the request of Dee’s traveling companion and “seer,” Edward Kelley (1555-97). If the notes in these books were Dee’s intellectual property, the Q&A session also revealed an early example of its theft: Nicholas Saunder, who made off with books from Dee’s library and tried to disguise their origin (as he has in the Pliny), appears to have written over Dee’s initials in the marginalia as well. Just as we “encounter” Dee in the margins of his books, so too did his contemporaries. What might their impressions of him have been?
At the most recent conference of the Renaissance Society of America (RSA) in New Orleans, I planned to speak about the difficulties in writing a more general history of historical reading practices and offer several possible solutions. More specifically, I wanted to explore various strategies which can be employed in order to examine similarities and differences in the reading practices of Gabriel Harvey and John Dee. Sadly, though, a winter storm prevented me from leaving Princeton and ultimately from giving my paper, as I only arrived in New Orleans on Friday afternoon (hence missing out on most of the conference). Although my paper was unlikely to revolutionize the field, some of the issues I address are relevant to those working on the history of reading. I therefore would like to make use of this space to briefly discuss one particularly vexing problem, namely the difficulty of incorporating topical marginal notes in our analysis.
According to Bill Sherman, ‘topical notes’ are those marginal notes which acted ‘as a concise key to the topic of a passage’ (Sherman, Dee, 81). In general these notes consist of just a few words, often copied from the printed text, which indicate the main topic of a section. It is not that these topical notes completely escape the possibility of scholarly analysis: at their very core they show the particular intellectual interests of a reader. When one has the advantage of working with a known reader, such as Dee or Harvey, knowledge of the historical context is of invaluable help when trying to make sense of such annotations. As Sherman remarked, ‘Dee’s notes in these passages [in some medieval books] are rarely interesting in themselves…but their value lies in the fact that he consistently drew attention to the material that would inform his own historical and political discourses’ (Sherman, Dee, 91).
At the same time, even when equipped with (detailed) biographical information about a reader, the lack of interpretation on the part of the reader renders topical notes difficult for scholars to interpret. Hence our inclination to focus on those marginal notes which are more verbose and informative in nature. However, such marginal notes represent only a small minority of the annotations that decorate the pages of most early modern books. Due to our focus on the relatively small number of interpretative notes, our research tends to be rather impressionistic in nature. Topical notes, by contrast, abound: they litter the pages of the books owned by John Dee, while even a substantial number of the annotations made by Gabriel Harvey, who was unusually verbose when annotating his books, are of the topical kind.
Data-driven approaches can be used to show the proliferation of topical notes and might offer a solution to overcome, at least partly, their limitations. A particularly revealing case is Dee’s copy of Ovid’s Ars Amatoria (Paris, 1529). Dee annotated this book only sparsely: he scribbled 181 notes in the margins and underlined approximately 4,000 words of the printed text. Moreover, the average length of these marginal notes was 1.3 words, meaning that the majority of them consisted of just one word.
This is very little, even when compared to other books annotated by Dee and Harvey (bear in mind that the transcription work is ongoing and the figures in this overview are based on the statistics generated in late March 2018).
Some books clearly stand out. Dee’s copy of Euclid’s Elementorum, for example, contains only 25 annotations, but with an average length of almost 24 words (23.96). This average is greatly inflated by a couple of lengthy marginal notes at the start of the book. The numbers relating to Dee’s copy of Pantheus’ Voarchadumia are skewed as well: Dee interleaved this book with blank pages onto which he copied the text of another tract by Pantheus, the Ars Metallicae. Because these interventions are treated as marginal annotations, the average number of words is greatly inflated. Harvey’s copy of Livy’s History of Rome boasts a similar average (21.8 words), but based on an astonishing 854 annotations. Although massive annotations, such as one consisting of a staggering 718 words, help to raise the average, one-word annotations are extremely rare: just 17 out of 854 (almost 2%).
(Topical notes in Livy’s Ab urbe condita, p. 27, and Frontinus’ Strategemes, Gii).
In general, Harvey’s annotations were lengthier than those of Dee, as visible in the table above: Harvey’s average of 12.6 words against Dee’s average of 3.5 words.
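The summary statistics discussed above (annotation count, average length, share of one-word notes) are simple to compute. The word counts below are invented for illustration; the real figures come from the AOR transcriptions.

```python
# Minimal sketch of per-book annotation statistics, given a list of
# word counts (one entry per annotation). Input data is hypothetical.
def annotation_stats(word_counts):
    n = len(word_counts)
    avg = sum(word_counts) / n
    one_word = sum(1 for c in word_counts if c == 1)
    return {
        "annotations": n,
        "avg_words": round(avg, 2),
        "one_word_share": round(one_word / n, 3),
    }

# e.g. a book full of one-word topical notes vs. a verbose annotator
topical = [1, 1, 2, 1, 1, 1, 3]
verbose = [40, 12, 8, 25, 718]
print(annotation_stats(topical))
print(annotation_stats(verbose))
```

Averages like these are easily skewed by outliers (the 718-word note, the interleaved Voarchadumia pages), which is why the share of one-word notes is a useful companion figure.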
Let’s return to the example we started with, Dee’s annotations in Ovid’s Ars Amatoria, and have a look at the content of these short marginal annotations. When creating a list of the words and the frequency with which they appear, the results are anything but surprising: number one on the list is *drumroll* the word amor (or its declensions), which is mentioned 31 times. In the vast majority of cases, the marginal note comprises solely the word ‘Amor’. What to do with these marginal notes? Close reading is one possibility: which passages did Dee mark with this word and, just as significantly, which passages did he not? Such an analysis can be expanded by including other books which contain marginal annotations with the word ‘amor’. Such a ‘thematic’ search returns several hits for marginal notes in Cicero’s Opera and Quintilian’s Institutionum and can reveal a reader’s interest in a particular topic across his or her library.
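A frequency list of this kind is a few lines of code. The notes below are invented examples; a real run would read them from the XML transcriptions, and this naive version does not fold Latin declensions (amoris, amorem, …) into a single lemma as the hand-made count does.

```python
from collections import Counter
import re

# Sketch of a word-frequency list over short marginal notes
# (hypothetical data; no lemmatization of Latin forms).
notes = ["Amor", "amor", "Venus", "Amor", "ars", "amoris"]

counts = Counter(re.sub(r"\W+", "", n).lower() for n in notes)
print(counts.most_common(3))
```

Even this crude normalization (strip punctuation, lowercase) is enough to surface the dominant topical vocabulary of a sparsely annotated book.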
Another, data-driven approach would be to employ statistical analysis. This is a strand of AOR that we started to develop near the end of the first phase of the project (2014-16), which focused on Gabriel Harvey. Our approach is based on the creation of concept groups, consisting of words related to a specific topic (war, kingship, eloquence, books, action, etc.) which appeared with a certain frequency in Harvey’s marginal annotations. After that we, and by ‘we’ I mean professional statisticians, calculated whether or not statistically significant correlations existed between concept groups, that is to say, the extent to which words belonging to one particular concept group appear in conjunction with words belonging to another. In this way, we can discern whether particular topics of interest were related to one another. As such, we are primarily interested in the intellectual patterns that appear in the marginal notes, not in the numbers generated by the statistical analysis themselves.
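The first step of such an analysis, mapping annotations onto concept groups and counting co-occurrences, can be sketched as follows. The groups, keywords, and notes here are all invented toy data; the project’s actual significance testing was done by professional statisticians on the full Harvey corpus.

```python
from collections import Counter
from itertools import combinations

# Toy concept groups: each maps a topic to a set of keywords
# that might appear in a marginal note (illustrative only).
CONCEPT_GROUPS = {
    "war": {"bellum", "miles", "arma"},
    "kingship": {"rex", "regnum"},
    "eloquence": {"eloquentia", "orator"},
}

# Invented marginal notes.
notes = [
    "rex et bellum",
    "arma regnum",
    "orator eloquentia",
    "miles bellum",
]

def groups_in(note):
    """Concept groups whose keywords occur in this note."""
    words = set(note.lower().split())
    return {g for g, kws in CONCEPT_GROUPS.items() if words & kws}

# Count how often two concept groups co-occur in the same note;
# real analysis would then test these counts for significance.
cooc = Counter()
for note in notes:
    for a, b in combinations(sorted(groups_in(note)), 2):
        cooc[(a, b)] += 1

print(cooc.most_common())
```

In this toy corpus "war" and "kingship" co-occur twice while "eloquence" never pairs with either, the kind of pattern the real statistical analysis is designed to detect and test.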
Although at present it is impossible to subject Dee’s annotations to such an analysis, simply because the transcription work is ongoing, we will be able to do so in a couple of months. By means of thematic searches and data-driven approaches such as statistical analysis, it might be possible to include some topical marginal notes into our scholarly investigations. Such data-driven approaches do not necessarily yield information about individual marginal notes: one-word notes, for example, cannot reveal a correlation between concept groups. However, topical notes are included in concept groups and hence figure in a larger thematic analysis.
We might even be able to expand the current statistical analysis: what happens when we start to study the correlation between books, people, and concept groups? Another possibility is to check the names of the people mentioned in marginal notes against the index of a particular book. Which people were and were not singled out by our readers? I mention these possibilities in order to make clear that there are strategies for the inclusion of topical notes in our analysis. Invariably, such strategies are time-consuming and will require a lot of work from the scholar. However, they might enable us to include a larger number of marginalia in our analysis and to get a more rounded understanding of historical reading practices and strategies.
In an earlier blog, Matt Symonds discussed some losses which are inevitably part of the process of digitization. The material aspects of books in particular – their size, weight, feel and, indeed, smell – are difficult or impossible to convey on a screen. Consider, for example, the title pages of Livy’s History of Rome and Tusser’s Husbandry as displayed in the AOR viewer. The size of the images is exactly the same, obscuring the actual differences in format (the copy of Livy’s History owned by Gabriel Harvey is a hefty folio, whereas Tusser’s Husbandry is a quarto).
In spite of the loss of some of the physical qualities of a book when transferring it to a digital environment, in most cases the pros of digitization still outweigh the cons. Above all, the increased and ready access to books (as well as to archival documents) is a major stimulus to scholarship as it overcomes all kinds of financial, spatial, and institutional hurdles.
When I was doing research on Catholic marriage practices in the seventeenth-century Dutch Republic, for example, I came across a manuscript letter which mentioned a tract written by the priest Joannes Stalenus. A quick search on Google returned a digital version of Joannes Stalenus’ Dissertatio Theologo-Politica hoc tempore discvssv & scitv necessaria […] (Cologne, 1677), a book recently digitized by Google (see here). Due to the existence of this digital copy, I did not have to hunt down the book in research libraries in the UK or abroad, but could access it directly. The online availability of a digitized copy of this book was so useful because I was primarily interested in its contents, and not in the physical or other aspects specific to this particular copy. In other words, a digital version of any copy would have sufficed for my purposes (as long as the quality of the digital copy is up to scratch).
As a scholar of historical reading practices, however, I’m very much interested in particular copies of books, namely those which contain reader interventions, physical remnants and traces of the ways in which people used their books in early modern Europe. This morning I wanted to call up a particularly densely annotated copy of Cicero’s Librorum philosophicorum uolumen primum […] (Strasbourg, 1541) which is part of the collections of the British Library (BL) (shelf mark 525.c.1,2.). In the spring of 2016 I stumbled upon this book when searching the holdings of the BL for books annotated by the Elizabethan polymath John Dee, but soon realized that it was not Dee who annotated this book (to my great relief, I have to admit, since transcribing this book is a gigantic task). This is what some of the pages of the book look like:
To my surprise, I was not able to order this book through the online catalogue of the BL (although I was later assured by a librarian that this should still be possible), but was referred to its digital version – the book has recently been digitized by Google. Duly I opened the digital version of the second volume, but my initial enthusiasm vanished as I saw how sloppy a job had been done.
A number of marginal annotations which decorate this particular copy have been trimmed, instantaneously rendering this book useless for those of us who want to examine the annotations. Compare this image, for example, with the image above (a picture which I took myself some time ago).
Based on a quick inspection of the digital copy, it seems that the marginal notes in the gutter have been completely captured, whereas the marginal annotations in the outer margins often have been trimmed, as can be gleaned from the following image:
Compare this with the picture of the same opening I took:
In general, the annotations in the gutter are most difficult to capture, in particular when a book is tightly bound. That does not seem to have been the problem here, so it is a mystery why so many annotations have been trimmed. Perhaps this has something to do with the particular process of digitization employed by Google, is simply the result of a lack of interest or knowledge, or is caused by the lasting influence of the idea that only the printed text really matters. It is, after all, not so long ago that collectors and booksellers preferred to have ‘clean’ instead of ‘dirty’ books and resorted to various methods in order to restore their books to their (presumed) original and pristine state (see William H. Sherman, Used books, Ch. 8). Whatever may have caused these flaws, this particular digital version is nothing more than a pale and incomplete representation of the original object. It still is useful, but only to a very limited extent. The process of digitization always involves some loss, but digitization done badly hampers rather than furthers scholarship.
How did John Dee make sense of what he was reading? We at AOR have the luxury of examining Dee’s annotations with the apparatus of stable critical editions, the extensive reserves of research libraries, and the even more capacious Google search box at the ready. While Dee enjoyed none of these things, the annotations in his books hint at the breadth of information he brought to the works that he read, and remind us that no one activity dominated his reading. We also get a better sense of how comprehending a book, even at the most basic level, could require specialist knowledge in the sixteenth century.
The gloss in Cicero’s Epistolae ad familiares (Letters to Friends), for example, mostly consists of copying out names, places, and even particular words and turns of phrase from the text, while correcting mistakes in it. I was surprised at how many of Dee’s critical comments could only be explained through recourse to footnotes in my modern edition. Some of the typos and omissions could be caught by an educated Latin speaker, but others, like the breaks between letters, show recourse to other versions of the text, perhaps in the form of printed or handwritten commentaries that circulated alongside other editions of Cicero.
In other words, his two-volume, deluxe collection of Cicero’s works couldn’t be met with the same implicit trust (or perhaps willing acknowledgment of my own ignorance) that I brought to the indices and appendices of the Loeb Cicero, trying to keep up with what Dee was putting down.
Aside from being a humbling experience, transcribing these glosses raises an important question in addition to that of Dee’s own comprehension. As a scholarly resource, our transcriptions should allow a modern reader to understand Dee’s annotations, and that can mean tagging more detailed information about people and locations into the transcriptions and translations. But what if those tags might not agree with Dee’s own identifications? In other words, if Dee doesn’t agree with the (modern) text, are we allowed to disagree with him?
Historians of reading can afford to be more flexible than textual critics in how we treat variant or “multiple” readings without needing to label them “misreadings” or “mistakes.” Even so, these departures put us in a place where no clear or convincing explanation can be drawn that doesn’t pass through Dee’s own mind. Fortunately for us, Dee’s marginalia across several books offer evidence of his own approach to the same editorial problem.
Dee’s library, like his reading, was vast and varied, containing the most current reference sources of the time – printed bibliographies and anthologies like the Cicero volume – as well as “ancient” manuscripts in their original languages. He thus had a fairly sophisticated understanding of how mistakes might be made, and (in keeping with his contemporaries) we find this on full display in his use of ancient etymologies. Historical and antiquarian writing of Dee’s time was full of telling toponyms that revealed the ancient history of places or peoples, if only their true meaning could be extracted from the corruption of time and translation.
This technique had been practiced by historians for centuries, and as a result Britain’s ancient history was a minefield of mythical associations. The prevailing narrative, first set in writing by Geoffrey of Monmouth in the early twelfth century, was hotly debated in and around Dee’s own time. It linked the island’s name to Brutus the Trojan, a great-grandson of Aeneas who had led a band of Trojan exiles from Greece through the Mediterranean and France before defeating the island’s former (gigantic) inhabitants. Brutus then named the island Britain and built its first city, a New Troy that would eventually be renamed London in memory of another legendary king, Lud.
While Geoffrey’s inaccurate or all-too-convenient descriptions of places had aroused suspicion (or derision) among historians since Geoffrey’s time, in his printed copy (Christ Church Oxford Wb.5.12), Dee treads this well-worn ground and shows himself to be a master of the name game.
His notes in the early chapters of the Historia locate the would-be Britons near the Acheron river in eastern Greece and, as they move through the Mediterranean, Dee comments upon the probability of the account (both in general and in his specific copy) by investigating not only changes in language, but also the havoc that their recitation and orthography might wreak upon unsuspecting generations of copyists and translators. Here, Dee explains how Tragecia, a small island near Corfu, became the nonexistent island of Lergetia (or, in some copies, Leros, which, though extant, was in the wrong direction), after one copyist misheard “Targetia” and a second confused the Greek character tau for a lambda.
Simple enough, provided that you come to your sixteenth-century book with a working knowledge of manuscript copying practices! We also see Dee taking into account the distance between locations here and in the pages that follow, constructing plausible alternatives where necessary.
For Dee, it was possible (perhaps even routine) to learn from a source and critique it at the same time. “Getting it right” involved accounting for and explaining a certain amount of error. His point, and one well taken by those studying the ways early readers approached their books, is that there can be quite a lot to learn from mistakes.
In a previous blog entry, we talked about how Chris Geekie taught a class on an annotated Hamlet prompt book from 1676, in which the students examined the prompt book much as Gabriel Harvey’s marginal annotations were studied in the AOR project. This summer course, funded by the Andrew W. Mellon Foundation, was designed to introduce local community college students to digital humanities research. To support the course, Chris needed a new instance of the AOR viewer, set up so that his students could study this version of Hamlet. This way, the students could emulate the AOR process and experience research in digital humanities at first hand. The AOR technical team provided this new instance, demonstrating how AOR’s technical infrastructure – the technologies that lie underneath the appearance of the books on a webpage – can be adapted for other collections.
The Hamlet prompt book was annotated by the eighteenth-century English actor John Ward, with a greater focus on editing the printed text than Harvey’s more interpretive annotations. On almost every page entire lines are crossed out, words replaced, punctuation added, and more edits that come together to show a reader John Ward’s version of Hamlet.
The existing way of representing word substitutions, as “errata”, did not capture the nuance of the different editing annotations in the Hamlet prompt book. Chris and the technical team therefore decided to represent the various annotations as “substitutions”. These substitutions would be typed, so that different kinds of edits could be represented, potentially treated differently, and individually searched. Deletions could be thought of as substituting some letters, words, or lines with nothing; insertions as blanks substituted with something. This change added one new way of representing annotations to the AOR data model.
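As a rough sketch of what such a typed model might look like – the class and field names below are invented for illustration, not the actual AOR data model:

```python
from dataclasses import dataclass

# Hypothetical sketch of the "typed substitution" idea: a single record
# type whose kind is derived from which side of the substitution is empty.
@dataclass
class Substitution:
    copytext: str      # the printed text being altered ("" for an insertion)
    amendedtext: str   # what the reader put in its place ("" for a deletion)

    @property
    def kind(self) -> str:
        if not self.copytext:
            return "insertion"      # a blank substituted with something
        if not self.amendedtext:
            return "deletion"       # something substituted with nothing
        return "substitution"       # one reading replaced by another

# Invented examples echoing Ward's edits:
edits = [
    Substitution("my good lord", "good my Lord"),  # word-order replacement
    Substitution("To be, or not to be", ""),       # a line struck through
    Substitution("", ","),                         # punctuation inserted
]
print([e.kind for e in edits])  # ['substitution', 'deletion', 'insertion']
```

Deriving the kind from the data, rather than storing three unrelated annotation types, is what makes the edits both uniformly representable and separately searchable.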
The technical team determined how to handle the new annotations in the viewer. There were two main aspects that needed to be addressed: how these annotations appeared in the annotation sidebar and how these annotations were searched in the viewer. For the sidebar, displayed alongside the page image, we decided what information would be useful to a user and would help a student identify the annotation in the image. For searching, it was important to be able to search separately for the different types of substitution (which informed the choice of fields and what information to index for each field).
Modifying the technical infrastructure to support this change was fairly straightforward – evidence of its extensibility. Referring to the diagram of the AOR technical infrastructure (see below), this change to some degree affected the Archive, IIIF Presentation Service, and IIIF Search Service. (For more about the technical infrastructure that supports AOR, see the documentation page).
To accommodate these changes, we modified the archive to recognize the new annotations added for the class. Once the new data was recognized in the archive, we treated it the same way we treat the rest of the AOR data. In the IIIF Presentation Service, we defined how the new AOR/Hamlet annotation appears as a IIIF annotation. We used the IIIF Search Service to index the annotation data to make it searchable, which included defining the search fields that a user would pick in the search interface. Once these changes were made to the infrastructure, the Mirador interface was automatically able to display and search the new annotations.
It is important to stress the value of the IIIF standards. Since the AOR viewer understands IIIF data, making changes to the underlying AOR data model does not require modifying the viewer. Instead, we treated the new data by transforming it into a IIIF compliant form. The viewer automatically handled the new data because it is in a well understood format.
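As a very rough illustration of this transformation step, the sketch below maps a hypothetical internal annotation record onto a IIIF Presentation 2.x annotation. The record fields, URIs, and the rendering of the text are all invented for illustration; the real service builds these structures per canvas from the archive.

```python
import json

# Invented internal record for one of Ward's edits (not the AOR format).
record = {
    "canvas": "https://example.org/iiif/hamlet/canvas/p42",
    "copytext": "my good lord",
    "amendedtext": "good my Lord",
    "type": "substitution",
}

# Map it onto a IIIF Presentation 2.x open annotation: the viewer only
# needs the annotation text ("chars") and the canvas it targets ("on").
iiif_annotation = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "oa:Annotation",
    "motivation": "oa:commenting",
    "resource": {
        "@type": "cnt:ContentAsText",
        "format": "text/html",
        "chars": "[{}] '{}' \u2192 '{}'".format(
            record["type"], record["copytext"], record["amendedtext"]),
    },
    "on": record["canvas"],
}
print(json.dumps(iiif_annotation, indent=2))
```

Because Mirador consumes annotations in this shared shape regardless of where they came from, the Hamlet substitutions displayed without any viewer changes.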
This past summer I had the opportunity to use the Archaeology of Reading to help teach a course for a group of community college students visiting Johns Hopkins. This course was part of a national, Mellon-funded initiative to bring together community colleges with research institutions such as Hopkins in order to introduce students to high level research environments in both sciences and the humanities. I was tasked with developing a syllabus that would make use of the Archaeology of Reading as a way to engage students directly in digital humanities research.
The plan was to work with the students to analyze and transcribe an annotated copy of Hamlet currently held at Hopkins:
These marginalia and textual changes are found in a copy of the 1676 edition. They were made by the 18th-century English actor and theater director, John Ward.
I produced a syllabus which divided the course into three sections:
1. Reading Hamlet and learning about Early Modern England
2. Learning about digital tools and environments
3. Transcribing the annotations using the workflow developed during AOR phase 1
For the first section of the course, we spent a couple of weeks going through the play, talking about the plot, analyzing the characters, and discussing our interpretations. We made use of a good critical edition of the text from Oxford Classics. This edition offers useful footnotes for explicating difficult language and passages.
The Oxford Hamlet also provides an excellent introduction to the complex printing history of the play. In reality, there are three early printed editions of Hamlet, all of which contain substantial differences between them: the “Bad” Quarto (Q1) from 1603, the “Good” Quarto (Q2) from 1604, and the First Folio (1623).
The Oxford edition reproduces the text from the First Folio, while also including discussions of Q1 and providing important sections of Q2 in an appendix. As a result, not only did we discuss the play itself, but we also considered one of the central themes of scholarship dealing with marginalia, the varying interpretations and modes of reading across different periods.
From there I introduced the students to several online resources, such as Early English Books Online and the Archaeology of Reading. With EEBO, we were able to start looking at the format and layout of early editions of Shakespeare.
To acclimate them to rather alien fonts and to help prepare them for transcribing the Hamlet marginalia, I introduced my students to the AOR viewer. I asked them to practice transcribing sections of printed text from Machiavelli’s Arte of Warre (1573), which has a rather tricky font for those unaccustomed to looking at sixteenth-century books.
I then asked them to turn to Gabriel Harvey’s marginalia. We looked at one of his longer notes, found in an English translation of Castiglione’s The Courtyer (1561), which describes the ideal characteristics of a courtier. It also proved to be an interesting moment of comparison with the character of Hamlet, described by Ophelia as “the glass of fashion and the mould of form.”
After some practice in transcribing print and handwriting, I divided up the text into chunks of about 11 pages and assigned them to each of the students. I had eight students total, so this proved to be a very manageable division of labor. The marginalia themselves were also fairly straightforward:
Many of the notes—perhaps unsurprisingly for a stage manager—dealt with the entrances of characters, including where they might be positioned. In this instance, the ghost ought to be “under the stage.”
Yet modifications to the text, at least to this extent, were not something initially included in the development of the AOR transcription paradigm. Harvey and Dee, though often engaging actively with their reading material, were not particularly interested in correcting or changing the printed word to make it easier to perform. With Ward, there were several ways of “interacting” with the text:
In this example we see several novel elements:
1. a large deletion of sections of text
2. a new symbol (looks like a long line with hatching)
3. the replacement of words and phrases (“my good lord” → “good my Lord”)
4. insertion of new elements, such as punctuation (ubiquitous in Ward’s promptbook)
To capture this information, I had to slightly modify our XML schema by incorporating a new tag and including a new symbol. This work was not particularly challenging, and our programmers were able to adapt to this different schema relatively easily.
To help the students in their transcriptions, almost all of whom had never worked with any sort of machine-readable language, I produced a simple transcriber’s manual (a pale imitation of the work done by Jaap and Matt for AOR). I also created a template XML file, which contained examples of the basic elements needed to transcribe a page of the Hamlet prompt book. All the students had to do was copy, paste, and modify these in order to capture the relevant information. These files, as well as the final XML files, were uploaded to a GitHub repository, which basically follows the same format as the AOR one.
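For illustration, a transcribed page in such a template might look something like the sketch below. The tag and attribute names here are invented, not the actual schema we used:

```xml
<!-- Hypothetical sketch of substitution markup; tag and attribute
     names are invented for illustration. -->
<page filename="hamlet-1676.042.tif">
  <substitution type="replacement" copytext="my good lord"
                amendedtext="good my Lord"/>
  <substitution type="deletion" copytext="To be, or not to be"
                amendedtext=""/>
  <substitution type="insertion" copytext="" amendedtext=","/>
</page>
```

Keeping every kind of edit inside one repeating element meant the students only had to learn a single pattern and then vary its attributes.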
Overall the students were quite invested in the work, although it took a while to fall into a rhythm for accurately transcribing texts printed over three hundred years ago. We used class sessions as transcription workshops, where students were able to make use of laptops provided by the library. I was able to answer any questions the students had, and being together made it easier for them to check each other’s work.
Eventually the students produced XML files for the entire work, which can be found here, on a separate instance of the AOR viewer.
The interface is identical to that of AOR phase 1, although it is immediately clear that Ward’s style of annotation functions very differently from Harvey’s or Dee’s.
In addition to producing a tool for scholars to consult when researching early annotated editions of Hamlet, the students also stumbled across interesting elements in the text. For instance, one student found one of the earliest examples of an emendation to a particularly obscure passage in the play:
After doing some research, we discovered that Ward’s emendation (“hernshaw”) does not derive from any earlier edition of Hamlet but rather represents an attempt to clarify the ambiguous “handsaw,” which here is actually a bird and not a carpenter’s tool.
Another student focused on the interesting punctuation in Hamlet’s famous soliloquy in Act 3:
This student ultimately gave a fascinating presentation during a symposium in August on the different uses of punctuation in this very speech. Unsurprisingly, John Ward was relying on grammatical and theatrical conventions peculiar to his own epoch. I would say that the transcription process, however slow-going it might have been, actually allowed the students to get much closer to the text than we had during our close reading of the play.
In addition to reading and transcribing Hamlet, we were also treated to a series of fantastic presentations from researchers at Hopkins working on AOR. Earle Havens introduced the class to the digital humanities and the use of digital tools for visualizing history, Jaap Geraerts skyped in from across the pond to talk about the process of developing AOR’s XML schema, Mark Patton described how programmers and humanists work together to make materials accessible to everyone, and Neil Weijer gave multiple presentations on early modern England, the history of the book, and Shakespearean forgeries.
We also had the opportunity to go on several trips to visit various nearby labs and libraries to expose students to relevant and interesting research materials, as well as the many kinds of skilled professionals and scholars who work around them. We got to see some surgery performed on early modern book-binding structures in JHU’s Conservation Lab; we learned about print-making and early Shakespearean prints at the Baltimore Museum of Art; at the Evergreen Library, the students learned about the varieties of early books, including Audubon’s Birds of America in its enormous elephant folio edition. In the last week of class, we visited Washington D.C., where we went to the Library of Congress to see rare objects such as the first map containing America and Thomas Jefferson’s library. We also visited the Folger Shakespeare Library, where we were given a tour of some of the library’s annotated Shakespeare texts. We also stopped by the ongoing exhibition on painting Shakespeare across time, an exhibit definitely worth seeing, especially since it lets you try on costumes:
The summer course turned out to be an excellent research experience for the students, who were able to engage in more “traditional” methods, as well as explore and develop new types of digital scholarship. They were able to collectively explore the text of Hamlet at a high level of detail, learn about the history of the book (including the methods of early printing, typography, and printmaking), and develop an understanding of basic digital humanities tools, particularly the use of XML to help capture marginalia and textual modifications. AOR turned out to be a robust pedagogical tool. It immediately provided a platform for the easy exploration of early modern books, typography, and palaeography. More fundamentally, the process of producing a transcription for a new annotated book allowed students to develop new digital skills as well as hone their ability to carefully attend to the word on the page. Transcribing proved to be immensely useful in helping students both learn about the collective nature of research and explore in a new way one of the most fascinating texts of English literature.
At the very beginning of the first phase of AOR (2014-2016), I started working on what would become the Transcriber’s Manual. Initially this document was intended to provide the research assistants with an overview of all the reader’s interventions thus far encountered and with guidelines for capturing these interventions in XML. Back then, the AOR XML schema was still under construction and subject to frequent modification. As a result, the Transcriber’s Manual became more than just a manual: it also turned into a sort of logbook in which we documented the decisions made in relation to the development of the XML schema. The Transcriber’s Manual therefore not only is a useful reference work for those who are interested in the particular ways in which the AOR transcriptions are constructed, but also contains the rationale for our specific approach.
As AOR progressed and new books were digitized and transcribed, the Transcriber’s Manual steadily grew in size, making various internal reorganizations necessary. Due to the large number of high-res images, the document became so unwieldy that my old laptop would invariably crash when trying to amend and save it. Happily, the arrival of a new computer swiftly put an end to these problems. Since the start of AOR2, the Transcriber’s Manual has expanded even further. Moreover, due to the inclusion of several new reader’s interventions, we had to amend the ‘old’ AOR XML schema and create a new schema for phase 2 (2016-2018). Because we had always intended to design a fairly lightweight and flexible XML schema, we managed to include these new reader’s interventions without having to radically alter the structural features of the schema.
In order to document the evolution from AOR1 to AOR2, we decided to create a new version of the Transcriber’s Manual. The AOR2 Transcriber’s Manual still contains most of the content of the old Manual, but lots of new information based on the AOR2 corpus of books annotated by John Dee has been included, too. The dual nature of the Transcriber’s Manual is kept intact: just like its previous iteration, the latest version of the Manual contains guidelines for the research assistants as well as an explanation of the decisions we made. Recently, in addition to the AOR1 Transcriber’s Manual, the AOR2 Transcriber’s Manual has been made available on the AOR site. Hopefully these documents will be of use to those who wish to gain a fuller understanding of our working practices or who would like to embark on a project similar to AOR themselves.
P.S. Last but not least: the AOR2 Transcriber’s Manual contains sections with overviews of unknown/unidentified marks and symbols. Any input would be greatly appreciated!