Technical Development | Archaeology of Reading

Category Archives: Technical Development

Scholarly dissemination, Technical Development

The Updated AOR Viewer is Now Live

Posted on: February 8, 2019 Neil Weijer

Happy Fri-Dee, everyone.

After much transcribing, tinkering, and typing, we’re happy to unveil the fully updated Archaeology of Reading viewer! We hope it provides a functional and sharp facelift to the Gabriel Harvey books, which are now joined by the 23 volumes annotated by John Dee. Now you can finally see what we’ve been blogging about for the past year and a half. Feel free to stop reading at this point, click on the red “Go to AOR Viewer” button above on the masthead, and dive in!

We’re going to need a bigger bookwheel – our new gallery now has 34 digitized volumes read by Harvey and Dee

More detailed descriptions of how to use the viewer are available here, but a few particularly useful features are worth calling out. First, although all the books appear alphabetically by author in the gallery, the icons next to them show who read which one.

Dee (above) and Harvey (below) finally get the iconic status they deserve.

Secondly, the tagged materials in each note now appear as live links, which can be used to initiate searches within and, in some cases, outside the viewer. Clicking on any one will give you the option to search for it within the book or the entire AOR corpus.

For books and people (documented ones anyway) mentioned in the notes, you also have the option to go to the record preserved in the Universal Short Title Catalog (USTC) or their International Standard Name Identifier (ISNI). As we said, you won’t be able to do this for the legendary British monarchs found in Geoffrey of Monmouth, but you can still search for them in other places (as Dee did when he read historical books).

Lastly, our viewer now features stable URIs for each image in the corpus, as well as some states of the viewer (such as searches). These can be pasted into a browser to immediately refer back to the page you were on, and can also be exported as a list by using the “Export Current Research” button in the top right of the header. You’ll be able to select the pages or searches you want to save and annotate them as a list of links in HTML, or, if you’re more graphically inclined, as a Distributed Scholarly Compound Object (DiSCO) in RMap. If the last part of that sentence didn’t make sense, don’t worry, but we hope that RMap will give the opportunity for searches, findings, and other observations to layer onto each other, and for researchers to see how this particular group of books is being read currently (meta-AOR).

Save a particular set of references, or the meandering process that led to them, with our HTML export function.

Our WordPress site has also undergone some modifications. Descriptive essays for all of the books in the corpus, as well as the larger libraries that they were drawn from, are now available in the “Books and their Readers” tab. Click through them to learn more about the tale of two libraries this project now tells.

A huge note of thanks is due to those of you who beta tested a version of this site over the holidays. We were able to make some last-minute adjustments, to the search bar in particular, to make the resource more easily navigable. The viewer wouldn’t look as nice as it does without the efforts of our resident technologists at the DRCC, and the creative eye of Cathy Shaefer and her team at SPLICE Design Group.

We hope that you enjoy the viewer as much as our sixteenth-century readers would have. We’ll be adding more video and teaching content to the site over the upcoming weeks, so watch this space for more!


Technical Development

The challenge of … page numbers

Posted on: November 13, 2018 Jaap Geraerts

We have arrived in the last months of AOR, which is scheduled to come to an end in late January. This means that we are busy getting the next version of our digital research environment ready. Part of the remaining work is very exciting and comprises the creation of a set of contextual documents on John Dee, his books that are in the AOR corpus, and his library. To that end, a humanities meeting was recently held at Princeton’s Institute for Advanced Study (see Neil’s blog for more information). Necessary but unfortunately somewhat less exciting is the work that takes place in the trenches: checking transcriptions, hunting down bugs, and testing the viewer. This blog post addresses a problem which has haunted us over the course of AOR and which we had to face head on last month.

The problem is a rather mundane one: displaying the correct page numbers of a particular book in the AOR viewer. Getting this right proved to be labour intensive, while the problem itself shows the tension between the ways in which a (digitized) book as an object is viewed from distinct humanistic and technical perspectives. The problem emerged due to a combination of the particular technical infrastructure inherited from earlier projects conducted at Johns Hopkins University (JHU) and our transcription policy. Following earlier work on the Roman de la Rose and the Christine de Pizan projects, the digital images that are ingested into the JHU archive are labelled according to their sequence in the manuscript, starting with 1r, followed by 1v, and so forth. This folio numbering system reflects both the objects on which these projects focused, medieval manuscripts, and the conceptualization of objects within IIIF viewers as a sequence of images. However, anyone vaguely acquainted with early modern imprints will know that the page numbering of these objects is in general much more complex because of the combination of page numbers and signatures, with some books having separate sequences of page numbers and/or signatures for individual sections.
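The sequence-based folio labelling described above can be sketched in a few lines. This is only an illustration and not the JHU archive's code: the function name `folio_label` and the assumption that the first image of a book is always folio 1r are made up for the example.

```python
def folio_label(image_index: int) -> str:
    """Map a 0-based image position to a folio label (0 -> 1r, 1 -> 1v, 2 -> 2r)."""
    if image_index < 0:
        raise ValueError("image_index must be non-negative")
    leaf = image_index // 2 + 1                     # two images (recto, verso) per leaf
    side = "r" if image_index % 2 == 0 else "v"     # even positions are rectos
    return f"{leaf}{side}"


# Labels for the first six images of a (hypothetical) book.
labels = [folio_label(i) for i in range(6)]
print(labels)
```

A scheme like this knows nothing about printed page numbers, signatures, or sections that restart their numbering, which is exactly why it fits medieval manuscripts better than early modern imprints.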

As a result of the internal system of attributing page (i.e. folio) numbers, a mismatch between the page numbers visible on the digital images and the page numbers displayed in the bottom of the viewer emerged, as shown on the image below.

One way to overcome this hurdle was to include information about the page number and signature in the XML transcriptions generated for the project and to display this information in the viewer. We managed to get this working in the new version of the viewer – which currently is under construction and not yet publicly available – but quickly realised that the problem persisted in a number of cases. This was caused by our decision not to create XML files of all digital images, but only of those that contain one or more reader interventions, that is, any visible interaction of a reader with that page. Since half or more of the pages in the books which are included in the AOR corpora are not annotated, XML files for these pages do not exist. As a result, for the pages which do not have an XML file associated with them, the viewer falls back on its internal numbering system, creating an extremely awkward combination of page numbers and/or signatures and folio numbers (see below).

Several possible solutions existed, including the creation of an XML transcription for every digital image in the AOR corpora. This would mean having to create several thousand XML transcriptions, which, although the transcriptions themselves would be small, would take an inordinate amount of time. The other, more feasible option was to create spreadsheets for every book in the AOR corpora which comprise information about the file name of the digital image and the information about page numbers and/or signatures (in the absence of the former) contained in the XML transcriptions. Luckily the junior programmer working on AOR, John Abrahams, was able to generate these spreadsheets, which meant that I ‘only’ had to enter manually the information regarding the page numbers of the digital images which do not have a transcription associated with them.
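A spreadsheet of this kind could be generated roughly as follows. This is a hedged sketch rather than the tool John wrote: the column names, the file names, and the `needs_manual_entry` flag are all invented for illustration, and the labels are assumed to have already been pulled out of the XML transcriptions into a dictionary.

```python
import csv
import io


def build_rows(image_names, transcribed_labels):
    """One row per image; the label stays blank where no transcription exists."""
    rows = []
    for name in image_names:
        label = transcribed_labels.get(name, "")
        rows.append({
            "image": name,
            "label": label,
            "needs_manual_entry": "yes" if label == "" else "no",
        })
    return rows


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["image", "label", "needs_manual_entry"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical file names; only the second image has a transcription.
images = ["book.001r.tif", "book.001v.tif", "book.002r.tif"]
from_transcriptions = {"book.001v.tif": "sig. A2v"}
rows = build_rows(images, from_transcriptions)
print(to_csv(rows))
```

The rows flagged `yes` are the ones that still have to be filled in by hand.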

Nevertheless, I still had to go through 35 spreadsheets, check the information provided in them, and add/amend data where necessary. This process was further complicated by the mismatch between the file names of the images to which we refer in our transcriptions and the file names that are used internally in the JHU archive. These internal file names, created during the process of ingesting the digital images into the JHU archives, were provided in the spreadsheets. Based on the data in the transcriptions, I had to match the two sets of file names and then add the correct information to the spreadsheets. Motivated by the thought that finishing this work would constitute an important step towards beatification, and spurred on by listening to football shows and a healthy dose of death metal, I did manage to populate all the spreadsheets. Although the work itself was anything but interesting, the final result is all the more pleasing. The information provided in the new version of the AOR viewer now matches the page numbers and/or signatures of the early modern imprints. Apart from avoiding confusion, this makes navigating these annotated books and doing research in the AOR digital environment much easier. Moreover, during the process of going through the spreadsheets, I encountered a couple of bugs (e.g. transcriptions being associated with the wrong images), constituting an additional fruit of this work. Apart from being a heroic story of sacrifice, perseverance, and the divine combination of football and metal, this blog post shows the extent to which the spade work, done both by computer engineers and by humanists, forms the rock upon which this digital resource is built.


Technical Development

Extending the Archaeology of Reading to study Hamlet

Posted on: January 11, 2018 John Abrahams

In a previous blog entry, we talked about how Chris Geekie taught a class studying an annotated Hamlet prompt book from 1676, where the students would study the prompt book in a similar way to how Gabriel Harvey’s marginal annotations were studied in the AOR project. This summer course, funded by the Andrew W. Mellon Foundation, was designed to introduce local community college students to digital humanities research. To support the course, Chris needed a new instance of the AOR viewer set up that would allow his students to study this version of Hamlet. This way, the students could emulate the AOR process to experience research in digital humanities. The AOR technical team provided this new instance, demonstrating how AOR’s technical infrastructure – the technologies that lie underneath the appearance of the books on a webpage – can be adapted for other collections.

The Hamlet prompt book was annotated by the eighteenth-century English actor John Ward, whose annotations focus more on editing the printed text than Harvey’s more interpretive ones do. On almost every page entire lines are crossed out, words replaced, and punctuation added; together with further edits, these show the reader John Ward’s version of Hamlet.

The current way to represent word substitutions as “errata” did not capture the nuance of the different editing annotations from the Hamlet prompt book. Chris and the technical team decided to represent the various annotations as “substitutions”. These substitutions would be typed, so different edits could be represented, potentially treated differently, and individually searchable. Deletions could be thought of as substituting some letters, words, or lines with nothing. Insertions would basically be blanks substituted with something. This change added one new way to represent annotations in the AOR data model.
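The idea that deletions and insertions are just special cases of substitution can be sketched as a small data structure. This is an illustration of the concept only, not the AOR data model: the class and field names are hypothetical, and here the type is derived from which side is empty, whereas the actual transcriptions may record it explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    original: str      # printed text being changed ("" for an insertion)
    replacement: str   # the annotator's text ("" for a deletion)

    def __post_init__(self):
        if not self.original and not self.replacement:
            raise ValueError("a substitution needs original or replacement text")

    @property
    def kind(self) -> str:
        """Classify the edit so each type can be indexed and searched separately."""
        if not self.replacement:
            return "deletion"       # something substituted with nothing
        if not self.original:
            return "insertion"      # a blank substituted with something
        return "replacement"


# Placeholder strings, not readings from the prompt book.
edits = [
    Substitution("a printed line", ""),
    Substitution("", "an added word"),
    Substitution("old word", "new word"),
]
print([edit.kind for edit in edits])
```

Keeping a single shape for all three cases means the search index only has to know about one annotation type plus a `kind` field.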

The technical team determined how to handle the new annotations in the viewer. There were two main aspects that needed to be addressed: how these annotations appeared in the annotation sidebar and how these annotations were searched in the viewer. For the sidebar alongside the page image, we decided what information would be useful to a user and would help a student identify the annotation in the image. For searching, it was important to be able to search separately for the different types of substitution (which informed the choice of fields and what information to index for each field).

Modifying the technical infrastructure to support this change was fairly straightforward – evidence of its extensibility. Referring to the diagram of the AOR technical infrastructure (see below), this change to some degree affected the Archive, IIIF Presentation Service, and IIIF Search Service. (For more about the technical infrastructure that supports AOR, see the documentation page).

To accommodate these changes, we modified the archive to recognize the new annotations added for the class. Once the new data was recognized in the archive, we treated it the same way we treat the rest of the AOR data. In the IIIF Presentation Service, we defined how the new AOR/Hamlet annotation appears as a IIIF annotation. We used the IIIF Search Service to index the annotation data to make it searchable, which included defining the search fields that a user would pick in the search interface. Once these changes were made to the infrastructure, the Mirador interface was automatically able to display and search the new annotations.

It is important to stress the value of the IIIF standards. Since the AOR viewer understands IIIF data, making changes to the underlying AOR data model does not require modifying the viewer. Instead, we handled the new data by transforming it into a IIIF-compliant form. The viewer automatically handled the new data because it is in a well-understood format.
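To make the transformation step concrete, here is a minimal sketch of wrapping a project-specific annotation as a IIIF annotation. It assumes the IIIF Presentation API 2.x (Open Annotation) shape for a comment annotation; the function name, the URIs, and the way the text is assembled are hypothetical, and the real IIIF Presentation Service may structure its output differently.

```python
import json


def to_iiif_annotation(annotation_id: str, canvas_uri: str, text: str) -> dict:
    """Wrap plain text as a Presentation 2.x style comment annotation on a canvas."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "motivation": "oa:commenting",
        "resource": {
            "@type": "dctypes:Text",
            "format": "text/plain",
            "chars": text,
        },
        "on": canvas_uri,   # the page image (canvas) the annotation belongs to
    }


# Hypothetical identifiers; example.org is a placeholder host.
annotation = to_iiif_annotation(
    "https://example.org/iiif/hamlet/annotation/1",
    "https://example.org/iiif/hamlet/canvas/1r",
    "Substitution (deletion): a printed line",
)
print(json.dumps(annotation, indent=2))
```

Because the viewer only ever sees this generic shape, a new annotation type in the source data needs a new mapping function, not a new viewer.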


Technical Development

Beta testing the AOR viewer – user feedback

Posted on: August 23, 2016

Overview of Testing

Over the span of four weeks (June 28–July 31, 2016), beta testers were invited to provide feedback on their experience with viewing and manipulating digital surrogates of the 13 books within the Archaeology of Reading corpus using an optimized version of the Mirador image viewer for this purpose at http://bookwheel.org/demo. This blog post summarizes the user experience feedback and proposed response by the AOR technology development team.

A pool of 50 volunteer testers, representing a range from minimal to moderate familiarity with early modern marginalia, reading practices, and the history of the book more generally, was identified by the AOR leadership team. Instructions were provided to help beta testers locate the controls and understand the basic functions of the viewer, pending completion of the full user documentation scheduled for the end of August 2016. There was no restriction on choice of device to be used to complete the test (phone, tablet, laptop, desktop). The evaluation was unmoderated and consisted of five questions intended to capture relative levels of ease and difficulty in finding specific kinds of information, and efficiency in navigating the user interface elements in the viewer. Testers were asked to evaluate the Search capability and the Text Visualization features in particular, and their experience interacting with multiple windows open simultaneously in the viewer. Beta testers were also invited to provide specific comments and recommendations about using the viewer in general terms at the conclusion of the test.

Findings and Response

34 testers submitted responses. Their comments and recommendations were categorized and ranked according to technical feasibility of content and implementation within the current phase of the overall AOR project.

The majority of initial reactions were positive.
• “Images are very crisp.”
• “The browsing is easy and fast.”
• “Overall comment on the site: fantastic! I love the smooth experience, clear images, and uncomplicated and well-designed feel of the individual image windows.”
• “The digital images are of great quality. I appreciate having multiple options for viewing them (book form, individual page, scroll view).”

Recommendations from beta testers, such as the following, will be addressed in upcoming viewer releases and future development cycles:
• “The information drop-down has to be closed manually, if left open it covers up the annotations dropdown. This is a bit distracting.” (Now available in version 2.1 of the AOR Phase 1 viewer)
• “I’d love to be able to right click on a page and open it in a new window, not just from search results but when looking at all pages in a book especially. The ‘change layout’ button is really helpful with this, but I’d also like the ability to open something in a brand new tab – this would be especially helpful as I have two screens, so I have the space, just not all within one browser tab.” (New feature for AOR Phase 2 development)
• “It’d be great to be able to ‘pin’ a page open (sort of like pinning window views in oXygen) or add a page to a ‘keep this’ shelf – even if it was only retained during that session.” (New feature for AOR Phase 2 development)
• “It would be good to do an empty search for the items in the drop-down in advanced search without search content specified.” (Planned for AOR Phase 2 development)
• “Superb detail for the taxonomies provided (symbol, mark, language). Would it be an idea to publish definitions in a thesaurus accompanying entry to the site?” (Now available in AOR Phase 1 user documentation)
• “I appreciate that names and titles referenced in the annotations are broken out individually. It would be useful and interesting if those would be hyperlinks leading to a search result of all instances when the names/titles are referenced across the whole collection.” (New feature for AOR Phase 2 development)
• “Also, the resource is extraordinary, but the user unfamiliar with these authors or with Harvey will be at a loss on how to use it without the provision of some kind of framework or introductory essay(s) of some kind.” (Now available in AOR Phase 1 user documentation)

There were no comments that indicated any beta tester could not recover from an error condition.

Summary

In general, AOR Phase 1 beta testers requested more in-depth knowledge of the viewer capabilities, particularly those unfamiliar with the authors and books included in the Phase 1 corpus of annotated books. This will be addressed in forthcoming user documentation, including an in-depth introduction to the authors, texts, bibliographical details of the specific books in the digital corpus, and a broader taxonomy of annotations, among other topics.

The project team is appreciative of the feedback from beta testers. A second round of feedback will be gathered in future AOR Phase 2 project development.


Technical Development

Search!

Posted on: July 18, 2016 Jaap Geraerts

Creating a data set is nice, making it publicly available is even nicer, but it is even nicer still if the data can be “interrogated” in various ways. To make this possible, the development of a robust search facility is necessary. When we started the project, the possibilities for search within IIIF and IIIF-compliant viewers were fairly limited. Developments within the digital world move fast, though, and recently IIIF has released its Content Search API. This search API enables searches within the data associated with the “structural components of the presentation API,” including the manifests (e.g., the books, in our case) and the sequences (e.g., a set of pages of a book). Moreover, digital annotations associated with particular objects will be searchable too. Great as this is, the specificity of the AOR data requires a search widget that is more closely tailored to our data set. Our tech wizards Mark Patton and John Abrahams, both based at the Digital Research and Curation Center at Johns Hopkins University, developed exactly such a widget and integrated it into the AOR viewer. This blog will highlight several of the search functionalities and the way in which specific parts of the AOR data can be retrieved by them.
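For readers curious what a request to a generic IIIF Content Search service looks like, here is a small sketch. It only builds the request URL using the `q` and `motivation` query parameters defined by the Content Search API; the service URL is a placeholder, and the tailored AOR search widget described in this post is not represented here.

```python
from urllib.parse import urlencode


def content_search_url(service_url, terms, motivation=None):
    """Build a Content Search request: space-separated terms go into `q`."""
    params = {"q": " ".join(terms)}
    if motivation:
        params["motivation"] = motivation   # optionally restrict by annotation motivation
    return f"{service_url}?{urlencode(params)}"


# Placeholder endpoint for a search service attached to one book (manifest).
SERVICE = "https://example.org/iiif/search/some-book"
url = content_search_url(SERVICE, ["Julius", "Caesar"], motivation="commenting")
print(url)
```

The response to such a request is a list of annotations, which is what lets a IIIF-aware viewer display the hits without knowing anything else about the collection.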

As mentioned in an earlier blog, the AOR viewer offers two kinds of search, a basic and an advanced one. Both the basic and advanced searches return the pages that match the specific query constructed by the user. All the search results are clickable links, which can be opened in the current or a new workspace and which immediately take the user to that page. A basic search is a simple string search that covers all the textual data associated with every type of reader intervention within a particular book or across the complete AOR corpus. For example, a search for “Caesar” returns all the pages on which this name is mentioned in the underscored words of the printed text, marginal notes, and their translations, the printed text that has been associated with particular marks, and so on. The basic search thus constitutes a broad search, a large fishing net, if you like, to scoop up large chunks of data relating to a specific keyword. Basic searches can be made more specific though, for instance by adding quotation marks: a search for Julius Caesar might return instances of Julius and Caesar, while searching for “Julius Caesar” only returns the instances of this name.
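The difference between a bare and a quoted basic search can be illustrated with a toy matcher. This is an assumption-laden sketch and not how the AOR search service is implemented: it uses plain case-insensitive substring matching, treats bare words as alternatives, and requires quoted phrases to appear exactly.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and the remaining bare terms."""
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]*"', " ", query)
    return phrases, remainder.split()


def matches(text, query):
    """True if every quoted phrase occurs and, if bare terms exist, at least one occurs."""
    phrases, terms = parse_query(query)
    lowered = text.lower()
    phrases_ok = all(phrase.lower() in lowered for phrase in phrases)
    terms_ok = not terms or any(term.lower() in lowered for term in terms)
    return phrases_ok and terms_ok


note = "Caesar crossed the Rubicon"            # invented sample text
print(matches(note, 'Julius Caesar'))          # bare terms: "Caesar" alone is enough
print(matches(note, '"Julius Caesar"'))        # quoted phrase: must appear as a whole
```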

The advanced search offers various possibilities. First, a user can focus on a specific type of annotation. One can, for example, look for a specific word in only marginal notes, thus further narrowing down a basic search. Another possibility is to look for specific people, books, and geographical locations mentioned by Harvey in his marginal annotations. Through the advanced search it is possible to construct detailed searches based on an aspect of a particular type of annotation. In the XML transcriptions we have recorded whether Harvey made an annotation with pen or with chalk, enabling searches for all annotations that were made in chalk, for example. We have also tagged the language in which marginal notes or the underscored words in the printed text were written. The user can select a particular language in which a marginal note was written and then search for a key word, or just use the search to retrieve all the instances of marginal notes written in a specific language, useful for those who, for instance, want to focus on Harvey’s annotations in Greek.

Second, the advanced search also allows for the construction of queries consisting of a combination of search terms, enabling the user to look for specific combinations of key words or types of annotation. As mentioned in a previous post, one can, for instance, search for pages that contain the Mars symbol and marginal notes that mention “Caesar.” Another (fairly random) example would be to search for the pages containing a marginal note that mentions “Caesar” and a marginal note in Greek. Users can add search terms at will, and a virtually endless number of combinations is possible. As a result, the search functionalities open up the AOR data set for the various avenues of inquiry that scholars from different disciplines might have.
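Combining criteria in this way can be sketched as a list of small predicates that must all hold for a page. Everything here is illustrative: the `Page` structure, the helper names, and the sample pages are invented, and the real search service works against an index rather than in-memory objects.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    label: str
    marginalia: list = field(default_factory=list)   # (language, text) pairs
    symbols: set = field(default_factory=set)


def has_symbol(name):
    return lambda page: name in page.symbols


def marginalia_mentions(word):
    return lambda page: any(word.lower() in text.lower() for _, text in page.marginalia)


def marginalia_in(language):
    return lambda page: any(lang == language for lang, _ in page.marginalia)


def search(pages, *criteria):
    """Return the labels of pages that satisfy every criterion (logical AND)."""
    return [page.label for page in pages if all(test(page) for test in criteria)]


# Invented sample data, not taken from the AOR corpus.
pages = [
    Page("12r", [("la", "Caesar in Gallia")], {"Mars"}),
    Page("12v", [("la", "Caesar"), ("el", "a note in Greek")], set()),
    Page("13r", [("en", "a note on husbandry")], {"Mars"}),
]
print(search(pages, has_symbol("Mars"), marginalia_mentions("Caesar")))
print(search(pages, marginalia_mentions("Caesar"), marginalia_in("el")))
```

Adding another search term is then just a matter of passing one more predicate.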

A soon-to-be-implemented search functionality is the possibility to sort search results. Currently this functionality exists only on the test server, but it should be available soon. This is what it will look like:

Some wide-ranging searches will yield many results, which can be ordered based on relevance and page number. This will make it easier to manage and navigate the search results and to work with our tool. As the search functionalities are still being developed, partly because of some suggestions made by our users (please continue to let us know what you think!), more updates can be expected. All the search functionalities will eventually be addressed in detail in newer versions of the user documentation (the current version can be found here).

P.S. Just a reminder that last week we launched our second data release, which includes all the transcriptions of the latest addition to the AOR corpus, Tusser’s book on husbandry. The data release and accompanying documentation are available here!


History of Reading, Technical Development

Marginalia > space

Posted on: October 16, 2015 Jaap Geraerts

For as assiduous an annotator as Gabriel Harvey, few things would have been as annoying as running out of white space. Indeed, he often used every inch of white space available, as this image shows:

Domenichi; f. 187v – f. 188r

It also becomes clear that, even with sufficient space, Harvey still needed to take the layout of the printed text into account, as a result of which his marginal notes snake around and sometimes through it.

The challenges posed by the layout of the text and the size of the page made it necessary for Harvey, who could be fairly long-winded, to link parts of his extensive marginal notes to each other. Often he employed marks or symbols to establish such a link, while he also repeated the first word of the subsequent part of the marginal annotation, thereby imitating a manuscript convention that had also found its way to printed books. By including such signifiers, Harvey provided his readers with a guide as to how his marginal notes should be read.

Domenichi, f. 13v – f. 14r

As this image shows, Harvey enthusiastically started writing in the right margin of the page (f. 14r), but once he had written the word “aut” he found himself out of space. Therefore, he wrote an equals sign and continued the marginal note in the printed text, also starting it with an equals sign to signify the link between the two parts. Harvey still hadn’t finished though, and ran out of space again! He repeated the last word of this part of the marginal note, “Extentu,” in the left margin, where he cheerfully carried on writing. Yet the page continued to challenge him, and he found himself without space yet again. The marginal note ends with a colon, a signifier that this marginal note still wasn’t finished. Harvey continued writing in the gutter of the preceding page (f. 13v), showing that he considered the whole opening, not just one side of a folio, as the page.

Now things are starting to get interesting from the point of view of our schema as well. We decided that each transcription captures all the reader’s interventions on a single page, but we now come face-to-face with a marginal note that deftly defies that structure. This poses a challenge we need to solve, for it happens more than once. In another blog post, which is due to appear in the near future, I’ll discuss the solution we came up with.


Technical Development

Thinking about data

Posted on: June 26, 2015 Mark Patton

I’m Mark Patton, one of two programmers working on the project. The scholars are focused on the research they want to do. John Abrahams and I are focused on the technology required to support that research.

On digital humanities projects like AOR, I always find it helpful to think about the project in terms of data. The data underlying the project must be preserved and then made available in ways that meet the needs of users. The distinction between preserving the content and providing access lets us deal with each problem separately and provides opportunities for sharing technology across projects.

In order to preserve the data, the technical team has to understand it. This requires a detailed description or model of the data and knowing the file formats used for its storage. With that information we can ensure that the integrity of the data is maintained over time. It is essential to do this early on in a project so that workflows that generate data can be automatically checked. Otherwise you will end up with inconsistent data, which is hard to use.

In AOR, the data consists of high-resolution book images, annotations transcribed by scholars, and bibliographic metadata. At a high level, the data is very similar to that of other projects which involve digital facsimiles of books, such as the Roman de la Rose Digital Library and the Christine de Pizan Digital Scriptorium. That fact allowed us to extend our existing infrastructure to handle the new type of transcription data from AOR. With our tools in place, importing new data into our archive as it becomes available is a simple mechanical process.

The technical team worked with the scholars to model the transcriptions and come up with a format for storing them that lent itself to a reasonable workflow. Modeling the transcriptions made the scholars think closely about the eventual ways they wished to use them. The resulting transcriptions are very detailed. All sorts of information are recorded: symbols, people, places, orientation, etc. XML is a good fit for this type of data and a format the scholars could manipulate easily. The technical team worked with the scholars to define a DTD and Schema for the transcriptions. The scholars use GitHub to store the XML and manage their workflow. (The technical team is also using GitHub, but to manage the source code!)
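As a rough idea of what an automatic check on incoming transcriptions might look like, here is a sketch using Python's standard XML parser. It is not the project's validation tooling: the element and attribute names are hypothetical stand-ins, and `xml.etree.ElementTree` only checks well-formedness, so the required-attribute rule below is a hand-written substitute for real DTD or schema validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: which attributes each element must carry.
REQUIRED_ATTRS = {"marginalia": ["language"], "symbol": ["name"]}


def check_transcription(xml_text):
    """Return a list of problems; an empty list means the file passed these checks."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    problems = []
    for element in root.iter():
        for attr in REQUIRED_ATTRS.get(element.tag, []):
            if attr not in element.attrib:
                problems.append(f"<{element.tag}> is missing '{attr}'")
    return problems


good = '<page><marginalia language="la">nota</marginalia><symbol name="Mars"/></page>'
bad = '<page><marginalia>nota</marginalia></page>'
print(check_transcription(good))
print(check_transcription(bad))
```

Running a check like this whenever a file is committed is one way to catch inconsistent data before it reaches the archive.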

Example of creating XML from Harvey annotated book

The efforts to model the AOR data and verify their integrity pay off when we can easily write simple tools to produce data for analysis. Below we’ve used a tool that dumps information about annotation types from our AOR data to spreadsheets. Then that data has been imported into Google Spreadsheets and the chart tool used to compare types of annotations across the corpus. One book looks like it must have been underlined in its entirety!
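A tool of this sort can be quite small once the data is consistent. The sketch below counts annotation elements per book and writes the totals as CSV for import into a spreadsheet; it is not the tool we used, and the element names, book names, and sample XML are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical element names standing in for the annotation types.
ANNOTATION_TAGS = {"marginalia", "underline", "symbol", "mark"}


def count_annotation_types(xml_text):
    """Count how often each annotation element occurs in one transcription."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter() if el.tag in ANNOTATION_TAGS)


def counts_to_csv(counts_by_book):
    """One row per book, one column per annotation type."""
    columns = sorted(ANNOTATION_TAGS)
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["book"] + columns)
    for book, counts in sorted(counts_by_book.items()):
        writer.writerow([book] + [counts.get(column, 0) for column in columns])
    return buffer.getvalue()


# Invented sample transcriptions for two made-up books.
sample = {
    "BookA": "<page><underline/><underline/><marginalia/></page>",
    "BookB": "<page><symbol/><mark/></page>",
}
totals = {book: count_annotation_types(xml) for book, xml in sample.items()}
print(counts_to_csv(totals))
```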


Scholarly dissemination, Technical Development

Visual marginalia!

Posted on: May 19, 2015 Jaap Geraerts

On Friday morning, May 1, Matt and I traveled from London to Cambridge to attend the Early Modern Visual Marginalia workshop, organized by Alexander Marr and Kate Isard. This workshop focused on a specific type of marginalia, namely drawings or other forms of visual annotations that early modern readers drew in (or sometimes literally cut and pasted into!) the marginal spaces of their books. Whereas the study of marginalia is extremely fashionable at the moment, most of these studies focus on the more conventional textual marginalia (i.e., written notes), while the visual interventions in early modern books are still understudied. This could very well be the result of disciplinary boundaries, as the first speaker, Julian Luxford, remarked, since medievalists have been paying close attention to the visual marginalia that appear in medieval manuscripts. Even though visual marginalia are rarer than their textual counterparts, their appearance in early modern printed books warrants careful study.

The morning session of the workshop consisted of several papers all of which dealt with particular aspects of visual marginalia and the sources in which they appear. It soon became apparent how complex these visual marginalia are, as they often involve a combination of text and image, and what a large variety of visual interventions were used by readers, ranging from emblems and architectural drawings to full-fledged mathematical diagrams. Often, like their textual counterparts, visual marginalia refer to objects outside the book in which they were written down or drawn. Whereas Harvey, for example, frequently refers to other books, visual marginalia could refer to a host of material objects that existed outside of the book, including (no longer existing) buildings.

Anonymous drawing in Cartari’s ‘Le imagini de i dei delli antichi’

After lunch, we continued with a special session in the Cambridge University Library, where Ed Potten, Head of Rare Books, made available a number of early modern books that included visual marginalia (a number of which were selected by the organizers and several participants of the workshop). All these examples of visual marginalia, especially those in an annotated copy of Johannes de Sacrobosco’s Tractatus de Sphaera, made us realize how complex this form of annotation is and how tricky it is going to be to capture in XML. Currently our schema is poised to deal with the relatively simple drawings made by Gabriel Harvey, who hardly made use of this form of annotation (see the image below for a rare example). We need to think hard about how to incorporate the more complex visual marginalia in our schema as well, especially as we are going to work on John Dee in the next phase of the project.

Thomas Hoby (transl.), ‘The Courtier…’ (London, 1561).

All in all, this was a very fruitful day, with some thoughtful papers in the morning and the opportunity to browse wonderfully annotated books in the afternoon. Visual marginalia have hitherto been a rather neglected form of annotation that deserves much more attention, and hopefully more workshops like this one can and will be organized in the future. In the meantime, we will continue to work on refining our XML schema in order to be able to capture more complicated drawings, so stay tuned for more updates on this topic! Last but not least: massive thanks to the organizers of this wonderful day!