Search!
Creating a data set is nice, making it publicly available is even nicer, but it is nicer still if the data can be “interrogated” in various ways. To make this possible, a robust search facility is necessary. When we started the project, the possibilities for search within IIIF and IIIF-compliant viewers were fairly limited. Developments within the digital world move fast, though, and recently IIIF has released its Content Search API. This search API enables searches within the data associated with the “structural components of the presentation API,” including the manifests (e.g., the books, in our case) and the sequences (e.g., a set of pages of a book). Moreover, digital annotations associated with particular objects will be searchable too. Great as this is, the specificity of the AOR data requires a search widget that is more closely tailored to our data set. Our tech wizards Mark Patton and John Abrahams, both based at the Digital Research and Curation Center at Johns Hopkins University, developed exactly such a widget and integrated it into the AOR viewer. This blog will highlight several of the search functionalities and the way in which specific parts of the AOR data can be retrieved by them.
As mentioned in an earlier blog, the AOR viewer offers two kinds of search, a basic and an advanced one. Both the basic and advanced searches return the pages that match the specific query constructed by the user. All the search results are clickable links, which can be opened in the current or a new workspace and immediately take the user to that page. A basic search is a simple string search that covers all the textual data associated with every type of reader intervention within a particular book or across the complete AOR corpus. For example, a search for “Caesar” returns all the pages on which this name is mentioned in the underscored words of the printed text, marginal notes, and their translations, the printed text that has been associated with particular marks, and so on. The basic search thus constitutes a broad search, a large fishing net, if you like, to scoop up large chunks of data relating to a specific keyword. Basic searches can be made more specific, though, for instance by adding quotation marks: a search for Julius Caesar might return instances of Julius and Caesar, while searching for “Julius Caesar” only returns the instances of this name.
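To make the difference between a quoted and an unquoted basic search concrete, here is a minimal Python sketch. It is not the code behind the AOR search widget: the page identifiers, the sample texts, and the function name basic_search are all invented for illustration, and it assumes that unquoted words are matched independently (a page is returned if it contains any of them), which is only one possible reading of the behaviour described above.

```python
import re

# Hypothetical sample data: page id -> all transcribed text for that page.
PAGES = {
    "page-12": "Julius Caesar crossed the Rubicon",
    "page-40": "Caesar is praised in a marginal note",
    "page-55": "Julius Frontinus on aqueducts",
    "page-77": "Livy on Hannibal",
}


def basic_search(query, pages=PAGES):
    """Return ids of pages matching any unit of the query.

    A unit is either a phrase in straight double quotes or a single
    unquoted word. Matching is case-insensitive and on whole words.
    """
    # Each match is a (phrase, word) tuple; exactly one of the two is non-empty.
    units = [phrase or word
             for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query)]
    hits = []
    for page_id, text in pages.items():
        for unit in units:
            pattern = r"\b" + re.escape(unit) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                hits.append(page_id)
                break  # one matching unit is enough for this page
    return hits


print(basic_search("Julius Caesar"))    # pages with "Julius" or "Caesar"
print(basic_search('"Julius Caesar"'))  # pages with the exact phrase only
```

With this sample data the unquoted query casts the wide net (three pages), while the quoted query keeps only the page on which the two words appear side by side.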
The advanced search offers various possibilities. First, a user can focus on a specific type of annotation. One can, for example, look for a specific word in only marginal notes, thus further narrowing down a basic search. Another possibility is to look for specific people, books, and geographical locations mentioned by Harvey in his marginal annotations. Through the advanced search it is possible to construct detailed searches based on an aspect of a particular type of annotation. In the XML transcriptions we have recorded whether Harvey made an annotation with pen or with chalk, enabling searches for all annotations that were made in chalk, for example. We have also tagged the language in which marginal notes or the underscored words in the printed text were written. The user can select a particular language in which a marginal note was written and then search for a key word, or just use the search to retrieve all the instances of marginal notes written in a specific language, useful for those who, for instance, want to focus on Harvey’s annotations in Greek.
Second, the advanced search also allows for the construction of queries consisting of a combination of search terms, enabling the user to look for specific combinations of key words or types of annotation. As mentioned in a previous post, one can, for instance, search for pages that contain the Mars symbol and marginal notes that mention “Caesar.” Another (fairly random) example would be to search for the pages containing a marginal note that mentions “Caesar” and a marginal note in Greek. Users can add search terms at will, and a virtually endless number of combinations are possible. As a result, the search functionalities open up the AOR data set for the various avenues of inquiry that scholars from different disciplines might have.
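The way such combined queries narrow down the results can be sketched in a few lines of Python. This is an illustration only, not the AOR implementation or its data model: the Annotation class, its field names, the language codes, and the sample pages are assumptions made for the example. Each search term is treated as a criterion that at least one annotation on the page must satisfy, and a page is returned only if every criterion is met.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    # Hypothetical fields, loosely modelled on what the post says is recorded.
    kind: str           # e.g. "marginalia", "symbol", "underline"
    text: str = ""      # transcribed text or symbol name
    language: str = ""  # assumed codes: "la" Latin, "el" Greek, "en" English
    method: str = ""    # "pen" or "chalk"


# Invented sample pages, each with a list of annotations.
PAGES = {
    "page-3": [Annotation("marginalia", "Caesar in Gallia", "la", "pen"),
               Annotation("symbol", "Mars")],
    "page-9": [Annotation("marginalia", "Caesar and Pompey", "en", "pen"),
               Annotation("marginalia", "λόγος", "el", "chalk")],
    "page-14": [Annotation("symbol", "Mars"),
                Annotation("marginalia", "on husbandry", "en", "pen")],
}


def criterion(kind=None, text=None, language=None, method=None):
    """Build a test for one annotation; arguments left as None are ignored."""
    def matches(a):
        return ((kind is None or a.kind == kind)
                and (text is None or text.lower() in a.text.lower())
                and (language is None or a.language == language)
                and (method is None or a.method == method))
    return matches


def advanced_search(criteria, pages=PAGES):
    """Return pages on which every criterion is met by some annotation."""
    return [page_id for page_id, annotations in pages.items()
            if all(any(c(a) for a in annotations) for c in criteria)]


# Mars symbol AND a marginal note mentioning "Caesar"
print(advanced_search([criterion(kind="symbol", text="Mars"),
                       criterion(kind="marginalia", text="Caesar")]))

# A marginal note mentioning "Caesar" AND a marginal note in Greek
print(advanced_search([criterion(kind="marginalia", text="Caesar"),
                       criterion(kind="marginalia", language="el")]))
```

Because every added criterion can only remove pages from the result, stacking search terms in this way always narrows a search and never widens it.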
A soon-to-be-implemented search functionality is the possibility to sort search results. Currently this functionality exists only on the test server, but it should be available soon. This is what it will look like:
Some wide-ranging searches will yield many results, which can be ordered by relevance or by page number. This will make it easier to manage and navigate the search results and to work with our tool. As the search functionalities are still being developed, partly in response to suggestions made by our users (please continue to let us know what you think!), more updates can be expected. All the search functionalities will eventually be addressed in detail in newer versions of the user documentation (the current version can be found here).
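For readers curious about what the two sort orders mentioned above amount to, here is a small Python sketch. The Hit record, its score field, and the sample numbers are invented for the example; how the AOR search actually computes relevance is not described in this post.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    page_number: int
    score: float  # hypothetical relevance score, higher means more relevant


def sort_hits(hits, by="relevance"):
    """Return a new list of hits ordered by relevance or by page number."""
    if by == "relevance":
        # Highest score first; ties fall back to page order.
        return sorted(hits, key=lambda h: (-h.score, h.page_number))
    if by == "page":
        return sorted(hits, key=lambda h: h.page_number)
    raise ValueError("by must be 'relevance' or 'page'")


hits = [Hit(40, 0.5), Hit(12, 0.9), Hit(7, 0.5)]
print([h.page_number for h in sort_hits(hits, by="relevance")])
print([h.page_number for h in sort_hits(hits, by="page")])
```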
P.S. Just a reminder that last week we launched our second data release, which includes all the transcriptions of the latest addition to the AOR corpus, Tusser’s book on husbandry. The data release and accompanying documentation are available here!