The Schoenberg Institute for Manuscript Studies at Penn brings manuscript culture, modern technology and people together.

Manuscript Monday: LJS 189 – Zakhīrah-ʹi Khvārazmshāhī

Dot Porter, Curator, Digital Research Services at the University of Pennsylvania Library, offers a video orientation to Penn Library’s LJS 189, Zakhīrah-ʹi Khvārazmshāhī, by Ismāʻīl ibn Ḥasan Jurjānī. The manuscript was written in Persia in the 14th century, in Persian. It is a medical encyclopedia in 9 books, with discussions of physiology, anatomy, pathology, diagnosis, fevers, specific diseases, surgery, fractures, poisons, and antidotes. It includes indexes, although some leaves are missing. Most leaves have been re-margined with pink paper; a few leaves have original margins and extensive marginal notes or commentary.

See the full online facsimile of this work in Penn in Hand.


Manuscript Monday: LJS 198 – De simplicibus

Dot Porter, Curator, Digital Research Services at the University of Pennsylvania Library, offers a video orientation to Penn Library’s LJS 198, De simplicibus, by Arnaldus de Villanova. The manuscript was written in Spain, between 1350 and 1380, in Latin. It is a disbound manuscript containing a compilation of simples (medicines made from one component) in 85 chapters, with lists of plants for general medical functions and for treating specific parts of the body. It includes lists and passages not present in the edition of the work published in Basel in 1585.

See the full online facsimile of this work in Penn in Hand.


Manuscript Monday: LJS 204 – Shesh kenafayim

Dot Porter, Curator, Digital Research Services at the University of Pennsylvania Library, offers a video orientation to Penn Library’s LJS 204, Shesh kenafayim, by Immanuel ben Jacob Bonfils. The manuscript was written in Italy in 1509, in Hebrew. It is an introduction and compilation in 6 divisions of astronomical tables concerning the movements of the sun and moon, solar and lunar eclipses, and the day of the new moon, calculated for the Jewish calendar and the longitude and latitude of Tarascon, Provence, the home of the author.

See the full online facsimile of this work in Penn in Hand.


Manuscript Road Trip: The Schoenberg Institute for Manuscript Studies

The Flight into Egypt, Walters Art Museum, MS W.188, f.112r

As we head north out of Baltimore on I-95, we’ll cross the Delaware River and head into Wilmington, where there are manuscripts to be found at the University of Delaware.

The pre-1600 manuscripts at the University are part of a collection with the shelfmark “MSS 095.” There’s a list of the relevant records here and some highlights are described here. Of particular interest to me is a relatively recent acquisition, U. Delaware MSS 095 no. 31, a Book of Hours for the use of Noyon. There aren’t any images on the Special Collections website, but there are a few on this blogpost written by a Special Collections staff member, as well as a little information about the manuscript’s history. But I’d like to know more…how did it get to Delaware, and what can be gleaned about its history before…

Manuscript Monday: LJS 215 – Scientific Miscellany compiled by Imbert Fentryer

Dot Porter, Curator, Digital Research Services at the University of Pennsylvania Library, offers a video orientation to Penn Library’s LJS 215, a scientific miscellany compiled by Imbert Fentryer in France around 1511. The manuscript is written in Middle French and Latin and it is a compendium of astrological charts; astronomical and astrological tables; treatises on astronomy (calculating equinoxes and solstices, using an astrolabe) and geometry; and instructions for making dyes and pigments and medical preparations.

See the full online facsimile of this work in Penn in Hand.


LJS 454 – Seiyō Senpaku Zukai

For the majority of the Edo period (1600-1868), the Japanese shogunate enforced a policy of isolationism referred to as the sakoku policy, which was codified in the 1630s and ended with Matthew Calbraith Perry’s (1794-1858) high-pressure negotiations to open Japan to Western trade. The sakoku period, however, did not relegate Japan to the status of hermit kingdom: trade was conducted with both China and Korea, as well as with the Ryukyuan and the Ainu peoples (each of whose domains would eventually be annexed by Japan). Of Western powers, however, only the Dutch were permitted to trade with the Japanese, and only on a small artificial island at Nagasaki Harbor. Along with material goods the Japanese imported a great deal of so-called “Dutch Learning” (Rangaku): medicine; astronomy; geography; engineering; and, as LJS 454 demonstrates, naval sciences.

LJS 454, Seiyō senpaku zukai 西洋舩舶圖解 (uniform title Gunkan zukai 軍艦図解) is documented innocuously enough in The Lawrence J. Schoenberg Collection of Manuscripts (Philadelphia : Schoenberg Institute for Manuscript Studies, 2013) with the descriptive title “Treatise on how to pack a Dutch merchantship.” It was only this past year that the manuscript was made accessible to the Japanese Studies department, who translated the title slip on the scroll Seiyō senpaku zukai as “An illustrated guide to Western ships”. But neither of these titles hints at the original intent of this work: a practical guide to naval self-defense.

Historical Background

In 1792, the Finland-Swede Adam Kirillovich Laxman (1766-1803?) was commissioned by the Russian Empire to return two Japanese castaways to Japan, with the aim of acquiring trading rights from the shogunate. Laxman landed on Hokkaido and was received by the Matsumae clan, the rulers of the northernmost fiefdom of the Japanese shogunate centered at Edo (present day Tokyo). While Laxman’s trade concessions were not granted, he was issued documents promising that one Russian ship would be permitted entry at Nagasaki. It would take more than another decade, however, for Russians to attempt to use this travel pass.

Dejima (1820s)

“Plattegrond van de Nederlandse faktorij op het eiland Deshima bij Nangasaki” (1824/1825) (source: Wikimedia)

In the early 1800s, Nikolai Petrovich Rezanov (1764-1807) was commissioned by Tsar Aleksander I to open up trade with Japan at Nagasaki. Despite his attempts to woo the shogunate in 1804, the documents received from the Matsumae clan were not recognized, and Rezanov was sent back to Russia. Embittered by his failure, Rezanov plotted revenge against Japan, and employed two Russian naval officers, Nikolai Khvostov and Gavriil Davydov, to enact his vengeance. The two led a devastating raid on the Japanese settlement at the island of Iturup (whose territory is still in dispute between Japan and Russia today), and at several other points in the Sea of Okhotsk. Along their warpath, Khvostov and Davydov sent a threatening missive in French to the Matsumae clan, warning that further attacks would come if Japan didn’t open itself to Russian trade.

Motoki Shoei portrait

Portrait of Motoki Shōei. (via the City of Nagasaki website)

Despite the fact that these two officers acted in no official capacity, the shogunate considered this a legitimate threat from the Russian Empire, and the Dutch interpreters at Nagasaki were ordered to expand their skillsets by learning French and Russian. One of the interpreters chosen to learn French was Motoki Shōei 本木正栄 (1767-1822), also called Motoki Shōzaemon 本木庄左衛門. Shōei was the son of Motoki Ryōei (else “Yoshinaga” (1735-1794)), who made a name for himself by translating Dutch books on natural sciences, in particular astronomy. Shōei followed in his father’s footsteps as a Dutch interpreter, and his language skills were advanced enough that he was chosen to act as an official interpreter for Rezanov’s mission to Japan in 1804 (Rezanov himself, however, did not have a positive assessment of Motoki, and requested a new interpreter during negotiations). Besides forming the basis of Motoki’s French studies, the Khvostov and Davydov incident also left the government at Edo nervous about Western naval strength. At the behest of the shogunate, Motoki was chosen to translate critical Western materials into Japanese, including a treatise on Dutch gunnery, a map of the world, and a pictorial guide to Dutch warships. While the exact titles of the original materials are not clear, it is this final item that seems to be the basis of the original text of LJS 454, Gunkan zukai.

Gunkan Zukai and its Manuscripts

LJS 454 is one of several manuscript copies of Gunkan zukai extant in the world, and one of the only known copies existing outside of Japan. The variant copies available for inspection show that the textual content remains consistent across extant copies.

LJS 454 scroll

LJS 454 scroll with title piece “Seiyō senpaku zukai”.

The work is broken into three major parts. The first is a general survey of ships, with the section title Gunkan zukai kōrei 軍艦圖解考例 (“Introductory thoughts on illustrations of warships”). This introductory segment is likely the derivation of the title Gunkan zukai, though it is unclear if Motoki intended for his work to be called that. The kōrei is a lengthy discussion of various aspects of ships, including the circumstances leading to the document, the classification and nomenclature of ships, and remarks on the experience of sailing. This section ends with an attribution to Motoki.

The next major section is a series of illustrations. Some, like the copy held at the Museum of Sea Sciences (with a closeup of illustration here) in Kotohira, Kagawa, show finely detailed shading on the illustrations. That copy, incidentally, is reportedly in Motoki’s own hand, and was owned at one time by the revolutionary Sakamoto Ryōma (1836-1867). Other copies, like Penn’s LJS 454 and Waseda University’s copy (fully digitized) have unshaded diagrams. Still other copies, like the one owned by Tokyo Metropolitan Library (available in reprint) have both shaded and unshaded elements. Other variations include levels of rubrication and the order of illustrations. Finally, some copies have clear notations on their date of copying. The copy held at the Nagasaki Prefectural Nagasaki Library (also available in a 1943 reprint) has a copying date of 1842. LJS 454, unfortunately, has no such information to help date it, though it could have been produced no earlier than 1808.

The final section is a series of remarks on the methods of nautical warfare, and is ostensibly the purpose for this work, despite it being the shortest section of the three.

While Motoki’s work is commonly referred to as Gunkan zukai, again, there is no direct evidence that his document was intended to have that title. The copy in Kotohira (reported to be in Motoki’s hand) is referred to as Seiyō gunkan kōzō bunkai zusetsu 西洋軍艦構造分解図説 (“A pictorial analysis of the structure of Western warships”). The Union Catalogue of Early Japanese Books (Nihon Kotenseki Sōgō Mokuroku) database, an authoritative source for information on Japanese books, offers the variant title Furansu gunkan kaibōzu 払郎察軍艦解剖図 (“An anatomy of French warships”). LJS 454, meanwhile, has a prominent title piece offering Seiyō senpaku zukai 西洋舩舶圖解 (“An illustrated guide to Western ships”). Saigusa Hiroto and Kodama Reizō, the explicators of the reprinted Nagasaki manuscript, had known of this last title but were unable to verify that it was a variant title of the work Gunkan zukai. LJS 454 confirms their supposition that the works are one and the same.

Source Materials

While Motoki is commonly considered the “translator” of this work, in the attribution of LJS 454’s “Introductory thoughts” he is referred to as the yakujutsu 訳述. This is a compound statement of two roles of translator (yaku) and “expressor” (jutsu). In context of Gunkan zukai, yakujutsu might be understood as “creator by way of translation.” Indeed, it appears that Motoki translated and recontextualized several elements of Dutch and possibly French materials to create a new work.

The introductory segment of Gunkan zukai (the kōrei) notes publications that served as its foundation, including a specific reference to a diagram published by “Korunerisu Kiri[p]peru” (Cornelis Kribber, active 1739-1780) in Utrecht. While the specific Kribber print is not immediately available for inspection, a likely related print from 1730s Nuremberg shows remarkable similarities to Motoki’s illustrations. Many of these same illustrations appear in L’Art de batir les vaisseaux et d’en perfectionner la construction, originally published in Amsterdam in 1719. This French edition itself seems to be a compilation from earlier Dutch works. While Motoki likely used a source similar to one of these, it is unclear if his translations derived from Dutch sources exclusively or if it drew from French compilations of them. At best, there are unclear references in Motoki’s manuscript notes on compiling Gunkan zukai (held at the Nagasaki City Museum) to a colleague who owned a pictorial guide to Western ships.

Gunkan zukai sundials

Comparison of three sundial images. From left to right: 1730 Nuremberg print; Waseda University’s Gunkan zukai; LJS 454.

Motoki’s manuscript notes notwithstanding, it is still unclear how many items he used as his source materials, if any were owned by Dutch traders at Nagasaki, and if any were in French. It is also unlikely that Motoki would have acquired significant command of the French language in the months between the Iturup incident in February 1808 and the creation of Gunkan zukai in summer of the same year, though he could have made use of a Dutch/French dictionary on hand at Nagasaki to translate diagrams.

The Legacy of Motoki and Gunkan Zukai

While Gunkan zukai may have been commissioned with the intent to protect Japan against Western threats by using Westerners’ knowledge against them, only a few short months after its initial compilation, Japan once again faced a rogue Western commander. In October 1808, the HMS Phaeton under the command of Fleetwood Pellew (1789-1861) entered Nagasaki harbor in an attempt to capture Dutch trading ships, which were now under the authority of the newly Napoleonic “Kingdom of Holland.” In an attempt to fool the Dutch, Pellew flew the Dutch flag on the Phaeton. When several Dutch traders at Nagasaki rowed out to meet this false friend, Pellew revealed the ship’s true colors, capturing the Dutch and threatening to execute them as well as destroy other ships in the harbor. Outgunned, the Nagasaki government gave in to Pellew’s demands.

With the English now demonstrating a potential threat to Japanese interests, the Japanese government ordered its Dutch interpreters to add English to their list of languages. Once again, Motoki Shōei was tasked with learning a new Western language. Motoki went on to create the first English grammar in Japan, Angeria kōgaku shōsen 諳厄利亞興學小筌 (“A beginning to studying English,” 1811), and later the first Japanese-English dictionary of some 6,000 words, Angeria gorin taisei 諳厄利亞語林大成 (“The complete forest of English,” 1814). He also compiled a Japanese-French dictionary and grammar, Furansu jihan 払郎察辞範 (“A model of French vocabulary”), completed in that same year 1814. While none of these texts became standard texts, they surely served as references for future students of Western languages in Japan.

As demonstrated with the attacks at Iturup and the all-too-subsequent Phaeton Incident, Japan’s isolationist policy was simply not strong enough to secure the nation without also assimilating knowledge from the very cultures against whom it was protecting itself. Moreover, despite the strict sakoku policy, unwanted visitors would continue to find their way into Japanese-controlled territories. In only a few short decades Japan would find the chains of sakoku broken with the arrival of Matthew Perry’s “black ships.”

Whether Motoki’s detailed Gunkan zukai was ever used for practical reference is unknown, though with at least seven documented copies in Japan and an eighth here at Penn, it is clear that his work was respected for its invaluable knowledge of 18th century Western maritime culture.

Selected Bibliography

  • Gunkan zukai. Suijōsen setsuryaku 軍艦図解. 水蒸船說略. Edo kagaku koten sōsho 46. Kōwa Shuppan, 1983.
  • Katsumori, Noriko 勝盛典子. “Gunkan zukai” to “Hippokuratesu zō” : Oranda tsūshi Yoshio-ke no bunka bunsei-ki [Gakugeiin no notō kara 65] 「軍艦図解」と「ヒポクラテス像」―阿蘭陀通詞吉雄家の文化・文政期 [学芸員のノートから 65]. [Kōbe Shiritsu] Hakubutsukan dayori 68, p. 6-7, 2000.
  • Loveday, Leo. Language contact in Japan : a sociolinguistic history. Clarendon Press, 1996.
  • March, G. Patrick. Eastern destiny : Russia in Asia and the North Pacific. Praeger, 1996.
  • McOmie, William. From Russia with all due respect : Revisiting the Rezanov Embassy to Japan. The human studies 163, p. A71-A154, December 2007.
  • Sangyō gijutsu hen. Kaijō kōtsū 産業技術篇. 海上交通. Nihon kagaku koten zensho 12. Asahi Shinbunsha, 1943.
  • Tsuzuki, Ichirō 続一郎. Motoki Shōei yakujutsu “Gunkan zukai” to Itō Keisuke yaku “Banpō sōsho gunkan hen yakkō” ni tsuite : Furansugo kotohajime no kanren 本木正栄訳述の「軍艦図解」と伊藤圭介訳「萬宝叢書軍艦篇訳稿」について―フランス語事始との関連. Rangaku shiryō kenkyū 307, p. 117-133, 1976.

Libraries Supporting Digital Scholarship: The Schoenberg Institute for Manuscript Studies as an Object Lesson

A version of this talk was presented as the keynote for the annual meeting of the Association of College and Research Libraries – Delaware Valley Chapter, in Philadelphia PA on November 6, 2014.

Thank you very much, and thank you especially to Terry Snyder for inviting me to speak with you all this morning. Today is a good day to talk about the Schoenberg Institute for Manuscript Studies (SIMS); after this talk I will be heading down the hall to attend the annual SIMS Advisory Board meeting, and tomorrow and Saturday I’ll be attending the 7th annual Schoenberg Symposium on Manuscripts in the Digital Age. So this is an auspicious week for all things SIMS.

The topic of this talk is the Schoenberg Institute for Manuscript Studies and how it may be considered an object lesson for libraries interested in supporting digital scholarship. Penn Libraries has invested a lot in SIMS, and while much of SIMS will be very specific to Penn, I hope our basic practices might provide food for thought for other institutions interested in supporting research and scholarship in the library.

SIMS is a research institute embedded in the Kislak Center for Special Collections, Rare Books and Manuscripts in the University of Pennsylvania Libraries. It exists through the generosity and vision of Larry Schoenberg and his wife, Barbara Brizdle, who donated their manuscript collection (numbering about 300 objects) to Penn Libraries, with the agreement that the Libraries would set up an institute to push the boundaries of manuscript studies, including but not limited to digital scholarship. (Although my job focuses on the digital, indeed that term features in my official title, I also have responsibilities for our physical manuscript collections). Penn did this, and SIMS was launched on March 1, 2013. As a research institute we develop our own projects and push our own agenda, and although many of our projects are highly collaborative we do not “serve” scholars; we are scholars.

Guided by the vision of its founder, Lawrence J. Schoenberg, the mission of SIMS at Penn is to bring manuscript culture, modern technology and people together to bring access to and understanding of our intellectual heritage locally and around the world.
We advance the mission of SIMS by:

  • developing our own projects,
  • supporting the scholarly work of others both at Penn and elsewhere, and
  • collaborating with and contributing to other manuscript-related initiatives around the world.

SIMS has 13 staff members, but it is helpful to know that of this list only two are dedicated to SIMS work full-time (Lynn Ransom, Curator, SIMS Programs and Jeff Chiu, Programmer Analyst for the Schoenberg Database of Manuscripts). Everyone else on staff is either part time (the SIMS Graduate Fellows) or has responsibilities in other areas of the libraries, and beyond. Mitch Fraas, for example, is co-director of the Penn Digital Humanities Forum, a hub for digital humanities at Penn hosted through the School of Arts and Sciences.

Over the last couple of weeks, as I have been considering what I might say to you all this morning, I have also been spending a lot of time working on the Medieval Electronic Scholarly Alliance, a federation of digital medieval collections and projects that I co-direct with Tim Stinson, a professor of English at North Carolina State University. MESA is essentially a cross-search for many and varied digital collections, enabling one (for example) to search for a term – we have a fuzzy search that will include variant spellings in a search – and then one can facet the results by format (for example illustrations, or physical objects), discipline, or genre. One can also federate by “resource”, searching only those items that belong to particular collections.
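To make the idea of a fuzzy search followed by faceting a little more concrete, here is a minimal sketch in Python. It is not how MESA itself is implemented; the records, the field names, and the similarity threshold are all invented for illustration, and the matching uses the standard library's difflib rather than a real search index.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical records; the field names are illustrative, not MESA's schema.
RECORDS = [
    {"title": "Map of Jerusalem", "format": "Illustration", "resource": "Collection A"},
    {"title": "Pilgrimage to Iherusalem", "format": "Illustration", "resource": "Collection B"},
    {"title": "Letter concerning Hierusalem", "format": "Manuscript", "resource": "Collection A"},
    {"title": "Psalter with calendar", "format": "Manuscript", "resource": "Collection B"},
]


def fuzzy_match(term, text, threshold=0.8):
    """Return True if any word in text is similar enough to term."""
    term = term.lower()
    return any(
        SequenceMatcher(None, term, word.lower()).ratio() >= threshold
        for word in text.split()
    )


def search(term, records, **facets):
    """Fuzzy-search the titles, then narrow the hits by exact facet values."""
    hits = [r for r in records if fuzzy_match(term, r["title"])]
    for field, value in facets.items():
        hits = [r for r in hits if r[field] == value]
    return hits


def facet_counts(hits, field):
    """Count how many hits fall under each value of a facet field."""
    return Counter(r[field] for r in hits)


if __name__ == "__main__":
    for record in search("Jerusalem", RECORDS, format="Illustration"):
        print(record["title"])
```

The variant spellings are caught by the similarity ratio, and the facet is just an exact-match filter applied to the fuzzy hits; limiting by "resource" works the same way, by passing resource="Collection A".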

Searching MESA for Jerusalem with fuzzy search enabled, limited to format of “Illustration”.

The work that I’ve been doing for MESA over the past two weeks involves taking data provided to us and converting it from whatever format we get, into the Collex RDF XML format required by MESA. In some cases, this is relatively easy. The Walters Art Museum, for example, through its Digital Walters site, provides high-resolution images of its digitized manuscripts using well-described and consistent naming conventions, along with TEI-XML manuscript descriptions that are equally consistent and incredibly robust. These files are all released under a Creative Commons Attribution-ShareAlike 3.0 Unported license, and they are easy to grab or point to once you know the organization of the site and the naming conventions.

Walters Art Museum manuscripts on The Digital Walters site.

Not all project data is so simple to access.

The data in the British Library Catalogue of Illuminated Manuscripts is open access (the metadata is under a Creative Commons license and the images are in the public domain), but it is “black boxed” – trapped behind an interface. The only way to access the data is to use the search and browsing capabilities provided by the online catalog. To get the data for MESA, our contact at the BL sent me the Access database that acts as the backend for the website, and I was able to convert that to the formats I needed to be able to generate our RDF.

Images from Harley 603 from the British Library Catalogue of Illuminated Manuscripts.

So what does all this have to do with SIMS? Well, as I was doing this conversion work, I had a bit of an epiphany. I realized that pretty much everything we do at SIMS can be described in terms of data reuse.

And as I thought about how I might describe our various projects in terms of data reuse, I also realized that reuse of data is not new. In fact, it is ancient, and thinking in these terms puts SIMS at the tail end of a long and storied history of scholarship.


I’m not starting at the beginning, but I do want to give you a sense of what I mean when I say that data has been reused for the past couple thousand years (at least). One of my favorite early examples would have to be ancient Greek epics, such as the Iliad.

Iliad. Book 10. 421-434, 445-460, P. Mich. Inv. 6972, Special Collections Library (2nd c. BCE)

Here is a papyrus fragment, housed in the University of Michigan Libraries and dating from the second century BCE, containing lines from Book 10 of the Iliad. Thousands of similar fragments survive, containing variant lines from the poem.

Marciana Library 822, Venetus A, fol. 24r (10th c.)

And this is a page from the manuscript commonly known as Venetus A, Marciana Library 822, the earliest surviving complete copy of the Iliad, dating from the 10th century (a full 12 centuries younger than the papyrus fragment). In addition to the complete text, you can see that there are many different layers of glosses here: marginal, interlinear, intermarginal. These glosses contain variant readings of the textual lines, variants which are in many cases reflected in surviving fragments.

Penn Ms. Codex 1058, Glossed Psalter, fol. 12r (ca. 1100)

My next example is from a Glossed Psalter from our collection, Ms. Codex 1058, dating from around 1100. This manuscript is also glossed, but rather than variant readings, these glosses are comments from Church Fathers, pulled out of the context of sermons or letters or other texts, and placed in the margin as commentary on the psalm text.

Penn Ms. Codex 1640, Thomas of Ireland Manipulus Florum, fol. 114r

This example is a bit later, an early 14th century Manipulus Florum, Ms. Codex 1640. Like the glossed psalter, quotes from the church fathers and other philosophers are again pulled out of context, but in this case they are grouped together under a heading – in this example, the heading is “magister”, or teacher, and presumably the quotes following describe or define “magister” in ways that are particularly relevant to the needs of the author.

Penn LJS 267, De ludo scacchorum seu de moribus hominum et officiis nobilium … fol. 136v

Text is not the only type of data that can be reused, historically or now. We can also reuse material. Can you all see the sign of material reuse here? Check the top and bottom of the page. This is a palimpsest. What’s happened here is that a text was written on some parchment, and then someone decided that the text was no longer important. But parchment was expensive, so instead of throwing it away (or just putting it on a shelf and forgetting about it) the text was washed or scraped off the page, and new text was written over top. We can still see the remnants of the older text.

Penn LJS 395, Manuscript pastedowns from De proprietatibus rerum, back pastedown side 2

This is a page from LJS 395, a 13th century manuscript fragment that’s been repurposed to form part of the binding for a 16th century printed book. This is really typical reuse, and many fragments that survive do so because they were used in bindings.

How about this one?

Penn Ms. Codex 1056, Book of Hours Use of Rouen, ff. 24v-25r

This is a trick question. This is an opening from a 15th century book of hours from our collection, to compare with this.

Penn Ms. Coll 713, Breviary Collages, No. 1

This 17th century Breviary Collage was created by literally cutting apart a 15th century Flemish Breviary and pasting the scraps onto a square of cardboard. It is a bit horrifying, but it’s my favorite example of both reuse of material and, if not reuse of text, then reuse of illustration. Certainly the content is being reused as much as the material. Although I would never do this to a manuscript (and I hope none of you would do this either), I feel like I have a kindred spirit in the person who did this back in the 1800s, someone who saw this Breviary as a source of data to be repurposed to create something new.

I do this, only I do it with computers. Here is my collage.

Collation Visualization for LJS 266

Okay, it’s not a collage, it’s a visualization of the physical collation of Penn LJS 266 (La generacion de Adam) from the Schoenberg Collection of Manuscripts, just one created as part of our project to build a system for visualizing the physical aspects of books in ways that are particularly useful for manuscript scholars. Collation visualization creates a page for each quire, and a row on that page for each bifolium in the quire. On the left side of each row is a diagram of the quire, with the “active” bifolium highlighted. To the right of the diagram is an image of the bifolium laid out as it would be if you disbound the book, first with the “inside” of the bifolium facing up, then the “outside” (as though the bifolium is flipped over).
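The pairing of leaves into bifolia that drives each row of the visualization can be sketched in a few lines of Python. This is only an illustration of the logic for a regular quire, not the XSLT used in the project; the function names are my own, and the leaf numbers are positions within the quire rather than folio numbers in the codex.

```python
def bifolia(leaf_count):
    """Pair the conjugate leaves of a regular quire, outermost bifolium first.

    Leaves are numbered by position within the quire (1, 2, 3, ...),
    not by their folio number in the codex.
    """
    if leaf_count % 2:
        raise ValueError("a regular quire has an even number of leaves")
    return [(i, leaf_count + 1 - i) for i in range(1, leaf_count // 2 + 1)]


def sides(bifolium):
    """List the pages visible on each side of a bifolium laid flat.

    'inside' faces the centre of the quire; 'outside' is the view
    after the sheet has been flipped over.
    """
    left, right = bifolium
    return {
        "inside": (f"{left}v", f"{right}r"),
        "outside": (f"{right}v", f"{left}r"),
    }


if __name__ == "__main__":
    for pair in bifolia(8):
        print(pair, sides(pair))
```

For a quire of eight leaves this gives four rows, from the outermost bifolium (leaves 1 and 8) to the innermost (leaves 4 and 5), which is the order in which the rows appear on a quire's page.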

To generate a visualization in the current version of collation visualization, 0.1 (the source XSLT files for which are available via my account on GitHub), I need two things: manuscript images, and a collation formula (the collation formula describes the number of quires in a codex, how many folios in each quire, if any folios are missing, that kind of thing). To create this particular visualization, first I needed to get the images.
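As a rough illustration of what a collation formula encodes, here is a small Python sketch that reads a deliberately simplified notation. The notation is an assumption made for this example ("1-3^8" for quires 1 to 3 of eight leaves each, "(-2)" for a missing second leaf); real collation formulas vary from catalogue to catalogue, and this is not the format that Collation 0.1 reads.

```python
import re

# Assumed notation for this sketch only:
#   "1-3^8"   quires 1 to 3, each of 8 leaves
#   "5^8(-2)" quire 5 of 8 leaves, with leaf 2 now missing
ITEM = re.compile(r"^(\d+)(?:-(\d+))?\^(\d+)(?:\(-(\d+(?:,\d+)*)\))?$")


def parse_collation(formula):
    """Expand a simplified collation formula into one dict per quire."""
    quires = []
    for item in formula.split():
        match = ITEM.match(item)
        if match is None:
            raise ValueError(f"unrecognised item: {item!r}")
        first = int(match.group(1))
        last = int(match.group(2) or first)
        leaves = int(match.group(3))
        missing = [int(n) for n in match.group(4).split(",")] if match.group(4) else []
        for number in range(first, last + 1):
            quires.append({"quire": number, "leaves": leaves, "missing": list(missing)})
    return quires


def surviving_leaves(quires):
    """Count the leaves still present across all quires."""
    return sum(q["leaves"] - len(q["missing"]) for q in quires)


if __name__ == "__main__":
    parsed = parse_collation("1-3^8 4^6 5^8(-2)")
    print(len(parsed), "quires,", surviving_leaves(parsed), "leaves")
```

Once the formula is expanded into one entry per quire, matching each leaf to an image becomes a matter of counting through the surviving leaves in order.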

LJS 266 in Penn in Hand

Our digitized manuscripts are all available through Penn in Hand, which is very handy for looking at manuscript images and reading descriptive information, but much like the British Library database we looked at earlier, it’s a black box.

Downloading an image file from Penn in Hand

It is possible to use “ctrl-click” to save images from the browser, but the file names aren’t accessible (my system reverts to “resolver.jpg” for all images saved from PiH, and it’s up to me to rename them appropriately).

Collation formula for LJS 266 in Penn in Hand (the third entry under Notes:)

The collation formula is in the description, and it’s easy enough for me to cut and paste that into the XSLT that forms the backbone of Collation 0.1.

It is actually possible to get XML from Penn in Hand, by replacing “html” in the URL with “XML”.

XML in Penn in Hand

The resulting XML is messy, but reusable – a combination of Dublin Core, MARC XML, and other various non-standard tagsets.

Screenshot of OPenn (under construction)

Because we know how important it is to have clean, accessible data (indeed my own work and other SIMS projects depend on it), we have been working for the past year on OPenn, which will publish high-resolution digital images (including master TIFF files) and TEI-encoded manuscript descriptions (generated from the Penn in Hand XML) in a Digital Walters-style website, with the TEI under Creative Commons licenses and the images in the public domain. OPenn is still in development, but will be launched at the end of 2014.

Having consistent data for our manuscripts in OPenn will enable me to do with our data what I already did with the Digital Walters data: programmatically generate collation visualizations for every manuscript in our collection. Because the Digital Walters data was accessible in a way that made it easy for me to reuse it, and was described and named in such a way that it was easy to figure out what images match up with which folio number, I was able to generate collation visualizations for every manuscript represented in the Digital Walters that includes a collation formula, and I was able to do it in a single afternoon. The complete set of visualizations is available here.

Mock-up of collation form

Version 0.2 of Collation will be based on a form (this is the current mock-up of how the form will look). Instead of supplying a collation formula, one would essentially build the manuscript, quire by quire, identifying missing, added, and replaced folios, and the output would be both a visualization and a formula.
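Going in this direction, from a quire-by-quire description to a formula, might look something like the following Python sketch. The Quire class and the output notation ("1-3^8", "(-2)" for a missing leaf) are assumptions made for this example, not the design of Collation 0.2, and the sketch only handles missing leaves, not added or replaced ones.

```python
from dataclasses import dataclass, field


@dataclass
class Quire:
    """One quire as it might be entered in a form (hypothetical model)."""
    leaves: int                                   # leaves as originally made
    missing: list = field(default_factory=list)   # positions of lost leaves


def to_formula(quires):
    """Collapse runs of identical quires into items such as '1-3^8'."""
    items = []
    start = 0
    while start < len(quires):
        # Extend the run while the next quire has the same structure.
        end = start
        while end + 1 < len(quires) and quires[end + 1] == quires[start]:
            end += 1
        quire = quires[start]
        span = f"{start + 1}" if end == start else f"{start + 1}-{end + 1}"
        note = f"(-{','.join(map(str, quire.missing))})" if quire.missing else ""
        items.append(f"{span}^{quire.leaves}{note}")
        start = end + 1
    return " ".join(items)


if __name__ == "__main__":
    book = [Quire(8), Quire(8), Quire(8), Quire(6), Quire(8, [2])]
    print(to_formula(book))
```

The point of building the structure first is that the same list of quires can feed both outputs: the formula for the catalogue description and the diagrams for the visualization.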

Why do this? It is a new way of looking at manuscripts in a computer, completely different from the usual page-turning view, and one that focuses on the physicality of the book as opposed to its state as a text-bearing object. A new view will hopefully lead to new research questions, and new scholarship.

Moving on from Collation, the standard-bearing project for SIMS (and one that predates SIMS itself by many years) is the Schoenberg Database of Manuscripts (SDBM). This is a project that reuses data on a massive scale, and does it to great effect.

Entry #1 in A Catalogue of the Medieval Manuscripts in the University Library, Aberdeen, By M. R. James (1932)

This photo is the first entry in the catalog of manuscripts at the University of Aberdeen Library, written by M. R. James. This entry, and other entries from this catalog, and from many other library and sales catalogues, have been entered into the SDBM.

Entry from Schoenberg Database of Manuscripts (current version)

Here is that same entry in the current version of the catalog. However! This year Lynn Ransom received a major grant from the NEH to convert the database to new technologies, and I’d rather show you that version.

Entry in the Schoenberg Database of Manuscripts (new version)

So, here is that same entry again in the new version of the Schoenberg Database, which is currently under development. “What is the big deal?” I hear you ask. As well you may. Let me show you a different entry from that same catalogue.

Entry for a record with eight matching records

You can see in this example, on the “Manuscript” line: “This is 1 of 8 records referring to SDBM_MS_5688.” The SDBM is in effect a database of provenance – it records, not where manuscripts are now but where they have been noted over time, through appearances in sales and collections catalogues. This manuscript has eight records representing catalogs dated from 1829 to 1932. This enables us to trace the movement of the manuscript during the time represented in the database.
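The underlying operation is simple to sketch: group records by the manuscript they refer to, then sort each group by catalogue date. The record layout, identifiers, and catalogue names below are placeholders invented for the example, not the SDBM schema or real SDBM data.

```python
from collections import defaultdict

# Hypothetical records: (record_id, manuscript_id, catalogue_year, catalogue)
records = [
    ("R3", "MS_A", 1932, "Catalogue C"),
    ("R1", "MS_A", 1829, "Catalogue A"),
    ("R2", "MS_A", 1874, "Catalogue B"),
    ("R4", "MS_B", 1901, "Catalogue D"),
]


def trace_provenance(records):
    """Group records by manuscript and order each group chronologically."""
    by_manuscript = defaultdict(list)
    for record_id, ms_id, year, catalogue in records:
        by_manuscript[ms_id].append((year, catalogue, record_id))
    # Sorting the tuples orders them by year first.
    return {ms_id: sorted(entries) for ms_id, entries in by_manuscript.items()}


timeline = trace_provenance(records)
for year, catalogue, record_id in timeline["MS_A"]:
    print(year, catalogue, record_id)
```

Each manuscript's sorted list is a rough timeline of where it has been noted, which is what lets us follow its movement from catalogue to catalogue.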

Eight records for a single manuscript from SDBM.

Why create the Schoenberg Database? Although it was begun by Lawrence Schoenberg as a private database, which enabled him to track the price of manuscripts, we develop it now to support research around manuscript studies, and around trends in manuscript collecting. Study of private sales in particular could be useful in other areas of studies, such as economic history (since manuscripts are scarce, and expensive, and people will be more likely to purchase them and pay more money for them when they have money to spare).

A new project, one that we have been working on just this year, is Kalendarium. Instead of a database consisting of manuscript descriptions from catalogs, Kalendarium will be a database consisting of data from medieval calendars themselves.

Calendar from Ms. Codex 1056, Book of Hours Use of Rouen, ff. 1v-2r

This is a couple of pages of a calendar from Penn Ms. Codex 1056, a 15th century Book of Hours. Calendars, common in Books of Hours, Breviaries and Psalters, essentially list saints and other celebrations for specific days of the month. Importance may be indicated by color: as you can see here, some saints’ names are written in gold ink while most alternate between red and blue (red and blue being equally weighted, and gold used for more important celebrations).

A major expectation of Kalendarium is that the data will be generated through crowdsourcing; that is, we’ll build a system where librarians can come and input the data for a manuscript in their collection, or scholars and students can input data for a calendar they find online, or while they are looking at a manuscript in a library. The thing is, transcribing these saints’ names can be difficult, even for someone trained in medieval handwriting. So, instead of transcriptions, we’ll be enabling people to match saints’ names and celebrations to an existing list. And where do we get that list?

Ask and ye shall receive. In the late 1890s, Hermann Grotefend published a book, Zeitrechnung des deutschen mittelalters und der neuzeit… (Hannover, Hahn, 1891-98), that included a list of saints, and the dates on which those saints are venerated. And it’s on HathiTrust, so it’s digitized, so we can use it!

Well, it’s in Portable Document Format, more commonly known as PDF. Like Penn in Hand and the British Library Catalog of Illuminated Manuscripts, PDF is another kind of black box. Although it’s fine for reading, it’s not good for reuse (there are ways to extract text from PDF, although it’s usually not very pretty). Luckily, we were able to find another digital version.


This one’s in HTML. Not ideal, not by a long shot, but at least HTML provides some structure, and there is structure internal to the lines (you can see pipes separating dates, for example). Doug Emery, Special Collections Digital Content Programmer and the SIMS staff member responsible for Kalendarium, has been working with a collaborator in Brussels to generate a usable list from this HTML that we can incorporate into Kalendarium as the basis for our identification list.
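As a rough illustration of why even loosely structured HTML is easier to reuse than PDF, here is a small Python sketch that strips the markup from a line and splits it on pipes. The line layout (a name followed by one or more pipe-separated dates) and the sample lines are assumptions invented for the example; the real file is messier, and this is not the code Doug and his collaborator are using.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")  # crude tag stripper, adequate for flat one-line markup


def parse_line(line):
    """Return {'name': ..., 'dates': [...]} or None if the line has no dates."""
    text = html.unescape(TAG.sub("", line)).strip()
    if "|" not in text:
        return None
    name, *dates = [part.strip() for part in text.split("|")]
    dates = [d for d in dates if d]
    if not name or not dates:
        return None
    return {"name": name, "dates": dates}


# Made-up sample lines in the assumed layout.
sample = [
    "<b>Example Saint</b> | Jan. 5 | Oct. 12<br>",
    "<i>Heading without dates</i>",
    "Another Example | Mar. 1",
]
entries = [entry for entry in map(parse_line, sample) if entry]
for entry in entries:
    print(entry["name"], entry["dates"])
```

Lines that do not fit the expected pattern are skipped rather than guessed at, so they can be reviewed by hand later.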

Kalendarium prototype site

We have a prototype site up; it’s not public, and for now it’s only accessible on campus. We’ve been experimenting, and you can see a handful of manuscripts listed here.

Kalendarium form

Similar to Collation 0.2, in Kalendarium you’re using the system to essentially build a version of your calendar. You can identify colors, and select saints from a drop-down list. Unfortunately we have already found that many saints that are showing up in our calendars aren’t in Grotefend, or they are celebrated on dates not included in Grotefend; but this is an opportunity for us to contribute to the list in a major way.

Why do this at all? Calendars are typically used to localize individual manuscripts – if we see that particular saints are included in a calendar, we can posit that the book containing that calendar was intended to be used in the areas where those saints were venerated. However, if we scale up, we’ll be able to see larger patterns: veneration of saints over time, saints being venerated on different days in different places, and we should be able to see new groupings of books as well.
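Here is a minimal Python sketch of the kind of question that becomes easy to ask once many calendars are in one place: which saints turn up on different days in different places? The calendar records, place names, and date format below are placeholders, not Kalendarium's actual data model.

```python
from collections import defaultdict

# Hypothetical calendar records; dates are written as MM-DD strings.
calendars = [
    {"shelfmark": "Calendar A", "place": "Place X",
     "entries": [("Saint One", "01-05"), ("Saint Two", "03-01")]},
    {"shelfmark": "Calendar B", "place": "Place Y",
     "entries": [("Saint One", "01-06")]},
    {"shelfmark": "Calendar C", "place": "Place X",
     "entries": [("Saint One", "01-05")]},
]


def dates_by_saint(calendars):
    """Map saint -> date -> set of places where that date is attested."""
    result = defaultdict(lambda: defaultdict(set))
    for calendar in calendars:
        for saint, date in calendar["entries"]:
            result[saint][date].add(calendar["place"])
    return result


def saints_with_variant_dates(calendars):
    """List saints who appear on more than one date across the calendars."""
    index = dates_by_saint(calendars)
    return sorted(saint for saint, dates in index.items() if len(dates) > 1)


print(saints_with_variant_dates(calendars))
```

With a single calendar this tells you nothing; the pattern only appears once the data is aggregated.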

Another set of projects SIMS is involved in, the Penn Parchment Project in 2013 and the Biology of the Book Project starting in 2014, involves testing the parchment in our manuscripts – literally reusing the manuscript, extracting data from the material itself. This involves taking small, non-destructive samples to gather cells from the surface of the parchment and testing them to see what type of animal the parchment is made from. Results are interesting; as part of the Penn Parchment Project, an individual who wishes to remain anonymous made expert identification of ten manuscripts from the Penn collection, and got only five of them correct. Clearly, parchment identification could benefit from a more scientific approach. More recently we have joined Biology of the Book, a far-reaching collaboration (including folks at University of York in the UK, Manchester University, The Folger Shakespeare Library, the Walters Art Museum, Library of Congress, University of Virginia, The Getty, and others) to begin the slow process of moving forward a much larger project with the aim to perform DNA analysis on larger numbers of manuscripts. Very little is actually known about the practices surrounding medieval parchment making, including the agricultural practices that supported the vast numbers of animals that were used to create the manuscripts that survive today (and, of course, all those that don’t survive). We think of parchment as an untapped biological archive, and a database containing millions of DNA samples would enable us to discover the number of animals used to build manuscripts, where those animals were bred (and how far they were imported and exported), what breeds were used – many questions that are simply impossible to answer now.

Mitch Fraas, Curator, Digital Research Services and Early Modern Manuscripts, creates maps and other visualizations relating to early books, and blogs about them. He’s used data from the Schoenberg Database of Manuscripts (which is available for download in comma-separated format on the SDBM website, and is updated every Sunday) and data extracted from Franklin, the Penn Libraries’ catalogue, to generate some different visualizations, one of which is shown here: Charting Former Owners of Penn’s Codex Manuscripts.

Diagram: Charting Former Owners of Penn’s Codex Manuscripts (click for interactive version)

The yellow dots are owners, and the larger the dot, the more manuscripts the owner is connected to (Lawrence Schoenberg and Sotheby’s are quite large, as is Bernard M. Rosenthal, a bookseller in New York). Clicking an owner shows the number of manuscripts connected to that person or institution, and clicking a manuscript shows the number of owners connected to that manuscript. This visualization was developed using data from Franklin, and the blog post linked above provides details on how it was done.
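The data behind a diagram like this can be sketched as a two-way lookup between owners and manuscripts. The Python below is only an illustration under assumed inputs: the column names `manuscript` and `owner` and the sample rows are made up, and this is not Mitch's actual code or the Franklin export format.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data in comma-separated format.
sample_csv = """manuscript,owner
Ms. A,Owner One
Ms. A,Owner Two
Ms. B,Owner One
Ms. C,Owner One
"""


def build_links(csv_text):
    """Return (owner -> manuscripts, manuscript -> owners) as dicts of sets."""
    owners = defaultdict(set)
    manuscripts = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        owners[row["owner"]].add(row["manuscript"])
        manuscripts[row["manuscript"]].add(row["owner"])
    return owners, manuscripts


owners, manuscripts = build_links(sample_csv)

# The number of linked manuscripts is what would drive the size of an owner's dot.
for owner, linked in sorted(owners.items(), key=lambda item: -len(item[1])):
    print(owner, len(linked))
```

Clicking an owner or a manuscript in the diagram corresponds to looking up one key in one of these two dictionaries.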

Mapping pre-1600 European manuscripts in the U.S. and Canada

Just this week, for the 7th Annual Lawrence J. Schoenberg Symposium on Manuscript Studies in the Digital Age, Mitch has created a new map, Mapping pre-1600 European manuscripts in the U.S. and Canada, using data from the Directory of Institutions in the United States and Canada with Pre-1600 Holdings. This map shows the location of all holdings included in the directory. Larger collections have larger dots on the map. Clicking a dot will give one more information about the owner and the collection, and there are options for showing current collections or former collections, or for showing only collections with codices (full books, as opposed to fragments or single sheets).

Ms. Roll 1066: Genealogical Chronicle of the Kings of England to Edward IV, circa 1461

We have almost reached the end, but I would like to finish by featuring the project of last year’s SIMS Graduate Fellow, the brand new Dr. Marie Turner, which is still underway, and which is a great example of data reuse to finish on. Several years ago, Marie transcribed our Ms. Roll 1066, a 15th century genealogical roll chronicling the Kings of England from Adam to Edward IV. Her transcription was combined with images of the roll and built into a website, the screenshot here, with links between her transcription and areas on the page. But Marie’s vision is larger than this single roll. There are several other rolls of this type in existence, and her vision is to expand this single project, this silo, to not only incorporate other rolls, but to become a space for collaborative editing (transcription, description, translation, and linking) for the other rolls as well. We have successfully pulled the data from the existing site and converted it into XML, following the Text Encoding Initiative Guidelines, which we’ll use to generate the data we need to import into our new software system.

The new Rolls Project will be built in DM, formerly Digital Mappaemundi, an established tool for annotating and linking images, which has been developed by Martin Foys, a medievalist, and which has recently been brought to SIMS for hosting and continued development.

A screenshot of La Chronique Anonyme Universelle, edited by Lisa Fagin Davis, published in DM

This screenshot illustrates how DM looks in terms of linking annotations to areas of an image, and you can also link areas of images together. Just last week we got a production version of DM set up on our servers at Penn, and next week we’ll be importing our data – the data we exported from the earlier edition of Ms. Roll 1066 project – into that production version. We’ll also be importing images of a half dozen other genealogical rolls. We are immensely excited to move the Rolls project to the next phase – and it was all made possible by


I’d like to close with just a few thoughts about WHAT SIMS IS – and whether or not we are an effective object lesson for libraries supporting digital scholarship is probably up for debate. We certainly do scholarship, effectively, within the context of the library, and we do it ourselves: We are scholars, not service providers. However, I think it’s important to note that our scholarship, our research, our tools and our projects are not ends unto themselves. They will all serve to support more work, to allow other scholars to ask new questions, and hopefully to help them answer those questions.

Since we are not service providers, faculty and graduate students aren’t our clients; they are our collaborators, our equals, our partners. We are in this together!

Finally, and I could have said more about this throughout my talk, we take pride in our data. We want data from all of our projects – all the data that we have reused and brought in from other places – to be consistent, with regard to formatting and documentation; accessible, in the technical sense of being easy to find; and reusable, with regard to both format (it is unlikely you will find PDFs as the sole source for any information on our site) and license. Likewise our code: we make use of GitHub (a site for publishing open source code) individually and through the Library’s account, and all our code is and will always be open source.

Thanks so much again, and I’m happy to take questions now.


