The Schoenberg Institute for Manuscript Studies at Penn brings manuscript culture, modern technology and people together.

Digital Manuscripts as Critical Edition

The following post is the written version of a presentation that Christoph Flüeler, Director of e-codices and Professor at the University of Fribourg, gave at the 50th International Congress on Medieval Studies in Kalamazoo, MI, in May 2015. It has been very lightly edited by Dot Porter. Prof. Flüeler has long been a leader in digital manuscript studies, and in his talk he proposed an exciting vision of digital manuscripts as critical edition. With Prof. Flüeler’s permission, we are very pleased to share his talk here on the SIMS blog. He will soon develop these thoughts into a longer article, to be published in a more formal venue.

The point of departure for my contribution is as follows: in coming years an enormous number of manuscripts, tens of thousands of them from thousands of manuscript collections throughout the world, will be digitized and made available on the Internet. A few years from now perhaps a majority of all manuscripts of great cultural, artistic, and scientific value will be accessible online. As this happens, quality requirements regarding image quality, metadata, and user interfaces will markedly increase, and standards will be established, so that all over the world metadata and images can be processed and annotated via comprehensive and specialized manuscript portals and interoperable image viewing platforms. This presumption is based on careful observation of developments during the past ten years and of the large number of projects currently planned or in progress. Everyone who attended yesterday’s session entitled “All Medieval Manuscripts Online: Strategic Plans in Europe” with presentations by the British Library, the Bibliothèque nationale de France, the Bayerische Staatsbibliothek München, and e-codices knows that I refer here only to concretely planned projects.

If digital manuscripts become ever more important for scholarly research in future, the following question arises: what is the “scholarly research value” of digital manuscripts?

Discussion of the matter has thus far been conducted in an undifferentiated manner by persons interested in defending the exclusive status of the originals and who in some cases go so far as to question whether digital reproductions have any scholarly research value at all. This point of view strikes me as rather unconstructive, because it simply dismisses as unreliable these resources on which most researchers already rely, and on which they will in future base their work to an ever greater degree.

My perspective is a bit different. What we need to do is to ask the following question: what preconditions must be met in order for a digital manuscript to be understood as a reliable resource for scholarly research, such that a scholarly researcher can, without any great misgivings or doubts, utilize the digital object as the basis for serious research and make use of it to the fullest possible extent?

Central to my reflections is the different status of the physical manuscript and the digital manuscript. It is rather astonishing that the relationship between the two has barely been examined up to this point. This is probably because no serious theoretical consideration was ever given to the immediate precursors of the digital manuscript; I speak here of print facsimiles and microfilms. Facsimile editions are hugely popular with collectors, and their production is normally understood as a work of fine craftsmanship. While scholarly researchers are employed in their production, they contribute only the accompanying commentary. Theory has evidently been considered out of place when it comes to producing facsimiles.

Microfilms, on the other hand, have always been regarded as rather unattractive research aids: they are often incomplete, often contain errors, and are as a rule only black-and-white. They are still maintained as archival copies, but for scholarly researchers their usefulness as reproductions has for the most part been superseded.

In this context I will not raise the matter of the qualities that distinguish a digital manuscript from a facsimile edition or a microfilm. The advantages of the digital manuscript are too obvious to require enumeration here.

I would like to ask, instead, how a digital manuscript stands in relation to a critical edition of a text. Can the publication of a digital manuscript on the internet be understood as an edition? Further: could such an edition even be regarded as a critical edition?

I would like to consider again the statement I made earlier, in which I asserted that for scholarly research purposes a digital manuscript must be understood as a reliable resource to the extent that medievalists from various disciplines (for example, history, art history, the history of law, the history of philosophy, and classical philology) can utilize the digital object as the basis for serious research and make use of it to the fullest possible extent.

This echoes the proper purpose of a critical text edition. A critical text edition does exactly this, and the science of creating editions has since the 19th century developed methods for achieving this goal. A critical text edition aims to create an authoritative and easily accessible text. Its usefulness is, however, often far greater: a critical text edition can, for example, highlight the historical dimensions of the transmission of a text and use a critical apparatus to tease out intertextual aspects of the text in ways that far exceed simple transcription. In addition, a critical text edition can drill down to a more original text, identify errors in transmission, and provide a text so convincing in its authenticity that it comes to be accepted in the scholarly research community as an authoritative version of the text.

If we do not insist that the definition of edition can only be applied to a traditional text edition, we can in point of fact understand the publication of a digital manuscript on the Internet as a scholarly edition.

There are already thousands of texts whose first publication was in the form of a digital manuscript, including hundreds of texts found on e-codices. It is important not to underestimate the usefulness to scholarly research of this additional method of editing, especially for texts that have never been edited and that might otherwise never have been critically edited.

What scholars need are good, scientific editions. This is true for both text editions and editions of digital manuscripts. We can only regard as serious critical editions those that follow established scientific criteria, developed with a firm grounding in the concept that the publication can substitute for the original as a resource for research, up to a certain point and for specific purposes, and that it offers some type of added value beyond that of the original. A digital manuscript, like a traditional critical edition, is not merely a cheap copy, but ideally can show aspects of the primary resource, i.e. the original manuscript, that were not visible in such a way when viewing the original.

It is obviously important to note that the critical edition of digital manuscripts is a different task from the critical edition of texts transmitted in manuscripts. It is, however, not any less exacting.

No edition theory has yet been written concerning digital manuscripts. I can only briefly enumerate some relevant themes and desiderata.

A digital manuscript edition should, like a critical text edition, follow documented scholarly research criteria. It should not produce a plain, unexamined reproduction of the material object (in this case a physical manuscript) but should, as I have already emphasized, create some added value and bring out aspects of the manuscript that have not previously been observed or recognized. And a digital manuscript should, of course, provide a reliable foundation for current research on the original manuscript.

The most authentic possible scientific reproduction is the first step. Completeness, high image quality, and true color must be provided. Measurability and verifiability are fundamental to every kind of scholarly use. Digital manuscripts consist of digital reproductions, so it is essential to provide metadata not only about the manuscript but also about the digital object. The colors must be measurable, not merely by means of a simple color sample strip but by employing a comprehensive Color Management System. This is already standard practice these days, but as soon as the files are uploaded to the Internet, all the care that went into it is often ignored. Image metadata, such as IPTC metadata, should be available together with the digital image and linked to it closely enough that when images are transferred (for example, into another image viewing platform) the image metadata are automatically carried along. Dimensions should be measurable in every part of a manuscript. Simply including a ruler in an image is not sufficient here, any more than elsewhere; a flexible digital measuring tool would always be preferable. In practice, we are for the most part still a long way from such precise, reliable, and measurable digital images, yet they are fundamental for serious scientific work. How is one to conduct serious research with images if the images on the screen are often slightly distorted, the colors are inaccurate, and no reliable measuring tool is available? Not to mention poor resolutions of less than 300 dpi! Products like this are simply a waste of money.

Digital manuscripts do not consist merely of digital reproductions, though. A digital manuscript is a virtual product that reproduces a tangible object in its entirety. This includes the proper sequencing of images: a data model must ensure that the image sequence remains intact when displayed in other viewing platforms. The same obviously holds for metadata about the physical manuscript and the digital manuscript, which help us understand the manuscript both as a manuscript and as a digital object. I mean metadata in the broad sense: basic metadata, structural metadata, scholarly descriptions, image descriptions, codicological metadata, digital object metadata, reports on additional restoration, and ideally even the full range of existing research literature. Finally, and very importantly, this includes the transcriptions and editions of the texts contained in the manuscript. If a critical edition of a digital manuscript is to comprehend the physical manuscript in its entirety, then text editions form part of it. In the future, text editions should be understood not as separate from digital objects but as integral parts of them. I do not regard these components as competing with one another, nor the text edition as an absolute requirement; rather, they are complementary pieces of an ideal whole. Metadata can be added as desired: the richer the data, the greater the usefulness and the scholarly research value.

A digital manuscript can and should be used to show more than is visible or explicitly contained in the original. Illustrations can be enlarged. Structural elements of the codex and the text can be accentuated. Individual illustrations or parts of the text can be annotated, and transcriptions and editions can be set next to the page images. Codicological features such as quires, watermarks, and color analysis can not only be provided, but can even be analyzed and interpreted within a digital manuscript. The research area of Image and Text Recognition is hard at work on tools to recognize and analyze layout, script types, scribal practices and eventually even texts.

It is important to emphasize that a digital manuscript should display a manuscript in its entirety. But we should even go a step further. A critical edition of a digital manuscript should not treat only a single manuscript, but should include as much data as possible about other related manuscripts and sources, in order to promote viewing the special qualities and features of the particular manuscript in a broader context. In this area as well the established methods of scientific editing aid me in developing criteria that can be applied to digital manuscript editions.

One fundamental task in preparing critical editions of medieval texts transmitted in manuscripts is to collate the individual transcriptions of a text and thereby obtain new information. The critical apparatus presents the variants of the text as transmitted by the manuscripts used for the edition. It also indicates explicit and implicit references to other works and so unfolds the intertextuality of the text. This means that a critical text edition goes beyond transmitting the text found in a single manuscript.

A critical digital edition of a manuscript can, for example, extend the quire analysis, descriptions of illustrations, script analysis, structural analysis, or watermark analysis of a single manuscript by incorporating metadata from other digital objects, or the objects themselves, in order to gain new information. Let me offer just one example: quire composition and layout analysis can be performed across manuscripts from the same scriptorium or from other scriptoria in order to recognize features peculiar to a particular manuscript, a scriptorium, or an entire epoch. A digital manuscript is thus more than just a digital version produced from a single physical object. It effectively has the potential to incorporate the entirety of the manuscript transmission contained in all medieval manuscripts.

The publication of medieval manuscripts on the Internet has made amazing progress during the past ten years. Digital manuscript libraries have moved beyond the status of pilot projects; they have become more professional and are by now an essential part of the research infrastructure. This is surely due to the fact that not just a few individual manuscripts, but over 15,000 medieval manuscripts, have been put online to date.

However, the success and importance of digital manuscript libraries depend less on the number of digitized manuscripts than on the scientific quality of those digital manuscripts, and only a critical theory of the digital manuscript will allow them to bring about fundamental change in manuscript research.

Thank you for your kind attention.

Kalamazoo, May 15, 2015

Christoph Flüeler

Medieval Apps


How about this for a truism: a book is a book, and something that is not a book is not a book. This post will knock your socks off if you are inclined to affirm this statement, because in medieval times a book could be so much more than that. As it turns out, tools were sometimes attached to manuscripts, such as a disk, dial or knob, or even a complete scientific instrument. Such ‘add-ons’ were usually mounted onto the page, extending the book’s primary function as an object that one reads, turning it into a piece of hardware.

Adding such tools was an invasive procedure that involved hacking into the wooden binding or cutting holes in pages. In spite of this, they were quite popular in the later Middle Ages, especially during the 15th century. This shows that they served a real purpose, adding value to the book’s contents: some clarified the text’s meaning, while others functioned as a…


Manuscript Road Trip: The Schoenberg Institute for Manuscript Studies

Manuscript Road Trip

The Flight into Egypt, Walters Art Museum, MS W.188, f.112r

As we head north out of Baltimore on I-95, we’ll cross the Delaware River and head into Wilmington, where there are manuscripts to be found at the University of Delaware.

The pre-1600 manuscripts at the University are part of a collection with the shelfmark “MSS 095.” There’s a list of the relevant records here and some highlights are described here. Of particular interest to me is a relatively recent acquisition, U. Delaware MSS 095 no. 31, a Book of Hours for the use of Noyon. There aren’t any images on the Special Collections website, but there are a few on this blogpost written by a Special Collections staff member, as well as a little information about the manuscript’s history. But I’d like to know more…how did it get to Delaware, and what can be gleaned about its history before…



Libraries Supporting Digital Scholarship: The Schoenberg Institute for Manuscript Studies as an Object Lesson

A version of this talk was presented as the keynote for the annual meeting of the Association of College and Research Libraries – Delaware Valley Chapter, in Philadelphia PA on November 6, 2014.

Thank you very much, and thank you especially to Terry Snyder for inviting me to speak with you all this morning. Today is a good day to talk about the Schoenberg Institute for Manuscript Studies (SIMS); after this talk I will be heading down the hall to attend the annual SIMS Advisory Board meeting, and tomorrow and Saturday I’ll be attending the 7th annual Schoenberg Symposium on Manuscripts in the Digital Age. So this is an auspicious week for all things SIMS.

The topic of this talk is the Schoenberg Institute for Manuscript Studies and how it may be considered an object lesson for libraries interested in supporting digital scholarship. Penn Libraries has invested a lot in SIMS, and while much of SIMS will be very specific to Penn, I hope our basic practices might provide food for thought for other institutions interested in supporting research and scholarship in the library.

SIMS is a research institute embedded in the Kislak Center for Special Collections, Rare Books and Manuscripts in the University of Pennsylvania Libraries. It exists through the generosity and vision of Larry Schoenberg and his wife, Barbara Brizdle, who donated their manuscript collection (numbering about 300 objects) to Penn Libraries, with the agreement that the Libraries would set up an institute to push the boundaries of manuscript studies, including but not limited to digital scholarship. (Although my job focuses on the digital, and indeed that term features in my official title, I also have responsibilities for our physical manuscript collections.) Penn did this, and SIMS was launched on March 1, 2013. As a research institute we develop our own projects and push our own agenda, and although many of our projects are highly collaborative we do not “serve” scholars; we are scholars.

Guided by the vision of its founder, Lawrence J. Schoenberg, the mission of SIMS at Penn is to bring manuscript culture, modern technology and people together to bring access to and understanding of our intellectual heritage locally and around the world.
We advance the mission of SIMS by:

  • developing our own projects,
  • supporting the scholarly work of others both at Penn and elsewhere, and
  • collaborating with and contributing to other manuscript-related initiatives around the world.

SIMS has 13 staff members, but only two of them are dedicated to SIMS work full-time: Lynn Ransom, Curator, SIMS Programs, and Jeff Chiu, Programmer Analyst for the Schoenberg Database of Manuscripts. Everyone else on staff is either part time (the SIMS Graduate Fellows) or has responsibilities in other areas of the Libraries and beyond. Mitch Fraas, for example, is co-director of the Penn Digital Humanities Forum, a hub for digital humanities at Penn hosted through the School of Arts and Sciences.

Over the last couple of weeks, as I have been considering what I might say to you all this morning, I have also been spending a lot of time working on the Medieval Electronic Scholarly Alliance (MESA), a federation of digital medieval collections and projects that I co-direct with Tim Stinson, a professor of English at North Carolina State University. MESA is essentially a cross-search for many and varied digital collections. It enables one, for example, to search for a term (we have a fuzzy search that includes variant spellings) and then facet the results by format (for example, illustrations or physical objects), discipline, or genre. One can also filter by “resource”, searching only those items that belong to particular collections.

Searching MESA for Jerusalem with fuzzy search enabled, limited to format of “Illustration”.
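MESA’s own search code is not described in this post, so what follows is only a toy sketch of the two ideas above: fuzzy matching that tolerates variant spellings, and faceting results by fields such as format. It assumes Python’s standard-library `difflib.SequenceMatcher` as a stand-in similarity measure with an arbitrary 0.8 threshold; the `Item` fields and sample titles are hypothetical.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Item:
    # Hypothetical record shape for illustration; not MESA's data model.
    title: str
    format: str
    discipline: str
    resource: str


def fuzzy_match(term, text, threshold=0.8):
    """True if any word in `text` is at least `threshold` similar to `term`."""
    term = term.lower()
    return any(
        SequenceMatcher(None, term, word).ratio() >= threshold
        for word in re.findall(r"\w+", text.lower())
    )


def search(items, term, fuzzy=True, **facets):
    """Match `term` against titles, then keep items whose fields equal every facet."""
    hits = []
    for item in items:
        if fuzzy:
            matched = fuzzy_match(term, item.title)
        else:
            matched = term.lower() in item.title.lower()
        if matched and all(getattr(item, name) == value for name, value in facets.items()):
            hits.append(item)
    return hits


# Hypothetical sample records.
items = [
    Item("View of Ierusalem", "Illustration", "Art History", "Collection A"),
    Item("Hierusalem itinerary", "Physical object", "History", "Collection B"),
    Item("Glossed Psalter", "Illustration", "Theology", "Collection A"),
]
```

With fuzzy matching, a search for “Jerusalem” finds both variant spellings, and adding `format="Illustration"` narrows the result to the first item; an exact substring search finds neither.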

The work that I’ve been doing for MESA over the past two weeks involves taking data provided to us and converting it from whatever format we get into the Collex RDF XML format required by MESA. In some cases, this is relatively easy. The Walters Art Museum, for example, through its Digital Walters site, provides high-resolution images of their digitized manuscripts using well-described and consistent naming conventions, and also provides TEI-XML manuscript descriptions that are consistent as well as incredibly robust. These files are all released under a Creative Commons Attribution-ShareAlike 3.0 Unported license, and they are easy to grab or point to once you know the organization of the site and the naming conventions.

Walters Art Museum manuscripts on The Digital Walters site.

Not all project data is so simple to access.

Take the British Library Catalogue of Illuminated Manuscripts: although the data is open access (the metadata under a Creative Commons license, the images in the public domain), it is “black boxed”, trapped behind an interface. The only way to access the data is to use the search and browsing capabilities provided by the online catalog. To get the data for MESA, our contact at the BL sent me the Access database that acts as the backend for the website, and I was able to convert that to the formats I needed to generate our RDF.

Images from Harley 603 from the British Library Catalogue of Illuminated Manuscripts.

So what does all this have to do with SIMS? Well, as I was doing this conversion work, I had a bit of an epiphany. I realized that pretty much everything we do at SIMS can be described in terms of data reuse.


And as I thought about how I might describe our various projects in terms of data reuse, I also realized that reuse of data is not new. In fact, it is ancient, and thinking in these terms puts SIMS at the tail end of a long and storied history of scholarship.


I’m not starting at the beginning, but I do want to give you a sense of what I mean when I say that data has been reused for the past couple thousand years (at least). One of my favorite early examples would have to be ancient Greek epics, such as the Iliad.

Iliad. Book 10. 421-434, 445-460, P. Mich. Inv. 6972, Special Collections Library (2nd c. BCE)

Here is a papyrus fragment, housed in the University of Michigan Libraries and dating from the second century BCE, containing lines from Book 10 of the Iliad. Thousands of similar fragments survive, containing variant lines from the poem.

Marciana Library 822, Venetus A, fol. 24r (10th c.)

And this is a page from the manuscript commonly known as Venetus A, Marciana Library 822, the earliest surviving complete copy of the Iliad, dating from the 10th century (a full 12 centuries younger than the papyrus fragment). In addition to the complete text, you can see that there are many different layers of glosses here: marginal, interlinear, intermarginal. These glosses contain variant readings of the textual lines, variants which are in many cases reflected in surviving fragments.

Penn Ms. Codex 1058, Glossed Psalter, fol. 12r (ca. 1100)

My next example is from a Glossed Psalter from our collection, Ms. Codex 1058, dating from around 1100. This manuscript is also glossed, but rather than variant readings, these glosses are comments from Church Fathers, pulled out of the context of sermons or letters or other texts, and placed in the margin as commentary on the psalm text.

Penn Ms. Codex 1640, Thomas of Ireland Manipulus Florum, fol. 114r

This example is a bit later, an early 14th century Manipulus Florum, Ms. Codex 1640. As in the glossed psalter, quotations from the Church Fathers and other philosophers are pulled out of context, but in this case they are grouped together under a heading. In this example the heading is “magister”, or teacher, and presumably the quotations that follow describe or define “magister” in ways that are particularly relevant to the needs of the author.

Penn LJS 267, De ludo scacchorum seu de moribus hominum et officiis nobilium … fol. 136v

Text is not the only type of data that can be reused, historically or now. We can also reuse material. Can you all see the sign of material reuse here? Check the top and bottom of the page. This is a palimpsest. What’s happened here is that a text was written on some parchment, and then someone decided that the text was no longer important. But parchment was expensive, so instead of throwing it away (or just putting it on a shelf and forgetting about it) the text was washed or scraped off the page, and new text was written over top. We can still see the remnants of the older text.

Penn LJS 395, Manuscript pastedowns from De proprietatibus rerum, back pastedown side 2

This is a page from LJS 395, a 13th century manuscript fragment that’s been repurposed to form part of the binding for a 16th century printed book. This is really typical reuse, and many fragments that survive do so because they were used in bindings.

How about this one?

Penn Ms. Codex 1056, Book of Hours Use of Rouen, ff. 24v-25r

This is a trick question. This is an opening from a 15th century book of hours from our collection, to compare with this.

Penn Ms. Coll 713, Breviary Collages, No. 1

This 17th century Breviary Collage was created by literally cutting apart a 15th century Flemish Breviary and pasting the scraps onto a square of cardboard. It is a bit horrifying, but it’s my favorite example of both reuse of material and, if not reuse of text, then reuse of illustration. Certainly the content is being reused as much as the material. Although I would never do this to a manuscript (and I hope none of you would do this either), I feel like I have a kindred spirit in the person who did this centuries ago, someone who saw this Breviary as a source of data to be repurposed to create something new.

I do this, only I do it with computers. Here is my collage.

Collation Visualization for LJS 266

Okay, it’s not a collage, it’s a visualization of the physical collation of Penn LJS 266 (La generacion de Adam) from the Schoenberg Collection of Manuscripts, just one created as part of our project to build a system for visualizing the physical aspects of books in ways that are particularly useful for manuscript scholars. Collation visualization creates a page for each quire, and a row on that page for each bifolium in the quire. On the left side of each row is a diagram of the quire, with the “active” bifolium highlighted. To the right of the diagram is an image of the bifolium laid out as it would be if you disbound the book, first with the “inside” of the bifolium facing up, then the “outside” (as though the bifolium is flipped over).
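The row-per-bifolium layout rests on a simple rule for a regular quire (an even number of leaves, none missing or added): leaf i is conjoined with leaf n + 1 - i. Laid flat, the inside of a bifolium shows the verso of the earlier leaf beside the recto of the later one, and the flipped outside shows the verso of the later leaf beside the recto of the earlier one. The Python sketch below illustrates that rule only; it is not the Collation project’s XSLT, and the function name, the outermost-first row order, and the folio labels are assumptions.

```python
def bifolia(quire_size, first_folio=1):
    """Return ((inside_left, inside_right), (outside_left, outside_right))
    for each bifolium of a regular quire, outermost bifolium first."""
    if quire_size % 2:
        raise ValueError("a regular quire has an even number of leaves")
    rows = []
    for i in range(quire_size // 2):
        left = first_folio + i                        # earlier leaf of the pair
        right = first_folio + quire_size - 1 - i      # its conjoint leaf
        inside = (f"{left}v", f"{right}r")            # face up when disbound
        outside = (f"{right}v", f"{left}r")           # after flipping the sheet
        rows.append((inside, outside))
    return rows


for inside, outside in bifolia(8):
    print(inside, outside)
```

For a quire of eight, the first row pairs 1v with 8r on the inside and 8v with 1r on the outside, and the last row is the central opening, 4v and 5r.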

To generate a visualization in the current version of collation visualization, 0.1 (the source XSLT files for which are available via my account on GitHub), I need two things: manuscript images, and a collation formula (the collation formula describes the number of quires in a codex, how many folios in each quire, if any folios are missing, that kind of thing). To create this particular visualization, first I needed to get the images.
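Collation formulas are conventionally written with superscripts (for example 1⁸). As an illustration only, the parser below assumes an ASCII stand-in syntax that may differ from what Collation 0.1 actually reads: space-separated terms such as `1-2^8` (quires 1 to 2, eight leaves each) and `3^6(-6)` (quire 3 of six leaves, leaf 6 missing). The function names are hypothetical.

```python
import re

# Assumed ASCII syntax: "<first>[-<last>]^<leaves>[(-<k>[,<k>...])]"
TERM = re.compile(r"(\d+)(?:-(\d+))?\^(\d+)(?:\(-(\d+(?:,\d+)*)\))?")


def parse_collation(formula):
    """Expand a simplified collation formula into one dict per quire."""
    quires = []
    for term in formula.split():
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"unrecognised term: {term!r}")
        first, last, leaves, missing = match.groups()
        missing_leaves = [int(k) for k in missing.split(",")] if missing else []
        for number in range(int(first), int(last or first) + 1):
            quires.append({
                "quire": number,
                "leaves": int(leaves),
                "missing": list(missing_leaves),  # copy so quires don't share a list
            })
    return quires


def surviving_folios(quires):
    """Count leaves that survive once missing leaves are subtracted."""
    return sum(q["leaves"] - len(q["missing"]) for q in quires)


print(surviving_folios(parse_collation("1-2^8 3^6(-6)")))  # prints 21
```

Expanding ranges into one record per quire keeps the later steps simple: each record can be handed directly to a routine that lays out that quire’s bifolia.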

LJS 266 in Penn in Hand

Our digitized manuscripts are all available through Penn in Hand, which is very handy for looking at manuscript images and reading descriptive information, but much like the British Library database we looked at earlier, it’s a black box.

Downloading an image file from Penn in Hand

It is possible to use “ctrl-click” to save images from the browser, but the file names aren’t accessible (my system reverts to “resolver.jpg” for all images saved from PiH, and it’s up to me to rename them appropriately).

Collation formula for LJS 266 in Penn in Hand (the third entry under Notes:)

The collation formula is in the description, and it’s easy enough for me to cut and paste that into the XSLT that forms the backbone of Collation 0.1.

It is actually possible to get XML from Penn in Hand by replacing “html” in the URL with “XML”.

XML in Penn in Hand

The resulting XML is messy, but reusable: a combination of Dublin Core, MARC XML, and various other non-standard tagsets.

Screenshot of OPenn (under construction)

Because we know how important it is to have clean, accessible data (indeed, my own work and other SIMS projects depend on it), we have been working for the past year on OPenn, which will publish high-resolution digital images (including master TIFF files) and TEI-encoded manuscript descriptions (generated from the Penn in Hand XML) in a Digital Walters-style website, with the TEI under Creative Commons licenses and the images in the public domain. OPenn is still in development, but will be launched at the end of 2014.

Having consistent data for our manuscripts in OPenn will enable me to do with our data what I already did with the Digital Walters data: programmatically generate collation visualizations for every manuscript in our collection. Because the Digital Walters data was accessible in a way that made it easy for me to reuse it, and was described and named in such a way that it was easy to figure out which images match up with which folio numbers, I was able to generate collation visualizations for every manuscript represented in the Digital Walters that includes a collation formula, and I was able to do it in a single afternoon. The complete set of visualizations is available here.

Mock-up of collation form

Version 0.2 of Collation will be based on a form (this is the current mock-up of how the form will look). Instead of supplying a collation formula, one would essentially build the manuscript quire by quire, identifying missing, added, and replaced folios, and the output would be both a visualization and a formula.
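To illustrate the direction the planned form would take, from a quire-by-quire description to a formula, here is a minimal Python sketch. The `Quire` class, the function names, and the ASCII notation (`^` for the leaf count, `-k` and `+k` for missing and added leaves) are assumptions for illustration; replaced leaves, which the form would also record, are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Quire:
    leaves: int                                       # leaves in the regular structure
    missing: list[int] = field(default_factory=list)  # positions of lost leaves
    added: list[int] = field(default_factory=list)    # positions of inserted leaves

    def is_regular(self):
        return not self.missing and not self.added


def _term(start, end, quire):
    """Format one formula term, e.g. '1-2^8' or '3^6(-6)'."""
    number = str(start) if start == end else f"{start}-{end}"
    changes = [f"-{k}" for k in quire.missing] + [f"+{k}" for k in quire.added]
    suffix = f"({','.join(changes)})" if changes else ""
    return f"{number}^{quire.leaves}{suffix}"


def to_formula(quires):
    """Build a formula, merging runs of identical regular quires into ranges."""
    terms, i = [], 0
    while i < len(quires):
        j = i
        while (quires[i].is_regular() and j + 1 < len(quires)
               and quires[j + 1].is_regular()
               and quires[j + 1].leaves == quires[i].leaves):
            j += 1
        terms.append(_term(i + 1, j + 1, quires[i]))
        i = j + 1
    return " ".join(terms)


print(to_formula([Quire(8), Quire(8), Quire(6, missing=[6])]))  # prints 1-2^8 3^6(-6)
```

Irregular quires are always written out individually, since merging them into a range would hide which quire lost or gained a leaf.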

Why do this? It is a new way of looking at manuscripts in a computer, completely different from the usual page-turning view, and one that focuses on the physicality of the book as opposed to its state as a text-bearing object. A new view will hopefully lead to new research questions, and new scholarship.

Moving on from Collation, the standard-bearing project for SIMS (and one that predates SIMS itself by many years) is the Schoenberg Database of Manuscripts (SDBM). This is a project that reuses data on a massive scale, and does it to great effect.

Entry #1 in A Catalogue of the Medieval Manuscripts in the University Library, Aberdeen, By M. R. James (1932)

This photo shows the first entry in the catalog of manuscripts at the University of Aberdeen Library, written by M. R. James. This entry, other entries from this catalog, and entries from many other library and sales catalogues have been entered into the SDBM.

Entry from Schoenberg Database of Manuscripts (current version)

Here is that same entry in the current version of the database. However! This year Lynn Ransom received a major grant from the NEH to convert the database to new technologies, and I’d rather show you that version.

Entry in the Schoenberg Database of Manuscripts (new version)

So, here is that same entry again in the new version of the Schoenberg Database, which is currently under development. “What is the big deal?” I hear you ask. As well you may. Let me show you a different entry from that same catalogue.

Entry for a record with eight matching records

You can see in this example, on the “Manuscript” line: “This is 1 of 8 records referring to SDBM_MS_5688.” The SDBM is in effect a database of provenance: it records not where manuscripts are now, but where they have been noted over time, through appearances in sales and collections catalogues. This manuscript has eight records, from catalogues dated 1829 to 1932, which enables us to trace its movement during the period the database covers.

Eight records for a single manuscript from SDBM.
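Tracing a manuscript through the database amounts to grouping catalogue entries by the manuscript they refer to and sorting each group by date. The sketch below assumes each record is a dictionary with hypothetical `manuscript_id`, `year`, and `source` keys. The real SDBM export has its own field names, which are not given here, and the sample records are invented.

```python
from collections import defaultdict


def provenance_trails(records):
    """Group catalogue entries by manuscript ID, each group sorted by year."""
    trails = defaultdict(list)
    for rec in records:
        trails[rec["manuscript_id"]].append(rec)
    return {ms: sorted(recs, key=lambda r: r["year"]) for ms, recs in trails.items()}


def describe_trail(trail):
    """Render one manuscript's appearances as 'year source -> year source ...'."""
    return " -> ".join(f"{r['year']} {r['source']}" for r in trail)


# Invented sample records, for illustration only.
records = [
    {"manuscript_id": "MS_A", "year": 1900, "source": "Collection catalogue Y"},
    {"manuscript_id": "MS_A", "year": 1850, "source": "Sale catalogue X"},
    {"manuscript_id": "MS_B", "year": 1875, "source": "Sale catalogue Z"},
]
trails = provenance_trails(records)
print(describe_trail(trails["MS_A"]))  # 1850 Sale catalogue X -> 1900 Collection catalogue Y
```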

Why create the Schoenberg Database? Lawrence Schoenberg began it as a private database to track the prices of manuscripts, but we now develop it to support research in manuscript studies and on trends in manuscript collecting. The study of private sales in particular could be useful in other fields, such as economic history (since manuscripts are scarce and expensive, people are more likely to buy them, and to pay more for them, when they have money to spare).

A new project, one that we have been working on just this year, is Kalendarium. Instead of a database consisting of manuscript descriptions from catalogs, Kalendarium will be a database consisting of data from medieval calendars themselves.

Calendar from Ms. Codex 1056, Book of Hours, Use of Rouen, ff. 1v-2r

This is a couple of pages of a calendar from Penn Ms. Codex 1056, a 15th-century Book of Hours. Calendars, common in Books of Hours, Breviaries and Psalters, essentially list saints and other celebrations for specific days of the month. Importance may be indicated by color: as you can see here, some saints’ names are written in gold ink, while most alternate between red and blue (red and blue being weighted equally, and gold used for more important celebrations).

A major expectation of Kalendarium is that the data will be generated through crowdsourcing: we’ll build a system where librarians can come and input the data for a manuscript in their collection, or where scholars and students can input data for a calendar they find online or while they are looking at a manuscript in a library. The thing is, transcribing these saints’ names can be difficult, even for someone trained in medieval handwriting. So, instead of transcriptions, we’ll be enabling people to match saints’ names and celebrations to an existing list. And where do we get that list?

Ask and ye shall receive. In the 1890s, Hermann Grotefend published a work, Zeitrechnung des deutschen Mittelalters und der Neuzeit… (Hannover: Hahn, 1891-98), that included a list of saints and the dates on which those saints are venerated. And it’s on HathiTrust, so it’s digitized, so we can use it!

Well, it’s in Portable Document Format, more commonly known as PDF. Like Penn in Hand and the British Library Catalogue of Illuminated Manuscripts, PDF is another kind of black box. Although it’s fine for reading, it’s not good for reuse (there are ways to extract text from PDF, although the results are usually not very pretty). Luckily, we were able to find another digital version.


This one’s in HTML. Not ideal, not by a long shot, but at least HTML provides some structure, and there is structure within the lines themselves (you can see pipes separating dates, for example). Doug Emery, Special Collections Digital Content Programmer and the SIMS staff member responsible for Kalendarium, has been working with a collaborator in Brussels to generate a usable list from this HTML that we can incorporate into Kalendarium as the basis for our identification list.
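Turning that HTML into a usable list mostly means discarding the markup and splitting each entry on its pipes. The sketch below uses only the standard library’s `html.parser`. It assumes, hypothetically, that each entry sits on its own line or in its own paragraph and takes the form `Name | date | date`. The actual layout of the Grotefend HTML may well differ, and the sample input is invented.

```python
from html.parser import HTMLParser

# Tags treated as line breaks when flattening the page to text (an assumption).
BLOCK_TAGS = {"br", "p", "div", "li", "tr"}


class TextCollector(HTMLParser):
    """Collect the text of an HTML document, turning block-level tags into newlines."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def parse_entries(html_text):
    """Return [{'name': ..., 'dates': [...]}] for every 'Name | date | ...' line."""
    parser = TextCollector()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    entries = []
    for line in "".join(parser.chunks).splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # not a pipe-delimited entry
        entries.append({"name": fields[0], "dates": [f for f in fields[1:] if f]})
    return entries


sample = "<p>Agnes | Jan 21 | Jan 28</p><p>Blasius | Feb 3</p>"
print(parse_entries(sample))
```

Real data would need more care than this (editorial notes, variant spellings, cross-references), which is presumably why generating the list has taken collaborative work.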

Kalendarium prototype site

We have a prototype site up; it’s not public, and for now it’s only accessible on campus. We’ve been experimenting, and you can see a handful of manuscripts listed here.

Kalendarium form

Similar to Collation 0.2, in Kalendarium you’re using the system to essentially build a version of your calendar. You can identify colors, and select saints from a drop-down list. Unfortunately we have already found that many saints that are showing up in our calendars aren’t in Grotefend, or they are celebrated on dates not included in Grotefend; but this is an opportunity for us to contribute to the list in a major way.
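Selecting from a list, rather than transcribing freely, also makes it easy to suggest candidates when a name is read or typed with a variant spelling. Here is a minimal sketch that uses the standard library’s `difflib.get_close_matches` as a stand-in for whatever matching Kalendarium actually does. It assumes the identification list is a plain list of names, and the sample names are placeholders, not an extract from Grotefend.

```python
import difflib


def suggest_saints(typed, saint_list, n=3, cutoff=0.6):
    """Return up to n names from a controlled list that closely resemble the typed text."""
    # Compare case-insensitively, but hand back the names exactly as spelled in the list.
    by_key = {name.casefold(): name for name in saint_list}
    matches = difflib.get_close_matches(typed.casefold(), list(by_key), n=n, cutoff=cutoff)
    return [by_key[m] for m in matches]


# Placeholder list for illustration.
saints = ["Hilarius", "Remigius", "Vincentius", "Agnes", "Agatha", "Blasius"]
print(suggest_saints("Vincencius", saints))  # ['Vincentius']
```

The `cutoff` parameter (a similarity ratio between 0 and 1) controls how forgiving the suggestions are. A name that matches nothing above the cutoff returns an empty list, which is also a useful signal that the saint may be missing from the list.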

Why do this at all? Calendars are typically used to localize individual manuscripts – if we see that particular saints are included in a calendar, we can posit that the book containing that calendar was intended to be used in the areas where those saints were venerated. However, if we scale up, we’ll be able to see larger patterns: veneration of saints over time, saints being venerated on different days in different places, and we should be able to see new groupings of books as well.
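The localizing logic described above can be expressed as a simple overlap score: for each candidate region, what fraction of its characteristic saints appear in a given calendar? This is a deliberately naive sketch, and the region names and saint lists are placeholders. A real analysis would also need to weigh feast dates, grading (gold versus red or blue), and saints venerated in many places at once.

```python
def rank_regions(calendar_saints, regional_saints):
    """Rank candidate regions by the share of their characteristic saints found in a calendar.

    Returns (region, score, shared_saints) tuples, highest score first.
    """
    calendar = set(calendar_saints)
    ranked = []
    for region, saints in regional_saints.items():
        saints = set(saints)
        shared = calendar & saints
        score = len(shared) / len(saints) if saints else 0.0
        ranked.append((region, score, sorted(shared)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Placeholder data only; real regional lists would have to come from a reference source.
regions = {
    "Region A": ["Saint P", "Saint Q"],
    "Region B": ["Saint R", "Saint S", "Saint T", "Saint U"],
}
for region, score, shared in rank_regions(["Saint P", "Saint Q", "Saint R"], regions):
    print(f"{region}: {score:.2f} {shared}")
```

Running the same scoring over many calendars at once, rather than one at a time, is where the larger patterns mentioned above could start to emerge.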

Another set of projects SIMS is involved in, the Penn Parchment Project in 2013 and the Biology of the Book Project starting in 2014, involves testing the parchment in our manuscripts: literally reusing the manuscript by extracting data from the material itself. This involves taking small, non-destructive samples to gather cells from the surface of the parchment and testing them to see what type of animal the parchment was made from. The results are interesting: as part of the Penn Parchment Project, an individual who wishes to remain anonymous made expert identifications of ten manuscripts from the Penn collection, and got only five of them right. Clearly, parchment identification could benefit from a more scientific approach. More recently we have joined Biology of the Book, a far-reaching collaboration (including folks at the University of York in the UK, the University of Manchester, the Folger Shakespeare Library, the Walters Art Museum, the Library of Congress, the University of Virginia, the Getty, and others), to begin the slow process of moving forward a much larger project that aims to perform DNA analysis on larger numbers of manuscripts. Very little is actually known about the practices surrounding medieval parchment making, including the agricultural practices that supported the vast numbers of animals used to create the manuscripts that survive today (and, of course, all those that don’t survive). We think of parchment as an untapped biological archive, and a database containing millions of DNA samples would enable us to discover the number of animals used to make manuscripts, where those animals were bred (and how far they were imported and exported), and what breeds were used: many questions that are simply impossible to answer now.

Mitch Fraas, Curator, Digital Research Services and Early Modern Manuscripts, creates maps and other visualizations relating to early books and blogs about them. He has used data from the Schoenberg Database of Manuscripts (which is available for download in comma-separated values (CSV) format on the SDBM website, and is updated every Sunday) and data extracted from Franklin, the Penn Libraries’ catalogue, to generate several visualizations, one of which is shown here: Charting Former Owners of Penn’s Codex Manuscripts.

Diagram: Charting Former Owners of Penn’s Codex Manuscripts (click for interactive version)

The yellow dots are owners, and the larger the dot, the more manuscripts the owner is connected to (Lawrence Schoenberg and Sotheby’s are quite large, as is Bernard M. Rosenthal, a bookseller in New York). Clicking an owner shows the number of manuscripts connected to that person or institution, and clicking a manuscript shows the number of owners connected to that manuscript. This visualization was developed using data from Franklin, and the blog post linked above provides details on how it was done.
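The owner diagram rests on a two-sided network, with owners on one side, manuscripts on the other, and a link wherever a manuscript passed through an owner’s hands. Here is a hedged sketch of the counting behind a view like this, assuming the data has already been reduced to hypothetical (owner, manuscript) pairs. It is not how the visualization was actually built, and loading from the SDBM CSV would depend on that file’s actual column names. Scaling the radius by the square root of the count makes each dot’s area, rather than its width, proportional to the number of manuscripts.

```python
import math
from collections import defaultdict


def owner_network(links, base_radius=3.0):
    """Summarize (owner, manuscript) links for a two-sided network diagram.

    Duplicate links are counted once. Returns manuscripts per owner,
    owners per manuscript, and a dot radius per owner.
    """
    by_owner = defaultdict(set)
    by_manuscript = defaultdict(set)
    for owner, manuscript in links:
        by_owner[owner].add(manuscript)
        by_manuscript[manuscript].add(owner)
    ms_per_owner = {o: len(ms) for o, ms in by_owner.items()}
    owners_per_ms = {m: len(owner_set) for m, owner_set in by_manuscript.items()}
    # Square-root scaling: the dot's area, not its radius, tracks the manuscript count.
    radii = {o: base_radius * math.sqrt(n) for o, n in ms_per_owner.items()}
    return ms_per_owner, owners_per_ms, radii


# Invented links, for illustration only.
links = [("Owner 1", "MS a"), ("Owner 1", "MS b"), ("Owner 1", "MS c"),
         ("Owner 1", "MS d"), ("Owner 2", "MS a")]
counts, owners, radii = owner_network(links)
print(counts, owners["MS a"], radii["Owner 1"])
```

The two count dictionaries correspond to the two clicks described above: clicking an owner shows its manuscript count, and clicking a manuscript shows its owner count.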

Mapping pre-1600 European manuscripts in the U.S. and Canada

Just this week, for the 7th Annual Lawrence J. Schoenberg Symposium on Manuscript Studies in the Digital Age, Mitch has created a new map, Mapping pre-1600 European manuscripts in the U.S. and Canada, using data from the Directory of Institutions in the United States and Canada with Pre-1600 Holdings. This map shows the location of all holdings included in the directory, with larger collections shown as larger dots. Clicking a dot brings up more information about the owner and the collection, and there are options to show current or former collections, or only collections with codices (full books, as opposed to fragments or single sheets).

Ms. Roll 1066: Genealogical Chronicle of the Kings of England to Edward IV, circa 1461

We have almost reached the end, but I would like to close by featuring the project of last year’s SIMS Graduate Fellow, the brand new Dr. Marie Turner, which is still underway and is a fitting example of data reuse to end on. Several years ago, Marie transcribed our Ms. Roll 1066, a 15th-century genealogical roll chronicling the Kings of England from Adam to Edward IV. Her transcription was combined with images of the roll and built into a website (shown in the screenshot here), with links between her transcription and areas of the image. But Marie’s vision is larger than this single roll. There are several other rolls of this type in existence, and her vision is to expand this single project, this silo, not only to incorporate other rolls, but to become a space for collaborative editing (transcription, description, translation, and linking) of those rolls as well. We have successfully pulled the data from the existing site and converted it into XML following the Text Encoding Initiative (TEI) Guidelines, and we’ll use that XML to generate the data we need to import into our new software system.
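Linking a transcription to areas of an image is something the TEI Guidelines support directly. A `<facsimile>` element holds `<zone>` elements that give rectangle coordinates (`ulx`, `uly`, `lrx`, `lry`) on an image, and transcribed text points to its zone through a `facs` attribute. The sketch below uses Python’s `xml.etree.ElementTree` to show one possible shape for such a file. It is not the conversion actually used for Ms. Roll 1066. The function name `build_tei` is made up for this example, and the title, image file name, and coordinates in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id
ET.register_namespace("", TEI_NS)  # write TEI as the default namespace


def t(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, image_url, segments):
    """segments: list of (text, (ulx, uly, lrx, lry)) pairs, one per transcribed area."""
    tei = ET.Element(t("TEI"))
    # Minimal header: title, publication statement, source description.
    file_desc = ET.SubElement(ET.SubElement(tei, t("teiHeader")), t("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, t("titleStmt")), t("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, t("publicationStmt")), t("p")).text = "Unpublished draft."
    ET.SubElement(ET.SubElement(file_desc, t("sourceDesc")), t("p")).text = "Transcribed from images."

    # One surface for the image; one zone per transcribed area.
    surface = ET.SubElement(ET.SubElement(tei, t("facsimile")), t("surface"))
    ET.SubElement(surface, t("graphic"), url=image_url)
    body = ET.SubElement(ET.SubElement(tei, t("text")), t("body"))
    for n, (text, (ulx, uly, lrx, lry)) in enumerate(segments, start=1):
        zone_id = f"zone{n}"
        ET.SubElement(surface, t("zone"), {XML_ID: zone_id, "ulx": str(ulx), "uly": str(uly),
                                           "lrx": str(lrx), "lry": str(lry)})
        # Each transcribed block points back to its zone on the image.
        ET.SubElement(body, t("ab"), facs=f"#{zone_id}").text = text
    return ET.tostring(tei, encoding="unicode")


print(build_tei("Genealogical roll (test)", "roll-part1.jpg",
                [("Adam", (0, 0, 400, 120)), ("Seth", (0, 130, 400, 250))]))
```

Because the links between text and image live in standard TEI attributes rather than in a particular website’s code, the same file can be transformed into whatever a new tool needs to import.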

The new Rolls Project will be built in DM, formerly Digital Mappaemundi, an established tool for annotating and linking images, which has been developed by Martin Foys, a medievalist, and which has recently been brought to SIMS for hosting and continued development.

A screenshot of La Chronique Anonyme Universelle, edited by Lisa Fagin Davis, published in DM

This screenshot illustrates how DM looks in terms of linking annotations to areas of an image, and you can also link areas of images together. Just last week we got a production version of DM set up on our servers at Penn, and next week we’ll be importing our data – the data we exported from the earlier edition of Ms. Roll 1066 project – into that production version. We’ll also be importing images of a half dozen other genealogical rolls. We are immensely excited to move the Rolls project to the next phase – and it was all made possible by


I’d like to close with just a few thoughts about WHAT SIMS IS. Whether or not we are an effective object lesson for libraries supporting digital scholarship is probably up for debate. We certainly do scholarship, effectively, within the context of the library, and we do it ourselves: We are scholars, not service providers. However, I think it’s important to note that our scholarship, our research, our tools and our projects are not ends in themselves. They will all serve to support more work, to allow other scholars to ask new questions, and hopefully to help them answer those questions.
Since we are not service providers, faculty and graduate students aren’t our clients, they are our collaborators, our equals, our partners. We are in this together!
Finally, and I could have said more about this throughout my talk, we take pride in our data. We want the data from all of our projects, including all the data that we have reused and brought in from other places, to be consistent, with regard to formatting and documentation; accessible, in the technical sense of being easy to find; and reusable, with regard to both format (you are unlikely to find PDFs as the sole source for any information on our site) and license. Likewise our code: we make use of GitHub (a site for publishing open source code) individually and through the Library’s account, and all our code is and will always be open source.

Thanks so much again, and I’m happy to take questions now.

Visualizing the Construction of Manuscripts, through Collation and Video (DigiPal IV Symposium)

It’s been a month now since the fabulous DigiPal IV Symposium, and I’ve been meaning to share the video of my own contribution to that event since I returned to Penn in early September. My talk, “Visualizing the Construction of Manuscripts, through Collation and Video,” introduces two projects that we are actively undertaking here at SIMS. The first is a visualization system for the physical collation of medieval manuscripts (see some example results, and our slightly out-of-date source code on GitHub); the second is our ongoing project to create videos about manuscripts in our collection.


Volvelles: LJS 64, Illustrations to Peurbach, p. 4, Theorica motus orbis supremi super cetero mundi

Over the next several months, we’ll be creating Vines (short six-second videos) and animated gifs of all the moving volvelles in our copy of Illustrations to Georg von Peurbach’s Novae theoricae planetarum, LJS 64. This project has a few different aims. First, we’d like to show off one of the gems of our collection. This mid-16th-century manuscript was created entirely by hand to illustrate the theories of planetary motion described in Peurbach’s work. Volvelles are diagrams that illustrate motion by means of rotating circles. Although the volvelles in LJS 64 start out fairly simple (the volvelle shown in this post is a single piece of paper), they become more complex as the book progresses, with layered circles, some layers turning on different rotation points, and some with cut-outs that reveal the layers underneath. A facsimile of the manuscript is online at Penn in Hand, so you can page through and get a sense of what the volvelles look like, but those volvelles won’t move.

To get a sense of how the volvelles function, we’re creating two different virtual versions of each. One is an animated gif, created by layering and animating still images of the volvelle in Photoshop. The second is a short video, created using the Vine app, which shows a hand moving the pieces of the volvelle in real time. The more complex diagrams may require multiple Vines to show the complete movement. This leads us to the final aim of this project: to illustrate how a fully virtual, contrived interaction with a physical object (an animated gif) differs from a hands-on interaction with that same object. Although the animated gif and the video ostensibly show the same thing, they are substantially different. And although the video purports to show “here is how it looks in real life,” it still isn’t the same experience that you would have if you were sitting at the table moving the volvelle yourself.

Without further ado, here are our first virtual volvelles. This volvelle is captioned Theorica motus orbis supremi super cetero mundi (Theory/observations of the motion of the highest orb/body above the rest of the world.)

Animated gif, Theorica motus orbis supremi super cetero mundi, p. 4

Theorica motus orbis supremi super cetero mundi, p. 4


Manuscripts: The Archaeozoology of Animal Skin, April 10, NOON

Please mark your calendars for this upcoming lecture on Thursday, April 10, at NOON in the Class of ’78 Pavilion, Kislak Center, 6th Floor Van Pelt Library. The presenter is Matthew Collins, professor of bioarchaeology at the University of York and PI on a project using collagen samples to identify the species of animals used for parchment, a project that Penn has been collaborating on since last year. It should be an interesting talk; we hope you can make it. Please share this announcement widely!


Manuscripts: The Archaeozoology of Animal Skin

As Peter Tiersma has argued, writing made it possible to begin distinguishing myth from history. If we were able to capture and map the path of each and every written idea, it would look like a fractal tree, with branches expanding as concepts are developed, refined and dissected. Historians try to reconstruct the diversification of these ideas, and many see parallels with our planet’s other great writing system, the chemical language of DNA. The rules of DNA are simpler (although this simplicity is nuanced by new discoveries). DNA is the book of life, and most geneticists at some point try to recapitulate the history of a population or group by identifying errors in DNA transcription, and missing or newly incorporated text, found in different populations or organisms. The sheer quantity of dated animal skins held in archives across Europe is staggering. We estimate that in the UK there are more skins (as parchment) from the last 800 years held in libraries and archives than there are sheep living on the island today.

More than a decade ago, researchers revealed that the genetic code of the animal was not destroyed when its skin was made into parchment. However, the last year has been a tipping point for parchment research, thanks to the ability to use the waste from conventional conservation treatment for protein and DNA sequencing. We will give an overview of results from the EU-funded CodeX and Palimpsest projects and consider a change in the landscape of codicology, both in the balance of the relationship between science and the humanities and in the scale and scope of questions that can now be addressed.