Web Publications for Digital Archives
Nov 2020 - ongoing
User Requirements for Document Formats and Reading Environments for Digitized Collections
Libraries (and other organizations like the National Archives) have digitized millions of documents (books, newspapers, magazines, letters) to make them available online. But what do users want to do with these documents? Much more than with print, the way digital documents are created dictates what one can do with them. For instance, a document that contains only bitmap images can look identical to one that contains highly structured text, but what can be done with it is completely different…
If users want to do more than view documents on screen, then the way a document is formatted matters. Think about:
- Distant reading, in which users use computers to analyze many documents at the same time.
- Access with assistive technology for users who are blind, have low vision or are dyslexic.
- Reformatting of documents to make them easy to read on a small (mobile) screen and/or to have them read aloud by a computer voice.
Current digital collections are not uniform in how documents are formatted, nor are they transparent about what a user can expect. Some documents might be accessible, but others are not. If one wants to do computational analysis, chances are that many documents will need a lot of (manual) preprocessing before one can even get started. Now that new ways of digitization are becoming available (some using AI technology that was not available before), organizations have to reconsider how they make content available.
Before we start building solutions, we should investigate the real needs of users. To facilitate that, we will make some example documents and a ‘demonstrator’ (prototype reading environment).
Web Publications is a (conceptual) format, specified by W3C, based on open web technology like HTML and CSS. For the demonstrator in this project, we will convert several digitized books and other documents into a Web Publication format, and discuss the resulting functionality with real end users. Since web publications are designed to be accessible, linkable and annotatable, we expect they will offer a better foundation than existing formats like PDF.
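To make the idea of a web-based publication format more concrete, the sketch below builds a minimal manifest that lists the HTML pages of one digitized book in reading order. It is only an illustration: the property names are taken from the W3C Publication Manifest vocabulary as we understand it, while the function name `build_manifest`, the title and the file names are hypothetical and do not describe the conversion used in this project.

```python
import json


def build_manifest(title, language, page_urls):
    """Return a minimal publication manifest as a plain dict.

    Hypothetical helper for illustration only; the property names follow
    the W3C Publication Manifest vocabulary (an assumption to verify
    against the specification before use).
    """
    return {
        "@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
        "type": "Book",
        "name": title,
        "inLanguage": language,
        # readingOrder lists the HTML resources in their default sequence
        "readingOrder": [
            {"url": url, "encodingFormat": "text/html"} for url in page_urls
        ],
    }


# Hypothetical sample values: a two-page digitized book
manifest = build_manifest(
    "Example digitized book",
    "en",
    ["page-001.html", "page-002.html"],
)

# Serialize to JSON so it can be stored next to the HTML files
manifest_json = json.dumps(manifest, indent=2, ensure_ascii=False)
print(manifest_json)
```

Because the pages are ordinary HTML files listed in a machine-readable manifest, the same publication could in principle be read in a browser, processed by assistive technology, or harvested for text analysis without a separate export step.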
The goal of this project is to learn about the needs of the different users and which developments are needed to meet them efficiently.