IIIF Manifests & Web Annotations

Ben W. Brumfield edited this page May 9, 2019 · 6 revisions

Introduction

Readux has two sets of annotations:

  • Case 1: OCR Text as Annotations on a IIIF canvas/image
  • Case 2: Scholarly commentary annotations on OCR Text.

I'm working hard to figure out the right way to model both of these sets of annotations in the IIIF manifest using the Web Annotation standard. If we get it right, I think we can use the same manifest for all three reading/publishing use cases: Readux, IIIF Export, and Jekyll Export. We have to get it right for IIIF, because we don't control the viewers; the other cases are more forgiving because we control the code that drives the viewing experience. It would be nice, though, if our viewers -- Readux's Mirador extension and Jekyll export -- were generally useful to the IIIF community. I think they can be.

Case 1a -- OCR text annotations

OCR text annotations are common and straightforward in IIIF. The body of the annotation is the OCR text as plaintext or HTML; the target is a region of the image expressed as either a simple FragmentSelector or an oa:Choice of either an SvgSelector or a FragmentSelector. These targets are both rectangular regions that specify the bounding box around the OCR text. These annotations are displayable in IIIF viewers and exports; while they may not have advanced Readux-flavored functionality in out-of-the-box viewers, they will nevertheless present the OCR text on top of an image.

example json for this goes here
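
Until a real example is added, here is a minimal sketch in Python of what a Case 1a annotation might look like. It assumes the IIIF Presentation 2 / Open Annotation serialization that the oa: prefix above implies, and uses the simple FragmentSelector form. The helper name `ocr_annotation` and all URIs are hypothetical and do not come from a real Readux manifest.

```python
import json


def ocr_annotation(anno_id, canvas_id, text, x, y, w, h):
    """Build an Open Annotation-style dict for one piece of OCR text.

    The body is the plaintext OCR; the target is a rectangle on the canvas.
    """
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": anno_id,
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": text,
        },
        "on": {
            "@type": "oa:SpecificResource",
            "full": canvas_id,
            "selector": {
                "@type": "oa:FragmentSelector",
                # Bounding box of the OCR text, in canvas coordinates.
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


# Hypothetical identifiers and coordinates, for illustration only.
annotation = ocr_annotation(
    "https://example.org/annotations/ocr-1",
    "https://example.org/canvas/page-1",
    "Biocrats",
    120, 340, 410, 38,
)
print(json.dumps(annotation, indent=2))
```

The printed JSON is the part that would live in an annotation list; the Python wrapper is only there to keep the coordinates and text in one place.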

Case 1b -- selectable OCR text

Selectable OCR text is the same OCR text as in Case 1a, but displayed within Readux in a viewer that allows scholars to select and comment on it more naturally. It mirrors the Readux 1 functionality, and is not concerned with display outside of Readux. This format is vital to Readux as an editing (or in IIIF terminology, "annotation authoring") platform.

This could be combined with Case 1a or kept independent of Case 1a.

example json for this goes here

We should define how this works in each of our reading/publishing use cases:

  1. Readux: See the prototype at Readux 2 by toggling on annotations and choosing the ocrToolTip, then selecting text on the image.
  2. Mirador (or other IIIF viewers that support annotations): See "The Biocrats" example at https://projectmirador.org/demo/advanced_features.html
  3. Jekyll: To be defined.

Case 2 -- Scholarly commentary annotations on OCR Text

The challenge here is that we are creating annotations against the OCR text, but want to show them in Mirador (read/publish scenario #2), which requires them to be a separate layer. A question that came up was whether an annotation could have multiple targets. According to the Web Annotation Principles:

Annotations have 1 or more Targets.

Example 9 in the spec shows an annotation with multiple targets.

Rob Sanderson, an editor of the W3C Web Annotation Standard, suggested that we use purpose to differentiate the targets. A purpose for a target plays the same role as a motivation for an annotation: the motivation is the reason for creating the annotation, and the purpose is the reason for including that particular resource in it. In fact they take the same list of options, such as tagging or commenting or painting.

From the spec:

As well as Textual Bodies, External Web Resources may also be given a Motivation as to their inclusion within the Annotation. This is done using the Specific Resource pattern, as the purpose specifies the way in which the resource is used in the context of the Annotation in the same way as a Selector describes the segment or a State describes the representation.

For our use case we would have two different purposes:

  • scholarly commentary on the image of the text would use painting (although that's not exactly the common use)
  • scholarly commentary on the OCR text would use commenting
json example goes here
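
As a stand-in for the missing example, the following Python sketch builds a Web Annotation with two targets, each a SpecificResource carrying its own purpose, as proposed above. Several things here are assumptions: the URIs, the helper names, and the choice of a TextQuoteSelector for the OCR text target are all hypothetical. Note also that painting comes from the IIIF vocabulary rather than from the Web Annotation list of motivations, so its use as a purpose follows this page's proposal and is not something the spec defines.

```python
import json


def commentary_annotation(anno_id, comment, canvas_id, xywh, text_id, quoted):
    """Build a Web Annotation dict with an image target and a text target."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": [
            {
                # Target 1: the region of the page image (for IIIF viewers).
                "type": "SpecificResource",
                "purpose": "painting",  # assumption: borrowed from IIIF
                "source": canvas_id,
                "selector": {
                    "type": "FragmentSelector",
                    "conformsTo": "http://www.w3.org/TR/media-frags/",
                    "value": f"xywh={xywh}",
                },
            },
            {
                # Target 2: the OCR text being commented on.
                "type": "SpecificResource",
                "purpose": "commenting",
                "source": text_id,
                "selector": {"type": "TextQuoteSelector", "exact": quoted},
            },
        ],
    }


def targets_with_purpose(annotation, purpose):
    """Return the targets of an annotation that carry the given purpose."""
    return [t for t in annotation["target"] if t.get("purpose") == purpose]


# Hypothetical identifiers, for illustration only.
commentary = commentary_annotation(
    "https://example.org/annotations/comment-1",
    "This phrase recurs in the next chapter.",
    "https://example.org/canvas/page-1",
    "120,340,410,38",
    "https://example.org/ocr/page-1.html",
    "the biocrats",
)
print(json.dumps(commentary, indent=2))
```

A consumer that only understands images could call `targets_with_purpose(commentary, "painting")` and ignore the rest, while Readux could look up the commenting target to find the selected text.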

note 1: I have serious concerns about the UI of creating these annotations in Readux, but that's outside the scope of this discussion. We should figure this out soon because if a user can't or won't select the text layer (vs the image) to annotate, then much of this will be moot.

note 2: We could also treat the OCR text as a "resource" which is the body of the annotation in case 1, and the target in case 2. I'm not sure if that helps anything or not, other than not duplicating the OCR text entry in the backend database. Overlapping text would still have to be considered, though.

note 3: Overlapping OCR text as targets could be addressed using the RangeSelector: https://www.w3.org/TR/annotation-model/#range-selector
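
To make note 3 concrete, here is a small Python sketch of a RangeSelector built from two XPathSelectors. It assumes each OCR word is an HTML span with an id; the ids `word_12` and `word_17` and the helper name are hypothetical.

```python
import json


def range_selector(start_xpath, end_xpath):
    """Build a Web Annotation RangeSelector between two XPath locations.

    Per the annotation model, the start selector is the inclusive start of
    the range and the end selector is the exclusive end.
    """
    return {
        "type": "RangeSelector",
        "startSelector": {"type": "XPathSelector", "value": start_xpath},
        "endSelector": {"type": "XPathSelector", "value": end_xpath},
    }


# Hypothetical word ids: the range covers word_12 up to, not including, word_17.
selector = range_selector("//span[@id='word_12']", "//span[@id='word_17']")
print(json.dumps(selector, indent=2))
```

Because each range names its own start and end, two annotations whose text overlaps can coexist without either one needing to own the words they share.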

Reading/Publishing

How do Case 1 and Case 2 work together in the reading/publishing use cases?

  1. Readux: To be defined (unless it is already?)
  2. Mirador: 2 layers -- one for OCR text and one for scholarly commentary. The end user can turn each layer on or off as desired. We need to find a good two-layer example to show, and test it against Mirador 3.
  3. Jekyll: To be defined.
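
For the Mirador case, one possible shape is a canvas whose otherContent points at two annotation lists, each declared to be within its own sc:Layer. The Python sketch below builds that fragment under IIIF Presentation 2 conventions. All URIs and labels are hypothetical, the canvas omits its images for brevity, and whether Mirador 2 or Mirador 3 actually exposes these layers as separate toggles is untested here.

```python
import json


def canvas_with_layers(canvas_id, label, width, height, lists):
    """Build a partial sc:Canvas with one annotation list per layer.

    `lists` is a sequence of (list_uri, layer_uri, layer_label) tuples.
    The painted page image ("images") is left out of this sketch.
    """
    return {
        "@id": canvas_id,
        "@type": "sc:Canvas",
        "label": label,
        "width": width,
        "height": height,
        "otherContent": [
            {
                "@id": list_uri,
                "@type": "sc:AnnotationList",
                "within": {
                    "@id": layer_uri,
                    "@type": "sc:Layer",
                    "label": layer_label,
                },
            }
            for list_uri, layer_uri, layer_label in lists
        ],
    }


# Hypothetical identifiers, for illustration only.
canvas = canvas_with_layers(
    "https://example.org/canvas/page-1",
    "Page 1",
    2000,
    3000,
    [
        ("https://example.org/list/page-1-ocr",
         "https://example.org/layer/ocr", "OCR Text"),
        ("https://example.org/list/page-1-commentary",
         "https://example.org/layer/commentary", "Scholarly Commentary"),
    ],
)
print(json.dumps(canvas, indent=2))
```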

Discussion after Wednesday's meeting

The tentative conclusion from Wednesday's meeting was:

  • OCR Text is presented within Readux as an HTML overlay on OpenSeadragon. Not only does this already work and match Readux 1's functionality, it gives us a handle on an X(HT)ML document to work with.
  • Scholarly commentary can be modeled as an annotation with two targets:
    • An IIIF-flavored target pointing to the region of the image that supports the text being commented upon. This can use the oa:Choice selectors to include both
      • an item SvgSelector expressing the Tetris-shaped regions of text being annotated, and
      • a default FragmentSelector with the rectangle containing all the words being annotated (plus some) for use as a fall-back for viewers.
    • An XML-flavored target pointing to the region of text being annotated (in 1b). It's not clear to me (@benwbrum) whether the full HTML document is being loaded into OSD or whether individual word spans are being loaded, but the target of the annotation could reference either of
      • The range of the HTML document representing the text of the page
      • The range of the ALTO XML text used to create the HTML
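
The IIIF-flavored target described above can be derived from the word boxes of the selected text. The Python sketch below computes the fall-back rectangle and a Tetris-shaped SVG path, then wraps both in an oa:Choice. The function names, the optional padding, and the sample coordinates are assumptions for illustration, and the SVG is kept deliberately bare; a given viewer may expect additional attributes.

```python
def bounding_box(boxes, pad=0):
    """Return (x, y, w, h) of the rectangle enclosing all word boxes.

    `boxes` is a list of (x, y, w, h) tuples; `pad` widens every side.
    """
    left = min(x for x, y, w, h in boxes)
    top = min(y for x, y, w, h in boxes)
    right = max(x + w for x, y, w, h in boxes)
    bottom = max(y + h for x, y, w, h in boxes)
    return left - pad, top - pad, right - left + 2 * pad, bottom - top + 2 * pad


def svg_for_boxes(boxes):
    """Return an SVG with one closed rectangular subpath per word box."""
    path = " ".join(f"M{x},{y}h{w}v{h}h{-w}z" for x, y, w, h in boxes)
    return f'<svg xmlns="http://www.w3.org/2000/svg"><path d="{path}"/></svg>'


def choice_selector(boxes, pad=0):
    """Build an oa:Choice: FragmentSelector as default, SvgSelector as item."""
    x, y, w, h = bounding_box(boxes, pad)
    return {
        "@type": "oa:Choice",
        "default": {"@type": "oa:FragmentSelector", "value": f"xywh={x},{y},{w},{h}"},
        "item": {"@type": "oa:SvgSelector", "value": svg_for_boxes(boxes)},
    }


# Hypothetical selection: two words at the end of one line, one on the next.
words = [(400, 100, 80, 30), (490, 100, 60, 30), (100, 140, 90, 30)]
region = choice_selector(words)
print(region["default"]["value"])
print(region["item"]["value"])
```

The default rectangle covers more of the page than the words themselves, which is the "plus some" trade-off noted above: viewers that cannot draw the SVG still highlight the right neighbourhood.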