Configuring an OAI Data Provider

This document describes how to configure your own OAI-PMH data provider based on a jOAI installation. The data provider is configured by specifying the directories where the metadata presides, the format of the files in each directory, and the metadata schema that is being used. In addition to the metadata directories, the sets can also be set in the data provider. Metadata sets are used to classify metadata for selective harvesting.

So, while the most basic configurqation information about the repository has been set in the previous module, under repository information. We explain how to configure the data provider so that it has access to the actual metadata in this module.

Prerequisites

Installation of the jOAI software running on Apache Tomcat. In the following, we assume that the OAI server is running and accessible at http://<fqdn or ip>/oai, where <fqdn or ip> is the IP address or fully qualified domain name of your OAI server.
Admin account: The first time you start the Data Provider setup, you are prompted to login as user admin. By default the password is set as well to admin. Furthermore, you can restrict access to your server or to parts of the stored data as described in section Repository security.
XML files in OAI metadata format: If you followed the module 01 Generate metadata, you should have some DublinCore XML files, at least for the fishproject project. In addition to those, we provide you with some example files in the correct format in subdirectories of samples/DC_examples/.

Configuration and Customization

We will explain how XML files can be indexed for harvesting using the DublinCore XML files generated for the fishproject (see 01. Generation of metadata). The same method should generally work with any other XML files, as long they are formatted in a valid metadata schema.

Add XML records to a metadata directory

Open the GUI of jOAI at http://<fqdn or ip>/oai and go to Data Provider and then Metadata Files Configuration, click on the Add Metadata Directory button and fill out the text fields:

Nickname for these files : This is a label; something that describes the content of your metadata.
Format of files: The metadata format for the files. We choose oai_dc, the OAI abbrevation for Dublin Core.
Path to the directory: Please choose the path to the XML files which should be indexed for harvesting in your OAI provider.
Metadata namespace and metadata schema: The URL of the associated metadata schema (this is automatically filled out, because the oai_dc format is supported by OAI.

If you press Save and Reindex all files the records will be saved and indexed, and you will have, as a result:

Checks

You can check your data provider and the entries you offer for harvesting here: http://<ip address, fully qualified domain name or localhost>:8181/oai/provider?verb=ListIdentifiers&metadataPrefix=oai_dc

Add a new set

To group metadata records together and distinguish them from other sets, you can add a set to your OAI provider. To do this, click on the Data provider option of the menu, and then on Sets Configuration, and in the resulting page, click on the Define a new set button. As an example, we will add an OAI-subset fish for our metadata from the fishproject metadata directory that we hadded in the previosu section.

Set name : A descriptive name for a group of metadata files that are a subgroup of all metadata files in the repository.
SetSPec: A short name or label that identifies the subgroup of metadata records. Harvesters may use this for selective harvesting.
Set description (optional) : Narrative that describes the subgroup in more detail.

Furthermore, you can define which files should be part of this set in the next steps. We restrict the set to all files of the project seisnet:

If you press Save you get:

Excercise 1

Add a metadata directory for the XML files of the project seisnet. If you successfully generated the XML files for the seisnet project in module 01.b, Excercise 2.3 they should reside in oaidata/seisnet-oai_dc/locations_1/xml/. If not, you can as well the files in samples/DC_examples/seisnet/xml already coming with the training material.

In the following figure, an example how the fields should be filled out is shown:

In our case, this leads to only partly successfully indexed files:

Validation

If some of the XML records could not be validated, this is assigned in the column 'Indexing failed', as shown above for the project seisnet, see Excercise 1.

You can now click on the number of the faulty records and then on 'Validate XML record'. But in most cases there are not many error messages.

You get more detailed information if you check your XML files using an XML validator, e.g. http://www.xmlvalidation.com.

Excercise 2

Validate the errornous files and try to find out and correct the bugs.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

02.a-OAI-data_provider.md

02.a-OAI-data_provider.md

Configuring an OAI Data Provider

Prerequisites

Configuration and Customization

Add XML records to a metadata directory

Checks

Add a new set

Excercise 1

Validation

Excercise 2

Files

02.a-OAI-data_provider.md

Latest commit

History

02.a-OAI-data_provider.md

File metadata and controls

Configuring an OAI Data Provider

Prerequisites

Configuration and Customization

Add XML records to a metadata directory

Checks

Add a new set

Excercise 1

Validation

Excercise 2