This document describes how to configure your own OAI-PMH data provider based on a jOAI installation. The data provider is configured by specifying the directories where the metadata presides, the format of the files in each directory, and the metadata schema that is being used. In addition to the metadata directories, the sets can also be set in the data provider. Metadata sets are used to classify metadata for selective harvesting.
So, while the most basic configurqation information about the repository has been set in the previous module, under repository information. We explain how to configure the data provider so that it has access to the actual metadata in this module.
- Installation of the jOAI software running on Apache Tomcat. In the following, we assume that the OAI server is running and accessible at
http://<fqdn or ip>/oai
, where<fqdn or ip>
is the IP address or fully qualified domain name of your OAI server. - Admin account: The first time you start the
Data Provider
setup, you are prompted to login as user admin. By default the password is set as well to admin. Furthermore, you can restrict access to your server or to parts of the stored data as described in section Repository security. - XML files in OAI metadata format: If you followed the module 01 Generate metadata, you should have some DublinCore XML files, at least for the fishproject project. In addition to those, we provide you with some example files in the correct format in subdirectories of
samples/DC_examples/
.
We will explain how XML files can be indexed for harvesting using the DublinCore XML files generated for the fishproject (see 01. Generation of metadata). The same method should generally work with any other XML files, as long they are formatted in a valid metadata schema.
Open the GUI of jOAI at http://<fqdn or ip>/oai
and go to Data Provider
and then Metadata Files Configuration
, click on the Add Metadata Directory
button and fill out the text fields:
- Nickname for these files : This is a label; something that describes the content of your metadata.
- Format of files: The metadata format for the files. We choose
oai_dc
, the OAI abbrevation for Dublin Core. - Path to the directory: Please choose the path to the XML files which should be indexed for harvesting in your OAI provider.
- Metadata namespace and metadata schema: The URL of the associated metadata schema (this is automatically filled out, because the
oai_dc
format is supported by OAI.
If you press Save
and Reindex all files
the records will be saved and indexed, and you will have, as a result:
You can check your data provider and the entries you offer for harvesting here:
http://<ip address, fully qualified domain name or localhost>:8181/oai/provider?verb=ListIdentifiers&metadataPrefix=oai_dc
To group metadata records together and distinguish them from other sets, you can add a set to your OAI provider. To do this, click on the Data provider
option of the menu, and then on Sets Configuration
, and in the resulting page, click on the Define a new set
button. As an example, we will add an OAI-subset fish
for our metadata from the fishproject
metadata directory that we hadded in the previosu section.
- Set name : A descriptive name for a group of metadata files that are a subgroup of all metadata files in the repository.
- SetSPec: A short name or label that identifies the subgroup of metadata records. Harvesters may use this for selective harvesting.
- Set description (optional) : Narrative that describes the subgroup in more detail.
Furthermore, you can define which files should be part of this set in the next steps. We restrict the set to all files of the project seisnet
:
If you press Save
you get:
Add a metadata directory for the XML files of the project seisnet. If you successfully generated the XML files for the seisnet project in module 01.b, Excercise 2.3 they should reside in oaidata/seisnet-oai_dc/locations_1/xml/
. If not, you can as well the files in samples/DC_examples/seisnet/xml
already coming with the training material.
In the following figure, an example how the fields should be filled out is shown:
In our case, this leads to only partly successfully indexed files:
If some of the XML records could not be validated, this is assigned in the column 'Indexing failed', as shown above for the project seisnet, see Excercise 1.
You can now click on the number of the faulty records and then on 'Validate XML record'. But in most cases there are not many error messages.
You get more detailed information if you check your XML files using an XML validator, e.g. http://www.xmlvalidation.com.
Validate the errornous files and try to find out and correct the bugs.