This document describes how you can interact with a CKAN site by directly submitting HTTP requests using the CKAN API, version 3. Both, the 05.b Search via the GUI and the 05.c Command Line Interface), are based on this API.
In this section, we show how requests can be submitted using three different methods:
- Browser : Enter the HTTP requests directly in the address field of your internet browser
- cURL : Submit the requests from the command line using
curl
- Python ckanapi : Submit the requests from a python shell using the python package
ckanapi
For detailed instructions see below and the CKAN API documentation.
Depending on which of the three methods you want to use, you need the tools described under points 1 to 3. Furthermore, a CKAN repository which you can access must be available, see point 4.
Your server must be able to make outgoing HTTP requests for this functionality.
Supported and tested browsers are:
- Firefox, version 45 or later
- Google Chrome, version 51 or later
The curl
command is used 'to transfer data from or to a server' and is provided by default with all common Linux distributions.
A Python installation is required, preferably Python 2.7. The ckanapi
package can be installed by one of the following commands:
pip install ckanapi
or
easy_install ckanapi
To follow this tutorial you can submit search requests to:
- the B2FIND portal (we are open!), or
- your own CKAN installation (see module 04. Install CKAN), or
- any other CKAN site, e.g. the CKAN Demo site.
Note that some of the searches only make sense for the B2FIND portal, e.g. the facet 'Discipline' is only defined within the metadata schema of the B2FIND service. In the following, the CKAN-URL
variable must be replaced by the URL of the used CKAN installation.
The HTTP request http-request
, directly using the Action API of CKAN, has the following form:
http://<CKAN_URL>/api/3/action/<action>
whereby action
consists in
<function>[?q=<value>[&fq<facet>:<value>]]
with the partly optional parameters:
function
is one of the supported CKAN API functions. While some of these functions do not require parameters (e.g.group_list
), the search commands, such aspackage_search
, expect additional parameters: a value for both the attributesq
andfacet
and avalue
for the attribute fq.facet
is a field of the used schema (in case of B2FIND the B2FIND schema) andvalue
is the corresponding value searched for.
Depending on the method you want to use, you can:
- Enter the HTTP requests directly in the address field of your internet browser or
- Submit the http requests from command line, using the command cURL by executing
curl <http-request>
. To transfer lengthy data as a JSON dictionary use the option-d
. To display the response in a nicer formated JSON dictionary use piping the command in| python -m json.tool
, as shown for the Combined Search at the end. - Use the Python package
ckanapi
: to connect to a CKAN end-point, e.g. to the B2FIND catalogue, issue the following commands in a Python shell:
$ python
> from ckanapi import RemoteCKAN
> ckan = RemoteCKAN("http://b2find.eudat.eu")
All searches are executed in a Python shell using the function call_action.
$ python
> ckan.call_action('<action>')
The data for the facetted search are specied in a dictionary, where the element q
contains the value for the free text search and fq
the facet-value pairs to be searched for, i.e.
> ckan.call_action('package_search',[{['q':'<value'>,]['fq':'<facet1>:<value1>' [ AND 'fq':'<facet2>:<value2> ]] ... }])
Note: combine several facet-value pairs by using the keyword AND
in a single fq
attribute value. Distributing multiple facet-value pairs in comma-separated fq
occurences would not work properly, because only the last fq
attribute value will be excecuted.
Other logical operators than AND are OR and NOT. By combining logical expressions you can create very powerful filters on the data.
If the query did not encounter an error, a submission returns as response a JSON dictionary with three keys:
- "help": the documentation string for the function you called.
- "success": true or false.
- "result": the returned result from the function you called, i.e. a list of (dictionaries describing) the records fulfilling the search criteria.
To illustrate the usage of the CKAN API the following 'search use cases' are provided, whereby only the simple request group_list
and a combined search is applied for each of the three methods. For all other facets only the associated action
is provided. It is left as an exercise to you to apply the different query methods.
To list all CKAN groups use the action groups_list
. For example, to get all communities (stored as CKAN groups) integrated in B2FIND do the following:
-
In a browser, enter the following address: http://b2find.eudat.eu/api/3/action/group_list. The corresponding request responds with a JSON dictionary, with the CKAN groups listed in the field
results
. -
Do the same using curl in the command line:
$ curl http://b2find.eudat.eu/api/3/action/group_list | python -m json.tool
which should return the same standard output as above.
- Using the Python method the list groups are returned as list of unicode values:
$ python
>>> from ckanapi import RemoteCKAN
>>> ckan = RemoteCKAN("http://b2find.eudat.eu")
>>> ckan.call_action('group_list')
[u'aleph', u'b2share', u'cessda', u'clarin', u'dariah', u'datacite',
u'earlinet', u'enes', u'gbif', u'ist', u'ivoa', u'narcis', u'pandata',
u'pdc', u'sdl', u'seanoe', u'theeuropeanlibrary']
>>>
Exercise 1: List all CKAN groups available in other instances using each of the above methods.
Exercise 2: List all tags
available in b2find.eudat.eu
and other CKAN instances by using the function tag_list
and one of the three methods (for B2FIND this results in a very long list, store the list appropriately). How many tags are there?
Exercise 3: List all datasets (packages). This command might take some time, depending on the extent of the CKAN instance used and the method applied to submit the request. How many could you find? Try the same for other CKAN sites.
The function used for these kind of searches is package_search.
We explain the several facets you can search for in detail along a concrete use case:
A scientist is interested in research data on polar coasts. She knows that the 'Polar Data Catalogue' offers access to this kind of data. Furthermore, she knows that her colleague Dustin published some interesting datasets about this topic in the past 26 years. In the first step, she restricts the search results on research concerning coasts to the 20th century. In the second step she, filters for studies on a particular coastal region at the northern boarder between Alaska and Canada.
In the following subsection we first explain how to search on single facets and move on to demonstrating the result of a full combined search, which will hopefully lead to the desired results our researcher needs for her own study.
For free text search the attribute q is used with the value to search for.
Our scientist starts to search for datasets whose metadata body contains "coast", the associated action
is package_search?q=coast
Alternatively, take for the two arguments of ckan.call_action
package_search
and {'q':'coast'}
.
Setting CKAN_URL
to b2find.eudat.eu
results in over 400 data sets. The lengthy output is not shown here.
Exercise Try the same for other CKAN sites and by using curl and the python shell.
For facetted search we use the attribute fq to assign a value
to the facet
should be searched for.
In B2FIND the facet communities
corresponds to the CKAN facet groups
.
The http request to filter all datasets belonging to the community PDC
in the B2FIND catalogue is:
http://b2find.eudat.eu/api/3/action/package_search?fq=groups:pdc
Exercise 1
Search in demo.ckan.org
for all datesets belonging to the group with ID governo
.
Exercise 2
Try to retrieve the display_name
of the group with ID governo
. (Hint : use the function group_show
)
Note: the following searches are dependent on the existence of the facet Discipline. While it is defined in the B2FIND schema it might not exist in your CKAN instance. Please use the CKAN-URL
b2find.eudat.eu to complete the exercises below.
To filter e.g. for all B2FIND datasets assigned to the discipline Environmental Science
enter:
http://b2find.eudat.eu/api/3/action/package_search?fq=Discipline:"Environmental Science"
Note: You have to use the quotes "
to take the blank in Environmental Science
into account.
In B2FIND the facet Creator is mapped to the CKAN field Author. To filter e.g. for all datasets created by Dustin Whalen
in the B2FIND repository enter:
http://b2find.eudat.eu/api/3/action/package_search?fq=author:Dustin?Whalen
Analogously, we can search for datasets in demo.ckan.org whose author is Lagares :
http://demo.ckan.org//api/3/action/package_search?fq=author:Lagares
Sometimes you would like to filter for entries lying in a specific region or mentioning or being created in a specific periode of time. This information, however, is not part or not making sense for all metadata records.
To enable these facets describing coverage CKAN extensions needs to be installed. This is done for the B2FIND service, while the CKAN demo site offers only geospatial search.
The feature to filter for publication year is developed by and implemented in B2FIND by the CKAN extension ckanext-datesearch.
To search for all datasets which are published between the years 1990 and 2016 the intrinsic extra fields ext_startdate
and ext_enddate
are used as shown here:
curl http://b2find.eudat.eu/api/3/action/package_search -d '{ "fq":"PublicationYear:[1990 TO 2017]"}' | python -m json.tool
The feature to filter for temporal coverage is implemented in B2FIND through the CKAN extension ckanext-timeline.
If you want to search for all datasets, which have an overlap with the 20th century, restrict the facets TemporalCoverageBeginDate
and TemporalCoverageEndDate
for the start and end time to according intervals, respectively :
curl http://b2find.eudat.eu/api/3/action/package_search -d '
{"fq":"TemporalCoverageBeginDate:[1900* TO *]",
"fq":"TemporalCoverageEndDate:[* TO 1999*]" }' | python -m json.tool
The feature to filter for geospatial coverage is implemented in B2FIND through the adapted CKAN extension ckanext-spatial.
Imagine we want to filter all datasets with a spatial extent that overlaps with a small region around the coast line at the northern border between Alaska and Canada.
Here we need the latitude and longitude coordinates of this region and have to use the CKAN extra field ext_bbox
as follows:
curl http://b2find.eudat.eu/api/3/action/package_search -d '
{ "extras":
{ "ext_bbox" : "-141.0,69.5,-140.0,70.0" }
}' | python -m json.tool
We now want to combine all the single search criteria to satisfy the use case of our researcher. In other words, we want to filter out all datasets that fulfill each of the following criteria within one call (we only list here the facets which working properly by directly sending combined HTTP CKAN API requests):
- The word
coast
is found in the full text body of the metadata record - The data belongs to the community PDC
- The data is 'created' by
Dustin Whalen
- The data is originated from the discipline 'Environmental Science'
- The data are published between 1990 and 2016
- The data cover the 20th century
- The spatial extent has an intersection with a small section of the northern costal line at the border between Alaska and Canada.
This looks for the several methods as follows :
Enter the following text in the address field:
http://b2find.eudat.eu/dataset?q=coast&groups=pdc&Discipline=Environmental+Science&author=Dustin+Whalen&PublicationYear:[1990 TO 2016]&TemporalCoverageBeginDate:[1900 TO *]&TemporalCoverageEndDate:[* TO 1999*]&ext_bbox=-141.7236328125,69.38031271734351,-140.1416015625,70.18510275498964
Issue the following command:
curl http://b2find.eudat.eu/api/3/action/package_search -d '
{ "q":"coast",
"fq":"groups:pdc",
"fq":"Discipline:Environmental?Science",
"fq":"author:Dustin?Whalen",
"fq":"TemporalCoverageBeginDate:[1900* TO *]",
"fq":"TemporalCoverageEndDate:[* TO 1999*]",
"extras": {
"ext_bbox" : "-141.0,69.5,-140.0,70.0",
"ext_startdate" : "1990-01-01T00:00:00Z",
"ext_enddate" : "2016-12-31T23:59:59Z"
}
}' | python -m json.tool
curl http://b2find.eudat.eu/api/3/action/package_search -d '
{ "q":"coast",
"fq":"groups:pdc",
"fq":"Discipline:Environmental?Science",
"fq":"author:Dustin?Whalen",
"fq":"PublicationYear:[1990 TO 2017]"
}' | python -m json.tool
Enter the following commands in a Python environment:
>>> response = ckan.call_action('package_search',{'q':'coast',
"fq":"groups:pdc AND author:Dustin?Whalen",
"extras":{"ext_bbox" : "-141.0,69.5,-140.0,70.0",
"ext_startdate" : "1990-01-01T00:00:00Z",
"ext_enddate" : "2016-12-31T23:59:59Z" }})
>>> print response["count"]
5
Exercise 1: Play around with all three 'methods' and discuss in groups : What are the pro's and con's, especially for such a lengthy and complex request ?
Exercise 2: Set up your own use cases and try to transpose them in CKAN requests. Docuemnt and discuss your experiences by submitting your requests.
We have already used the key count
, which returns the number of documents found. The key sort
tells us how the documents are sorted in the output. The key results
contains all entries found. We will have a look at them later. First, we would like to draw your attention to facet
and search_facet
.
These two fields are empty for our queries above. Let us inspect these two entries for the query below:
response = ckan.call_action('package_search',{
'fq':'Discipline:Linguistics',
'facet.field':['author', 'Language'],
'facet.limit':10
})
We search for documents from the Discipline
"linguistics" and we are interested in which Languages
we find and which authors published the documents.
To do so, we give an extra parameter facet.field which is a python list and contains the names of two other facets author
and Discipline
. The facet.limit
gives the number of the top hits (default = 1).
We see that most of the documents are published in Dutch, followed by German and that the most active authors are Sotaro Kita and Amanda Brown.
Note: the two facets in facet.field
are not joint, i.e. the documents falling under Dutch do not necessarily have an author from the list of top 10 authors.
The result under
response['search_facets']
gives us the same extra information namely count
, display_name
and name
.
We can use these two fields to retrieve some statistics on our result.
Now we will inspect the results
field of the response. From the value in
response['sort']
we know that the most recent documents are on top of the list.