qu is a platform for publishing data sets via the web, created and maintained by the CFPB to serve our public data sets. Installing this software on your own infrastructure will let you:
- Import data using a format inspired by the Google Dataset Publishing Language.
- Query data using an API, inspired by the Socrata Open Data API.
- Export data in JSON and CSV formats.
...some stuff here about how well it performs...
See qu in action:
- Query the CFPB's mortgage data (HMDA) API from our console.
- CFPB's HMDA analysis and navigation tools are driven by the API.
To run qu, you need the following languages and tools installed: Java, Leiningen, Node.js (which provides npm), and MongoDB.
Once you have the prerequisites installed and the code downloaded and expanded into a directory (which we will call "qu"), run the following commands:
cd qu
lein deps
npm install -g grunt-cli bower
npm install && bower install
grunt
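Before running the commands above, it can help to confirm that the tools they rely on are on your PATH. The following is a minimal sketch, not part of qu itself: the `missing_tools` helper is hypothetical, and the executable names `java`, `lein`, `npm`, and `mongod` are assumptions about how these tools are installed on your system.

```shell
# Print the name of every tool passed as an argument that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumed executable names; adjust them to match your installation.
missing=$(missing_tools java lein npm mongod)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```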
If editing the JavaScript or CSS, run the following to watch the JS and CSS and make sure your changes are compiled:
grunt watch
Alternatively, run grunt to compile the files once.
To start a Clojure REPL to work with the software, run:
lein repl
In order to run the API as a web server, run:
lein run
Go to http://localhost:3000 and you should see the app running.
Before starting the API, start MongoDB and load some data into it. By default, qu connects to a MongoDB server on the local machine.
All the settings below are shown via environment variables, but they can also be set via Java properties. See the documentation for environ for more information on how to use Java properties if you prefer.
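As a rough illustration of how the two naming styles relate: environ lowercases each name and replaces `_` and `.` with `-`, so an environment variable such as `HTTP_PORT` and a Java property such as `http.port` resolve to the same setting. The sketch below converts one form to the other and prints the matching `-D` flag. The `env_to_property` helper is hypothetical, and how the flag reaches the JVM depends on how you launch the application, so that part is left out.

```shell
# Convert an environment-variable name to the lowercase, dotted form
# commonly used for Java system properties (HTTP_PORT -> http.port).
env_to_property() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr '_' '.'
}

# Print the JVM flags equivalent to the HTTP_PORT and HTTP_THREADS defaults.
echo "-D$(env_to_property HTTP_PORT)=3000"
echo "-D$(env_to_property HTTP_THREADS)=4"
```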
By default, the server will come up on port 3000 and 4 threads will be allocated to handle requests. You can change these settings via environment variables:
HTTP_PORT=3000
HTTP_THREADS=4
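One way to apply these settings to a single run is a small wrapper that fills in the defaults for anything you have not set. This is a sketch only: `with_http_settings` is a hypothetical helper, and the demonstration uses a harmless `sh -c` command in place of `lein run` so that it can be tried without starting the server.

```shell
# Run a command with qu's HTTP settings in its environment, using the
# documented defaults (3000 and 4) for anything not already set.
with_http_settings() {
  HTTP_PORT="${HTTP_PORT:-3000}" HTTP_THREADS="${HTTP_THREADS:-4}" "$@"
}

# Override only the port for this run; replace the sh -c command with
# `lein run` to start the server with these settings.
(
  export HTTP_PORT=8080
  with_http_settings sh -c 'echo "port=$HTTP_PORT threads=$HTTP_THREADS"'
)
```

The subshell keeps the override from leaking into the rest of your session.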
In development mode, the application will connect to your local MongoDB server. In production, or if you want to connect to a different Mongo server in dev, you will have to specify the Mongo host and port.
You can do this by setting environment variables:
MONGO_HOST=192.168.21.98
MONGO_PORT=27017
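A mistyped port is easier to catch before the application starts than after. The sketch below exports the example values from above and checks that the port is a number in the valid TCP range. The `valid_port` helper is hypothetical and is not something qu provides.

```shell
# Succeed only if the argument is a whole number in the valid TCP port range.
valid_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Example values from above; substitute your own host and port.
MONGO_HOST=192.168.21.98
MONGO_PORT=27017
export MONGO_HOST MONGO_PORT

if valid_port "$MONGO_PORT"; then
  echo "qu will be pointed at MongoDB on $MONGO_HOST:$MONGO_PORT"
else
  echo "MONGO_PORT must be a number between 1 and 65535" >&2
fi
```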
To control the HREF of the links that are created for data slices, you can set the APP_URL environment variable.
For example, given a slice at /data/a_resource/a_slice, setting the APP_URL variable like so
APP_URL=https://my.data.platform/data-api
will create links such as
"_links":[{"rel":"self","href":"https://my.data.platform/data-api/data/a_resource/a_slice.json? ....
when emitted in JSON, JSONP, XML, and so on.
If the variable is not set, then relative HREFs such as /data/a_resource/a_slice.json are used. This variable is most useful in production hosting situations where an application server is behind a proxy, and you wish to control precisely which HREFs are created, independent of how the application server sees the request URI.
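The effect of APP_URL on a link can be pictured as a simple prefix join. The sketch below illustrates the resulting strings only and is not qu's implementation: `slice_href` is a hypothetical helper, and stripping a trailing slash from the base is an assumption made here to keep the join tidy.

```shell
# Build the HREF for a slice path: prefix it with the base URL when one is
# given, otherwise return the path unchanged.
slice_href() {
  base=$1
  path=$2
  # Drop one trailing slash from the base so the join never yields "//".
  printf '%s%s\n' "${base%/}" "$path"
}

# With APP_URL set, the link carries the public-facing prefix.
slice_href "https://my.data.platform/data-api" "/data/a_resource/a_slice.json"

# With APP_URL unset (empty), only the path is emitted.
slice_href "" "/data/a_resource/a_slice.json"
```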
Make sure you have MongoDB started. To load some sample data, run
lein repl
and enter the following:
(require 'cfpb.qu.loader)
(in-ns 'cfpb.qu.loader)
(ensure-mongo-connection)
(load-dataset "county_taxes")
(load-dataset "census") ; Takes quite a while to run; can skip.
(mongo/disconnect!)
We use Midje to test this project, so to execute the tests, run:
lein midje
If you want the tests to run automatically whenever you change the code, which also avoids paying the JVM startup time on every run, use:
lein midje :autotest
We also have integration tests that run against a Mongo database. To run these tests:
lein with-profile integration embongo midje