Skip to content

Latest commit

 

History

History
175 lines (122 loc) · 4.43 KB

README.md

File metadata and controls

175 lines (122 loc) · 4.43 KB

qu

Build Status

qu is a platform for publishing data sets via the web, created and maintained by the CFPB to serve our public data sets. Installing this software on your own infrastructure will let you:

  • Import data using a format inspired by the Google Dataset Publishing Language.
  • Query data using an API, inspired by the Socrata Open Data API.
  • Export data in JSON and CSV formats

Performance

...some stuff here about how well it performs...

Examples

See qu in action:

  • Query the CFPB's mortgage data (HMDA) API from our console.
  • CFPB's HMDA analysis and navigation tools are driven by the API.

Getting started

Prerequisites

In order to run qu, you need the following languages and tools installed:

Setup

Once you have the prerequisites installed and the code downloaded and expanded into a directory (which we will call "qu"), run the following commands:

cd qu
lein deps
npm install -g grunt-cli bower
npm install && bower install
grunt

If editing the JavaScript or CSS, run the following to watch the JS and CSS and make sure your changes are compiled:

grunt watch

You can run grunt to compile the files once.

To start a Clojure REPL to work with the software, run:

lein repl

In order to run the API as a web server, run:

lein run

Go to http://localhost:3000 and you should see the app running.

Before starting the API, you will want to start MongoDB and load some data into it. Currently, qu only supports connecting to a local MongoDB connection.

Configuration

All the settings below are shown via environment variables, but they can also be set via Java properties. See the documentation for environ for more information on how to use Java properties if you prefer.

Server port and threads

By default, the server will come up on port 3000 and 4 threads will be allocated to handle requests. You can change these settings via environment variables:

HTTP_PORT=3000
HTTP_THREADS=4

MongoDB

In development mode, the application will connect to your local MongoDB server. In production, or if you want to connect to a different Mongo server in dev, you will have to specify the Mongo host and port.

You can do this via setting environment variables:

MONGO_HOST=192.168.21.98
MONGO_PORT=27017

APP URL

To control the HREF of the links that are created for data slices, you can set the APP_URL environment variable.

For example, given a slice at /data/a_resource/a_slice, setting the APP_URL variable like so

APP_URL=https://my.data.platform/data-api

will create links such as

_links":[{"rel":"self","href":"https://my.data.platform/data-api/data/a_resource/a_slice.json? ....

when emitted in JSON, JSONP, XML, and so on.

If the variable is not set, then absolute HREFs such as /data/a_resource/a_slice.json are used. This variable is most useful in production hosting situations where an application server is behind a proxy, and you wish to granularly control the HREFs that are created independent of how the application server sees the request URI.

Loading data

Make sure you have MongoDB started. To load some sample data, run lein repl and enter the following:

(require 'cfpb.qu.loader)
(in-ns 'cfpb.qu.loader)
(ensure-mongo-connection)
(load-dataset "county_taxes")
(load-dataset "census") ; Takes quite a while to run; can skip.
(mongo/disconnect!)

Testing

We use Midje to test this project, so to execute the tests, run:

lein midje

If you want the tests to automatically run whenever you change the code, eliminating the JVM startup time and generally being great, run:

lein midje :autotest

We also have integration tests that run tests against a Mongo database. To run these tests:

lein with-profile integration embongo midje