Refactor search (#1)
* Begin search API refactoring

* Call order_by before limit

* Add offset to search API

* Add docs skeleton

* Rename mkdocs.yml

* Update search docs

* Update search docs

* Reimplement full text search

* Fix _get_container

* Fix coverage reporting

* Implement fts phrase queries

* Show missing lines in coverage report

* Add pycodestyle to travis

* Remove search block from readme

* Abort with full error rather than message

* Add language to Annotation for indexing

* Fix get_language when none

* Index body and target

* Index body and target

* Style fixes

* Remove unused tablename in base model

* Add test

* Add tests

* Remove not as fts operator

* Only join clauses if more than one

* Add tests

* Style fix

* Update tests

* Add export docs

* Remove flatten_json

* Remove export docs from readme

* Prefer py3 import

* Add messages to res.status_code tests

* Add tests

* Update confusing imports

* Update docs

* Don't read json file as bytes

* Let all tests fail with travis

* Add tests

* Update travis config to use postgresql 10

* Try new travis settings

* Try travis config with sudo

* Try new travis db config

* Try new travis db config

* Add tests

* Update test

* Remove flatten args from exporter

* Fix and test exporter

* Remove unnecessary 404 check

* Use default generator for annotations only

* Add note to docs

* Remove flask profiler

* Update tests
alexandermendes authored May 28, 2018
1 parent c29efcb commit d318581
Showing 45 changed files with 1,357 additions and 544 deletions.
3 changes: 2 additions & 1 deletion .coveragerc
@@ -1,2 +1,3 @@
[report]
omit = */test/*
omit = */test/*
show_missing=True
3 changes: 0 additions & 3 deletions .gitignore
@@ -42,8 +42,5 @@ alembic.ini
htmlcov/*
.coverage

# Flask profiler
flask_profiler.sql

# Provisioning
playbook.retry
29 changes: 24 additions & 5 deletions .travis.yml
@@ -1,21 +1,40 @@
sudo: true
language: python
dist: trusty
python:
- "2.7"
- "3.4"
- "3.5"
- "3.6"
env:
- EXPLICATES_SETTINGS="../settings_test.py"
global:
- PGPORT=5433
- EXPLICATES_SETTINGS="../settings_test.py"
addons:
postgresql: "9.5"
postgresql: "10"
apt:
packages:
- postgresql-10
- postgresql-client-10
install:
- pip install -U pip
- pip install -r requirements.txt
before_script:
- psql -c "create user rtester with createdb login password 'rtester'" -U postgres
- psql -c "create database explicates_test owner rtester encoding 'UTF-8' lc_collate 'en_US.UTF-8' lc_ctype 'en_US.UTF-8' template template0;" -U postgres
- sudo -u postgres psql -c "CREATE USER rtester WITH createdb LOGIN PASSWORD 'rtester';" -U postgres
- sudo -u postgres psql -c "CREATE DATABASE explicates_test OWNER rtester ENCODING 'UTF-8' LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' TEMPLATE template0;" -U postgres
script:
- pycodestyle
- alembic -c alembictest.ini stamp head
- alembic -c alembictest.ini upgrade head
- nosetests -x test/
- nosetests test/
- mkdocs build --clean
after_success:
- pip install coveralls
- coveralls
deploy:
provider: pages
skip_cleanup: true
github_token: $GITHUB_TOKEN
local_dir: site/
on:
branch: master
50 changes: 2 additions & 48 deletions README.md
@@ -7,10 +7,7 @@
## Setup

### Development

A virtual machine setup is provided for easily getting a development server up
and running.
A virtual machine setup is provided for local development.

Download and install
[Vagrant](https://www.vagrantup.com/) >= 4.2.10 and
@@ -30,14 +27,6 @@ python run.py

By default the server will now be available at http://127.0.0.1:3000.

### Production

The requirements for production servers are:

- Ubuntu 18.04 LTS
- Python 2.7 or >= 3.4
- PostgreSQL >= 10

## Configuration

See [settings.py.tmpl](settings.py.tmpl) for all available configuration
@@ -49,7 +38,7 @@ cp settings.py.tmpl settings.py

## Testing

Explicates is tested against Python 2.7 and 3.4.
Explicates is tested against [Python 2.7 and 3.4](https://travis-ci.org/alexandermendes/explicates):

```bash
# python 2
@@ -58,38 +47,3 @@ nosetests test/
# python 3
python3 -m "nose"
```

## Usage

Proper documentation to follow.

### Web Annotations

The Web Annotation endpoints are served from `/annotations/`.

### Search

A search service for Annotations is implemented at `/search/`. Results are
returned as an AnnotationCollection.

The following parameters are provided:

- `contains`: Search for objects that contain some nested value
(e.g. `contains={"motivation":"commenting"}`)
- `fts`: Full text search (e.g. `fts=body::foo`)

### Export

The following endpoint is provided to export the data for the purposes of
migration or offline analysis, where the `collection_id` is the IRI part that
identifies a particular collection.

```http
GET /export/<collection_id>
```

The data is streamed as JSON-LD and the following query parameters are
available:

- `flatten`: Flatten each Annotation.
- `zip`: Return the data as a ZIP file.
40 changes: 40 additions & 0 deletions alembic/versions/0585e7d309a1_add_indexes_to_annotation_table.py
@@ -0,0 +1,40 @@
"""Add indexes to Annotation table
Revision ID: 0585e7d309a1
Revises: 13ad0b8849e5
Create Date: 2018-05-27 19:44:10.461247
"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = '0585e7d309a1'
down_revision = '13ad0b8849e5'
branch_labels = None
depends_on = None


def upgrade():
    sql = ("""
        CREATE or REPLACE FUNCTION lang_cast(VARCHAR) RETURNS regconfig
            AS 'select cast($1 as regconfig)'
            LANGUAGE SQL
            IMMUTABLE
            RETURNS NULL ON NULL INPUT;
        CREATE INDEX idx_annotation_body
            ON annotation
            USING gin (to_tsvector(lang_cast(language), _data -> 'body'));
        CREATE INDEX idx_annotation_target
            ON annotation
            USING gin (to_tsvector(lang_cast(language), _data -> 'target'));
    """)
    op.execute(sql)


def downgrade():
    op.drop_index('idx_annotation_body')
    op.drop_index('idx_annotation_target')
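
For context, PostgreSQL only considers an expression index when the query repeats the indexed expression. The following is a minimal sketch of a query string that would be eligible for the two indexes above. The helper name `build_fts_query`, the choice of `plainto_tsquery` and the `%(query)s` placeholder style are assumptions for illustration and are not taken from this commit.

```python
def build_fts_query(field):
    """Return a parameterised SQL string that searches one indexed field.

    Hypothetical helper: the migration above indexes only 'body' and
    'target', so any other field name is rejected.
    """
    if field not in ('body', 'target'):
        raise ValueError('no full text index for field: {}'.format(field))
    # The to_tsvector(...) expression mirrors the index definition so the
    # planner can match it against idx_annotation_body or
    # idx_annotation_target.
    return (
        "SELECT id FROM annotation "
        "WHERE to_tsvector(lang_cast(language), _data -> '{0}') "
        "@@ plainto_tsquery(lang_cast(language), %(query)s)"
    ).format(field)
```

The search term is left as a bound parameter rather than interpolated, so the string can be passed to a database driver together with a `{'query': ...}` mapping.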
26 changes: 26 additions & 0 deletions alembic/versions/13ad0b8849e5_add_language_to_annotation.py
@@ -0,0 +1,26 @@
"""Add language to Annotation
Revision ID: 13ad0b8849e5
Revises: 3b8038c6e43e
Create Date: 2018-05-27 17:20:14.775557
"""
from alembic import op
import sqlalchemy as sa


# revision identifiers, used by Alembic.
revision = '13ad0b8849e5'
down_revision = '3b8038c6e43e'
branch_labels = None
depends_on = None


def upgrade():
    op.add_column('annotation', sa.Column('language', sa.Text))
    sql = "update annotation set language='english'"
    op.execute(sql)


def downgrade():
    op.drop_column('annotation', 'language')
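
The add-then-backfill pattern used by this migration can be tried out without a PostgreSQL server. The sketch below uses an in-memory SQLite database as a stand-in, which is an assumption made purely so the example is self-contained; the two-column table is a simplified version of the real `annotation` table.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here (assumption for the demo).
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE annotation (key INTEGER PRIMARY KEY, id TEXT)")
conn.executemany("INSERT INTO annotation (id) VALUES (?)", [('a',), ('b',)])

# Equivalent of upgrade(): add a nullable column, then backfill every
# existing row with the default language.
conn.execute("ALTER TABLE annotation ADD COLUMN language TEXT")
conn.execute("UPDATE annotation SET language = 'english'")

languages = [
    row[0]
    for row in conn.execute("SELECT language FROM annotation ORDER BY key")
]
conn.close()
```

Backfilling matters here because the indexes added in the following revision cast `language` to a text search configuration, and `lang_cast` returns NULL for rows where the column is NULL.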
7 changes: 5 additions & 2 deletions alembic/versions/3b8038c6e43e_add_core_tables.py
@@ -28,7 +28,8 @@ def make_uuid():


def upgrade():
op.create_table('collection',
op.create_table(
'collection',
sa.Column('key', sa.Integer(), nullable=False),
sa.Column('id', sa.Text(), nullable=False, unique=True,
default=make_uuid),
@@ -39,7 +40,8 @@ def upgrade():
sa.PrimaryKeyConstraint('key')
)

op.create_table('annotation',
op.create_table(
'annotation',
sa.Column('key', sa.Integer(), nullable=False),
sa.Column('id', sa.Text(), nullable=False, unique=True,
default=make_uuid),
@@ -52,6 +54,7 @@ def upgrade():
sa.ForeignKey('collection.key'), nullable=False)
)


def downgrade():
op.drop_table('annotation')
op.drop_table('collection')
34 changes: 34 additions & 0 deletions docs/annotations.md
@@ -0,0 +1,34 @@
Annotations can be created, read, updated and deleted via the following
endpoints.

## Post

Create an Annotation.

```http
POST /annotations/<container_id>/
```

## Get

Read an Annotation.

```http
GET /annotations/<container_id>/<annotation_id>/
```

## Put

Update an Annotation.

```http
PUT /annotations/<container_id>/<annotation_id>/
```

## Delete

Delete an Annotation.

```http
DELETE /annotations/<container_id>/<annotation_id>/
```
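
The four endpoints above share one URL pattern, which a client could wrap as follows. This is a minimal sketch, assuming a placeholder host of `https://example.org` and a hypothetical helper named `annotation_request`; the media type is the one defined by the Web Annotation Protocol, and whether this server requires it is not stated in these docs. The function only builds the request and does not send it.

```python
import json
from urllib.request import Request

BASE = 'https://example.org'  # placeholder host, not a real deployment
ANNO_TYPE = 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'


def annotation_request(method, container_id, annotation_id=None, body=None):
    """Build (but do not send) a request for an Annotation endpoint."""
    path = '/annotations/{}/'.format(container_id)
    if annotation_id is not None:
        # GET, PUT and DELETE address a single Annotation in the container.
        path += '{}/'.format(annotation_id)
    headers = {'Accept': ANNO_TYPE}
    data = None
    if body is not None:
        # POST and PUT carry the Annotation as a JSON-LD document.
        data = json.dumps(body).encode('utf-8')
        headers['Content-Type'] = ANNO_TYPE
    return Request(BASE + path, data=data, headers=headers, method=method)
```

A request built this way can be passed to `urllib.request.urlopen` once `BASE` points at a running server.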
40 changes: 40 additions & 0 deletions docs/collections.md
@@ -0,0 +1,40 @@
Annotation Collections can be created, read, updated and deleted via the
following endpoints.

## Post

Create an Annotation Collection.

```http
POST /annotations/
```

## Get

Returns an Annotation Collection containing a minimal representation of all
Annotation Collections on the server.

```http
GET /annotations/
```

## Put

Update an Annotation Collection.

```http
PUT /annotations/my-container/
```

## Delete

Delete an Annotation Collection.

```http
DELETE /annotations/my-container/
```

!!! note "Deletion rules"

    Annotation Collections cannot be deleted if they are the last one
    remaining on the server, or if they contain Annotations.
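
The deletion rules can be expressed as a single predicate. The sketch below is illustrative only: the function name `can_delete_collection` and its two integer arguments are hypothetical and do not reflect how the server implements the check.

```python
def can_delete_collection(total_collections, annotation_count):
    """Return True if a collection may be deleted under the rules above.

    total_collections: number of Annotation Collections on the server.
    annotation_count: number of Annotations in the collection to delete.
    """
    is_last_collection = total_collections <= 1
    has_annotations = annotation_count > 0
    return not is_last_collection and not has_annotations
```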
29 changes: 29 additions & 0 deletions docs/export.md
@@ -0,0 +1,29 @@
All Annotations in an Annotation Collection can be exported from the database
via the following endpoint:

```http
GET /export/<collection_id>/
```

The Annotations are streamed back to the client as a JSON list. This endpoint
is intended for programmatic use only. It is not recommended to access it via
a web browser as, depending on the number of Annotations to be exported, it is
likely the browser would run out of memory before the request finishes.

You can add the URL parameter `zip=1` to download the Annotations as a ZIP
file.

!!! summary "Curl example"

    ```bash
    curl https://example.org/export/my-container/ > out.json
    ```

!!! summary "Python pandas example"

    ```python
    import pandas

    iri = 'https://example.org/export/my-container/'
    df = pandas.read_json(iri, orient='records')
    ```
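
When `zip=1` is used the response has to be unpacked before it can be parsed. The following is a minimal sketch, assuming the archive contains one or more UTF-8 JSON files that each hold a list of Annotations; the layout of the archive is not documented here, so the function name `load_zipped_export` and that assumption are both hypothetical.

```python
import io
import json
import zipfile


def load_zipped_export(raw):
    """Parse the bytes of a zipped export into a list of Annotations.

    Assumes every member of the archive is a JSON list (not confirmed
    by the docs above).
    """
    annotations = []
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        for name in archive.namelist():
            with archive.open(name) as member:
                annotations.extend(json.loads(member.read().decode('utf-8')))
    return annotations
```

The function takes the raw response body, so it can be combined with any HTTP client that returns bytes.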
9 changes: 9 additions & 0 deletions docs/index.md
@@ -0,0 +1,9 @@
Explicates is a Web Annotation server built with
[Flask](http://flask.pocoo.org/) and backed by
[PostgreSQL](https://www.postgresql.org/). It complies with the
[Web Annotation Protocol](https://www.w3.org/TR/annotation-protocol/) and
includes additional endpoints for [searching](/search.md) and
[exporting](/export.md) the data.

To find out how to set up a local development server see the
[Setup](/setup.md) section.