This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Merge pull request #82 from eregs/475-pgsql
Add an initial Postgres backend for search
cmc333333 authored Aug 10, 2017
2 parents c4e7404 + 485ecad commit 02522aa
Showing 24 changed files with 473 additions and 21 deletions.
44 changes: 35 additions & 9 deletions README.md
@@ -85,14 +85,16 @@ In both cases, you can find the site locally at

## Apps included

-This repository contains three Django apps, *regcore*, *regcore_read*, and
-*regcore_write*. The former contains shared models and libraries. The "read"
-app provides read-only end-points while the "write" app provides write-only
-end-points (see the next section for security implications.) We recommend
-using *regcore.urls* as your url router, in which case turning on or off
-read/write capabilities is as simple as including the appropriate
-applications in your Django settings file. Note that you will always need
-*regcore* installed.
+This repository contains four Django apps, *regcore*, *regcore_read*,
+*regcore_write*, and *regcore_pgsql*. The first contains shared models and
+libraries. The "read" app provides read-only end-points while the "write" app
+provides write-only end-points (see the next section for security
+implications). We recommend using *regcore.urls* as your url router, in which
+case turning on or off read/write capabilities is as simple as including the
+appropriate applications in your Django settings file. The final app,
+*regcore_pgsql*, contains all of the modules related to running with a
+Postgres-based search index. Note that you will always need *regcore*
+installed.


## Security
@@ -120,7 +122,7 @@ turn `DEBUG` off in your `local_settings.py`

This project allows multiple backends for storing, retrieving, and searching
data. The default settings file uses Django models for data storage and
-Haystack for search, but Elastic Search (1.7) can be used instead.
+Haystack for search, but Elastic Search (1.7) or Postgres can be used instead.

### Django Models For Data, Haystack For Search

@@ -142,6 +144,30 @@ SEARCH_HANDLER = 'regcore_read.views.haystack_search.search'
You will need to migrate the database (`manage.py migrate`) to get started and
rebuild the search index (`manage.py rebuild_index`) after adding documents.

### Django Models For Data, Postgres For Search

If running Django 1.10 or greater, you may skip *haystack* and rely
exclusively on Postgres for search. The current search index covers only
the CFR section level. Install `psycopg` (e.g. through `pip install
regcore[backend-pgsql]`) and use the following settings:

```python
BACKENDS = {
    'regulations': 'regcore.db.django_models.DMRegulations',
    'layers': 'regcore.db.django_models.DMLayers',
    'notices': 'regcore.db.django_models.DMNotices',
    'diffs': 'regcore.db.django_models.DMDiffs'
}
SEARCH_HANDLER = 'regcore_pgsql.views.search'
APPS.append('regcore_pgsql')
```

You may wish to extend the `regcore.settings.pgsql` module for simplicity.

You will need to migrate the database (`manage.py migrate`) to get started and
rebuild the search index (`manage.py rebuild_pgsql_index`) after adding
documents.

### Elastic Search For Data and Search

If *pyelasticsearch* is installed (e.g. through `pip install
16 changes: 16 additions & 0 deletions docs/regcore.settings.rst
@@ -12,6 +12,22 @@ regcore\.settings\.base module
    :undoc-members:
    :show-inheritance:

regcore\.settings\.elastic module
---------------------------------

.. automodule:: regcore.settings.elastic
    :members:
    :undoc-members:
    :show-inheritance:

regcore\.settings\.pgsql module
-------------------------------

.. automodule:: regcore.settings.pgsql
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------
8 changes: 8 additions & 0 deletions docs/regcore.tests.rst
@@ -51,6 +51,14 @@ regcore\.tests\.layer\_tests module
    :undoc-members:
    :show-inheritance:

regcore\.tests\.recipes module
------------------------------

.. automodule:: regcore.tests.recipes
    :members:
    :undoc-members:
    :show-inheritance:

regcore\.tests\.responses\_tests module
---------------------------------------

22 changes: 22 additions & 0 deletions docs/regcore_pgsql.management.commands.rst
@@ -0,0 +1,22 @@
regcore\_pgsql\.management\.commands package
============================================

Submodules
----------

regcore\_pgsql\.management\.commands\.rebuild\_pgsql\_index module
------------------------------------------------------------------

.. automodule:: regcore_pgsql.management.commands.rebuild_pgsql_index
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: regcore_pgsql.management.commands
    :members:
    :undoc-members:
    :show-inheritance:
17 changes: 17 additions & 0 deletions docs/regcore_pgsql.management.rst
@@ -0,0 +1,17 @@
regcore\_pgsql\.management package
==================================

Subpackages
-----------

.. toctree::

    regcore_pgsql.management.commands

Module contents
---------------

.. automodule:: regcore_pgsql.management
    :members:
    :undoc-members:
    :show-inheritance:
30 changes: 30 additions & 0 deletions docs/regcore_pgsql.migrations.rst
@@ -0,0 +1,30 @@
regcore\_pgsql\.migrations package
==================================

Submodules
----------

regcore\_pgsql\.migrations\.0001\_initial module
------------------------------------------------

.. automodule:: regcore_pgsql.migrations.0001_initial
    :members:
    :undoc-members:
    :show-inheritance:

regcore\_pgsql\.migrations\.0002\_documentindex\_doc\_root module
-----------------------------------------------------------------

.. automodule:: regcore_pgsql.migrations.0002_documentindex_doc_root
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: regcore_pgsql.migrations
    :members:
    :undoc-members:
    :show-inheritance:
39 changes: 39 additions & 0 deletions docs/regcore_pgsql.rst
@@ -0,0 +1,39 @@
regcore\_pgsql package
======================

Subpackages
-----------

.. toctree::

    regcore_pgsql.management
    regcore_pgsql.migrations
    regcore_pgsql.tests

Submodules
----------

regcore\_pgsql\.models module
-----------------------------

.. automodule:: regcore_pgsql.models
    :members:
    :undoc-members:
    :show-inheritance:

regcore\_pgsql\.views module
----------------------------

.. automodule:: regcore_pgsql.views
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: regcore_pgsql
    :members:
    :undoc-members:
    :show-inheritance:
30 changes: 30 additions & 0 deletions docs/regcore_pgsql.tests.rst
@@ -0,0 +1,30 @@
regcore\_pgsql\.tests package
=============================

Submodules
----------

regcore\_pgsql\.tests\.rebuild\_pgsql\_index\_tests module
----------------------------------------------------------

.. automodule:: regcore_pgsql.tests.rebuild_pgsql_index_tests
    :members:
    :undoc-members:
    :show-inheritance:

regcore\_pgsql\.tests\.views\_tests module
------------------------------------------

.. automodule:: regcore_pgsql.tests.views_tests
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------

.. automodule:: regcore_pgsql.tests
    :members:
    :undoc-members:
    :show-inheritance:
8 changes: 8 additions & 0 deletions docs/regcore_read.tests.rst
@@ -68,6 +68,14 @@ regcore\_read\.tests\.views\_regulation\_tests module
    :undoc-members:
    :show-inheritance:

regcore\_read\.tests\.views\_seach\_utils\_tests module
-------------------------------------------------------

.. automodule:: regcore_read.tests.views_seach_utils_tests
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------
8 changes: 8 additions & 0 deletions docs/regcore_read.views.rst
@@ -52,6 +52,14 @@ regcore\_read\.views\.notice module
    :undoc-members:
    :show-inheritance:

regcore\_read\.views\.search\_utils module
------------------------------------------

.. automodule:: regcore_read.views.search_utils
    :members:
    :undoc-members:
    :show-inheritance:


Module contents
---------------
1 change: 1 addition & 0 deletions regcore/settings/base.py
@@ -10,6 +10,7 @@
]

INSTALLED_APPS = [
    'django.contrib.contenttypes',
    'mptt',
    'haystack',
    'regcore',
5 changes: 5 additions & 0 deletions regcore/settings/pgsql.py
@@ -0,0 +1,5 @@
from regcore.settings.base import * # noqa

INSTALLED_APPS.remove('haystack')
INSTALLED_APPS.extend(['regcore_pgsql', 'django.contrib.postgres'])
SEARCH_HANDLER = 'regcore_pgsql.views.search'
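
As a rough illustration of what this settings module does to the app list, the sketch below replays the two list operations on a plain Python list. The starting list is an assumption: it holds only the entries visible in the `base.py` hunk above, whereas the real `INSTALLED_APPS` is longer.

```python
# Stand-in for INSTALLED_APPS from regcore.settings.base (assumed, truncated
# to the entries visible in the diff).
INSTALLED_APPS = [
    'django.contrib.contenttypes',
    'mptt',
    'haystack',
    'regcore',
]

# Same operations as regcore/settings/pgsql.py: drop Haystack, then add the
# Postgres search app and Django's Postgres contrib package.
INSTALLED_APPS.remove('haystack')
INSTALLED_APPS.extend(['regcore_pgsql', 'django.contrib.postgres'])

print(INSTALLED_APPS)
```

Both calls mutate the list in place, so a settings module written this way also changes the list object it imported from the base module.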
8 changes: 8 additions & 0 deletions regcore/tests/recipes.py
@@ -0,0 +1,8 @@
from model_mommy.recipe import Recipe

from regcore.models import Document


# Account for mptt-related edge case
doc_recipe = Recipe(Document, lft=None, rght=None, tree_id=None,
                    _fill_optional=['title'])
Empty file.
Empty file.
30 changes: 30 additions & 0 deletions regcore_pgsql/management/commands/rebuild_pgsql_index.py
@@ -0,0 +1,30 @@
import logging

from django.core.management.base import BaseCommand
from django.db import transaction

from regcore.models import Document
from regcore_pgsql.models import DocumentIndex


logger = logging.getLogger(__name__)


def section_documents():
    return Document.objects\
        .filter(label_string__contains='-')\
        .exclude(label_string__regex=r'.*-.*-.*')


class Command(BaseCommand):
    help = "Rebuild document indexes for searching sections within Postgres"

    @transaction.atomic
    def handle(self, *args, **options):
        DocumentIndex.objects.all().delete()
        count = section_documents().count()
        for idx, document in enumerate(section_documents().iterator()):
            if idx % 100 == 0:
                logger.info('Inserted DocumentIndex %s / %s', idx, count)
            DocumentIndex.from_document(document).save()
        DocumentIndex.rebuild_search_vectors()
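
The `section_documents` filter above keeps labels that contain a hyphen but do not match the two-hyphen regex, which amounts to "exactly one hyphen". A minimal plain-Python sketch of the same predicate follows; the function name `is_section_label` and the sample labels are hypothetical and only illustrate the logic, they are not part of the commit.

```python
import re

# Same pattern as the ORM exclude(); matches any label with two or more hyphens.
TWO_HYPHENS = re.compile(r'.*-.*-.*')


def is_section_label(label_string):
    """Return True when the label contains exactly one hyphen."""
    has_hyphen = '-' in label_string              # label_string__contains='-'
    too_deep = TWO_HYPHENS.search(label_string)   # label_string__regex=...
    return has_hyphen and too_deep is None


# Hypothetical labels: a part, a section, and a paragraph within a section.
for label in ['1026', '1026-5', '1026-5-a']:
    print(label, is_section_label(label))
```

Under this reading, only the middle label would be indexed; anything deeper is folded into its section by `DocumentIndex.from_document`.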
21 changes: 21 additions & 0 deletions regcore_pgsql/migrations/0002_documentindex_doc_root.py
@@ -0,0 +1,21 @@
# -*- coding: utf-8 -*-
# Generated by Django 1.11.3 on 2017-08-04 15:27
from __future__ import unicode_literals

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('regcore_pgsql', '0001_initial'),
    ]

    operations = [
        migrations.AddField(
            model_name='documentindex',
            name='doc_root',
            field=models.SlugField(default='', max_length=200),
            preserve_default=False,
        ),
    ]
15 changes: 10 additions & 5 deletions regcore_pgsql/models.py
@@ -9,6 +9,7 @@ class DocumentIndex(models.Model):
     combined_text = models.TextField()
     combined_titles = models.TextField()
     root_title = models.TextField()
+    doc_root = models.SlugField(max_length=200)  # denormalized

     search_vector = SearchVectorField()

@@ -17,16 +18,20 @@ def from_document(cls, document):
         doc_and_children = document.get_descendants(include_self=True)
         return cls(
             document=document,
-            combined_text=' '.join(d.text for d in doc_and_children if d.text),
-            combined_titles=' '.join(
+            combined_text='\n'.join(
+                d.text for d in doc_and_children if d.text),
+            combined_titles='\n'.join(
                 d.title for d in doc_and_children if d.title),
             root_title=document.title or '',
+            doc_root=document.label_string.split('-')[0],
         )

     @classmethod
     def rebuild_search_vectors(cls):
         cls.objects.update(search_vector=(
-            SearchVector('root_title', weight='A')
-            + SearchVector('combined_titles', weight='B')
-            + SearchVector('combined_text', weight='C')
+            # note that the root title gets double-counted, as it's also in
+            # combined_titles
+            SearchVector('root_title', weight='B') +
+            SearchVector('combined_titles', weight='A') +
+            SearchVector('combined_text', weight='B')
         ))
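
To make the denormalization in `from_document` concrete, here is a small sketch that builds the same four fields from plain Python objects. The `Node` tuple is a hypothetical stand-in for `regcore.models.Document`, and the labels and text are invented sample data; the real method obtains descendants through `get_descendants(include_self=True)`.

```python
from collections import namedtuple

# Hypothetical stand-in for Document; only the fields used here are modelled.
Node = namedtuple('Node', ['label_string', 'title', 'text'])


def index_fields(document, descendants):
    """Build the denormalized index fields for a section and its children."""
    doc_and_children = [document] + list(descendants)
    return {
        # Newline-joined, skipping empty values, as in the new version
        'combined_text': '\n'.join(
            d.text for d in doc_and_children if d.text),
        'combined_titles': '\n'.join(
            d.title for d in doc_and_children if d.title),
        'root_title': document.title or '',
        # Everything before the first hyphen, e.g. the CFR part
        'doc_root': document.label_string.split('-')[0],
    }


section = Node('1026-5', 'Section 5', 'Intro text')
paragraph = Node('1026-5-a', '', 'Paragraph a text')
fields = index_fields(section, [paragraph])
print(fields)
```

Storing `doc_root` on the index row presumably lets a search be narrowed to one regulation without joining back to the document table, which is what the `# denormalized` comment suggests.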
Empty file added regcore_pgsql/tests/__init__.py
Empty file.