This is the SearchServer service from mf-chsdi3. How the service can be queried is currently described at api3.geo.admin.ch/search, but this documentation will have to be migrated in some way to this repository. The service is a simple Flask application that queries a Sphinx Search server. The currently supported Sphinx Search server version is v2.2.11.
This service uses SemVer as its versioning scheme. Versioning is handled automatically by the .github/workflows/main.yml file.
See also Git Flow - Versioning for more information on the versioning guidelines.
The Make targets assume you have python3.9, pipenv, bash, curl and docker installed.
First, you'll need to clone the repo
git clone git@github.com:geoadmin/service-search-wsgi
Then, you can run the setup target to ensure you have everything needed to develop, test and serve locally.
Virtual environment to develop and debug the service
make setup
To run the service, you will have to adapt .env.local, which is a copy of .env.default, and set the variables.
For local development you will need access to a running sphinx search server and to the database. To do so you can use ssh port forwarding to the DB and to the current sphinx deployment server.
Right now the BOD database is accessed to retrieve the topics and to translate labels.
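Before starting the service against forwarded ports, it can help to check that both tunnels are actually up. The following is a minimal sketch and not part of this repository: the helper name `port_is_open` is hypothetical, and it assumes that the database and the Sphinx server are both forwarded to localhost, falling back to the default ports from the configuration table.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the ssh tunnels forward both services to localhost.
    targets = {
        "BOD database": (
            os.environ.get("BOD_DB_HOST", "localhost"),
            int(os.environ.get("BOD_DB_PORT", "5432")),
        ),
        "Sphinx search server": (
            os.environ.get("SEARCH_SPHINX_HOST", "localhost"),
            int(os.environ.get("SEARCH_SPHINX_PORT", "9321")),
        ),
    }
    for name, (host, port) in targets.items():
        state = "reachable" if port_is_open(host, port) else "NOT reachable"
        print(f"{name} at {host}:{port} is {state}")
```

This only checks that something accepts TCP connections on the port; it does not verify credentials or that the peer is really PostgreSQL or Sphinx.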
To keep a consistent code style, the code should be formatted using yapf. To avoid syntax errors and non-pythonic idioms, the project also uses the pylint linter. Both the formatter and the linter can be run manually using the following command:
make format-lint
Formatting and linting are best integrated into the IDE; for this, see Integrate yapf and pylint into IDE.
Testing what you developed is made simple. You have four targets at your disposal: test, serve, gunicornserve and dockerrun.
make test
This command runs the unit tests.
summon make serve
This will serve the application through Flask without any wsgi in front.
summon make gunicornserve
This will serve the application with the Gunicorn layer in front of it.
summon make dockerrun
This will serve the application with the wsgi server, inside a container.
The service is encapsulated in a Docker image. Images are pushed to the swisstopo-bgdi-builder account of the AWS ECR registry. For each GitHub PR that is merged into the develop branch, one Docker image is built and pushed with the following tags:
develop.latest
CURRENT_VERSION-beta.INCREMENTAL_NUMBER
For each GitHub PR that is merged into master, one Docker image is built and pushed with the following tag:
VERSION
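The tagging rules above can be summarised in a few lines of Python. This is only an illustrative sketch: the function `image_tags` and its parameters are hypothetical, the real tags are produced by the CI workflow, and the version string used below is made up.

```python
def image_tags(branch: str, version: str, beta_number: int = 0) -> list[str]:
    """Return the Docker tags described above for a PR merged into `branch`."""
    if branch == "develop":
        # develop images get a moving tag plus a numbered beta tag
        return ["develop.latest", f"{version}-beta.{beta_number}"]
    if branch == "master":
        # master images get the plain version only
        return [version]
    raise ValueError(f"no image is published for branch {branch!r}")


if __name__ == "__main__":
    print(image_tags("develop", "1.2.0", 3))  # ['develop.latest', '1.2.0-beta.3']
    print(image_tags("master", "1.2.0"))      # ['1.2.0']
```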
Each image contains the following metadata:
- author
- git.branch
- git.hash
- git.dirty
- version
These metadata can be seen directly on the dockerhub registry in the image layers or can be read with the following command
# NOTE: jq is only used for pretty printing the json output,
# you can install it with `apt install jq` or simply enter the command without it
docker image inspect --format='{{json .Config.Labels}}' 974517877189.dkr.ecr.eu-central-1.amazonaws.com/service-search-wsgi:develop.latest | jq
You can also check these metadata on a running container as follows
docker ps --format="table {{.ID}}\t{{.Image}}\t{{.Labels}}"
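If the labels are needed in a script rather than on the command line, the same `docker image inspect` call can be wrapped in Python. This is a sketch under assumptions: the helper names `parse_labels` and `image_labels` are hypothetical, and the sample values are invented for illustration; only the label keys come from the list above.

```python
import json
import subprocess


def parse_labels(inspect_output: str) -> dict:
    """Parse the JSON printed by docker image inspect with the labels format."""
    labels = json.loads(inspect_output)
    # The template prints `null` when the image has no labels at all.
    return labels or {}


def image_labels(image: str) -> dict:
    """Run docker image inspect for `image` (requires a local docker CLI)."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format={{json .Config.Labels}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_labels(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output; the values are made up for illustration.
    sample = (
        '{"author": "someone", "git.branch": "develop", "git.dirty": "", '
        '"git.hash": "abc1234", "version": "1.2.0-beta.3"}'
    )
    labels = parse_labels(sample)
    for key in ("author", "git.branch", "git.hash", "git.dirty", "version"):
        print(f"{key}: {labels.get(key, '<missing>')}")
```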
This service is going to be deployed on a vhost. The configuration of the docker-compose.yml of the vhost setup is going to be here: https://github.com/geoadmin/infra-vhost
The service is configured through environment variables:
Env | Default | Description |
---|---|---|
HTTP_PORT | 5000 | The port on which the service can be queried. |
SEARCH_WORKERS | 0 | Number of workers. 0 or a negative value means that the number of workers is computed from the number of CPUs. |
TESTING | False | When TESTING=True, the application does not need a db connection to retrieve a list of topics. A list with the topics used in the tests is being set. |
BOD_DB_NAME | - | Depending on the staging level usually |
BOD_DB_HOST | - | The db host. |
BOD_DB_PORT | 5432 | The db port |
BOD_DB_USER | - | The read-only db user |
BOD_DB_PASSWD | - | The db password. |
GEODATA_STAGING | prod | In the database bod, a dataset itself has the attribute staging. This staging (dev, int and prod) is being filtered when querying the indexes. |
SEARCH_SPHINX_HOST | localhost | The host for sphinx search server. |
SEARCH_SPHINX_PORT | 9321 | The port for sphinx search server. |
SEARCH_SPHINX_TIMEOUT | 3 | Sphinx server timeout |
CACHE_DEFAULT_TIMEOUT | 86400 | The time in seconds for which the db queries for topics and translations are cached. Defaults to 24 hours, as these change rarely. |
LOGGING_CFG | logging-cfg-local.yml | Logging configuration file |
FORWARED_ALLOW_IPS | * | Sets the gunicorn forwarded_allow_ips (see https://docs.gunicorn.org/en/stable/settings.html#forwarded-allow-ips). This is required in order for secure_scheme_headers to work. |
FORWARDED_PROTO_HEADER_NAME | X-Forwarded-Proto | Sets the gunicorn secure_scheme_headers parameter to {FORWARDED_PROTO_HEADER_NAME: 'https'}, see https://docs.gunicorn.org/en/stable/settings.html#secure-scheme-headers. |
SCRIPT_NAME | '' | The script name, e.g. /api/search/, used by gunicorn (the WSGI server). This will be used once it is clear how search-wsgi will be queried later on. |
CACHE_CONTROL_HEADER | 'public, max-age=600' | Cache-Control header value for the search endpoint. |
GZIP_COMPRESSION_LEVEL | 9 | GZIP compression level. |
WSGI_TIMEOUT | 1 | WSGI timeout. Note that the final timeout used is SEARCH_SPHINX_TIMEOUT + WSGI_TIMEOUT, so WSGI_TIMEOUT should be the maximum amount of time that the WSGI app needs to handle the data received from the Sphinx server. |
GUNICORN_WORKER_TMP_DIR | None | This should be set to a tmpfs file system for better performance. See https://docs.gunicorn.org/en/stable/settings.html#worker-tmp-dir. |
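
To illustrate how several of these variables relate to each other, here is a minimal sketch that turns them into a dictionary of gunicorn setting names. It is not the service's actual code: the function names are hypothetical, the `(2 x cores) + 1` worker formula is the rule of thumb from the gunicorn documentation and only an assumption about what the service does, and binding to `0.0.0.0` is likewise assumed.

```python
import multiprocessing


def get_workers(raw_value: str) -> int:
    """Resolve SEARCH_WORKERS; 0 or a negative value means derive from CPUs."""
    workers = int(raw_value)
    if workers <= 0:
        # Assumption: gunicorn's rule of thumb, (2 x cores) + 1.
        # The service may compute this differently.
        workers = multiprocessing.cpu_count() * 2 + 1
    return workers


def gunicorn_options(env: dict) -> dict:
    """Build a dict keyed by gunicorn setting names from environment values."""
    sphinx_timeout = int(env.get("SEARCH_SPHINX_TIMEOUT", "3"))
    wsgi_timeout = int(env.get("WSGI_TIMEOUT", "1"))
    proto_header = env.get("FORWARDED_PROTO_HEADER_NAME", "X-Forwarded-Proto")
    return {
        # Assumption: listen on all interfaces.
        "bind": f"0.0.0.0:{env.get('HTTP_PORT', '5000')}",
        "workers": get_workers(env.get("SEARCH_WORKERS", "0")),
        # The final timeout is the sum of both values, as the table describes.
        "timeout": sphinx_timeout + wsgi_timeout,
        "forwarded_allow_ips": env.get("FORWARED_ALLOW_IPS", "*"),
        # gunicorn upper-cases parsed request header names, so the key is
        # upper-cased here as a precaution.
        "secure_scheme_headers": {proto_header.upper(): "https"},
        "worker_tmp_dir": env.get("GUNICORN_WORKER_TMP_DIR"),
    }


if __name__ == "__main__":
    import os

    for key, value in gunicorn_options(dict(os.environ)).items():
        print(f"{key} = {value!r}")
```

With the defaults from the table, the resulting timeout is 3 + 1 = 4 seconds.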