An API server for the Georgetown University Center for Global Health Science and Security (GHSS) International Disease and Events Analysis (IDEA) Global Health Security (GHS) Tracking site (https://tracking.ghscosting.org/)
A list of all relevant web resources for this project follows.
- https://tracking.ghscosting.org. Distribution URL for main site. By default, uses API server at https://ghs-tracking-api-prod.ghscosting.org/. See repository `gida2/tree/master` for frontend code.
- https://devtracking.ghscosting.org. Distribution URL for development site. By default, uses API server at https://ghs-tracking-api-dev3.ghscosting.org/. See repository `gida2/tree/dev` for frontend code.
- https://ghs-tracking-api-prod.ghscosting.org/docs. Interactive documentation for main API server. By default, connects to database `ghs-tracking-green` or `ghs-tracking-blue` at AWS RDS host `ghs-tracking`. Corresponds to Elastic Beanstalk environment `ghs-tracking-api-prod-2`.
- https://ghs-tracking-api-dev3.ghscosting.org/docs. Interactive documentation for test API server. By default, connects to database `ghs-tracking-wip` at AWS RDS host `ghs-tracking`. Corresponds to Elastic Beanstalk environment `ghs-tracking-api-dev3`.
- https://metric-api-test.talusanalytics.com. Metrics API server consumed by the frontend, currently only on the PHEIC pages to obtain COVID-19 case and death data (for example, https://tracking.ghscosting.org/events/2019-2022-covid-19-pandemic). By default, connects to database `metric-amp` at AWS RDS host `talus-prod`. See repository `measles/tree/amp-metric-api/public-api`. TODO: refactor this API server code into its own repository to improve maintainability.
- Additional web resources are listed in the Google Doc "Standard operating procedures for data management" in section "Key links quick reference". These include various Airtables, Google Sheets, GitHub repositories, and other resources used in GHS Tracking. It also details the data ingest process.
- Additional technical information about the data in GHS Tracking is available in the Georgetown Global Health Security (GHS) Tracking site Technical appendix, which should also be kept current with each data update.
- https://console.cloud.google.com/welcome?project=my-project---tes-1494432576154. Google Cloud Platform project that enables Google Sheets API access from within Python code in the GHS Tracking data ingest system. Currently this is used by
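The site, API server, database, and Elastic Beanstalk pairings listed above can be summarized as a small lookup table. The following is only an illustrative sketch, not code from this repository: the `Deployment` class, the `"prod"` and `"dev"` keys, and the `api_docs_url` helper are hypothetical names, while the URLs and resource names are copied from the list above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Deployment:
    """One frontend/API/database pairing (hypothetical helper type)."""

    site_url: str
    api_url: str
    databases: Tuple[str, ...]
    eb_environment: str


# Keys "prod" and "dev" are illustrative labels, not names used by the project.
DEPLOYMENTS: Dict[str, Deployment] = {
    "prod": Deployment(
        site_url="https://tracking.ghscosting.org",
        api_url="https://ghs-tracking-api-prod.ghscosting.org",
        databases=("ghs-tracking-green", "ghs-tracking-blue"),
        eb_environment="ghs-tracking-api-prod-2",
    ),
    "dev": Deployment(
        site_url="https://devtracking.ghscosting.org",
        api_url="https://ghs-tracking-api-dev3.ghscosting.org",
        databases=("ghs-tracking-wip",),
        eb_environment="ghs-tracking-api-dev3",
    ),
}


def api_docs_url(name: str) -> str:
    """Return the interactive documentation URL for a deployment."""
    return DEPLOYMENTS[name].api_url + "/docs"
```

For example, `api_docs_url("dev")` builds the documentation URL of the test API server from the table.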
A description of the most important modules and packages in `ghs-tracking-api` follows.
- `api`. Package containing main API functionality, including defining the routing, API documentation, and functions that retrieve data from the database and return it as API responses.
- `cli`. Package containing the command line interface (CLI) for data management and ingest. Try `python -m ghst --help` from within your virtual environment to see help for commands, or follow the GHS Tracking data update brief checklist to make data updates.
- `D-Portal-Tracking`. A submodule used in the data ingest process to obtain IATI Registry data; only created if command `python -m ghst ingest auto -m iati` has been run.
- `db` and `db_metric`. Packages that handle getting a connection to the main COVID AMP database (containing policy data) and the COVID AMP metrics database (containing COVID-19 caseload/death data). Each contains a `models.py` module that defines the entities and data fields in the databases.
- `ingest`. Data ingest code, including packages for ingesting data from certain sources, processing data, tagging data, and more. Most data management can be done through the CLI, so you normally needn't run any code in this package directly.
- `main.py`. The main entrypoint module of the application (see checklist below for how to start it).
- `airtableio`. Input/output boilerplate code for interfacing with Airtable.
- `dataclient`. Implemented by package `airtableio` to support some functionality.
- `docker`. Currently unused. WIP code to Dockerize the application.
- `filewalker`. Utility for iterating over files within a directory with more customized functionality than the `path` package. Only used by the `ingest/assistance/acta` package.
- `googlesheetio`. Input/output boilerplate code for interfacing with Google Sheets.
- `issuesio`. Input/output for writing issues to the `issues` PostgreSQL table, which can be used to help adjudicate issues arising during data ingest.
- `logs`. Directory where all log files are written (not tracked in version control). A full log is written as well as a warnings-only log that includes any log record of level more severe than `INFO`.
- `research`. Primarily ad hoc data analysis packages and modules that are not required for data webservices or data ingest functionality. These are not guaranteed to work as-is and may not be comprehensively documented.
- `services`. Package implementing some common data service operations (CRUD) for entities in the GHS Tracking database. Note: some functionality currently in `services/stakeholdersvc` is duplicated elsewhere in the code base, but `services/stakeholdersvc` should be used and the other packages phased out over time.
- `sql`. SQL scripts containing queries or manipulation language that are executed by various Python packages, mostly for data ingest purposes. Note that not all of these SQL files are currently used, and some are data definition language for the database that may not be up-to-date with the current data structure.
- `topprojio`. Input/output code for interfacing with the large projects QA/QC matrix. This matrix is manually reviewed during each data update cycle to ensure the highest-valued projects have been QA/QCd.
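The two-file logging scheme described for the `logs` directory (a full log plus a warnings-only log) can be reproduced with Python's standard `logging` module. This is a minimal sketch under stated assumptions, not the project's actual configuration: the logger name, the file names `full.log` and `warnings.log`, and the choice of `DEBUG` as the threshold for the full log are all hypothetical.

```python
import logging
from pathlib import Path


def configure_logging(log_dir: str = "logs") -> logging.Logger:
    """Attach a full log and a warnings-only log to one logger."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("ghst_example")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)

    # Remove handlers from earlier calls so records are not written twice.
    for old_handler in list(logger.handlers):
        logger.removeHandler(old_handler)
        old_handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # Full log: assumed here to record everything from DEBUG upward.
    full_handler = logging.FileHandler(directory / "full.log")
    full_handler.setLevel(logging.DEBUG)

    # Warnings-only log: records more severe than INFO (WARNING and above).
    warning_handler = logging.FileHandler(directory / "warnings.log")
    warning_handler.setLevel(logging.WARNING)

    for handler in (full_handler, warning_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

With this setup, a call such as `logger.info(...)` lands only in the full log, while `logger.warning(...)` lands in both files.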
The following instructions assume you are running macOS.
- Open a Terminal window. If you are unsure how, follow instructions here.
- If you haven't installed `pipenv` already, do so by following instructions here.
- Check whether you have installed `pyenv` by running the following command. You should get output describing the version of `pyenv` that is installed: `pyenv --version`
- If `pyenv` was found, skip to the next step. Otherwise, install `pyenv` by running `brew install pyenv`
- To install the necessary Python version, run `pyenv install 3.7.13`
- To set the local Python version, run `pyenv local 3.7.13`
- Install or upgrade your installation of `pipenv`. If you need to install it, follow the pipenv installation instructions. If you installed it with Homebrew, run `brew upgrade pipenv`. If you installed it with pip, run `pip install --upgrade pipenv`
- To install Python packages, run `pyenv exec pipenv install --python=3.7.13 --dev`
- To run the GHS Tracking data management setup program, run `pipenv run python setup.py`. The program will ask for an Airtable API key (which you can find at https://airtable.com/account) and your local PostgreSQL connection info. You only need to provide the Airtable API key if you will be doing data ingest. You can edit this info later in the file `.env`.
- Clone a copy of the Tracking database locally with command `pipenv run python -m ghst database clone-from-cloud -u [YOUR_LOCAL_POSTGRESQL_USERNAME] -d tracking -dc ghs-tracking-green`
- Confirm that the database name and username in `.env` are `tracking` and your local PostgreSQL username, respectively.
- To start the API server locally, run `pipenv run uvicorn main:app --reload`
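Once the server has started, one way to confirm it is responding is to request its interactive documentation page. The sketch below uses only the Python standard library and rests on two assumptions: that Uvicorn is listening on its default address `http://127.0.0.1:8000` (no `--host` or `--port` was passed), and that the local server exposes `/docs` in the same way the deployed API servers do. The function names are hypothetical.

```python
import urllib.error
import urllib.request

# Uvicorn's default bind address when --host/--port are not given.
LOCAL_BASE_URL = "http://127.0.0.1:8000"


def docs_url(base_url: str) -> str:
    """Return the interactive documentation URL for an API base URL."""
    return base_url.rstrip("/") + "/docs"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False
```

Calling `is_reachable(docs_url(LOCAL_BASE_URL))` from a second terminal should return `True` while the server is running, and `False` if it has not started or is listening elsewhere.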