From 1f102a3572ad07a7d9de31e53b746405fc70f41e Mon Sep 17 00:00:00 2001 From: Github aggregate action Date: Tue, 14 Nov 2023 07:37:50 +0000 Subject: [PATCH] Update from neicnordic/sensitive-data-archive at 07:37 on 2023-11-14 --- docs/services/finalize.md | 147 ++++++++++++++++++++----------------- docs/services/ingest.md | 16 ++-- docs/services/intercept.md | 101 +++++++------------------ docs/services/mapper.md | 122 +++++++++++++----------------- docs/services/verify.md | 127 ++++++++++++++------------------ 5 files changed, 222 insertions(+), 291 deletions(-) diff --git a/docs/services/finalize.md b/docs/services/finalize.md index d2af7d3..892fde2 100644 --- a/docs/services/finalize.md +++ b/docs/services/finalize.md @@ -2,19 +2,20 @@ Handles the so-called _Accession ID (stable ID)_ to filename mappings from Central EGA. - ## Configuration There are a number of options that can be set for the finalize service. These settings can be set by mounting a yaml-file at `/config.yaml` with settings. - ex. + ```yaml log: level: "debug" format: "json" ``` + They may also be set using environment variables like: + ```bash export LOG_LEVEL="debug" export LOG_FORMAT="json" @@ -24,40 +25,29 @@ export LOG_FORMAT="json" These settings control how finalize connects to the RabbitMQ message broker. 
- - `BROKER_HOST`: hostname of the rabbitmq server - - - `BROKER_PORT`: rabbitmq broker port (commonly `5671` with TLS and `5672` without) - - - `BROKER_QUEUE`: message queue to read messages from (commonly `accessionIDs`) - - - `BROKER_ROUTINGKEY`: message queue to write success messages to (commonly `backup`) - - - `BROKER_USER`: username to connect to rabbitmq - - - `BROKER_PASSWORD`: password to connect to rabbitmq - - - `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) - -### PostgreSQL Database settings: - - - `DB_HOST`: hostname for the postgresql database - - - `DB_PORT`: database port (commonly 5432) - - - `DB_USER`: username for the database - - - `DB_PASSWORD`: password for the database - - - `DB_DATABASE`: database name - - - `DB_SSLMODE`: The TLS encryption policy to use for database connections. - Valid options are: - - `disable` - - `allow` - - `prefer` - - `require` - - `verify-ca` - - `verify-full` +- `BROKER_HOST`: hostname of the RabbitMQ server +- `BROKER_PORT`: RabbitMQ broker port (commonly `5671` with TLS and `5672` without) +- `BROKER_QUEUE`: message queue to read messages from (commonly `accessionIDs`) +- `BROKER_ROUTINGKEY`: message queue to write success messages to (commonly `backup`) +- `BROKER_USER`: username to connect to RabbitMQ +- `BROKER_PASSWORD`: password to connect to RabbitMQ +- `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) + +### PostgreSQL Database settings + +- `DB_HOST`: hostname for the postgresql database +- `DB_PORT`: database port (commonly 5432) +- `DB_USER`: username for the database +- `DB_PASSWORD`: password for the database +- `DB_DATABASE`: database name +- `DB_SSLMODE`: The TLS encryption policy to use for database connections. 
+ Valid options are: + - `disable` + - `allow` + - `prefer` + - `require` + - `verify-ca` + - `verify-full` More information is available [in the postgresql documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) @@ -65,50 +55,71 @@ These settings control how finalize connects to the RabbitMQ message broker. Note that if `DB_SSLMODE` is set to anything but `disable`, then `DB_CACERT` needs to be set, and if set to `verify-full`, then `DB_CLIENTCERT`, and `DB_CLIENTKEY` must also be set - - `DB_CLIENTKEY`: key-file for the database client certificate - - - `DB_CLIENTCERT`: database client certificate file +- `DB_CLIENTKEY`: key-file for the database client certificate +- `DB_CLIENTCERT`: database client certificate file +- `DB_CACERT`: Certificate Authority (CA) certificate for the database to use - - `DB_CACERT`: Certificate Authority (CA) certificate for the database to use +### Logging settings -### Logging settings: +- `LOG_FORMAT` can be set to “json” to get logs in json format. + All other values result in text logging - - `LOG_FORMAT` can be set to “json” to get logs in json format. - All other values result in text logging +- `LOG_LEVEL` can be set to one of the following, in increasing order of severity: + - `trace` + - `debug` + - `info` + - `warn` (or `warning`) + - `error` + - `fatal` + - `panic` - - `LOG_LEVEL` can be set to one of the following, in increasing order of severity: - - `trace` - - `debug` - - `info` - - `warn` (or `warning`) - - `error` - - `fatal` - - `panic` +### Storage settings -## Service Description -Finalize adds stable, shareable _Accession ID_'s to archive files. -When running, finalize reads messages from the configured RabbitMQ queue (default "accessionIDs"). -For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message): +Storage backend is defined by the `ARCHIVE_TYPE`, and `BACKUP_TYPE` variables. 
+Valid values for these options are `S3` or `POSIX` +(Defaults to `POSIX` on unknown values). -1. The message is validated as valid JSON that matches the "ingestion-accession" schema (defined in sda-common). -If the message can’t be validated it is discarded with an error message in the logs. +The value of these variables define what other variables are read. +The same variables are available for all storage types, differing by prefix (`ARCHIVE_`, or `BACKUP_`) -1. if the type of the `DecryptedChecksums` field in the message is `sha256`, the value is stored. +if `*_TYPE` is `S3` then the following variables are available: -1. A new RabbitMQ "complete" message is created and validated against the "ingestion-completion" schema. -If the validation fails, an error message is written to the logs. +- `*_URL`: URL to the S3 system +- `*_ACCESSKEY`: The S3 access and secret key are used to authenticate to S3, +[more info at AWS](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) +- `*_SECRETKEY`: The S3 access and secret key are used to authenticate to S3, +[more info at AWS](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) +- `*_BUCKET`: The S3 bucket to use as the storage root +- `*_PORT`: S3 connection port (default: `443`) +- `*_REGION`: S3 region (default: `us-east-1`) +- `*_CHUNKSIZE`: S3 chunk size for multipart uploads. +- `*_CACERT`: Certificate Authority (CA) certificate for the storage system, this is only needed if the S3 server has a certificate signed by a private entity -1. The file accession ID in the message is marked as "ready" in the database. -On error the service sleeps for up to 5 minutes to allow for database recovery, after 5 minutes the message is Nacked, re-queued and an error message is written to the logs. +and if `*_TYPE` is `POSIX`: -1. The complete message is sent to RabbitMQ. On error, a message is written to the logs. 
+- `*_LOCATION`: POSIX path to use as storage root -1. The original RabbitMQ message is Ack'ed. +## Service Description -## Communication +Finalize adds stable, shareable _Accession ID_'s to archive files. +If a backup location is configured, it also backs up the file. +When running, finalize reads messages from the configured RabbitMQ queue (default "accessionIDs"). +For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message): - - Finalize reads messages from one rabbitmq queue (default `accessionIDs`). +1. The message is validated as valid JSON that matches the "ingestion-accession" schema. If the message can’t be validated it is discarded with an error message in the logs. +2. If the service is configured to perform backups, i.e. the `ARCHIVE_` and `BACKUP_` storage backends are set, archived files will be copied to the backup location. + 1. The file size on disk is requested from the storage system. + 2. The database file size is compared against the disk file size. + 3. A file reader is created for the archive storage file, and a file writer is created for the backup storage file. +3. The file data is copied from the archive file reader to the backup file writer. +4. If the type of the `DecryptedChecksums` field in the message is `sha256`, the value is stored. +5. A new RabbitMQ "complete" message is created and validated against the "ingestion-completion" schema. If the validation fails, an error message is written to the logs. +6. The file accession ID in the message is marked as "ready" in the database. On error the service sleeps for up to 5 minutes to allow for database recovery, after 5 minutes the message is Nacked, re-queued and an error message is written to the logs. +7. The complete message is sent to RabbitMQ. On error, a message is written to the logs. +8. The original RabbitMQ message is Ack'ed. - - Finalize writes messages to one rabbitmq queue (default `backup`). 
+## Communication - - Finalize assigns the accession ID to a file in the database using the `SetAccessionID` function. +- Finalize reads messages from one RabbitMQ queue (default `accessionIDs`). +- Finalize writes messages to one RabbitMQ queue (default `backup`). +- Finalize assigns the accession ID to a file in the database using the `SetAccessionID` function. diff --git a/docs/services/ingest.md b/docs/services/ingest.md index d2f1242..95d5cf5 100644 --- a/docs/services/ingest.md +++ b/docs/services/ingest.md @@ -31,17 +31,17 @@ These settings control which crypt4gh keyfile is loaded. These settings control how ingest connects to the RabbitMQ message broker. - - `BROKER_HOST`: hostname of the rabbitmq server + - `BROKER_HOST`: hostname of the RabbitMQ server - - `BROKER_PORT`: rabbitmq broker port (commonly `5671` with TLS and `5672` without) + - `BROKER_PORT`: RabbitMQ broker port (commonly `5671` with TLS and `5672` without) - `BROKER_QUEUE`: message queue to read messages from (commonly `ingest`) - `BROKER_ROUTINGKEY`: message queue to write success messages to (commonly `archived`) - - `BROKER_USER`: username to connect to rabbitmq + - `BROKER_USER`: username to connect to RabbitMQ - - `BROKER_PASSWORD`: password to connect to rabbitmq + - `BROKER_PASSWORD`: password to connect to RabbitMQ - `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) @@ -123,12 +123,12 @@ The ingest service copies files from the file inbox to the archive, and register When running, ingest reads messages from the configured RabbitMQ queue (default: "ingest"). For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message): -1. The message is validated as valid JSON that matches the "ingestion-trigger" schema (defined in sda-common). +1. The message is validated as valid JSON that matches the "ingestion-trigger" schema. 
If the message can’t be validated it is discarded with an error message in the logs. 1. If the message is of type `cancel`, the file will be marked as `disabled` and the next message in the queue will be read. -2. A file reader is created for the filepath in the message. +1. A file reader is created for the filepath in the message. If the file reader can’t be created an error is written to the logs, the message is Nacked and forwarded to the error queue. 1. The file size is read from the file reader. @@ -161,9 +161,9 @@ This error does not halt ingestion. ## Communication - - Ingest reads messages from one rabbitmq queue (commonly `ingest`). + - Ingest reads messages from one RabbitMQ queue (commonly `ingest`). - - Ingest writes messages to one rabbitmq queue (commonly `archived`). + - Ingest writes messages to one RabbitMQ queue (commonly `archived`). - Ingest inserts file information in the database using three database functions, `InsertFile`, `StoreHeader`, and `SetArchived`. diff --git a/docs/services/intercept.md b/docs/services/intercept.md index 4495267..e4aaed0 100644 --- a/docs/services/intercept.md +++ b/docs/services/intercept.md @@ -8,12 +8,15 @@ There are a number of options that can be set for the intercept service. These settings can be set by mounting a yaml-file at `/config.yaml` with settings. ex. + ```yaml log: level: "debug" format: "json" ``` + They may also be set using environment variables like: + ```bash export LOG_LEVEL="debug" export LOG_FORMAT="json" @@ -23,86 +26,36 @@ export LOG_FORMAT="json" These settings control how intercept connects to the RabbitMQ message broker. 
- - `BROKER_HOST`: hostname of the rabbitmq server - - - `BROKER_PORT`: rabbitmq broker port (commonly `5671` with TLS and `5672` without) - - - `BROKER_QUEUE`: message queue to read messages from (commonly `files`) - - - `BROKER_USER`: username to connect to rabbitmq - - - `BROKER_PASSWORD`: password to connect to rabbitmq - -### PostgreSQL Database settings: - - - `DB_HOST`: hostname for the postgresql database - - - `DB_PORT`: database port (commonly 5432) - - - `DB_USER`: username for the database - - - `DB_PASSWORD`: password for the database - - - `DB_DATABASE`: database name +- `BROKER_HOST`: hostname of the RabbitMQ server +- `BROKER_PORT`: RabbitMQ broker port (commonly `5671` with TLS and `5672` without) +- `BROKER_QUEUE`: message queue to read messages from (commonly `from_cega`) +- `BROKER_USER`: username to connect to RabbitMQ +- `BROKER_PASSWORD`: password to connect to RabbitMQ - - `DB_SSLMODE`: The TLS encryption policy to use for database connections. - Valid options are: - - `disable` - - `allow` - - `prefer` - - `require` - - `verify-ca` - - `verify-full` +### Logging settings - More information is available - [in the postgresql documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) - - Note that if `DB_SSLMODE` is set to anything but `disable`, then `DB_CACERT` needs to be set, - and if set to `verify-full`, then `DB_CLIENTCERT`, and `DB_CLIENTKEY` must also be set - - - `DB_CLIENTKEY`: key-file for the database client certificate - - - `DB_CLIENTCERT`: database client certificate file - - - `DB_CACERT`: Certificate Authority (CA) certificate for the database to use - -### Logging settings: - - - `LOG_FORMAT` can be set to “json” to get logs in json format. 
- All other values result in text logging - - - `LOG_LEVEL` can be set to one of the following, in increasing order of severity: - - `trace` - - `debug` - - `info` - - `warn` (or `warning`) - - `error` - - `fatal` - - `panic` +- `LOG_FORMAT` can be set to “json” to get logs in json format, all other values result in text logging +- `LOG_LEVEL` can be set to one of the following, in increasing order of severity: + - `trace` + - `debug` + - `info` + - `warn` (or `warning`) + - `error` + - `fatal` + - `panic` ## Service Description -When running, intercept reads messages from the configured RabbitMQ queue (default: "files"). -For each message, these steps are taken (if not otherwise noted, errors halt progress, the message is Nack'ed, the error is written to the log, and to the rabbitMQ error queue. -Then the service moves on to the next message): +When running, intercept reads messages from the configured RabbitMQ queue (default: "from_cega"). +For each message, these steps are taken: -1. The message type is read from the message "type" field. - -1. The message schema is read from the message "msgType" field. - -1. The message is validated as valid JSON following the schema read in the previous step. -If this fails an error is written to the logs, but not to the error queue and the message is not Ack'ed or Nack'ed. - -1. The correct queue for the message is decided based on message type. -This is not supposed to be able to fail. - -1. The message is re-sent to the correct queue. -This has no error handling as the resend-mechanism hasn't been finished. - -1. The message is Ack'ed. +1. The message type is read from the message `type` field. + 1. If the message `type` is not known, an error is logged and the message is Ack'ed. +2. The correct queue for the message is decided based on message type. +3. The message is sent to the queue. This has no error handling as the resend-mechanism hasn't been finished. +4. The message is Ack'ed. 
## Communication - - Intercept reads messages from one rabbitmq queue (default `files`). - - - Intercept writes messages to three rabbitmq queues, `accessionIDs`, `ingest`, and `mappings`. +- Intercept reads messages from one queue (default `from_cega`). +- Intercept writes messages to three queues, `accession`, `ingest`, and `mappings`. diff --git a/docs/services/mapper.md b/docs/services/mapper.md index c34467b..03dd5f4 100644 --- a/docs/services/mapper.md +++ b/docs/services/mapper.md @@ -8,12 +8,15 @@ There are a number of options that can be set for the mapper service. These settings can be set by mounting a yaml-file at `/config.yaml` with settings. ex. + ```yaml log: level: "debug" format: "json" ``` + They may also be set using environment variables like: + ```bash export LOG_LEVEL="debug" export LOG_FORMAT="json" @@ -23,64 +26,46 @@ export LOG_FORMAT="json" These settings control how mapper connects to the RabbitMQ message broker. - - `BROKER_HOST`: hostname of the rabbitmq server - - - `BROKER_PORT`: rabbitmq broker port (commonly `5671` with TLS and `5672` without) - - - `BROKER_QUEUE`: message queue to read messages from (commonly `mapper`) - - - `BROKER_USER`: username to connect to rabbitmq - - - `BROKER_PASSWORD`: password to connect to rabbitmq - - - `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) - -### PostgreSQL Database settings: - - - `DB_HOST`: hostname for the postgresql database - - - `DB_PORT`: database port (commonly 5432) - - - `DB_USER`: username for the database - - - `DB_PASSWORD`: password for the database - - - `DB_DATABASE`: database name - - - `DB_SSLMODE`: The TLS encryption policy to use for database connections. 
- Valid options are: - - `disable` - - `allow` - - `prefer` - - `require` - - `verify-ca` - - `verify-full` - - More information is available - [in the postgresql documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) - - Note that if `DB_SSLMODE` is set to anything but `disable`, then `DB_CACERT` needs to be set, - and if set to `verify-full`, then `DB_CLIENTCERT`, and `DB_CLIENTKEY` must also be set - - - `DB_CLIENTKEY`: key-file for the database client certificate - - - `DB_CLIENTCERT`: database client certificate file - - - `DB_CACERT`: Certificate Authority (CA) certificate for the database to use - -### Logging settings: - - - `LOG_FORMAT` can be set to “json” to get logs in json format. - All other values result in text logging - - - `LOG_LEVEL` can be set to one of the following, in increasing order of severity: - - `trace` - - `debug` - - `info` - - `warn` (or `warning`) - - `error` - - `fatal` - - `panic` +- `BROKER_HOST`: hostname of the RabbitMQ server +- `BROKER_PORT`: RabbitMQ broker port (commonly `5671` with TLS and `5672` without) +- `BROKER_QUEUE`: message queue to read messages from (commonly `mapper`) +- `BROKER_USER`: username to connect to RabbitMQ +- `BROKER_PASSWORD`: password to connect to RabbitMQ +- `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) + +### PostgreSQL Database settings + +- `DB_HOST`: hostname for the postgresql database +- `DB_PORT`: database port (commonly 5432) +- `DB_USER`: username for the database +- `DB_PASSWORD`: password for the database +- `DB_DATABASE`: database name +- `DB_SSLMODE`: The TLS encryption policy to use for database connections. 
Valid options are: + - `disable` + - `allow` + - `prefer` + - `require` + - `verify-ca` + - `verify-full` + +More information is available in the [postgresql documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) +Note that if `DB_SSLMODE` is set to anything but `disable`, then `DB_CACERT` needs to be set, and if set to `verify-full`, then `DB_CLIENTCERT`, and `DB_CLIENTKEY` must also be set + +- `DB_CLIENTKEY`: key-file for the database client certificate +- `DB_CLIENTCERT`: database client certificate file +- `DB_CACERT`: Certificate Authority (CA) certificate for the database to use + +### Logging settings + +- `LOG_FORMAT` can be set to “json” to get logs in json format. All other values result in text logging +- `LOG_LEVEL` can be set to one of the following, in increasing order of severity: + - `trace` + - `debug` + - `info` + - `warn` (or `warning`) + - `error` + - `fatal` + - `panic` ## Service Description @@ -89,20 +74,17 @@ The mapper service maps file accessionIDs to datasetIDs. When running, mapper reads messages from the configured RabbitMQ queue (default: "mappings"). For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message): -1. The message is validated as valid JSON that matches the "dataset-mapping" schema (defined in sda-common). -If the message can’t be validated it is discarded with an error message in the logs. - -1. AccessionIDs from the message are mapped to a datasetID (also in the message) in the database. +1. The message is validated as valid JSON that matches the "dataset-mapping" schema. +If the message can’t be validated it is discarded and an error message is logged. +2. AccessionIDs from the message are mapped to a datasetID (also in the message) in the database. 
On error the service sleeps for up to 5 minutes to allow for database recovery, after 5 minutes the message is Nacked, re-queued and an error message is written to the logs. - -1. The uploaded files for each AccessionID is removed from the inbox +3. The uploaded files related to each AccessionID are removed from the inbox If this fails an error will be written to the logs. - -2. The RabbitMQ message is Ack'ed. - +4. The RabbitMQ message is Ack'ed. ## Communication - - Mapper reads messages from one rabbitmq queue (default `mappings`). - - - Mapper maps files to datasets in the database using the `MapFilesToDataset` function. +- Mapper reads messages from one RabbitMQ queue (default `mappings`). +- Mapper maps files to datasets in the database using the `MapFilesToDataset` function. +- Mapper retrieves the inbox filepath from the database for each file using the `GetInboxPath` function. +- Mapper sets the status of a dataset in the database using the `UpdateDatasetEvent` function. diff --git a/docs/services/verify.md b/docs/services/verify.md index f348cdb..cbf5117 100644 --- a/docs/services/verify.md +++ b/docs/services/verify.md @@ -8,12 +8,15 @@ There are a number of options that can be set for the verify service. These settings can be set by mounting a yaml-file at `/config.yaml` with settings. ex. + ```yaml log: level: "debug" format: "json" ``` + They may also be set using environment variables like: + ```bash export LOG_LEVEL="debug" export LOG_FORMAT="json" @@ -23,47 +26,35 @@ export LOG_FORMAT="json" These settings control which crypt4gh keyfile is loaded. - - `C4GH_FILEPATH`: filepath to the crypt4gh keyfile - - `C4GH_PASSPHRASE`: pass phrase to unlock the keyfile +- `C4GH_FILEPATH`: filepath to the crypt4gh keyfile +- `C4GH_PASSPHRASE`: pass phrase to unlock the keyfile ### RabbitMQ broker settings These settings control how verify connects to the RabbitMQ message broker. 
- - `BROKER_HOST`: hostname of the rabbitmq server - - - `BROKER_PORT`: rabbitmq broker port (commonly `5671` with TLS and `5672` without) - - - `BROKER_QUEUE`: message queue to read messages from (commonly `archived`) - - - `BROKER_ROUTINGKEY`: message queue to write success messages to (commonly `verified`) - - - `BROKER_USER`: username to connect to rabbitmq - - - `BROKER_PASSWORD`: password to connect to rabbitmq - - - `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) - -### PostgreSQL Database settings: - - - `DB_HOST`: hostname for the postgresql database - - - `DB_PORT`: database port (commonly 5432) - - - `DB_USER`: username for the database - - - `DB_PASSWORD`: password for the database - - - `DB_DATABASE`: database name - - - `DB_SSLMODE`: The TLS encryption policy to use for database connections. - Valid options are: - - `disable` - - `allow` - - `prefer` - - `require` - - `verify-ca` - - `verify-full` +- `BROKER_HOST`: hostname of the RabbitMQ server +- `BROKER_PORT`: RabbitMQ broker port (commonly `5671` with TLS and `5672` without) +- `BROKER_QUEUE`: message queue to read messages from (commonly `archived`) +- `BROKER_ROUTINGKEY`: message queue to write success messages to (commonly `verified`) +- `BROKER_USER`: username to connect to RabbitMQ +- `BROKER_PASSWORD`: password to connect to RabbitMQ +- `BROKER_PREFETCHCOUNT`: Number of messages to pull from the message server at the time (default to 2) + +### PostgreSQL Database settings + +- `DB_HOST`: hostname for the postgresql database +- `DB_PORT`: database port (commonly 5432) +- `DB_USER`: username for the database +- `DB_PASSWORD`: password for the database +- `DB_DATABASE`: database name +- `DB_SSLMODE`: The TLS encryption policy to use for database connections, valid options are: + - `disable` + - `allow` + - `prefer` + - `require` + - `verify-ca` + - `verify-full` More information is available [in the postgresql 
documentation](https://www.postgresql.org/docs/current/libpq-ssl.html#LIBPQ-SSL-PROTECTION) @@ -71,11 +62,9 @@ These settings control how verify connects to the RabbitMQ message broker. Note that if `DB_SSLMODE` is set to anything but `disable`, then `DB_CACERT` needs to be set, and if set to `verify-full`, then `DB_CLIENTCERT`, and `DB_CLIENTKEY` must also be set - - `DB_CLIENTKEY`: key-file for the database client certificate - - - `DB_CLIENTCERT`: database client certificate file - - - `DB_CACERT`: Certificate Authority (CA) certificate for the database to use +- `DB_CLIENTKEY`: key-file for the database client certificate +- `DB_CLIENTCERT`: database client certificate file +- `DB_CACERT`: Certificate Authority (CA) certificate for the database to use ### Storage settings @@ -87,34 +76,33 @@ The value of these variables define what other variables are read. The same variables are available for all storage types, differing by prefix (`ARCHIVE_`, or `INBOX_`) if `*_TYPE` is `S3` then the following variables are available: - - `*_URL`: URL to the S3 system - - `*_ACCESSKEY`: The S3 access and secret key are used to authenticate to S3, + +- `*_URL`: URL to the S3 system +- `*_ACCESSKEY`: The S3 access and secret key are used to authenticate to S3, [more info at AWS](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) - - `*_SECRETKEY`: The S3 access and secret key are used to authenticate to S3, +- `*_SECRETKEY`: The S3 access and secret key are used to authenticate to S3, [more info at AWS](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys) - - `*_BUCKET`: The S3 bucket to use as the storage root - - `*_PORT`: S3 connection port (default: `443`) - - `*_REGION`: S3 region (default: `us-east-1`) - - `*_CHUNKSIZE`: S3 chunk size for multipart uploads. 
-# CA certificate is only needed if the S3 server has a certificate signed by a private entity - - `*_CACERT`: Certificate Authority (CA) certificate for the storage system +- `*_BUCKET`: The S3 bucket to use as the storage root +- `*_PORT`: S3 connection port (default: `443`) +- `*_REGION`: S3 region (default: `us-east-1`) +- `*_CHUNKSIZE`: S3 chunk size for multipart uploads. +- `*_CACERT`: Certificate Authority (CA) certificate for the storage system, this is only needed if the S3 server has a certificate signed by a private entity and if `*_TYPE` is `POSIX`: - - `*_LOCATION`: POSIX path to use as storage root -### Logging settings: +- `*_LOCATION`: POSIX path to use as storage root - - `LOG_FORMAT` can be set to “json” to get logs in json format. - All other values result in text logging +### Logging settings - - `LOG_LEVEL` can be set to one of the following, in increasing order of severity: - - `trace` - - `debug` - - `info` - - `warn` (or `warning`) - - `error` - - `fatal` - - `panic` +- `LOG_FORMAT` can be set to “json” to get logs in json format. All other values result in text logging +- `LOG_LEVEL` can be set to one of the following, in increasing order of severity: + - `trace` + - `debug` + - `info` + - `warn` (or `warning`) + - `error` + - `fatal` + - `panic` ## Service Description @@ -124,7 +112,7 @@ When running, verify reads messages from the configured RabbitMQ queue (default: For each message, these steps are taken (if not otherwise noted, errors halt progress and the service moves on to the next message. Unless explicitly stated, error messages are *not* written to the RabbitMQ error queue, and messages are not NACK or ACKed.): -1. The message is validated as valid JSON that matches the "ingestion-verification" schema (defined in sda-common). +1. The message is validated as valid JSON that matches the "ingestion-verification" schema. If the message can’t be validated it is discarded with an error message in the logs. 1. 
The service attempts to fetch the header for the file id in the message from the database. @@ -159,11 +147,8 @@ Otherwise the processing continues with verification: ## Communication - - Verify reads messages from one rabbitmq queue (commonly `archived`). - - - Verify writes messages to one rabbitmq queue (commonly `verified`). - - - Verify gets the file encryption header from the database using `GetHeader`, - and marks the files as `verified` (`COMPLETED` in db version <= 2.0) using `MarkCompleted`. - - - Verify reads file data from archive storage and removes data from inbox storage. +- Verify reads messages from one RabbitMQ queue (commonly `archived`). +- Verify writes messages to one RabbitMQ queue (commonly `verified`). +- Verify gets the file encryption header from the database using `GetHeader`, +and marks the files as `verified` (`COMPLETED` in db version <= 2.0) using `MarkCompleted`. +- Verify reads file data from archive storage and removes data from inbox storage.