Commit

merge updates from master
tuzov committed Dec 13, 2023
2 parents 39864cf + bf28308 commit 4b9266d
Showing 40 changed files with 1,377 additions and 855 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/new-question.md
@@ -1,6 +1,6 @@
---
name: Support Issue
about: Ask for support on running and/or developing LocalEGA
about: Ask for support on running and/or developing FederatedEGA
labels: Support

---
4 changes: 4 additions & 0 deletions .github/workflows/aggregate.yml
@@ -12,6 +12,10 @@ jobs:
build:
runs-on: ubuntu-latest
steps:
- run: |
sudo apt-get install -yq aspell
pip3 install -q pyspelling
- uses: actions/checkout@v4
with:
ssh-key: ${{ secrets.SSH_PRIVATE_KEY }}
2 changes: 1 addition & 1 deletion .github/workflows/docs.yml
@@ -15,7 +15,7 @@ jobs:
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
2 changes: 1 addition & 1 deletion .github/workflows/spellcheck.yml
@@ -13,7 +13,7 @@ jobs:

steps:
- uses: actions/checkout@v4
- uses: rojopolis/spellcheck-github-actions@0.34.0
- uses: rojopolis/spellcheck-github-actions@0.35.0
name: Spellcheck
with:
config_path: .pyspelling.yml
2 changes: 1 addition & 1 deletion README.md
@@ -15,7 +15,7 @@ Source code for core components is available at: https://github.com/neicnordic/s

| Component | Role |
|---------------|------|
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via LifeScience AAI. [s3inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/cmd/s3inbox/s3inbox.md) or [sftp-inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-sftp-inbox/README.md) |
| inbox | SFTP, S3 or HTTPS server, acting as a dropbox, where user credentials are fetched from CentralEGA or via [Life Science AAI](https://lifescience-ri.eu/). [s3inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda/cmd/s3inbox/s3inbox.md) or [sftp-inbox](https://github.com/neicnordic/sensitive-data-archive/tree/main/sda-sftp-inbox/README.md) |
| intercept | The intercept service relays message between the queue provided from the federated service and local queues. **(Required for Federated EGA use case)** |
| ingest | Split the Crypt4GH header and move the remainder to the storage backend. No cryptographic task, nor access to the decryption keys. |
| verify | Decrypt the stored files and checksum them against their embedded checksum. |
4 changes: 4 additions & 0 deletions aggregate-mappings.json
@@ -5,6 +5,10 @@
"sda/cmd/intercept/intercept.md": "docs/services/intercept.md",
"sda/cmd/mapper/mapper.md": "docs/services/mapper.md",
"sda/cmd/verify/verify.md": "docs/services/verify.md",
"sda/cmd/s3inbox/s3inbox.md": "docs/services/s3inbox.md",
"sda/cmd/syncapi/syncapi.md": "docs/services/syncapi.md",
"sda/cmd/sync/sync.md": "docs/services/sync.md",
"GETTINGSTARTED.md": "docs/guides/sda-dev-test-doc.md",
"sda/sda.md": "docs/services/sda.md"
}
}
15 changes: 15 additions & 0 deletions aggregate-repositories.sh
@@ -29,6 +29,21 @@ do
fi
done

# add special use case for sda.md links

sed -i -E 's#cmd\/([a-z0-9\-]+)\/#''#g' docs/services/sda.md
git add docs/services/sda.md

# update wordlist
spell_result=$(pyspelling | awk '!/^<context>|^Misspelled|^--|check failed|Spelling check passed/ && NF > 0')

if [ -n "$spell_result" ]
then
echo "$spell_result" >> docs/dictionary/wordlist.txt
sort -u docs/dictionary/wordlist.txt -o docs/dictionary/wordlist.txt
git add docs/dictionary/wordlist.txt
fi
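
Review note: the two additions above (the `sed` rewrite of `docs/services/sda.md` and the `pyspelling | awk` wordlist update) can be restated in Python to make their effect easier to check. This is a minimal sketch, not part of the commit. The function names and the sample pyspelling output are hypothetical; the output shape is only inferred from the patterns the `awk` filter excludes. In the shell command the `''` inside the `sed` expression is an empty string, so every `cmd/<name>/` path segment is removed.

```python
import re

# Same pattern as the sed expression: "cmd/<lowercase, digits or hyphens>/".
CMD_PREFIX = re.compile(r"cmd/([a-z0-9\-]+)/")

# Same alternatives as the awk filter; the first three are anchored at line start.
NOISE = re.compile(r"^<context>|^Misspelled|^--|check failed|Spelling check passed")


def strip_cmd_prefix(markdown):
    """Drop every 'cmd/<name>/' segment, as the sed substitution does."""
    return CMD_PREFIX.sub("", markdown)


def new_words(pyspelling_output):
    """Keep non-empty lines that are not pyspelling status or context lines."""
    return [
        line
        for line in pyspelling_output.splitlines()
        if line.split() and not NOISE.search(line)  # line.split() mirrors NF > 0
    ]


def merge_wordlist(existing, words):
    """Append the new words and deduplicate, similar to `sort -u`."""
    return sorted(set(existing) | set(words))


if __name__ == "__main__":
    print(strip_cmd_prefix("[ingest](cmd/ingest/ingest.md)"))
    # Hypothetical pyspelling output, for illustration only.
    sample = "Misspelled words:\n<context> docs/x.md\n-----\ncryptgh\n-----\n"
    print(merge_wordlist(["vhost"], new_words(sample)))
```

Note that Python's `sorted` orders by code point, while `sort -u` follows the current locale, so the resulting order can differ for mixed-case or non-ASCII entries.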

# check if there are any changes
if ! git status | grep 'nothing to commit'
then
14 changes: 6 additions & 8 deletions docs/CONTRIBUTING.md
@@ -1,7 +1,6 @@
# Contributing guidelines

We thank you in advance :thumbsup: :tada: for taking the time to contribute, whether with *code* or with *ideas*, to the NeIC SDA project.
# Contributing Guidelines

We thank you in advance 👍 🎉 for taking the time to contribute, whether with *code* or with *ideas*, to the NeIC SDA project.

## Did you find a bug?

@@ -43,7 +42,6 @@ Once the feature is done you can request it to be merged back into `master` by m
Before making the pull request it is a good idea to rebase your branch to `master` to ensure that eventual conflicts with the `master` branch is solved before the PR is reviewed and we can therefore have a clean merge.



### General stuff about git and commit messages

In general it is better to commit often. Small commits are easier to roll back and also makes the code easier to review.
@@ -69,7 +67,7 @@ Some tips about writing helpful commit messages:
6. Wrap the body at 72 characters.
7. Use the body to explain what and why vs. how.

For an in-depth explanation of the above points, please see [How to Write a Git Commit Message](http://chris.beams.io/posts/git-commit/).
For an in-depth explanation of the above points, please see [How to Write a Git Commit Message](https://chris.beams.io/posts/git-commit/).


### How we do code reviews
@@ -100,6 +98,6 @@ If it takes long for some partner to review code we try to contact them on slack
Thanks again,
/NeIC System Developers

[searching under Issues]: https://github.com/neicnordic/neic-sda/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Abug%20%5BBUG%5D%20in%3Atitle
[open a new one]: https://github.com/neicnordic/neic-sda/issues/new?title=%5BBUG%5D
[template to report a bug]: https://github.com/neicnordic/neic-sda/issues/new?template=bug-report.md
[searching under Issues]: https://github.com/neicnordic/sensitive-data-archive/issues?utf8=%E2%9C%93&q=is%3Aissue%20label%3Abug%20%5BBUG%5D%20in%3Atitle
[open a new one]: https://github.com/neicnordic/sensitive-data-archive/issues/new?title=%5BBUG%5D
[template to report a bug]: https://github.com/neicnordic/sensitive-data-archive/issues/new?template=bug-report.md
102 changes: 51 additions & 51 deletions docs/connection.md
@@ -1,17 +1,17 @@
Interfacing with CEGA ⇌ SDA
===========================

All Local EGA instances are connected to Central EGA using
All `FederatedEGA` instances are connected to `CentralEGA` using
[RabbitMQ](http://www.rabbitmq.com), a Message Broker, that allows the
components to send and receive messages, which are queued, not lost, and
resent on network failure or connection problems.

The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to Central EGA
components with the necessary credentials to connect to `CentralEGA`
message broker.

We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes know as `sda-mq`),
the RabbitMQ message brokers of, respectively, `Central EGA` and `SDA`/`LocalEGA`.
the RabbitMQ message brokers of, respectively, `CentralEGA` and `SDA`/`FederatedEGA`.

Local Message Broker
--------------------
@@ -29,18 +29,18 @@ The following environment variables can be used to configure the broker:
> We use [RabbitMQ](https://hub.docker.com/_/rabbitmq) >= `3.8.16` including
> the management plugins.
Variable | Description
:--------------------|:----------------------------------------------
`MQ_VHOST` | Default vhost other than `/`
`MQ_VERIFY` | Set to `verify_none` to disable verification of client certificate
`MQ_USER` | Default user (with admin rights)
`MQ_PASSWORD_HASH` | Password hash for the above user
`CEGA_CONNECTION` | DSN URL for the shovels and federated queues with CentralEGA
`MQ_SERVER_CERT` | Path to the server SSL certificate
`MQ_SERVER_KEY` | Path to the server SSL key
`MQ_CA` | Path to the CA root certificate
`MQ_VERIFY` | Require the clients to have valid TLS certificates (`verify_peer`) or do not require clients to have certificates (`verify_none`)
`NOTLS` | Run the server without TLS enabled (default is to run the server with TLS activated)
| Variable | Description |
|:-------------------|:----------------------------------------------------------------------------------------------------------------------------------|
| `MQ_VHOST` | Default vhost other than `/` |
| `MQ_VERIFY` | Set to `verify_none` to disable verification of client certificate |
| `MQ_USER` | Default user (with admin rights) |
| `MQ_PASSWORD_HASH` | Password hash for the above user |
| `CEGA_CONNECTION` | DSN URL for the shovels and federated queues with CentralEGA |
| `MQ_SERVER_CERT` | Path to the server SSL certificate |
| `MQ_SERVER_KEY` | Path to the server SSL key |
| `MQ_CA` | Path to the CA root certificate |
| `MQ_VERIFY` | Require the clients to have valid TLS certificates (`verify_peer`) or do not require clients to have certificates (`verify_none`) |
| `NOTLS` | Run the server without TLS enabled (default is to run the server with TLS activated) |
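
Review note: the `CEGA_CONNECTION` value described in the table follows the `amqp[s]://<user>:<password>@<cega-host>:<port>/<vhost>` form quoted later in this file. A minimal sketch of assembling such a DSN is shown here; the helper name, host, and credentials are hypothetical, and percent-encoding the user, password, and vhost is an assumption about how special characters should be handled rather than something this diff states.

```python
from urllib.parse import quote


def build_cega_connection(user, password, host, port, vhost, tls=True):
    """Assemble an amqp[s] DSN, percent-encoding the parts that may
    contain reserved characters such as '@' or '/'."""
    scheme = "amqps" if tls else "amqp"
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(vhost, safe='')}"
    )


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(build_cega_connection("sda-node", "p@ss/word", "cega.example.org", 5671, "node1"))
```

The resulting string would be exported as `CEGA_CONNECTION` for the federated setup and, per the note in this file, left unset for a stand-alone SDA.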

> NOTE:
> For SDA stand-alone do not use `CEGA_CONNECTION` and do not set up
@@ -49,7 +49,7 @@ Variable | Description
> would need to be set up to send and recive messages between other
> services.
Central EGA connection
CentralEGA connection
----------------------

`CEGAMQ` declares a `vhost` for each SDA instance. It also creates the
@@ -70,28 +70,28 @@ amqp[s]://<user>:<password>@<cega-host>:<port>/<vhost>
versioning and is internal to CentralEGA. The queues connected to that
exchange are also internal to CentralEGA.

Name | Purpose
:----------------|:------------------------------------------------
files | Triggers for file ingestion
completed | When files are backed up
verified | When files are properly ingested and verified
errors | User-related errors
inbox | Notifications of uploaded files
| Name | Purpose |
|:----------|:----------------------------------------------|
| files | Triggers for file ingestion |
| completed | When files are backed up |
| verified | When files are properly ingested and verified |
| errors | User-related errors |
| inbox | Notifications of uploaded files |

`LocalMQ` contains two exchanges named `sda` and `to_cega`, and the
following queues, in the default `vhost`:

Name | Purpose
:----------------|:---------------------------------------
archived | Archived files.
completed | Files are backed up
error | User-related errors
files | Receive notification for ingestion from `CEGAMQ` or Orchestrator
inbox | Notifications of uploaded files
ingest | Trigger for file ingestion
mappings | Received Dataset to file mapping
accessionIDs | Receive Accession IDs from `CEGAMQ` or Orchestrator
verified | Files ingested and verified
| Name | Purpose |
|:-------------|:-----------------------------------------------------------------|
| archived | Archived files. |
| completed | Files are backed up |
| error | User-related errors |
| files | Receive notification for ingestion from `CEGAMQ` or Orchestrator |
| inbox | Notifications of uploaded files |
| ingest | Trigger for file ingestion |
| mappings | Received Dataset to file mapping |
| accessionIDs | Receive Accession IDs from `CEGAMQ` or Orchestrator |
| verified | Files ingested and verified |

`LocalMQ` registers `CEGAMQ` as an *upstream* and listens to the
incoming messages in `files` using a *federated queue*. Ingestion
@@ -102,46 +102,46 @@ Service will wait for messages to arrive.

> NOTE:
> More information can be found also at
> [localEGA](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega).
> [localEGA repository](https://localega.readthedocs.io/en/latest/amqp.html#message-interface-api-cega-connect-lega) - repository that provides functionality for `FederatedEGA` use case.
`CEGAMQ` receives notifications from `LocalMQ` using a *shovel*.
Everything that is published to its `to_cega` exchange gets forwarded to
CentralEGA (using the routing key based on the name
`files.<internal_queue_name>`). We propagate the different status of the
workflow to CentralEGA, using the following routing keys:

Name | Purpose
---------------------|:-------------------------------------------------
files.completed | For back-up files, ready to be distributed
files.error | In case a user-related error is detected
files.inbox | For inbox file operations
files.verified | For files ready to request accessionID
| Name | Purpose |
|-----------------|:-------------------------------------------|
| files.completed | For back-up files, ready to be distributed |
| files.error | In case a user-related error is detected |
| files.inbox | For inbox file operations |
| files.verified | For files ready to request accessionID |
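
Review note: the routing keys in the table all follow the `files.<internal_queue_name>` pattern described above it. A minimal sketch of that mapping, with a hypothetical helper name and a guard limited to the four keys listed in the table:

```python
# Internal names taken from the routing-key table above.
FORWARDED_QUEUES = ("completed", "error", "inbox", "verified")


def cega_routing_key(internal_queue_name):
    """Return the routing key used when publishing to the `to_cega` exchange."""
    if internal_queue_name not in FORWARDED_QUEUES:
        raise ValueError(f"not forwarded to CentralEGA: {internal_queue_name!r}")
    return f"files.{internal_queue_name}"


if __name__ == "__main__":
    for name in FORWARDED_QUEUES:
        print(name, "->", cega_routing_key(name))
```

Rejecting unknown names is a design choice of this sketch; the broker itself would simply route on whatever key a message carries.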

Note that we do not need at the moment a queue to store the completed
message, nor the errors, as we forward them to Central EGA.
message, nor the errors, as we forward them to `CentralEGA`.

![RabbitMQ setup](./static/CEGA-LEGA.png)

Connecting SDA to Central EGA
Connecting SDA to CentralEGA
-----------------------------

Central EGA only has to prepare a user/password pair along with a
`CentralEGA` only has to prepare a user/password pair along with a
`vhost` in their RabbitMQ.

When Central EGA has communicated these details to the given Local EGA
instance, the latter can contact Central EGA using the federated queue
When `CentralEGA` has communicated these details to the given `FederatedEGA`
instance, the latter can contact `CentralEGA` using the federated queue
and the shovel mechanism in their local broker.

CentralEGA should then see 2 incoming connections from that new LocalEGA
`CentralEGA` should then see 2 incoming connections from that new `FederatedEGA`
instance, on the given `vhost`.

The exchanges and routing keys will be the same as all the other
LocalEGA instances, since the clustering is done per `vhost`.
`FederatedEGA` instances, since the clustering is done per `vhost`.

### Message Format

It is necessary to agree on the format of the messages exchanged between
Central EGA and any Local EGAs. Central EGA's messages are
`CentralEGA` and any `FederatedEGA`s. `CentralEGA`'s messages are
JSON-formatted.

The JSON schemas can be found in:
@@ -200,14 +200,14 @@ of messages:
- `type=cancel`: an ingestion cancellation
- `type=accession`: contains an accession id
- `type=mapping`: contains a dataset to accession ids mapping
- `type=heartbeat`: A mean to check if the Local EGA instance is
- `type=heartbeat`: A mean to check if the `FederatedEGA` instance is
"alive"

> IMPORTANT:
> The `encrypted_checksums` key is optional. If the key is not present the
> sha256 checksum will be calculated by `Ingest` service.
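
Review note: the fallback described in the note above can be sketched as follows. This is illustrative only and not the `Ingest` service's actual code. It assumes that `encrypted_checksums`, when present, is a list of objects with `type` and `value` fields, which this diff does not show; the function name is hypothetical.

```python
import hashlib


def resolve_sha256(message, encrypted_payload):
    """Use the sha256 supplied in the message if there is one,
    otherwise compute it over the encrypted payload."""
    # Assumed shape: [{"type": "sha256", "value": "<hex digest>"}, ...]
    for entry in message.get("encrypted_checksums", []):
        if entry.get("type") == "sha256":
            return entry["value"]
    return hashlib.sha256(encrypted_payload).hexdigest()


if __name__ == "__main__":
    # No checksum key in the message, so the digest is computed locally.
    print(resolve_sha256({"type": "ingest"}, b"example payload"))
```
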
The message received from Central EGA to start ingestion at a Federated EGA node.
The message received from `CentralEGA` to start ingestion at a Federated EGA node.
Processed by the the `ingest` service.

```javascript
2 changes: 2 additions & 0 deletions docs/css/neic-sda.css
@@ -1,3 +1,5 @@
.wy-table-responsive tbody td {
white-space: normal;
}

.wy-nav-content {max-width: 1000px !important;}