Ecarton/cumulus 3751 from 18.5.2 #3900

Open
wants to merge 261 commits into
base: release-18.5.x
from

Conversation

etcart
Contributor

@etcart etcart commented Jan 10, 2025

Summary: adds a task that takes granules and a target collection and idempotently updates the granules to belong to that target collection in S3 and the Cumulus datastore

Addresses CUMULUS-3751: Workflow task that updates a granule to a new collection

Changes

  • adds a new task that updates S3 and the Cumulus data stores to move granules across collections
  • adds integration tests and associated resources in the example project
  • redistributes ECS cluster resources among ECS tasks in the example project
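
The idempotency mentioned in the summary can be sketched roughly as follows. This is only an illustration under stated assumptions, not the task's actual implementation: `FileLocation`, `MoveStep`, `ObjectStore`, and `applyMove` are hypothetical names, and an in-memory `Map` stands in for S3 so the example runs without AWS. The idea is that a move copies first, deletes second, and treats "source gone, target present" as already done, so a retry after a partial failure is safe.

```typescript
// Hypothetical shapes for illustration; the real task's types are not shown here.
interface FileLocation { bucket: string; key: string; }
interface MoveStep { source: FileLocation; target: FileLocation; }

// In-memory stand-in for S3: maps "bucket/key" to object contents.
type ObjectStore = Map<string, string>;

const storeKey = (loc: FileLocation): string => `${loc.bucket}/${loc.key}`;

// Applies one move so that re-running it after success, or after a failure
// between the copy and the delete, leaves the store in the same final state.
function applyMove(
  store: ObjectStore,
  step: MoveStep
): 'moved' | 'already-moved' | 'noop' {
  const src = storeKey(step.source);
  const dst = storeKey(step.target);
  if (src === dst) return 'noop'; // file is already where it belongs

  const body = store.get(src);
  if (body === undefined) {
    // Source is gone: fine if an earlier run already placed the target.
    if (store.has(dst)) return 'already-moved';
    throw new Error(`Neither source ${src} nor target ${dst} exists`);
  }

  store.set(dst, body); // copy first ...
  store.delete(src);    // ... delete only after the copy is in place
  return 'moved';
}
```

Running the same step twice yields `'moved'` and then `'already-moved'`, with the object present only at the target location.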

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

cumulus-bot and others added 30 commits September 19, 2024 14:02
* Fix isThrottlingException function to check error name

* update changelog and add name/code check in errors

* linter fix

* changelog

* typo fix

---------

Co-authored-by: Hailiang Zhang <[email protected]>
Co-authored-by: etcart <[email protected]>
* Update deployment templates for Aurora Serverless v2 (#3623)

* update CL

* update terraform templates to serverless v2

* add terraform variable validation

* remove upgrade variables

* add prevent_destroy = true

* add prevent_destroy = true

* CUMULUS-3670 Develop upgrade/migration process Aurora Serverless v1 to v2 (#3643)

* remove prevent_destroy to allow automated CI migrations

* set force_ssl = 0 (#3658)

Co-authored-by: Tim Clark <[email protected]>

* [CUMULUS-3671]: Update docs for Serverless V2 (#3666)

* initial commit

* serverless v2 doc updates

* Update serverless V2 docs

* Fix lint issue

* set DISABLE_PG_SSL: true to support CI

* fix lint error

* set disableSSL = true

* remove DISABLE_PG_SSL

* set rejectUnauthorized: 'false'

* update CL for v2 changes

* fix changelog

* add migration notes to changelog, add v2 docs to sidebar

* fix changelog

---------

Co-authored-by: Tim Clark <[email protected]>
Co-authored-by: Nate Pauzenga <[email protected]>
)

* Update AWS errors to use the V3 error classes

* Fix lint

* Import aws sdk directly to avoid circular dep

* Update CL

* Remove module in favor of aws imports directly

* Revert change to ThrottlingException error type

* Add comments

* Fix lint

* Remove unnecessary dependency

* add debug logging for CI

* update type and debug comment

* temporarily revert to name checks

* Remove logging and type check on conditional exception.

Instance of does not work in this case. I believe we're calling the service "dynamodbDocClient" using the non-V3 syntax.

* Fix lint

* Update tests to throw correct aws-sdk error

* Update tests with new aws-sdk error types

* Import error type correctly

* Correctly import sfn error

* Instantiate errors like I know what I'm doing

* Basic syntax 🤦

* update tests

* Remove unnecessary comment

* fixup for clarity

* Update test for clarity

* Update test fixture and logging for consistency
* Allow override of sfEventSqsLambda timeout with associated queue adjustments

* Update CHANGELOG

* Respond to PR feedback

* Update per PR feedback request
* CUMULUS-3906 - Update to ORCA v10.0.0

* Resolved CL conflict.

* Removes required wording for 3906 from CL
* Fixes merge conflict

* Adds diff link for v18.5.0
…e-granules-cmr-metadata-… (#3791)

* Added excludeFileRegex configuration to update-granules-cmr-metadata-file-links (#3790)

Updated tests to exercise new file-exclusion feature

* linter fixes

* remove explicit null for un-found regexpattern

* switch to logging when no excludable files found

* changelog broken into multiple lines

* linter fixes in changelog

* name in changelog after lambda function name

* remove TODO. non-mocked is a truer representation of function

* small refactor

* typo in passthrough of fileregex

* nyc values with new tests

* version requirement update

* fixed merge weirdness

* fix jsonpath in the other places it's flagged

* remove unneeded explicit pin in aws-client

* check like instead of deepequal on credentials return

---------

Co-authored-by: Mike Dorfman <[email protected]>
* Fix cumulus versions due to bad merge

* Update aws-sdk versions to revert bad merge
* update dependencies to latest cma, cma-python, cumulus-process

* changelog

* fixed shas in locks

* whitelist jsonpath for buiggy audit behavior

* remove incorrect changelog entry

---------

Co-authored-by: etcart <[email protected]>
* CUMULUS-3891: Add fastGet download option to sftp data file download

* add sftpFastDownload config

* fix fastDownload boolean vs string

* add unit tests

* fix aws-client services unit test

* test SFTP_DEBUG

* remove only

* add changelog entry

* remove unused code

* remove jsonpath from common

* update latest-version and add jsonpath-plus to audit-ci

* serial

* update readme remove serial

* add sftp test
…3830)

* Update Orca version

* Update orca variables for v10 release series

* Update orca var to default value

* Update Orca version to official release
…ync-granules (#3823)

* Iniital commit updating sync-granule behavior

* Clean up comments

* Update schema config to match changes

* Update typings

* Fix sync-granules typing

* Update config docs to true

* Update task README

* Update CHANGELOG

* Minor fix

* Update spec tests with new default hashed granId path

* Update @cumulus/types to allow for explicit export of api/collections

* Abstract typings to seperate file

* Update _ingestGranule param based on PR feedback

* Fix unit test not updated on merge

* Add method unit tests for collection(name/version)From methods

* Fix integration helper

* Add hashed path to SyncGranules
…zed (#3832)

* CUMULUS-3919:Added terraform variables disableSsl and rejectUnauthorized

* disableSsl->disableSSL
…s services. (#3838)

* Allowing force_new_deployment to be configurable for ecs services.

* Update CHANGELOG

---------

Co-authored-by: Michael Hall <[email protected]>
* guarantee non-numeric nonNumericString

* bringing in pg8.13 snyk suggestion

* trying to kcik up the sync-granules task error

* trying to get repeatable tries to sync-granules

* focus on just the important part in syncgranule

* trying to get publish to work

* does only this need to change to import?

* move to import function import

* reverting bad code for testing reasons

* changelog

* keep string length the same, no reason to twiddle this
* Release 19.1.0 (#3816)

* version bump

* Update CL

* Update docs

* Update CL link

* Add note for clarity

* update missed deps

* Ecarton/cumulus 3928 imf work (#3831) (#3841)

* guarantee non-numeric nonNumericString

* bringing in pg8.13 snyk suggestion

* trying to kcik up the sync-granules task error

* trying to get repeatable tries to sync-granules

* focus on just the important part in syncgranule

* trying to get publish to work

* does only this need to change to import?

* move to import function import

* reverting bad code for testing reasons

* changelog

* keep string length the same, no reason to twiddle this

Co-authored-by: etcart <[email protected]>

* fix no top level await (#3843) (#3844)

Co-authored-by: etcart <[email protected]>

---------

Co-authored-by: etcart <[email protected]>
* Update CHANGELOG from release 18.5.1

* Add CHANGELOG footer

* Fix CHANGELOG
* Passed through sqs_message_consumer_watcher_time_limit and sqs_message_consumer_watcher_message_limit

* Added a changelog entry for CUMULUS-3904

---------

Co-authored-by: mikedorfman <[email protected]>
* CUMULUS-3876:Fix S3 Replicator cross region bucket writes

* add target_region variable

* fix typing

* add unit test

* add target_region to deployment

* covert to ts

* add s3-replicator.tf

* update tsconfig and webconfig

* remove s3-replicator example

* update package.json

* update changelog

* fix types

* add s3-replicator.tf.example

* update typo
* Fix 2564 docs

* Update sync-granules doc to be correct

const activityStep = new ActivityStep();

describe('The MoveGranuleCollection workflow using ECS', () => {
Contributor

@jennyhliu jennyhliu Jan 17, 2025

What are the source and target collections? I can't tell from the specs.

Contributor

What is the difference between MoveGranuleCollectionWorkflowSpec and MoveGranuleCollectionsSpec?

Contributor Author

The intent is to run MoveGranuleCollectionsSpec as a test of the lambda's functionality, while the workflow spec is meant to run the workflow as it would exist with example data, to be extended with CMR, LZARDS, etc. when those tickets are done. Right now the only functional difference is that the workflow spec tests the lambda in the ECS deployment.


const getSourceCollection = (sourceUrlPrefix) => (
{
files: [
Contributor

@jennyhliu jennyhliu Jan 17, 2025

This should be part of the collection configuration and shouldn't need to be specified here. We ingest different granules/files to S3 to avoid conflicts.

}
);

const getTargetCollection = (targetUrlPrefix) => ({
Contributor

Same comment: This should be part of the target collection configuration

packages/api-client/src/granules.ts (resolved)
packages/cmrjs/src/types.ts (outdated, resolved)
| ---------- | ---- | ------- | ------ | -----------

| buckets | object | (required) | | Object specifying AWS S3 buckets used by this task
| collection | object | (required) | | The cumulus-api collection object
Contributor

This is the 'target' collection, right? Should it be part of the task input, since the task can be invoked to move granules to different collections?

tasks/move-granule-collections/src/index.ts (resolved)
};
}

async function buildTargetGranules(
Contributor

So buildTargetGranules not only builds but also updates granule metadata in S3?
Where does the metadata get removed?
I'm confused by the sequence of events.

Contributor Author

This builds the granule records as they should exist once we're done, so that they can later be used as a roadmap for the updates in S3 and PG.
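
The "build the end state first, then use it as a roadmap" approach described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: `GranuleFile`, `Granule`, `TargetCollection`, `buildTargetGranule`, and `planFileMoves` are hypothetical names, every file is assumed to land in a single target bucket under one `urlPath`, and the `name___version` collection ID format is assumed from Cumulus convention. The build step has no side effects, so it performs no S3 or database writes itself.

```typescript
// Hypothetical, simplified record shapes (assumptions for this sketch).
interface GranuleFile { bucket: string; key: string; fileName: string; }
interface Granule { granuleId: string; collectionId: string; files: GranuleFile[]; }
interface TargetCollection { name: string; version: string; bucket: string; urlPath: string; }

// Pure function: returns the granule record as it should exist after the move.
// Nothing is written anywhere; the input granule is left unchanged.
function buildTargetGranule(source: Granule, target: TargetCollection): Granule {
  return {
    ...source,
    collectionId: `${target.name}___${target.version}`,
    files: source.files.map((file) => ({
      ...file,
      bucket: target.bucket,
      key: `${target.urlPath}/${file.fileName}`,
    })),
  };
}

// Pairs each source file with its planned location and keeps only the files
// whose location changes. A later step would apply these moves to S3 and
// then persist the target record.
function planFileMoves(
  source: Granule,
  target: Granule
): Array<{ from: GranuleFile; to: GranuleFile }> {
  return source.files
    .map((from, index) => ({ from, to: target.files[index] }))
    .filter(({ from, to }) => from.bucket !== to.bucket || from.key !== to.key);
}
```

Because the planning step is separate from the write step, re-running the plan against a granule that has already been moved produces an empty list of moves.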

@etcart
Contributor Author

etcart commented Jan 20, 2025

I'm thinking of pulling the example ECS deployment here. It was put in place to allow for exceeding 15 minutes (and other potential resource constraints), but splitting into subsets of granule_ids should allow us to control for that, and the ECS example adds (unnecessary?) complexity to this PR.
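
Splitting the work into subsets of granule IDs, as suggested above, could be sketched like this. `chunkGranuleIds` and the batch size are hypothetical; the assumption is that each batch would be handed to a separate task invocation sized to finish within the 15-minute Lambda limit.

```typescript
// Splits a list of granule IDs into consecutive batches of at most `batchSize`.
// Hypothetical helper for illustration; it is not part of this PR.
function chunkGranuleIds(granuleIds: string[], batchSize: number): string[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  const batches: string[][] = [];
  for (let start = 0; start < granuleIds.length; start += batchSize) {
    batches.push(granuleIds.slice(start, start + batchSize));
  }
  return batches;
}
```

A suitable batch size would have to be measured against real granules, since the time per granule depends on file count and size.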

Contributor

I don't see this file in other tasks/*.

Contributor Author

Contributor

How are we going to avoid the start of the workflow triggering the granule updates, since the input payload has granules?

Contributor Author

Would a simple rename to "granuleIds" solve this?

10 participants