WX-1710 Move option for final output files (#7472)
aednichols authored Jul 22, 2024
1 parent bd2bbe5 commit d5cc343
Showing 30 changed files with 227 additions and 78 deletions.
22 changes: 21 additions & 1 deletion centaur/README.md
@@ -1 +1,21 @@
For information on Cromwell's Integration Testing Suite, see the [Cromwell documentation on Centaur](https://cromwell.readthedocs.io/en/develop/developers/Centaur/).
For information on Cromwell's Integration Testing Suite, see the [Cromwell documentation on Centaur](https://cromwell.readthedocs.io/en/develop/developers/Centaur/).

### `centaur/src/it`

Classes extending `org.scalatest` base classes that ingest `.test` files and turn them into runnable test suites.

### `centaur/src/main`

#### `/resources`

Collection of `.test` cases. In `test.inc.sh` we map GitHub Actions jobs to case directories with `create_centaur_variables()`. Not all cases are run!

As of July 2024, Centaur searches **recursively** for `.test` files, so they can be placed in subdirectories along with their resources.
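
As a rough illustration of what a recursive search for `.test` files involves, here is a minimal sketch using only `java.nio`. It is not Centaur's own implementation; the names `TestFileDiscovery`, `isTestFile` and `findTestFiles` are made up for this example, and sorting the results is an assumption added only to make the output stable.

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of Centaur.
object TestFileDiscovery {
  // True for regular files whose name ends in ".test".
  def isTestFile(path: Path): Boolean =
    Files.isRegularFile(path) && path.getFileName.toString.endsWith(".test")

  // Walks `root` recursively and returns every `.test` file,
  // sorted by path string so the order is deterministic.
  def findTestFiles(root: Path): List[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(isTestFile).toList.sortBy(_.toString)
    finally stream.close() // Files.walk holds open directory handles
  }
}
```

Calling `TestFileDiscovery.findTestFiles(root)` on a directory returns the `.test` files at any depth and skips sibling resources such as `options.json`.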

#### `/scala`

Functionality to start, stop, and restart the Cromwell server under test. Also contains abstractions for asserting on metadata and workflow outputs.

### `centaur/src/test`

Tests for Centaur itself.
@@ -34,7 +34,7 @@ abstract class AbstractCentaurTestCaseSpec(cromwellBackends: List[String],
SuccessReporters.getClass

private def testCases(baseFile: File): List[CentaurTestCase] = {
val files = baseFile.list.filter(_.isRegularFile).toList
val files = baseFile.listRecursively.filter(isTestFile).toList
val testCases = files.traverse(CentaurTestCase.fromFile(cromwellTracker))

testCases match {
@@ -43,6 +43,9 @@ abstract class AbstractCentaurTestCaseSpec(cromwellBackends: List[String],
}
}

private def isTestFile(file: File) =
file.isRegularFile && file.extension.contains(".test")

def allTestCases: List[CentaurTestCase] = {
val optionalTestCases = CentaurConfig.optionalTestPath map (File(_)) map testCases getOrElse List.empty
val standardTestCases = testCases(CentaurConfig.standardTestCasePath)
4 changes: 2 additions & 2 deletions centaur/src/main/resources/reference.conf
@@ -56,9 +56,9 @@ centaur {
genomics.endpoint-url = ${?CROMWELL_BUILD_PAPI_ENDPOINT_URL}
genomics.location = "us-central1"
batch.location = "us-central1"
auth = "Error: BA-6546 The environment variable CROMWELL_BUILD_PAPI_AUTH_MODE must be set/export pointing to a valid auth such as 'application-default'"
auth = "service-account"
auth = ${?CROMWELL_BUILD_PAPI_AUTH_MODE}
json-dir = "Error: BA-6546 The environment variable CROMWELL_BUILD_RESOURCES_DIRECTORY must be set/export pointing to a valid path such as 'target/ci/resources'"
json-dir = "target/ci/resources"
json-dir = ${?CROMWELL_BUILD_RESOURCES_DIRECTORY}
auths = [
{

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

@@ -1,3 +1,3 @@
{
"final_workflow_outputs_dir": "gs://cloud-cromwell-dev-self-cleaning-fast"
"final_workflow_outputs_dir": "gs://centaur-ci-us-east1"
}
@@ -6,7 +6,7 @@ workflow large_final_workflow_outputs_dir {
# In this case we're copying by using final_workflow_outputs_dir functionality.
#
# Because the file used in the test is large, via the workflow options we copy to
# gs://cloud-cromwell-dev-self-cleaning-fast which is set up with a short lifecycle for deletion of objects.
# gs://centaur-ci-us-east1 which is set up with a short lifecycle for deletion of objects.
#
# See also https://github.com/broadinstitute/rawls/blob/c39049945867d9d6d1bb5e1cbda30a09a19147f7/automation/src/test/scala/org/broadinstitute/dsde/test/api/RawlsApiSpec.scala#L768-L783
#
@@ -0,0 +1,31 @@
name: gcpWdlResultsCopying
testFormat: workflowsuccess
tags: ["copyGcp"]

# Will run on a Cromwell that supports any one of these backends
backendsMode: any

# Asserting on the source file `gs://cloud-cromwell-dev-self-cleaning/.../simpleStdoutTask.log` currently fails on Batch.
# This is because Batch does not produce a `simpleStdoutTask.log` and instead sends logs to Cloud Logging. Burwood is going to add a config to allow the old behavior.
# backends: [Papi, Papiv2, GCPBatch]
backends: [Papi, Papiv2]

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsCopying/gcp/options.json
}

metadata {
status: Succeeded
}

fileSystemCheck: "gcs"
outputExpectations: {
"gs://centaur-ci-us-east1/wf_results/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 1
"gs://centaur-ci-us-east1/wf_logs/workflow.<<UUID>>.log": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/stderr": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/stdout": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/simpleStdoutTask.log": 1
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/simpleStdoutTask.log": 1
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 1
}
@@ -0,0 +1,31 @@
name: gcpWdlResultsCopyingRelative
testFormat: workflowsuccess
tags: ["copyGcp"]

# Will run on a Cromwell that supports any one of these backends
backendsMode: any

# Asserting on the source file `gs://cloud-cromwell-dev-self-cleaning/.../simpleStdoutTask.log` currently fails on Batch.
# This is because Batch does not produce a `simpleStdoutTask.log` and instead sends logs to Cloud Logging. Burwood is going to add a config to allow the old behavior.
# backends: [Papi, Papiv2, GCPBatch]
backends: [Papi, Papiv2]

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsCopying/gcp/optionsRelative.json
}

metadata {
status: Succeeded
}

fileSystemCheck: "gcs"
outputExpectations: {
"gs://centaur-ci-us-east1/wf_results/output.txt": 1
"gs://centaur-ci-us-east1/wf_logs/workflow.<<UUID>>.log": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/stderr": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/stdout": 1
"gs://centaur-ci-us-east1/cl_logs/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/simpleStdoutTask.log": 1
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/simpleStdoutTask.log": 1
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 1
}
@@ -0,0 +1,23 @@
name: gcpWdlResultsMoving
testFormat: workflowsuccess
tags: ["copyGcp"]

# Will run on a Cromwell that supports any one of these backends
backendsMode: any
backends: [Papi, Papiv2, GCPBatch]

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsMoving/gcp/options.json
}

metadata {
status: Succeeded
}

# The `centaur-ci-us-east1` bucket is in a different region from the one the workflow runs in
fileSystemCheck: "gcs"
outputExpectations: {
"gs://centaur-ci-us-east1/move_destination/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 1
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 0
}
@@ -0,0 +1,23 @@
name: gcpWdlResultsMovingFail
testFormat: workflowfailure
tags: ["copyGcp"]

# Will run on a Cromwell that supports any one of these backends
backendsMode: any
backends: [Papi, Papiv2, GCPBatch]

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsMoving/gcp/options_fail.json
}

metadata {
status: Failed
}

# The copy to the non-existent bucket fails, so the delete should not happen
# (compare to `gcpWdlResultsMoving.test`)
fileSystemCheck: "gcs"
outputExpectations: {
"gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci/simpleWorkflow/<<UUID>>/call-simpleStdoutTask/output.txt": 1
}
@@ -2,6 +2,8 @@ name: localWdlResultsCopying
testFormat: workflowsuccess
tags: ["copyLocal"]

ignore: true

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsCopying/local/options.json
@@ -2,6 +2,8 @@ name: localWdlResultsCopyingRelative
testFormat: workflowsuccess
tags: ["copyLocal"]

ignore: true

files {
workflow: wdlResultsCopying/simpleWorkflow.wdl
options: wdlResultsCopying/local/optionsRelative.json
@@ -0,0 +1,9 @@
{
"use_relative_output_paths": false,
"final_workflow_outputs_dir": "gs://centaur-ci-us-east1/wf_results",
"final_workflow_outputs_mode": "copy",
"final_workflow_log_dir": "gs://centaur-ci-us-east1/wf_logs",
"final_call_logs_dir": "gs://centaur-ci-us-east1/cl_logs",
"read_from_cache": false,
"write_to_cache": false
}
@@ -0,0 +1,8 @@
{
"use_relative_output_paths":true,
"final_workflow_outputs_dir":"gs://centaur-ci-us-east1/wf_results",
"final_workflow_log_dir":"gs://centaur-ci-us-east1/wf_logs",
"final_call_logs_dir":"gs://centaur-ci-us-east1/cl_logs",
"read_from_cache": false,
"write_to_cache": false
}
@@ -0,0 +1,7 @@
{
"jes_gcs_root": "gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci",
"final_workflow_outputs_dir": "gs://centaur-ci-us-east1/move_destination",
"final_workflow_outputs_mode": "move",
"read_from_cache": false,
"write_to_cache": false
}
@@ -0,0 +1,7 @@
{
"jes_gcs_root": "gs://cloud-cromwell-dev-self-cleaning/cromwell_execution/ci",
"final_workflow_outputs_dir": "gs://non-existent-bucket/move_destination",
"final_workflow_outputs_mode": "move",
"read_from_cache": false,
"write_to_cache": false
}
13 changes: 13 additions & 0 deletions core/src/main/scala/cromwell/core/WorkflowOptions.scala
@@ -56,6 +56,19 @@ object WorkflowOptions {
case object FinalCallLogsDir extends WorkflowOption("final_call_logs_dir")
case object FinalWorkflowOutputsDir extends WorkflowOption("final_workflow_outputs_dir")
case object UseRelativeOutputPaths extends WorkflowOption(name = "use_relative_output_paths")
case object FinalWorkflowOutputsMode extends WorkflowOption("final_workflow_outputs_mode") {
// Default to Copy because that was originally the only behavior
def fromString(s: Option[String]): FinalWorkflowOutputsMode =
s match {
case Some("copy") => Copy
case Some("move") => Move
case _ => Copy
}
}

sealed trait FinalWorkflowOutputsMode
case object Copy extends FinalWorkflowOutputsMode
case object Move extends FinalWorkflowOutputsMode

// Misc.
case object DefaultRuntimeOptions extends WorkflowOption("default_runtime_attributes")
2 changes: 1 addition & 1 deletion docs/developers/Centaur.md
@@ -5,7 +5,7 @@ Centaur is an integration testing suite for the [Cromwell](http://github.com/bro
Centaur expects to find a Cromwell server properly configured and running in server mode, listening on port 8000.
This can be configured by modifying the `cromwellUrl` parameter in `application.conf`.

You can get a build of your current cromwell code with [these instructions](Building.md).
You can get a build of your current Cromwell code with [these instructions](Building.md).
The server can be run with `java -jar <Cromwell JAR> server`; check out [this page](../CommandLine.md)
for more detailed instructions.
You can now run the tests from another terminal.
16 changes: 9 additions & 7 deletions docs/wf_options/Overview.md
@@ -75,19 +75,21 @@ Example `options.json`:
```

## Output Copying
|Option|Value|Description|
|---|---|---|
|`final_workflow_outputs_dir`|A directory available to Cromwell|Specifies a path where final workflow outputs will be written. If this is not specified, workflow outputs will not be copied out of the Cromwell workflow execution directory/path.|
|`use_relative_output_paths`| A boolean | When set to `true` this will copy all the outputs relative to their execution directory. my_final_workflow_outputs_dir/~~MyWorkflow/af76876d8-6e8768fa/call-MyTask/execution/~~output_of_interest . Cromwell will throw an exception when this leads to collisions. When the option is not set it will default to `false`.|
|`final_workflow_log_dir`|A directory available to Cromwell|Specifies a path where per-workflow logs will be written. If this is not specified, per-workflow logs will not be copied out of the Cromwell workflow log temporary directory/path before they are deleted.|
|`final_call_logs_dir`|A directory available to Cromwell|Specifies a path where final call logs will be written. If this is not specified, call logs will not be copied out of the Cromwell workflow execution directory/path.|
|Option| Value | Description |
|---|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|`final_workflow_outputs_dir`| A directory available to Cromwell | Specifies a path where final workflow outputs will be written. If this is not specified, workflow outputs will not be copied out of the Cromwell workflow execution directory/path. |
|`final_workflow_outputs_mode`| `"copy"` or `"move"` | `"copy"` is the default and preserves the source files. `"move"` performs a copy-delete sequence to clean up the source.<br/><br/>Note: as of this writing, the `/outputs` endpoint points to the source location. It is planned that for the `"move"` option only, `/outputs` will point to the destination. |
|`use_relative_output_paths`| A boolean | When set to `true` this will copy all the outputs relative to their execution directory. my_final_workflow_outputs_dir/~~MyWorkflow/af76876d8-6e8768fa/call-MyTask/execution/~~output_of_interest . Cromwell will throw an exception when this leads to collisions. When the option is not set it will default to `false`. |
|`final_workflow_log_dir`| A directory available to Cromwell | Specifies a path where per-workflow logs will be written. If this is not specified, per-workflow logs will not be copied out of the Cromwell workflow log temporary directory/path before they are deleted. |
|`final_call_logs_dir`| A directory available to Cromwell | Specifies a path where final call logs will be written. If this is not specified, call logs will not be copied out of the Cromwell workflow execution directory/path. |

Note that these directories should use the same filesystem as the workflow. E.g., if you run on Google's PAPI, you should provide `gs://...` paths.
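
To make the `"copy"` versus `"move"` distinction concrete, the following is a minimal sketch on the local filesystem. It assumes that an unrecognized or missing mode falls back to copy, and that a move deletes the source only once the copy has succeeded. The names `OutputsModeSketch`, `parseMode` and `finalizeOutput` are hypothetical, and the `java.nio` file handling stands in for whatever path abstraction Cromwell actually uses.

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch, not Cromwell's implementation.
object OutputsModeSketch {
  sealed trait Mode
  case object Copy extends Mode
  case object Move extends Mode

  // Only the exact string "move" selects Move; anything else,
  // including a missing option, falls back to Copy.
  def parseMode(raw: Option[String]): Mode = raw match {
    case Some("move") => Move
    case _            => Copy
  }

  // Copies `source` to `destination`. In Move mode the source is deleted,
  // but only after the copy returned without throwing.
  def finalizeOutput(source: Path, destination: Path, mode: Mode): Unit = {
    Option(destination.getParent).foreach(Files.createDirectories(_))
    Files.copy(source, destination) // throws if the copy cannot be made
    mode match {
      case Move => Files.delete(source)
      case Copy => ()
    }
  }
}
```

Because the delete comes after the copy, a failed copy (for example to a destination that cannot be written) raises an exception and leaves the source file in place.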

Example `options.json`:
```json
{
"final_workflow_outputs_dir": "/Users/michael_scott/cromwell/outputs",
"final_workflow_outputs_mode": "copy",
"use_relative_output_paths": true,
"final_workflow_log_dir": "/Users/michael_scott/cromwell/wf_logs",
"final_call_logs_dir": "/Users/michael_scott/cromwell/call_logs"
@@ -107,7 +109,7 @@ final_workflow_outputs_dir/my_output_picture.jpg
final_workflow_outputs_dir/created_subdir/submarine.txt
```

This will create file collisions in `final_workflow_outputs_dir` when a workflow is run twice. When cromwell
This will create file collisions in `final_workflow_outputs_dir` when a workflow is run twice. When Cromwell
detects file collisions it will throw an error and report the workflow as failed.
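
One way to picture such a collision: with relative output paths, two outputs that differ only in their execution directory map to the same destination. The sketch below is an illustration under that assumption, not Cromwell's actual collision check; `RelativeOutputCollisions`, `relativeDestinations` and `collisions` are hypothetical names, and each output is paired with an assumed execution directory.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical sketch, not Cromwell's implementation.
object RelativeOutputCollisions {
  // For each (executionDir, output) pair, keeps only the part of the output
  // path below its execution directory and resolves it under `finalDir`.
  // Returns (source, destination) pairs.
  def relativeDestinations(finalDir: Path, outputs: List[(Path, Path)]): List[(Path, Path)] =
    outputs.map { case (executionDir, output) =>
      output -> finalDir.resolve(executionDir.relativize(output))
    }

  // Groups sources by destination and keeps destinations claimed more than once.
  def collisions(destinations: List[(Path, Path)]): Map[Path, List[Path]] =
    destinations
      .groupMap(_._2)(_._1)
      .filter { case (_, sources) => sources.size > 1 }
}
```

A non-empty result from `collisions` corresponds to the situation in which the workflow would be reported as failed.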

## Call Caching Options