Job output transform #548
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,13 @@ Changes | |
|
||
Changes: | ||
-------- | ||
- Add support for various GeoTIFF formats, allowing flexible handling and representation of GeoTIFFs in outputs | ||
(fixes `#100 <https://github.com/crim-ca/weaver/issues/100>`_). | ||
- Add support for ``GET /results/{id}`` and `` GET /outputs/{id}`` routes to enable direct access to individual | ||
job result items by ID. This enhancement includes: support alternate representations based on the Accept header. | ||
|
||
If an alternate format (e.g., YAML for a JSON source) is requested it will be automatically generated and returned. | ||
Link headers containing all possible output formats, allowing retrieval via query parameters | ||
(e.g., output?f=application/x-yaml). (fixes `#18 <https://github.com/crim-ca/weaver/issues/18>`_). | ||
Comment on lines +20 to +21:
Make this a separate item, and apply "- Return Apply the formatting to the example, and use the full |
||
- Add support of *OGC API - Processes - Part 4: Job Management* endpoints for `Job` creation and execution | ||
(fixes `#716 <https://github.com/crim-ca/weaver/issues/716>`_). | ||
- Add `CLI` operations ``update_job``, ``trigger_job`` and ``inputs`` corresponding to the required `Job` operations | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,7 @@ boto3-stubs[s3] | |
# https://github.com/celery/billiard/issues/313 | ||
billiard>2; sys_platform != "win32" # avoid issue with use_2to3 | ||
billiard>3.2,<3.4; sys_platform == "win32" | ||
cairosvg | ||
# pymongo>=4 breaks for some kombu combinations corresponding to pinned Celery | ||
# - https://github.com/crim-ca/weaver/issues/386 | ||
# - https://github.com/celery/kombu/pull/1536 | ||
|
@@ -50,6 +51,7 @@ duration | |
esgf-compute-api @ git+https://github.com/ESGF/[email protected] | ||
# invalid 'zarr' requirement in 'geotiff' dependencies required by 'pywps' fail to install | ||
# (https://github.com/KipCrossing/geotiff/pull/59) | ||
fpdf | ||
geotiff>=0.2.8 | ||
# gunicorn >20 breaks some config.ini loading parameters (paste) | ||
# use pserve to continue supporting config.ini with paste settings | ||
|
@@ -58,6 +60,7 @@ gunicorn>=22 | |
# even more reduced dependency constraints (https://github.com/vinitkumar/json2xml/pull/195) | ||
json2xml==4.1.0 | ||
jsonschema>=3.0.1 | ||
|
||
# FIXME: kombu for pymongo>=4 not yet released as 5.3.0 (only pre-releases available) | ||
# - https://github.com/crim-ca/weaver/issues/386 | ||
# - https://github.com/celery/kombu/pull/1536 | ||
|
@@ -68,13 +71,16 @@ mako | |
# force use of later mistune (https://github.com/common-workflow-language/schema_salad/pull/619#issuecomment-1346025607) | ||
# employed by cwltool -> schema-salad -> mistune | ||
#mistune>=2.0.3,<2.1 | ||
multipagetiff | ||
mypy_boto3_s3 | ||
numpy>=1.22.2,<2; python_version < "3.10" | ||
numpy>=1.22.2; python_version >= "3.10" | ||
# esgf-compute-api (cwt) needs oauthlib but doesn't add it in their requirements | ||
oauthlib | ||
owslib==0.29.3 | ||
pandas | ||
PasteDeploy>=3.1.0; python_version >= "3.12" | ||
Pillow | ||
pint | ||
psutil | ||
# notes: https://github.com/geopython/pygeofilter | ||
|
@@ -102,9 +108,11 @@ pystac | |
pystac_client | ||
python-box | ||
python-dateutil | ||
python-magic | ||
pytz | ||
pywps==4.6.0 | ||
pyyaml>=5.2 | ||
rasterio | ||
rdflib>=5 # pyup: ignore | ||
requests>=2.32.2 | ||
requests_file | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -567,6 +567,14 @@ def test_deploy_process_io_no_format_default(self): | |
expect_outputs["file"]["formats"][0]["default"] = False | ||
expect_outputs["file"]["formats"][1]["default"] = True | ||
expect_outputs["file"]["formats"][2]["default"] = False | ||
# Alternate type added automatically in offering. | ||
alternative_formats = [ | ||
{"mediaType": ContentType.IMAGE_GIF}, | ||
{"mediaType": ContentType.IMAGE_TIFF}, | ||
{"mediaType": ContentType.IMAGE_SVG_XML}, | ||
{"mediaType": ContentType.APP_PDF} | ||
] | ||
expect_outputs["file"]["formats"].extend(alternative_formats) | ||
expect_outputs["file"]["schema"] = { | ||
Comment on lines +577 to 578:
The Not sure if the check is incorrect here, since I've seen other tests where you checked that the schemas were extended... to investigate. |
||
"oneOf": [ | ||
{"type": "string", "format": "binary", | ||
|
@@ -1508,14 +1516,20 @@ def test_deploy_merge_complex_io_with_multiple_formats_and_defaults(self): | |
# assert "default" not in format_spec | ||
|
||
assert proc["outputs"][0]["id"] == "single_value_single_format" | ||
assert len(proc["outputs"][0]["formats"]) == 1 | ||
assert len(proc["outputs"][0]["formats"]) == 4 # Alternative format added in process | ||
assert proc["outputs"][0]["formats"][0]["mediaType"] == ContentType.APP_JSON | ||
assert proc["outputs"][0]["formats"][0]["default"] is True | ||
assert proc["outputs"][0]["formats"][1]["mediaType"] == ContentType.TEXT_CSV | ||
assert proc["outputs"][0]["formats"][2]["mediaType"] == ContentType.APP_XML | ||
assert proc["outputs"][0]["formats"][3]["mediaType"] == ContentType.APP_YAML | ||
assert proc["outputs"][1]["id"] == "single_value_multi_format" | ||
assert len(proc["outputs"][1]["formats"]) == 3 | ||
assert len(proc["outputs"][1]["formats"]) == 6 # Alternative format added in process | ||
assert proc["outputs"][1]["formats"][0]["mediaType"] == ContentType.APP_JSON | ||
assert proc["outputs"][1]["formats"][1]["mediaType"] == ContentType.TEXT_PLAIN | ||
assert proc["outputs"][1]["formats"][2]["mediaType"] == ContentType.APP_NETCDF | ||
assert proc["outputs"][1]["formats"][3]["mediaType"] == ContentType.TEXT_CSV | ||
assert proc["outputs"][1]["formats"][4]["mediaType"] == ContentType.APP_XML | ||
assert proc["outputs"][1]["formats"][5]["mediaType"] == ContentType.APP_YAML | ||
assert proc["outputs"][1]["formats"][0]["default"] is True # mandatory | ||
assert proc["outputs"][1]["formats"][1].get("default", False) is False # omission is allowed | ||
assert proc["outputs"][1]["formats"][2].get("default", False) is False # omission is allowed | ||
|
@@ -3042,10 +3056,12 @@ def test_deploy_merge_complex_io_from_package(self): | |
assert "minOccurs" not in proc["outputs"][0] | ||
assert "maxOccurs" not in proc["outputs"][0] | ||
assert isinstance(proc["outputs"][0]["formats"], list) | ||
assert len(proc["outputs"][0]["formats"]) == 1 | ||
assert len(proc["outputs"][0]["formats"]) == 3 | ||
assert isinstance(proc["outputs"][0]["formats"][0], dict) | ||
assert proc["outputs"][0]["formats"][0]["mediaType"] == ContentType.TEXT_PLAIN | ||
assert proc["outputs"][0]["formats"][0]["default"] is True | ||
assert proc["outputs"][0]["formats"][1]["mediaType"] == ContentType.TEXT_HTML | ||
assert proc["outputs"][0]["formats"][2]["mediaType"] == ContentType.APP_PDF | ||
expect = KNOWN_PROCESS_DESCRIPTION_FIELDS | ||
fields = set(proc.keys()) - expect | ||
assert len(fields) == 0, f"Unexpected fields found:\n Unknown: {fields}\n Expected: {expect}" | ||
|
@@ -3145,15 +3161,23 @@ def test_deploy_merge_complex_io_from_package_and_offering(self): | |
assert isinstance(proc["outputs"], list) | ||
assert len(proc["outputs"]) == 2 | ||
assert proc["outputs"][0]["id"] == "complex_output_only_cwl_minimal" | ||
assert len(proc["outputs"][0]["formats"]) == 1, \ | ||
"Default format should be added to process definition when omitted from both CWL and WPS" | ||
assert len(proc["outputs"][0]["formats"]) == 3, ( | ||
"Default format and alternate formats should be added " | ||
"to process definition when omitted from both CWL and WPS" | ||
) | ||
assert proc["outputs"][0]["formats"][0]["mediaType"] == ContentType.TEXT_PLAIN | ||
assert proc["outputs"][0]["formats"][0]["default"] is True | ||
assert proc["outputs"][0]["formats"][1]["mediaType"] == ContentType.TEXT_HTML | ||
assert proc["outputs"][0]["formats"][2]["mediaType"] == ContentType.APP_PDF | ||
assert proc["outputs"][1]["id"] == "complex_output_both_cwl_and_wps" | ||
assert len(proc["outputs"][1]["formats"]) == 1, \ | ||
"Default format should be added to process definition when omitted from both CWL and WPS" | ||
assert len(proc["outputs"][1]["formats"]) == 3, ( | ||
"Default format and alternate formats should be added " | ||
"to process definition when omitted from both CWL and WPS" | ||
) | ||
assert proc["outputs"][1]["formats"][0]["mediaType"] == ContentType.TEXT_PLAIN | ||
assert proc["outputs"][1]["formats"][0]["default"] is True | ||
assert proc["outputs"][1]["formats"][1]["mediaType"] == ContentType.TEXT_HTML | ||
assert proc["outputs"][1]["formats"][2]["mediaType"] == ContentType.APP_PDF | ||
assert proc["outputs"][1]["title"] == "Additional detail only within WPS output", \ | ||
"Additional details defined only in WPS matching CWL I/O by ID should be preserved" | ||
|
||
|
@@ -3271,9 +3295,11 @@ def test_deploy_literal_and_complex_io_from_wps_xml_reference(self): | |
assert proc["outputs"][1]["description"] == "Collected logs during process run." | ||
assert "minOccurs" not in proc["outputs"][1] | ||
assert "maxOccurs" not in proc["outputs"][1] | ||
assert len(proc["outputs"][1]["formats"]) == 1 | ||
assert len(proc["outputs"][1]["formats"]) == 3 | ||
assert proc["outputs"][1]["formats"][0]["default"] is True | ||
assert proc["outputs"][1]["formats"][0]["mediaType"] == ContentType.TEXT_PLAIN | ||
assert proc["outputs"][1]["formats"][1]["mediaType"] == ContentType.TEXT_HTML | ||
assert proc["outputs"][1]["formats"][2]["mediaType"] == ContentType.APP_PDF | ||
|
||
def test_deploy_enum_array_and_multi_format_inputs_from_wps_xml_reference(self): | ||
body = { | ||
|
@@ -4118,8 +4144,9 @@ def test_execute_single_output_response_raw_reference_literal(self): | |
assert results.content_type is None | ||
assert results.headers["Content-Location"] == results_href | ||
assert ("Link", output_data_link) in results.headerlist | ||
rel_pattern = re.compile(r"rel=\"?([^\"]+)\"?") | ||
assert not any( | ||
any(out_id in link[-1] for out_id in ["output_json", "output_text"]) | ||
any(out_id in rel_pattern.search(link[1]).group(1) for out_id in ["output_json", "output_text"]) | ||
for link in results.headerlist if link[0] == "Link" | ||
), "Filtered outputs should not be found in results response links." | ||
outputs = self.app.get(f"/jobs/{job_id}/outputs", params={"schema": JobInputsOutputsSchema.OGC_STRICT}) | ||
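The hunk above narrows the filtered-output check from the whole `Link` header value to its `rel` parameter. A minimal sketch of why that matters, assuming (this is not confirmed by the diff) that a link URL can mention an output ID while its relation names something else. The header values and the `link_relations` helper are hypothetical; only the regular expression is taken from the test.

```python
import re

# Same pattern as in the test: capture the value of the rel parameter.
rel_pattern = re.compile(r"rel=\"?([^\"]+)\"?")


def link_relations(headerlist):
    """Return the rel value of every Link header in a list of (name, value) tuples."""
    rels = []
    for name, value in headerlist:
        if name != "Link":
            continue
        match = rel_pattern.search(value)
        if match:  # skip Link headers that carry no rel parameter
            rels.append(match.group(1))
    return rels


# Hypothetical headers: the first URL contains "output_json", its rel does not.
headers = [
    ("Content-Type", "text/plain"),
    ("Link", '<https://example.com/outputs/output_json/result.json>; rel="alternate"; type="application/json"'),
    ("Link", '<https://example.com/outputs/output_data>; rel="output_data"'),
]
rels = link_relations(headers)
```

Unlike the test, which calls `.group(1)` directly, the sketch skips headers without a `rel` parameter instead of raising `AttributeError`.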
|
@@ -4345,9 +4372,7 @@ def test_execute_single_output_multipart_accept_link(self): | |
}, | ||
} | ||
|
||
# FIXME: implement (https://github.com/crim-ca/weaver/pull/548) | ||
@pytest.mark.oap_part1 | ||
@pytest.mark.xfail(reason="not implemented") | ||
def test_execute_single_output_multipart_accept_alt_format(self): | ||
""" | ||
Validate the returned contents combining an ``Accept`` header as ``multipart`` and a ``format`` in ``outputs``. | ||
|
@@ -4402,22 +4427,24 @@ def test_execute_single_output_multipart_accept_alt_format(self): | |
output_json_as_yaml = yaml.safe_dump({"data": "test"}) | ||
results_body = self.fix_result_multipart_indent(f""" | ||
--{boundary} | ||
Content-Disposition: attachment; name="output_json"; filename="result.yml" | ||
Content-Type: {ContentType.APP_YAML} | ||
Content-Location: {out_url}/{job_id}/output_json/result.yml | ||
Content-ID: <output_json@{job_id}> | ||
Content-Length: 12 | ||
Content-Length: 11 | ||
|
||
{output_json_as_yaml} | ||
--{boundary}-- | ||
""") | ||
results_text = self.remove_result_multipart_variable(results.text) | ||
assert results.content_type.startswith(ContentType.MULTIPART_MIXED) | ||
assert results_text == results_body | ||
for line1, line2 in zip(results_text.splitlines(), results_body.splitlines()): | ||
assert line1 == line2 | ||
outputs = self.app.get(f"/jobs/{job_id}/outputs", params={"schema": JobInputsOutputsSchema.OGC_STRICT}) | ||
assert outputs.content_type.startswith(ContentType.APP_JSON) | ||
assert outputs.json["outputs"] == { | ||
"output_data": "test", | ||
"output_json": { | ||
"href": f"{out_url}/{job_id}/output_json/output.yml", | ||
"href": f"{out_url}/{job_id}/output_json/result.yml", | ||
"type": ContentType.APP_YAML, | ||
}, | ||
} | ||
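One caveat about the line-by-line comparison introduced in the hunk above: `zip` stops at the shorter sequence, so a response with missing trailing lines would still pass. A possible stricter variant is sketched below, using `itertools.zip_longest`. The helper name `assert_same_lines` is hypothetical and not part of the PR.

```python
from itertools import zip_longest


def assert_same_lines(actual: str, expected: str) -> None:
    """Compare two texts line by line, failing on the first difference or missing line."""
    pairs = zip_longest(actual.splitlines(), expected.splitlines())
    for num, (got, want) in enumerate(pairs, start=1):
        # zip_longest pads the shorter side with None, which never equals a str line
        if got != want:
            raise AssertionError(f"line {num}: {got!r} != {want!r}")


expected = "a\nb\nc"
truncated = "a\nb"

# A plain zip ignores the missing third line and reports no difference.
plain_zip_ok = all(x == y for x, y in zip(truncated.splitlines(), expected.splitlines()))
```

With these inputs `plain_zip_ok` is `True`, while `assert_same_lines(truncated, expected)` raises on line 3.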
|
@@ -4426,11 +4453,9 @@ def test_execute_single_output_multipart_accept_alt_format(self): | |
result_json = self.app.get(f"/jobs/{job_id}/results/output_json", headers=self.json_headers) | ||
assert result_json.status_code == 200, f"Failed with: [{resp.status_code}]\nReason:\n{resp.text}" | ||
assert result_json.content_type == ContentType.APP_JSON | ||
assert result_json.text == "{\"data\":\"test\"}" | ||
assert result_json.text == "{\"data\": \"test\"}" | ||
|
||
# FIXME: implement (https://github.com/crim-ca/weaver/pull/548) | ||
@pytest.mark.oap_part1 | ||
@pytest.mark.xfail(reason="not implemented") | ||
def test_execute_single_output_response_document_alt_format_yaml(self): | ||
proc = "EchoResultsTester" | ||
p_id = self.fully_qualified_test_name(proc) | ||
|
@@ -4479,32 +4504,34 @@ def test_execute_single_output_response_document_alt_format_yaml(self): | |
output_json_as_yaml = yaml.safe_dump({"data": "test"}) | ||
results_body = self.fix_result_multipart_indent(f""" | ||
--{boundary} | ||
Content-Disposition: attachment; name="output_json"; filename="result.yml" | ||
Content-Type: {ContentType.APP_YAML} | ||
Content-Location: {out_url}/{job_id}/output_json/result.yml | ||
Content-ID: <output_json@{job_id}> | ||
Content-Length: 12 | ||
Content-Length: 11 | ||
|
||
{output_json_as_yaml} | ||
--{boundary}-- | ||
""") | ||
results_text = self.remove_result_multipart_variable(results.text) | ||
assert results.content_type.startswith(ContentType.MULTIPART_MIXED) | ||
assert results_text == results_body | ||
for line1, line2 in zip(results_text.splitlines(), results_body.splitlines()): | ||
assert line1 == line2 | ||
|
||
outputs = self.app.get(f"/jobs/{job_id}/outputs", params={"schema": JobInputsOutputsSchema.OGC_STRICT}) | ||
assert outputs.content_type.startswith(ContentType.APP_JSON) | ||
assert outputs.json["outputs"] == { | ||
"output_data": "test", | ||
"output_json": { | ||
"href": f"{out_url}/{job_id}/output_json/output.yml", | ||
"href": f"{out_url}/{job_id}/output_json/result.yml", | ||
"type": ContentType.APP_YAML, | ||
}, | ||
} | ||
|
||
# FIXME: implement (https://github.com/crim-ca/weaver/pull/548) | ||
# validate the results can be obtained with the "real" representation | ||
result_json = self.app.get(f"/jobs/{job_id}/results/output_json", headers=self.json_headers) | ||
assert result_json.status_code == 200, f"Failed with: [{resp.status_code}]\nReason:\n{resp.text}" | ||
assert result_json.content_type == ContentType.APP_JSON | ||
assert result_json.text == "{\"data\":\"test\"}" | ||
assert result_json.text == "{\"data\": \"test\"}" | ||
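The two expectation changes in these hunks (`Content-Length` 12 to 11, and the added space in the JSON text) match what the standard serializers produce. The sketch below uses only the standard library; the YAML string is written out by hand as the expected result of `yaml.safe_dump({"data": "test"})`, and which serializer the server actually uses is an assumption.

```python
import json

data = {"data": "test"}

# Default separators are (", ", ": "), hence the space after the colon.
default_text = json.dumps(data)                         # '{"data": "test"}'
compact_text = json.dumps(data, separators=(",", ":"))  # '{"data":"test"}'

# Expected output of yaml.safe_dump({"data": "test"}), including the trailing newline.
yaml_text = "data: test\n"
yaml_length = len(yaml_text.encode("utf-8"))            # 11
```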
|
||
@pytest.mark.oap_part1 | ||
def test_execute_single_output_response_document_alt_format_json_raw_literal(self): | ||
|
@@ -4571,12 +4598,11 @@ def test_execute_single_output_response_document_alt_format_json_raw_literal(sel | |
}, | ||
} | ||
|
||
# FIXME: add check of direct request of output (https://github.com/crim-ca/weaver/pull/548) | ||
# validate the results can be obtained with the "real" representation | ||
# result_json = self.app.get(f"/jobs/{job_id}/results/output_json", headers=self.json_headers) | ||
# assert result_json.status_code == 200, f"Failed with: [{resp.status_code}]\nReason:\n{resp.json}" | ||
# assert result_json.content_type == ContentType.APP_JSON | ||
# assert result_json.json == {"data": "test"} | ||
result_json = self.app.get(f"/jobs/{job_id}/results/output_json", headers=self.json_headers) | ||
assert result_json.status_code == 200, f"Failed with: [{resp.status_code}]\nReason:\n{resp.json}" | ||
assert result_json.content_type == ContentType.APP_JSON | ||
assert result_json.json == {"data": "test"} | ||
|
||
@pytest.mark.oap_part1 | ||
def test_execute_single_output_response_document_default_format_json_special(self): | ||
|
Update paths to start with `/jobs/{jobId}/...`.
Remove the extra space between `` and GET causing bad parsing.
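For the changelog entry on `Link` headers and the `f` query parameter, a minimal sketch of how such headers could be assembled is shown below. The `alternate_links` helper, the `rel="alternate"` relation, the `type` parameter and the example URL are all assumptions for illustration. The PR's actual implementation is not visible in this diff.

```python
from urllib.parse import urlencode


def alternate_links(output_url, media_types):
    """Build one Link header value per alternate media type (hypothetical helper)."""
    links = []
    for media_type in media_types:
        # safe="/" keeps the slash readable, as in the changelog example f=application/x-yaml
        query = urlencode({"f": media_type}, safe="/")
        links.append(f'<{output_url}?{query}>; rel="alternate"; type="{media_type}"')
    return links


# Hypothetical job output URL following the /jobs/{jobId}/results/{id} layout used in the tests.
links = alternate_links(
    "https://example.com/jobs/123/results/output_json",
    ["application/json", "application/x-yaml"],
)
```

A client could then pick the entry whose `type` matches the desired representation and request its URL directly, rather than relying on the `Accept` header.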