[WIP] Allow dataset inputs via {src: "url", ...} dictionaries. #18797

Closed
wants to merge 10 commits

Conversation

@jmchilton (Member) commented Sep 10, 2024

I'm trying to keep this highly aligned with the data fetch API. Hence {"src": "url", ...} and not {"src": "uri", ...}, as well as "ext" instead of "file_type" and "dbkey" instead of "genome_build". I'll try to implement the rest of the options from that API as well. So far I have implemented deferred: true/false, in both tools and workflows. The implementation differs slightly between deferred and undeferred datasets; both implementations are discussed more below.
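For illustration, a dataset input expressed this way might look like the following. This is a sketch only; the exact set of supported fields beyond those named above (src, url, ext, dbkey, deferred) is an assumption:

```python
# Sketch of a {"src": "url", ...} dataset input, mirroring the data fetch API's
# vocabulary ("ext" rather than "file_type", "dbkey" rather than "genome_build").
dataset_input = {
    "src": "url",
    "url": "https://example.org/data/sample.fastq.gz",
    "ext": "fastqsanger.gz",
    "dbkey": "hg38",
    # When true, the HDA stays deferred and is only materialized when needed.
    "deferred": True,
}
```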

Tool Implementation

  • The tool implementation requires the new/unmerged "tool request" API (xref [WIP] Implement Tool Request API #18745).
  • This can work either by keeping the HDAs deferred until the job runs or by "materializing" them in the Celery task and handing them off as normal ("ok") HDAs to the rest of the job creation/tool execution stack. There are advantages to both approaches, but keeping them deferred isn't going to be super useful for main until we make Pulsar work with deferred datasets.
  • Building on the tool state work was a joy compared to basic.py; everything is very regimented and validated at each step. It is great to see the idea play out so cleanly in a new application.
  • The Celery task is getting a bit bulky. Even with all the new features added, this is still much better than what currently happens in a web thread, because the work runs in Celery workers. Still, a fun future direction might be to use task chaining to break the processing happening in queue_jobs into multiple tasks. A sketch of what such a tool request might look like follows this list.
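Here is a hypothetical sketch of submitting a tool request with a URL-backed input. The route, payload shape, and IDs are assumptions modeled on the unmerged tool request API (#18745), not a documented interface:

```python
import requests

# Hypothetical sketch only: route and payload shape are assumptions based on
# the unmerged tool request API (#18745), not a documented interface.
GALAXY_URL = "https://usegalaxy.example.org"
API_KEY = "<your API key>"

payload = {
    "tool_id": "cat1",  # any tool id; "cat1" is just an example
    "history_id": "<history id>",
    "inputs": {
        "input1": {
            "src": "url",
            "url": "https://example.org/data/sample.txt",
            "ext": "txt",
            # undeferred: materialized in the Celery task, then handed to the
            # job creation/tool execution stack as a normal ("ok") HDA
            "deferred": False,
        }
    },
}

response = requests.post(
    f"{GALAXY_URL}/api/tools",  # assumed route; the tool request API may differ
    headers={"x-api-key": API_KEY},
    json=payload,
)
response.raise_for_status()
print(response.json())
```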

Workflow Implementation

  • The workflow mode just creates the needed HDAs during workflow queueing in the web process. This isn't intrinsically any more computationally intensive than the operations for copying inputs into a new history or copying library datasets into the history, all of which currently happen in the web thread.
  • Deferred datasets are ready to go to the job handlers as-is. If the datasets aren't sent to the API as deferred, a new step in the workflow invocation iteration is responsible for "materializing" them. The workflow invocation is created in a "requires_materialization" state; it will never be set back to this state, so this shouldn't slow down scheduling a workflow invocation after its first iteration. A sketch of a workflow invocation with a URL-backed input follows this list.
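A hypothetical sketch of the workflow side, using the standard invocation route. The input field names here mirror the tool example above and are assumptions until the API is finalized:

```python
import requests

# Hypothetical sketch only: input field names are assumptions; adjust to the
# actual API once this work is merged.
GALAXY_URL = "https://usegalaxy.example.org"
API_KEY = "<your API key>"
WORKFLOW_ID = "<workflow id>"

payload = {
    "history_id": "<history id>",
    "inputs": {
        "0": {  # workflow input index (or label); assumed here
            "src": "url",
            "url": "https://example.org/data/reads.fastq.gz",
            "ext": "fastqsanger.gz",
            # deferred datasets go to the job handlers as-is; undeferred ones
            # are materialized first via the "requires_materialization" step
            "deferred": True,
        }
    },
}

response = requests.post(
    f"{GALAXY_URL}/api/workflows/{WORKFLOW_ID}/invocations",
    headers={"x-api-key": API_KEY},
    json=payload,
)
response.raise_for_status()
print(response.json())
```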

Other Work & Broader Context

Builds on the tool request API (formerly #18745), part of the structured tool state work (#17393).

I've also started work on the tool and workflow "landing" concept at 08b6ed9. If we can create landing requests with URLs supplied for inputs - external sites will be able to provide really nice launch points into tools and workflows with their own hosted data.

That said... being able to just use these APIs without needing to understand how to "fetch" or "upload" data into Galaxy is a really nice win in its own right; this work doesn't depend on the "landing" concept to be useful.

How to test the changes?


License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jmchilton closed this Sep 17, 2024
@jmchilton mentioned this pull request Dec 2, 2024