Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jobs return standardized and flexible output #815

Conversation

njbrake
Copy link
Contributor

@njbrake njbrake commented Feb 6, 2025

What's changing

Tl;dr this design de-couples the presentation (frontend) layer from the data management/storage layer (backend), as well as the execution layer (job). Although the frontend still has hardcoded references to several keys, they are now organized in such a way that the frontend can implement more dynamic techniques and management of keys when they're ready.

In order to make it easier to add an manage jobs as they are added, this PR creates a loose interface between job result (in the result.json file) and the backend.

The job can create a schema as they please as long as the main structure of the file they save has only the keys "artifacts", "parameters", and "metrics". This allows for easy storage in the tracking interface (e.g. mlflow) and easy access for the frontend.

The design focuses on keeping the jobs as a flexible entity that doesn't require knowledge about how the frontend works. The design also allows for new visualizations of existing data, which means that the frontend is able to make decisions about how to display the data, and it isn't tied to the backend.

If this PR is related to an issue or closes one, please link it here.

Refs #678

How to test it

The unit tests handle testing of the SDK and the backend, but this should be tested by running make local-up and then using the Web UI to upload a dataset, create an experiment, and view all the results.

Additional notes for reviewers

Anything you'd like to add to help the reviewer understand the changes you're proposing.

I already...

  • Tested the changes in a working environment to ensure they work as expected
  • Added some tests for any new functionality
  • Updated the documentation (both comments in code and product documentation under /docs)
  • Checked if a (backend) DB migration step was required and included it if required

@github-actions github-actions bot added backend frontend schemas Changes to schemas (which may be public facing) labels Feb 6, 2025
@github-actions github-actions bot added the sdk label Feb 6, 2025
@njbrake njbrake marked this pull request as ready for review February 6, 2025 17:30
@njbrake
Copy link
Contributor Author

njbrake commented Feb 6, 2025

@khaledosman adding you to this review for the frontend perspective. I ran the UI to ensure that the changes I made to the Vue code don't break anything I could see. I think that these changes should enable you to simplify how you load and organize the data you get back from the backend.

@njbrake njbrake requested review from javiermtorres and removed request for aittalam February 6, 2025 18:02
Copy link
Contributor

@javiermtorres javiermtorres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only a couple of issues:

  • Add another layer between artifacts and the columns (admittedly waivable if we assume this specific jobs will contain only these columns and will only accept one input, but would make it more generic, like having several in/out params)
  • Try to shape the result object fields (params, metrics, etc) into a more "appropriate" type that can be more flexible and at the same time more restrictive. Since I don't have a firm proposal yet :-/ we can leave it for next PR, but just let me know your thoughts on this.

Co-authored-by: javiermtorres <[email protected]>
Signed-off-by: Nathan Brake <[email protected]>
Copy link
Contributor Author

@njbrake njbrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review, Javier! You made some great points 💯

lumigator/backend/backend/services/jobs.py Show resolved Hide resolved
lumigator/schemas/lumigator_schemas/jobs.py Outdated Show resolved Hide resolved
…turn-results-data-in-a-task-agnostic-format-for-frontend-consumption
@njbrake njbrake changed the title Standardize the results file across jobs Jobs return standardized and flexible output Feb 10, 2025
@njbrake
Copy link
Contributor Author

njbrake commented Feb 10, 2025

LGTM. Only a couple of issues:

* Add another layer between `artifacts` and the columns (admittedly waivable if we assume this specific jobs will contain only these columns and will only accept one input, but would make it more generic, like having several in/out params)

Agree that this needs to be improved. I created #835 to track this, because implementing this change will require refactors in a handful of places and I'd like to prevent this PR from getting too unwieldy.

* Try to shape the result object fields (params, metrics, etc) into a more "appropriate" type that can be more flexible and at the same time more restrictive. Since I don't have a firm proposal yet :-/ we can leave it for next PR, but just let me know your thoughts on this.

This is a good comment and I think in a way it will be evergreen: there's this push and pull between making the output type standardized, opinionated, but also flexible. I liked your suggestion of turning the fields into at least Pydantic BaseModels and initializing them as empty dicts if the fields aren't provided; I implemented that change. I think that this will require more refinement down the road, but at least now this gets us into a world where we have three high level buckets to put things in for the SDK and frontend to access: parameters, metrics, and artifacts.

javiermtorres and others added 3 commits February 10, 2025 15:47
…task-agnostic-format-for-frontend-consumption
…ic-format-for-frontend-consumption' of github.com:mozilla-ai/lumigator into 678-backend-should-return-results-data-in-a-task-agnostic-format-for-frontend-consumption
@njbrake
Copy link
Contributor Author

njbrake commented Feb 10, 2025

  • Try to shape the result object fields (params, metrics, etc) into a more "appropriate" type that can be more flexible and at the same time more restrictive. Since I don't have a firm proposal yet :-/ we can leave it for next PR, but just let me know your thoughts on this.

This is a good comment and I think in a way it will be evergreen: there's this push and pull between making the output type standardized, opinionated, but also flexible. I liked your suggestion of turning the fields into at least Pydantic BaseModels and initializing them as empty dicts if the fields aren't provided; I implemented that change. I think that this will require more refinement down the road, but at least now this gets us into a world where we have three high level buckets to put things in for the SDK and frontend to access: parameters, metrics, and artifacts.

Update: I had to undo the suggestion of changing the JobResultOutput fields into Pydantic BaseModels: pydantic models require you to know all of the keys that you want the class to hold, so it can't function as an arbitrary dict. :(

@njbrake njbrake merged commit c815c32 into main Feb 10, 2025
15 checks passed
@njbrake njbrake deleted the 678-backend-should-return-results-data-in-a-task-agnostic-format-for-frontend-consumption branch February 10, 2025 16:12
@javiermtorres
Copy link
Contributor

Now I remember this issue about the use of Base model :( Thanks for trying out anyway!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend frontend schemas Changes to schemas (which may be public facing) sdk
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Backend should return results data in a task agnostic format for frontend consumption
2 participants