Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backend should return results data in a task agnostic format for frontend consumption #678

Closed
1 task done
njbrake opened this issue Jan 17, 2025 · 4 comments · Fixed by #815
Closed
1 task done
Assignees
Labels
api Changes which impact API/presentation layer backend enhancement New feature or request schemas Changes to schemas (which may be public facing) sdk

Comments

@njbrake
Copy link
Contributor

njbrake commented Jan 17, 2025

Motivation

I believe this is also a relevant issue for the task that @HareeshBahuleyan is working on:

As I understand it currently, the metric names and settings are hardcoded into the frontend, e.g. https://github.com/mozilla-ai/lumigator/blob/main/lumigator/frontend/src/components/molecules/LExperimentResults.vue#L69

As we support new metrics and custom user metrics in the future, I'm thinking that the frontend should be able to remain ignorant about what the specific metric is.

In order to support this, I think we need to restructure the file whose path we provide to the frontend, which relates to #670.

Currently, the file that the eval job creates and saves to S3 (which then the backend creates a presigned url so that the frontend can download the file) looks like this:

{
  "rouge": {
    "rouge1": [
      0.27272727272727276,
      0.27586206896551724
    ],
    "rougeLsum": [
      0.27272727272727276,
      0.20689655172413796
    ],
    "rouge1_mean": 0.27429467084639503,
    "rougeLsum_mean": 0.23981191222570536
  },
......
  "bertscore": {
    "precision": [
      0.8871338963508606,
      0.8618351817131042
    ],
    "recall": [
      0.8770406246185303,
      0.8888148069381714
    ],
...
    "hashcode": "hashymchashhash",
    "precision_mean": 0.8744845390319824,
    "recall_mean": 0.8829277157783508,
  },
  "predictions": [
    "a summary",
    "a summary2"
  ],
  "ground_truth": [
    "the real summary",
    "the real summary2"
  ]
} 

Can we re-org this to something that looks more like

{
  "metrics": {
    "rouge1": {
      "displayName": "ROUGE-1",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "rougeL": {
      "displayName": "ROUGE-L",
      "values": [
        0.27272727272727276,
        0.27586206896551724
      ],
      "mean": 0.27429467084639503
    },
    "bertscorePrecision": {
      "displayName": "BERTScore Precision",
      "values": [
        0.8871338963508606,
        0.8618351817131042
      ],
      "mean": 0.8744845390319824
    },
    "bertscoreRecall": {
      "displayName": "BERTScore Recall",
      "values": [
        0.8770406246185303,
        0.8888148069381714
      ],
      "mean": 0.8829277157783508
    }
  },
  "artifacts": {
    "predictions": [
      "a summary",
      "a summary2"
    ],
    "ground_truth": [
      "the real summary",
      "the real summary2"
    ]
  },
  "parameters": {
    "max_input_length": 512,
    "max_output_length": 150,
    "num_beams": 4,
    "length_penalty": 2.0,
    "early_stopping": true
  }
}

This would be the first step towards a more structured output. As I understand it, the backend is agnostic about what is in the output result.json from a job, and the frontend relies upon the job to create the result.json in the right format. My thought was that it would help if the backend was responsible for ensuring that the job output conforms to the standards that the frontend requires.

I think(?) this should help to make our code more flexible to handle new tasks.

In order to make this backwards compatible at first, we can overhaul each job to save both the existing result.json file in addition to a new file that conforms to our new agreed upon format. This way we don't need to as tightly coordinate the backend with frontend changes.

Alternatives

This is more of an incremental step. We could try to take a more massive bite and try to completely overhaul how files are saved and provided to the frontend, for example, saving and providing 3+ files to the frontend instead of the single results.json.

Contribution

Happy to help work on this if this direction sounds good.

Have you searched for similar issues before submitting this one?

  • Yes, I have searched for similar issues
@njbrake njbrake added enhancement New feature or request sdk backend api Changes which impact API/presentation layer schemas Changes to schemas (which may be public facing) labels Jan 17, 2025
@njbrake
Copy link
Contributor Author

njbrake commented Jan 17, 2025

I think @george-mzai is the right person to ask about what he needs the new file to contain so that he can access everything he needs to display the content? I'm thinking that this restructure will make your life on the frontend much easier, but let me know if that's not the case 😆

I think @peteski22 may be the right person to pull in to ask about whether this API makes sense from the backend perspective? Right now as I understand it, the backend doesn't do any validation of the artifact result.json created by a job, I'm not sure where the validation should go.

I think @ividal or @veekaybee or @aittalam may be the ones to comment about whether this makes sense to alter the job output to conform to this output?

@ividal ividal changed the title [FEATURE]: Decouple metric names from the frontend Decouple metric names from the frontend Jan 22, 2025
@njbrake njbrake assigned njbrake and unassigned njbrake Jan 22, 2025
@njbrake njbrake changed the title Decouple metric names from the frontend Backend return results data in a task agnostic format for frontend consumption Jan 22, 2025
@njbrake njbrake changed the title Backend return results data in a task agnostic format for frontend consumption Backend should return results data in a task agnostic format for frontend consumption Jan 22, 2025
@njbrake njbrake self-assigned this Jan 22, 2025
@ividal
Copy link
Contributor

ividal commented Jan 24, 2025

My 2 cents, this makes perfect sense.

Two upcoming bigger bodies of work will be:

  • Supporting Translation - this can start with metrics we already support, but it will likely make sense to extend to COMET or others.
  • Supporting task-agnostic metrics (e.g. related to robustness, persona drift...). It´s indeed not practical at all to have to change hardcoded values in UI and BE for it, when the UI could just read what metrics were returned.

We just need to keep an eye out on how we show experiments that contain different sets of metrics (e.g. once we have summarization, translation, etc.).

@aittalam
Copy link
Member

I think having task-agnostic visualization of results will help a lot with generalizing our evals! What'd be the best place to store this mapping information?

  • currently, all the jobs that run under Ray are agnostic of what happens in the front-end (as ppl might call them via the SDK or directly through the API). I think it makes sense as requiring ppl who want to write a new job to also know about the UI raises the barrier to participation

  • if we want to change something in the presentation of results and have viz information saved together with the results, we won't be able to automatically apply the new mappings to old data

  • if we opt for having this in the backend, any small change in the UI (e.g. a label is too large to fit a display, so a shorter text has to be provided) will require a change in the backend code. This is probably not a deal-breaker (as it will likely be a configuration file as it would be in frontend), but I wanted to raise this because IMO it makes FE/BE more coupled than needed

WDYT?

@njbrake
Copy link
Contributor Author

njbrake commented Feb 5, 2025

Thank you for the feedback! You made good points that I hadn't considered. It sounds like:

  • The design should focus on keeping the jobs as a flexible entity that doesn't require knowledge about how the frontend works.
  • The design should allow for new visualizations of existing data, which means that the frontend should be able to make decisions about how to display the data, and it shouldn't be tied to the backend.
    Tl;dr it sounds like the ideal design will de-couple the presentation (frontend) layer from the data management/storage layer (backend), as well as the execution layer (job).

With this in mind, it seems like the design I propose in the original post on this thread is mostly good, with the exception that I shouldn't add that displayName key. I'll take a crack at pulling something together for this 👍

@njbrake njbrake linked a pull request Feb 6, 2025 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Changes which impact API/presentation layer backend enhancement New feature or request schemas Changes to schemas (which may be public facing) sdk
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants