Remove discovery documents that no longer have a submission attached #148

Open · sblack-usu opened this issue Oct 23, 2024 · 3 comments

Labels: bug (Something isn't working)


@sblack-usu (Contributor)

When running the DSP report in September, an error occurred that blocked creation of the report. Some investigation found a discoverable record that did not have a Submission attached to it. This only breaks the report when the record's provider is Zenodo, because the report needs some metadata that is stored on the Submission document. It does not affect HydroShare, EarthChem, or external records.

A patch was introduced to the reporting endpoint to check whether a Submission exists for each discovery document. This fixed the reporting issue, but the underlying problem remains: there is still a discovery document without a Submission document. It looks like the Submission document was deleted but the discovery document was not deleted with it. Note that some legacy discoverable records were imported and do not have a Submission attached to them. That does not explain this record, since none of the legacy records are from Zenodo.
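A minimal sketch of the kind of guard described above, written against in-memory objects rather than the real reporting endpoint. The classes `DiscoveryDoc` and `Submission`, the field names `repository_identifier` and `submitted_by`, and the function `build_report_rows` are all hypothetical; `submitted_by` simply stands in for whatever Submission metadata the Zenodo rows actually need.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DiscoveryDoc:
    # Hypothetical shape of a discoverable record; field names are assumptions.
    repository_identifier: str
    provider: str  # e.g. "Zenodo", "HydroShare", "EarthChem", "External"


@dataclass
class Submission:
    # Hypothetical Submission document; `submitted_by` stands in for the
    # report-only metadata that Zenodo rows need.
    repository_identifier: str
    submitted_by: str


def build_report_rows(
    discovery_docs: List[DiscoveryDoc],
    submissions_by_id: Dict[str, Submission],
) -> Tuple[List[dict], List[str]]:
    """Build report rows, skipping Zenodo records that have no Submission."""
    rows: List[dict] = []
    skipped: List[str] = []
    for doc in discovery_docs:
        submission: Optional[Submission] = submissions_by_id.get(doc.repository_identifier)
        if doc.provider == "Zenodo" and submission is None:
            # The Zenodo row cannot be built without Submission metadata, so
            # record the orphan instead of failing the whole report.
            skipped.append(doc.repository_identifier)
            continue
        rows.append(
            {
                "id": doc.repository_identifier,
                "provider": doc.provider,
                # Other providers do not depend on the Submission, so None is fine.
                "submitted_by": submission.submitted_by if submission else None,
            }
        )
    return rows, skipped
```

Returning the skipped identifiers, rather than silently dropping them, keeps the report working while still surfacing the orphaned records that need cleanup.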

The deletion should have been handled by the trigger in the triggers file. It is possible that the deletion coincided with a deployment that interrupted the trigger, although the window in which this could happen is fairly narrow. One way to protect against this is to track change stream resume tokens, so that the trigger can pick up where it left off after a restart.
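A minimal sketch of resume-token checkpointing, assuming the trigger is driven by a MongoDB change stream (the issue does not say how the triggers file is implemented). The stream is simulated with an in-memory list so the example runs without a database, and `ChangeEvent`, `TokenStore`, and `run_trigger` are hypothetical names. With PyMongo, the token to store would typically be read from the change stream's `resume_token` property and passed back as `resume_after` to `watch()` on restart.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ChangeEvent:
    # Simulated change event. Real MongoDB resume tokens are opaque documents;
    # an increasing integer stands in for the event's position in the stream.
    token: int
    operation: str  # e.g. "delete"
    document_id: str


@dataclass
class TokenStore:
    # Hypothetical durable store (e.g. a small collection) for the last token.
    last_token: Optional[int] = None


def run_trigger(
    events: List[ChangeEvent],
    store: TokenStore,
    handle: Callable[[ChangeEvent], None],
    stop_after: Optional[int] = None,
) -> int:
    """Handle events after the stored token, checkpointing after each one.

    `stop_after` simulates a deployment killing the process mid-stream.
    Returns the number of events handled in this run.
    """
    handled = 0
    for event in events:
        if store.last_token is not None and event.token <= store.last_token:
            continue  # already handled before the restart
        if stop_after is not None and handled >= stop_after:
            break  # simulated interruption; remaining events stay unhandled
        handle(event)
        # Persist the token only after the handler succeeded, so an interrupted
        # event is replayed on restart instead of being lost.
        store.last_token = event.token
        handled += 1
    return handled
```

Because the token is saved after the handler runs, an event can be replayed once after a crash, so the cascading delete should be idempotent. Resuming also only works while the stored token is still within the oplog window.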

We do have a daily job that updates discoverable datasets, but it does not catch this case: it starts from the submissions and updates the corresponding discoverable documents, so a discovery document whose Submission has been deleted is never visited. We could update the job to also check for non-legacy records that do not have a Submission.
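A minimal sketch of the proposed check, written against in-memory lists rather than the real database. It assumes discovery records and Submissions share an identifier and that legacy imports can be recognized by a flag; `DiscoveryRecord`, `legacy`, `find_orphaned_records`, and `remove_orphans` are all hypothetical. The key change is the direction of the scan: it starts from the discovery records and looks for a missing Submission.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class DiscoveryRecord:
    # Hypothetical fields; `legacy` marks records imported without a Submission.
    repository_identifier: str
    legacy: bool = False


def find_orphaned_records(
    discovery_records: Iterable[DiscoveryRecord], submission_ids: Set[str]
) -> List[str]:
    """Return ids of non-legacy discovery records that have no matching Submission."""
    return [
        record.repository_identifier
        for record in discovery_records
        # Legacy imports legitimately lack a Submission, so never flag them.
        if not record.legacy and record.repository_identifier not in submission_ids
    ]


def remove_orphans(
    discovery_records: List[DiscoveryRecord], submission_ids: Set[str]
) -> List[DiscoveryRecord]:
    """Return the records the daily job should keep after dropping orphans."""
    orphans = set(find_orphaned_records(discovery_records, submission_ids))
    return [r for r in discovery_records if r.repository_identifier not in orphans]
```

Against MongoDB, the same anti-join could be expressed as an aggregation with a `$lookup` stage from the discovery collection into the submissions collection, followed by a `$match` on records whose joined array is empty and that are not legacy.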

@sblack-usu sblack-usu added the bug Something isn't working label Oct 23, 2024
@horsburgh horsburgh transferred this issue from cznethub/dspback Jan 27, 2025
@horsburgh horsburgh added this to the v1.6.0 milestone Jan 27, 2025
@horsburgh

@pkdash and @sblack-usu: I moved this issue from the dspback repository. I'm not sure whether it still exists, but if it does, I've added it to the next milestone for DSP. If it has already been fixed, you can close it. I assigned this work to @pkdash.

@pkdash commented Feb 18, 2025

@horsburgh I will be looking into this issue this week and if needed will make a fix to resolve it.

@pkdash commented Feb 19, 2025

@sblack-usu and @horsburgh, it seems the daily scheduler already has code to clean up discovery records that are missing submission records.
