Sync Datalad-Registry with datalad-usage-dashboard #287

Merged: 39 commits, Jan 17, 2024

Collaborator

@candleindark candleindark commented Jan 12, 2024

This PR does the following.

  1. Implements a periodic Celery task that ensures all active repos listed in datalad-usage-dashboard are registered in Datalad-Registry (except repos from OSF, which will be supported later). This PR closes "Automate (cron job script) getting updates from the datalad-usage-dashboard" #271.
  2. Provides a script to invoke the above Celery task on demand. An empty instance of Datalad-Registry can thus be populated with all the active repos listed in the usage dashboard by running this script.
  3. Refactors some of the Pydantic models used by tools/populate.py so that the aforementioned Celery task can use them as well.
  4. Adds responses as a testing dependency for mocking out the requests library.
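
The selection rule in item 1 (active repos only, OSF excluded for now, skip what is already registered) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `DashboardRepo` record, its `host` and `active` fields, and the `urls_to_register` helper are all hypothetical names, since the dashboard's actual schema is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DashboardRepo:
    # Hypothetical record shape; the dashboard's real schema may differ.
    url: str
    host: str  # e.g. "github", "gin", "osf"
    active: bool


def urls_to_register(
    repos: Iterable[DashboardRepo], registered: set[str]
) -> list[str]:
    """Return URLs of active, non-OSF repos that are not yet registered."""
    return [
        repo.url
        for repo in repos
        if repo.active and repo.host != "osf" and repo.url not in registered
    ]


# Example input: one already-registered repo, one new repo,
# one inactive repo, and one OSF repo.
repos = [
    DashboardRepo("https://github.com/a/known", "github", True),
    DashboardRepo("https://github.com/a/new", "github", True),
    DashboardRepo("https://github.com/a/gone", "github", False),
    DashboardRepo("https://osf.io/abcde", "osf", True),
]
missing = urls_to_register(repos, registered={"https://github.com/a/known"})
```

Under these assumptions, a periodic task would only need to submit the URLs in `missing`, which also keeps repeated runs cheap once the registry has caught up.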

Note: The changes in this PR have been applied to the Typhon instance.

With the one defined in `datalad_registry.tasks.utils.usage_dashboar`
For referring to the path of the dataset URLs resource
on the DataLad Registry instance relative to the base API
endpoint of the instance.
It is defined in
`datalad_registry.blueprints.api.dataset_urls`
From `datalad_registry.blueprints.api.dataset_urls` to
`datalad_registry.blueprints.api`
So that organization is consistent with the definition of
`DATASET_URLS_PATH`
mypy does like the default to be just plain str

codecov bot commented Jan 13, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (ac489e8) 98.74% compared to head (9078f66) 98.73%.
Report is 1 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| datalad_registry/tasks/utils/usage_dashboard.py | 95.12% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #287      +/-   ##
==========================================
- Coverage   98.74%   98.73%   -0.02%     
==========================================
  Files          48       50       +2     
  Lines        2237     2366     +129     
==========================================
+ Hits         2209     2336     +127     
- Misses         28       30       +2     


@candleindark candleindark added the enhancement New feature or request label Jan 15, 2024
@candleindark candleindark marked this pull request as ready for review January 16, 2024 17:43
@@ -631,3 +636,101 @@ def chk_url_to_update(
db.session.commit()

return ChkUrlStatus.OK_UPDATED if is_record_updated else ChkUrlStatus.OK_CHK_ONLY


class FailedSubmission(TypedDict):
Member

Why TypedDict and not dataclasses, which seem to be the more common construct to use?

Collaborator Author

@candleindark candleindark Jan 16, 2024

I used TypedDict because a dictionary displays better in the Flower interface. I don't have fine control over how Flower displays the result. Modifying __str__ or __repr__ didn't work, possibly because the result is serialized in transit from the worker service to the Flower service. Thus, I picked a type that displays nicely by default.
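
To illustrate the serialization point, here is a minimal sketch, assuming the task result passes through a JSON serializer (Celery's default; whether this project overrides it is not stated here). The field names `url` and `error` and both class names are hypothetical, since the real fields of `FailedSubmission` are not shown in this thread.

```python
import json
from dataclasses import asdict, dataclass
from typing import TypedDict


class FailedSubmissionDict(TypedDict):
    # Hypothetical fields, for illustration only.
    url: str
    error: str


@dataclass
class FailedSubmissionData:
    # Same hypothetical fields, as a dataclass.
    url: str
    error: str


as_dict = FailedSubmissionDict(url="https://example.com/repo.git", error="timeout")
as_data = FailedSubmissionData(url="https://example.com/repo.git", error="timeout")

# A TypedDict instance is a plain dict at runtime, so it serializes as-is.
dict_json = json.dumps(as_dict)

# A dataclass instance is rejected by the stock JSON encoder, and a custom
# __repr__ would not survive the round trip in any case.
try:
    json.dumps(as_data)
    dataclass_serializable = True
except TypeError:
    dataclass_serializable = False

# Converting explicitly yields the same payload as the TypedDict.
data_json = json.dumps(asdict(as_data))
```

A dataclass would therefore need an explicit conversion step such as `asdict` before being returned from the task, whereas the TypedDict keeps static type checking and needs no extra step.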


class FailedSubmission(TypedDict):
"""
A TypedDict a failed submission of a repo URL to Datalad-Registry
Member

not a sentence... not sure if worth stating TypedDict type here instead of purpose only

Collaborator Author

Yeah, stating TypedDict here is unnecessary. The docstring has been updated.

@yarikoptic
Member

@jwodder please review

@candleindark
Collaborator Author

@yarikoptic Is it OK to merge now?

@yarikoptic
Member

sure... I didn't fully grasp why we need so many changes but if it accomplishes the drill -- let's proceed!

@yarikoptic yarikoptic merged commit f6cb03c into datalad:master Jan 17, 2024
4 of 6 checks passed
@candleindark candleindark deleted the usage-dashboard-sync branch January 18, 2024 01:29
@candleindark
Collaborator Author

> sure... I didn't fully grasp why we need so many changes but if it accomplishes the drill -- let's proceed!

I think the additional changes you are referring to are those resulting from refactoring tools/populate.py, testing with responses to mock the requests calls, and the additional script for invoking the Celery task on demand.
