-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add 2nd set of Datacite field methods #181
Conversation
Why these changes are being introduced: * Continue refactoring Datacite to use field methods How this addresses that need: * Add field methods and associated private methods for dates, edition, file_formats, format, funding_information, identifiers, languages, links, and locations * Update related_items code block to generate related_identifiers list that was moved into get_identifiers * Add unit tests for new field methods Side effects of this change: * None Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/TIMX-284
assert Datacite.get_file_formats(source_record) is None | ||
|
||
|
||
def test_get_format_success(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The usual tests were deliberately omitted given the method's guaranteed output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looking real good. Left a couple of comments, but nothing major. I'm finding the file, each time, easier and easier to scan.
) | ||
|
||
@classmethod | ||
def _get_dates(cls, source_record: Tag) -> Iterator[timdex.Date]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if we've already discussed this, but I keep tripping on this private method _get_dates()
and it's relationship to get_dates()
. What about a rename to something like _get_other_dates()
? I realize that "other" isn't very exact, but I think it definitively differentiates from the higher level field method get_dates()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about get _get_datacite_dates
which I think is more accurate? I could also update _get_contributors
to _get_datacite_contributors
since that's what originally prompted the conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah thanks for the reminder, I thought it had come up.
I don't mean to be difficult, but even adding datacite
feels odd... like we know it's datacite already. But I don't really have a better solution.
I suppose one unique property is that we're setting the date kind
dynamically by the attribute dateType
; wonder if that could come into play naming wise? Maybe something like _get_dates_by_datetype_attribute()
? It's long... but it tells you precisely what's going on?
Or we could focus on the element used, like _get_dates_by_date_element()
? or _parse_dates_from_date_elements()
.
Open to pushback here. If folks are okay with _get_dates()
and _get_contributors()
I could be too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like _get_dates_by_date_element
, I think I'll go with that
* Rename methods for clarity
@ghukill @jonavellecuerdo Pushed a new commit based on Graham's comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dig it! Thanks for the changes. At this stage, they feel really understandable. Maybe in "phase 2" we take another pass at naming, once we've seen a lot, but appreciate the changes now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@classmethod | ||
def get_languages(cls, source_record: Tag) -> list[str] | None: | ||
languages = [] | ||
if language := source_record.metadata.find("language", string=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Non-blocking] When do we include the "_element" suffix to a variable?
- Some cases, we just use the field name:
- get_funding_information
- get_languages
- get_edition
- Other cases, we include the suffix "_element":
This is something I've personally been getting tripped up on in the transforms I've been working on --naming conventions assigned to find
/find_all
method calls. These differences don't cause any problems but consistency would be nice...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was using the _element
convention where element attributes are accessed in addition to element.string
so it probably should be used for get_funding_information
as well. To me, it doesn't feel as necessary when we're just grabbing element.string
. Interested in consistency as well so I want to hear everyone's thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the thought process there @ehanson8, and I agree that consistency would be nice, but it doesn't feel like a deal breaker to me with some variation. Given how common the convention is to a) find an element, and then b) use that element variable to extract a string or an attribute value, I think the _element
could get pretty chatty.
It might be worth an issue? Then we could keep moving through this refactor, but make a global pass at some point.
In summary: I'm personally okay with variations, given different times in codebae, but an issue to revist could be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing! I'll apply that thought process as I work on other transforms, @ehanson8 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, created an issue
Purpose and background context
Continue refactoring
Datacite
transform to use field methods.How can a reviewer manually see the effects of these changes?
Run the following command to see that the
Datacite
transform still transforms a source file:Includes new or updated dependencies?
NO
Changes expectations for external applications?
NO
What are the relevant tickets?
Developer
Code Reviewer(s)