-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update OpenAlex translators #3379
base: master
Are you sure you want to change the base?
Conversation
@@ -114,7 +114,7 @@ module.exports = { | |||
} | |||
else if (testCase.type === 'search') { | |||
// console.log(JSON.stringify(testCase.input)) | |||
const expected = ['DOI', 'ISBN', 'PMID', 'arXiv', 'identifiers', 'contextObject', 'adsBibcode', 'ericNumber', 'openAlex']; | |||
const expected = ['DOI', 'ISBN', 'PMID', 'arXiv', 'identifiers', 'contextObject', 'adsBibcode', 'ericNumber', 'OpenAlex']; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing this back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought I'd align with the way it's usually written and displayed. For instance, arXiv
is not lowercase, DOI
is uppercase, etc. I'm fine with anything in the end, but wouldn't it be better if the label was case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The non-camelcase key names for DOI, ISBN, and PMID have caused us some grief, which is why we haven't done that with adsBibcode
(ADS Bibcode) or ericNumber
(ERIC number).
if (items[i].OpenAlex && (oaID = cleanOpenAlexID(items[i].OpenAlex))) { | ||
oaIDs.push(oaID); | ||
} | ||
else if (items[i].openAlex && (oaID = cleanOpenAlexID(items[i].openAlex))) { | ||
oaIDs.push(oaID); | ||
} | ||
else if (typeof items[i] == 'string' && (oaID = cleanOpenAlexID(items[i]))) { | ||
oaIDs.push(oaID); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely no need to support multiple capitalizations or string inputs. Callers just need to pass valid input, which is of the form { openAlex: '...' }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to support only one capitalisation (preferably OpenAlex
). However for just the string, I believe the arXiv translator at least also supports strings in addition to objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However for just the string, I believe the arXiv translator at least also supports strings in addition to objects.
Some search translators that have been around for a very long time do, but arXiv doesn't, at least as of the current version. Zotero will always pass a single object, so there's no need to handle anything but single objects anymore.
(The only exception is if we want to allow another translator to call the search translator in some specific nonstandard way, but I think there's usually a way to design around that.)
let openAlexID = data.ids.openalex.match(/W\d+$/i)[0]; | ||
item.setExtra("OpenAlex", openAlexID); | ||
item.libraryCatalog = "OpenAlex"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasoning for this? I think we settled on using openalex.org/
URLs as canonical identifiers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what you settled on in the past, but my suggestion is to focus on the identifier. Sure, OpenAlex stores them with the URL, but even their documentation shows that the actual identifier for works is just the part that starts with W. In the end, it's the same reason why we canonically store just the identifier for arXiv or DOI, even though you can build the URLs with them. Lastly, it's just more terse and legible IMO. In any event, the translator supports searching with both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we used the version with domain is that that's defined as the OpenAlex ID by OpenAlex:
https://docs.openalex.org/how-to-use-the-api/get-single-entities#the-openalex-id
The OpenAlex ID is the primary key for all entities. It's a URL shaped like this: https://openalex.org/<OpenAlex_key>. Here's a real-world example:
I'm not supper opposed to handling this differently, and I can see this looks like it's inconsistent, but for DOIs, PMIDs, and arXiv IDs, the ID explicitly does not include the URL schema, so this isn't the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it help if we internally agreed that we're not storing the OpenAlex ID per se but the OpenAlex key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what we would be doing, but I'm not convinced going with the key is the right choice. Legibility is obviously a matter of taste, but storing identifiers in a format in which they're self-identifying seems like a good idea: DOIs and ISBNs are to by virtue of their format; arXiv IDs canonically include the arxiv: namespace (yes, we do store the openAlex: label in Extra, but that's as part of a key:value pair, not as part of the identifier). PMIDs are a constant problem. OpenAlex keys are marginally better because of the W, but only marginally so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stick with the URLs for backwards compatibility and disambiguation. We can allow searching by bare W[...]
IDs, of course.
Thanks! Most looks good, but I'm confused about a few changes (above). |
Regarding the label: to give more context: I contributed to the diegodlh/zotero-cita#300 extension, where we fetch and store identifiers and references from multiple sources, including OpenAlex. There, we display the identifiers (both "officially" supported ones like arXiv, DOI, ISBN and unofficial ones like OpenAlex, Semantic Scholar's CorpusID, etc.) in editable fields, with a button to open the corresponding URL (just like the standard DOI field). These extra identifiers are stored in the Extra field, and there, capitalization should be somewhat uniform. Regarding storing the URL, the documentation is sometimes a little vague about the key/ID distinction. As they do consider the Key the "unique primary key that identifies a given resource in our database", I think the value stored should definitely be that one. For ease-of-use of the URL, as I mentioned in our use case, the extension constructs the link automatically (just like Zotero does for the DOI). As to the "recognizability" of the identifier, I see your point, but I'm not sure that I agree with regards to "disregarding" the field label. If I know nothing of English literature, seeing the field value "Jane Eyre" I would identify it as an author, and not a book title. Not to go all too hermeneutical, but context is not only helpful to give meaning, but necessary. In other words, I believe the label to be essential to give meaning to most field values, and not just some internal key in a dictionary. Lastly, while it's true that Sorry if I'm sounding/being a bit anal, in the end it won't change the world whatever you/we decide, I just wanted to clarify my reasoning. I'll perform the other changes requested ASAP |
Totally fine - we can store the ID in Extra as |
In
OpenAlex JSON.js
:null
(such astitle
orprimary_location
).In
OpenAlex.js
: