Empirical solution to name representation #18
It seems that most of these are easy European-style names. These have
These seem like Japanese names:
Some authors' names are spelled differently in different papers. I'm not sure whether we should preserve this.
Here's what CSL-JSON expects:

"definitions": {
  "name-variable": {
    "anyOf": [
      {
        "type": "object",
        "properties": {
          "family": {
            "type": "string"
          },
          "given": {
            "type": "string"
          },
          "dropping-particle": {
            "type": "string"
          },
          "non-dropping-particle": {
            "type": "string"
          },
          "suffix": {
            "type": "string"
          },
          "comma-suffix": {
            "type": ["string", "number", "boolean"]
          },
          "static-ordering": {
            "type": ["string", "number", "boolean"]
          },
          "literal": {
            "type": "string"
          },
          "parse-names": {
            "type": ["string", "number", "boolean"]
          }
        },
        "additionalProperties": false
      }
    ]
  },
And here's what CFF expects:

"person": {
  "additionalProperties": false,
  "description": "A person.",
  "properties": {
    ...
    "family-names": {
      "description": "The person's family names.",
      "minLength": 1,
      "type": "string"
    },
    ...
    "given-names": {
      "description": "The person's given names.",
      "minLength": 1,
      "type": "string"
    },
    "name-particle": {
      "description": "The person's name particle, e.g., a nobiliary particle or a preposition meaning 'of' or 'from' (for example 'von' in 'Alexander von Humboldt').",
      "examples": [
        "von"
      ],
      "minLength": 1,
      "type": "string"
    },
    "name-suffix": {
      "description": "The person's name-suffix, e.g. 'Jr.' for Sammy Davis Jr. or 'III' for Frank Edwin Wright III.",
      "examples": [
        "Jr.",
        "III"
      ],
      "minLength": 1,
      "type": "string"
    },
    ...

The CFF person record also supports other data (e.g. website) that is not strictly related to names.
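Since the two schemas line up almost field for field, a one-way mapping from CSL-JSON to CFF can be sketched in Python. This is a minimal sketch, assuming the input is a CSL-JSON name object that uses only the keys in the excerpt above; the function name `csl_to_cff` is hypothetical and not part of any CSL or CFF tooling. The mapping is lossy wherever CFF (as excerpted) has no matching field.

```python
def csl_to_cff(name: dict) -> dict:
    """Map a CSL-JSON name object to a CFF person object (hypothetical sketch).

    Lossy by design: the CFF excerpt has no counterpart for comma-suffix,
    static-ordering, parse-names, or the dropping/non-dropping distinction.
    """
    if "literal" in name:
        # A literal name has no family/given split to map; refuse rather than guess.
        raise ValueError("literal names have no direct CFF person equivalent")
    person = {}
    # CFF requires minLength 1, so empty or missing CSL fields are skipped.
    if name.get("family"):
        person["family-names"] = name["family"]
    if name.get("given"):
        person["given-names"] = name["given"]
    # CFF has a single name-particle field, so both CSL particle kinds collapse
    # into it, in CSL display order (dropping before non-dropping).
    particles = [
        name[key]
        for key in ("dropping-particle", "non-dropping-particle")
        if name.get(key)
    ]
    if particles:
        person["name-particle"] = " ".join(particles)
    if name.get("suffix"):
        person["name-suffix"] = name["suffix"]
    return person


humboldt = {"family": "Humboldt", "given": "Alexander", "dropping-particle": "von"}
print(csl_to_cff(humboldt))
# {'family-names': 'Humboldt', 'given-names': 'Alexander', 'name-particle': 'von'}
```

Going the other way is harder: a CFF name-particle does not say whether the particle is dropping or non-dropping, so that choice would have to be made per person.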
@omasanori Would you like to try synthesizing, from these schemas and the BibTeX format, a name representation that works for the names listed above?
Yeah, I will try. Thank you so much for your survey, @lassik!
Wow, CSL can distinguish "John Doe, Jr." from "John Doe Jr." (via comma-suffix).
It is probably fine to unify "Friedman, Daniel P" and "Friedman, Daniel P." into one "Daniel P. Friedman", for instance. The situation would be awful if we had found "J. McCarthy", since that could be at least John McCarthy or Jay A. McCarthy in the context of Lisp dialects. In general, we can unify spellings when we are confident they refer to the same person; otherwise we should keep them as-is.
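The "unify only when confident" rule can be made concrete as a conservative normalization key: two spellings are merged only if they differ in ways that cannot change who is meant. A minimal Python sketch, assuming names arrive in "Family, Given" form as in the bibliography; `unification_key` and `group_spellings` are hypothetical helpers, and the choice of what to normalize (periods after initials, extra whitespace) is an assumption.

```python
import re

# A single letter (any script) that starts a word and is followed by a period,
# e.g. the "P." in "Daniel P."; multi-letter abbreviations like "Jr." don't match.
_INITIAL_WITH_PERIOD = re.compile(r"\b([^\W\d_])\.")


def unification_key(name):
    """Key under which two spellings are treated as the same person (sketch).

    Only a period after an initial and extra whitespace are normalized away.
    An initial is never expanded, so "J." and "John" stay distinct.
    """
    family, sep, given = name.partition(",")
    if not sep:
        # Not in "Family, Given" form: only collapse whitespace.
        return " ".join(name.split())
    given = _INITIAL_WITH_PERIOD.sub(r"\1", given)
    return " ".join(family.split()) + ", " + " ".join(given.split())


def group_spellings(names):
    """Group raw spellings that share a unification key."""
    groups = {}
    for raw in names:
        groups.setdefault(unification_key(raw), []).append(raw)
    return groups


spellings = ["Friedman, Daniel P", "Friedman, Daniel P.", "McCarthy, J.", "McCarthy, John"]
for key, variants in group_spellings(spellings).items():
    print(key, variants)
# Friedman, Daniel P ['Friedman, Daniel P', 'Friedman, Daniel P.']
# McCarthy, J ['McCarthy, J.']
# McCarthy, John ['McCarthy, John']
```

The key is only used for grouping; the display form (e.g. "Daniel P. Friedman") would still be chosen separately.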
On (probably) Japanese names, I found five:
They all follow the Family, Given format, so the sorting is okay.
I wonder how
How do these look?

"Benson Jr, Brent W"
(family "Benson")
(given "Brent" "W")
(suffix "Jr")

"Halstead Jr, Robert H"
(family "Halstead")
(given "Robert" "H")
(suffix "Jr")

"Steele Jr, Guy L"
(family "Steele")
(given "Guy" "L")
(suffix "Jr")
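The conversion above (raw "Family Suffix, Given Initials" string to family/given/suffix fields) is mechanical enough to sketch. A minimal Python sketch, assuming a suffix only ever appears as the last token before the comma; `parse_name`, `to_sexp`, and the `SUFFIXES` set are hypothetical and would need extending as new suffixes show up.

```python
# Hypothetical suffix list; extend it as new suffixes show up in the data.
SUFFIXES = {"Jr", "Jr.", "Sr", "Sr.", "II", "III", "IV"}


def parse_name(raw):
    """Split "Family [Suffix], Given..." into (field, value...) tuples."""
    family_part, _, given_part = raw.partition(",")
    family_tokens = family_part.split()
    suffix = None
    # Only a trailing token counts as a suffix, and never the sole token.
    if len(family_tokens) > 1 and family_tokens[-1] in SUFFIXES:
        suffix = family_tokens.pop()
    record = [("family", " ".join(family_tokens)), ("given", *given_part.split())]
    if suffix is not None:
        record.append(("suffix", suffix))
    return record


def _quote(value):
    """Quote a string as a Scheme string literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_sexp(record):
    """Render the tuples in the (field "value" ...) layout used above."""
    return "\n".join(
        "(" + " ".join([field] + [_quote(v) for v in values]) + ")"
        for field, *values in record
    )


for raw in ["Benson Jr, Brent W", "Halstead Jr, Robert H", "Steele Jr, Guy L"]:
    print(to_sexp(parse_name(raw)))
```

Because the whole part before the comma (minus a suffix) becomes the family name, "Van Horn, David" comes out as (family "Van Horn") (given "David").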
Van is... difficult. In some countries "van" is ignored in the sort key, while in others it is counted. Whether it is capitalized also depends on the country or language (or on personal usage). Regarding David Van Horn: David always uses the capitalized form, and BibTeX normally counts a capitalized token as part of the surname, so I guess it is not awfully bad to treat "Van Horn" as the surname.
And David does not write his name as "David V. Horn", so let's keep "Van" as-is. In most cases, people's own usage is what matters.
Yes, people are the best authority on their own names. If the default sort key is the family name, then the following would suffice:

(family "Van Horn")
(given "David")

This means that "Horn" never makes sense without the "Van" prefix; the name is always filed under "Van Horn". There is another Schemer, Anton van Straaten, who has at least one paper in the bibliography (not yet converted to S-expression metadata). In his name, the "van" is lowercase. So I don't know whether it's "Straaten, Anton van" or "van Straaten, Anton" (and in the latter case, it could be alphabetized under "v" or "s"; who knows).
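With the family name as the default sort key, filing "Van Horn" under V falls out naturally as long as the whole family string is the key. A minimal Python sketch, assuming a record shaped like the S-expressions above; the `Name` type and `sort_key` function are hypothetical.

```python
from typing import NamedTuple, Tuple


class Name(NamedTuple):
    family: str                   # filed under this string, particles included
    given: Tuple[str, ...] = ()
    suffix: str = ""


def sort_key(name):
    """Family name first (whole, so "Van Horn" files under V), then given names."""
    return (name.family.casefold(), tuple(g.casefold() for g in name.given))


names = [
    Name("Van Horn", ("David",)),
    Name("Steele", ("Guy", "L"), "Jr"),
    Name("Friedman", ("Daniel", "P")),
]
for n in sorted(names, key=sort_key):
    print(n.family)
# Friedman
# Steele
# Van Horn
```

casefold() keeps "Van Horn" and a hypothetical "van Horn" spelling adjacent, but it does not decide the "v" versus "s" question for van Straaten; that needs a separate particle field.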
In BibTeX, the latter form is preferred, since "van" is a prefix of the family name (the "von" part, in BibTeX terminology) and sorting ignores it anyway. In CSL terminology, a "van" that is ignored in sorting is either a dropping-particle or a non-dropping-particle. "Dropping" refers to whether the particle is dropped when the family name is displayed alone, e.g. "For details, see [Name, 2023]" vs. "For details, see [van Name, 2023]".
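The dropping/non-dropping distinction only shows up when the family name is displayed alone, which is easy to see in a small sketch. This assumes a CSL-like record with the two particle fields and follows the comment above in ignoring particles for sorting; `CslName`, `split_particle`, and the other helpers are hypothetical, and `split_particle` is a deliberately simplified rule (leading lowercase tokens), not BibTeX's full von-part algorithm.

```python
from dataclasses import dataclass


@dataclass
class CslName:
    family: str
    given: str = ""
    dropping_particle: str = ""      # CSL "dropping-particle"
    non_dropping_particle: str = ""  # CSL "non-dropping-particle"


def split_particle(family):
    """Simplified split of "van Straaten" into ("van", "Straaten").

    Leading lowercase tokens are taken as the particle; the last token never is.
    """
    tokens = family.split()
    i = 0
    while i < len(tokens) - 1 and tokens[i][:1].islower():
        i += 1
    return " ".join(tokens[:i]), " ".join(tokens[i:])


def short_form(n):
    """Family name as shown alone, e.g. in "[van Name, 2023]"."""
    # A dropping particle disappears here; a non-dropping one is kept.
    return " ".join(p for p in (n.non_dropping_particle, n.family) if p)


def full_form(n):
    """Given-first display: given, dropping, non-dropping, family."""
    parts = (n.given, n.dropping_particle, n.non_dropping_particle, n.family)
    return " ".join(p for p in parts if p)


def particle_free_sort_key(n):
    """Sort on the bare family name, ignoring particles as described above."""
    return (n.family.casefold(), n.given.casefold())


particle, family = split_particle("van Straaten")
as_dropping = CslName(family, "Anton", dropping_particle=particle)
as_non_dropping = CslName(family, "Anton", non_dropping_particle=particle)
print(short_form(as_dropping))      # Straaten
print(short_form(as_non_dropping))  # van Straaten
print(full_form(as_dropping))       # Anton van Straaten
```

Under this simplified rule, "Van Horn" keeps its capitalized "Van" in the family name, matching the choice made for David Van Horn.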
The command:
grep -h '^(author ' page*.scm | sort | uniq | sed -e 's/^(author //' -e 's/)$//' -e 's/"//g' | grep -v others
gives all the names we have so far: