Arxivce 1273 browse refactor #240
arxiv/base/config.py
CLASSIC_DB_URI = os.environ.get("CLASSIC_DB_URI", DEFAULT_DB)
LATEXML_DB_URI = os.environ.get("LATEXML_DB_URI", DEFAULT_LATEXML_DB)
ECHO_SQL = os.environ.get("ECHO_SQL", True)
Not sure we would want ECHO_SQL to default to True.
Fair point
We very much do not want ECHO_SQL default enabled.
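A minimal sketch of an off-by-default flag, assuming the setting should only be enabled explicitly. One more catch with the current line: os.environ.get returns a string whenever the variable is set, so ECHO_SQL=False would still produce the truthy string "False". The helper name env_flag and the accepted spellings are hypothetical, not existing arxiv-base API.

```python
import os


# Hypothetical helper: interpret an environment variable as a boolean flag.
def env_flag(name: str, default: bool = False) -> bool:
    value = os.environ.get(name)
    if value is None:
        # Variable not set: fall back to the (off-by-default) default.
        return default
    # Environment values are strings, so "False" or "0" must be parsed,
    # not used directly (any non-empty string is truthy).
    return value.strip().lower() in {"1", "true", "yes", "on"}


ECHO_SQL = env_flag("ECHO_SQL", default=False)
```

With this, leaving ECHO_SQL unset or setting it to "False" or "0" both keep SQL echoing off.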
arxiv/config/__init__.py
Usually the same as BASE_SERVER but can be configured.
"""
For completeness, RSS also has a server (although I doubt anyone outside of RSS cares).
arxiv/db/models.py
t_arXiv_bogus_subject_class = Table(
Why are some of the models Tables and others classes?
There are a few reasons this happens. One is that some tables don't have primary keys and therefore can't be ORM models. Another is that sqlacodegen generates the tables that represent many-to-many relationships as plain Table objects instead of ORM models.
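A small illustration of both cases, as a sketch using SQLAlchemy's declarative API. The table and class names are made up and do not correspond to arxiv/db/models.py. The association table and the table without a primary key stay plain Table objects, while tables that have a primary key become mapped classes.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()

# Association table for a many-to-many link: sqlacodegen typically emits
# this kind of table as a plain Table rather than a mapped class.
t_paper_category = Table(
    "paper_category",
    Base.metadata,
    Column("paper_id", Integer, ForeignKey("paper.id")),
    Column("category_id", Integer, ForeignKey("category.id")),
)

# A table with no primary key at all has to stay a Table,
# because the ORM cannot map a class without one.
t_legacy_log = Table(
    "legacy_log",
    Base.metadata,
    Column("message", String(255)),
)


class Paper(Base):
    __tablename__ = "paper"
    id = Column(Integer, primary_key=True)
    # The mapped class still reaches the association table via `secondary`.
    categories = relationship("Category", secondary=t_paper_category)


class Category(Base):
    __tablename__ = "category"
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
```

The many-to-many data stays usable from the ORM side through relationship(secondary=...), so leaving association tables as Table objects costs little.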
This is more of a draft PR. Lots of stuff left to do here:
arxiv/identifier/iteration.py
with get_db() as session:
    self.query = session.query(Metadata.paper_id, Metadata.version) \
        .filter(Metadata.paper_id >= start_yymm) \
FYI, this will only work for papers after March 2007.
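Why the cutoff: identifiers from April 2007 on look like YYMM.NNNN (five digits after the dot from 2015), while older ones look like archive/YYMMNNN, for example hep-th/9901001. A plain string comparison against YYMM puts every old-style id after all the digit-prefixed ones, so they fall outside any range. A sketch of parsing both schemes into a comparable (year, month) key; the function names are hypothetical and this only covers the parsing side, not the SQL query.

```python
import re

# New-style ids (April 2007 onward): YYMM.NNNN, or YYMM.NNNNN from 2015.
NEW_STYLE = re.compile(r"^(\d{4})\.\d{4,5}(v\d+)?$")
# Old-style ids (1991 to March 2007): archive[.SUBJ]/YYMMNNN.
OLD_STYLE = re.compile(r"^[a-z-]+(?:\.[A-Z]{2})?/(\d{4})\d{3}(v\d+)?$")


def yymm_of(paper_id: str) -> str:
    """Return the YYMM part of either identifier scheme."""
    match = NEW_STYLE.match(paper_id) or OLD_STYLE.match(paper_id)
    if match is None:
        raise ValueError(f"unrecognized arXiv identifier: {paper_id!r}")
    return match.group(1)


def yymm_key(yymm: str) -> tuple:
    """Turn YYMM into a sortable (year, month); arXiv started in 1991."""
    yy, mm = int(yymm[:2]), int(yymm[2:])
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return (year, mm)


def in_yymm_range(paper_id: str, start_yymm: str, end_yymm: str) -> bool:
    """Inclusive YYMM range check that works for both schemes."""
    key = yymm_key(yymm_of(paper_id))
    return yymm_key(start_yymm) <= key <= yymm_key(end_yymm)
```

Note that even the extracted YYMM cannot be compared as a string across the century boundary ("9901" sorts after "0704"), which is why the sketch converts it to a (year, month) tuple first.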
arxiv/identifier/iteration.py
        .filter(Metadata.paper_id >= start_yymm) \
        .filter(Metadata.paper_id < f'{end_yymm}.999999')
    if categories:
        self.query = self.query.filter(or_(*[Metadata.abs_categories.like(f'%{category.id}%') for category in categories]))
Possible hiccup here: some entries don't include the most modern/relevant name of their category. Technically this code is fine, but to find every paper, whatever calls this would have to pass in all possible category names (both versions of aliases, and the names of subsumed archives).
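A sketch of the matching side, assuming abs_categories is a space-separated list of category ids. It expands each requested category into every name it may be stored under and compares whole tokens, which also avoids substring hits of the LIKE '%math%' kind (that pattern matches math-ph). The ALTERNATE_NAMES table is a hypothetical, partial example; the complete alias and subsumed-archive data would come from the taxonomy definitions.

```python
from typing import Dict, Iterable, Set

# Hypothetical, partial table: canonical category id -> other names
# (aliases or subsumed archives) that older entries may use instead.
ALTERNATE_NAMES: Dict[str, Set[str]] = {
    "nlin.CD": {"chao-dyn"},
    "cs.CL": {"cmp-lg"},
    "math.AG": {"alg-geom"},
    "math-ph": {"math.MP"},
    "cs.IT": {"math.IT"},
}


def all_names(category_id: str) -> Set[str]:
    """Every name a paper in this category might be listed under."""
    return {category_id} | ALTERNATE_NAMES.get(category_id, set())


def matches(abs_categories: str, wanted: Iterable[str]) -> bool:
    """True if any wanted category (or one of its other names) is listed."""
    listed = set(abs_categories.split())
    return any(listed & all_names(category_id) for category_id in wanted)
```

In SQL the same expansion would mean one LIKE (or token match) per alternate name inside the or_(...), rather than one per requested category.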
Hmm, yeah, I wrote this piece of code sort of carelessly, and I don't know what it would be useful for. I'd like to search/sort by announce date, but that doesn't seem easy with the columns available to me. I may just delete it and write it again when I start work on converting the full corpus to HTML.
I've spent way too much time figuring out how to search for all papers of a given demographic. LMK when you work on that.
arxiv/py.typed
Adding an empty file?
This tells mypy, when checking other packages that import this one, to use the type hints from this package (it is the PEP 561 py.typed marker).
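A quick way to check whether an installed package ships the marker, as a sketch using importlib.resources (Python 3.9+); ships_type_hints is a hypothetical helper. The marker also has to end up in the built distribution (for example as package data), otherwise type checkers in downstream projects will not see it.

```python
from importlib import resources


def ships_type_hints(package: str) -> bool:
    """True if the package contains a PEP 561 py.typed marker file."""
    return resources.files(package).joinpath("py.typed").is_file()
```

Running ships_type_hints("arxiv") against an installed build is a cheap way to confirm the empty file actually made it into the package.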
arxiv/taxonomy/definitions.py
@@ -304,7 +318,14 @@
    'test': date(2010, 1, 1)
}

CATEGORIES = {
class tCategory(TypedDict):
I vote for combining these with the Archive, Category, and Group classes. Anything trying to use them will want the features of both.
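A rough sketch of what a combined class could look like, assuming the data now held in the tCategory TypedDict moves onto the class itself and each object can look up its parent and children. The field names, the canonical field, the group ids, and the two-entry registries are illustrative only, not the actual definitions in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Archive:
    id: str
    full_name: str
    in_group: str  # id of the parent group

    def get_categories(self) -> List["Category"]:
        """Children: every category whose parent is this archive."""
        return [c for c in CATEGORIES.values() if c.in_archive == self.id]


@dataclass(frozen=True)
class Category:
    id: str
    full_name: str
    in_archive: str  # id of the parent archive
    # Hypothetical field: id of the canonical category if this one is an alias.
    canonical: Optional[str] = None

    @property
    def canonical_id(self) -> str:
        return self.canonical or self.id

    def get_archive(self) -> Archive:
        """Parent lookup through the registry below."""
        return ARCHIVES[self.in_archive]


# Tiny illustrative registries (not the real taxonomy data).
ARCHIVES: Dict[str, Archive] = {
    "cs": Archive("cs", "Computer Science", "grp_cs"),
    "math": Archive("math", "Mathematics", "grp_math"),
}
CATEGORIES: Dict[str, Category] = {
    "cs.IT": Category("cs.IT", "Information Theory", "cs"),
    "math.IT": Category("math.IT", "Information Theory", "math", canonical="cs.IT"),
}
```

Keeping data and behavior on one class means callers get typed fields and helpers such as canonical_id from the same object, instead of juggling a TypedDict and a separate class.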
Sorry about my not-so-current docstring.
Looks good. Some minor changes requested. Mostly I've left comments.
Now with typed data. Classes can also point to their parents and children
incorporate get canonical display
filled out alternate names for category versions of subsumed archives
combining tCategory and Category
…d the str canonical property canonical_id
This currently has a bug with the new category structure; I've fixed it in a branch that hasn't been pulled into this one yet. Also, I think it would be good to get all of the tests running.
PR here: #244
More category improvements
rename tests to match pytest naming conventions
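For reference, pytest's default collection looks for files named test_*.py or *_test.py (the python_files option), and inside them for test-prefixed functions and Test-prefixed classes. A tiny sketch of the file-name rule using fnmatch; collected_by_default is a hypothetical helper and ignores any python_files override in the project's own configuration.

```python
from fnmatch import fnmatch

# pytest's default value of the python_files ini option.
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")


def collected_by_default(filename: str) -> bool:
    """True if pytest would collect this file name with default settings."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PYTHON_FILES)
```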