Arxivce 1273 browse refactor #240

mnazzaro · 2024-03-11T18:56:48Z

No description provided.

kyokukou · 2024-03-11T19:27:22Z

arxiv/base/config.py

+
+CLASSIC_DB_URI = os.environ.get("CLASSIC_DB_URI", DEFAULT_DB)
+LATEXML_DB_URI = os.environ.get("LATEXML_DB_URI", DEFAULT_LATEXML_DB)
+ECHO_SQL = os.environ.get("ECHO_SQL", True)


Not sure we want ECHO_SQL to default to true.

We very much do not want ECHO_SQL default enabled.
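A minimal sketch of an opt-in flag, assuming the config stays a plain module of os.environ lookups. The True default is not the only problem. os.environ.get returns a string whenever the variable is set, so ECHO_SQL=false in the diff above would still be a truthy value. The helper name env_flag and the accepted spellings are illustrative assumptions, not existing arxiv-base API.

```python
import os

# Spellings treated as "on"; anything else (including unset) counts as off.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean flag from the environment; off unless explicitly enabled."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Environment values are always strings, so parse instead of relying on truthiness.
    return value.strip().lower() in _TRUTHY


ECHO_SQL = env_flag("ECHO_SQL")
```

With this, SQL echoing stays off unless someone sets ECHO_SQL=1 (or true/yes/on), and ECHO_SQL=false really does mean off.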

kyokukou · 2024-03-11T20:25:08Z

arxiv/config/__init__.py

+
+Usually the same as BASE_SERVER but can be configured.
+"""
+


For completeness, RSS also has a server (although I doubt anyone outside of RSS cares).

kyokukou · 2024-03-11T20:37:00Z

arxiv/db/models.py

+
+
+
+t_arXiv_bogus_subject_class = Table(


Why are some of the models tables and others classes?

There are a few reasons this happens. First, some tables don't have primary keys and therefore can't be mapped as ORM models. Second, sqlacodegen generates tables that represent many-to-many relationships as plain Table objects instead of ORM models.

arxiv/document/metadata.py

mnazzaro · 2024-03-12T14:00:44Z

This is more of a draft PR. Lots of stuff left to do here:

Make object store useful
Try to get rid of cloudpathlib stuff. Only browse uses it...
Get tests running, including new tests for relevant stuff from browse
Make the config into a pydantic model
Refactor everything in arxiv/base because that's the biggest mess

arxiv/config/__init__.py

kyokukou · 2024-03-12T15:35:25Z

arxiv/identifier/iteration.py

+
+        with get_db() as session:
+            self.query = session.query(Metadata.paper_id, Metadata.version) \
+                .filter(Metadata.paper_id >= start_yymm) \


FYI, this will only work for papers after March 2007.
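Some background on why: new-style identifiers (YYMM.NNNN) start at 0704, i.e. April 2007, while older papers use archive/YYMMNNN (for example hep-th/9901001). The filters compare paper_id as strings, and under ASCII-style ordering letters sort after digits, so an old-style ID passes the lower bound but always fails the upper bound. The stdlib sketch below mirrors the two filter conditions; in_yymm_range is a hypothetical helper, not part of arxiv.identifier.

```python
def in_yymm_range(paper_id: str, start_yymm: str, end_yymm: str) -> bool:
    """Hypothetical helper mirroring the query's two string filters."""
    # Same comparison as:
    #   Metadata.paper_id >= start_yymm
    #   Metadata.paper_id < f'{end_yymm}.999999'
    # An old-style id like "hep-th/9901001" is >= any digit-only start
    # but never < a digit-only end, so it is always excluded.
    return start_yymm <= paper_id < f"{end_yymm}.999999"
```

For example, in_yymm_range("0704.0001", "0704", "0704") is true, while in_yymm_range("hep-th/9901001", "9901", "9912") is false even though that paper was submitted in January 1999.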

kyokukou · 2024-03-12T15:40:51Z

arxiv/identifier/iteration.py

+                .filter(Metadata.paper_id >= start_yymm) \
+                .filter(Metadata.paper_id < f'{end_yymm}.999999')
+            if categories:
+                self.query = self.query.filter(or_(*[Metadata.abs_categories.like(f'%{category.id}%') for category in categories]))


Possible hiccup here: some entries don't include the most modern/relevant name of their category. Technically this code is fine, but to find every paper, whatever calls this would have to pass in every possible category name (both versions of aliases, and subsumed archives).

Hmm, yeah, I wrote this piece of code sort of carelessly, and I don't know what it would be useful for. I'd like to search/sort by announce date, but that doesn't seem easy with the columns available to me. I may just delete it and rewrite it when I start work on converting the full corpus to HTML.

I've spent way too much time figuring out how to search for all papers of a given demographic. LMK when you work on that.
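One way a caller could cover every stored name is to expand each requested category into all of its alternate names before building the LIKE filters. This is a stdlib sketch only; the ALTERNATE_NAMES table is illustrative (three real alias/subsumption pairs, not the full data in arxiv.taxonomy), and all_names and like_patterns are hypothetical helpers.

```python
# Illustrative alternate-name table (an assumption for this sketch):
# canonical id -> other names the same category may be stored under.
ALTERNATE_NAMES: dict[str, set[str]] = {
    "cs.CL": {"cmp-lg"},      # cmp-lg was subsumed into cs.CL
    "nlin.SI": {"solv-int"},  # solv-int was subsumed into nlin.SI
    "cs.IT": {"math.IT"},     # alias pair
}


def all_names(category_id: str) -> set[str]:
    """Every name a category might appear as in abs_categories."""
    return {category_id} | ALTERNATE_NAMES.get(category_id, set())


def like_patterns(category_ids: list[str]) -> list[str]:
    """One LIKE pattern per distinct name, sorted for stable output."""
    names: set[str] = set()
    for category_id in category_ids:
        names |= all_names(category_id)
    return sorted(f"%{name}%" for name in names)
```

The patterns would then feed the existing filter, e.g. or_(*[Metadata.abs_categories.like(p) for p in like_patterns(ids)]), so callers only pass canonical ids.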

kyokukou · 2024-03-12T15:43:33Z

arxiv/py.typed

Adding an empty file?

This tells mypy, when it checks other packages that import this one, to use the type hints from this package (it's the PEP 561 marker file).
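A quick stdlib check for whether an importable package carries the marker. ships_py_typed is a hypothetical helper for illustration; it only looks in the package's top-level directories and does not handle stub-only (-stubs) packages.

```python
import importlib.util
from pathlib import Path


def ships_py_typed(package: str) -> bool:
    """Return True if a top-level package directory contains a py.typed marker."""
    spec = importlib.util.find_spec(package)
    # Missing packages, plain modules, and builtins have no package directory to check.
    if spec is None or not spec.submodule_search_locations:
        return False
    return any((Path(loc) / "py.typed").is_file() for loc in spec.submodule_search_locations)
```

Once arxiv-base is installed with this file, ships_py_typed("arxiv") should return True.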

kyokukou · 2024-03-12T16:00:36Z

arxiv/taxonomy/definitions.py

@@ -304,7 +318,14 @@
    'test': date(2010, 1, 1)
 }

-CATEGORIES = {
+class tCategory(TypedDict):


I vote for combining these with the Archive, Category, and Group classes. Anything trying to use them will want the features of both.
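A rough sketch of what merging the TypedDict data into the classes could look like, using frozen dataclasses so the objects are hashable and comparable and can point to their parent and children. The field and method names (in_archive, get_archive, get_categories) and the sample entries are illustrative assumptions, not the final arxiv.taxonomy API.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Archive:
    id: str
    full_name: str

    def get_categories(self) -> list["Category"]:
        """Children: every category whose in_archive points at this archive."""
        return sorted(c for c in CATEGORIES.values() if c.in_archive == self.id)


@dataclass(frozen=True, order=True)
class Category:
    id: str
    full_name: str
    in_archive: str  # id of the parent archive

    def get_archive(self) -> Archive:
        """Parent lookup by id, so the definitions stay flat dicts of typed objects."""
        return ARCHIVES[self.in_archive]


# Hypothetical sample data. frozen=True makes instances hashable;
# order=True makes them sortable (fields compared in declaration order, id first).
ARCHIVES: dict[str, Archive] = {
    "math": Archive("math", "Mathematics"),
    "hep-th": Archive("hep-th", "High Energy Physics - Theory"),
}
CATEGORIES: dict[str, Category] = {
    "math.NT": Category("math.NT", "Number Theory", "math"),
    "math.AG": Category("math.AG", "Algebraic Geometry", "math"),
    "hep-th": Category("hep-th", "High Energy Physics - Theory", "hep-th"),
}
```

Looking up parents and children by id, rather than storing object references, avoids ordering problems when the definitions module builds the dicts.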

ntai-arxiv

Sorry about my not-so-current docstring.

arxiv/files/__init__.py

bdc34

Looks good. Some minor changes requested. Mostly I've left comments.

arxiv/authors/__init__.py

arxiv/base/__init__.py

bdc34 · 2024-03-14T13:17:17Z

arxiv/base/config.py

+
+CLASSIC_DB_URI = os.environ.get("CLASSIC_DB_URI", DEFAULT_DB)
+LATEXML_DB_URI = os.environ.get("LATEXML_DB_URI", DEFAULT_LATEXML_DB)
+ECHO_SQL = os.environ.get("ECHO_SQL", True)


We very much do not want ECHO_SQL default enabled.

arxiv/base/config.py

arxiv/config/__init__.py

arxiv/ops/fastly_log_ingest/app.py

arxiv/urls/__init__.py

arxiv/util/tests/test_authors.py

mypy.ini

pyproject.toml


kyokukou

This currently has a bug with the new category structure. I've fixed it in a branch that hasn't been pulled into this one yet. Also, I think it would be good to get all of the tests running.

PR here: #244

More category improvements

rename tests to match pytest form

kyokukou reviewed Mar 11, 2024

arxiv/document/metadata.py (outdated, resolved)

mnazzaro marked this pull request as draft March 12, 2024 14:00

mnazzaro commented Mar 12, 2024

arxiv/config/__init__.py (outdated, resolved)

kyokukou reviewed Mar 12, 2024

ntai-arxiv requested changes Mar 12, 2024

arxiv/files/__init__.py (three outdated, resolved threads)

bdc34 requested changes Mar 14, 2024

kyokukou and others added 17 commits March 14, 2024 08:38

update readme test instructions

33892fd

Category, Archive, Group class redesign.

becbf59

Now with typed data. classes can also point to their parents and children

Remove cloudpathlib

1ed6b22

remove some extra dependencies

3bdc316

definitions rewritten in new structure

bd67a61

address circular imports

4c69e0f

incorporate get cannonical display

refactor places GROUP, CAT, ARCH are used

798752c

taxonomy tests rewritten

31d2eb0

fixing imports

4667b47

Make pymysql mandatory

cd44be7

Make pymysql mandatory

75cef14

Use mysqlclient instead of PyMySQL

7984e66

Use mysqlclient instead of PyMySQL

65b5469

removed _alt_canonical

4cc2aa0

filled out alternate names for category versions of subsumed archives

repaired test cases

d600b4f

No mysql lib?

c3aa18c

No mysql lib update poetry

b454e40

mnazzaro and others added 11 commits March 19, 2024 14:38

Add validators

69c6ddc

adapt macro template use of categories

6911679

updated DocMetadata functions to work with new category structure

38c088d

sample case has secondary categories

f49119c

Merge branch 'ARXIVCE-1273-browse-refactor' into combining-tcategory-…

fcd0e3b

…and-category

Merge pull request #241 from arXiv/combining-tcategory-and-category

39567a7

Combining tcategory and category

fixed DocMetadata methods, category objects are hashable and comparable

50f7d25

get functions now can get only active categories/archives

06f606d

new funcrtion to get canonical version of archvie or category, rename…

554fb86

…d the str canonical property canonical_id

Merge remote-tracking branch 'origin/develop' into ARXIVCE-1273-brows…

0cc94c2

…e-refactor

Off by 0.05% coverage!

416b0cb

mnazzaro marked this pull request as ready for review March 27, 2024 19:33

bdc34 self-requested a review March 27, 2024 19:39

bdc34 approved these changes Mar 27, 2024

ntai-arxiv self-requested a review March 27, 2024 19:40

kyokukou self-requested a review March 27, 2024 19:41

kyokukou requested changes Mar 27, 2024

ntai-arxiv approved these changes Mar 27, 2024

kyokukou and others added 3 commits March 27, 2024 13:04

Merge pull request #244 from arXiv/more-category-improvements

a099e22

More category improvements

rename tests to match pytest form

7ab5999

Merge pull request #245 from arXiv/rename-tests-for-pytests

d11439c

rename tests to match pytest form

kyokukou approved these changes Mar 27, 2024

mnazzaro merged commit 0dfef6c into develop Mar 28, 2024
1 check passed

kyokukou deleted the ARXIVCE-1273-browse-refactor branch April 1, 2024 15:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arxivce 1273 browse refactor #240

Arxivce 1273 browse refactor #240

mnazzaro commented Mar 11, 2024

kyokukou Mar 11, 2024 •

edited

Loading

mnazzaro Mar 12, 2024

bdc34 Mar 14, 2024

kyokukou Mar 11, 2024

kyokukou Mar 11, 2024

mnazzaro Mar 12, 2024

mnazzaro commented Mar 12, 2024

kyokukou Mar 12, 2024

kyokukou Mar 12, 2024

mnazzaro Mar 12, 2024

kyokukou Mar 13, 2024

kyokukou Mar 12, 2024

mnazzaro Mar 12, 2024

kyokukou Mar 12, 2024

ntai-arxiv left a comment

bdc34 left a comment

bdc34 Mar 14, 2024

kyokukou left a comment •

edited

Loading




		t_arXiv_bogus_subject_class = Table(

Arxivce 1273 browse refactor #240

Arxivce 1273 browse refactor #240

Conversation

mnazzaro commented Mar 11, 2024

kyokukou Mar 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mnazzaro commented Mar 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ntai-arxiv left a comment

Choose a reason for hiding this comment

bdc34 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kyokukou left a comment • edited Loading

Choose a reason for hiding this comment

kyokukou Mar 11, 2024 •

edited

Loading

kyokukou left a comment •

edited

Loading