Refactor the Linker, Ranker, Recogniser and Pipeline #282

Open: wants to merge 189 commits into base: dev
Changes from 1 commit
189 commits
00a3607
Ignore .DS_Store files
thobson88 Sep 10, 2024
a3c8c7a
Refactor linking methods as Linker subclasses
thobson88 Sep 11, 2024
a114f2c
Add RelDisamb Linker subclass & fix tests
thobson88 Oct 30, 2024
769488e
Move rel_params to the RelDisambLinker class
thobson88 Oct 30, 2024
5422243
Remove superfluous return value
thobson88 Oct 30, 2024
e8282fb
Refactor Ranker into subclasses
thobson88 Oct 31, 2024
7a1e7a5
Modify Ranker class hierarchy; fix unit tests
thobson88 Nov 1, 2024
b42f6e5
Fix method_name in DeezyMatchRanker
thobson88 Nov 1, 2024
3bf6fa6
Fix integration tests
thobson88 Nov 1, 2024
936cc21
Move train method to DeezyMatchRanker
thobson88 Nov 5, 2024
5219598
Fix docstrings
thobson88 Nov 5, 2024
c94c166
Move deezy-specific code to Ranker subclass
thobson88 Nov 5, 2024
c3cf63b
Update docstrings
thobson88 Nov 5, 2024
daf97ad
Refactor the Recogniser class
thobson88 Nov 5, 2024
b31d07e
Simplify variable names & rename module to avoid clash
thobson88 Nov 6, 2024
2e7d53d
Simplify Pipeline constructor
thobson88 Nov 6, 2024
dc97af8
Remove superfluous return value
thobson88 Nov 6, 2024
ee02a65
Add Pipeline constructor unit test
thobson88 Nov 6, 2024
e13743f
Remove inconsistency in treating place_wqid
thobson88 Nov 6, 2024
d28f5ad
Remove superfluous return value
thobson88 Nov 6, 2024
7c39f6d
Add dataclasses to represent ranking candidates
thobson88 Nov 7, 2024
e42192d
Integrate dataclasses into the Ranker
thobson88 Nov 8, 2024
2088beb
Integrate dataclasses into Linker & Pipeline
thobson88 Nov 12, 2024
29de527
Add method return types
thobson88 Nov 12, 2024
1394718
Move dataclasses to own module
thobson88 Nov 12, 2024
6d8fa19
Add new dataclasses for Ranker & Linker output
thobson88 Nov 13, 2024
34260f0
Update dataclasses and integrate into Linker & Pipeline
thobson88 Nov 17, 2024
004bd53
Reorder dataclasses module
thobson88 Nov 18, 2024
d54392a
Rename Ranker cache & load method; Edit docstrings
thobson88 Nov 18, 2024
ffe8a59
Replace Ranker method_name() with class attribute
thobson88 Nov 18, 2024
e837037
Replace Linker method_name() with class attribute
thobson88 Nov 18, 2024
9cc84f9
Rename Ranker string matching method
thobson88 Nov 18, 2024
66a5a96
Rename ranker run method argument
thobson88 Nov 18, 2024
93e8276
Remove obsolete Linker method
thobson88 Nov 18, 2024
d9f6558
Improve Linker run method signature
thobson88 Nov 20, 2024
8ea624d
Add geo coords & Wikidata class fields in WikidataLink
thobson88 Nov 21, 2024
1a6c468
Add Recogniser dataclasses & run method
thobson88 Nov 21, 2024
6e1392d
Refactor pipeline to remove linking logic
thobson88 Dec 3, 2024
a30d4f6
Move REL predict method call to Linker
thobson88 Dec 4, 2024
9a0cc56
Handle empty candidates case
thobson88 Dec 4, 2024
2a87c97
Handle mentions with empty candidates
thobson88 Dec 4, 2024
0f55cff
Tidy up
thobson88 Dec 4, 2024
8293bf2
Add modular/stepwise pipeline methods
thobson88 Dec 5, 2024
7956f7b
Adapt REL model training to new dataclasses
thobson88 Dec 9, 2024
ada5cb7
Adapt experiments to new dataclasses
thobson88 Dec 9, 2024
7d3b3ff
Add best_disambiguation_score method
thobson88 Dec 9, 2024
4e584dd
Add text() method in TextCandidates dataclass
thobson88 Dec 9, 2024
43cda06
Rename Linker & Recogniser methods to load()
thobson88 Dec 9, 2024
98ff3cc
Move dataclasses module to utils folder
thobson88 Dec 9, 2024
7742e88
Add methods to Predictions dataclass
thobson88 Dec 10, 2024
1c21032
Add static constructor in Ranker class
thobson88 Dec 10, 2024
ec82ab3
Add static constructor in Linker class
thobson88 Dec 10, 2024
70e0fb4
Fix bug in Linker __str__ method
thobson88 Dec 10, 2024
a651960
Use refactored Ranker & Linker in experiments
thobson88 Dec 10, 2024
edf6ced
Rename ner module
thobson88 Dec 11, 2024
7c6557f
Update pipeline run_disambiguation signature
thobson88 Dec 11, 2024
f8d049f
Fix import
thobson88 Dec 11, 2024
10b9afe
More robust guard clause in Linker training
thobson88 Dec 11, 2024
6bdfc89
Distinguish empty predictions from empty candidates
thobson88 Dec 11, 2024
3db22d4
Fix tests in test_experiments.py
thobson88 Dec 12, 2024
3805762
Rename dataclasses
thobson88 Dec 12, 2024
098a240
Add pretty print method for Predictions
thobson88 Dec 12, 2024
8536d61
Add pretty print methods for SentenceMentions
thobson88 Dec 12, 2024
478bf48
Move dataclass tests to test_dataclasses.py
thobson88 Dec 12, 2024
bcf7c59
Add pretty print methods for CandidateLinks
thobson88 Dec 12, 2024
ca75801
Add pretty print method for Candidates
thobson88 Dec 13, 2024
dbcf6e6
Tweak pretty printing
thobson88 Dec 13, 2024
40086f1
Update example notebook: basic pipeline
thobson88 Dec 13, 2024
f56a212
Update example notebook: Deezy mostpopular
thobson88 Dec 13, 2024
8be92c0
Update example notebooks: Deezy REL
thobson88 Dec 13, 2024
6454d0a
Update example notebook: Pipeline modular
thobson88 Dec 13, 2024
dff4d2d
Update example notebook: Perfect mostpopular
thobson88 Dec 13, 2024
6a35cbd
Add guard clauses
thobson88 Dec 13, 2024
bdad46c
Update example notebook: Load & use NER model
thobson88 Dec 13, 2024
0a39158
Remove obsolete comment
thobson88 Dec 13, 2024
aa183e3
Update app config
thobson88 Dec 13, 2024
e60481f
Update app: run_ner endpoint
thobson88 Dec 14, 2024
efe9d5d
Update app: run_candidate_selection
thobson88 Dec 15, 2024
65469dd
Update app: run_disambiguation & pipeline
thobson88 Dec 15, 2024
6092dcc
Remove obsolete code
thobson88 Dec 15, 2024
676d823
Update ci.yml
thobson88 Dec 17, 2024
6530d2e
Update ci.yml
thobson88 Dec 17, 2024
771fcae
Granular test parameters
thobson88 Dec 17, 2024
45446db
Register test markers in pyproject.toml
thobson88 Dec 17, 2024
99a51ab
Move gazetteer to ByDistanceLinker subclass
thobson88 Dec 17, 2024
a437048
Remove ranker argument
thobson88 Dec 17, 2024
a87f89a
Simplify logic
thobson88 Dec 17, 2024
f9469cf
Simplify logic
thobson88 Dec 17, 2024
791280d
Refactor Linker run method
thobson88 Dec 18, 2024
0ff4746
Remove superfluous field from MentionCandidates
thobson88 Dec 18, 2024
58b39c3
Simplify logic
thobson88 Dec 18, 2024
e84bd55
Simplify logic in reldisamb score calculation
thobson88 Dec 18, 2024
2a41a29
Simplify logic in PartialMatchRanker
thobson88 Dec 18, 2024
db6c7aa
Align test assertions with canonical resources
thobson88 Dec 18, 2024
cea01f5
Reorder Pipeline run method args
thobson88 Dec 19, 2024
6d7d48c
Use test resources in pipeline test
thobson88 Dec 19, 2024
0fe401d
Handle empty candidates in Predictions dataclass
thobson88 Dec 20, 2024
ff3d2d8
Update comments
thobson88 Dec 20, 2024
589303b
Add method to access interim (prior) predictions
thobson88 Dec 20, 2024
946533c
Rename APIQuery class
thobson88 Dec 20, 2024
95714c5
Update API app_template.py to new pipeline
thobson88 Dec 23, 2024
ed439cb
Add mkdocs site and autotranslate from Sphinx
thobson88 Dec 23, 2024
6804383
Fix navigation & rst mis-translation
thobson88 Dec 27, 2024
1c6692a
Change docs colour scheme
thobson88 Dec 27, 2024
c5eaf22
Add navigation grid in docs homepage
thobson88 Dec 27, 2024
21bab0d
Fix docs homepage text
thobson88 Jan 2, 2025
c01448a
Add dataclasses docstrings & docs page
thobson88 Jan 2, 2025
baa5c99
Add docstring to dataclasses module
thobson88 Jan 2, 2025
3169246
Consistent method parameter names
thobson88 Jan 2, 2025
434f055
Update deezy_processing docstrings & docs page
thobson88 Jan 2, 2025
53bb53f
Update get_data docstrings & docs page
thobson88 Jan 2, 2025
6cc0a45
Add docs pages for geoparser modules
thobson88 Jan 2, 2025
4660dda
Update ner_utils docstrings & docs page
thobson88 Jan 2, 2025
a79444d
Update preprocess_data docstrings & docs page
thobson88 Jan 2, 2025
0992237
Update process_data docstrings & docs page
thobson88 Jan 2, 2025
4eb0c3c
Update process_wikipedia docstrings & docs page
thobson88 Jan 6, 2025
12d6706
Update rel_e2e & rel_utils docstrings & docs page
thobson88 Jan 6, 2025
e5536e6
Add pages & navigation for utils.REL module
thobson88 Jan 7, 2025
f5a1d79
Update entity_disambiguation docstrings & docs page
thobson88 Jan 7, 2025
7d2d1d8
Update mulrel_ranker docstrings & docs page
thobson88 Jan 7, 2025
4910d8e
Update utils.REL utils docstrings & docs page
thobson88 Jan 7, 2025
0a7366c
Update vocabulary docstrings & docs page
thobson88 Jan 7, 2025
5ad4f5e
Simplify page titles
thobson88 Jan 7, 2025
aef7e95
Update NER module docstrings & docs page
thobson88 Jan 7, 2025
a0eb05a
Update ranking module docstrings & docs page
thobson88 Jan 7, 2025
ec23781
Update linking module docstrings & docs page
thobson88 Jan 7, 2025
9da6095
Update pipeline module docstrings & docs page
thobson88 Jan 7, 2025
594b5ae
Update docs workflow from sphinx to mkdocs
thobson88 Jan 8, 2025
f9bb962
Don't copy the command prompt character
thobson88 Jan 8, 2025
510ab47
Update installation & resources docs pages
thobson88 Jan 8, 2025
150bd5e
Partially update complete-tour docs page
thobson88 Jan 10, 2025
f7094d9
Hide unfinished mkdocs pages
thobson88 Jan 10, 2025
7f94fa7
Merge pull request #285 from Living-with-machines/mkdocs
thobson88 Jan 10, 2025
d7a038e
Merge branch 'mkdocs' into 276-refactor
thobson88 Jan 10, 2025
ba874ee
Fix integration test with microtoponym
thobson88 Jan 10, 2025
a8bd73e
Fix Pipeline integration test with microtoponym
thobson88 Jan 10, 2025
ce14a48
Fix Pipeline integration tests with empty candidates
thobson88 Jan 10, 2025
98e9c25
Move coords field into WikidataLinks dataclass
thobson88 Jan 10, 2025
95d8be5
Fix bugs in Linker
thobson88 Jan 16, 2025
cdf2719
Add static constructor for Recogniser
thobson88 Jan 16, 2025
2571ec9
Fix bug in Ranker
thobson88 Jan 16, 2025
242cf5a
Handle empty candidates in Predictions dataclass
thobson88 Jan 16, 2025
7bcd279
Add BatchJob class, tests & sample files
thobson88 Jan 16, 2025
021f936
Add batch-job executable script
thobson88 Jan 16, 2025
5dfc48a
Fix microtoponym handling
thobson88 Jan 22, 2025
0ef17bc
Fix empty candidates handling
thobson88 Jan 22, 2025
6301dc8
Add batch job API and executable script
thobson88 Jan 22, 2025
8020a81
Add log level config parameter in batch job
thobson88 Jan 23, 2025
f013aea
Add debug-level logging
thobson88 Jan 23, 2025
a435949
Avoid non-String input to NER run method
thobson88 Jan 23, 2025
25a2798
Avoid non-String input to pipeline run_text_recognition
thobson88 Jan 23, 2025
2f46edc
Add BatchJob methods for place of pub info
thobson88 Jan 24, 2025
81ae5d0
Fix place of publication handling in BatchJob
thobson88 Jan 24, 2025
c480038
Fix place of publication logging in BatchJob
thobson88 Jan 24, 2025
b0262d6
Handle case of empty list of WikidataLinks
thobson88 Jan 24, 2025
89ae34b
Fix bug in Candidates dataclass
thobson88 Jan 27, 2025
5c86fbe
Fix bugs in REL linking & dataclasses
thobson88 Jan 28, 2025
dbd5a2c
Limit to compatible versions of transformers dependency
thobson88 Jan 28, 2025
6921f83
Merge pull request #289 from Living-with-machines/287-transformers
thobson88 Jan 28, 2025
7d15a2e
Allow partial REL parameter config
thobson88 Jan 28, 2025
1f89c1a
Add logic to favour place of pub if a candidate
thobson88 Jan 28, 2025
6345f83
Handle empty predictions in BatchJob results
thobson88 Jan 28, 2025
5bd701e
Merge pull request #288 from Living-with-machines/286-batch-processing
thobson88 Jan 29, 2025
e3685f6
Merge branch '286-batch-processing' into 277-combined-score
thobson88 Jan 29, 2025
af331f0
Add combined score linking logic
thobson88 Jan 29, 2025
3449c13
Add error message
thobson88 Jan 29, 2025
88bddc6
Handle failure to compute great circle distance
thobson88 Jan 29, 2025
802f82d
Set normalize=True in haversine call
thobson88 Jan 29, 2025
d2e7093
Fix missing return statement
thobson88 Jan 31, 2025
c7f48e0
Add GPU device param to Recogniser init
thobson88 Feb 4, 2025
2206164
Add GPU device config to Linker & Recogniser
thobson88 Feb 5, 2025
c8f4813
Add linker model name in log message
thobson88 Feb 11, 2025
ebe2ee5
Set Deezy config to verbose logging
thobson88 Feb 11, 2025
55919cb
Add Deezy model print statements
thobson88 Feb 11, 2025
fe8f5de
Pass HuggingFace Dataset to recogniser pipeline
thobson88 Feb 11, 2025
a9262b4
Omit empty SentenceMentions in batch job
thobson88 Feb 12, 2025
a0e498f
Allow partial DeezyMatch parameter config
thobson88 Feb 12, 2025
57b5865
Test non-default Deezy params in BatchJob
thobson88 Feb 12, 2025
f32ef40
Handle empty article text in BatchJob
thobson88 Feb 12, 2025
64b9a2a
Mark test as resource-dependent
thobson88 Feb 12, 2025
96031b1
Remove print statements
thobson88 Feb 13, 2025
687866f
Add log message for REL device
thobson88 Feb 14, 2025
cfd2f01
Add log message for Recogniser device
thobson88 Feb 14, 2025
baa6d07
Add best coords methods in Predictions dataclass
thobson88 Feb 14, 2025
12e4d12
Update Ranker API to take a list of toponym mentions
thobson88 Feb 17, 2025
3a6fe36
Add CombinedScores dataclass & integration test
thobson88 Feb 19, 2025
b2bd844
Add predicted coordinates in BatchJob CSV results
thobson88 Feb 19, 2025
9d7c85b
Add combined scores test with default place of pub
thobson88 Feb 19, 2025
9c8b20d
Merge pull request #291 from Living-with-machines/274-gpu
thobson88 Feb 20, 2025
Limit to compatible versions of transformers dependency
thobson88 committed Jan 28, 2025
commit dbd5a2ce1b264e3d769b4c9917150feab3ec053d
pyproject.toml: 2 changes (1 addition, 1 deletion)
@@ -12,7 +12,7 @@ pandas = "^1.3.4"
 wget = "^3.2"
 DeezyMatch = "^1.3.4"
 datasets = "^1.18.0"
-transformers = "^4.40.2"
+transformers = ">=4.16.1, <=4.40.2"
 pydash = "^5.1.0"
 wikimapper = "^0.1.5"
 numpy = "^1.22.1"
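The diff swaps Poetry's caret constraint for an explicit version range. As a rough illustration (not part of this PR), the sketch below expands `^4.40.2` using Poetry's caret rule (bump the left-most non-zero component for the exclusive upper bound) and compares which versions each constraint admits. The helper names `parse`, `caret_bounds`, `satisfies_old` and `satisfies_new` are hypothetical, and only plain three-part numeric versions such as `4.40.2` are handled. Pre-releases and other PEP 440 forms are deliberately out of scope.

```python
# Hypothetical sketch comparing the old and new `transformers` constraints.
# Assumption: versions are plain "X.Y.Z" strings; PEP 440 extras such as
# pre-releases ("4.41.0.dev0") or local versions are NOT supported.

def parse(version: str) -> tuple:
    """Parse a strict 'X.Y.Z' string into a tuple of three ints."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    return tuple(int(p) for p in parts)


def caret_bounds(version: str) -> tuple:
    """Expand a caret constraint '^X.Y.Z' into (lower, upper) bounds.

    Lower is inclusive, upper is exclusive. The upper bound bumps the
    left-most non-zero component: ^4.40.2 -> >=4.40.2,<5.0.0 and
    ^0.1.5 -> >=0.1.5,<0.2.0.
    """
    major, minor, patch = parse(version)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return (major, minor, patch), upper


def satisfies_old(version: str) -> bool:
    # Old constraint: transformers = "^4.40.2"
    lower, upper = caret_bounds("4.40.2")
    return lower <= parse(version) < upper


def satisfies_new(version: str) -> bool:
    # New constraint: transformers = ">=4.16.1, <=4.40.2" (both ends inclusive)
    return parse("4.16.1") <= parse(version) <= parse("4.40.2")


if __name__ == "__main__":
    for v in ["4.16.1", "4.30.0", "4.40.2", "4.41.0"]:
        print(v, satisfies_old(v), satisfies_new(v))
```

Under these assumptions the two constraints overlap only at 4.40.2. The old caret range admitted any 4.x release from 4.40.2 upwards, while the new range admits older releases back to 4.16.1 and excludes everything after 4.40.2. To check a real environment, the installed version string could be read with the standard library's `importlib.metadata.version("transformers")`, provided it is a plain three-part version.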