Alternative articles processing flavors #1202
base: master
…ution, uniform naming
# Conflicts: # .gitignore # CHANGELOG.md # gradle.properties
I added end-to-end evaluation scores for the new flavors: c09aba1. The scores are limited to title, authors, and first authors for the header, and to citations (as in normal GROBID processing), to make sure we don't introduce regressions (we shouldn't, but we never know) 🙂
Here is the updated documentation page with all the evaluations.
… flavor, fix parameters
# Conflicts: # doc/Benchmarking-biorxiv.md # grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java
# Conflicts: # build.gradle # doc/Grobid-specialized-processes.md # grobid-core/src/main/java/org/grobid/core/GrobidModels.java # grobid-core/src/main/java/org/grobid/core/engines/Engine.java # grobid-core/src/main/java/org/grobid/core/engines/FullTextParser.java # grobid-core/src/main/java/org/grobid/core/main/batch/GrobidMain.java # grobid-service/src/main/java/org/grobid/service/GrobidRestService.java # grobid-trainer/src/main/java/org/grobid/trainer/HeaderTrainer.java # grobid-trainer/src/main/java/org/grobid/trainer/SegmentationTrainer.java # grobid-trainer/src/main/java/org/grobid/trainer/TrainerRunner.java
@DefaultValue("-1") @FormDataParam("start") int startPage,
@DefaultValue("-1") @FormDataParam("end") int endPage,
@FormDataParam("generateIDs") String generateIDs,
@FormDataParam("segmentSentences") String segmentSentences,
@FormDataParam("teiCoordinates") List<FormDataBodyPart> coordinates) throws Exception {
@FormDataParam("teiCoordinates") List<FormDataBodyPart> coordinates
Check warning
Code scanning / CodeQL
Information exposure through an error message Medium
This PR implements two alternative segmentation flavors:
- article/light: Segment the document into header and body, and extract only the title, authors, DOIs, and publication date if available, leaving everything else in the body.
- article/light-ref: Segment the document into header, body, and references, and extract only the title, authors, DOIs, and publication date if available, leaving everything else in the body.

The article's body is then composed of two paragraphs:
PR #1151 was tested in this PR.
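
To make the difference between the two flavors concrete, here is a minimal Java sketch of how a flavor label could be mapped to the segments it produces. The `ArticleFlavor` enum, its `segments()` accessor, and `fromLabel` are hypothetical names chosen for illustration; they are not taken from this PR's code, and only the labels `article/light` and `article/light-ref` and the segment names come from the description above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical illustration only: this enum is NOT the class used in GROBID.
// It simply encodes the two flavors as described in the PR text.
enum ArticleFlavor {
    // Header and body only; references stay inside the body.
    LIGHT("article/light", List.of("header", "body")),
    // Header, body, and a separate references segment.
    LIGHT_REF("article/light-ref", List.of("header", "body", "references"));

    private final String label;
    private final List<String> segments;

    ArticleFlavor(String label, List<String> segments) {
        this.label = label;
        this.segments = segments;
    }

    String label() {
        return label;
    }

    List<String> segments() {
        return segments;
    }

    // Resolve a label such as "article/light-ref"; empty if the label is unknown.
    static Optional<ArticleFlavor> fromLabel(String label) {
        return Arrays.stream(values())
                .filter(f -> f.label.equals(label))
                .findFirst();
    }
}
```

With this sketch, `ArticleFlavor.fromLabel("article/light-ref")` resolves to the flavor whose segments include `references`, while an unrecognized label yields an empty `Optional` instead of throwing.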