Releases: impresso/impresso-linguistic-processing
v2.0.0
- Add linguistically processed titles and title status information (most often titles are included in the full text)
- Switch to the new JSON schema v2, which renames id to ci_id and adds more information
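For consumers of older output, the id to ci_id rename could be handled with a small compatibility shim. The following is only a minimal sketch: it assumes each record is a flat JSON object with a top-level `id` key, and the function name `upgrade_record` and the sample field `lg` are hypothetical, not part of this repository. It does not attempt to add the other information introduced by schema v2.

```python
import json


def upgrade_record(record: dict) -> dict:
    """Return a copy of a v1-style record with `id` renamed to `ci_id`.

    Hypothetical helper: assumes the identifier is a top-level key.
    Records that already carry `ci_id` are returned unchanged.
    """
    upgraded = dict(record)  # shallow copy, so the caller's dict is not mutated
    if "id" in upgraded and "ci_id" not in upgraded:
        upgraded["ci_id"] = upgraded.pop("id")
    return upgraded


if __name__ == "__main__":
    # Illustrative input line; the values and the `lg` field are made up.
    line = '{"id": "example-id", "lg": "de"}'
    print(json.dumps(upgrade_record(json.loads(line))))
```

Applying the helper to every line of a JSON Lines file would let downstream code read `ci_id` regardless of which schema version produced the file.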
v1-0-3
v1-0-1
- fix: POS tagging of lb was buggy (all tags set to X). This has been fixed.
- feat: Generate a log file for each newspaper/year pair and upload it to S3.
- feat: Support the agreed naming convention for output files.
- feat: Process directly from S3 input data, with on-the-fly mirroring per newspaper for slim builds.
- note: no change to spaCy pipelines apart from the lb POS tag mapping.
v2024.04.04
Version with spaCy 3.6.0 models for de, en, fr, lb.