Releases: impresso/impresso-linguistic-processing

v2.0.0

02 Jan 21:45
  • Add linguistically processed titles and title status information (titles are most often already included in the full text)
  • Switch to the new JSON schema v2, which renames id to ci_id and adds more information
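
For consumers that read output from both before and after the schema switch, the renamed identifier can be handled with a small helper. This is only a sketch: the key names `id` and `ci_id` come from the note above, while the function name `record_id` and the sample values are hypothetical, and the rest of the v2 schema is not shown here.

```python
import json


def record_id(record: dict) -> str:
    """Return the identifier of a record, whichever schema it uses.

    Assumption: v2 records carry the identifier under "ci_id" and
    older records carry it under "id", as stated in the release note.
    """
    if "ci_id" in record:
        return record["ci_id"]
    return record["id"]


# Hypothetical sample lines; real records contain many more fields.
old_line = '{"id": "example-0001"}'
new_line = '{"ci_id": "example-0001"}'

same = record_id(json.loads(old_line)) == record_id(json.loads(new_line))
```

Checking for `ci_id` first means that a record is treated as v2 whenever the new key is present.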

v1-0-3

27 Nov 09:31
  • Improve build and log messages
  • Limit the maximum character length to 50000 by default
  • Add max_doc_length to the output file for documentation and transparency
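
A minimal sketch of how such a length limit might be applied and recorded follows. It assumes that over-long texts are truncated rather than skipped, which the note above does not state. Only the name `max_doc_length` and the default of 50000 come from the release note; the function `prepare_text` and the other output keys are hypothetical.

```python
MAX_DOC_LENGTH = 50000  # default limit named in the release note


def prepare_text(text: str, max_doc_length: int = MAX_DOC_LENGTH) -> dict:
    """Cut a text to the limit and record the limit alongside it.

    Assumption: truncation is the chosen behaviour; the actual
    pipeline may handle over-long documents differently.
    """
    return {
        "text": text[:max_doc_length],
        # Stored in the output so readers can see which limit applied.
        "max_doc_length": max_doc_length,
        # Hypothetical flag, not part of the documented output.
        "truncated": len(text) > max_doc_length,
    }


result = prepare_text("a" * 60000)
```

Writing the limit into each output record lets a later reader tell whether a short text was short in the source or was cut by the pipeline.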

v1-0-1

25 Nov 10:39
  • fix: POS tagging for lb was buggy (all tags were set to X). This has been fixed.
  • feat: Generate log files for each newspaper/year pair and upload them to S3.
  • feat: Support the agreed naming convention for output files.
  • feat: Process directly from S3 input data, with on-the-fly mirroring per newspaper for
    slim builds
  • note: No change to spaCy pipelines apart from the lb POS tag mapping

v2024.04.04

13 Apr 15:15
7fb8df4

Version with spaCy 3.6.0 models for de, en, fr, lb.