All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Improve handling of optional dependencies
- Fix model path, which changed in speechbrain v1.0
- Make iamsystem an optional dependency
- Add notice for downloading example documents
- Warn if dot is unavailable when displaying provenance graph
- Require typing-extensions >= 4.6.0
- Add NER benchmark to cookbook
- Backport or use itertools.batched from Python 3.12
- Use fork of mtsamplesFR under medkit-lib
- Fix returned value in batching utility
- Use ISO 8601 timestamp for model checkpoint paths
- Fix test of iamsystem matcher on Python 3.12
- Add nlstruct-based entity matcher
- Improve robustness of PASpeakerDetector
- Allow specifying the model output language with HFTranscriber
- Use link to new repository
- When parsing BRAT, preserve leading space in entities
- Replace unidecode by anyascii
- Document attributes are now supported (both for text and audio) and are added/accessed the same way as annotation attributes
- Brat Input and Output converters can now load and save UMLS CUIs stored in notes
- new from_dir()/from_file() helper methods added to TextDocument/AudioDocument
- new text classification, audio diarization and audio transcription metrics
- the Trainer now saves both the last checkpoint and the best checkpoint, instead of only the last checkpoint
- most operations loading models from HuggingFace can now receive an authentication token (useful to access private repositories)
- support for remapping entity labels in Seq2SeqEvaluator (useful when predicted and reference labels do not match exactly)
- easier initialization of PASpeakerDetector
- medkit is now compatible with the latest (0.9) EDS-NLP
- custom attributes (DateAttribute, UMLSNormAttribute) no longer have None as their value
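
For readers unfamiliar with the "Backport or use itertools.batched from Python 3.12" entry, the general pattern can be sketched as follows. This is a minimal illustration based on the recipe in the Python documentation, not the project's actual implementation; where and how the real backport is defined is not stated in this file.

```python
import sys
from itertools import islice

if sys.version_info >= (3, 12):
    # Python 3.12+ ships batched() in the standard library.
    from itertools import batched
else:
    def batched(iterable, n):
        """Yield successive tuples of at most n items from iterable."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        # islice pulls up to n items; an empty tuple ends the loop.
        while batch := tuple(islice(it, n)):
            yield batch

# The last batch may be shorter than n.
batches = list(batched(["a", "b", "c", "d", "e"], 2))
```

Either branch yields tuples, so calling code behaves the same on every supported Python version.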