Releases: IBM/unitxt
Unitxt 1.8.0
What's Changed
In this release, the main improvement focuses on introducing type checking within Unitxt tasks. Tasks are fundamental to the Unitxt protocol, acting as standardized blueprints for those integrating new datasets into Unitxt. They facilitate the use of task-specific templates and metrics. To guarantee precise dataset processing in line with the task schema, we've introduced explicit types to the task fields.
For example, consider the NER task in Unitxt, previously defined as follows:
add_to_catalog(
    FormTask(
        inputs=["text", "entity_types"],
        outputs=["spans_starts", "spans_ends", "text", "labels"],
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
Now, the NER task definition includes explicit types:
add_to_catalog(
    FormTask(
        inputs={"text": "str", "entity_types": "List[str]"},
        outputs={
            "spans_starts": "List[int]",
            "spans_ends": "List[int]",
            "text": "List[str]",
            "labels": "List[str]",
        },
        prediction_type="List[Tuple[str,str]]",
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
This enhancement aligns with Unitxt's goal that definitions should be easily understandable and capable of facilitating validation processes with appropriate error messages to guide developers in identifying and solving issues.
For now, the original definition format without typing will continue to work, but it generates a warning message. You should begin adapting your task definitions by adding types.
'inputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['question', 'question_id', 'topic']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
'outputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['reference_answers', 'reference_contexts', 'reference_context_ids', 'is_answerable_label']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
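Concretely, migrating a task is a matter of replacing each list of field names with a dict mapping names to type strings. A minimal sketch, using the field names from the warning above (the concrete types chosen here are illustrative assumptions, not taken from the actual catalog entry):

```python
# Before: untyped field list (triggers the warning above; all types
# are assumed to be 'Any').
inputs_untyped = ["question", "question_id", "topic"]

# After: explicit type strings per field. The "str" types below are
# illustrative assumptions for this sketch.
inputs_typed = {"question": "str", "question_id": "str", "topic": "str"}
```

The same transformation applies to the `outputs` field.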
Special thanks to @pawelknes who implemented this important feature. It truly demonstrates the collective power of the Unitxt community and the invaluable contributions made by Unitxt users beyond the core development team. Such contributions are highly appreciated and encouraged.
- For more detailed information, please refer to #710
Breaking Changes
- "metrics.spearman", "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
- "metrics.f1_binary", "metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int]; references must be equal to 0 or 1.
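Pipelines that previously fed raw string outputs to these metrics now need a numeric cast step; this release adds a cast_to_float_return_nan_if_failed processor for that purpose (see New Features below). As a standalone sketch of the behavior such a cast implies (not the unitxt implementation itself):

```python
import math

def cast_to_float_or_nan(prediction):
    """Parse a model's textual output as a float; fall back to NaN on failure."""
    try:
        return float(prediction)
    except (ValueError, TypeError):
        # Non-numeric output (e.g. free text) or None: return NaN rather
        # than crash, mirroring the "return nan if failed" idea.
        return math.nan
```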
Bug Fixes
- Set empty list if preprocess_steps is None by @marukaz in #780
- Fix UI load failure due to typo by @yoavkatz in #785
- Fix huggingface uploads by @elronbandel in #793
- Fix typo in error message by @marukaz in #777
New Assets
- add perplexity with Mistral model by @lilacheden in #713
New Features
- Type checking for task definition by @pawelknes in #710
- Add open and ibm_genai to llm as judge inference engine by @OfirArviv in #782
- Add negative class score for binary precision, recall, f1 and max f1 by @lilacheden in #788; e.g., f1_binary now also returns "f1_binary_neg"
- Support Unions in metric prediction_type
- Add processor cast_to_float_return_nan_if_failed
- Breaking change: Make prediction_type of metrics numeric:
A. "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
B. "metrics.f1_binary","metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1
- Group shuffle by @sam-data-guy-iam in #639
Documentation
- Fix a small typo by @dafnapension in #779
- Update instructions to install HELM from PyPI by @yifanmai in #783
- Update few-shot instructions in Unitxt with HELM by @yifanmai in #774
Full Changelog: 1.7.7...1.8.0
Unitxt 1.7.9
What's Changed
- Set empty list if preprocess_steps is None by @marukaz in #780
- fix a small typo by @dafnapension in #779
- Fix typo by @marukaz in #777
- Group shuffle by @sam-data-guy-iam in #639
- add perplexity with Mistral model by @lilacheden in #713
- Fix UI load failure due to typo by @yoavkatz in #785
- Type checking for task definition by @pawelknes in #710
- Add open and ibm_genai to llm as judge inference engine by @OfirArviv in #782
- Avoid creating a demo pool if num_demos is 0. by @yoavkatz in #787
- Update test_helm.yml by @elronbandel in #789
- Update instructions to install HELM from PyPI by @yifanmai in #783
- Update few-shot instructions in Unitxt with HELM by @yifanmai in #774
- Update version to 1.7.8 by @elronbandel in #790
- Fix huggingface uploads by @elronbandel in #793
- Update version to 1.7.9 by @elronbandel in #794
Full Changelog: 1.7.7...1.7.9
Unitxt 1.7.8
What's Changed
- Set empty list if preprocess_steps is None by @marukaz in #780
- fix a small typo by @dafnapension in #779
- Fix typo by @marukaz in #777
- Group shuffle by @sam-data-guy-iam in #639
- add perplexity with Mistral model by @lilacheden in #713
- Fix UI load failure due to typo by @yoavkatz in #785
- Type checking for task definition by @pawelknes in #710
- Add open and ibm_genai to llm as judge inference engine by @OfirArviv in #782
- Avoid creating a demo pool if num_demos is 0. by @yoavkatz in #787
- Update test_helm.yml by @elronbandel in #789
- Update instructions to install HELM from PyPI by @yifanmai in #783
- Update few-shot instructions in Unitxt with HELM by @yifanmai in #774
- Update version to 1.7.8 by @elronbandel in #790
Full Changelog: 1.7.7...1.7.8
Unitxt 1.7.7
What's Changed
- adding multi-lingual bert score model by @assaftibm in #755
- Add HELM Integration: Guide, Examples and Tests by @elronbandel in #743
- Add production-time recipe processing capability to unitxt by @elronbandel in #739
- Add tags and descriptions for assets on the website by @elronbandel in #760
- Changed HELM integration docs to point to the output result file by @yoavkatz in #761
- Allow FilterByCondition to condition also on subfields by @dafnapension in #762
- fix a small bug in BinaryMaxAccuracy by @dafnapension in #757
- Fix Reward metric warnings by @assaftibm in #765
- Added post processor to take first line in quantization templates by @yoavkatz in #770
- Support for parsing all strings representing valid Python type hints by @pawelknes in #754
- simplify bitwiseor-to-union and show a scheme for Literal by @dafnapension in #772
- Adding NLI model via perplexity by @assaftibm in #766
- Implement LLM as judge metrics by @eladven in #771
- Return loading step to enforce loader limit. by @yoavkatz in #775
- Update formats by @elronbandel in #769
Full Changelog: 1.7.6...1.7.7
Unitxt 1.7.6
What's Changed
The most significant change in this release is the addition of the notion of \N
(backslash capital N) to formats. With \N
you can define places where you want a single newline, collapsing any newlines before it.
A more detailed explanation, if you want to go deeper:
The Capital New Line Notation (\N) is designed to manage newline behavior in a string efficiently.
This custom notation consolidates multiple newline characters (\n) into a single newline under
specific conditions, with tailored handling based on whether there is preceding text. The function
distinguishes between two primary scenarios:
1. If there's text (referred to as a prefix) followed by any number of \n characters and then one or
more \N, the entire sequence is replaced with a single \n. This effectively simplifies multiple
newlines and notation characters into a single newline when there's preceding text.
2. If the string starts with \n characters followed by \N without any text before this sequence, or if
\N is at the very beginning of the string, the sequence is completely removed. This case is
applicable when the notation should not introduce any newlines due to the absence of preceding text.
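The two rules above can be sketched with a couple of regular-expression substitutions. This is an illustrative sketch, not the unitxt implementation, assuming the notation appears in the string as the literal two characters "\N":

```python
import re

def apply_capital_new_line(text):
    # Rule 2: a run of \n and \N at the very start of the string, containing
    # at least one \N, is removed entirely (no preceding text, so the
    # notation introduces no newline).
    text = re.sub(r"^(?:\n|\\N)*\\N(?:\n|\\N)*", "", text)
    # Rule 1: elsewhere, a run of \n and \N containing at least one \N
    # collapses to a single \n after the preceding text.
    text = re.sub(r"(?:\n|\\N)*\\N(?:\n|\\N)*", "\n", text)
    return text
```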
This allows us two things:
First, to define system formats that do not contain unnecessary newlines when the instruction or system prompt is missing.
Second, to ignore any newlines created by the template, ensuring the number of newlines is set by the format only.
For example if we defined the system format in the following way:
from unitxt.formats import SystemFormat
format = SystemFormat(model_input_format="{system_prompt}\n{instruction}\n|user|\n{source}\n|assistant|\n{target_prefix}")
We faced two issues:
- If the system prompt or the instruction is empty, we get two trailing newlines for no reason.
- If the source ends with a newline (mostly due to template structure), we get an unnecessary empty line before the "|user|".
Both problems are solved with \N notation:
from unitxt.formats import SystemFormat
format = SystemFormat(model_input_format="{system_prompt}\\N{instruction}\\N|user|\n{source}\\N|assistant|\n{target_prefix}")
Breaking changes
- Fix typo in MultipleChoiceTemplate field choices_seperator -> choices_separator
- Deprecation of the use_query option in all operators; for now it just raises a warning, but it will be removed in the next major release. The new default behavior is equivalent to use_query=True.
All Changes
Bug Fixes:
- Fix error in unitxt versions conflict and improve message by @elronbandel in #730
- Fix wrong handling of list in dict_get by @yoavkatz in #733
- Fix classification datasets with wrong schema by @elronbandel in #735
- Fix codespell by @elronbandel in #742
- Fix UI errors caused by grammar tasks by @elronbandel in #750
- Fix src layout and enforce its rules with pre-commit hooks by @elronbandel in #753
New Features:
- Add notion of \N to formats, to fix format new line clashes by @elronbandel in #751
- Ability to dynamically change InstanceMetric inputs + grammar metrics by @arielge in #736
- Add DeprecatedFIeld for more informative procedure for deprecating fields of artifacts by @dafnapension in #741
New Assets:
- Add rerank recall metric to unitxt by @jlqibm in #662
- Add many selection and human preference tasks and datasets by @elronbandel in #746
- Adding Detector metric for running any classifier from huggingface as a metric by @mnagired in #745
- Add operators: RegexSplit, TokensSplit, Chunk by @elronbandel in #749
- Add bert score large and base versions by @assaftibm in #748
Enhancements:
- Remove use_dpath parameter from dict_get and dict_set by @dafnapension in #727
- Add mock judge test to cohere for ai by @perlitz in #720
Full Changelog: 1.7.4...1.7.6
Unitxt 1.7.4
In the 1.7.4 release, we've made significant improvements to unitxt, further enhancing its developer friendliness. This update marks a step towards our goal of offering a well-documented and user-friendly library. A key feature of this release is the introduction of a type verification mechanism, designed to enhance the developer experience by increasing transparency and preemptively addressing errors.
4 Most Important Changes:
Add Description and Tags to unitxt Artifacts (1/4)
You can now enrich unitxt artifacts with descriptions and tags. These additions aim to enhance the upcoming online catalog, enabling users to search and filter artifacts by tags for an improved browsing experience.
For instance, to add context to a TaskCard:
TaskCard(
    ...,
    __description__="This is the WNLI dataset",
    __tags__={"task": "nli", "license": "apache2.0"},
)
See more in #725
Metrics and Postprocessors Override Through the Recipe (2/4)
Now metrics and postprocessors can be specified directly through the recipe, overriding those in the dataset card.
For example, if we want to use "metrics.rouge" instead of "metrics.accuracy" for WNLI, we can now achieve this with:
load_dataset("card=cards.wnli, ... , metrics=[metrics.rouge]")
See more in #663
Metrics Type Validation (3/4: ⚠️ Breaking Change ⚠️ )
Context: The initiative to enhance developer friendliness at unitxt, especially through type checking, aims to guide developers more effectively and preemptively identify issues.
Previously, metrics individually determined if predictions and references were correctly typed, with many lacking such checks.
Now, Metric incorporates universal code to verify the types of predictions/references and to determine if a metric supports single or multiple references per prediction.
Introducing new parameters for each metric:
# Set 'prediction_type' to the expected types of predictions and references, e.g., "List[str]", "List[Dict]", "string".
# Defaults to None, triggering a warning for now, but future versions of unitxt will treat this as an error.
prediction_type: str = None
# Indicates if a metric allows multiple references per prediction; otherwise, it supports only one reference per prediction.
single_reference_per_prediction: bool = False
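To illustrate the kind of universal check this enables, here is a simplified standalone sketch (not unitxt's actual code, which parses arbitrary type strings such as "List[Dict]") that validates predictions against a declared type string:

```python
def check_prediction_type(predictions, prediction_type):
    """Raise ValueError if any prediction does not match the declared type.

    Only a few simple type strings are handled here, for illustration.
    """
    simple_types = {"str": str, "int": int, "float": float}
    expected = simple_types.get(prediction_type)
    if expected is None:
        # Unknown type string (or None): skip checking, mirroring the
        # warn-only default behavior described above.
        return
    for p in predictions:
        if not isinstance(p, expected):
            raise ValueError(
                f"Prediction {p!r} is not of expected type '{prediction_type}'"
            )
```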
Incompatibility Notice: If any existing post-processing pipeline violates the type schema, it will emit an error.
Important: unitxt's default behavior is to handle multiple references per prediction, as seen in the HF dataset (predictions as strings, references as lists of strings), with post-processing applied accordingly. For some metrics, like those measuring retrieval, predictions and references are lists of document IDs. In scenarios like few-shot learning, this adjustment ensures metrics correctly handle lists of lists.
See more in #667
Dialog Processing Capabilities (4/4)
Dialog data is essential for tasks like dialog completion, dialog summarization, etc. Thus, we've made an initial attempt to support dialog processing in unitxt. The challenges were twofold: (1) dialog is influenced by the system format, and (2) dialog consists of multiple turns, each potentially considered as the final turn for evaluation. To address these, we've introduced a new class of dialog processing operators, which you can review here:
https://unitxt.readthedocs.io/en/latest/unitxt.dialog_operators.html.
You can review an example of card construction utilizing a few dialog processing tools here: https://github.com/IBM/unitxt/blob/main/prepare/cards/coqa.py
This card's usage can be demonstrated with the following recipe:
card=cards.coqa.completion,template=templates.completion.abstractive.standard,format=formats.textual_assistant
Resulting in this input data:
Write the best response to the dialog.
<|user|>
The Vatican Apostolic Library.... The Vatican Library is a research library for history, law, philosophy, science and theology. The Vatican Library is open to anyone who can document their qualifications and research needs. Photocopies for private study of pages from books published between 1801 and 1990 can be requested in person or by mail....from this period, though some are very significant.
When was the Vat formally opened?
<|assistant|>
It was formally established in 1475
<|user|>
what is the library for?
<|assistant|>
research
<|user|>
for what subjects?
And this target:
history, and law
See more in #640
All Changes In Unitxt 1.7.4
Breaking Changes
- Add generic mechanism to check prediction and reference types in metrics by @yoavkatz in #667. See the explanation in the previous sections for why this change is breaking.
New Features
- Add ability to fuse sources with disjoint splits by @yoavkatz in #707
- Allow max reduction type in metric to find the best overall score over all instances by @yoavkatz in #709
- Add string operators module with many standard string operators by @elronbandel in #721
- Allow disabling per group f1 scores in customF1 by @yoavkatz in #719
- Add improved type inference capabilities, inferring type_string from a given object, and infer_type therefrom via parse_type_string by @dafnapension in #706
- Add description and tags to every catalog artifact by @elronbandel in #725
- allow contexts not to be entered to metric by @perlitz in #653
- Add control over metrics and postprocessors through the recipe by @elronbandel in #663
- Add coqa and dialog processing capabilities by @elronbandel in #640
- Add pandas_load_args for LoadCSV by @elronbandel in #696
- Add safe and complete type parsing function to type_utils, for allowing better type checking. by @elronbandel in #688
- Add deprecation decorator for warning and errors for deprecation of functions and classes by @elronbandel in #689
- Add choices shuffling to MultipleChoiceTemplate by @elronbandel in #678
- Make settings utils type sensitive by @elronbandel in #674
New Assets
- Add intl to korean and arabic + improved packaged dependency checks by @pklpriv in #698
- Added BERT Score with new embedding model "distilbert-base-uncased" by @shivangibithel in #703
- Grammatical error correction task by @arielge in #718
- Add trec dataset by @elronbandel in #723
- Add templates for flan text similarity by @elronbandel in #728
- Add metrics for binary tasks with float predictions by @lilacheden in #654
- Add mistral format by @elronbandel in #660
- Added new metric for unsorted_list_exact_math by @yoavkatz in #685
- Add flan wnli truthfulness format by @elronbandel in #665
- DuplicateInstances operator by @pawelknes in #682
- introduce arabic to normalized sacrebleu by @pklpriv in #638
- 20newsgroup from sklearn by @ilyashnil in #659
- Add match_closest_option post processor for multiple choice qa by @elronbandel in #679
- Duplicate instance operator - new functionality by @pawelknes in #687
- Add babi qa dataset by @elronbandel in #666
Asset Fixes
- Add missing instruction in labrador zero shot format by @alonh in #716
- Fix title template for classification by @elronbandel in #722
- prevent cohere4ai using judge as default by @perlitz in #664
- fix summarization template by @gitMichal in #652
Bug Fixes
- Fix handling of boolean environment variables by @arielge in #711
- Handle all env variables with expected types by @arielge in #714
- Properly define the abstract fields by @elronbandel in #724
- Fix places not using general settings or logger by @elronbandel in #656
- removal of dpath -- ready for review by @dafnapension in #680
- fix: LoadFromIBMCloud empty data_dir breaks processing by @jezekra1 in #668
- Fix bug in references with none by @elronbandel in #677
- Validating that the prepare dir is consistent with catalog by @eladven in #683
New Contributors
- @shivangibithel made their first contribution in #703
- @jezekra1 made their first contribution in #668
- @pklpriv made their first contribution in #638
- @pawelknes made their first contribution in #682
Full Changelog: 1.7.1...1.7.4
Unitxt 1.7.3
What's Changed
- added BERT Score with new embedding model "distilbert-base-uncased" by @shivangibithel in #703
- Fix handling of boolean environment variables by @arielge in #711
- Allow max reduction type in metric to find the best overall score over all instances by @yoavkatz in #709
- Add ability to fuse sources with disjoint splits by @yoavkatz in #707
- Handle all env variables with expected types by @arielge in #714
- Add missing instruction in labrador zero shot format by @alonh in #716
- Add string operators module by @elronbandel in #721
- Fix title template for classification by @elronbandel in #722
- Grammatical error correction task by @arielge in #718
- properly define the abstract fields by @elronbandel in #724
- Add trec dataset by @elronbandel in #723
- add intl to korean and arabic + improved packaged dependency checks by @pklpriv in #698
New Contributors
- @shivangibithel made their first contribution in #703
Full Changelog: 1.7.2...1.7.3
Unitxt 1.7.2
What's Changed
- Add metrics for binary tasks with float predictions by @lilacheden in #654
- Fix places not using general settings or logger by @elronbandel in #656
- Add mistral format by @elronbandel in #660
- allow contexts not to be entered to metric by @perlitz in #653
- Add control over metrics and postprocessors through the recipe by @elronbandel in #663
- prevent cohere4ai using judge as default by @perlitz in #664
- 20newsgroup from sklearn by @ilyashnil in #659
- Add flan wnli truthfulness format by @elronbandel in #665
- Add babi qa dataset by @elronbandel in #666
- fix: LoadFromIBMCloud empty data_dir breaks processing by @jezekra1 in #668
- Make settings utils type sensitive by @elronbandel in #674
- Fix bug in references with none by @elronbandel in #677
- Add choices shuffling to MultipleChoiceTemplate by @elronbandel in #678
- Add match_closest_option post processor for multiple choice qa by @elronbandel in #679
- introduce arabic to normalized sacrebleu by @pklpriv in #638
- DuplicateInstances operator by @pawelknes in #682
- Validating that the prepare dir is consistent with catalog by @eladven in #683
- fix summarization template by @gitMichal in #652
- Added new metric for unsorted_list_exact_math by @yoavkatz in #685
- Add deprecation decorator for warning and errors for deprecation of functions and classes by @elronbandel in #689
- Duplicate instance operator - new functionality by @pawelknes in #687
- Add safe and complete type parsing function to type_utils, for allowing better type checking. by @elronbandel in #688
- Add pandas_load_args for LoadCSV by @elronbandel in #696
- Add coqa and dialog processing capabilities by @elronbandel in #640
- Add generic mechanism to check prediction and reference types in metrics by @yoavkatz in #667
- removal of dpath -- ready for review by @dafnapension in #680
- Update version to 1.7.2 by @elronbandel in #704
New Contributors
- @jezekra1 made their first contribution in #668
- @pklpriv made their first contribution in #638
- @pawelknes made their first contribution in #682
Full Changelog: 1.7.1...1.7.2
Unitxt 1.7.1
What's Changed
- Update version to 1.7.0 by @elronbandel in #630
- Return copies of artifacts from the artifacts cache by @matanor in #612
- Avoid RuntimeWarning in confidence interval computation by @matanor in #632
- Add essential table processing operators by @csrajmohan in #627
- Add Capitalize and Substring operators. Add tests. by @jlqibm in #609
- Add codespell spell checker to pre-commit and fix spelling by @elronbandel in #633
- Add processors and metrics by @lilacheden in #634
- Add recipe metadata to the internal stream by @elronbandel in #636
- Add instance field operator by @elronbandel in #637
- Fix split in mmlu which was removed in huggingface by @elronbandel in #645
- Separate inputs processing and instruction processing in templates by @elronbandel in #644
- Add some operators requirements by @elronbandel in #643
- more careful before rejecting queries by @dafnapension in #647
- Add format args and labrador format by @elronbandel in #649
- Fix instruction preparation for multiple choice by @elronbandel in #651
- Add utilities for comparing datasets examples between unitxt versions by @eladven in #650
- add LlamaIndexCorrectnessMetric by @perlitz in #594
Full Changelog: 1.7.0...1.7.1
Unitxt 1.7.0
What Changed in Unitxt 1.7.0
This release introduces a few significant changes that modify existing conventions:
- Instructions renamed to system_prompts
This means that from now on, to define a new system-level instruction, you can use this code:
system_prompt = TextualSystemPrompt(  # <<<< Class name has changed
    "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n"
)
add_to_catalog(system_prompt, "system_prompts.models.alpaca", overwrite=True)  # <<<< Catalog name has changed
It also means that all the system-level instructions were moved to the catalog under system_prompts instead of instructions.
This change breaks old instructions but was necessary to enable the next, very useful change.
- Templates can now (1) generate a task-specific instruction once at the head of the example, and (2) add a few words the model will say before its final prediction
This change was requested by many people.
For example here in this COLA dataset example:
User: Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable. text: Fred watered the plants flat.
Agent: acceptable
User: Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable. text: The pond froze solid.
Agent:
The instruction "Classify the ..." is repeated for every demonstration. Also, with the current template there is no way to add a few words that the agent will say before the prediction, for instance: "Agent: The class is ". With the new changes, both of these important features are enabled.
If the old way of defining templates for classification was:
add_to_catalog(
    InputOutputTemplate(
        input_format="Classify the {type_of_class} of the following {text_type} to one of these options: {classes}. {text_type}: {text}",
        output_format="{label}",
    ),
    "templates.classification.multi_class.default_no_instruction",
    overwrite=True,
)
It is now defined this way:
add_to_catalog(
    InputOutputTemplate(
        input_format="{text_type}: {text}",  # <<<< Changed
        output_format="{label}",
        target_prefix="The {type_of_class} is ",  # <<<< Added
        instruction="Classify the {type_of_class} of the following {text_type} to one of these options: {classes}.\n",  # <<<< Added
    ),
    "templates.classification.multi_class.instruction",
    overwrite=True,
)
The new template fields instruction and target_prefix will produce this example:
Classify the grammatical acceptability of the following text to one of these options: unacceptable, acceptable.
User: text: Fred watered the plants flat.
Agent: The grammatical acceptability is acceptable
User: text: The pond froze solid.
Agent: The grammatical acceptability is
Notice how the instruction appears only once, and the target prefix appears after the 'Agent:'.
Read more in the tutorial on preparing templates.
- Loading from catalog with modifications
Now you can load an item from the catalog and change its fields. For example, if you want to use a task but with a different metric, you can use this syntax:
card = TaskCard(
    loader=LoadHF(path="glue", name="cola"),
    preprocess_steps=[...],
    task="tasks.classification.multi_class[metrics=[metrics.matthews_correlation]]",  # <<<< Modified
    templates="templates.classification.multi_class.all",
)
add_to_catalog(card, "cards.cola", overwrite=True)
Read more in the tutorial on loading from the catalog.
- Renaming of additional_inputs to task_data
In an effort to more accurately represent the origin of certain fields within our system, we've renamed the additional_inputs parameter to task_data. This modification underscores the fact that these fields are derived directly from the task definition itself. This change is crucial for maintaining the integrity and reliability of metrics, as it ensures these fields are validated against the task schema. Consequently, developers crafting metrics for specific tasks can effortlessly ascertain which fields are accessible to them by simply referring to the task schema. This alignment between task definitions and metrics development fosters a more intuitive and efficient workflow for unitxt contributors.
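As a hypothetical illustration of what the rename means for metric authors (the instance below is made up for this sketch, not taken from a real dataset):

```python
# With the rename, per-instance task fields reach a metric under the key
# "task_data" (previously "additional_inputs"), and their names mirror
# the task schema.
instance = {
    "prediction": "acceptable",
    "references": ["acceptable"],
    "task_data": {  # was: "additional_inputs"
        "text": "The pond froze solid.",
        "classes": ["acceptable", "unacceptable"],
    },
}

# A metric can rely on the task schema to know which keys are available.
classes = instance["task_data"]["classes"]
```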
Release Changes
BugFixes:
- Fix parser to allow source name that starts with numeric by @marukaz in #530
- Avoid race condition when download files to IBM COS cache by @yoavkatz in #536
- Updating perplexity computation, to apply exp(-x) by @assaftibm in #534
- Avoid duplicate values in UI by @Roni-Friedman in #552
- Fixed the test that generated a new entry in the catalog by @dafnapension in #550
- Fix artifact initialization dict creation to be recursive by @elronbandel in #559
- Enforce tests to use only local catalogs by @elronbandel in #564
- Fix multi label classification template and improve debugging by @yoavkatz in #571
- Fix classification code so multi-label metrics are not aware of 'none' by @yoavkatz in #580
- Fix MultiReferenceTemplate import by @perlitz in #583
- Add uncomitted processor by @elronbandel in #588
- Add missing processor in catalog by @yoavkatz in #590
- Docfix: Fix incorrect artifact names in Adding Dataset doc by @yifanmai in #591
- fixes to perplexity metric, updates to catalog by @assaftibm in #592
- Fix many datasets and templates by @elronbandel in #599
- Fix test catalog preparation without huggingface access by @elronbandel in #601
- Fix format instruction same as source in templates by @dafnapension in #607
- Fixed formats and system prompts by @elronbandel in #604
- Add scipy to base requirements by @matanor in #611
- Reverse undocumented capitalization in templates by @elronbandel in #616
- Fix broken OptionalField in dataclass by @elronbandel in #619
- Fix some features of the Tempate for ffqa by @dafnapension in #613
- Fix problem in process_instance by @yoavkatz in #628
New Assets:
- Added table serializers operators and add Wikitq table question answering dataset by @csrajmohan in #544
- Added human eval dataset by @OfirArviv in #509
- Added Clinc and news datasets by @ilyashnil in #578
- Added cards for cohere for ai aya dataset by @dafnapension in #579
- Add multi class relation classification task and change nli datasets to use it by @elronbandel in #586
- Eval metrics by @lilacheden in #587
- Add tab_fact dataset, a dataset for classification of textual entailment from tables by @csrajmohan in #582
- Add filtered ffqa dataset by @marukaz in #593
- Add universal_ner by @elronbandel in #622
- Add atis dataset by @elronbandel in #629
Enhancements
- Tests can be done now also on PRs from forks. by @elronbandel in #537 #538
- Show artifact class details in the documentation. by @dafnapension in #528
- UI improvements by @Roni-Friedman in #541
- Update README.md by @eltociear in #540
- Add artifact_identifier to Artifact objects loaded from the catalog, linking them to their catalog name. by @matanor in #545 #547 #546
- allow imports list for executequery and filterbyquery and rename to ExecuteExpression and FilterByExpression by @dafnapension in #542
- Add tests for api is presented in the unitxt paper. by @elronbandel in #558
- Extend the function that evaluate with unitxt metric on external data to new types of data by @assaftibm in #557
- Add Kendall's tau metric by @lilacheden in #535
- Add new table operators for serialization & truncation by @csrajmohan in #567
- Unitxt should operate with no package requirements by default. This adds some tools to do so. by @elronbandel in #570
- Separate library tests and catalog preparation by @elronbandel in #572
- Add class for constants handling by @elronbandel in #575
- Add code needed for evaluating metrics as models by @lilacheden in #573
- Improved error message when using TemplateDict ...