Releases: IBM/unitxt
Unitxt 1.15.9
Main changes
- Artifacts in the catalog can now be links to other artifacts and can also be marked deprecated.
What's Changed
- artifact link by @dafnapension in #1363
- Add processors also as operators by @antonpibm in #1397
- added 'add_link_to_catalog' for easily adding artifact_links with/without deprecation msg by @dafnapension in #1398
- Safety updates by @bnayahu in #1391
- Reduce error message clutter by @yoavkatz in #1401
- Update version to 1.15.9 by @yoavkatz in #1404
Full Changelog: 1.15.8...1.15.9
Unitxt 1.15.8
Main changes
Added support for RITS Inference Engine
Inference Engines
- Add inference engines to the catalog by @martinscooper in #1394
- Add support for OpenAI custom base url and default headers + RITS Inference engine by @martinscooper in #1385
Assets
- Add vectara's hhem2.1 faithfulness model as a metric by @lilacheden in #1382
Bug Fixes
- fix template in Arena Hard card and example by @OfirArviv in #1390
Full Changelog: 1.15.7...1.15.8
Unitxt 1.15.7
Assets
- add llama-3-405b-instruct wml classification engine by @lilacheden in #1383
Usability
- Support MetricsList - to store a list of metrics by @lilacheden in #1379
Bug fixes
- Made sure null augmentor works as expected by @yoavkatz in #1381
- Fixes and improvements to task based llm as judge by @lilacheden in #1366
- Fix package dir in settings by @yoavkatz in #1387
Documentation
- Typos in the rst files by @dafnapension in #1380
- Chat api blog post by @elronbandel in #1371
Inference Engines
- Tests and minor changes to GenAI, WML and HF inference engines by @pawelknes in #1290
Full Changelog: 1.15.6...1.15.7
Unitxt 1.15.6 - Chat Inference
Main changes
- Added support for generating output in ChatAPI format (user/assistant turns) and for inference engines to process ChatAPI input. See details in the blog post.
- Improved catalog browsing experience, with cleaner formatting of catalog assets and clickable hyperlinks between catalog assets and between catalog assets and code.
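The ChatAPI format mentioned above is the standard OpenAI-style list of role/content messages rather than a single flat prompt string. A minimal illustration of the shape (plain Python, independent of unitxt):

```python
# A ChatAPI-style conversation: a list of turns, each a dict with
# "role" and "content" keys, in the standard OpenAI chat format.
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]

# Inference engines that accept ChatAPI input consume such a list
# instead of one pre-rendered prompt string.
roles = [turn["role"] for turn in chat]
print(roles)  # ['system', 'user', 'assistant']
```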
New Features
Inference engines that support the ChatAPI interface
- Add target_prefix erasing post processor by default by @elronbandel in #1361
- Add multi api inference engine by @elronbandel in #1343
- Add chat api format with standard open ai chat format by @elronbandel in #1314
- Add option selecting huggingface inference engine by @elronbandel in #1357
Improved multi model support
- Add seed bench dataset and support for videos by @elronbandel in #1309
- Add LMMSEvalInferenceEngine by @elronbandel in #1301
- Vision robustness blog by @elronbandel in #1318
New Assets
- added QTSUMM taskcard for query-focused table summarization task by @csrajmohan in #1304
- Add OptionSelectingByLogProbsInferenceEngine by @martinscooper in #1317
- Replace 20 newsgroup with a shorter version in bluebench by @perlitz in #1347
- Bluebench Update by @perlitz in #1342
- Update Blue Bench description by @elronbandel in #1354
- Batched multi class classification by @yoavkatz in #1340
- move rag binary llmaj under rag metrics by @lilacheden in #1338
- adding generic inference binary+idk judges by @Roni-Friedman in #1316
- Add table augmentors by @elronbandel in #1328
- Align augmenters with task and types mechanisms by @elronbandel in #1356
- add serializers to catalog + new table operators by @ShirApp in #1365
Performance
- Add loaders cache by @elronbandel in #1333
Usability
- Allow turning single stream to dataset by @elronbandel in #1335
Documentation
- Add ability to load_dataset without a template for simpler usage for beginners by @elronbandel in #1350
- add score name prefix for judge_raw_output/input in llmaj metric by @OfirArviv in #1323
- Add link to source in catalog assets by @elronbandel in #1362
- Fix docs compilation and links from docs to github by @elronbandel in #1359
- Fix website docs-code links by @elronbandel in #1360
- Update error checking and documentation of processors by @yoavkatz in #1325
- Unified catalog terminology by @yoavkatz in #1355
- Improved documentation formatting by @dafnapension in #1334
- Fix catalog links by @elronbandel in #1348
- Print catalog entries as yamls by @dafnapension in #1351
CI/CD
- a more elaborated message from performace-test-summary, and doc-string of card_profiler by @dafnapension in #1307
- Make package requirements compatible with requirements.txt like format by @elronbandel in #1310
- Make inference engine tests run only when inference.py has changed by @elronbandel in #1311
- Seperate examples tests by @elronbandel in #1322
- Fix pyproject.toml to be standalone and comply with modern standards by @elronbandel in #1324
- Fix GitHub Actions concurrence execution by @elronbandel in #1349
- Make tests faster and clearer by @dafnapension in #1345
New Contributors
- @martinscooper made their first contribution in #1317
Full Changelog: 1.14.1...1.15.6
Unitxt 1.14.1 - Faster Unitxt 🚀
Important Change: Unitxt is Faster!
To improve Unitxt’s performance, we've made several optimizations:
- Operator Acceleration: Many operators have been sped up by removing unnecessary deep copying in their code, enhancing runtime efficiency.
- Caching Hugging Face Datasets: We added the option to cache Hugging Face datasets in loaders, which can prevent redundant loading operations. To enable this, you can set it globally in code:

      import unitxt
      unitxt.settings.disable_hf_datasets_cache = False

  use the settings context:

      with settings.context(disable_hf_datasets_cache=False):
          ...  # your code

  or set the environment variable:

      export UNITXT_DISABLE_HF_DATASETS_CACHE=False
- Eager Execution Mode: Running Unitxt without streaming can be faster in certain scenarios. Enable eager execution via the environment variable or directly in code:

      unitxt.settings.use_eager_execution = True

  or with the settings context:

      with settings.context(use_eager_execution=True):
          ...  # your code
- Partial Stream Loading: This feature lets you load only the necessary data instances, avoiding full dataset loads when not required. Here's an example:

      from unitxt import load_dataset

      dataset = load_dataset(
          card="cards.doc_vqa.lmms_eval",
          template="templates.qa.with_context.title",
          format="formats.models.llava_interleave",
          loader_limit=300,
          streaming=True,
      )
      print(next(iter(dataset["test"])))  # Loads only the first instance
Complete Example: Combining the optimizations above can lead to near 1000x faster dataset loading:

    from unitxt import load_dataset, settings

    with settings.context(
        disable_hf_datasets_cache=False,
        use_eager_execution=True,
    ):
        dataset = load_dataset(
            card="cards.doc_vqa.lmms_eval",
            template="templates.qa.with_context.title",
            format="formats.models.llava_interleave",
            loader_limit=300,
            streaming=True,
        )
        print(next(iter(dataset["test"])))  # Loads only the first instance
- Execution Speed Tracking: A GitHub action has been added to monitor Unitxt’s execution speed in new pull requests, helping ensure that optimizations are maintained.
Summary
This release is focused on accelerating performance in Unitxt by introducing several key optimizations. Operator efficiency has been enhanced by removing deep copies, making operations faster. Users can now enable dataset caching for Hugging Face datasets to prevent redundant loading, configured directly in code or through environment variables. An optional eager execution mode has been added, bypassing streaming to increase speed in certain scenarios. Additionally, partial stream loading allows selective instance loading, reducing memory usage and improving response times. To maintain these improvements, a new GitHub action now monitors Unitxt’s execution speed in pull requests, ensuring consistent performance across updates.
All Changes
- Enhancements to inference engines by @lilacheden in #1243
- add post processor to convert log probs dictionary to probabilities of a specific class by @lilacheden in #1247
- CI for metrics other than main + Bugfix in RetrievalAtK by @lilacheden in #1246
- Add huggingface cache disabling option to unitxt settings by @elronbandel in #1250
- Make F1Strings faster by @elronbandel in #1248
- Fix duplicate column deletion bug in pandas serializer by @elronbandel in #1249
- revived no_deep just to compare performance by @dafnapension in #1254
- fixed scigen post-processor by @csrajmohan in #1253
- Add prediction length metric by @perlitz in #1252
- Fix faithfulness confidence intervals by @matanor in #1257
- Allow role names to be captialized in SerializeOpenAiFormatDialog by @yoavkatz in #1259
- Accelerate image example 1000X by @elronbandel in #1258
- Fix the empty few-shot target issue when using produce() by @marukaz in #1266
- fix postprocessors in turl_col_type taskcard by @csrajmohan in #1261
- Fix answer correctness confidence intervals by @matanor in #1256
- add BlueBench as a benchmark to the catalog by @shachardon in #1262
- Fix MultipleSourceLoader documentation by @marukaz in #1270
- Ignore unitxt-venv by @marukaz in #1269
- Add mmmu by @elronbandel in #1271
- A fix for a bug in metric pipeline by @elronbandel in #1268
- Added Tablebench taskcard by @csrajmohan in #1273
- Fix missing deep copy in MapInstanceValues by @yoavkatz in #1267
- Add stream name to generation of dataset by @elronbandel in #1276
- Fix demos pool inference by @elronbandel in #1278
- Fix quality github action by @elronbandel in #1281
- add operators for robustness check on tables by @csrajmohan in #1279
- Instruction in SystemFormet demo support. by @piotrhelm in #1274
- change the max_test_instances of bluebench.recipe.attaq_500 to 100 by @shachardon in #1285
- Add documentation for types and serializers by @elronbandel in #1286
- Add example for image processing with different templates by @elronbandel in #1280
- Integrate metrics team LLMaJ with current unitxt implemantation by @lilacheden in #1205
- performance profiler with visualization by @dafnapension in #1255
- Remove split arg to support old hf datasets versions by @elronbandel in #1288
- add post-processors for tablebench taskcard by @csrajmohan in #1289
- recursive copy seems safer here by @dafnapension in #1295
- Fix performance tracking action by @elronbandel in #1296
- try num of instances in nested global scores by @dafnapension in #1282
- Update version to 1.14.0 by @elronbandel in #1298
- expand performance table by @dafnapension in #1299
- Fix doc_vqa lmms_eval by @elronbandel in #1300
- prepare for int-ish group names and type names and add the exposing card by @dafnapension in #1303
- remove groups breakdowns from global score of grouped instance metrics by @dafnapension in #1306
- Update the safety metric batch size to 10 by @perlitz in #1305
New Contributors
- @piotrhelm made their first contribution in #1274
Full Changelog: 1.13.1...1.14.1
Unitxt 1.14.0 - Faster Unitxt
What's Changed
- Simplify qa example by @yoavkatz in #1234
- allow multiple references for f1 strings metric by @ShirApp in #1225
- Add bluebench recipes by @shachardon in #1237
- Allow templates dicts to be python dicts and fix a bug in the TemplatesDict definition by @elronbandel in #1240
- Deep copy artifacts that fetched twice by @elronbandel in #1239
- Adding of ANLS metric to doc_vqa and info_vqa datasets by @alfassy in #1241
- Update README.md by @elronbandel in #1242
- Update version to 1.13.1 by @elronbandel in #1244
- Enhancements to inference engines by @lilacheden in #1243
- add post processor to convert log probs dictionary to probabilities of a specific class by @lilacheden in #1247
- CI for metrics other than main + Bugfix in RetrievalAtK by @lilacheden in #1246
- Add huggingface cache disabling option to unitxt settings by @elronbandel in #1250
- Make F1Strings faster by @elronbandel in #1248
- Fix duplicate column deletion bug in pandas serializer by @elronbandel in #1249
- revived no_deep just to compare performance by @dafnapension in #1254
- fixed scigen post-processor by @csrajmohan in #1253
- Add prediction length metric by @perlitz in #1252
- Fix faithfulness confidence intervals by @matanor in #1257
- Allow role names to be captialized in SerializeOpenAiFormatDialog by @yoavkatz in #1259
- Accelerate image example 1000X by @elronbandel in #1258
- Fix the empty few-shot target issue when using produce() by @marukaz in #1266
- fix postprocessors in turl_col_type taskcard by @csrajmohan in #1261
- Fix answer correctness confidence intervals by @matanor in #1256
- add BlueBench as a benchmark to the catalog by @shachardon in #1262
- Fix MultipleSourceLoader documentation by @marukaz in #1270
- Ignore unitxt-venv by @marukaz in #1269
- Add mmmu by @elronbandel in #1271
- A fix for a bug in metric pipeline by @elronbandel in #1268
- Added Tablebench taskcard by @csrajmohan in #1273
- Fix missing deep copy in MapInstanceValues by @yoavkatz in #1267
- Add stream name to generation of dataset by @elronbandel in #1276
- Fix demos pool inference by @elronbandel in #1278
- Fix quality github action by @elronbandel in #1281
- add operators for robustness check on tables by @csrajmohan in #1279
- Instruction in SystemFormet demo support. by @piotrhelm in #1274
- change the max_test_instances of bluebench.recipe.attaq_500 to 100 by @shachardon in #1285
- Add documentation for types and serializers by @elronbandel in #1286
- Add example for image processing with different templates by @elronbandel in #1280
- Integrate metrics team LLMaJ with current unitxt implemantation by @lilacheden in #1205
- performance profiler with visualization by @dafnapension in #1255
- Remove split arg to support old hf datasets versions by @elronbandel in #1288
- add post-processors for tablebench taskcard by @csrajmohan in #1289
- recursive copy seems safer here by @dafnapension in #1295
- Fix performance tracking action by @elronbandel in #1296
- try num of instances in nested global scores by @dafnapension in #1282
- Update version to 1.14.0 by @elronbandel in #1298
New Contributors
- @alfassy made their first contribution in #1241
- @piotrhelm made their first contribution in #1274
Full Changelog: 1.13.0...1.14.0
Unitxt 1.13.1
Update version to 1.13.1 (#1244)
Unitxt 1.13.0 - Multi Modality and Types
New type handling capabilities
The most significant change in this release is the introduction of type serializers to unitxt.
Type serializers are in charge of taking a specific type of data structure, such as Table or Dialog, and serializing it to a textual representation.
You can now define tasks in unitxt that have complex types such as Table or Dialog, and define serializers that handle their transformation to text.
This allows controlling the representation of different types from the recipe API:
from unitxt import load_dataset
from unitxt.struct_data_operators import SerializeTableAsMarkdown
serializer = SerializeTableAsMarkdown(shuffle_rows=True, seed=0)
dataset = load_dataset(card="cards.wikitq", template_card_index=0, serializer=serializer)
And if you want to serialize this table differently you can change any of the many available table serializers.
Defining a New Type
If you wish to define a new type with custom serializers, you can do so using the Python typing library:
from typing import Any, List, TypedDict
class Table(TypedDict):
header: List[str]
rows: List[List[Any]]
Once your type is ready, register it with unitxt's type handling in the code you are running:
from unitxt.type_utils import register_type
register_type(Table)
Now your type can be used anywhere across unitxt (e.g., in task definitions or serializers).
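At runtime, a value of such a TypedDict type is just a plain dict with the declared keys; the register_type call only makes the type known to unitxt. A minimal self-contained illustration (no unitxt needed for this part):

```python
from typing import Any, List, TypedDict

class Table(TypedDict):
    header: List[str]
    rows: List[List[Any]]

# A value conforming to the Table type is a plain dict with these keys;
# TypedDict adds static type checking but no runtime wrapper.
table: Table = {
    "header": ["name", "age"],
    "rows": [["alice", 30], ["bob", 25]],
}

print(len(table["rows"]))  # 2
```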
Defining a Serializer For a Type
If you want to define a serializer for your custom type, or for any typing type combination, you can do so as follows (Dict and Any come from typing; SingleTypeSerializer is imported from unitxt's serializers module):

from typing import Any, Dict

from unitxt.serializers import SingleTypeSerializer

class MySerializer(SingleTypeSerializer):
    serialized_type = Table

    def serialize(self, value: Table, instance: Dict[str, Any]) -> str:
        ...  # your code to turn a value of type Table into a string
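The string-building part of such a serializer could render the table as markdown, for example. Below is a sketch of that logic alone, written as a standalone function so it runs without unitxt (the Table shape follows the TypedDict defined above; this is illustrative, not unitxt's SerializeTableAsMarkdown implementation):

```python
from typing import Any, Dict, List

def serialize_table_as_markdown(value: Dict[str, Any]) -> str:
    """Render a {"header": [...], "rows": [[...], ...]} table as markdown."""
    header: List[str] = value["header"]
    lines = [
        "|" + "|".join(str(h) for h in header) + "|",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in value["rows"]:
        lines.append("|" + "|".join(str(cell) for cell in row) + "|")
    return "\n".join(lines)

table = {"header": ["a", "b"], "rows": [[1, 2], [3, 4]]}
print(serialize_table_as_markdown(table))
# |a|b|
# |---|---|
# |1|2|
# |3|4|
```

Inside a real SingleTypeSerializer subclass, the serialize method would return this string.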
Multi-Modality
You can now process image-text-to-text or image-audio-to-text datasets in unitxt.
For example, to load the doc-vqa dataset:
from unitxt import load_dataset
dataset = load_dataset(
card="cards.doc_vqa.en",
template="templates.qa.with_context.title",
format="formats.models.llava_interleave",
loader_limit=20,
)
Since unitxt already has data augmentation mechanisms, it is natural to use them for images too. For example, to get your images in greyscale:
dataset = load_dataset(
card="cards.doc_vqa.en",
template="templates.qa.with_context.title",
format="formats.models.llava_interleave",
loader_limit=20,
augmentor="augmentors.image.grey_scale", # <= Just like the text augmenters!
)
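Conceptually, a greyscale augmentor collapses each RGB pixel into a single luminance value. A minimal sketch using the common ITU-R BT.601 weights (illustrative only; not unitxt's actual augmentor code, which operates on image objects):

```python
def to_greyscale(pixel):
    """Collapse an (R, G, B) pixel to one luminance value (BT.601 weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# A tiny 2x2 "image" as nested lists of RGB tuples.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
grey = [[to_greyscale(p) for p in row] for row in image]
print(grey)  # [[76, 150], [29, 255]]
```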
Then, if you want to get the scores of a model on this dataset, you can use:
from unitxt.inference import HFLlavaInferenceEngine
from unitxt.text_utils import print_dict
from unitxt import evaluate
inference_model = HFLlavaInferenceEngine(
model_name="llava-hf/llava-interleave-qwen-0.5b-hf", max_new_tokens=32
)
test_dataset = dataset["test"].select(range(5))
predictions = inference_model.infer(test_dataset)
evaluated_dataset = evaluate(predictions=predictions, data=test_dataset)
print_dict(
evaluated_dataset[0],
keys_to_print=["source", "media", "references", "processed_prediction", "score"],
)
Multi-modality support in unitxt builds upon the type handling introduced in the previous section, with two new types: Image and Audio.
What's Changed
- add revision option to hf loader by @OfirArviv in #1189
- Support dataset field in nested JSON files by @antonpibm in #1188
- Add TURL Table column type annotation task card by @csrajmohan in #1186
- Update operators.py - copy edits (grammar, consistency, clarity) by @welisheva22 in #1187
- Numeric nlg postproc by @ShirApp in #1185
- Add support for Literal, TypedDict and NewType for unitxt type checking by @elronbandel in #1191
- Scarebleu metric: remove mecab_ko and mecab_ko_dic from metric requir… by @eladven in #1197
- Add rag dataset + openai format dialog operator by @OfirArviv in #1192
- Update README.md by @elronbandel in #1198
- add decorator with init warning by @MikolajCharchut in #1200
- Add mock inference mode setting and allow testing without gen ai key by @elronbandel in #1204
- Fix using OpenAiInferenceEngine for LLMAsJudge by @yifanmai in #1194
- Add TogetherAiInferenceEngine by @yifanmai in #1203
- Fix OpenAiInferenceEngine by @yifanmai in #1193
- Add serializers to templates and reorganize and unite all templates by @elronbandel in #1195
- Add demos to task_data by @elronbandel in #1206
- Move test_context_correctness by @matanor in #1207
- Add image-text to text datasets by @elronbandel in #1211
- Refactor augmentors to be more scaleable + add image aumgentors by @elronbandel in #1212
- Fix grey scale augmentor and add to image example by @elronbandel in #1213
- Add images to UI by @elronbandel in #1216
- add unified decorator for warnings and unit tests by @MikolajCharchut in #1209
- Add templates list option to standard recipe by @elronbandel in #1219
- Use read token for huggingface datasets reading by @elronbandel in #1223
- add Llava-next system prompt by @OfirArviv in #1221
- Improve performance for huggingface tokenizer based format by @elronbandel in #1224
- Fix compute expression to use the instance variables as globals by @elronbandel in #1217
- Add generic inference engine to allow dynamic selection by the user by @eladven in #1226
- A suggested PR for issue 1106: More meaningful error message when catalog consistency fails by @dafnapension in #1201
- Add random templates for bluebench by @perlitz in #1222
- A suggested PR for issue #1214: fixed a bug in score_prefix for grouped instance scores by @dafnapension in #1228
- Add control over serizliers from recipe + improve serializers construction + allow seed for table shuffling serizliers by @elronbandel in #1229
- Fix table tasks to use default table serializers by @elronbandel in #1230
- Add concurency_limit parameter to WMLInferenceEngine by @elronbandel in #1231
- Add wml and generic based llmaj metric by @perlitz in #1227
- Update version to 1.13.0 by @elronbandel in #1232
New Contributors
- @MikolajCharchut made their first contribution in #1200
Full Changelog: 1.12.4...1.13.0
1.12.4
Main changes
- Enable defining benchmarks in Unitxt by adding the ability to produce scores of groups based on task attributes and recipe metadata. For more information see https://www.unitxt.ai/en/latest/docs/benchmark.html by @elronbandel in #1130
- Enable inference/production APIs to support invocation by task without specifying a card. This enables using any task in the Unitxt catalog as an inference function. Check https://www.unitxt.ai/en/latest/docs/production.html for details (#957)
- Add support for multi-modality. For details see https://www.unitxt.ai/en/latest/docs/multimodality.html by @elronbandel in #1175
Additions to catalog
- Add ProvoQ dataset artifacts by @bnayahu in #1168
- Add Wikitq metric by @ShirApp in #1167
- Add more LLMs as judges ensembles by @pvn25 in #1171
- Add Scigen table2text task with llm_as_judge metric by @csrajmohan in #1134
New Features
- Add LLM as judge ensemble metrics, and add LLMaaJ ensemble example by @pvn25 in #1081
- Refactor RenameFields operator to Rename. The old operator is still supported but raises a deprecation warning by @elronbandel in #1123
Bug Fixes
- Make cache compatible with python 3.8 by @elronbandel in #1172
- Deprecated field used to print warning message with wrong reason @dafnapension in #1174
Documentation changes
- Update llm_as_judge.py --- copy edits (grammar, consistency, clarity) by @welisheva22 in #1164
- Update formats.py --- copy edits (grammar, consistency, clarity) by @welisheva22 in #1163
- Update loaders.py --- copy edits (grammar, consistency, clarity) by @welisheva22 in #1162
- Update card.py - minor documentation changes by @welisheva22 in #1161
- Update adding_dataset.rst - a few more minor documentation changes by @welisheva22 in #1160
- Update artifact.py --- documentation edits (grammar, consistency, cla… by @welisheva22 in #1159
- Update glossary.rst --- copy edits (grammar, consistency, clarity) by @welisheva22 in #1155
- Update helm.rst --- copy edits (grammar, consistency, clarity) by @welisheva22 in #1154
- Update operators.py --- copy edits (grammar, consistency, clarity) - take 2 by @welisheva22 in #1158
- Docfix: Fix typo in Installation doc by @yifanmai in #1181
Unitxt 1.12.3
Main changes
- New option to use multiple templates and/or num_demos in a single dataset recipe. Unitxt will randomly sample from the provided templates and possible numbers of demos for each instance. See the example: https://github.com/IBM/unitxt/blob/main/examples/evaluate_different_templates_num_demos.py
- A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it. See more details on how to deal with conflicting metric names in https://www.unitxt.ai/en/latest/docs/adding_metric.html#metric-outputs-with-multiple-metrics
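The per-instance sampling described above can be pictured as drawing a template and a demo count at random for each instance. A sketch of the idea (not unitxt's internal code; the template names here are made up):

```python
import random

# Hypothetical template names and demo-count options, as a user might
# pass them to a recipe.
templates = ["templates.qa.simple", "templates.qa.with_context", "templates.qa.open"]
num_demos_options = [0, 3, 5]

random.seed(0)  # fixed seed for reproducibility
for instance_id in range(4):
    # Each instance independently gets one template and one demo count.
    template = random.choice(templates)
    num_demos = random.choice(num_demos_options)
    print(instance_id, template, num_demos)
```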
Non-backward-compatible changes in catalog
- change rag metrics name convention (e.g. "metrics.rag.mrr" -> "metrics.rag.context_correctness.mrr",) - catalog non backward compatible change by @assaftibm in #1104
- Update summarization task and templates to support multiple reference summaries - by @yoavkatz in #1126
- Fix belebele due to new convention by @elronbandel in #1145
Additions to catalog
- Add DeepSeek-Coder format and system prompt by @oktie in #1105
- Add a metric to calculate the ratio of references included in the prediction by @marukaz in #1091
- adding RAG bge metrics by @assaftibm
New Features
- Add option to run multiple templates and or num_demos in single dataset recipe. Now it is possible to give a list of templates or num_demos. Unitxt will randomly sample from the templates and for each instance assign a random template from the list. by @elronbandel in #1110
- A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it @dafnapension in #1124
- The MetricPipeline field postpreprocess_steps has been renamed to postprocess_steps. The old field (postpreprocess_steps) still exists for backward compatibility but is deprecated. by @dafnapension in #1117
- Decrease runtime of demo examples
- Add tests for RAG metrics by @matanor
- Adding dedicated Unitxt warning and error classes to link online documentation by @yoavkatz in
- The code now uses a central controllable deepcopy function by @elronbandel in #1120
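The postpreprocess_steps rename above is an instance of a common backward-compatible rename pattern: a property forwards the old name to the new one while emitting a deprecation warning. A generic sketch (not unitxt's actual implementation):

```python
import warnings

class MetricPipeline:
    """Toy class illustrating a deprecated-field rename, not unitxt's."""

    def __init__(self, postprocess_steps=None):
        self.postprocess_steps = postprocess_steps or []

    @property
    def postpreprocess_steps(self):
        # Old name still works, but warns callers to migrate.
        warnings.warn(
            "postpreprocess_steps is deprecated; use postprocess_steps",
            DeprecationWarning,
        )
        return self.postprocess_steps

pipeline = MetricPipeline(postprocess_steps=["lowercase"])
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for the demo
    print(pipeline.postpreprocess_steps)  # ['lowercase']
```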
Bug Fixes
- Create a dedicated nltk mixin for downloading all versions of punkt needed by metrics code. by @elronbandel in #1151
- For bulk instance metrics, Replace mean function with nanmean to support aggregation in case of nan scores. by @elronbandel in #1150
- Fix helm test by @elronbandel in #1109
- Fix bug with RAG metrics: Fix use of minilm model by @assaftibm in #1115
- Fix data classification of WML model to include 'public' classification by @yoavkatz in #1118
- Fix WMLInferenceEngine by @pawelknes in #1122
- Fix belebele HF path due to new convention by @elronbandel in #1145
Documentation changes
- Improve debugging.rst wording
- Improve examples.rst wording by @welisheva22 in #1138
- Improve data_classification_policy.rst wording by @welisheva22 in #1139
- Improve rag_support.rst wording by @welisheva22 in #1139
- Improve production.rst wording by @welisheva22 in #1148
- Improve the clarity of the code examples.
- Improve load_datasets.rst wording by @welisheva22
- Improve introduction.rst wording by @welisheva22
- Improve installation.rst wording by @welisheva22
- Improve adding_format.rst wording by @welisheva22
- Improve adding_task.rst wording by @welisheva22
- Improve adding_template.rst wording by @welisheva22
- Improve adding_dataset.rst wording by @hanansinger
- improve index.rst page by @yoavkatz
- Fix link to llama blog in adding_format.rst by @andersonm-ibm in #1113
- Added example of RAG response by @yoavkatz in #1121
New Contributors
- @andersonm-ibm made their first contribution in #1113 by @welisheva22 in #1152