Releases: IBM/unitxt
Unitxt 1.18.0 - Faster Loading
The main improvements in this version focus on caching strategies, dataset loading, and speed optimizations.
Hugging Face Datasets Caching Policy
We have completely revised our caching policy and how we handle Hugging Face datasets to improve performance.
- Hugging Face datasets are now cached by default. This means the LoadHF loader caches downloaded datasets in the HF cache directory (typically ~/.cache/huggingface/datasets).
- To disable this caching mechanism, use:
  unitxt.settings.disable_hf_datasets_cache = True
- All Hugging Face datasets are first downloaded and then processed. Downloading the entire dataset is faster for most datasets. However, if you want to process a huge dataset and the HF dataset supports streaming, you can load it in streaming mode:
  LoadHF(name="my-dataset", streaming=True)
- To enable streaming mode by default for all Hugging Face datasets, use:
  unitxt.settings.stream_hf_datasets_by_default = True

While the new defaults (full download and caching) may make the initial dataset load slower, subsequent loads will be significantly faster.
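For convenience, here is a minimal, self-contained sketch that combines the options above; the import paths follow the usual unitxt layout and the dataset name is a placeholder, so treat it as an illustration rather than a prescribed recipe:

```python
import unitxt
from unitxt.loaders import LoadHF

# Default behavior: datasets are fully downloaded and cached under
# ~/.cache/huggingface/datasets, making repeated loads fast.

# Opt out of Hugging Face dataset caching globally:
unitxt.settings.disable_hf_datasets_cache = True

# Stream one huge dataset instead of downloading it up front
# (placeholder dataset name, as in the example above):
loader = LoadHF(name="my-dataset", streaming=True)

# Or make streaming the default for all Hugging Face datasets:
unitxt.settings.stream_hf_datasets_by_default = True
```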
Unitxt Datasets Caching Policy
By default, when loading datasets with unitxt.load_dataset, the dataset is prepared from scratch each time you call the function. This ensures that any changes made to the card definition are reflected in the output.
- This process may take a few seconds, and for large datasets, repeated loading can accumulate overhead.
- If you are using fixed datasets from the catalog, you can enable caching for Unitxt datasets. The datasets are cached in the Hugging Face cache (typically ~/.cache/huggingface/datasets):
  from unitxt import load_dataset

  ds = load_dataset(card="my_card", use_cache=True)
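To illustrate the effect of use_cache, a hedged sketch (the card name is a placeholder, and actual timings depend on the dataset):

```python
import time
from unitxt import load_dataset

# First call: the dataset is prepared from the card and written to the
# Hugging Face cache.
start = time.time()
ds = load_dataset(card="my_card", use_cache=True)
print(f"first load: {time.time() - start:.1f}s")

# Second call with identical arguments: the cached dataset is reused,
# so preparation is skipped and the call returns much faster.
start = time.time()
ds = load_dataset(card="my_card", use_cache=True)
print(f"second load: {time.time() - start:.1f}s")
```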
Faster Unitxt Dataset Preparation
To improve dataset loading speed, we have optimized how Unitxt datasets are prepared.
Background:
Unitxt datasets are converted to Hugging Face datasets because Hugging Face datasets store their data on disk and keep only the necessary parts in memory (via PyArrow). This enables efficient handling of large datasets without excessive memory usage.
Previously, unitxt.load_dataset used built-in Hugging Face methods for dataset preparation, which included unnecessary type handling and verification, slowing down the process.
Key improvements:
- We now create the Hugging Face dataset directly, reducing preparation time by almost 50%.
- With this optimization, Unitxt datasets are now faster than ever!
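For background only, the sketch below shows the general idea of building an Arrow-backed Hugging Face dataset directly from already-prepared records; it illustrates the approach, not the actual unitxt implementation:

```python
from datasets import Dataset

# Records as they might look after unitxt processing (contents are illustrative).
records = [
    {"source": "Question: What is 2 + 2?\nAnswer:", "target": "4"},
    {"source": "Question: What is 3 + 3?\nAnswer:", "target": "6"},
]

# Building the dataset directly from the records creates the PyArrow table in
# one step, avoiding the extra type inference and verification passes that
# more general-purpose conversion paths perform.
ds = Dataset.from_list(records)
print(ds)  # Dataset({features: ['source', 'target'], num_rows: 2})
```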
What's Changed
- End of year summary blog post by @elronbandel in #1530
- Updated documentation and examples of LLM-as-Judge by @tejaswini in #1532
- Eval assist documentation by @tejaswini in #1537
- Update notification banner styles and add 2024 summary blog link by @elronbandel in #1538
- Add more granite llm as judge artifacts by @martinscooper in #1516
- Fix Australian legal qa dataset by @elronbandel in #1542
- Set use 1 shot for wikitq in tables_benchmark by @yifanmai in #1541
- Bugfix: indexed row major serialization fails with None cell values by @yifanmai in #1540
- Solve issue of expired token in Unitxt Assistant by @eladven in #1543
- Add Replicate inference support by @elronbandel in #1544
- add a filter to wikitq by @ShirApp in #1547
- Add text2sql tasks by @perlitz in #1414
- Add deduplicate operator by @elronbandel in #1549
- Fix the authentication problem by @eladven in #1550
- Attach assistant answers to their origins with url link by @elronbandel in #1528
- Add mtrag benchmark by @elronbandel in #1548
- Update end of year summary blog by @elronbandel in #1552
- Add data classification policy to CrossProviderInferenceEngine initialization based on selected model by @elronbandel in #1539
- Fix recently broken rag metrics by @elronbandel in #1554
- Renamed criterias in LLM-as-a-Judge metrics to criteria - Breaking change by @tejaswini in #1545
- Finqa hash to top by @elronbandel in #1555
- Refactor safety metric to be faster and updated by @elronbandel in #1484
- Improve assistant by @elronbandel in #1556
- Feature/add global mmlu cards by @eliyahabba in #1561
- Add quality dataset by @eliyahabba in #1563
- Add CollateInstanceByField operator to group data by specific field by @sarathsgvr in #1546
- Fix prompts table benchmark by @ShirApp in #1565
- Create new IntersectCorrespondingFields operator by @pklpriv in #1531
- Add granite documents format by @elronbandel in #1566
- Revisit huggingface cache policy - BREAKING CHANGE by @elronbandel in #1564
- Add global mmlu lite sensitivity cards by @eliyahabba in #1568
- Add schema-linking by @KyleErwin in #1533
- fix the printout of empty strings in the yaml cards of the catalog by @dafnapension in #1567
- Use repr instead of to_json for unitxt dataset caching by @elronbandel in #1570
- Added key value extraction evaluation and example with images by @yoavkatz in #1529
New Contributors
- @tejaswini made their first contribution in #1532
- @KyleErwin made their first contribution in #1533
Full Changelog: 1.17.0...1.18.0
Unitxt 1.17.2
What's Changed
- Feature/add global mmlu cards by @eliyahabba in #1561
- Add quality dataset by @eliyahabba in #1563
- Add CollateInstanceByField operator to group data by specific field by @sarathsgvr in #1546
- Fix prompts table benchmark by @ShirApp in #1565
- Create new IntersectCorrespondingFields operator by @pklpriv in #1531
- Add granite documents format by @elronbandel in #1566
- Revisit huggingface cache policy by @elronbandel in #1564
- Add global mmlu lite sensitivity cards by @eliyahabba in #1568
- Update version to 1.17.2 by @elronbandel in #1569
Full Changelog: 1.17.1...1.17.2
Unitxt 1.17.1
What's Changed
Non-backward-compatible change
- Renamed criterias in LLM-as-a-Judge metrics to criteria - Breaking change by @tejaswini in #1545
New features
- Add Replicate inference support by @elronbandel in #1544
- Add text2sql tasks by @perlitz in #1414
- Add deduplicate operator by @elronbandel in #1549
New Assets
- Add more granite llm as judge artifacts by @martinscooper in #1516
- Add mtrag benchmark by @elronbandel in #1548
Documentation
- End of year summary blog post by @elronbandel in #1530
- Update notification banner styles and add 2024 summary blog link by @elronbandel in #1538
- Updated documentation and examples of LLM-as-Judge by @tejaswini in #1532
- Eval assist documentation by @tejaswini in #1537
Bug Fixes
- Fix Australian legal qa dataset by @elronbandel in #1542
- Set use 1 shot for wikitq in tables_benchmark by @yifanmai in #1541
- Bugfix: indexed row major serialization fails with None cell values by @yifanmai in #1540
- Solve issue of expired token in Unitxt Assistant by @eladven in #1543
- add a filter to wikitq by @ShirApp in #1547
- Fix the authentication problem by @eladven in #1550
- Attach assistant answers to their origins with url link by @elronbandel in #1528
- Update end of year summary blog by @elronbandel in #1552
- Add data classification policy to CrossProviderInferenceEngine initialization based on selected model by @elronbandel in #1539
- Fix recently broken rag metrics by @elronbandel in #1554
- Finqa hash to top by @elronbandel in #1555
- Refactor safety metric to be faster and updated by @elronbandel in #1484
- Improve assistant by @elronbandel in #1556
New Contributors
- @tejaswini made their first contribution in #1532
Full Changelog: 1.17.0...1.17.1
Unitxt 1.17.0 - New LLM as Judges!
Important Changes
This update to unitxt covers the following topics:
- Criteria-based LLM as Judges - an improved class of LLM-as-Judge metrics with customizable judging criteria (read more)
- Unitxt Assistant - a textual assistant, expert in unitxt, that helps developers (read more)
- New benchmarks: Tables, Vision - benchmarks for table understanding and image understanding, compiled by the community and collaborators (read more)
- Support for all major inference providers - inference for evaluation or LLM as Judges can be channeled to any inference provider, such as Azure, AWS, and watsonx (read more); see the sketch below
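As an illustration of the cross-provider support, a minimal sketch (the model name is an example, the provider can be any supported backend, and the relevant credentials are assumed to be configured in the environment):

```python
from unitxt.inference import CrossProviderInferenceEngine

# The same engine definition can target different providers
# (for example watsonx, aws, or azure) by changing only the provider field.
engine = CrossProviderInferenceEngine(
    model="llama-3-8b-instruct",  # example model name
    provider="watsonx",           # swap for another supported provider as needed
)

# The engine can then run inference over a prepared unitxt dataset, either to
# produce predictions for evaluation or to power an LLM-as-a-Judge metric.
```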
Detailed Changes
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
- Update version to 1.16.1 by @elronbandel in #1472
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
- Update version to 1.16.2 by @elronbandel in #1483
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
- Update version to 1.16.3 by @elronbandel in #1489
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
- Update version to 1.16.4 by @elronbandel in #1491
- Simplify artifact link [Non Backward Compatible!] by @elronbandel in #1494
- Added NER example by @yoavkatz in #1492
- Add example for evaluating tables as images using Unitxt APIs by @elronbandel in #1495
- Mm updates by @alfassy in #1465
- Fix wrong saving of artifact initial dict by @elronbandel in #1499
- Accelerate and improve RAG Metrics by @elronbandel in #1497
- Make clinc preparation faster by @elronbandel in #1501
- Fix templates lists in vision cards by @elronbandel in #1500
- Add vision benchmark example by @elronbandel in #1502
- Update vis bench by @elronbandel in #1505
- Add Balance operator by @elronbandel in #1507
- Fix for demos_pool with images. by @elronbandel in #1509
- Remove new balance operator and use existing implementation by @elronbandel in #1510
- Fixes and adjustment in rag metrics and related inference engines by @lilacheden in #1466
- Tables bench by @ShirApp in #1506
- Keep metadata over main unitxt stages by @eladven in #1512
- Fix: Improved handling of place_correct_choice_position for flexibl… by @eliyahabba in #1511
- Fixes in LLMJudge by @lilacheden in #1498
- Verify metrics prediction_type without loading metric by @elronbandel in #1519
- Add Unitxt Assistant beta by @elronbandel in #1513
- Ensure fusion do not call streams before use by @elronbandel in #1518
- Minor llm as judge fix/changes by @martinscooper in #1467
- Fix: Selected option for supporting negative indexes in place_correct… by @eliyahabba in #1522
- Refactor rag metrics and judges by @lilacheden in #1515
- Add Llama 3.1 on Vertex AI to CrossProviderInferenceEngine by @yifanmai in #1525
- fix external_rag example by @lilacheden in #1526
- Add search to assistant for much faster response by @elronbandel in #1524
- fixed division by 0 in compare performance results by @dafnapension in #1523
- Add two criteria based direct llm judges by @lilacheden in #1527
- Update version to 1.17.0 by @elronbandel in #1535
New Contributors
- @eliyahabba made their first contribution in #1464
Full Changelog: 1.16.0...1.17.0
Unitxt 1.16.4
What's Changed
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
Unitxt 1.16.3
What's Changed
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
Unitxt 1.16.2
What's Changed
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
New Contributors
- @eliyahabba made their first contribution in #1464
Unitxt 1.16.1
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
1.16.0
What's Changed
Usability
- Add error message when saving artifacts that got changed by @elronbandel in #1417
- A simple way to create and evaluate given a 'task' in the catalog and python data structure by @yoavkatz in #1413
- Evaluation results class for easier access to results by @elronbandel in #1326
- Eval Assist integration by @martinscooper in #1409
Documentation
- Update to new logo by @elronbandel in #1427
- Indentation within docstrings to improve appearance on web pages, on the way - eliminating two red lines from "make docs-server" by @dafnapension in #1429
- Add catalog search with tags filtering by @elronbandel in #1430
- Update catalog search engine by @elronbandel in #1431
- Add custom titles to catalog items by @elronbandel in #1432
- Change card to dataset in the catalog search tags by @elronbandel in #1433
- Updated documentation to show use of installed version and chat api by @yoavkatz in #1435
- Fix documentation for task registration example by @Etelis in #1443
Bug Fixes
- fix mistral format used in llmaj (when not using chat_api) by @lilacheden in #1425
- Fix LMMSEval Inference Engine to work with chat api and fix examples by @elronbandel in #1440
- metadata is set only once in recipe by @dafnapension in #1437
- verify only fresh artifacts are fetched by @dafnapension in #1444
- add data_classification_policy_to_clapnq by @BenjSz in #1451
CI/CD
- eliminate exceeding line_limit errors, and many red lines from "make docs-server" by @dafnapension in #1434
Full Changelog: 1.15.10...1.16.0
1.15.10
What's Changed
- Fix arenahard bluebench template by @perlitz in #1405
- Fixed formal types of infer() and also added runtime check by @yoavkatz in #1406
- not using "score" as metric main_score by @lilacheden in #1407
- Fix model strings for Llama 3 on Together AI by @yifanmai in #1411
- Adjust binary llmaj to new engines and add rits support by @lilacheden in #1408
- Granite Guardian RAG metrics by @arielge in #1393
- Solved many red lines in 'make docs-server' by @dafnapension in #1418
- Fix artifact dict assignment bug by @elronbandel in #1419
- Remove top level imports from guerdian metric (as it adds dependencis to unitxt) by @elronbandel in #1420
- Make types compatible with python 3.8 by @elronbandel in #1423
- Benjams/loaders fix separator by @BenjSz in #1424
- Update version to 1.15.10 by @elronbandel in #1426
Full Changelog: 1.15.9...1.15.10