Releases: IBM/unitxt
Unitxt 1.18.0 - Faster Loading
The main improvements in this version focus on caching strategies, dataset loading, and speed optimizations.
Hugging Face Datasets Caching Policy
We have completely revised our caching policy and how we handle Hugging Face datasets to improve performance.
- Hugging Face datasets are now cached by default. This means the LoadHF loader caches downloaded datasets in the HF cache directory (typically ~/.cache/huggingface/datasets).
- To disable this caching mechanism, use:
  unitxt.settings.disable_hf_datasets_cache = True
- All Hugging Face datasets are first downloaded and then processed. Downloading the entire dataset is faster for most datasets. However, if you want to process a huge dataset and the HF dataset supports streaming, you can load it in streaming mode:
  LoadHF(name="my-dataset", streaming=True)
- To enable streaming mode by default for all Hugging Face datasets, use:
  unitxt.settings.stream_hf_datasets_by_default = True

While the new defaults (full download and caching) may make the initial dataset load slower, subsequent loads will be significantly faster.
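For convenience, here is a minimal, self-contained sketch that combines the options above; the import paths follow the usual unitxt layout and the dataset name is a placeholder, so treat it as an illustration rather than a prescribed recipe:

```python
import unitxt
from unitxt.loaders import LoadHF

# Default behavior: datasets are fully downloaded and cached under
# ~/.cache/huggingface/datasets, making repeated loads fast.

# Opt out of Hugging Face dataset caching globally:
unitxt.settings.disable_hf_datasets_cache = True

# Stream one huge dataset instead of downloading it up front
# (placeholder dataset name, as in the example above):
loader = LoadHF(name="my-dataset", streaming=True)

# Or make streaming the default for all Hugging Face datasets:
unitxt.settings.stream_hf_datasets_by_default = True
```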
Unitxt Datasets Caching Policy
By default, when loading datasets with unitxt.load_dataset, the dataset is prepared from scratch each time you call the function. This ensures that any changes made to the card definition are reflected in the output.
- This process may take a few seconds, and for large datasets, repeated loading can accumulate overhead.
- If you are using fixed datasets from the catalog, you can enable caching for Unitxt datasets. The datasets are cached in the Hugging Face cache (typically ~/.cache/huggingface/datasets):
  from unitxt import load_dataset

  ds = load_dataset(card="my_card", use_cache=True)
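To illustrate the effect of use_cache, a hedged sketch (the card name is a placeholder, and actual timings depend on the dataset):

```python
import time
from unitxt import load_dataset

# First call: the dataset is prepared from the card and written to the
# Hugging Face cache.
start = time.time()
ds = load_dataset(card="my_card", use_cache=True)
print(f"first load: {time.time() - start:.1f}s")

# Second call with identical arguments: the cached dataset is reused,
# so preparation is skipped and the call returns much faster.
start = time.time()
ds = load_dataset(card="my_card", use_cache=True)
print(f"second load: {time.time() - start:.1f}s")
```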
Faster Unitxt Dataset Preparation
To improve dataset loading speed, we have optimized how Unitxt datasets are prepared.
Background:
Unitxt datasets are converted to Hugging Face datasets because Hugging Face datasets store their data on disk and keep only the necessary parts in memory (via PyArrow). This enables efficient handling of large datasets without excessive memory usage.
Previously, unitxt.load_dataset used built-in Hugging Face methods for dataset preparation, which included unnecessary type handling and verification, slowing down the process.
Key improvements:
- We now create the Hugging Face dataset directly, reducing preparation time by almost 50%.
- With this optimization, Unitxt datasets are now faster than ever!
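For background only, the sketch below shows the general idea of building an Arrow-backed Hugging Face dataset directly from already-prepared records; it illustrates the approach, not the actual unitxt implementation:

```python
from datasets import Dataset

# Records as they might look after unitxt processing (contents are illustrative).
records = [
    {"source": "Question: What is 2 + 2?\nAnswer:", "target": "4"},
    {"source": "Question: What is 3 + 3?\nAnswer:", "target": "6"},
]

# Building the dataset directly from the records creates the PyArrow table in
# one step, avoiding the extra type inference and verification passes that
# more general-purpose conversion paths perform.
ds = Dataset.from_list(records)
print(ds)  # Dataset({features: ['source', 'target'], num_rows: 2})
```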
What's Changed
- End of year summary blog post by @elronbandel in #1530
- Updated documentation and examples of LLM-as-Judge by @tejaswini in #1532
- Eval assist documentation by @tejaswini in #1537
- Update notification banner styles and add 2024 summary blog link by @elronbandel in #1538
- Add more granite llm as judge artifacts by @martinscooper in #1516
- Fix Australian legal qa dataset by @elronbandel in #1542
- Set use 1 shot for wikitq in tables_benchmark by @yifanmai in #1541
- Bugfix: indexed row major serialization fails with None cell values by @yifanmai in #1540
- Solve issue of expired token in Unitxt Assistant by @eladven in #1543
- Add Replicate inference support by @elronbandel in #1544
- add a filter to wikitq by @ShirApp in #1547
- Add text2sql tasks by @perlitz in #1414
- Add deduplicate operator by @elronbandel in #1549
- Fix the authentication problem by @eladven in #1550
- Attach assistant answers to their origins with url link by @elronbandel in #1528
- Add mtrag benchmark by @elronbandel in #1548
- Update end of year summary blog by @elronbandel in #1552
- Add data classification policy to CrossProviderInferenceEngine initialization based on selected model by @elronbandel in #1539
- Fix recently broken rag metrics by @elronbandel in #1554
- Renamed criterias in LLM-as-a-Judge metrics to criteria - Breaking change by @tejaswini in #1545
- Finqa hash to top by @elronbandel in #1555
- Refactor safety metric to be faster and updated by @elronbandel in #1484
- Improve assistant by @elronbandel in #1556
- Feature/add global mmlu cards by @eliyahabba in #1561
- Add quality dataset by @eliyahabba in #1563
- Add CollateInstanceByField operator to group data by specific field by @sarathsgvr in #1546
- Fix prompts table benchmark by @ShirApp in #1565
- Create new IntersectCorrespondingFields operator by @pklpriv in #1531
- Add granite documents format by @elronbandel in #1566
- Revisit huggingface cache policy - BREAKING CHANGE by @elronbandel in #1564
- Add global mmlu lite sensitivity cards by @eliyahabba in #1568
- Add schema-linking by @KyleErwin in #1533
- fix the printout of empty strings in the yaml cards of the catalog by @dafnapension in #1567
- Use repr instead of to_json for unitxt dataset caching by @elronbandel in #1570
- Added key value extraction evaluation and example with images by @yoavkatz in #1529
New Contributors
- @tejaswini made their first contribution in #1532
- @KyleErwin made their first contribution in #1533
Full Changelog: 1.17.0...1.18.0
Unitxt 1.17.2
What's Changed
- Feature/add global mmlu cards by @eliyahabba in #1561
- Add quality dataset by @eliyahabba in #1563
- Add CollateInstanceByField operator to group data by specific field by @sarathsgvr in #1546
- Fix prompts table benchmark by @ShirApp in #1565
- Create new IntersectCorrespondingFields operator by @pklpriv in #1531
- Add granite documents format by @elronbandel in #1566
- Revisit huggingface cache policy by @elronbandel in #1564
- Add global mmlu lite sensitivity cards by @eliyahabba in #1568
- Update version to 1.17.2 by @elronbandel in #1569
Full Changelog: 1.17.1...1.17.2
Unitxt 1.17.1
What's Changed
Non-backward-compatible change
- Renamed criterias in LLM-as-a-Judge metrics to criteria - Breaking change by @tejaswini in #1545
New features
- Add Replicate inference support by @elronbandel in #1544
- Add text2sql tasks by @perlitz in #1414
- Add deduplicate operator by @elronbandel in #1549
New Assets
- Add more granite llm as judge artifacts by @martinscooper in #1516
- Add mtrag benchmark by @elronbandel in #1548
Documentation
- End of year summary blog post by @elronbandel in #1530
- Update notification banner styles and add 2024 summary blog link by @elronbandel in #1538
- Updated documentation and examples of LLM-as-Judge by @tejaswini in #1532
- Eval assist documentation by @tejaswini in #1537
Bug Fixes
- Fix Australian legal qa dataset by @elronbandel in #1542
- Set use 1 shot for wikitq in tables_benchmark by @yifanmai in #1541
- Bugfix: indexed row major serialization fails with None cell values by @yifanmai in #1540
- Solve issue of expired token in Unitxt Assistant by @eladven in #1543
- add a filter to wikitq by @ShirApp in #1547
- Fix the authentication problem by @eladven in #1550
- Attach assistant answers to their origins with url link by @elronbandel in #1528
- Update end of year summary blog by @elronbandel in #1552
- Add data classification policy to CrossProviderInferenceEngine initialization based on selected model by @elronbandel in #1539
- Fix recently broken rag metrics by @elronbandel in #1554
- Finqa hash to top by @elronbandel in #1555
- Refactor safety metric to be faster and updated by @elronbandel in #1484
- Improve assistant by @elronbandel in #1556
New Contributors
- @tejaswini made their first contribution in #1532
Full Changelog: 1.17.0...1.17.1
Unitxt 1.17.0 - New LLM as Judges!
Important Changes
This update to unitxt covers the following topics:
- Criteria-based LLM as Judges - an improved class of LLM-as-Judge metrics with customizable judging criteria (read more)
- Unitxt Assistant - a textual assistant, expert in unitxt, that helps developers (read more)
- New benchmarks: Tables, Vision - benchmarks for table understanding and image understanding, compiled by the community and collaborators (read more)
- Support for all major inference providers - inference for evaluation or LLM as Judges can be channeled to any inference provider, such as Azure, AWS, and watsonx (read more); see the sketch below
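As an illustration of the cross-provider support, a minimal sketch (the model name is an example, the provider can be any supported backend, and the relevant credentials are assumed to be configured in the environment):

```python
from unitxt.inference import CrossProviderInferenceEngine

# The same engine definition can target different providers
# (for example watsonx, aws, or azure) by changing only the provider field.
engine = CrossProviderInferenceEngine(
    model="llama-3-8b-instruct",  # example model name
    provider="watsonx",           # swap for another supported provider as needed
)

# The engine can then run inference over a prepared unitxt dataset, either to
# produce predictions for evaluation or to power an LLM-as-a-Judge metric.
```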
Detailed Changes
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
- Update version to 1.16.1 by @elronbandel in #1472
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
- Update version to 1.16.2 by @elronbandel in #1483
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
- Update version to 1.16.3 by @elronbandel in #1489
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
- Update version to 1.16.4 by @elronbandel in #1491
- Simplify artifact link [Non Backward Compatible!] by @elronbandel in #1494
- Added NER example by @yoavkatz in #1492
- Add example for evaluating tables as images using Unitxt APIs by @elronbandel in #1495
- Mm updates by @alfassy in #1465
- Fix wrong saving of artifact initial dict by @elronbandel in #1499
- Accelerate and improve RAG Metrics by @elronbandel in #1497
- Make clinc preparation faster by @elronbandel in #1501
- Fix templates lists in vision cards by @elronbandel in #1500
- Add vision benchmark example by @elronbandel in #1502
- Update vis bench by @elronbandel in #1505
- Add Balance operator by @elronbandel in #1507
- Fix for demos_pool with images. by @elronbandel in #1509
- Remove new balance operator and use existing implementation by @elronbandel in #1510
- Fixes and adjustment in rag metrics and related inference engines by @lilacheden in #1466
- Tables bench by @ShirApp in #1506
- Keep metadata over main unitxt stages by @eladven in #1512
- Fix: Improved handling of place_correct_choice_position for flexibl… by @eliyahabba in #1511
- Fixes in LLMJudge by @lilacheden in #1498
- Verify metrics prediction_type without loading metric by @elronbandel in #1519
- Add Unitxt Assistant beta by @elronbandel in #1513
- Ensure fusion do not call streams before use by @elronbandel in #1518
- Minor llm as judge fix/changes by @martinscooper in #1467
- Fix: Selected option for supporting negative indexes in place_correct… by @eliyahabba in #1522
- Refactor rag metrics and judges by @lilacheden in #1515
- Add Llama 3.1 on Vertex AI to CrossProviderInferenceEngine by @yifanmai in #1525
- fix external_rag example by @lilacheden in #1526
- Add search to assistant for much faster response by @elronbandel in #1524
- fixed division by 0 in compare performance results by @dafnapension in #1523
- Add two criteria based direct llm judges by @lilacheden in #1527
- Update version to 1.17.0 by @elronbandel in #1535
New Contributors
- @eliyahabba made their first contribution in #1464
Full Changelog: 1.16.0...1.17.0
Unitxt 1.16.4
What's Changed
- Fix bootstrap condition to handle cases with insufficient instances by @elronbandel in #1490
Unitxt 1.16.3
What's Changed
- Add Azure support and expand OpenAI model options in inference engine by @elronbandel in #1485
- Benjams/fix bioasq card by @BenjSz in #1486
- add separator to csv loader by @BenjSz in #1488
- Fix bug in metrics loading in tasks by @elronbandel in #1487
Unitxt 1.16.2
What's Changed
- extend choices arrangement functionality with ReorderableMultipleChoi… by @eliyahabba in #1464
- Add GPQA dataset by @elronbandel in #1474
- Add simple QA dataset by @elronbandel in #1475
- Add LongBench V2 dataset by @elronbandel in #1476
- Adding typed recipe test by @antonpibm in #1473
- Add place_correct_choice_position to set the correct choice index and… by @eliyahabba in #1481
- Add MapReduceMetric a new base class to integrate all metrics into by @elronbandel in #1459
- Add multi document support and FRAMES benchmark by @elronbandel in #1477
New Contributors
- @eliyahabba made their first contribution in #1464
Unitxt 1.16.1
- Fix typing notation for python 3.8 by @elronbandel in #1453
- Instance_metric and apply_metric keep only one instance at a time in mem, at the expense of repeated passes over input stream (2 times for instance_metric, #metrics for apply_metric) by @dafnapension in #1448
- simplify class parameter listing on web page by @dafnapension in #1454
- Bring code coverage tests back to life by @elronbandel in #1455
- Fix coverage tests by @elronbandel in #1456
- make demos_pool a local var rather than a separate stream by @dafnapension in #1436
- Adding upper case and last non empty line processor by @antonpibm in #1458
- performance by bluebench by @dafnapension in #1457
- Add UNITXT_MOCK_INFERENCE_MODE environment variable to performance workflow by @elronbandel in #1461
- remove redundant lines from performance.yml by @dafnapension in #1462
- Benjams/add bioasq miniwiki datasets by @BenjSz in #1460
- Add SocialIQA dataset by @elronbandel in #1468
- Add parallelization to RITS inference by @arielge in #1441
- Fix the type handling for tasks to support string types by @elronbandel in #1470
1.16.0
What's Changed
Usability
- Add error message when saving artifacts that got changed by @elronbandel in #1417
- A simple way to create and evaluate given a 'task' in the catalog and python data structure by @yoavkatz in #1413
- Evaluation results class for easier access to results by @elronbandel in #1326
- Eval Assist integration by @martinscooper in #1409
Documentation
- Update to new logo by @elronbandel in #1427
- Indentation within docstrings to improve appearance on web pages, on the way - eliminating two red lines from "make docs-server" by @dafnapension in #1429
- Add catalog search with tags filtering by @elronbandel in #1430
- Update catalog search engine by @elronbandel in #1431
- Add custom titles to catalog items by @elronbandel in #1432
- Change card to dataset in the catalog search tags by @elronbandel in #1433
- Updated documentation to show use of installed version and chat api by @yoavkatz in #1435
- Fix documentation for task registration example by @Etelis in #1443
Bug Fixes
- fix mistral format used in llmaj (when not using chat_api) by @lilacheden in #1425
- Fix LMMSEval Inference Engine to work with chat api and fix examples by @elronbandel in #1440
- metadata is set only once in recipe by @dafnapension in #1437
- verify only fresh artifacts are fetched by @dafnapension in #1444
- add data_classification_policy_to_clapnq by @BenjSz in #1451
CI/CD
- eliminate exceeding line_limit errors, and many red lines from "make docs-server" by @dafnapension in #1434
Full Changelog: 1.15.10...1.16.0
1.15.10
What's Changed
- Fix arenahard bluebench template by @perlitz in #1405
- Fixed formal types of infer() and also added runtime check by @yoavkatz in #1406
- not using "score" as metric main_score by @lilacheden in #1407
- Fix model strings for Llama 3 on Together AI by @yifanmai in #1411
- Adjust binary llmaj to new engines and add rits support by @lilacheden in #1408
- Granite Guardian RAG metrics by @arielge in #1393
- Solved many red lines in 'make docs-server' by @dafnapension in #1418
- Fix artifact dict assignment bug by @elronbandel in #1419
- Remove top level imports from guerdian metric (as it adds dependencis to unitxt) by @elronbandel in #1420
- Make types compatible with python 3.8 by @elronbandel in #1423
- Benjams/loaders fix separator by @BenjSz in #1424
- Update version to 1.15.10 by @elronbandel in #1426
Full Changelog: 1.15.9...1.15.10