Releases: embeddings-benchmark/mteb
1.34.0
1.34.0 (2025-02-04)
Feature
- feat: Add new benchmark BEIR-NL (#1909)
- BEIR-NL datasets
- BEIR-NL added to benchmarks
- BEIR-NL annotations_creators changed to derived
- BEIR-NL sample_creation clarified
- Update mteb/tasks/Retrieval/nld/MMARCONLRetrieval.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- Update mteb/tasks/Retrieval/nld/FEVERNLRetrieval.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- Update mteb/tasks/Retrieval/nld/ClimateFEVERNLRetrieval.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- descriptions of models are changed to include BEIR-NL
- dates for BEIR-NL fixed
- more metadata annotations for BEIR-NL
Co-authored-by: Kenneth Enevoldsen <[email protected]> (de8f384)
Unknown
- Update tasks table (d162645)
- Merge branch 'main' of https://github.com/embeddings-benchmark/mteb (3036c05)
1.33.1
1.33.0
1.33.0 (2025-02-04)
Feature
- feat: Merge MIEB into main 🎉 (#1944)
- mieb ZeroshotClassification
- mieb docs
- mieb implementation demo
- model meta; abstask column names; linear probe clf
- model meta; abstask column names; linear probe clf
- fix: update naming as candidate_labels
- Update README.md
- Update README.md
- i2tretrieval
- test load data ignore i2tretrieval
- [MIEB] Add image clustering (#1088)
- make lint
- wip
- add TinyImageNet and run
- type hints
- add accuracy
- lint
- remove unused & fix typos
- T2I Retrieval
- Any2AnyRetrieval
- fix tests from merge
- [MIEB] Add image text pair classification and tests (#1099)
- add ImageTextPairClassification abstask and evaluator
- dataset transform into sequence of images for each sample
- fix processing logic; list of list images compatibility
- lint and docstrings
- make lint
- fix failing tests in TaskMetadata
- add tests for mieb
- skip gated repo
Co-authored-by: gowitheflow-1998 <[email protected]>
- [MIEB] Add image classification and zero shot classification tasks (#1101)
- fix task metadata
- use overrideable column names
- add CIFAR datasets
- add caltech101 dataset
- add FGVC aircraft dataset
- add food 101 dataset
- add OxfordPets dataset
- remove comments
- correct cifar100 path
- update cifar100 classification results
- cifar zero shot results
- add caltech101 zero shot
- matching CLIP paper implementation
- add aircraft and food zero shot
- add oxford pets zero shot
- [MIEB] Add CIFAR clustering (#1104)
add CIFAR clustering
- [MIEB] Add more image classification and zero shot classification datasets (#1103)
- update category to i2t
- add MNIST linear probe and zero shot
- add FER2013 linear probe and zero shot
- add stanford cars linear probe and zero shot
- add birdsnap linear probe and zero shot
- add eurosat linear probe and zero shot
- lint
- correct eurosat zero shot labels
- add abstask for image multilabel and voc2007
- make lint
- [MIEB] Add more image classification and zero shot datasets (#1105)
- add STL10 linear probe and zero shot
- add RESISC45 linear probe and zero shot
- add Describable textures linear probe and zero shot
- fix spacing lint
- add SUN397 linear probe and zero shot
- correct SUN397 zero shot captions
- add baai bge vista
- add e5-v
- linting
- memory issues for image linear probe & zeroshot
- kknn linear probe arguments
- del comments
- Add some classification and ZeroShot classification tasks (#1107)
- Add Country211 classification task
- Add imagenet1k classification task
- Add UCF101 classification task
- Add PatchCamelyon Classification task
- Add GTSRB classification task
- Add GTSRB Zero Shot Classification
- Add country211 zero shot classification
- Add results for classification tasks
- Add zero shot classification tasks
- Add PatchCamelyon tasks and results
- Add linting
- Add results and fix prompts for zero shot
- Add results
- Add results and linting
- fix dependency & clip mock test
- [MIEB] Add jina clip (#1120)
- add jina clip and mscoco i2t and t2i results
- make lint
- [MIEB] Update mieb with the main branch and some fixes (#1126)
- fix instruction retrieval (#1072)
- fix instruction retrieval
- fix test
- add points
- make nested results
- add test
- skip instruction test
- fix instruction passes
- fix unions
- move do_length_ablation
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- Update points table
- fix: fix bug-causing spelling error in function name of e5-mistral-instruct (#1106)
found bug
- 1.12.85
Automatically generated by python-semantic-release
- fix: MultilingualSentimentClassification (#1109)
- Update points table
- fix: Avoid spaces in dataset name for CQADupstack and ignore speed tasks
- 1.12.86
Automatically generated by python-semantic-release
- fix: Ensure that MLSUMClusteringP2P.v2 use the fast implementation as was intended (#1112)
- fix: Ensure that MLSUMClusteringP2P.v2 use the fast implementation as was intended
- fix: fixed formatting for cli
- docs: improve searchability in the advanced usage documentation
- 1.12.87
Automatically generated by python-semantic-release
- docs: improve searchability in the advanced usage documentation (#1113)
- docs: improve searchability in the advanced usage documentation
- docs: update based on corrections
- fix: export type for mteb create_meta (#1114)
- fix export type
- fix dataset version too
- 1.12.88
Automatically generated by python-semantic-release
- fix: Simplify models implementations (#1085)
- Merge
- Adapt
- Simplify
- Check for rev again
- Rmv cmmnt
- Simplify
- simplify
- Rmv comment
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- Use logging; change try except; add info
- Lint
- Rmv results
- Update rev
- format
- Simplify models; Allow instructions
- Jobs
- Fix merge
- Format
- Adapt models
- fix: ensure that e5 ignores the NQ
- format
Co-authored-by: Kenneth Enevoldsen <[email protected]>
- 1.12.89
Automatically generated by python-semantic-release
- fix: nomic models using prefix correctly (#1125)
- fix: nomic models using prefix correctly
- chore: remove comment
- fix: handling in case not torch tensor
- Fix typo
Co-authored-by: Niklas Muennighoff <[email protected]>
- 1.12.90
Automatically generated by python-semantic-release
- refactor vista model wrapper to contain lib import
- python 38 type hints
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anpalmak2003 <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: Zach Nussbaum <[email protected]>
Co-authored-by: chenghao xiao <[email protected]>
- image memory issues for all retrieval Abstasks
- Add CLEVR and SciMMIR Image-Text Understanding tasks (#1127)
- Add CLEVR and SciMMIR
- Update metadata
- remove useless comment
- Add linting
- fix typo and tests
- Add CLEVR count task
- add linting
- add fashion200k & fashionIQ test passed
- clip text max seq truncation
- add WebQA, NIGHTS, OVEN
- any2any retrieval chunk encoding
- add nomic vision model; any2any topk bug
- add cv recall
- add InfoSeek; VisualNews
- [MIEB] Add Stanford Cars i2i Retrieval (#1147)
- wip
- add results
- make lint
- change back the order
- [MIEB] Add CUB200 i2i retrieval (#1154)
- add cub200 and results
- add skip_first_result
- skipped self and rerun results
- consolidate i2t and t2i to any2any
- remove abstask and evaluators
- remove references from test
- tu-add berlin sketch retrieval
- XM3600; XFlickr30kCO; multilingual
- wit multilingual retrieval t2i
- correct multilingual t2i meta
- meta
- add dinov2 model; 4 sizes
- cls evaluator channel bug fix
- add ALIGN model
- add FORBI2IRetrieval
- forb & tuberlin new revision
- disable tokenization parallelism
- add hateful meme retrieval i2tt2i
- add memotion retrieval t2ii2t
- add SciMMIR Retrieval i2tt2i
- ruff update
- Visual STS Abstask&evaluator
- add visual STS17
- add visual STS 12-16
- [mieb] Add blip and blip2 models, and ImageNetDog15Clustering task (#1226)
- wip: start adding BLIP models
- add other blip variants
- wip: add blip2_models.py
- make lint
- wip: implement blip2 wrapper
- feat: add blip2 models, still mismatched names
- fix: remove projections from image and text embeddings
- make lint
- wip: add coco BLIP2
- fix: BLIP2 better zero-shot classification without text_proj and vision_proj
- tidy blip2
- add imagenet-dog-15 dataset
- tidy and lint
- remove unused import
- add cluster_accuracy, ari and nmi to Image.ClusteringEvaluator
- add imagenet-10 clustering task
- add results for clip on ImageNet10Clustering and ImageNetDog15Clustering
- [mieb] add 3 compositionality evaluation tasks (#1229)
- linting & update unavailable dataset path
- add aro visual relation&attribution; sugarcrepe
- correct reference
- add SOPI2IRetrieval dataset/task (#1232)
- wip: start adding BLIP models
- add other blip variants
- wip: add blip2_models.py
- make lint
- wip: implement blip2 wrapper
- feat: add blip2 models, still mismatched names
- fix: remove projections from image and text embeddings
- make lint
- wip: add coco BLIP2
- fix: BLIP2 better zero-shot classification without text_proj and vision_proj
- tidy blip2
- add imagenet-dog-15 dataset
- tidy and lint
- remove unused import
- add cluster_accuracy, ari and nmi to Image.ClusteringEvaluator
- add imagenet-10 clustering task
- add SOPI2IRetrieval
- add results for clip on ImageNet10Clustering and ImageNetDog15Clustering
- add SOPI2IRetrieval results for clip 32
- add results for clip vit 32/SOPI2IRetrieval
- resolve conflict
- change reference
- Image text pair cls (#1233)
- fix ImageTextPair dataloading for large datasets; more compositionality evaluation datasets
- fix meta data
- fix validate points
Co-authored-by: Isaac Chung <[email protected]>
- Add RP2kI2IRetrieval and METI2IRetrieval (#1239)
- wip: start adding BLIP models
- add other blip variants
- wip: add blip2_models.py
- make lint
- wip: implement blip2 wrapper
- feat: add blip2 models, still mismatched names
- fix: remove projections from image and text embeddings
- make lint
- wip: add coco BLIP2
- fix: BLIP2 better zero-shot classification without text_proj and...
1.32.0
1.32.0 (2025-02-04)
Feature
- feat: add beir (#1933)
add beir (7ef3a90)
Unknown
- Updated links in MTEB(eng) and eng,classic (#1948) (3cf2bed)
- misc: add bgev1 models (#1928)
- add bgev1 models
- add bge-*-en
- fix naming (e16acf8)
- misc: add warning for save_suffix removal from AbsTask (#1940)
add warning for param removal (07c489d)
- Leaderboard: Acks (#1930)
Add acs (476afc7)
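The save_suffix entry above describes warning users before a parameter is removed. A minimal sketch of that general pattern follows. It is not mteb's code: the class name `SketchTask`, the helper `build_with_legacy_argument`, the warning text, and the choice to silently drop the argument are all assumptions made for illustration.

```python
import warnings


class SketchTask:
    """Hypothetical stand-in for a task class; this is not mteb's AbsTask."""

    def __init__(self, **kwargs):
        # Assumed behavior: still accept the retired argument, warn, then drop it.
        if "save_suffix" in kwargs:
            warnings.warn(
                "`save_suffix` is deprecated and is ignored.",
                DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller's line
            )
            kwargs.pop("save_suffix")
        self.options = kwargs


def build_with_legacy_argument():
    """Construct the task with the retired argument and record the warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        task = SketchTask(save_suffix="_v2", seed=42)
    return task, caught
```

Calling `build_with_legacy_argument()` returns the task together with the list of recorded warnings, which makes the behavior easy to check in a unit test.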
1.31.8
1.31.8 (2025-02-01)
Documentation
- docs: Updated citation for mteb(scandinavian) (#1914)
fix: Updated citation for mteb(scandinavian) (f3526fc)
Fix
- fix: Add datasets in CodeRAG-Bench (#1595)
- add three out of four datasets in CodeRAG-Bench
- add verified CodeRAGStackoverflowPostsRetrieval dataset
- clean up code and make some comments
- fixed lint errors
- addressed comments about code-rag datasets: fixed grammar and remove unnecessary code and loop
- roll back files which is not supposed to change
- fixed the comments in split_by_first_newline() and make the methods private by adding an underscore prefix
- refactor to use common args
- update task descriptions
- add entry in benchmarks
- correct the alphanumeric order for the dataset
- add in tasks.md
- add in tasks.md
- update task metadata
- update importing path
- fix lint errors
- correct CodeRAG task metadata description field and id for stackoverflow-posts
- fix error in test
Co-authored-by: Isaac Chung <[email protected]> (9c762da)
Unknown
- Update tasks table (57db0f9)
1.31.7
1.31.7 (2025-02-01)
Documentation
- docs: Add sort to domains for task metadata (#1922)
Tests currently go into an infinite loop. This should prevent that. (6f673ba)
Fix
Unknown
- Update tasks table (14616dc)
- Update tasks table (e932dfc)
- Update tasks table (6072eae)
- Update tasks table (2b95d66)
- Update tasks table (e344a2e)
- Update tasks table (597b8fc)
- Update tasks table (a420249)
- Update tasks table (4be5352)
- Update tasks table (7474c97)
- Update tasks table (9146cc3)
- Update tasks table (8cdb25a)
- Update tasks table (4294389)
- Update tasks table (c275b10)
- Update tasks table (0ae0417)
- Update tasks table (974ff3c)
- Update tasks table (0cd396e)
- Update tasks table (de3a1f9)
- Update tasks table (996c522)
- Update tasks table (2b5f320)
- Update tasks table (e183458)
- Update tasks table (df3ef70)
- Update tasks table (f42d5d0)
- Update tasks table (0ac5bf2)
- Update tasks table (52c000d)
- Update tasks table (0e8a539)
- Update tasks table (bf3256a)
- Update tasks table (53f4e2e)
- Update tasks table (ea6c1a2)
- Update tasks table (dafbb80)
- Update tasks table (745e2e6)
- Update tasks table (d6ff9d0)
- Update tasks table (e5ae84f)
- Update tasks table (c72a4ba)
- Update tasks table (471ea4c)
- Update tasks table (887ebf2)
- Update tasks table (54d1bd1)
- Update tasks table (f1ea61a)
- Update tasks table (6cb089f)
- Update tasks table (6d051da)
- Update tasks table (2756d67)
- Update tasks table (e823bd7)
- Update tasks table (c3ea285)
- Update tasks table (a9be716)
- Update tasks table (c01563d)
- Update tasks table (d57f988)
- Update tasks table (2850833)
- Update tasks table (13fd52e)
- Update tasks table (ff4e7c6)
- Update tasks table (26ffe3a)
- Update tasks table (b61de5d)
- Update tasks table (5c2cbfc)
- Update tasks table (2e34cc7)
- Update tasks table (96f3aff)
- Update tasks table (d9ba681)
- Update tasks table (9641319)
- Update tasks table (ad1deff)
- Update tasks table (1f7971f)
- Update tasks table (88a2fe1)
- Update tasks table (03b2380)
- Update tasks table (d9c9b9e)
- Update tasks table (635ed80)
- Update tasks table (f70a994)
- Update tasks table (a6c2841)
- Update tasks table (37ef436)
- Update tasks table (2b4a467)
- Update tasks table (3231736)
- Update tasks table (d2e1361)
- Update tasks table (7258174)
- Update tasks table (2cb0c3a)
- Update tasks table (6c0070a)
- Update tasks table (4b88d1c)
- Update tasks table (42bea66)
- Update tasks table ([`e...
1.31.6
1.31.6 (2025-01-30)
Fix
- fix: Filling missing metadata for leaderboard release (#1895)
- Update ArxivClusteringS2S.py
- fill some metadata for retrieval
- fill in the rest of missing metadata
- fix metadata
- fix climatefever metadata
- fix: Added CQADupstack annotations
- removed annotation for non-existent task
- format
- Added financial to other financial dataset
- Moved ArguAna annotation to derivate datasets
Co-authored-by: Kenneth Enevoldsen <[email protected]> (938e90f)
Unknown
- Update tasks table (12ad5bd)
- Update tasks table (9076213)
- Update tasks table (4bb4ec6)
- Update tasks table (d510ddb)
- Update tasks table (e35c8dd)
- Update tasks table (9a6275e)
- Update tasks table (d9ab239)
- Update tasks table (21b60f5)
- Update tasks table (c46cb8b)
- Update tasks table (0bbc4c7)
- Update tasks table (3123d1c)
- Update tasks table (f7438b8)
- Update tasks table (51faf65)
- Update tasks table (1b76261)
- Update tasks table (67f8a79)
- Update tasks table (933f4af)
- Update tasks table (599849b)
- Update tasks table (ff4ae8d)
- Update tasks table (780a7d3)
- Update tasks table (c34ef64)
- Update tasks table (b23597c)
- Update tasks table (1030888)
- Update tasks table (913112a)
- Update tasks table (25a6f17)
- Update tasks table (e07ffe8)
- Update tasks table (b78525d)
- Update tasks table (6989fd5)
- Update tasks table (b7e412d)
- Update tasks table (2e817b0)
- Update tasks table (28ad172)
- Update tasks table (2850a97)
- Update tasks table (77681bf)
- Adding a banner to the new MMTEB leaderboard (#1908)
- Adding a banner to the new MMTEB leaderboard
- linting
- Update mteb/leaderboard/app.py
Co-authored-by: Isaac Chung <[email protected]>
- adding reference to mteb arena
Co-authored-by: Isaac Chung <[email protected]> (d0bb5b9)
- Update tasks table (f258cfc)
- Update tasks table (6cc0560)
- Update tasks table (7996458)
- Docs: update docs according to current state (#1870)
- update docs
- Apply suggestions from code review
Co-authored-by: Isaac Chung <[email protected]>
- update readme
- Update README.md
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]> (7e5d6c8)
- Update tasks table (0a59704)
- Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)
- Add Summary Retrieval Task
- Add FaMTEBClassification
- Add FaMTEBClustering
- Add FaMTEBPairClassification
- Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS
- Add FaMTEBSummaryRetrieval
- Add FaMTEB to benchmarks
- fix benchmark names
- temporary fix metadata
- Fix dataset revisions
- Update SummaryRetrievalEvaluator.py
- Update task files
- Update task files
- add data domain and subtask description
- Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval
- Update AbsTaskSummaryRetrieval
- Add mock task
- Update AbsTaskSummaryRetrieval
- Update AbsTaskSummaryRetrieval
- make lint
- Refactor SummaryRetrieval to subclass BitextMining
- Add aggregated datasets
Co-authored-by: mehran <[email protected]>
Co-authored-by: e.zeinivand <[email protected]>
Co-authored-by: Erfun76 <[email protected]> (f3404b4)
1.31.5
1.31.5 (2025-01-29)
Fix
- fix: Limited plotly version to be less than 6.0.0 (#1902)
Limited plotly version to be less than 6.0.0 (cec0ed4)
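The fix above caps plotly below 6.0.0. As a rough illustration of what such an upper bound means, the sketch below compares plain dotted version strings. The function names `parse_version` and `below_upper_bound` are hypothetical, and the parsing only handles purely numeric releases such as "5.24.1"; pre-release or local version tags are out of scope.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain dotted release string such as '5.24.1' into (5, 24, 1)."""
    return tuple(int(part) for part in version.split("."))


def below_upper_bound(installed: str, upper: str = "6.0.0") -> bool:
    """Return True when `installed` is strictly lower than `upper`."""
    # Tuples compare element by element, so (5, 24, 1) < (6, 0, 0).
    return parse_version(installed) < parse_version(upper)
```

Under these assumptions, `below_upper_bound("5.24.1")` is true while `below_upper_bound("6.0.0")` is false, matching a strict "less than 6.0.0" constraint.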
Unknown
- Update tasks table (42c175f)
- Update tasks table (a5d1538)
- Update tasks table (ef929f8)
- Update tasks table (d6deab1)
- Update tasks table (1c84c1c)
- Update tasks table (cc1e899)
- update stella/jasper metainfo (#1896)
update stella meta (976bdd5)
1.31.4
1.31.4 (2025-01-29)
Fix
- fix: Allow aggregated tasks within benchmarks (#1771)
- fix: Allow aggregated tasks within benchmarks
Fixes #1231
- feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problem:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# previously this was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# filtering by English returned every subset that contains English:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# however, it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
- format
- remove "en-ext" from AmazonCounterfactualClassification
- fixed mteb(deu)
- fix: simplify in a few areas
- wip
- tmp
- sav
- Allow aggregated tasks within benchmarks
Fixes #1231
- ensure correct formatting of eval_langs
- ignore aggregate dataset
- clean up dummy cases
- add to mteb(eng, classic)
- format
- clean up
- Allow aggregated tasks within benchmarks
Fixes #1231
- added fixes from comments
- fix merge
- format
- Updated task type
- Added minor fix for dummy tasks (8fb59a4)
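The task filtering change in this release distinguishes between subsets that merely contain a requested language and subsets that contain only requested languages. The sketch below illustrates that difference with plain Python. It is not mteb's implementation: the `SUBSET_LANGUAGES` mapping, its language codes, and the `filter_subsets` helper are hypothetical and only loosely mirror the STS22 example above.

```python
# Hypothetical subset-to-languages mapping, loosely modeled on the STS22 example.
SUBSET_LANGUAGES = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}


def filter_subsets(subset_languages, languages, exclusive=False):
    """Select subsets by language.

    exclusive=False keeps a subset if it contains ANY requested language.
    exclusive=True keeps a subset only if ALL its languages were requested.
    """
    wanted = set(languages)
    selected = []
    for subset, langs in subset_languages.items():
        if exclusive:
            keep = set(langs) <= wanted  # every language of the subset is requested
        else:
            keep = bool(set(langs) & wanted)  # at least one language overlaps
        if keep:
            selected.append(subset)
    return selected
```

With this toy mapping, `filter_subsets(SUBSET_LANGUAGES, ["eng"])` keeps the cross-lingual subsets as well, while passing `exclusive=True` keeps only `"en"`, which is the behavior the fix aims for in English-only benchmarks.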
Unknown
- Update tasks table (3ee0785)
- Update tasks table (02f8ad5)
- Update tasks table (c77c82c)
- Update tasks table (e8b8ac0)
- Update tasks table (50f305f)
- Update tasks table (2689cb8)
- Update tasks table (24d5373)
- Update tasks table (e487eff)
- Update tasks table (8bc101f)
- Update tasks table (cebf5b6)
- Update tasks table (1ead72f)
- Update tasks table (d939627)