
Releases: embeddings-benchmark/mteb

1.34.0

04 Feb 16:56

1.34.0 (2025-02-04)

Feature

  • feat: Add new benchmark BEIR-NL (#1909)

  • BEIR-NL datasets

  • BEIR-NL added to benchmarks

  • BEIR-NL annotations_creators changed to derived

  • BEIR-NL sample_creation clarified

  • Update mteb/tasks/Retrieval/nld/MMARCONLRetrieval.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update mteb/tasks/Retrieval/nld/FEVERNLRetrieval.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update mteb/tasks/Retrieval/nld/ClimateFEVERNLRetrieval.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • descriptions of models are changed to include BEIR-NL

  • dates for BEIR-NL fixed

  • more metadata annotations for BEIR-NL


Co-authored-by: Kenneth Enevoldsen <[email protected]> (de8f384)

Unknown

1.33.1

04 Feb 15:19

1.33.1 (2025-02-04)

Fix

1.33.0

04 Feb 12:41

1.33.0 (2025-02-04)

Feature

  • feat: Merge MIEB into main 🎉 (#1944)

  • mieb ZeroshotClassification

  • mieb docs

  • mieb implementation demo

  • model meta; abstask column names; linear probe clf

  • model meta; abstask column names; linear probe clf

  • fix: update naming as candidate_labels

  • Update README.md

  • Update README.md

  • i2tretrieval

  • test load data ignore i2tretrieval

  • [MIEB] Add image clustering (#1088)

  • make lint

  • wip

  • add TinyImageNet and run

  • type hints

  • add accuracy

  • lint

  • remove unused & fix typos

  • T2I Retrieval

  • Any2AnyRetrieval

  • fix tests from merge

  • [MIEB] Add image text pair classification and tests (#1099)

  • add ImageTextPairClassification abstask and evaluator

  • dataset transform into sequence of images for each sample

  • fix processing logic; list of list images compatibility

  • lint and docstrings

  • make lint

  • fix failing tests in TaskMetadata

  • add tests for mieb

  • skip gated repo


Co-authored-by: gowitheflow-1998 <[email protected]>

  • [MIEB] Add image classification and zero shot classification tasks (#1101)
  • fix task metadata
  • use overrideable column names
  • add CIFAR datasets
  • add caltech101 dataset
  • add FGVC aircraft dataset
  • add food 101 dataset
  • add OxfordPets dataset
  • remove comments
  • correct cifar100 path
  • update cifar100 classification results
  • cifar zero shot results
  • add caltech101 zero shot
  • matching CLIP paper implementation
  • add aircraft and food zero shot
  • add oxford pets zero shot
  • [MIEB] Add CIFAR clustering (#1104)
    add CIFAR clustering
  • [MIEB] Add more image classification and zero shot classification datasets (#1103)
  • update category to i2t
  • add MNIST linear probe and zero shot
  • add FER2013 linear probe and zero shot
  • add stanford cars linear probe and zero shot
  • add birdsnap linear probe and zero shot
  • add eurosat linear probe and zero shot
  • lint
  • correct eurosat zero shot labels
  • add abstask for image multilabel and voc2007
  • make lint
  • [MIEB] Add more image classification and zero shot datasets (#1105)
  • add STL10 linear probe and zero shot
  • add RESISC45 linear probe and zero shot
  • add Describable textures linear probe and zero shot
  • fix spacing lint
  • add SUN397 linear probe and zero shot
  • correct SUN397 zero shot captions
  • add baai bge vista
  • add e5-v
  • linting
  • memory issues for image linear probe & zeroshot
  • kknn linear probe arguments
  • del comments
  • Add some classification and ZeroShot classification tasks (#1107)
  • Add Country211 classification task
  • Add imagenet1k classification task
  • Add UCF101 classification task
  • Add PatchCamelyon Classification task
  • Add GTSRB classification task
  • Add GTSRB Zero Shot Classification
  • Add country211 zero shot classification
  • Add results for classification tasks
  • Add zero shot classification tasks
  • Add PatchCamelyon tasks and results
  • Add linting
  • Add results and fix prompts for zero shot
  • Add results
  • Add results and linting
  • fix dependency & clip mock test
  • [MIEB] Add jina clip (#1120)
  • add jina clip and mscoco i2t and t2i results
  • make lint
  • [MIEB] Update mieb with the main branch and some fixes (#1126)
  • fix instruction retrieval (#1072)
  • fix instruction retrieval
  • fix test
  • add points
  • make nested results
  • add test
  • skip instruction test
  • fix instruction passes
  • fix unions
  • move do_length_ablation
    Co-authored-by: Kenneth Enevoldsen <[email protected]>

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • Update points table
  • fix: fix bug-causing spelling error in function name of e5-mistral-instruct (#1106)
    found bug
  • 1.12.85
    Automatically generated by python-semantic-release
  • fix: MultilingualSentimentClassification (#1109)
  • Update points table
  • fix: Avoid spaces in dataset name for CQADupstack and ignore speed tasks
  • 1.12.86
    Automatically generated by python-semantic-release
  • fix: Ensure that MLSUMClusteringP2P.v2 use the fast implementation as was intended (#1112)
  • fix: Ensure that MLSUMClusteringP2P.v2 use the fast implementation as was intended
  • fix: fixed formatting for cli
  • docs: improve searchability in the advanced usage documentation
  • 1.12.87
    Automatically generated by python-semantic-release
  • docs: improve searchability in the advanced usage documentation (#1113)
  • docs: improve searchability in the advanced usage documentation
  • docs: update based on corrections
  • fix: export type for mteb create_meta (#1114)
  • fix export type
  • fix dataset version too
  • 1.12.88
    Automatically generated by python-semantic-release
  • fix: Simplify models implementations (#1085)
  • Merge
  • Adapt
  • Simplify
  • Check for rev again
  • Rmv cmmnt
  • Simplify
  • simplify
  • Rmv comment
    Co-authored-by: Kenneth Enevoldsen <[email protected]>
  • Use logging; change try except; add info
  • Lint
  • Rmv results
  • Update rev
  • format
  • Simplify models; Allow instructions
  • Jobs
  • Fix merge
  • Format
  • Adapt models
  • fix: ensure that e5 ignores the NQ
  • format

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • 1.12.89
    Automatically generated by python-semantic-release
  • fix: nomic models using prefix correctly (#1125)
  • fix: nomic models using prefix correctly
  • chore: remove comment
  • fix: handling in case not torch tensor
  • Fix typo

Co-authored-by: Niklas Muennighoff <[email protected]>

  • 1.12.90
    Automatically generated by python-semantic-release
  • refactor vista model wrapper to contain lib import
  • Python 3.8 type hints

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: anpalmak2003 <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: Zach Nussbaum <[email protected]>
Co-authored-by: chenghao xiao <[email protected]>
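Several of the tasks above add zero-shot classification "matching CLIP paper implementation". In a CLIP-style setup, each candidate label is turned into a text prompt, and an image is assigned the label whose prompt embedding is most cosine-similar to the image embedding. A minimal sketch of that scoring step, assuming the embeddings are already computed; the helper names and the `"a photo of a {}."` template are illustrative assumptions, not mteb's API:

```python
import math


def build_prompts(labels: list[str], template: str = "a photo of a {}.") -> list[str]:
    """Turn each candidate label into a text prompt (the template is an assumption)."""
    return [template.format(label) for label in labels]


def _normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_predict(image_emb: list[float], prompt_embs: list[list[float]]) -> int:
    """Return the index of the prompt most cosine-similar to the image embedding."""
    image = _normalize(image_emb)
    scores = [
        sum(a * b for a, b in zip(image, _normalize(prompt)))
        for prompt in prompt_embs
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-d vectors standing in for real image/text encoder outputs.
labels = ["cat", "dog"]
prompts = build_prompts(labels)
prompt_embs = [[0.0, 1.0], [1.0, 0.1]]
predicted = labels[zero_shot_predict([1.0, 0.0], prompt_embs)]
```

A real evaluator would encode the prompts with the model's text tower and batch everything through a tensor library; pure Python lists are used here only to keep the sketch self-contained.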

  • image memory issues for all retrieval AbsTasks
  • Add CLEVR and SciMMIR Image-Text Understanding tasks (#1127)
  • Add CLEVR and SciMMIR
  • Update metadata
  • remove useless comment
  • Add linting
  • fix typo and tests
  • Add CLEVR count task
  • add linting
  • add fashion200k & fashionIQ test passed
  • clip text max seq truncation
  • add WebQA, NIGHTS, OVEN
  • any2any retrieval chunk encoding
  • add nomic vision model; any2any topk bug
  • add cv recall
  • add InfoSeek; VisualNews
  • [MIEB] Add Stanford Cars i2i Retrieval (#1147)
  • wip
  • add results
  • make lint
  • change back the order
  • [MIEB] Add CUB200 i2i retrieval (#1154)
  • add cub200 and results
  • add skip_first_result
  • skipped self and rerun results
  • consolidate i2t and t2i to any2any
  • remove abstask and evaluators
  • remove references from test
  • add TU-Berlin sketch retrieval
  • XM3600; XFlickr30kCO; multilingual
  • wit multilingual retrieval t2i
  • correct multilingual t2i meta
  • meta
  • add dinov2 model; 4 sizes
  • cls evaluator channel bug fix
  • add ALIGN model
  • add FORBI2IRetrieval
  • forb & tuberlin new revision
  • disable tokenization parallelism
  • add hateful meme retrieval i2t/t2i
  • add memotion retrieval t2i/i2t
  • add SciMMIR Retrieval i2t/t2i
  • ruff update
  • Visual STS AbsTask & evaluator
  • add visual STS17
  • add visual STS 12-16
  • [mieb] Add blip and blip2 models, and ImageNetDog15Clustering task (#1226)
  • wip: start adding BLIP models
  • add other blip variants
  • wip: add blip2_models.py
  • make lint
  • wip: implement blip2 wrapper
  • feat: add blip2 models, still mismatched names
  • fix: remove projections from image and text embeddings
  • make lint
  • wip: add coco BLIP2
  • fix: BLIP2 better zero-shot classification without text_proj and vision_proj
  • tidy blip2
  • add imagenet-dog-15 dataset
  • tidy and lint
  • remove unused import
  • add cluster_accuracy, ari and nmi to Image.ClusteringEvaluator
  • add imagenet-10 clustering task
  • add results for CLIP on ImageNet10Clustering and ImageNetDog15Clustering
  • [mieb] add 3 compositionality evaluation tasks (#1229)
  • linting & update unavailable dataset path
  • add aro visual relation&attribution; sugarcrepe
  • correct reference
  • add SOPI2IRetrieval dataset/task (#1232)
  • wip: start adding BLIP models
  • add other blip variants
  • wip: add blip2_models.py
  • make lint
  • wip: implement blip2 wrapper
  • feat: add blip2 models, still mismatched names
  • fix: remove projections from image and text embeddings
  • make lint
  • wip: add coco BLIP2
  • fix: BLIP2 better zero-shot classification without text_proj and vision_proj
  • tidy blip2
  • add imagenet-dog-15 dataset
  • tidy and lint
  • remove unused import
  • add cluster_accuracy, ari and nmi to Image.ClusteringEvaluator
  • add imagenet-10 clustering task
  • add SOPI2IRetrieval
  • add results for CLIP on ImageNet10Clustering and ImageNetDog15Clustering
  • add SOPI2IRetrieval results for clip 32
  • add results for clip vit 32/SOPI2IRetrieval
  • resolve conflict
  • change reference
  • Image text pair cls (#1233)
  • fix ImageTextPair dataloading for large datasets; more compositionality evaluation datasets
  • fix meta data
  • fix validate points

Co-authored-by: Isaac Chung <[email protected]>
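The image clustering evaluator above gains `cluster_accuracy` alongside ARI and NMI. Cluster accuracy is usually defined as the accuracy under the best one-to-one mapping from predicted cluster ids to true class labels. A minimal brute-force sketch of that definition; the function is hypothetical, not mteb's code, and practical implementations typically solve the matching with the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) instead of enumerating permutations:

```python
from collections import Counter
from itertools import permutations


def cluster_accuracy(labels_true: list, labels_pred: list) -> float:
    """Accuracy under the best one-to-one mapping from cluster ids to class labels.

    Brute force over permutations: O(k!) in the number of clusters/classes k,
    so only suitable for tiny examples.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labels_true and labels_pred must have the same length")
    if not labels_true:
        raise ValueError("need at least one sample")
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    k = max(len(classes), len(clusters))
    # Pad with None so surplus clusters (or classes) can stay unmatched.
    classes += [None] * (k - len(classes))
    clusters += [None] * (k - len(clusters))
    # counts[(cluster, cls)] = number of samples in that cluster with that class;
    # Counter returns 0 for pairs that never occur (including None padding).
    counts = Counter(zip(labels_pred, labels_true))
    best = max(
        sum(counts[(cluster, cls)] for cluster, cls in zip(clusters, perm))
        for perm in permutations(classes)
    )
    return best / len(labels_true)
```

Because the mapping is chosen to maximize agreement, relabeling the clusters (for example swapping ids 0 and 1) does not change the score.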

  • Add RP2kI2IRetrieval and METI2IRetrieval (#1239)
  • wip: start adding BLIP models
  • add other blip variants
  • wip: add blip2_models.py
  • make lint
  • wip: implement blip2 wrapper
  • feat: add blip2 models, still mismatched names
  • fix: remove projections from image and text embeddings
  • make lint
  • wip: add coco BLIP2
  • fix: BLIP2 better zero-shot classification without text_proj and...

1.32.0

04 Feb 10:41

1.32.0 (2025-02-04)

Feature

add beir (7ef3a90)

Unknown

  • Updated links in MTEB(eng) and eng,classic (#1948) (3cf2bed)

  • misc: add bgev1 models (#1928)

  • add bgev1 models

  • add bge-*-en

  • fix naming (e16acf8)

  • misc: add warning for save_suffix removal from AbsTask (#1940)

add warning for param removal (07c489d)

  • Leaderboard: Acks (#1930)

Add acks (476afc7)
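The `save_suffix` change above (#1940) adds a warning when a removed parameter is still passed. One common pattern for this is to accept the old keyword, warn, and ignore it, so existing callers keep working. A minimal sketch; `ExampleTask` is hypothetical, and whether mteb accepted `save_suffix` in a constructor or in a method is an assumption here:

```python
import warnings


class ExampleTask:
    """Hypothetical task class; illustrates accepting and ignoring a removed argument."""

    def __init__(self, name: str, **kwargs) -> None:
        self.name = name
        if "save_suffix" in kwargs:
            # The argument no longer has any effect: drop it and warn the caller.
            kwargs.pop("save_suffix")
            # UserWarning (the default category) is displayed by default;
            # stacklevel=2 points the warning at the caller's line.
            warnings.warn(
                "`save_suffix` has been removed and is ignored.",
                stacklevel=2,
            )
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
```

Unknown keywords still raise `TypeError`, so only the one retired parameter is tolerated.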

1.31.8

01 Feb 16:03

1.31.8 (2025-02-01)

Documentation

  • docs: Updated citation for mteb(scandinavian) (#1914)

fix: Updated citation for mteb(scandinavian) (f3526fc)

Fix

  • fix: Add datasets in CodeRAG-Bench (#1595)

  • add three out of four datasets in CodeRAG-Bench

  • add verified CodeRAGStackoverflowPostsRetrieval dataset

  • clean up code and make some comments

  • fixed lint errors

  • addressed comments about code-rag datasets: fixed grammar and removed unnecessary code and loop

  • roll back files that were not supposed to change

  • fixed the comments in split_by_first_newline() and made the methods private by adding an underscore prefix

  • refactor to use common args

  • update task descriptions

  • add entry in benchmarks

  • correct the alphanumeric order for the dataset

  • add in tasks.md

  • add in tasks.md

  • update task metadata

  • update importing path

  • fix lint errors

  • correct CodeRAG task metadata description field and id for stackoverflow-posts

  • fix error in test


Co-authored-by: Isaac Chung <[email protected]> (9c762da)

Unknown

1.31.7

01 Feb 15:31

1.31.7 (2025-02-01)

Documentation

  • docs: Add sort to domains for task metadata (#1922)

Tests currently go into an infinite loop. This should prevent that. (6f673ba)

Fix

Unknown


1.31.6

30 Jan 22:22

1.31.6 (2025-01-30)

Fix

  • fix: Filling missing metadata for leaderboard release (#1895)

  • Update ArxivClusteringS2S.py

  • fill some metadata for retrieval

  • fill in the rest of missing metadata

  • fix metadata

  • fix climatefever metadata

  • fix: Added CQADupstack annotations

  • removed annotation for non-existent task

  • format

  • Added financial domain to the other financial dataset

  • Moved ArguAna annotation to derived datasets


Co-authored-by: Kenneth Enevoldsen <[email protected]> (938e90f)

Unknown

Co-authored-by: Isaac Chung <[email protected]>

  • adding reference to mteb arena

Co-authored-by: Isaac Chung <[email protected]> (d0bb5b9)

  • Update tasks table (f258cfc)

  • Update tasks table (6cc0560)

  • Update tasks table (7996458)

  • Docs: update docs according to current state (#1870)

  • update docs

  • Apply suggestions from code review

Co-authored-by: Isaac Chung <[email protected]>

  • update readme

  • Update README.md

Co-authored-by: Isaac Chung <[email protected]>


Co-authored-by: Isaac Chung <[email protected]> (7e5d6c8)

  • Update tasks table (0a59704)

  • Feat: Add FaMTEB (Farsi/Persian Text Embedding Benchmark) (#1843)

  • Add Summary Retrieval Task

  • Add FaMTEBClassification

  • Add FaMTEBClustering

  • Add FaMTEBPairClassification

  • Add FaMTEBRetrieval and BEIRFA and FaMTEBSTS

  • Add FaMTEBSummaryRetrieval

  • Add FaMTEB to benchmarks

  • fix benchmark names

  • temporary fix metadata

  • Fix dataset revisions

  • Update SummaryRetrievalEvaluator.py

  • Update task files

  • Update task files

  • add data domain and subtask description

  • Update AbsTaskSummaryRetrieval and FaMTEBSummaryRetrieval

  • Update AbsTaskSummaryRetrieval

  • Add mock task

  • Update AbsTaskSummaryRetrieval

  • Update AbsTaskSummaryRetrieval

  • make lint

  • Refactor SummaryRetrieval to subclass BitextMining

  • Add aggregated datasets


Co-authored-by: mehran <[email protected]>
Co-authored-by: e.zeinivand <[email protected]>
Co-authored-by: Erfun76 <[email protected]> (f3404b4)
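Refactoring SummaryRetrieval to subclass BitextMining suggests that summary retrieval is scored like bitext mining: document i is paired with gold summary i, and a match counts when the summary nearest to the document (by cosine similarity) is its gold one. A minimal sketch of that matching accuracy, assuming precomputed embeddings; the function names are hypothetical and this is not mteb's evaluator, which reports further metrics as well:

```python
import math


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def matching_accuracy(doc_embs: list[list[float]], summary_embs: list[list[float]]) -> float:
    """Fraction of documents whose most similar summary is their own (index i <-> i)."""
    if len(doc_embs) != len(summary_embs) or not doc_embs:
        raise ValueError("need the same, non-zero number of documents and summaries")
    correct = 0
    for i, doc in enumerate(doc_embs):
        # Score every candidate summary, then take the nearest one.
        scores = [_cosine(doc, summary) for summary in summary_embs]
        nearest = max(range(len(scores)), key=scores.__getitem__)
        correct += nearest == i
    return correct / len(doc_embs)
```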

1.31.5

29 Jan 14:15

1.31.5 (2025-01-29)

Fix

  • fix: Limited plotly version to be less than 6.0.0 (#1902)

Limited plotly version to be less than 6.0.0 (cec0ed4)
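The fix constrains the dependency itself (a requirement along the lines of `plotly<6.0.0`). If one also wanted to detect an unsupported install at runtime, a minimal sketch could compare the installed major version against the bound. Both helpers are hypothetical; for general version comparison, the `packaging` library's `Version` class handles pre-releases and local versions more robustly than this string split:

```python
from importlib.metadata import PackageNotFoundError, version


def below_major(version_string: str, major_limit: int = 6) -> bool:
    """Return True if the leading (major) component is below major_limit."""
    major = int(version_string.split(".")[0])
    return major < major_limit


def plotly_is_supported() -> bool:
    """Check the installed plotly against the <6.0.0 bound; False if plotly is missing."""
    try:
        return below_major(version("plotly"))
    except PackageNotFoundError:
        return False
```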

Unknown

update stella meta (976bdd5)

1.31.4

29 Jan 11:29

1.31.4 (2025-01-29)

Fix

  • fix: Allow aggregated tasks within benchmarks (#1771)

  • fix: Allow aggregated tasks within benchmarks

Fixes #1231

  • feat: Update task filtering, fixing bug on MTEB
  • Updated task filtering adding exclusive_language_filter and hf_subset
  • fix bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was equivalent to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# language filtering kept every subset that includes English:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# equivalent to:
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using exclusive_language_filter (though not if there were multiple English subsets):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
  • format

  • remove "en-ext" from AmazonCounterfactualClassification

  • fixed mteb(deu)

  • fix: simplify in a few areas

  • wip

  • tmp

  • sav

  • Allow aggregated tasks within benchmarks
    Fixes #1231

  • ensure correct formatting of eval_langs

  • ignore aggregate dataset

  • clean up dummy cases

  • add to mteb(eng, classic)

  • format

  • clean up

  • Allow aggregated tasks within benchmarks
    Fixes #1231

  • added fixes from comments

  • fix merge

  • format

  • Updated task type

  • Added minor fix for dummy tasks (8fb59a4)
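The STS22 example above contrasts two filtering semantics: a plain language filter keeps any subset that contains a requested language, while `exclusive_language_filter` keeps only subsets whose languages are all requested. A minimal sketch of those two rules over a subset-to-languages mapping; `filter_subsets` and the shape of the mapping are assumptions for illustration, not mteb's implementation:

```python
def filter_subsets(
    eval_langs: dict[str, list[str]],
    languages: list[str],
    exclusive: bool = False,
) -> list[str]:
    """Select hf_subsets by language.

    exclusive=False keeps a subset if it contains ANY requested language;
    exclusive=True keeps it only if ALL of its languages were requested.
    """
    wanted = set(languages)
    kept = []
    for subset, langs in eval_langs.items():
        # Compare on the language part of codes like "eng-Latn".
        codes = {lang.split("-")[0] for lang in langs}
        keep = codes <= wanted if exclusive else bool(codes & wanted)
        if keep:
            kept.append(subset)
    return kept


# Illustrative STS22-like metadata (a few subsets only, for brevity).
sts22_like = {
    "en": ["eng-Latn"],
    "de": ["deu-Latn"],
    "de-en": ["deu-Latn", "eng-Latn"],
    "zh-en": ["cmn-Hans", "eng-Latn"],
}
```

With `["eng"]`, the non-exclusive rule keeps `en`, `de-en` and `zh-en`, while the exclusive rule keeps only `en`, mirroring the before/after behavior described above.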

Unknown

1.31.3

28 Jan 15:24

1.31.3 (2025-01-28)

Fix

  • fix: External results are preferred when only they have the needed splits (#1893)

join_revisions now prefers task_results where the scores are not empty (2a41730)
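A minimal sketch of the preference rule described above, assuming results from several sources are joined per task and the first candidate with non-empty scores wins. `ResultSketch` and `join_preferring_nonempty` are hypothetical; the real `join_revisions` also deals with model and dataset revisions, which this sketch omits:

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Hypothetical stand-in for a task result; not mteb's TaskResult class."""

    task_name: str
    source: str  # e.g. "local" or "external"
    scores: dict[str, list[float]] = field(default_factory=dict)


def join_preferring_nonempty(results: list[ResultSketch]) -> list[ResultSketch]:
    """Keep one result per task, preferring the first one whose scores are non-empty."""
    chosen: dict[str, ResultSketch] = {}
    for result in results:
        current = chosen.get(result.task_name)
        # Take the first candidate, but replace it if it is empty and this one is not.
        if current is None or (not current.scores and result.scores):
            chosen[result.task_name] = result
    return list(chosen.values())
```

Under this rule an external result is used only when it is the sole candidate with scores; if both sources have scores, the earlier one is kept.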