Releases: embeddings-benchmark/mteb

1.30.0

25 Jan 04:05

1.30.0 (2025-01-25)

Feature

  • feat: Integrating ChemTEB (#1708)

  • Add SMILES, AI Paraphrase and Inter-Source Paragraphs PairClassification Tasks

  • Add chemical subsets of NQ and HotpotQA datasets as Retrieval tasks

  • Add PubChem Synonyms PairClassification task

  • Update task init for previously added tasks

  • Add nomic-bert loader

  • Add a script to run the evaluation pipeline for chemical-related tasks

  • Add 15 Wikipedia article classification tasks

  • Add PairClassification and BitextMining tasks for Coconut SMILES

  • Fix naming of some Classification and PairClassification tasks

  • Fix some classification tasks naming issues

  • Integrate WANDB with benchmarking script

  • Update .gitignore

  • Fix nomic_models.py issue with retrieval tasks, similar to issue #1115 in original repo

  • Add one chemical model and some SentenceTransformer models

  • Fix a naming issue for SentenceTransformer models

  • Add OpenAI, bge-m3 and matscibert models

  • Add PubChem SMILES Bitext Mining tasks

  • Change metric names to be more descriptive

  • Add English e5 and bge v1 models, all the sizes

  • Add two Wikipedia Clustering tasks

  • Add a try-except in evaluation script to skip faulty models during the benchmark.
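The skip-on-failure guard described above can be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual script: `run_benchmark` and `evaluate` are placeholder names, and the real error handling may differ.

```python
def run_benchmark(model_names, evaluate):
    """Evaluate each model, skipping any whose evaluation raises an error.

    `evaluate` stands in for whatever function runs a single model; it is an
    assumption for illustration, not part of the mteb API.
    """
    results, skipped = {}, []
    for name in model_names:
        try:
            results[name] = evaluate(name)
        except Exception as exc:
            # A single faulty model should not abort the whole benchmark run.
            print(f"Skipping {name}: {exc}")
            skipped.append(name)
    return results, skipped
```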

  • Add bge v1.5 models and clustering score extraction to json parser

  • Add Amazon Titan embedding models

  • Add Cohere Bedrock models

  • Add two SDS Classification tasks

  • Add SDS Classification tasks to classification init and chem_eval

  • Add a retrieval dataset, update dataset names and revisions

  • Update revision for the CoconutRetrieval dataset: handle duplicate SMILES (documents)

  • Update CoconutSMILES2FormulaPC task

  • Change CoconutRetrieval dataset to a smaller one

  • Update some models

  • Integrate models added in ChemTEB (such as amazon, cohere bedrock and nomic bert) with latest modeling format in mteb.
  • Update the metadata for the mentioned models
  • Fix a typo
    the open_weights argument was repeated

  • Update ChemTEB tasks

  • Rename some tasks for better readability.
  • Merge some BitextMining and PairClassification tasks into a single task with subsets (PubChemSMILESBitextMining and PubChemSMILESPC)
  • Add a new multilingual task (PubChemWikiPairClassification) consisting of 12 languages.
  • Update dataset paths, revisions and metadata for most tasks.
  • Add a Chemistry domain to TaskMetadata
  • Remove unnecessary files and tasks for MTEB

  • Update some ChemTEB tasks

  • Move PubChemSMILESBitextMining to eng folder
  • Add citations for tasks involving SDS, NQ, Hotpot, PubChem data
  • Update Clustering tasks category
  • Change main_score for PubChemAISentenceParaphrasePC
  • Create ChemTEB benchmark

  • Remove CoconutRetrieval

  • Update tasks and benchmarks tables with ChemTEB

  • Mention ChemTEB in readme

  • Fix some issues, update task metadata, lint

  • eval_langs fixed
  • Dataset path was fixed for two datasets
  • Metadata was completed for all tasks, mainly following fields: date, task_subtypes, dialect, sample_creation
  • ruff lint
  • rename nomic_bert_models.py to nomic_bert_model.py and update it.
  • Remove nomic_bert_model.py as it is now compatible with SentenceTransformer.

  • Remove WikipediaAIParagraphsParaphrasePC task due to being trivial.

  • Merge amazon_models and cohere_bedrock_models.py into bedrock_models.py

  • Remove unnecessary load_data for some tasks.

  • Update bedrock_models.py, openai_models.py and two dataset revisions

  • Text should be truncated for amazon text embedding models.
  • text-embedding-ada-002 returns null embeddings for some inputs with 8192 tokens.
  • Two datasets are updated, dropping very long samples (len > 99th percentile)
  • Add a layer of dynamic truncation for amazon models in bedrock_models.py
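The dynamic truncation added for the Amazon models can be sketched as below. This is a minimal sketch under stated assumptions: `count_tokens` is a stand-in for the provider's tokenizer, and the actual logic in bedrock_models.py may differ.

```python
def truncate_dynamically(text, count_tokens, max_tokens=8192, shrink=0.9):
    """Repeatedly shorten `text` until it fits the model's token limit.

    Shrinking by a fixed ratio avoids guessing an exact characters-per-token
    rate, at the cost of a few extra tokenizer calls for very long inputs.
    """
    while text and count_tokens(text) > max_tokens:
        text = text[: int(len(text) * shrink)]
    return text
```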

  • Replace metadata_dict with self.metadata in PubChemSMILESPC.py

  • fix model meta for bedrock models

  • Add reference comment to original Cohere API implementation (4d66434)

Unknown

1.29.16

22 Jan 12:11

1.29.16 (2025-01-22)

Fix

  • fix: Added correct training data annotation to LENS (#1859)

Added correct training data annotation to LENS (e775436)

1.29.15

22 Jan 11:50

1.29.15 (2025-01-22)

Fix

  • fix: Adding missing model meta (#1856)

  • Added CDE models

  • Added bge-en-icl

  • Updated CDE to bge_full_data

  • Fixed public_training_data flag type to include boolean, as this is how all models are annotated

  • Added public training data link instead of bool to CDE and BGE

  • Added GME models

  • Changed Torch to PyTorch

  • Added metadata on LENS models

  • Added ember_v1

  • Added metadata for amazon titan

  • Removed GME implementation (692bd26)
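The public_training_data change above amounts to widening the type of one metadata field so it can hold a link (str), a bare flag (bool), or nothing. A simplified sketch, assuming a dataclass stand-in; the real mteb ModelMeta has many more fields:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ModelMetaSketch:
    """Illustrative stand-in for mteb's ModelMeta (not the real class).

    public_training_data accepts a URL to the training data (str), a plain
    availability flag (bool), or None when the information is unknown.
    """

    name: str
    public_training_data: Union[str, bool, None] = None
```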

1.29.14

22 Jan 09:41

1.29.14 (2025-01-22)

Fix

  • fix: Fix zeta alpha mistral (#1736)

  • fix zeta alpha mistral

  • update use_instructions

  • update training datasets

  • Update mteb/models/e5_instruct.py

Co-authored-by: Kenneth Enevoldsen <[email protected]>

  • update float

  • Update mteb/models/e5_instruct.py


Co-authored-by: Kenneth Enevoldsen <[email protected]> (4985da9)

  • fix: Hotfixed public_training_data type annotation (#1857)

Fixed public_training_data flag type to include boolean, as this is how all models are annotated (4bd7328)

Unknown

  • Add more annotations (#1833)

  • apply additions from #1794

  • add annotations for rumodels

  • add nomic training data

  • fix metadata

  • update rest of model meta

  • fix bge reranker (12ed9c5)

1.29.13

22 Jan 07:12

1.29.13 (2025-01-22)

Fix

  • fix: Fixed leaderboard search bar (#1852)

Fixed leaderboard search bar (fe33061)

1.29.12

21 Jan 11:37

1.29.12 (2025-01-21)

Fix

  • fix: Leaderboard Refinements (#1849)

  • Added better descriptions to benchmarks and removed beta tags

  • Fixed zero-shot filtering on app loading

  • Added zero-shot definition in an accordion

  • NaN values are now filled with blank

  • Added type hints to filter_models (a8cc887)

1.29.11

21 Jan 10:54

1.29.11 (2025-01-21)

Fix

  • fix: Add reported annotation and re-added public_training_data (#1846)

  • fix: Add additional dataset annotations

  • fix: readded public training data

  • update voyage annotations (a7a8144)

1.29.10

20 Jan 06:08

1.29.10 (2025-01-20)

Fix

  • fix: Remove default params, public_training_data and memory usage in ModelMeta (#1794)

  • fix: Leaderboard: K instead of M
    Fixes #1752

  • format

  • fixed existing annotations to refer to task name instead of hf dataset

  • added annotation to nvidia

  • added voyage

  • added uae annotations

  • Added stella annotations

  • sentence trf models

  • added salesforce and e5

  • jina

  • bge + model2vec

  • added llm2vec annotations

  • add jasper

  • format

  • format

  • Updated annotations and moved jina models

  • make model parameters required to be filled

  • fix tests

  • remove comments

  • remove model meta from test

  • fix model meta from split

  • fix: add even more training dataset annotations (#1793)

  • fix: update max tokens for OpenAI (#1772)
    update max tokens

  • ci: skip AfriSentiLID for now (#1785)

  • skip AfriSentiLID for now

  • skip relevant test case instead


Co-authored-by: Isaac Chung <[email protected]>

  • 1.28.7
    Automatically generated by python-semantic-release
  • ci: fix model loading test (#1775)
  • pass base branch into the make command as an arg
  • test a file that has custom wrapper
  • what about overview
  • just dont check overview
  • revert instance check
  • explicitly omit overview and init
  • remove test change
  • try on a lot of models
  • revert test model file

Co-authored-by: Isaac Chung <[email protected]>

  • feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
  • feat: Update task filtering, fixing bug on MTEB
  • Updated task filtering adding exclusive_language_filter and hf_subset
  • fix bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
    The following code outlines the problems:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# correct filtering to English datasets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
  • format
  • remove "en-ext" from AmazonCounterfactualClassification
  • fixed mteb(deu)
  • fix: simplify in a few areas
  • fix: Add gritlm
  • 1.29.0
    Automatically generated by python-semantic-release
  • fix: Added more annotations!
  • fix: Added C-MTEB (#1786)
    Added C-MTEB
  • 1.29.1
    Automatically generated by python-semantic-release
  • docs: Add contact to MMTEB benchmarks (#1796)
  • Add myself to MMTEB benchmarks
  • lint
  • fix: loading pre 11 (#1798)
  • fix loading pre 11
  • add similarity
  • lint
  • run all task types
  • 1.29.2
    Automatically generated by python-semantic-release
  • fix: allow to load no revision available (#1801)
  • fix allow to load no revision available
  • lint
  • add require_model_meta to leaderboard
  • lint
  • 1.29.3
    Automatically generated by python-semantic-release

Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>

  • fix merges
  • update models info
  • change public_training_code to str
  • change public_training_code=False to None
  • remove annotations
  • remove annotations
  • remove changed annotations
  • remove changed annotations
  • remove public_training_data and memory usage
  • make framework not optional
  • make framework non-optional
  • empty frameworks
  • add framework
  • fix tests
  • Update mteb/models/overview.py
    Co-authored-by: Isaac Chung <[email protected]>

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]> (0a83e38)

  • fix: subsets to run (#1830)

  • fix split evals

  • add test

  • lint

  • fix moka

  • add assert (8be6b2e)

1.29.9

17 Jan 15:09

1.29.9 (2025-01-17)

Fix

  • fix: Fixed eval split for MultilingualSentiment in C-MTEB (#1804)

  • Fixed eval split for MultilingualSentiment in C-MTEB

  • Fixed splits for atec, bq and stsb in C-MTEB (96f639b)

1.29.8

17 Jan 14:04

1.29.8 (2025-01-17)

Fix

  • fix: Added Misc Chinese models (#1819)

  • Added moka and piccolo models to overview file

  • Added Text2Vec models

  • Added various Chinese embedding models


Co-authored-by: Isaac Chung <[email protected]> (9823529)

  • fix: Added way more training dataset annotations (#1765)

  • fix: Leaderboard: K instead of M
    Fixes #1752

  • format

  • fixed existing annotations to refer to task name instead of hf dataset

  • added annotation to nvidia

  • added voyage

  • added uae annotations

  • Added stella annotations

  • sentence trf models

  • added salesforce and e5

  • jina

  • bge + model2vec

  • added llm2vec annotations

  • add jasper

  • format

  • format

  • Updated annotations and moved jina models

  • fix: add even more training dataset annotations (#1793)

  • fix: update max tokens for OpenAI (#1772)

update max tokens

  • ci: skip AfriSentiLID for now (#1785)

  • skip AfriSentiLID for now

  • skip relevant test case instead


Co-authored-by: Isaac Chung <[email protected]>

  • 1.28.7

Automatically generated by python-semantic-release

  • ci: fix model loading test (#1775)

  • pass base branch into the make command as an arg

  • test a file that has custom wrapper

  • what about overview

  • just dont check overview

  • revert instance check

  • explicitly omit overview and init

  • remove test change

  • try on a lot of models

  • revert test model file


Co-authored-by: Isaac Chung <[email protected]>

  • feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)

  • feat: Update task filtering, fixing bug on MTEB

  • Updated task filtering adding exclusive_language_filter and hf_subset
  • fix bug in MTEB where cross-lingual splits were included
  • added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)

The following code outlines the problems:

import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# correct filtering to English datasets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
  • format

  • remove "en-ext" from AmazonCounterfactualClassification

  • fixed mteb(deu)

  • fix: simplify in a few areas

  • fix: Add gritlm

  • 1.29.0

Automatically generated by python-semantic-release

  • fix: Added more annotations!

  • fix: Added C-MTEB (#1786)

Added C-MTEB

  • 1.29.1

Automatically generated by python-semantic-release

  • docs: Add contact to MMTEB benchmarks (#1796)

  • Add myself to MMTEB benchmarks

  • lint

  • fix: loading pre 11 (#1798)

  • fix loading pre 11

  • add similarity

  • lint

  • run all task types

  • 1.29.2

Automatically generated by python-semantic-release

  • fix: allow to load no revision available (#1801)

  • fix allow to load no revision available

  • lint

  • add require_model_meta to leaderboard

  • lint

  • 1.29.3

Automatically generated by python-semantic-release


Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>


Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]> (3b2d074)

Co-authored-by: sam021313 <[email protected]> (96420a2)

  • fix: Added Chinese Stella models (#1824)

Added Chinese Stella models (74b495c)