Releases: embeddings-benchmark/mteb
1.30.0
1.30.0 (2025-01-25)
Feature
-
feat: Integrating ChemTEB (#1708)
-
Add SMILES, AI Paraphrase and Inter-Source Paragraphs PairClassification Tasks
-
Add chemical subsets of NQ and HotpotQA datasets as Retrieval tasks
-
Add PubChem Synonyms PairClassification task
-
Update task init for previously added tasks
-
Add nomic-bert loader
-
Add a script to run the evaluation pipeline for chemical-related tasks
-
Add 15 Wikipedia article classification tasks
-
Add PairClassification and BitextMining tasks for Coconut SMILES
-
Fix naming of some Classification and PairClassification tasks
-
Fix some classification tasks naming issues
-
Integrate WANDB with benchmarking script
-
Update .gitignore
-
Fix nomic_models.py issue with retrieval tasks, similar to issue #1115 in original repo
-
Add one chemical model and some SentenceTransformer models
-
Fix a naming issue for SentenceTransformer models
-
Add OpenAI, bge-m3 and matscibert models
-
Add PubChem SMILES Bitext Mining tasks
-
Change metric namings to be more descriptive
-
Add English e5 and bge v1 models, all the sizes
-
Add two Wikipedia Clustering tasks
-
Add a try-except in evaluation script to skip faulty models during the benchmark.
-
Add bge v1.5 models and clustering score extraction to json parser
-
Add Amazon Titan embedding models
-
Add Cohere Bedrock models
-
Add two SDS Classification tasks
-
Add SDS Classification tasks to classification init and chem_eval
-
Add a retrieval dataset, update dataset names and revisions
-
Update revision for the CoconutRetrieval dataset: handle duplicate SMILES (documents)
-
Update CoconutSMILES2FormulaPC task
-
Change CoconutRetrieval dataset to a smaller one
-
Update some models
- Integrate models added in ChemTEB (such as amazon, cohere bedrock and nomic bert) with latest modeling format in mteb.
- Update the metadata for the mentioned models
-
Fix a typo: open_weights argument is repeated twice
-
Update ChemTEB tasks
- Rename some tasks for better readability.
- Merge some BitextMining and PairClassification tasks into a single task with subsets (PubChemSMILESBitextMining and PubChemSMILESPC)
- Add a new multilingual task (PubChemWikiPairClassification) consisting of 12 languages.
- Update dataset paths, revisions and metadata for most tasks.
- Add a Chemistry domain to TaskMetadata
-
Remove unnecessary files and tasks for MTEB
-
Update some ChemTEB tasks
- Move PubChemSMILESBitextMining to eng folder
- Add citations for tasks involving SDS, NQ, Hotpot, PubChem data
- Update Clustering tasks category
- Change main_score for PubChemAISentenceParaphrasePC
-
Create ChemTEB benchmark
-
Remove CoconutRetrieval
-
Update tasks and benchmarks tables with ChemTEB
-
Mention ChemTEB in readme
-
Fix some issues, update task metadata, lint
- eval_langs fixed
- Dataset path was fixed for two datasets
- Metadata was completed for all tasks, mainly the following fields: date, task_subtypes, dialect, sample_creation
- ruff lint
- rename nomic_bert_models.py to nomic_bert_model.py and update it.
-
Remove nomic_bert_model.py as it is now compatible with SentenceTransformer.
-
Remove WikipediaAIParagraphsParaphrasePC task due to being trivial.
-
Merge amazon_models and cohere_bedrock_models.py into bedrock_models.py
-
Remove unnecessary load_data for some tasks.
-
Update bedrock_models.py, openai_models.py and two dataset revisions
- Text should be truncated for amazon text embedding models.
- text-embedding-ada-002 returns null embeddings for some inputs with 8192 tokens.
- Two datasets are updated, dropping very long samples (len > 99th percentile)
-
Add a layer of dynamic truncation for amazon models in bedrock_models.py
-
Replace metadata_dict with self.metadata in PubChemSMILESPC.py
-
fix model meta for bedrock models
-
Add reference comment to original Cohere API implementation (4d66434)
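The ChemTEB notes above mention truncating text for the Amazon embedding models and adding "a layer of dynamic truncation". The release notes do not describe how that layer works, so the following is only a minimal sketch of one possible approach: cut the input to a character budget, then keep shrinking it while the backend rejects it. The names `InputTooLongError`, `embed_with_truncation`, `fake_embed`, and the `shrink` and `max_retries` parameters are all hypothetical and are not taken from bedrock_models.py.

```python
from typing import Callable, List


class InputTooLongError(Exception):
    """Hypothetical error raised by an embedding backend for over-long input."""


def embed_with_truncation(
    text: str,
    embed: Callable[[str], List[float]],
    max_chars: int,
    shrink: float = 0.9,
    max_retries: int = 20,
) -> List[float]:
    """Cut text to max_chars, then keep shrinking it while the backend rejects it."""
    candidate = text[:max_chars]
    for _ in range(max_retries):
        try:
            return embed(candidate)
        except InputTooLongError:
            # Drop roughly 10% of the remaining characters and try again.
            new_len = int(len(candidate) * shrink)
            if new_len == 0:
                break
            candidate = candidate[:new_len]
    raise InputTooLongError("input still too long after truncation")


def fake_embed(text: str) -> List[float]:
    # Hypothetical backend that accepts at most 100 characters.
    if len(text) > 100:
        raise InputTooLongError(len(text))
    return [float(len(text))]


vector = embed_with_truncation("x" * 1000, fake_embed, max_chars=150)
```

The static cut (`max_chars`) handles the common case cheaply, and the retry loop covers inputs whose token count is higher than the character budget suggests.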
Unknown
- Update points table (223bf32)
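The 1.30.0 notes also state that two datasets were updated by "dropping very long samples (len > 99th percentile)". As an illustration only, the sketch below applies a nearest-rank percentile cutoff to text lengths. The function name `drop_longest` and the choice of the nearest-rank definition are assumptions; the notes do not say which percentile definition or length unit was used.

```python
import math
from typing import List


def drop_longest(texts: List[str], percentile: float = 99.0) -> List[str]:
    """Keep texts whose length is at or below the given percentile (nearest-rank)."""
    if not texts:
        return []
    lengths = sorted(len(t) for t in texts)
    # Nearest-rank percentile: the smallest length such that at least
    # `percentile` percent of all lengths are less than or equal to it.
    rank = max(1, math.ceil(percentile * len(lengths) / 100))
    cutoff = lengths[rank - 1]
    return [t for t in texts if len(t) <= cutoff]


samples = ["a" * n for n in range(1, 201)]  # lengths 1..200
kept = drop_longest(samples)
```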
1.29.16
1.29.15
1.29.15 (2025-01-22)
Fix
-
fix: Adding missing model meta (#1856)
-
Added CDE models
-
Added bge-en-icl
-
Updated CDE to bge_full_data
-
Fixed public_training_data flag type to include boolean, as this is how all models are annotated
-
Added public training data link instead of bool to CDE and BGE
-
Added GME models
-
Changed Torch to PyTorch
-
Added metadata on LENS models
-
Added ember_v1
-
Added metadata for amazon titan
-
Removed GME implementation (692bd26)
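The entries above widen the public_training_data annotation so that it accepts a boolean as well as a link to the data. A rough sketch of what such a field could look like follows. `ModelMetaSketch` and `describe` are hypothetical names used only for illustration, not the actual mteb ModelMeta definition.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ModelMetaSketch:
    """Hypothetical, trimmed-down stand-in for a model metadata record."""

    name: str
    # None: unknown; bool: yes/no flag; str: link to the public training data.
    public_training_data: Optional[Union[str, bool]] = None


def describe(meta: ModelMetaSketch) -> str:
    """Render the three possible states of the field as readable text."""
    value = meta.public_training_data
    if value is None:
        return "unknown"
    if isinstance(value, bool):
        return "public" if value else "not public"
    return f"public, see {value}"
```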
1.29.14
1.29.14 (2025-01-22)
Fix
-
fix: Fix zeta alpha mistral (#1736)
-
fix zeta alpha mistral
-
update use_instructions
-
update training datasets
-
Update mteb/models/e5_instruct.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
-
update float
-
Update mteb/models/e5_instruct.py
Co-authored-by: Kenneth Enevoldsen <[email protected]> (4985da9)
- fix: Hotfixed public_training_data type annotation (#1857)
Fixed public_training_data flag type to include boolean, as this is how all models are annotated (4bd7328)
Unknown
1.29.13
1.29.12
1.29.11
1.29.10
1.29.10 (2025-01-20)
Fix
-
fix: Remove default params, public_training_data and memory usage in ModelMeta (#1794)
-
fix: Leaderboard: K instead of M
Fixes #1752
-
format
-
fixed existing annotations to refer to task name instead of hf dataset
-
added annotation to nvidia
-
added voyage
-
added uae annotations
-
Added stella annotations
-
sentence trf models
-
added salesforce and e5
-
jina
-
bge + model2vec
-
added llm2vec annotations
-
add jasper
-
format
-
format
-
Updated annotations and moved jina models
-
make models parameters needed to be filled
-
fix tests
-
remove comments
-
remove model meta from test
-
fix model meta from split
-
fix: add even more training dataset annotations (#1793)
-
fix: update max tokens for OpenAI (#1772)
update max tokens
-
ci: skip AfriSentiLID for now (#1785)
-
skip AfriSentiLID for now
-
skip relevant test case instead
Co-authored-by: Isaac Chung <[email protected]>
- 1.28.7
Automatically generated by python-semantic-release
- ci: fix model loading test (#1775)
- pass base branch into the make command as an arg
- test a file that has custom wrapper
- what about overview
- just dont check overview
- revert instance check
- explicitly omit overview and init
- remove test change
- try on a lot of models
- revert test model file
Co-authored-by: Isaac Chung <[email protected]>
- feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
- feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# correct filtering to English datasets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
- format
- remove "en-ext" from AmazonCounterfactualClassification
- fixed mteb(deu)
- fix: simplify in a few areas
- fix: Add gritlm
- 1.29.0
Automatically generated by python-semantic-release
- fix: Added more annotations!
- fix: Added C-MTEB (#1786)
Added C-MTEB
- 1.29.1
Automatically generated by python-semantic-release
- docs: Add contact to MMTEB benchmarks (#1796)
- Add myself to MMTEB benchmarks
- lint
- fix: loading pre 11 (#1798)
- fix loading pre 11
- add similarity
- lint
- run all task types
- 1.29.2
Automatically generated by python-semantic-release
- fix: allow to load no revision available (#1801)
- fix allow to load no revision available
- lint
- add require_model_meta to leaderboard
- lint
- 1.29.3
Automatically generated by python-semantic-release
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
- fig merges
- update models info
- change public_training_code to str
- change public_training_code=False to None
- remove annotations
- remove annotations
- remove changed annotations
- remove changed annotations
- remove public_training_data and memory usage
- make framework not optional
- make framework non-optional
- empty frameworks
- add framework
- fix tests
- Update mteb/models/overview.py
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]> (0a83e38)
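The task-filtering fix in this release (#1787) distinguishes between keeping every subset that involves a requested language and keeping only subsets made up entirely of requested languages. The sketch below reproduces that distinction in plain Python so it can be run without mteb installed. It is not mteb's implementation: `filter_subsets`, the `STS22_SUBSETS` mapping, and the three-letter language codes are illustrative assumptions, and only a handful of subsets are listed.

```python
from typing import Dict, Iterable, List

# Subset name -> languages it contains (a few entries modelled on the STS22 example).
STS22_SUBSETS: Dict[str, List[str]] = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["zho", "eng"],
    "de": ["deu"],
}


def filter_subsets(
    subsets: Dict[str, List[str]],
    languages: Iterable[str],
    exclusive: bool = False,
) -> List[str]:
    """Return the subset names that match the requested languages."""
    wanted = set(languages)
    if exclusive:
        # Keep a subset only if every language in it was requested.
        return [name for name, langs in subsets.items() if set(langs) <= wanted]
    # Keep a subset if it shares at least one language with the request.
    return [name for name, langs in subsets.items() if wanted & set(langs)]


loose = filter_subsets(STS22_SUBSETS, ["eng"])
strict = filter_subsets(STS22_SUBSETS, ["eng"], exclusive=True)
```

With this toy data, `loose` contains the cross-lingual pairs while `strict` contains only `"en"`, which mirrors the before and after behaviour described in the commit message.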
1.29.9
1.29.8
1.29.8 (2025-01-17)
Fix
-
fix: Added Misc Chinese models (#1819)
-
Added moka and piccolo models to overview file
-
Added Text2Vec models
-
Added various Chinese embedding models
Co-authored-by: Isaac Chung <[email protected]> (9823529)
-
fix: Added way more training dataset annotations (#1765)
-
fix: Leaderboard: K instead of M
Fixes #1752
-
format
-
fixed existing annotations to refer to task name instead of hf dataset
-
added annotation to nvidia
-
added voyage
-
added uae annotations
-
Added stella annotations
-
sentence trf models
-
added salesforce and e5
-
jina
-
bge + model2vec
-
added llm2vec annotations
-
add jasper
-
format
-
format
-
Updated annotations and moved jina models
-
fix: add even more training dataset annotations (#1793)
-
fix: update max tokens for OpenAI (#1772)
update max tokens
-
ci: skip AfriSentiLID for now (#1785)
-
skip AfriSentiLID for now
-
skip relevant test case instead
Co-authored-by: Isaac Chung <[email protected]>
- 1.28.7
Automatically generated by python-semantic-release
-
ci: fix model loading test (#1775)
-
pass base branch into the make command as an arg
-
test a file that has custom wrapper
-
what about overview
-
just dont check overview
-
revert instance check
-
explicitly omit overview and init
-
remove test change
-
try on a lot of models
-
revert test model file
Co-authored-by: Isaac Chung <[email protected]>
-
feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
-
feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# correct filtering to English datasets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
-
format
-
remove "en-ext" from AmazonCounterfactualClassification
-
fixed mteb(deu)
-
fix: simplify in a few areas
-
fix: Add gritlm
-
1.29.0
Automatically generated by python-semantic-release
-
fix: Added more annotations!
-
fix: Added C-MTEB (#1786)
Added C-MTEB
- 1.29.1
Automatically generated by python-semantic-release
-
docs: Add contact to MMTEB benchmarks (#1796)
-
Add myself to MMTEB benchmarks
-
lint
-
fix: loading pre 11 (#1798)
-
fix loading pre 11
-
add similarity
-
lint
-
run all task types
-
1.29.2
Automatically generated by python-semantic-release
-
fix: allow to load no revision available (#1801)
-
fix allow to load no revision available
-
lint
-
add require_model_meta to leaderboard
-
lint
-
1.29.3
Automatically generated by python-semantic-release
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]> (3b2d074)
- fix: bm25s (#1827)
Co-authored-by: sam021313 <[email protected]> (96420a2)
- fix: Added Chinese Stella models (#1824)
Added Chinese Stella models (74b495c)