fix: Count unique texts, data leaks in calculate metrics #1438

Merged 3 commits on Nov 14, 2024

Conversation

@Samoed (Collaborator) commented Nov 11, 2024

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Added to calculate metadata:
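
For illustration only, going by the PR title (counting unique texts and flagging train/test leaks), a minimal sketch of such statistics might look like the following. The function names `count_unique_texts` and `count_leaked_texts`, and the choice to define a leak as an exact string match between splits, are assumptions made for this example; this is not the code changed in this PR.

```python
from collections.abc import Iterable


def count_unique_texts(texts: Iterable[str]) -> int:
    """Return the number of distinct strings (exact match) in a split."""
    return len(set(texts))


def count_leaked_texts(train_texts: Iterable[str], test_texts: Iterable[str]) -> int:
    """Return how many distinct test texts also appear verbatim in train."""
    return len(set(train_texts) & set(test_texts))


# Toy data, purely hypothetical.
train = ["a cat", "a dog", "a cat"]
test = ["a dog", "a bird"]

print(count_unique_texts(train))        # 2
print(count_leaked_texts(train, test))  # 1
```

Exact matching is the simplest possible definition; near-duplicates (differences in whitespace or casing) would need normalisation first, which this sketch leaves out.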

@KennethEnevoldsen (Contributor) left a comment

All good here - only a minor comment

mteb/abstasks/AbsTask.py (resolved)
@KennethEnevoldsen KennethEnevoldsen merged commit dd5d226 into embeddings-benchmark:main Nov 14, 2024
10 checks passed
@KennethEnevoldsen (Contributor)

Ahh this was merged into main... damn that causes some merge conflicts..

@Samoed (Collaborator, Author) commented Nov 14, 2024

If you mean 2.0, I can make a PR to update it

@KennethEnevoldsen (Contributor)

That would be great

@KennethEnevoldsen (Contributor)

Merged everything before this PR so that should be solved

KennethEnevoldsen added a commit that referenced this pull request Nov 14, 2024
* fix: Count unique texts, data leaks in calculate metrics (#1438)

* add more stat

* add more stat

* update statistics

* fix: update task metadata to allow for null (#1448)

* Update tasks table

* 1.19.5

Automatically generated by python-semantic-release

* base

* sync with main

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
KennethEnevoldsen added a commit that referenced this pull request Nov 27, 2024
* fix: Count unique texts, data leaks in calculate metrics (#1438)

* add more stat

* add more stat

* update statistics

* fix: update task metadata to allow for null (#1448)

* Update tasks table

* 1.19.5

Automatically generated by python-semantic-release

* Fix: Made data parsing in the leaderboard figure more robust (#1450)

Bugfixes with data parsing in main figure

* Fixed task loading (#1451)

* Fixed task result loading from disk

* Fixed task result loading from disk

* fix: publish (#1452)

* 1.19.6

Automatically generated by python-semantic-release

* fix: Fix load external results with `None` mteb_version (#1453)

* fix

* lint

* 1.19.7

Automatically generated by python-semantic-release

* WIP: Polishing up leaderboard UI (#1461)

* fix: Removed column wrapping on the table, so that it remains readable

* Added disclaimer to figure

* fix: Added links to task info table, switched out license with metric

* fix: loading pre 1.11.0 (#1460)

* small fix

* fix: fix

* 1.19.8

Automatically generated by python-semantic-release

* fix: swap touche2020 to maintain compatibility (#1469)

swap touche2020 for parity

* 1.19.9

Automatically generated by python-semantic-release

* docs: Add sum per language for task counts (#1468)

* add sum per lang

* add sort by sum option

* make lint

* fix: pinned datasets to <3.0.0 (#1470)

* 1.19.10

Automatically generated by python-semantic-release

* feat: add CUREv1 retrieval dataset (#1459)

* feat: add CUREv1 dataset

---------

Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>

* feat: add missing domains to medical tasks

* feat: modify benchmark tasks

* chore: benchmark naming

---------

Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>

* Update tasks table

* 1.20.0

Automatically generated by python-semantic-release

* fix: check if `model` attr of model exists (#1499)

* check if model attr of model exists

* lint

* Fix retrieval evaluator

* 1.20.1

Automatically generated by python-semantic-release

* add cure statistics

---------

Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
isaac-chung pushed a commit that referenced this pull request Dec 9, 2024
* fix: Count unique texts, data leaks in calculate metrics (#1438)
* add more stat
* add more stat
* update statistics
* fix: update task metadata to allow for null (#1448)
* Update tasks table
* 1.19.5
Automatically generated by python-semantic-release
* Fix: Made data parsing in the leaderboard figure more robust (#1450)
Bugfixes with data parsing in main figure
* Fixed task loading (#1451)
* Fixed task result loading from disk
* Fixed task result loading from disk
* fix: publish (#1452)
* 1.19.6
Automatically generated by python-semantic-release
* fix: Fix load external results with `None` mteb_version (#1453)
* fix
* lint
* 1.19.7
Automatically generated by python-semantic-release
* WIP: Polishing up leaderboard UI (#1461)
* fix: Removed column wrapping on the table, so that it remains readable
* Added disclaimer to figure
* fix: Added links to task info table, switched out license with metric
* fix: loading pre 1.11.0 (#1460)
* small fix
* fix: fix
* 1.19.8
Automatically generated by python-semantic-release
* fix: swap touche2020 to maintain compatibility (#1469)
swap touche2020 for parity
* 1.19.9
Automatically generated by python-semantic-release
* docs: Add sum per language for task counts (#1468)
* add sum per lang
* add sort by sum option
* make lint
* fix: pinned datasets to <3.0.0 (#1470)
* 1.19.10
Automatically generated by python-semantic-release
* feat: add CUREv1 retrieval dataset (#1459)
* feat: add CUREv1 dataset
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
* feat: add missing domains to medical tasks
* feat: modify benchmark tasks
* chore: benchmark naming
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
* Update tasks table
* 1.20.0
Automatically generated by python-semantic-release
* fix: check if `model` attr of model exists (#1499)
* check if model attr of model exists
* lint
* Fix retrieval evaluator
* 1.20.1
Automatically generated by python-semantic-release
* fix: Leaderboard demo data loading (#1507)
* Made get_scores error tolerant
* Added join_revisions, made get_scores failsafe
* Fetching metadata fixed fr HF models
* Added failsafe metadata fetching to leaderboard code
* Added revision joining to leaderboard app
* fix
* Only show models that have metadata, when filter_models is called
* Ran linting
* 1.20.2
Automatically generated by python-semantic-release
* fix: leaderboard only shows models that have ModelMeta (#1508)
Filtering for models that have metadata
* 1.20.3
Automatically generated by python-semantic-release
* fix: align readme with current mteb (#1493)
* align readme with current mteb
* align with mieb branch
* fix test
* 1.20.4
Automatically generated by python-semantic-release
* docs: Add lang family mapping and map to task table (#1486)
* add lang family mapping and map to task table
* make lint
* add back some unclassified lang codes
* Update tasks table
* fix: Ensure that models match the names on embedding-benchmarks/results (#1519)
* 1.20.5
Automatically generated by python-semantic-release
* fix: Adding missing metadata on models and mathcing names up with the results repo (#1528)
* Added Voyage 3 models
* Added correct metadata to Cohere models and matched names with the results repo
* 1.20.6
Automatically generated by python-semantic-release
* feat: Evaluate missing splits (#1525)
* fix: evaluate missing splits (#1268)
* implement partial evaluation for missing splits
* lint
* requested changes done from scratch
* test for missing split evaluation added
* uncomment test
* lint
* avoid circular import
* use TaskResult
* skip tests for now
---------
Co-authored-by: Isaac Chung <[email protected]>
* got test_all_splits_evaluated passing
* tests passing
* address review comments
* make lint
* handle None cases for kg_co2_emissions
* use new results info
---------
Co-authored-by: Thivyanth <[email protected]>
* 1.21.0
Automatically generated by python-semantic-release
* fix: Correct typos superseeded -> superseded (#1532)
fix typo -> superseded
* 1.21.1
Automatically generated by python-semantic-release
* fix: Task load data error for SICK-BR-STS and XStance (#1534)
* fix task load data for two tasks
* correct dataset keys
* 1.21.2
Automatically generated by python-semantic-release
* fix: Proprietary models now get correctly shown in leaderboard (#1530)
* Fixed showing proprietary models in leaderboard
* Added links to all OpenAI models
* Fixed table formatting issues
* Bumped Gradio version
* 1.21.3
Automatically generated by python-semantic-release
* docs: Add Model Meta parameters and metadata (#1536)
* add multi_qa_MiniLM_L6_cos_v1 model meta
* add all_mpnet_base_v2
* add parameters to model meta
* make lint
* add extra params to meta
* fix: add more model meta (jina, e5) (#1537)
* add e5 model meta
* address review comments
* 1.21.4
Automatically generated by python-semantic-release
* Add cohere models (#1538)
* fix: bug cohere names
* format
* fix: add nomic models (#1543)
#1515
* fix: Added all-minilm-l12-v2 (#1542)
#1515
* fix: Added arctic models (#1541)
#1515
* fix: add sentence trimming to OpenAIWrapper (#1526)
* fix: add sentence trimming to OpenAIWrapper
* fix: import tiktoken library inside encode function
* fix: check tokenizer library installed and update ModelMeta to pass tokenizer_name
* fix: pass tokenizer_name, max_tokens to loader
* fix: make tokenizer_name None for default
* fix: delete changes for ModelMeta
* fix: fix revision to 2 for OpenAI models
* fix: add docstring for OpenAIWrapper
* fix: lint
* feat: add openai optional dependency set
* fix: add sleep for too many requests
* fix: add lint
* fix: delete evaluate file
* 1.21.5
Automatically generated by python-semantic-release
* fix: Fixed metadata errors (#1547)
* 1.21.6
Automatically generated by python-semantic-release
* fix: remove curev1 from multlingual (#1552)
Seems like it was added here:
1cc6c9e
* 1.21.7
Automatically generated by python-semantic-release
* fix: Add Model2vec (#1546)
* Added Model2Vec wrapper
* Added Model2vec models
* Added model2vec models to registry
* Added model2vec as a dependency
* Ran linting
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Added adapted_from and superseeded_by to model2vec models.
* Added missing import
* Moved pyproject.toml to optional dependencies
* Fixed typos
* Added import error and changed model to model_name
* Added Numpy to frameworks
* Added Numpy to frameworks
* Corrected false info on model2vec models
* Replaced np.inf with maxint
* Update mteb/models/model2vec_models.py
Co-authored-by: Isaac Chung <[email protected]>
* Added option to have infinite max tokens, added it to Model2vec
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
* Made result loading more permissive, changed eval splits for HotPotQA and DBPedia (#1554)
* Removed train and dev from eval splits on HotpotQA
* Removed dev from eval splits on DBPedia
* Made task_results validation more permissive
* Readded exception in get_score
* Ran linting
* 1.21.8
Automatically generated by python-semantic-release
* docs: Correction of SICK-R metadata (#1558)
* Correction of SICK-R metadata
* Correction of SICK-R metadata
---------
Co-authored-by: rposwiata <[email protected]>
* feat(google_models): fix issues and add support for `text-embedding-005` and `text-multilingual-embedding-002` (#1562)
* fix: google_models batching and prompt
* feat: add text-embedding-005 and text-multilingual-embedding-002
* chore: `make lint` errors
* fix: address PR comments
* 1.22.0
Automatically generated by python-semantic-release
* fix(bm25s): search implementation (#1566)
fix: bm25s implementation
* 1.22.1
Automatically generated by python-semantic-release
* docs: Fix dependency library name for bm25s (#1568)
* fix: bm25s implementation
* correct library name
---------
Co-authored-by: Daniel Buades Marcos <[email protected]>
* fix: Add training dataset to model meta (#1561)
* fix: Add training dataset to model meta
Adresses #1556
* Added docs
* format
* feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564)
* feat: batch requests to cohere models
* fix: use correct task_type
* feat: use tqdm with openai
* fix: explicitely set `show_progress_bar` to False
* fix(publichealth-qa):  ignore rows with `None` values in `question` or `answer` (#1565)
* 1.23.0
Automatically generated by python-semantic-release
* fix wongnai
* update inits
* fix tests
* lint
* update imports
* fix tests
* lint
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Thivyanth <[email protected]>
Co-authored-by: Youngjoon Jang <[email protected]>
Co-authored-by: Rafał Poświata <[email protected]>
isaac-chung added a commit that referenced this pull request Jan 23, 2025
* Update tasks table
* 1.19.0
Automatically generated by python-semantic-release
* fix: Add the_ugly_duckling.txt for speedtask to Python wheel (#1402)
Add the_ugly_duckling.txt for speedtask to Python wheel
* 1.19.1
Automatically generated by python-semantic-release
* fix: Added the necessary trust_remote_code (#1406)
* 1.19.2
Automatically generated by python-semantic-release
* docs: Update recommendation for pushing results (#1401)
fix: Update recommendation for pushing results
* docs: Fix a typo in README (#1430)
Fix typo in readme
* fix: add logging for RetrievalEvaluator NaN values for similarity scores (#1398)
Fixes #1389
* 1.19.3
Automatically generated by python-semantic-release
* fix: make samples_per_label a task attribute (#1419)
make samples_per_label a task attr
* fix: Add Korean AutoRAGRetrieval (#1388)
* feat: add AutoRAG Korean embedding retrieval benchmark
* fix: run --- 🧹 Running linters ---
ruff format . 			# running ruff formatting
716 files left unchanged
ruff check . --fix  	# running ruff linting
All checks passed!
* fix: add metadata for AutoRAGRetrieval
* change link for markers_bm
* add AutoRAGRetrieval to init.py and update metadata
* add precise metadata
* update metadata: description and license
* delete descriptive_stats in AutoRAGRetrieval.py and run calculate_matadata_metrics.py
* fix: Add missing benchmarks in benchmarks.py (#1431)
Fixes #1423
* Update tasks table
* 1.19.4
Automatically generated by python-semantic-release
* Leaderboard 2.0: added performance x n_parameters plot + more benchmark info (#1437)
* Added elementary speed/performance plot
* Refactored table formatting code
* Bumped Gradio version
* Added more general info to benchmark description markdown block
* Adjusted margin an range on plot
* Made hover information easier to read on plot
* Made range scaling dynamic in plot
* Moved citation next to benchmark description
* Made titles in benchmark info bold
* Leaderboard: Fixed code benchmarks (#1441)
* fixed code benchmarks
* fix: Made n_parameters formatting smarter and more robust
* fix: changed jina-embeddings-v3 number of parameters from 572K to 572M
* fix: Fixed use_instuctions typo in model overview
* fix: Fixed sentence-transformer compatibility switch
* Ran linting
* Added all languages, tasks, types and domains to options
* Removed resetting options when a new benchmark is selected
* All results now get displayed, but models that haven't been run on everything get nan values in the table
* fix: Count unique texts, data leaks in calculate metrics (#1438)
* add more stat
* add more stat
* update statistics
* fix: update task metadata to allow for null (#1448)
* Update tasks table
* 1.19.5
Automatically generated by python-semantic-release
* Fix: Made data parsing in the leaderboard figure more robust (#1450)
Bugfixes with data parsing in main figure
* Fixed task loading (#1451)
* Fixed task result loading from disk
* Fixed task result loading from disk
* fix: publish (#1452)
* 1.19.6
Automatically generated by python-semantic-release
* fix: Fix load external results with `None` mteb_version (#1453)
* fix
* lint
* 1.19.7
Automatically generated by python-semantic-release
* WIP: Polishing up leaderboard UI (#1461)
* fix: Removed column wrapping on the table, so that it remains readable
* Added disclaimer to figure
* fix: Added links to task info table, switched out license with metric
* fix: loading pre 1.11.0 (#1460)
* small fix
* fix: fix
* 1.19.8
Automatically generated by python-semantic-release
* fix: swap touche2020 to maintain compatibility (#1469)
swap touche2020 for parity
* 1.19.9
Automatically generated by python-semantic-release
* docs: Add sum per language for task counts (#1468)
* add sum per lang
* add sort by sum option
* make lint
* fix: pinned datasets to <3.0.0 (#1470)
* 1.19.10
Automatically generated by python-semantic-release
* feat: add CUREv1 retrieval dataset (#1459)
* feat: add CUREv1 dataset
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
* feat: add missing domains to medical tasks
* feat: modify benchmark tasks
* chore: benchmark naming
---------
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
* Update tasks table
* 1.20.0
Automatically generated by python-semantic-release
* fix: check if `model` attr of model exists (#1499)
* check if model attr of model exists
* lint
* Fix retrieval evaluator
* 1.20.1
Automatically generated by python-semantic-release
* fix: Leaderboard demo data loading (#1507)
* Made get_scores error tolerant
* Added join_revisions, made get_scores failsafe
* Fetching metadata fixed fr HF models
* Added failsafe metadata fetching to leaderboard code
* Added revision joining to leaderboard app
* fix
* Only show models that have metadata, when filter_models is called
* Ran linting
* 1.20.2
Automatically generated by python-semantic-release
* fix: leaderboard only shows models that have ModelMeta (#1508)
Filtering for models that have metadata
* 1.20.3
Automatically generated by python-semantic-release
* fix: align readme with current mteb (#1493)
* align readme with current mteb
* align with mieb branch
* fix test
* 1.20.4
Automatically generated by python-semantic-release
* docs: Add lang family mapping and map to task table (#1486)
* add lang family mapping and map to task table
* make lint
* add back some unclassified lang codes
* Update tasks table
* fix: Ensure that models match the names on embedding-benchmarks/results (#1519)
* 1.20.5
Automatically generated by python-semantic-release
* fix: Adding missing metadata on models and mathcing names up with the results repo (#1528)
* Added Voyage 3 models
* Added correct metadata to Cohere models and matched names with the results repo
* 1.20.6
Automatically generated by python-semantic-release
* feat: Evaluate missing splits (#1525)
* fix: evaluate missing splits (#1268)
* implement partial evaluation for missing splits
* lint
* requested changes done from scratch
* test for missing split evaluation added
* uncomment test
* lint
* avoid circular import
* use TaskResult
* skip tests for now
---------
Co-authored-by: Isaac Chung <[email protected]>
* got test_all_splits_evaluated passing
* tests passing
* address review comments
* make lint
* handle None cases for kg_co2_emissions
* use new results info
---------
Co-authored-by: Thivyanth <[email protected]>
* 1.21.0
Automatically generated by python-semantic-release
* fix: Correct typos superseeded -> superseded (#1532)
fix typo -> superseded
* 1.21.1
Automatically generated by python-semantic-release
* fix: Task load data error for SICK-BR-STS and XStance (#1534)
* fix task load data for two tasks
* correct dataset keys
* 1.21.2
Automatically generated by python-semantic-release
* fix: Proprietary models now get correctly shown in leaderboard (#1530)
* Fixed showing proprietary models in leaderboard
* Added links to all OpenAI models
* Fixed table formatting issues
* Bumped Gradio version
* 1.21.3
Automatically generated by python-semantic-release
* docs: Add Model Meta parameters and metadata (#1536)
* add multi_qa_MiniLM_L6_cos_v1 model meta
* add all_mpnet_base_v2
* add parameters to model meta
* make lint
* add extra params to meta
* fix: add more model meta (jina, e5) (#1537)
* add e5 model meta
* address review comments
* 1.21.4
Automatically generated by python-semantic-release
* Add cohere models (#1538)
* fix: bug cohere names
* format
* fix: add nomic models (#1543)
#1515
* fix: Added all-minilm-l12-v2 (#1542)
#1515
* fix: Added arctic models (#1541)
#1515
* fix: add sentence trimming to OpenAIWrapper (#1526)
* fix: add sentence trimming to OpenAIWrapper
* fix: import tiktoken library inside encode function
* fix: check tokenizer library installed and update ModelMeta to pass tokenizer_name
* fix: pass tokenizer_name, max_tokens to loader
* fix: make tokenizer_name None for default
* fix: delete changes for ModelMeta
* fix: fix revision to 2 for OpenAI models
* fix: add docstring for OpenAIWrapper
* fix: lint
* feat: add openai optional dependency set
* fix: add sleep for too many requests
* fix: add lint
* fix: delete evaluate file
* 1.21.5
Automatically generated by python-semantic-release
* fix: Fixed metadata errors (#1547)
* 1.21.6
Automatically generated by python-semantic-release
* fix: remove curev1 from multlingual (#1552)
Seems like it was added here:
1cc6c9e
* 1.21.7
Automatically generated by python-semantic-release
* fix: Add Model2vec (#1546)
* Added Model2Vec wrapper
* Added Model2vec models
* Added model2vec models to registry
* Added model2vec as a dependency
* Ran linting
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Update mteb/models/model2vec_models.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Added adapted_from and superseeded_by to model2vec models.
* Added missing import
* Moved pyproject.toml to optional dependencies
* Fixed typos
* Added import error and changed model to model_name
* Added Numpy to frameworks
* Added Numpy to frameworks
* Corrected false info on model2vec models
* Replaced np.inf with maxint
* Update mteb/models/model2vec_models.py
Co-authored-by: Isaac Chung <[email protected]>
* Added option to have infinite max tokens, added it to Model2vec
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
* Made result loading more permissive, changed eval splits for HotPotQA and DBPedia (#1554)
* Removed train and dev from eval splits on HotpotQA
* Removed dev from eval splits on DBPedia
* Made task_results validation more permissive
* Readded exception in get_score
* Ran linting
* 1.21.8
Automatically generated by python-semantic-release
* docs: Correction of SICK-R metadata (#1558)
* Correction of SICK-R metadata
* Correction of SICK-R metadata
---------
Co-authored-by: rposwiata <[email protected]>
* feat(google_models): fix issues and add support for `text-embedding-005` and `text-multilingual-embedding-002` (#1562)
* fix: google_models batching and prompt
* feat: add text-embedding-005 and text-multilingual-embedding-002
* chore: `make lint` errors
* fix: address PR comments
* 1.22.0
Automatically generated by python-semantic-release
* fix(bm25s): search implementation (#1566)
fix: bm25s implementation
* 1.22.1
Automatically generated by python-semantic-release
* docs: Fix dependency library name for bm25s (#1568)
* fix: bm25s implementation
* correct library name
---------
Co-authored-by: Daniel Buades Marcos <[email protected]>
* fix: Add training dataset to model meta (#1561)
* fix: Add training dataset to model meta
Adresses #1556
* Added docs
* format
* feat: (cohere_models) cohere_task_type issue, batch requests and tqdm for visualization (#1564)
* feat: batch requests to cohere models
* fix: use correct task_type
* feat: use tqdm with openai
* fix: explicitely set `show_progress_bar` to False
* fix(publichealth-qa):  ignore rows with `None` values in `question` or `answer` (#1565)
* 1.23.0
Automatically generated by python-semantic-release
* fix: Added metadata for miscellaneous models (#1557)
* Added script for generating metadata, and metadata for the listed models
* Added misc models to overview
* Fixed misc metas
* Removed unnecessary imports
* Added logic to retrieve base model information
* Added base models to misc meta
* Added superseded_by to sentence-croissant models
* Added training datasets to mis models
* 1.23.1
Automatically generated by python-semantic-release
* fix: Added radar chart displaying capabilities on task types (#1570)
* Added radar chart displaying capabilities on task types
* Fixed table aggregation in leaderboard
* Spelled out why instructionretrieval is excluded
* 1.23.2
Automatically generated by python-semantic-release
* feat: add new arctic v2.0 models (#1574)
* feat: add new arctic v2.0 models
* chore: make lint
* 1.24.0
Automatically generated by python-semantic-release
* fix: Add namaa MrTydi reranking dataset (#1573)
* Add dataset class and file requirements
* pass tests
* make lint changes
* adjust meta data and remove load_data
---------
Co-authored-by: Omar Elshehy <[email protected]>
* Update tasks table
* 1.24.1
Automatically generated by python-semantic-release
* fix: Eval langs not correctly passed to monolingual tasks (#1587)
* fix SouthAfricanLangClassification.py
* add check for langs
* lint
* 1.24.2
Automatically generated by python-semantic-release
* feat: Add ColBert (#1563)
* feat: add max_sim operator for IR tasks to support multi-vector models
* docs: add doc for Model2VecWrapper.__init__(...)
* feat: add ColBERTWrapper to models & add ColBERTv2
* fix: resolve issues
* fix: resolve issues
* Update README.md
Co-authored-by: Roman Solomatin <[email protected]>
* Update README.md
Co-authored-by: Isaac Chung <[email protected]>
* Update README.md
Co-authored-by: Isaac Chung <[email protected]>
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
Co-authored-by: Isaac Chung <[email protected]>
* Update README.md
Co-authored-by: Isaac Chung <[email protected]>
* README.md: rm subset
* doc: update example for Late Interaction
* get colbert running without errors
* fix: pass is_query to pylate
* fix: max_sim add pad_sequence
* feat: integrate Jinja templates for ColBERTv2 and add model prompt handling
* feat: add revision & prompt_name
* doc: pad_sequence
* rm TODO jina colbert v2
* doc: warning: higher resource usage for MaxSim
---------
Co-authored-by: sam021313 <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
* 1.25.0
Automatically generated by python-semantic-release
* doc: colbert add score_function & doc section (#1592)
* doc: colbert add score_function & doc section
* doc: Update README.md
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* doc: Update README.md
Co-authored-by: Isaac Chung <[email protected]>
---------
Co-authored-by: sam021313 <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
* Feat: add support for scoring function (#1594)
* add support for scoring function
* lint
* move similarity to wrapper
* remove score function
* lint
* remove from InstructionRetrievalEvaluator
* Update mteb/evaluation/evaluators/RetrievalEvaluator.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* remove score function from README.md
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Add new models nvidia, gte, linq (#1436)
* Add new models nvidia, gte, linq
* add warning for gte-Qwen and nvidia models re: instruction used in docs as well
---------
Co-authored-by: isaac-chung <[email protected]>
* Leaderboard: Refined plots (#1601)
* Added embedding size guide to performance-size plot, removed shading on radar chart
* Changed plot names to something more descriptive
* Made plots failsafe
* fix: Leaderboard refinements (#1603)
* Added explanation of aggregate measures
* Added download button to result tables
* Task info gets sorted by task name
* Added custom, shareable links for each benchmark
* Moved explanation of aggregate metrics to the summary tab
* 1.25.1
Automatically generated by python-semantic-release
* Feat: Use similarity scores if available (#1602)
* Use similarity scores if available
* lint
* Add NanoBEIR Datasets (#1588)
* add NanoClimateFeverRetrieval task, still requires some debugging
* move task to correct place in init file
* add all Nano datasets and results
* format code
* Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py
Co-authored-by: Roman Solomatin <[email protected]>
* pin revision to commit and add datasets to benchmark.py
* create new benchmark for NanoBEIR
* add revision when loading datasets
* lint
---------
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: isaac-chung <[email protected]>
* Update tasks table
* Feat: Evaluate missing languages (#1584)
* init
* fix tests
* update mock retrieval
* update tests
* use subsets instead of langs
* Apply suggestions from code review
Co-authored-by: Isaac Chung <[email protected]>
* fix tests
* add to readme
* rename subset in readme
---------
Co-authored-by: Isaac Chung <[email protected]>
* Add IBM Granite Embedding Models (#1613)
* add IBM granite embedding models
* lint formatting
* add adapted_from and superseded_by to ModelMeta
* fix: disable co2_tracker for API models (#1614)
* 1.25.2
Automatically generated by python-semantic-release
* fix: set `use_instructions` to True in models using prompts (#1616)
feat: set `use_instructions` to True in models using prompts
* 1.25.3
Automatically generated by python-semantic-release
* fix: override existing results (#1617)
* fix override existing results
* lint
* fix tests
* add tests with overwrite
* lint
* update tests
* lint
* update
* lint
* 1.25.4
Automatically generated by python-semantic-release
* add MSMARCO eval split in MTEB English (classic) benchmark (#1620)
* add MSMARCO eval split in MTEB English (classic) benchmark
Fixes #1608
* Add co-author
Co-authored-by: aashka-trivedi <[email protected]>
---------
Co-authored-by: aashka-trivedi <[email protected]>
* fix: GermanDPR Dataset Causes Cross-Encoder Failure Due to Unexpected dict (#1621)
Fixes #1609
* fix: properly add mteb_model_meta to model object (#1623)
* 1.25.5
Automatically generated by python-semantic-release
* Feat: Add jasper (#1591)
* init jasper
* init jasper
* add to overview
* add to overview
* remove some params
* fix max length
* return sdpa
* add dtype
* add dtype
* fix convert_to_tensor
* change to encode
* return whitespace processing
* explicitly add instructions
* move seq length
* try float
* fix max_seq_length
* add prompt validation to format instruction
* don't use instructions only to s2p
* fix: Update results_to_dataframe to use BenchmarkResults class (#1628)
* 1.25.6
Automatically generated by python-semantic-release
* Speed up test_save_predictions (#1631)
* fix: Correction of discrepancies for gte-Qweb model (#1637)
* 1.25.7
Automatically generated by python-semantic-release
* fix: output_folder for co2 evaluation (#1642)
* 1.25.8
Automatically generated by python-semantic-release
* fix: add missing benchmark to benchmarks.py (#1641)
add missing benchmark
* 1.25.9
Automatically generated by python-semantic-release
* fix: Cast all Model2Vec outputs as floats (#1667)
cast all outputs as floats
* 1.25.10
Automatically generated by python-semantic-release
* fix: Update gritlm kwargs (#1643)
* Fix kwarg
* format
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* 1.25.11
Automatically generated by python-semantic-release
* fix: Use batch size kwargs for openai APIs (#1668)
Fixes #1645
* 1.25.12
Automatically generated by python-semantic-release
* fix: Pass trust_remote_code=True to CPM model (#1669)
Fixes #1651
* 1.25.13
Automatically generated by python-semantic-release
* fix: Updated metadata for CPM (#1670)
* fix: Pass trust_remote_code=True to CPM model
Fixes #1651
* fix: Updated metadata for cpm
* 1.25.14
Automatically generated by python-semantic-release
* fix: remove model as a parameter for MulticlassClassification (#1666)
remove model parameter
* fix: Use prompts instead of prompt names for voyage (#1665)
* fix prompt names
* lint
* change input type
* 1.25.15
Automatically generated by python-semantic-release
* fix: Update BUCC dataset revision (#1674)
* trust remote code
* Update revision
* 1.25.16
Automatically generated by python-semantic-release
* fix: Add warning for non-retrieval tasks when using bm25s (#1678)
* clean up install instruction
* add check for bm25s and skip non-retrieval tasks
* add versions
* 1.25.17
Automatically generated by python-semantic-release
* fix: add check for key error in loader (#1675)
* add check for key error
* make KeyError everywhere
* update error
* 1.25.18
Automatically generated by python-semantic-release
* fix: trust remote code for snowflake-arctic-embed-m-v2.0 (#1682)
trust remote code
* 1.25.19
Automatically generated by python-semantic-release
* fix: nomic tensor return (#1683)
* fix nomic tensor return
* add typehint
* 1.25.20
Automatically generated by python-semantic-release
* feat: add `avsolatorio/NoInstruct-small-Embedding-v0` (#1677)
add no_instruct
* fix: arg name for openbmb/MiniCPM-Embedding (#1691)
fix name
* 1.26.0
Automatically generated by python-semantic-release
* fix: add trust_remote_code to Snowflake/snowflake-arctic-embed-m-long (#1695)
trust remote code
* fix: add revision for jinaai/jina-embeddings-v2-small-en (#1692)
add revision
* 1.26.1
Automatically generated by python-semantic-release
* fix: update model loader to trust remote code (#1697)
update model loader
* 1.26.2
Automatically generated by python-semantic-release
* fix: nomic prompts (#1685)
* fix nomic prompts
* fix variable model name
* pass prompts to model
* use sentence transformer wrapper
* update prompts
* lint
* update prompts
* update list for classification
* fix: NanoBeir (#1687)
* fix nano beir
* lint
* 1.26.3
Automatically generated by python-semantic-release
* Update RerankingEvaluator.py (#1702)
* fix: Register MicroLlama Text Embedding (#1644)
Register MicroLlama Text Embedding
* fix: GermanDPR (#1703)
* fix GermanDPR
* lint
* 1.26.4
Automatically generated by python-semantic-release
* Fix: minicpmv2 (#1705)
* upd mini cpm
* flash_attn implementation
* remove flash attn
* ci: Refresh the v2 leaderboard daily (#1711)
* Create leaderboard_refresh.yaml
* Shorten and fix
* factory reset instead of normal
* Fix: typos in adding a model (#1722)
* fix: rollback BUCC revision (#1706)
* fix bucc
* fix logger
* upd evaluator
* add comment
* lint
* 1.26.5
Automatically generated by python-semantic-release
* fix: Added zero shot tag to benchmark (#1710)
* Added method for determining whether a model is zero shot
* Added .items() where intended
* Added filtering functions for zero shot models
* Added zero-shot filtering button and error message when table is empty
* Ran linting
* Fixed docstring linting error
* is_zero_shot returns None when no training data is specified
* Added zero-shot emoji column to leaderboard
* Added explanation for zero shot column
* Added soft and hard zero-shot buttons
* Added training data annotations to 24 models from HuggingFace Hub
* 1.26.6
Automatically generated by python-semantic-release
* feat: reduce logging for load_results()
- redacts missing subsets to avoid 100+ subsets printed
- reduce to logging.info
- removed splits that are commonly never evaluated on and thus also the errors for them being missing
The second part removed quite a few warnings (4930 to XX)
It also seems like the splits were accidentally included in some of the MMTEB benchmark.
This will remove those splits from those benchmarks (which are all in beta). We will have to recompute the tables for the paper though (we should do that anyway)
Other potential thing to consider:
- Scifact is included in MTEB(Medical). I have removed the "train" split from it as I think that was a mistake. (checked other dataset in benchmark)
Here is a count of the current top errors:
```py
{
    "MassiveScenarioClassification: Missing splits {'validation'}": 238,  # included in e.g. mteb(fra)
    "MassiveIntentClassification: Missing splits {'validation'}": 237, # included in e.g. mteb(fra)
    "MassiveScenarioClassification: Missing subsets {'af', 'da', ...} for split test": 230,
    "AmazonReviewsClassification: Missing splits {'validation'}": 229, # included in e.g. mteb(deu)
    "MassiveIntentClassification: Missing subsets {'af', 'da', ...} for split test": 228,
    "STS22: Missing subsets {'fr-pl', 'de-en', ...} for split test": 223,
    "AmazonReviewsClassification: Missing subsets {'es', 'ja', ...} for split test": 196,
    "MTOPDomainClassification: Missing splits {'validation'}": 195, # included in mteb(fra)
    "MTOPIntentClassification: Missing splits {'validation'}": 194, # included in mteb(fra)
    "AmazonCounterfactualClassification: Missing splits {'validation'}": 189, # included in mteb(deu)
    "MTOPDomainClassification: Missing subsets {'es', 'th', ...} for split test": 165,
    "STS17: Missing subsets {'en-ar', 'es-es', ...} for split test": 164,
    "MTOPIntentClassification: Missing subsets {'es', 'th', ...} for split test": 164,
    "AmazonCounterfactualClassification: Missing subsets {'de', 'ja', ...} for split test": 148,
}
```
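A minimal sketch of how a count like the one above could be produced, assuming the warnings are available as (task name, missing subsets) pairs. The helper names `redact_subsets` and `count_messages` and the sample data are hypothetical, not the functions used in `load_results()`; sorting the subset names before truncating is also an assumption.

```python
from collections import Counter


def redact_subsets(subsets, keep=2):
    """Render a set of subset names, showing at most `keep` of them.

    Hypothetical helper: names are sorted so the output is deterministic.
    """
    names = sorted(subsets)
    shown = ", ".join(repr(name) for name in names[:keep])
    if len(names) > keep:
        shown += ", ..."
    return "{" + shown + "}"


def count_messages(messages):
    """Return (message, count) pairs, most frequent first."""
    return Counter(messages).most_common()


# Illustrative data only, not real results.
missing = [
    ("MassiveIntentClassification", {"af", "da", "de", "el"}),
    ("MassiveIntentClassification", {"af", "da", "de", "el"}),
    ("STS22", {"de-en", "fr-pl", "ar"}),
]
messages = [
    f"{task}: Missing subsets {redact_subsets(subsets)} for split test"
    for task, subsets in missing
]
top_errors = count_messages(messages)
```

Redacting before counting means results that differ only in which long subset list is missing can collapse into one entry, which keeps the summary short.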
* 1.27.0
Automatically generated by python-semantic-release
* feat: Add nomic modern bert (#1684)
* add nomic modern bert
* use SentenceTransformerWrapper
* use SentenceTransformerWrapper
* try nomic wrapper
* update
* use all prompts
* pass prompts
* use fp16
* lint
* change to version
* remove commented code
* fix: allow kwargs in init for RerankingWrapper (#1676)
* allow kwargs in init
* fix retrieval
* convert corpus_in_pair to list
* 1.28.0
Automatically generated by python-semantic-release
* Fixed result loading on leaderboard (#1739)
* Only main_score gets loaded for leaderboard thereby avoiding OOM errors
* Fixed plot failing because of missing embedding dimensions
* Ran linting
* test: Add script to test model loading below n_parameters threshold (#1698)
* add model loading test for models below 2B params
* add failure message to include model name
* use the real get_model_meta
* use cache folder
* teardown per function
* fix directory removal
* write to file
* wip loading from before
* wip
* Rename model_loading_testing.py to model_loading.py
* Delete tests/test_models/test_model_loading.py
* checks for models below 2B
* try not using cache folder
* update script with scan_cache_dir and add args
* add github CI: detect changed model files and run model loading test
* install all model dependencies
* dependency installations and move file location
* should trigger a model load test in CI
* find correct commit for diff
* explicitly fetch base branch
* add make command
* try to run in python instead and add pytest
* fix attribute error and add read mode
* separate script calling
* let pip install be cached and specify repo path
* check ancestry
* add cache and rebase
* try to merge instead of rebase
* try without merge base
* check if file exists first
* Apply suggestions from code review
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Update .github/workflows/model_loading.yml
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* address review comments to run test once from CI and not pytest
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* fix: Leaderboard Speedup (#1745)
* Added get_scores_fast
* Made leaderboard faster with smarter dependency graph and event management and caching
* Changed print to logger.info
* 1.28.1
Automatically generated by python-semantic-release
* fix: Fixed task_type aggregation on leaderboard (#1746)
* Fixed task_type aggregation in leaderboard
* Fixed an error due to unnecessary indentation in get_score
* 1.28.2
Automatically generated by python-semantic-release
* fix: Fixed definition of zero-shot in ModelMeta (#1747)
* Corrected zero_shot definition to be based on task names, not dataset path
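A rough sketch of the zero-shot rule described in #1710 and #1747, assuming training data is annotated as a set of task names. The standalone function and its parameters are hypothetical and do not reproduce the `ModelMeta` method signature.

```python
from __future__ import annotations


def is_zero_shot(
    training_tasks: set[str] | None, benchmark_tasks: set[str]
) -> bool | None:
    """Hypothetical zero-shot check based on task names.

    Returns None when no training data is specified, True when none of the
    benchmark tasks were trained on, and False otherwise.
    """
    if training_tasks is None:
        return None
    return training_tasks.isdisjoint(benchmark_tasks)


# Illustrative annotations only.
unknown = is_zero_shot(None, {"STS22"})
clean = is_zero_shot({"MSMARCO"}, {"STS22"})
leaked = is_zero_shot({"STS22"}, {"STS22", "MSMARCO"})
```

Comparing task names rather than dataset paths keeps the check aligned with how the annotations are written, since several tasks can share one dataset path.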
* 1.28.3
Automatically generated by python-semantic-release
* fix: fixes implementation of similarity() (#1748)
* fix(#1594): fixes implementation of similarity()
* fix: add similarity to SentenceTransformerWrapper
---------
Co-authored-by: sam021313 <[email protected]>
* 1.28.4
Automatically generated by python-semantic-release
* fix: Leaderboard: `K` instead of `M` (#1761)
Fixes #1752
* other: add script for leaderboard compare (#1758)
* add script
* remove changes
* remove changes
* add comment
* lint
* order like in benchmark object
* round results
* 1.28.5
Automatically generated by python-semantic-release
* fix: added annotations for training data (#1742)
* fix: Added annotations for arctic embed models
* added google and bge
* added cohere
* Added e5
* added bge based model2vec
* annotated oAI
* format and update annotations
* 1.28.6
Automatically generated by python-semantic-release
* fix: update max tokens for OpenAI (#1772)
update max tokens
* ci: skip AfriSentiLID for now (#1785)
* skip AfriSentiLID for now
* skip relevant test case instead
---------
Co-authored-by: Isaac Chung <[email protected]>
* 1.28.7
Automatically generated by python-semantic-release
* ci: fix model loading test (#1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file
---------
Co-authored-by: Isaac Chung <[email protected]>
* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
* feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# before the fix, filtering by English also kept cross-lingual subsets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
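The difference between the two filtering modes can be sketched in plain Python, assuming each subset maps to the list of languages it contains. The function `filter_subsets`, the mapping, and the simplified language codes are hypothetical and only illustrate the idea, not the mteb implementation.

```python
def filter_subsets(subset_langs, languages, exclusive=False):
    """Select subsets by language (hypothetical helper).

    Non-exclusive: keep a subset if it contains any requested language.
    Exclusive: keep a subset only if all of its languages were requested.
    """
    wanted = set(languages)
    selected = []
    for subset, langs in subset_langs.items():
        langs = set(langs)
        if exclusive:
            keep = langs <= wanted
        else:
            keep = bool(langs & wanted)
        if keep:
            selected.append(subset)
    return selected


# Illustrative STS22-like mapping with simplified language codes.
sts22_like = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}

loose = filter_subsets(sts22_like, ["eng"])
strict = filter_subsets(sts22_like, ["eng"], exclusive=True)
```

Here `loose` keeps every subset that mentions English, which mirrors the old behaviour, while `strict` keeps only the monolingual English subset.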
* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas
* 1.29.0
Automatically generated by python-semantic-release
* fix: Added C-MTEB (#1786)
Added C-MTEB
* 1.29.1
Automatically generated by python-semantic-release
* docs: Add contact to MMTEB benchmarks (#1796)
* Add myself to MMTEB benchmarks
* lint
* fix: loading pre 11 (#1798)
* fix loading pre 11
* add similarity
* lint
* run all task types
* 1.29.2
Automatically generated by python-semantic-release
* fix: allow to load no revision available (#1801)
* fix allow to load no revision available
* lint
* add require_model_meta to leaderboard
* lint
* 1.29.3
Automatically generated by python-semantic-release
* fix: Zero shot and aggregation on Leaderboard (#1810)
* Made join_revision filter out no_revision_available when other revisions have been run on the task
* Fixed zero-shot filtering
* Fixed aggregation of task types
* Ran linting
* fix: Added `ModelMeta` for BGE, GTE Chinese and multilingual models (#1811)
* Added BGE Chinese and multilingual-gemma models
* Added GTE multilingual and Chinese models
* Fixed date format
* 1.29.4
Automatically generated by python-semantic-release
* fix: Add additional contacts (#1817)
add contacts from #1790
* Update points table
* 1.29.5
Automatically generated by python-semantic-release
* fix: Added more Chinese models' `ModelMeta` (#1814)
* Added Multilingual USE models
* Added Moka models
* Added dmeta models
* Added jina-zh
* Added piccolo models
* 1.29.6
Automatically generated by python-semantic-release
* Add model inf-retriever-v1 (#1744)
* feat(models): add infly/inf-retriever-v1 model metadata
- Add inf_models.py file with metadata for infly/inf-retriever-v1 model
- Update overview.py to include inf_models in model imports
* Reformat code
* Update inf-retriever-v1 ModelMeta
* Fill more information for inf-retriever-v1
* Add license information for inf-retriever-v1
---------
Co-authored-by: Samuel Yang <[email protected]>
* ci: only return 1 model_name per file (#1818)
* only return 1 model_name per file
* fix args parse
* revert test change
* fix: add bge-m3 `ModelMeta` (#1821)
add bge
* 1.29.7
Automatically generated by python-semantic-release
* fix: Added Chinese Stella models (#1824)
Added Chinese Stella models
* fix: bm25s (#1827)
Co-authored-by: sam021313 <[email protected]>
* fix: Added way more training dataset annotations (#1765)
* fix: Leaderboard: `K` instead of `M`
Fixes #1752
* format
* fixed existing annotations to refer to task name instead of hf dataset
* added annotation to nvidia
* added voyage
* added uae annotations
* Added stella annotations
* sentence trf models
* added salesforce and e5
* jina
* bge + model2vec
* added llm2vec annotations
* add jasper
* format
* format
* Updated annotations and moved jina models
* fix: add even more training dataset annotations (#1793)
* fix: update max tokens for OpenAI (#1772)
update max tokens
* ci: skip AfriSentiLID for now (#1785)
* skip AfriSentiLID for now
* skip relevant test case instead
---------
Co-authored-by: Isaac Chung <[email protected]>
* 1.28.7
Automatically generated by python-semantic-release
* ci: fix model loading test (#1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file
---------
Co-authored-by: Isaac Chung <[email protected]>
* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
* feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# before the fix, filtering by English also kept cross-lingual subsets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas
* fix: Add gritlm
* 1.29.0
Automatically generated by python-semantic-release
* fix: Added more annotations!
* fix: Added C-MTEB (#1786)
Added C-MTEB
* 1.29.1
Automatically generated by python-semantic-release
* docs: Add contact to MMTEB benchmarks (#1796)
* Add myself to MMTEB benchmarks
* lint
* fix: loading pre 11 (#1798)
* fix loading pre 11
* add similarity
* lint
* run all task types
* 1.29.2
Automatically generated by python-semantic-release
* fix: allow to load no revision available (#1801)
* fix allow to load no revision available
* lint
* add require_model_meta to leaderboard
* lint
* 1.29.3
Automatically generated by python-semantic-release
---------
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
---------
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
* fix: Added Misc Chinese models (#1819)
* Added moka and piccolo models to overview file
* Added Text2Vec models
* Added various Chinese embedding models
---------
Co-authored-by: Isaac Chung <[email protected]>
* 1.29.8
Automatically generated by python-semantic-release
* fix: Fixed eval split for MultilingualSentiment in C-MTEB (#1804)
* Fixed eval split for MultilingualSentiment in C-MTEB
* Fixed splits for atec, bq and stsb in C-MTEB
* 1.29.9
Automatically generated by python-semantic-release
* fix: subsets to run (#1830)
* fix split evals
* add test
* lint
* fix moka
* add assert
* fix: Remove default params, `public_training_data` and `memory usage` in `ModelMeta` (#1794)
* fix: Leaderboard: `K` instead of `M`
Fixes #1752
* format
* fixed existing annotations to refer to task name instead of hf dataset
* added annotation to nvidia
* added voyage
* added uae annotations
* Added stella annotations
* sentence trf models
* added salesforce and e5
* jina
* bge + model2vec
* added llm2vec annotations
* add jasper
* format
* format
* Updated annotations and moved jina models
* make models parameters needed to be filled
* fix tests
* remove comments
* remove model meta from test
* fix model meta from split
* fix: add even more training dataset annotations (#1793)
* fix: update max tokens for OpenAI (#1772)
update max tokens
* ci: skip AfriSentiLID for now (#1785)
* skip AfriSentiLID for now
* skip relevant test case instead
---------
Co-authored-by: Isaac Chung <[email protected]>
* 1.28.7
Automatically generated by python-semantic-release
* ci: fix model loading test (#1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file
---------
Co-authored-by: Isaac Chung <[email protected]>
* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (#1787)
* feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# before the fix, filtering by English also kept cross-lingual subsets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']
# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas
* fix: Add gritlm
* 1.29.0
Automatically generated by python-semantic-release
* fix: Added more annotations!
* fix: Added C-MTEB (#1786)
Added C-MTEB
* 1.29.1
Automatically generated by python-semantic-release
* docs: Add contact to MMTEB benchmarks (#1796)
* Add myself to MMTEB benchmarks
* lint
* fix: loading pre 11 (#1798)
* fix loading pre 11
* add similarity
* lint
* run all task types
* 1.29.2
Automatically generated by python-semantic-release
* fix: allow to load no revision available (#1801)
* fix allow to load no revision available
* lint
* add require_model_meta to leaderboard
* lint
* 1.29.3
Automatically generated by python-semantic-release
---------
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
* fix merges
* update models info
* change public_training_code to str
* change `public_training_code=False` to None
* remove annotations
* remove annotations
* remove changed annotations
* remove changed annotations
* remove `public_training_data` and `memory usage`
* make framework not optional
* make framework non-optional
* empty frameworks
* add framework
* fix tests
* Update mteb/models/overview.py
Co-authored-by: Isaac Chung <[email protected]>
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
* 1.29.10
Automatically generated by python-semantic-release
* fix: Add reported annotation and re-added public_training_data (#1846)
* fix: Add additional dataset annotations
* fix: readded public training data
* update voyage annotations
* 1.29.11
Automatically generated by python-semantic-release
* fix: Leaderboard Refinements (#1849)
* Added better descriptions to benchmarks and removed beta tags
* Fixed zero-shot filtering on app loading
* Added zero-shot definition in an accordion
* NaN values are now filled with blank
* Added type hints to filter_models
* 1.29.12
Automatically generated by python-semantic-release
* rest of the merge conflicts
* fix merge conflicts
* fill in model meta defaults
* fix ModelMeta modalities
* fix metadata pydantic errors
* assert model.model instead since it is a wrapper
* fix: Fixed leaderboard search bar (#1852)
Fixed leaderboard search bar
* 1.29.13
Automatically generated by python-semantic-release
* fix: Hotfixed public_training_data type annotation (#1857)
Fixed public_training_data flag type to include boolean, as this is how all models are annotated
* fix: Fix zeta alpha mistral (#1736)
* fix zeta alpha mistral
* update use_instructions
* update training datasets
* Update mteb/models/e5_instruct.py
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* update float
* Update mteb/models/e5_instruct.py
---------
Co-authored-by: Kenneth Enevoldsen <[email protected]>
* Add more annotations (#1833)
* apply additions from #1794
* add annotations for rumodels
* add nomic training data
* fix metadata
* update rest of model meta
* fix bge reranker
* 1.29.14
Automatically generated by python-semantic-release
* fix: Adding missing model meta (#1856)
* Added CDE models
* Added bge-en-icl
* Updated CDE to bge_full_data
* Fixed public_training_data flag type to include boolean, as this is how all models are annotated
* Added public training data link instead of bool to CDE and BGE
* Added GME models
* Changed Torch to PyTorch
* Added metadata on LENS models
* Added ember_v1
* Added metadata for amazon titan
* Removed GME implementation
* fix Encoder class
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Helena Kloosterman <[email protected]>
Co-authored-by: Alexey Vatolin <[email protected]>
Co-authored-by: Kenneth Enevoldsen <[email protected]>
Co-authored-by: Elias H <[email protected]>
Co-authored-by: Youngjoon Jang <[email protected]>
Co-authored-by: Márton Kardos <[email protected]>
Co-authored-by: Roman Solomatin <[email protected]>
Co-authored-by: Napuh <[email protected]>
Co-authored-by: Daniel Buades Marcos <[email protected]>
Co-authored-by: nadshe <[email protected]>
Co-authored-by: olivierr42 <[email protected]>
Co-authored-by: Thivyanth <[email protected]>
Co-authored-by: Rafał Poświata <[email protected]>
Co-authored-by: Omar Elshehy <[email protected]>
Co-authored-by: Omar Elshehy <[email protected]>
Co-authored-by: Sam <[email protected]>
Co-authored-by: sam021313 <[email protected]>
Co-authored-by: KGupta10 <[email protected]>
Co-authored-by: Aashka Trivedi <[email protected]>
Co-authored-by: Niklas Muennighoff <[email protected]>
Co-authored-by: chenghao xiao <[email protected]>
Co-authored-by: Ken Wang <[email protected]>
Co-authored-by: Orion Weller <[email protected]>
Co-authored-by: Isaac Chung <[email protected]>
Co-authored-by: Samuel Yang <[email protected]>
Co-authored-by: Samuel Yang <[email protected]>