[dev to main] v1.3.0 #55

Status: Merged (71 commits, Aug 20, 2024)

Commits:
cd5947f
Implement TransformersEmbedder to support data parallelism
lsz05 Jul 1, 2024
6f4b444
Add output format choices
lsz05 Jul 1, 2024
e29e5ae
Add test cases
lsz05 Jul 1, 2024
f99743c
Remove unnecessary comment
lsz05 Jul 2, 2024
b36b8a5
Fix
lsz05 Jul 2, 2024
5e72df0
Merge branch 'feature/parallel' of github.com:sbintuitions/JMTEB into…
lsz05 Jul 2, 2024
94dc67a
Merge pull request #38 from sbintuitions/feature/parallel
lsz05 Jul 2, 2024
a78d93f
Make leaderboard
lsz05 Jul 10, 2024
f35285b
Add a link to leaderboard in README
lsz05 Jul 10, 2024
8057f3f
Fix linking
lsz05 Jul 10, 2024
25fd21b
Fix linking
lsz05 Jul 10, 2024
094bdc5
Merge pull request #40 from sbintuitions/leaderboard
lsz05 Jul 10, 2024
69f4c3a
Support model_kwargs setting in TextEmbedder, and support torch dtype…
lsz05 Jul 10, 2024
d4725b0
Minor fix
lsz05 Jul 10, 2024
20c25fa
Add test cases for bf16
lsz05 Jul 10, 2024
c446606
Fix lint
lsz05 Jul 10, 2024
8c7727a
Rename function
lsz05 Jul 12, 2024
a0d31d8
Rename function
lsz05 Jul 12, 2024
fe590c1
Merge pull request #41 from sbintuitions/feature/model_kwargs
masaya-ohagi Jul 12, 2024
fdc7cdf
Add prediction logging for classification and retrieval
lsz05 Jul 24, 2024
fea50d5
fix EvaluatonResults
lsz05 Jul 24, 2024
624ceeb
Fix embedding format in reranking
lsz05 Jul 26, 2024
86eef3d
Add an argument to control how many predicted docs are logged
lsz05 Jul 26, 2024
f5fe5d3
fix imports and tests
lsz05 Jul 26, 2024
957b8f6
Implement prediction logging in STS
lsz05 Jul 26, 2024
eea32dd
Implement prediction logging in clustering
lsz05 Jul 26, 2024
1d53001
give up implementing prediction logging in pair classification
lsz05 Jul 26, 2024
2da792a
Add prediction logging for reranking
lsz05 Jul 29, 2024
8890f3b
Add an option to output predictions for all datasets
lsz05 Jul 29, 2024
2eac1d7
Update README
lsz05 Jul 29, 2024
ba87d84
Update README format
lsz05 Jul 29, 2024
161d40a
Merge pull request #45 from sbintuitions/fix/reranking_tensor
lsz05 Jul 30, 2024
19d16d1
Fix README
lsz05 Jul 30, 2024
5ad6f69
ignore MD028
lsz05 Jul 30, 2024
a3a45f1
Merge pull request #43 from sbintuitions/feature/log_predictions
lsz05 Jul 31, 2024
ae8d872
Fix bfloat bug
lsz05 Jul 31, 2024
7881bd7
Merge pull request #47 from sbintuitions/fix/sts_bfloat16_tensor
lsz05 Jul 31, 2024
8d25b3c
hot-fix to OOM error with multi-GPUs
akiFQC Aug 5, 2024
c0ef36c
format
akiFQC Aug 5, 2024
9d30622
update CI python ver.
akiFQC Aug 5, 2024
683fbd8
del a linel
akiFQC Aug 5, 2024
0810589
Merge pull request #51 from sbintuitions/fix/multi_gpu_rerank_retrieval
lsz05 Aug 5, 2024
4ffd114
use sbert embedder with encode_multi_process
akiFQC Aug 7, 2024
7e8e031
add chunk_size_factor
akiFQC Aug 7, 2024
6fb4a6e
fix chunk_size_factor
akiFQC Aug 7, 2024
c361727
small fix chunk_size
akiFQC Aug 7, 2024
2809731
format
akiFQC Aug 7, 2024
7744381
Add saving predictions to jsonl
lsz05 Aug 7, 2024
58644a4
Fix lint
lsz05 Aug 7, 2024
dc96352
Merge pull request #52 from sbintuitions/fix/prediction_logging_to_jsonl
akiFQC Aug 8, 2024
7adb030
Merge remote-tracking branch 'upstream/dev' into improve/batch_size_s…
akiFQC Aug 8, 2024
62368bc
add: code and tests of multi-gpu inference with pytorch DP
akiFQC Aug 8, 2024
b7376c0
update init
akiFQC Aug 8, 2024
9508f3c
debug DP
akiFQC Aug 8, 2024
b9a50c6
revert sbert embedder
akiFQC Aug 8, 2024
c6f079a
format
akiFQC Aug 8, 2024
4261cf5
find_executable_batch_size
akiFQC Aug 8, 2024
39f98a3
add comment
akiFQC Aug 8, 2024
abe4f88
debug
akiFQC Aug 9, 2024
61aa4da
fix to review
akiFQC Aug 9, 2024
79a6c8b
update
akiFQC Aug 9, 2024
56f415d
del unused import
akiFQC Aug 9, 2024
ca71155
Merge pull request #53 from sbintuitions/improve/batch_size_setting
akiFQC Aug 13, 2024
4694d36
Version bump-up to v1.3.0
lsz05 Aug 19, 2024
0496926
Merge pull request #54 from sbintuitions/v1.3.0
lsz05 Aug 19, 2024
864c3dd
Update README
lsz05 Aug 20, 2024
e9ba66f
fix lint
lsz05 Aug 20, 2024
b51faeb
fix lint
lsz05 Aug 20, 2024
0737287
Add note for batch size
lsz05 Aug 20, 2024
a22291f
fix lint
lsz05 Aug 20, 2024
89fe77e
Merge pull request #56 from sbintuitions/v1.3.0
lsz05 Aug 20, 2024
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
@@ -21,7 +21,7 @@ jobs:
   run-tests:
     runs-on: ubuntu-latest
     env:
-      PYTHON_VERSION: "3.9"
+      PYTHON_VERSION: "3.10"
       NO_CACHE: ${{ github.event.inputs.no-cache || 'false' }}
     steps:
       - name: Checkout
@@ -53,7 +53,7 @@ jobs:
   lint_check:
     runs-on: ubuntu-latest
     env:
-      PYTHON_VERSION: "3.9"
+      PYTHON_VERSION: "3.10"
     steps:
       - uses: actions/checkout@v3

3 changes: 2 additions & 1 deletion .markdownlint.yaml
@@ -1,3 +1,4 @@
 MD013: false
 MD040: false
-MD025: false
+MD025: false
+MD028: false
40 changes: 39 additions & 1 deletion README.md
Review comment (Collaborator):

Please make two updates:

  1. Emphasize the description of the leaderboard (right now it does not stand out, so it is easy to overlook).
  2. Add a note about multi-GPU support.

@@ -4,6 +4,8 @@

This is an easy-to-use evaluation script designed for JMTEB evaluation.

The JMTEB leaderboard is [here](leaderboard.md). Guidance on submissions is coming soon.

## Quick start

```bash
@@ -38,4 +40,40 @@ poetry run python -m jmteb \
```

> [!NOTE]
-> Some tasks (e.g., AmazonReviewClassification in classification, JAQKET and Mr.TyDi-ja in retrieval, esci in reranking) are time-consuming and memory-consuming. Heavy retrieval tasks take hours to encode the large corpus, and use much memory for the storage of such vectors. If you want to exclude them, add `--eval_exclude "['amazon_review_classification', 'mrtydi', 'jaqket', 'esci']"`.
+> Some tasks (e.g., AmazonReviewClassification in classification, JAQKET and Mr.TyDi-ja in retrieval, esci in reranking) are time- and memory-consuming. Heavy retrieval tasks take hours to encode their large corpora and use a lot of memory to store the resulting vectors. To exclude them, add `--eval_exclude "['amazon_review_classification', 'mrtydi', 'jaqket', 'esci']"`. Similarly, you can use `--eval_include` to run only the evaluation datasets you want.
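
The include/exclude behaviour described in the note above can be pictured with the minimal Python sketch below. It is not JMTEB's implementation: `select_datasets` is a hypothetical helper, and applying `include` before `exclude` when both are given is an assumption of this sketch; the real CLI may resolve the two options differently.

```python
from typing import Iterable, List, Optional


def select_datasets(
    all_datasets: Iterable[str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper mimicking --eval_include / --eval_exclude.

    Assumption: `include` is applied first, then `exclude`.
    """
    selected = list(all_datasets)
    if include is not None:
        wanted = set(include)
        # Keep only the datasets that were explicitly requested.
        selected = [name for name in selected if name in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        # Drop the datasets that were explicitly excluded.
        selected = [name for name in selected if name not in unwanted]
    return selected


all_names = ["jsts", "jsick", "amazon_review_classification", "mrtydi", "jaqket", "esci"]
heavy = ["amazon_review_classification", "mrtydi", "jaqket", "esci"]
print(select_datasets(all_names, exclude=heavy))  # ['jsts', 'jsick']
print(select_datasets(all_names, include=["jsts"]))  # ['jsts']
```

Order is preserved from the full dataset list, so the filtered run evaluates datasets in the same sequence as an unfiltered one.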

> [!NOTE]
> To log model predictions for further analysis of your model's performance, use `--log_predictions true`, which enables prediction logging in all evaluators. Whether to log predictions can also be set individually in each evaluator's config.

## Multi-GPU support

There are two ways to enable multi-GPU evaluation.

* Use the new class `DPSentenceBertEmbedder` ([here](src/jmteb/embedders/data_parallel_sbert_embedder.py)). For example,

```bash
poetry run python -m jmteb \
--evaluators "src/configs/tasks/jsts.jsonnet" \
--embedder DPSentenceBertEmbedder \
--embedder.model_name_or_path "<model_name_or_path>" \
--save_dir "output/<model_name_or_path>"
```

* With `torchrun`, multi-GPU evaluation is available in [`TransformersEmbedder`](src/jmteb/embedders/transformers_embedder.py). For example,

```bash
MODEL_NAME=<model_name_or_path>
GPUS_PER_NODE=<number_of_gpus>
MODEL_KWARGS="\{\'torch_dtype\':\'torch.bfloat16\'\}"
torchrun \
--nproc_per_node=$GPUS_PER_NODE --nnodes=1 \
src/jmteb/__main__.py --embedder TransformersEmbedder \
--embedder.model_name_or_path ${MODEL_NAME} \
--embedder.pooling_mode cls \
--embedder.batch_size 4096 \
--embedder.model_kwargs ${MODEL_KWARGS} \
--embedder.max_seq_length 512 \
--save_dir "output/${MODEL_NAME}" \
--evaluators src/jmteb/configs/jmteb.jsonnet
```

Note that the batch size here is the global batch size (`per_device_batch_size` × `n_gpu`).
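
As a quick check of the global-batch-size note: with the `--embedder.batch_size 4096` used above and, say, 8 GPUs (a hypothetical node size), each GPU would process 4096 / 8 = 512 inputs per step. The helper below is a hypothetical illustration of that arithmetic, not part of JMTEB; rejecting batch sizes that do not divide evenly across GPUs is a simplifying assumption of this sketch.

```python
def per_device_batch_size(global_batch_size: int, n_gpu: int) -> int:
    """Hypothetical helper: invert global = per_device * n_gpu."""
    if n_gpu <= 0:
        raise ValueError("n_gpu must be positive")
    if global_batch_size % n_gpu != 0:
        # Assumption of this sketch: require an even split across GPUs.
        raise ValueError(
            f"global batch size {global_batch_size} is not divisible by {n_gpu} GPUs"
        )
    return global_batch_size // n_gpu


print(per_device_batch_size(4096, 8))  # 512
```

Conversely, to keep a per-GPU batch of 512 on 4 GPUs, you would pass `--embedder.batch_size 2048`.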
62 changes: 62 additions & 0 deletions docs/results/MU-Kindai/Japanese-DiffCSE-BERT-base/summary.json
@@ -0,0 +1,62 @@
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7809527709426081
},
"amazon_review_classification": {
"macro_f1": 0.5155899232320224
},
"massive_intent_classification": {
"macro_f1": 0.7879373479249787
},
"massive_scenario_classification": {
"macro_f1": 0.8662625888023707
}
},
"Reranking": {
"esci": {
"ndcg@10": 0.9095168116460639
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.42314124780036416
},
"jaqket": {
"ndcg@10": 0.36199154051747723
},
"mrtydi": {
"ndcg@10": 0.07810683176415421
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.6077212544951452
},
"nlp_journal_title_abs": {
"ndcg@10": 0.6433890489201118
},
"nlp_journal_title_intro": {
"ndcg@10": 0.39317174536190913
}
},
"STS": {
"jsick": {
"spearman": 0.754165277432144
},
"jsts": {
"spearman": 0.7558202366183716
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.4966545453348478
},
"mewsc16": {
"v_measure_score": 0.3877356318022785
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.6237623762376237
}
}
}
62 changes: 62 additions & 0 deletions docs/results/MU-Kindai/Japanese-MixCSE-BERT-base/summary.json
@@ -0,0 +1,62 @@
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.776174162517931
},
"amazon_review_classification": {
"macro_f1": 0.5085781180553806
},
"massive_intent_classification": {
"macro_f1": 0.7718541530739129
},
"massive_scenario_classification": {
"macro_f1": 0.8592571786794985
}
},
"Reranking": {
"esci": {
"ndcg@10": 0.9100551950168166
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.42368135774043536
},
"jaqket": {
"ndcg@10": 0.37721850397542034
},
"mrtydi": {
"ndcg@10": 0.07878085186566607
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.636999375405723
},
"nlp_journal_title_abs": {
"ndcg@10": 0.6413498649875696
},
"nlp_journal_title_intro": {
"ndcg@10": 0.397250919496823
}
},
"STS": {
"jsick": {
"spearman": 0.7756925231422259
},
"jsts": {
"spearman": 0.7652968548841591
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.5262387436934941
},
"mewsc16": {
"v_measure_score": 0.37277574537292835
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.623321554770318
}
}
}
62 changes: 62 additions & 0 deletions docs/results/MU-Kindai/Japanese-SimCSE-BERT-base-sup/summary.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7619809437515043
},
"amazon_review_classification": {
"macro_f1": 0.5205592432502059
},
"massive_intent_classification": {
"macro_f1": 0.7789367871593064
},
"massive_scenario_classification": {
"macro_f1": 0.8490320705866646
}
},
"Reranking": {
"esci": {
"ndcg@10": 0.9065584234991577
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.4411487123884245
},
"jaqket": {
"ndcg@10": 0.39613283459361814
},
"mrtydi": {
"ndcg@10": 0.08154879873415645
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.6276035246534508
},
"nlp_journal_title_abs": {
"ndcg@10": 0.5838785018803183
},
"nlp_journal_title_intro": {
"ndcg@10": 0.3489329387182086
}
},
"STS": {
"jsick": {
"spearman": 0.7463567093877269
},
"jsts": {
"spearman": 0.7468283806971927
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.41041888940251137
},
"mewsc16": {
"v_measure_score": 0.45175891401665724
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.6236711552090717
}
}
}
@@ -0,0 +1,62 @@
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7619809437515043
},
"amazon_review_classification": {
"macro_f1": 0.5152108946679324
},
"massive_intent_classification": {
"macro_f1": 0.7895128475562229
},
"massive_scenario_classification": {
"macro_f1": 0.865430249169577
}
},
"Reranking": {
"esci": {
"ndcg@10": 0.9115815294581953
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.47387768939865055
},
"jaqket": {
"ndcg@10": 0.3956683977353904
},
"mrtydi": {
"ndcg@10": 0.1144234568266308
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.6416096544574569
},
"nlp_journal_title_abs": {
"ndcg@10": 0.7023477497744102
},
"nlp_journal_title_intro": {
"ndcg@10": 0.4536720868647063
}
},
"STS": {
"jsick": {
"spearman": 0.781770693640686
},
"jsts": {
"spearman": 0.7680617109850311
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.5301620892693397
},
"mewsc16": {
"v_measure_score": 0.4034776723308173
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.6238078417520311
}
}
}
62 changes: 62 additions & 0 deletions docs/results/MU-Kindai/Japanese-SimCSE-BERT-large-sup/summary.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
{
"Classification": {
"amazon_counterfactual_classification": {
"macro_f1": 0.7725250131648236
},
"amazon_review_classification": {
"macro_f1": 0.5341627023771393
},
"massive_intent_classification": {
"macro_f1": 0.7682863192709365
},
"massive_scenario_classification": {
"macro_f1": 0.8639396658321546
}
},
"Reranking": {
"esci": {
"ndcg@10": 0.9094717381883379
}
},
"Retrieval": {
"jagovfaqs_22k": {
"ndcg@10": 0.47038430326303626
},
"jaqket": {
"ndcg@10": 0.44101304795602897
},
"mrtydi": {
"ndcg@10": 0.11429128335865787
},
"nlp_journal_abs_intro": {
"ndcg@10": 0.43434267808785576
},
"nlp_journal_title_abs": {
"ndcg@10": 0.6240651697600803
},
"nlp_journal_title_intro": {
"ndcg@10": 0.3651687833824759
}
},
"STS": {
"jsick": {
"spearman": 0.787528927058734
},
"jsts": {
"spearman": 0.7781413957931619
}
},
"Clustering": {
"livedoor_news": {
"v_measure_score": 0.48448646364489634
},
"mewsc16": {
"v_measure_score": 0.43168522818790694
}
},
"PairClassification": {
"paws_x_ja": {
"binary_f1": 0.6235418875927891
}
}
}
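
Each `summary.json` above maps a task category to its datasets, and each dataset to a single metric (`macro_f1`, `ndcg@10`, `spearman`, `v_measure_score`, or `binary_f1`). Below is a minimal sketch of loading such a file and averaging the scores within each category. `load_summary` and `category_averages` are hypothetical helpers, and an unweighted per-category mean is an assumption, so the numbers may not match how the leaderboard aggregates results.

```python
import json
from pathlib import Path
from statistics import mean
from typing import Dict

# category -> dataset -> metric name -> score
Summary = Dict[str, Dict[str, Dict[str, float]]]


def load_summary(path: str) -> Summary:
    """Read a file such as docs/results/MU-Kindai/Japanese-DiffCSE-BERT-base/summary.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def category_averages(summary: Summary) -> Dict[str, float]:
    """Unweighted mean of every metric value within each task category."""
    averages = {}
    for category, datasets in summary.items():
        scores = [score for metrics in datasets.values() for score in metrics.values()]
        averages[category] = mean(scores)
    return averages


example = {
    "STS": {"jsick": {"spearman": 0.25}, "jsts": {"spearman": 0.75}},
    "Reranking": {"esci": {"ndcg@10": 0.9}},
}
print(category_averages(example))  # {'STS': 0.5, 'Reranking': 0.9}
```

Applied to one of the real files, `category_averages(load_summary(path))` would, for instance, collapse the four classification `macro_f1` scores into a single Classification number.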