
Problem reproducing Roberta-Large and ELECTRA-Large #46

Open

yiyaxiaozhi opened this issue Mar 20, 2022 · 1 comment

Comments


yiyaxiaozhi commented Mar 20, 2022

My environment is:
pytorch 1.4.0
transformers 2.8.0
Following the training command in the documentation at https://github.com/thunlp/OpenMatch/blob/master/docs/experiments-msmarco.md:

CUDA_VISIBLE_DEVICES=0 \
python train.py \
        -task ranking \
        -model bert \
        -train ./data/train.jsonl \
        -max_input 3000000 \
        -save ./checkpoints/electra_large.bin \
        -dev queries=./data/queries.dev.small.tsv,docs=./data/collection.tsv,qrels=./data/qrels.dev.small.tsv,trec=./data/run.msmarco-passage.dev.small.100.trec \
        -qrels ./data/qrels.dev.small.tsv \
        -vocab google/electra-large-discriminator \
        -pretrain google/electra-large-discriminator \
        -res ./results/electra_large.trec \
        -metric mrr_cut_10 \
        -max_query_len 32 \
        -max_doc_len 256 \
        -epoch 1 \
        -batch_size 2 \
        -lr 5e-6 \
        -eval_every 10000

At around global step ~180k (local step ~720k), the validation MRR starts dropping steadily from 0.33, and by the end of the whole training run the best MRR only reaches 0.336. What could I be missing that causes this?

Yu-Shi (Member) commented Jun 6, 2022

Could you try increasing the batch size? You can use multi-GPU training or gradient accumulation.
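For reference, gradient accumulation simulates a larger effective batch on a single GPU by calling `optimizer.step()` only once every few mini-batches. The snippet below is a minimal, self-contained PyTorch sketch of the idea, not OpenMatch's actual train.py; the toy linear model, random data, and `accumulation_steps` value are placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Placeholder model/data standing in for the real ranking model and dataset.
model = nn.Linear(16, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-6)

batch_size = 2          # small per-step batch, as in the issue
accumulation_steps = 8  # effective batch = batch_size * accumulation_steps = 16

optimizer.zero_grad()
for step in range(64):
    inputs = torch.randn(batch_size, 16)
    targets = torch.randn(batch_size, 1)

    loss = loss_fn(model(inputs), targets)
    # Scale the loss so the accumulated gradient matches one large batch.
    (loss / accumulation_steps).backward()

    # Update weights only once every `accumulation_steps` mini-batches.
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```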
