Add Sentence-Bert paper walkthrough #858
base: master
Please refer to:
In addition, three objective functions are defined for fine-tuning the BERT model, each used to optimize training for a different type of task:
### 2.1 Classification Objective Function
Headings don't need the English name in parentheses; put the English term in the body text instead.
![](../../images/natural_language_processing/Sentence-Bert/sentence_bert_1.png)
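The classification objective can be illustrated with a minimal sketch. It assumes the formulation reported in the Sentence-BERT paper, $o = \mathrm{softmax}(W_t(u, v, |u-v|))$, where $u$ and $v$ are the two sentence embeddings, $|u-v|$ is their element-wise absolute difference, and $W_t$ is a trainable weight with one output per label, trained with cross-entropy loss. The toy vectors, the weight values, and the helper names (`concat_features`, `classification_objective`, `cross_entropy`) are hypothetical and chosen only for illustration.

```python
import math

def concat_features(u, v):
    """Build the feature vector (u, v, |u - v|) fed to the classifier."""
    return u + v + [abs(a - b) for a, b in zip(u, v)]

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_objective(u, v, weights):
    """o = softmax(W_t (u, v, |u - v|)).

    weights: k rows (one per label), each of length 3n, i.e. a k x 3n matrix.
    """
    feats = concat_features(u, v)
    logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return softmax(logits)

def cross_entropy(probs, label):
    """Cross-entropy loss for the gold label index."""
    return -math.log(probs[label])

# Toy example: n = 2 dimensional embeddings, k = 3 labels
# (e.g. entailment / neutral / contradiction for NLI data).
u = [0.5, -1.0]
v = [0.3, 0.2]
W_t = [
    [0.1, 0.0, 0.2, -0.1, 0.5, 0.3],
    [0.0, 0.1, -0.2, 0.1, 0.0, 0.4],
    [-0.3, 0.2, 0.1, 0.0, -0.5, 0.1],
]
probs = classification_objective(u, v, W_t)
loss = cross_entropy(probs, label=0)
print([round(p, 3) for p in probs], round(loss, 3))
```

The $|u-v|$ term gives the classifier direct access to how the two embeddings differ, dimension by dimension, on top of the raw vectors.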
### 2.2 Regression Objective Function
Same as above.
![](../../images/natural_language_processing/Sentence-Bert/sentence_bert_2.png)
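For the regression objective, the Sentence-BERT paper computes the cosine similarity between the two sentence embeddings $u$ and $v$ and uses mean squared error against the gold similarity score as the loss. The plain-Python sketch below assumes the gold scores have already been rescaled to the range of cosine similarity; the function names and toy data are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def regression_objective(pairs, gold_scores):
    """Mean squared error between predicted cosine similarities and gold scores."""
    preds = [cosine_similarity(u, v) for u, v in pairs]
    return sum((p - g) ** 2 for p, g in zip(preds, gold_scores)) / len(preds)

# Identical vectors have cosine 1, orthogonal vectors have cosine 0.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
gold = [1.0, 0.0]
print(regression_objective(pairs, gold))  # → 0.0
```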
### 2.3 Triplet Objective Function
Same as above.
## 1 Introduction
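The [BERT model](https://paddlepedia.readthedocs.io/en/latest/tutorials/pretrain_model/bert.html#) has shown strong performance across the major NLP tasks, and semantic textual similarity (STS) is no exception. However, to compute the similarity of two sentences, BERT requires both sentences to be fed into the model together so that their information can interact. This causes a large computational overhead, which makes BERT unsuitable both for semantic similarity search and for unsupervised tasks such as clustering.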
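For example, to find the most similar pair among 10,000 sentences, each sentence must be paired with each of the other 9,999 sentences, and the result is halved because the pairs (a, b) and (b, a) are the same. This gives 10000 × 9999 / 2 = 49,995,000 BERT inference computations, which take about 65 hours.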
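The count above is $n(n-1)/2$ for $n$ sentences. The sketch below, with hypothetical function names, compares it with a bi-encoder such as Sentence-BERT, which is assumed to encode each sentence once and then compare the resulting vectors with a cheap similarity function. It counts model forward passes only and does not model the 65-hour runtime.

```python
def cross_encoder_passes(n):
    """BERT forward passes needed to score every unordered pair: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def bi_encoder_passes(n):
    """Forward passes if each sentence is encoded once (pair scores are vector ops)."""
    return n

n = 10_000
print(cross_encoder_passes(n))  # → 49995000
print(bi_encoder_passes(n))     # → 10000
```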
How is this calculated? What do 9999 and 2 stand for?
### 2.3 Triplet Objective Function
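With this objective, the model architecture is modified to take three input sentences instead of two. Given an anchor sentence $a$ (the reference sentence), a positive sentence $p$ (semantically similar to $a$), and a negative sentence $n$ (semantically dissimilar to $a$), the model is trained so that the distance from $a$ to $p$ is smaller than the distance from $a$ to $n$, by minimizing the objective $o$:

$$o = \max\left(\|s_{a} - s_{p}\| - \|s_{a} - s_{n}\| + \varepsilon,\ 0\right)$$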
What is an anchor sentence? Please explain it.
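Here $s_{a}$, $s_{p}$, and $s_{n}$ are the embeddings of sentences $a$, $p$, and $n$, $\|\cdot\|$ is a distance metric, and $\varepsilon$ is the margin. In the paper, the distance metric is Euclidean distance and the margin is 1.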
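A minimal sketch of the triplet objective with the settings stated above (Euclidean distance, margin $\varepsilon = 1$). Plain Python lists stand in for the sentence embeddings $s_{a}$, $s_{p}$, $s_{n}$; the toy vectors and function names are hypothetical.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def triplet_objective(s_a, s_p, s_n, margin=1.0):
    """max(||s_a - s_p|| - ||s_a - s_n|| + margin, 0) with Euclidean distance."""
    return max(euclidean(s_a, s_p) - euclidean(s_a, s_n) + margin, 0.0)

anchor = [0.0, 0.0]
positive = [0.0, 0.5]       # distance 0.5 from the anchor
negative = [3.0, 4.0]       # distance 5 from the anchor
print(triplet_objective(anchor, positive, negative))       # → 0.0  (0.5 - 5 + 1 < 0)
hard_negative = [0.0, 1.0]  # distance 1 from the anchor
print(triplet_objective(anchor, positive, hard_negative))  # → 0.5  (0.5 - 1 + 1)
```

When the negative is already at least $\varepsilon$ farther from the anchor than the positive, the loss is zero and that triplet contributes no gradient; only "hard" triplets like the second one push the embeddings apart.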
### 2.4 Training Parameters
During training, the batch size is 16, the learning rate is 2e-5, the Adam optimizer is used, and the default pooling strategy is mean pooling.
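Mean pooling averages BERT's output token vectors into one fixed-size sentence embedding. The sketch below assumes an attention mask in which 1 marks a real token and 0 marks padding, so that padding does not dilute the average. The masking detail, the function name, and the `TRAINING_CONFIG` dictionary (which only records the hyperparameters listed above) are illustrative assumptions, not the paper's code.

```python
# Hypothetical record of the hyperparameters listed above (not a real library config).
TRAINING_CONFIG = {
    "batch_size": 16,
    "learning_rate": 2e-5,
    "optimizer": "Adam",
    "pooling": "mean",
}

def mean_pooling(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            summed = [s + x for s, x in zip(summed, vec)]
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]

tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # the last row is a padding token
mask = [1, 1, 0]
print(mean_pooling(tokens, mask))  # → [2.0, 3.0]
```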
Adam or AdamW? BERT models these days should all be using AdamW as the optimizer.