『Fix』Broken links and section-numbering issues #34

Open · wants to merge 1 commit into base: main
20 changes: 10 additions & 10 deletions docs/content/ch04.md
@@ -386,7 +386,7 @@
Input $x$: Thank you $<X>$ me to your party $<Y>$ week.
Output $y$: $<X>$ for inviting $<Y>$ last

- ### 10.2.2 Retrieval methods
+ ### 4.2.2 Retrieval methods
Suppose we have a store $S$, which is a collection of sequences (usually documents or passages).

@@ -408,7 +408,7 @@
- Retrieve $(x', y') \in S$ such that $x'$ is most similar to $x$.
- Generate $y = y'$ (a minimal sketch of this retrieve-then-copy step follows below).
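
A minimal sketch of this retrieve-then-copy baseline (an illustration added here, not the original notes' code): the toy bag-of-words `embed` stands in for a trained dense encoder, and the store contents are made up.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy bag-of-words encoder; a real system would use a trained dense encoder."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def retrieve_and_copy(x: str, store: list[tuple[str, str]]) -> str:
    """Retrieve (x', y') in S whose x' is most similar to x, then output y = y'."""
    q = embed(x)
    best = max(store, key=lambda pair: float(q @ embed(pair[0])))
    return best[1]

S = [("Why is the sky blue?", "Because of Rayleigh scattering."),
     ("What is the capital of France?", "Paris.")]
print(retrieve_and_copy("Why does the sky look blue?", S))
```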

- ### 10.2.3 Retrieval-augmented generation (RAG) ([Lewis et al., 2020](https://arxiv.org/pdf/2005.11401.pdf))
+ ### 4.2.3 Retrieval-augmented generation (RAG) ([Lewis et al., 2020](https://arxiv.org/pdf/2005.11401.pdf))

![rag-architecture](images/rag-architecture.png)

@@ -420,7 +420,7 @@

In practice, $\sum_{z \in S}$ is replaced by the top-k retrieved passages (analogous to choosing the top 1 or 2 experts in mixture-of-experts).
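
The sketch below (an illustration added here, not the authors' code) shows this top-k approximation of the RAG objective $p(y \mid x) = \sum_{z \in S} p(z \mid x)\, p(y \mid z, x)$; `retriever_score` and `generator_log_prob` are hypothetical stand-ins for the retriever and generator described next.

```python
import math

def rag_log_prob(x, y, store, retriever_score, generator_log_prob, k=2):
    """Approximate log p(y|x) = log sum_z p(z|x) p(y|z,x) using only the top-k passages."""
    # 1. Score every passage z against the query x and keep the top-k.
    top = sorted(store, key=lambda z: retriever_score(x, z), reverse=True)[:k]
    # 2. Softmax the retriever scores over the top-k -> p(z|x).
    scores = [retriever_score(x, z) for z in top]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # 3. Marginalize over z: sum_z p(z|x) * p(y|z,x).
    p_y = sum((w / total) * math.exp(generator_log_prob(y, z, x))
              for w, z in zip(weights, top))
    return math.log(p_y)

# Example call with dummy scorers, just to show the interface.
print(rag_log_prob("where is paris", "in france",
                   store=["Paris is the capital of France.", "FAISS is a library."],
                   retriever_score=lambda x, z: len(set(x.split()) & set(z.lower().split())),
                   generator_log_prob=lambda y, z, x: -2.0))
```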

- #### 10.2.3.1 Retriever
+ #### 4.2.3.1 Retriever

**Dense Passage Retrieval (DPR)** ([Karpukhin et al., 2020](https://arxiv.org/pdf/2004.04906.pdf))

@@ -433,7 +433,7 @@
- Negatives: random passages, or passages retrieved by BM25 that do not contain the answer
- Inference: use [FAISS](https://github.com/facebookresearch/faiss) (Facebook AI Similarity Search); a minimal sketch follows below
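
A rough illustration of the dual-encoder + FAISS inference pattern (not DPR's actual code): the toy `encode` below stands in for the trained BERT passage/query encoders, while the FAISS calls (`IndexFlatIP`, `add`, `search`) are the library's real API.

```python
import numpy as np
import faiss  # pip install faiss-cpu

DIM = 128  # toy dimension; DPR's BERT encoders produce 768-dim vectors

def encode(texts):
    """Toy encoder standing in for BERT_p / BERT_q; returns L2-normalized float32 vectors."""
    out = np.zeros((len(texts), DIM), dtype="float32")
    for i, t in enumerate(texts):
        for tok in t.lower().split():
            out[i, hash(tok) % DIM] += 1.0
    faiss.normalize_L2(out)
    return out

passages = ["The Eiffel Tower is in Paris.",
            "FAISS performs efficient similarity search over dense vectors."]
index = faiss.IndexFlatIP(DIM)        # exact maximum-inner-product index
index.add(encode(passages))           # index the passage embeddings offline
scores, ids = index.search(encode(["Where is the Eiffel Tower?"]), 1)
print(passages[ids[0][0]], float(scores[0][0]))
```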

- #### 10.2.3.2 Generator
+ #### 4.2.3.2 Generator

$$
p(y \mid z, x) = p(y \mid \text{concat}(z, x)).
$$

@@ -442,12 +442,12 @@
- Use BART-large (400M parameters), where the input is the retrieved passage $z$ concatenated with the input $x$ (a minimal sketch follows this list)
- Recall that BART was trained on web, news, book, and story data with a denoising objective (e.g., masking)
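
A minimal concat-then-generate sketch using Hugging Face BART (an added illustration; RAG's actual generator is trained jointly with the retriever, and the separator string here is an arbitrary choice):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

z = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."  # retrieved passage
x = "Where is the Eiffel Tower located?"                                  # user input
inputs = tokenizer(z + " // " + x, return_tensors="pt")  # model p(y | concat(z, x))
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```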

- #### 10.2.3.3 Training
+ #### 4.2.3.3 Training

- Initialize with BART and DPR (which is itself initialized from BERT)
- Train $\text{BART}$ and $\text{BERT}_\text{q}$ (the query encoder); a sketch of this split follows below
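
A rough sketch of which components receive gradients, using generic `bert-base-uncased` checkpoints as stand-ins for DPR's trained encoders (an assumption made here for brevity): only the query encoder and BART are optimized, while the passage encoder is kept frozen so the precomputed passage index does not have to be rebuilt.

```python
import torch
from transformers import BartForConditionalGeneration, BertModel

bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large")
bert_q = BertModel.from_pretrained("bert-base-uncased")  # query encoder (trained)
bert_d = BertModel.from_pretrained("bert-base-uncased")  # passage encoder (kept frozen)

for p in bert_d.parameters():
    p.requires_grad = False  # frozen passage encoder -> the offline index stays valid

optimizer = torch.optim.AdamW(
    list(bert_q.parameters()) + list(bart.parameters()), lr=3e-5)
```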

- #### 10.2.3.4 Experiments
+ #### 4.2.3.4 Experiments

- On the Jeopardy question generation task, the retrieval results for the input "Hemingway":

@@ -458,28 +458,28 @@

For comparison, GPT-3's few-shot results are: NaturalQuestions (29.9%), WebQuestions (41.5%), TriviaQA (71.2%).

- ### 10.2.4 RETRO ([Borgeaud et al., 2021](https://arxiv.org/pdf/2112.04426.pdf))
+ ### 4.2.4 RETRO ([Borgeaud et al., 2021](https://arxiv.org/pdf/2112.04426.pdf))

- Retrieval is performed over chunks of 32 tokens (a toy sketch follows this list)
- Store: 2 trillion tokens
- 7 billion parameters (25× fewer than GPT-3)
- Retrieval uses a frozen BERT (it is not updated)
- Trained on MassiveText (the same dataset used to train Gopher)
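
A toy sketch of chunk-level retrieval (added here for illustration; RETRO's real pipeline uses a frozen BERT encoder and an approximate nearest-neighbor index over the 2T-token store):

```python
import numpy as np

CHUNK = 32  # tokens per retrieval chunk, as in RETRO

def chunks(tokens, size=CHUNK):
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def embed(chunk, dim=64):
    """Toy frozen embedding; RETRO uses a frozen BERT for this step."""
    v = np.zeros(dim)
    for t in chunk:
        v[t % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

# Offline: embed every 32-token chunk of the store (a tiny stand-in for 2T tokens).
corpus_tokens = list(range(2048))
store = [(c, embed(c)) for c in chunks(corpus_tokens)]

def neighbors(input_chunk, k=2):
    """Return the k nearest store chunks for one input chunk; RETRO feeds these
    to the decoder through chunked cross-attention."""
    q = embed(input_chunk)
    order = np.argsort([-(q @ e) for _, e in store])
    return [store[i][0] for i in order[:k]]

print(neighbors(list(range(64, 96)))[0][:5])
```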

- #### 10.2.4.1 Experimental results
+ #### 4.2.4.1 Experimental results

- Performs very well on language modeling
- NaturalQuestions accuracy: 45.5% (the SOTA is 54.7%)

![retro-lm-results](images/retro-lm-results.png)

- ### 10.2.5 Discussion
+ ### 4.2.5 Discussion

- Retrieval-based models are highly suited to knowledge-intensive question-answering tasks.
- Beyond scalability, retrieval-based models also offer interpretability and the ability to update the store.
- It is still unclear whether these models have the same general-purpose capabilities as dense Transformers.

- ## 10.3 Overall summary
+ ## 4.3 Overall summary

- To scale models further, the dense Transformer needs to be improved.
- Combining mixture-of-experts and retrieval-based methods is more effective.
3 changes: 2 additions & 1 deletion docs/content/ch06.md
@@ -339,7 +339,8 @@ Adam increases storage from 2× the model parameters ($\theta_t, g_t$) to 4× ($\theta_t, g_t, m_t, v_t$)

## Further reading

- - [Mixed precision training](https://lilianweng.github.io/lil-log/2021/09/24/train-large-neural-networks.html#mixed-precision-training)
+ - [Mixed precision training](https://lilianweng.github.io/posts/2021-09-25-train-large/#mixed-precision-training)
+ - [Mixed Precision Training](https://arxiv.org/pdf/1710.03740.pdf). Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu. ICLR 2018.
- [Fixing Weight Decay Regularization in Adam](https://arxiv.org/pdf/1711.05101.pdf). I. Loshchilov, F. Hutter. 2017. Introduces AdamW.
- [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://arxiv.org/pdf/2003.10555.pdf). Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. ICLR 2020.
- [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/pdf/2006.03654.pdf). Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. ICLR 2020.
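
The hunk context above notes that Adam raises optimizer-related storage from 2× to 4× the parameter count. As a rough back-of-the-envelope sketch of that arithmetic (an added illustration, assuming fp32 values and a hypothetical 1.5B-parameter model):

```python
def optimizer_state_gb(num_params: int, use_adam: bool = True, bytes_per_value: int = 4) -> float:
    """Storage for (theta, g) vs. Adam's (theta, g, m, v), assuming fp32 values."""
    copies = 4 if use_adam else 2
    return num_params * copies * bytes_per_value / 1e9

print(optimizer_state_gb(1_500_000_000, use_adam=False))  # ~12 GB for theta and g
print(optimizer_state_gb(1_500_000_000, use_adam=True))   # ~24 GB once m and v are added
```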