Rejection Sampling + Run Again (#7)

natolambert · Aug 11, 2024 · 18d33da
1 parent e385796
commit 18d33da

Showing 8 changed files with 70 additions and 9 deletions.
diff --git a/.github/workflows/static.yml b/.github/workflows/static.yml
@@ -38,10 +38,6 @@ jobs:
         run: |
           echo "/Library/TeX/texbin" >> $GITHUB_PATH
           echo "PATH=$PATH:/Library/TeX/texbin" >> $GITHUB_ENV
-          xelatex --version  # Verify xelatex is accessible
-          if ! command -v xelatex &> /dev/null; then
-            sudo ln -s /Library/TeX/texbin/xelatex /usr/local/bin/xelatex
-          fi
 
 
 

diff --git a/Makefile b/Makefile
@@ -42,7 +42,7 @@ PANDOC_COMMAND = pandoc
 DOCX_ARGS = --standalone --reference-doc templates/docx.docx
 EPUB_ARGS = --template templates/epub.html --epub-cover-image $(COVER_IMAGE)
 HTML_ARGS = --template templates/html.html --standalone --to html5 
-PDF_ARGS = --template templates/pdf.latex --pdf-engine xelatex
+PDF_ARGS = --template templates/pdf.tex --pdf-engine xelatex
 NESTED_HTML_TEMPLATE = templates/chapter.html
 
 # Per-format file dependencies

diff --git a/README.md b/README.md
@@ -59,6 +59,7 @@ sudo apt-get install texlive-fonts-recommended texlive-xetex
 brew install pandoc
 brew install make
 ```
+(See below for `pandoc-crossref`)
 
 ### Folder structure
 

diff --git a/chapters/02-installation.md → chapters/02-optimization.md b/chapters/02-installation.md → chapters/02-optimization.md
@@ -1,4 +1,4 @@
-# Installation
+# Optimization - Overview
 
 This is the installation chapter.
 We love the book [@russell2016artificial].

diff --git a/chapters/03-opt-rejection-sampling.md b/chapters/03-opt-rejection-sampling.md
@@ -0,0 +1,14 @@
+# Rejection Sampling
+
+Rejection Sampling (RS) is a popular and simple baseline for performing preference fine-tuning.
+Rejection sampling operates by sampling candidate completions for a set of instructions, scoring them with a trained reward model, and then fine-tuning the original model only on the top-scoring completions.
+
+The name originates from computational statistics [@gilks1992adaptive], where one wishes to sample from a complex distribution but has no direct method to do so.
+To alleviate this, one samples from a simpler-to-model distribution and uses a heuristic to check whether each sample is permissible.
+With language models, the target distribution is high-quality answers to instructions, the filter is a reward model, and the sampling distribution is the current model.
+
+## Related works
+
+Many prominent RLHF and preference fine-tuning papers have used rejection sampling as a baseline, but no canonical implementation or documentation exists.
+
+WebGPT [@nakano2021webgpt], Anthropic's Helpful and Harmless agent [@bai2022training], OpenAI's popular paper on process reward models [@lightman2023let], Llama 2 Chat models [@touvron2023llama], and other seminal works all use this baseline.
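
Note on the new chapter above: the loop it describes (sample from the current model, filter with a reward model, keep the top completions for fine-tuning) can be sketched roughly as follows. This is a minimal illustration, not code from this commit. The names `rejection_sample`, `generate`, `reward`, `num_candidates`, and `keep_per_prompt` are hypothetical, and the toy generator and reward function only stand in for a real policy model and a trained reward model.

```python
import random
from typing import Callable, List, Tuple


def rejection_sample(
    prompts: List[str],
    generate: Callable[[str], str],
    reward: Callable[[str, str], float],
    num_candidates: int = 4,
    keep_per_prompt: int = 1,
) -> List[Tuple[str, str]]:
    """Return the (prompt, completion) pairs that survive the reward filter.

    `generate` stands in for sampling from the current model and `reward`
    stands in for a trained reward model; both are assumptions here.
    """
    selected: List[Tuple[str, str]] = []
    for prompt in prompts:
        # Sampling distribution: draw several completions per prompt.
        candidates = [generate(prompt) for _ in range(num_candidates)]
        # Filter: rank the candidates by reward, highest first.
        ranked = sorted(candidates, key=lambda c: reward(prompt, c), reverse=True)
        # Keep only the top completions as fine-tuning data.
        selected.extend((prompt, c) for c in ranked[:keep_per_prompt])
    return selected


if __name__ == "__main__":
    rng = random.Random(0)

    def toy_generate(prompt: str) -> str:
        # Hypothetical stand-in for model sampling: a random-length answer.
        return "answer " * rng.randint(1, 5)

    def toy_reward(prompt: str, completion: str) -> float:
        # Hypothetical stand-in for a reward model: longer scores higher.
        return float(len(completion))

    for prompt, completion in rejection_sample(
        ["Explain rejection sampling."], toy_generate, toy_reward
    ):
        print(prompt, "->", completion.strip())
```

The returned pairs would then be used as the supervised fine-tuning set for the original model. How many candidates to draw and how many to keep per prompt are tuning choices that the chapter text does not specify.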
diff --git a/chapters/03-usage.md b/chapters/03-usage.md
diff --git a/chapters/bib.bib b/chapters/bib.bib
@@ -10,4 +10,43 @@ @book{russell2016artificial
   author={Russell, Stuart J and Norvig, Peter},
   year={2016},
   publisher={Pearson}
+}
+
+@article{gilks1992adaptive,
+  title={Adaptive rejection sampling for Gibbs sampling},
+  author={Gilks, Walter R and Wild, Pascal},
+  journal={Journal of the Royal Statistical Society: Series C (Applied Statistics)},
+  volume={41},
+  number={2},
+  pages={337--348},
+  year={1992},
+  publisher={Wiley Online Library}
+}
+
+@article{nakano2021webgpt,
+  title={Webgpt: Browser-assisted question-answering with human feedback},
+  author={Nakano, Reiichiro and Hilton, Jacob and Balaji, Suchir and Wu, Jeff and Ouyang, Long and Kim, Christina and Hesse, Christopher and Jain, Shantanu and Kosaraju, Vineet and Saunders, William and others},
+  journal={arXiv preprint arXiv:2112.09332},
+  year={2021}
+}
+
+@article{bai2022training,
+  title={Training a helpful and harmless assistant with reinforcement learning from human feedback},
+  author={Bai, Yuntao and Jones, Andy and Ndousse, Kamal and Askell, Amanda and Chen, Anna and DasSarma, Nova and Drain, Dawn and Fort, Stanislav and Ganguli, Deep and Henighan, Tom and others},
+  journal={arXiv preprint arXiv:2204.05862},
+  year={2022}
+}
+
+@article{lightman2023let,
+  title={Let's verify step by step},
+  author={Lightman, Hunter and Kosaraju, Vineet and Burda, Yura and Edwards, Harri and Baker, Bowen and Lee, Teddy and Leike, Jan and Schulman, John and Sutskever, Ilya and Cobbe, Karl},
+  journal={arXiv preprint arXiv:2305.20050},
+  year={2023}
+}
+
+@article{touvron2023llama,
+  title={Llama 2: Open foundation and fine-tuned chat models},
+  author={Touvron, Hugo and Martin, Louis and Stone, Kevin and Albert, Peter and Almahairi, Amjad and Babaei, Yasmine and Bashlykov, Nikolay and Batra, Soumya and Bhargava, Prajjwal and Bhosale, Shruti and others},
+  journal={arXiv preprint arXiv:2307.09288},
+  year={2023}
 }
diff --git a/templates/pdf.latex → templates/pdf.tex b/templates/pdf.latex → templates/pdf.tex
@@ -185,6 +185,18 @@
 $if(indent)$
 $else$
 \makeatletter
+% new code here
+\newsavebox\pandoc@box
+\newcommand*\pandocbounded[1]{%
+ \sbox\pandoc@box{#1}%
+ \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}%
+ \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}%
+ \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi%
+ \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}%
+ \else\usebox{\pandoc@box}%
+ \fi%
+}
+
 \@ifundefined{KOMAClassName}{% if non-KOMA class
   \IfFileExists{parskip.sty}{%
     \usepackage{parskip}
@@ -427,6 +439,8 @@
 $endif$
 $endif$
 
+
+
 \begin{document}
 $if(has-frontmatter)$
 \frontmatter