Commit

WIP add next chapter buttons, fix title link (#29)

natolambert authored Jan 5, 2025
1 parent 012c844 commit 969d4e4
Showing 19 changed files with 138 additions and 3 deletions.
7 changes: 7 additions & 0 deletions chapters/01-introduction.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Home"
prev-url: "https://rlhfbook.com/"
next-chapter: "Key Related Works"
next-url: "02-related-works.html"
---

# Introduction

Reinforcement Learning from Human Feedback (RLHF) is a technique used to incorporate human information into AI systems.
7 changes: 7 additions & 0 deletions chapters/04-related-works.md → chapters/02-related-works.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Introduction"
prev-url: "01-introduction.html"
next-chapter: "Problem Setup"
next-url: "03-setup.html"
---

# Key Related Works

In this chapter we detail the key papers and projects that got the RLHF field to where it is today.
7 changes: 7 additions & 0 deletions chapters/05-setup.md → chapters/03-setup.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Key Related Works"
prev-url: "02-related-works.html"
next-chapter: "Problem Formulation"
next-url: "04-optimization.html"
---

# Definitions

This chapter includes all the definitions, symbols, and operations frequently used in the RLHF process.
6 changes: 6 additions & 0 deletions chapters/03-optimization.md → chapters/04-optimization.md
@@ -1,3 +1,9 @@
---
prev-chapter: "Problem Setup"
prev-url: "03-setup.html"
next-chapter: "The Nature of Preferences"
next-url: "05-preferences.html"
---

# Problem Formulation

6 changes: 6 additions & 0 deletions chapters/02-preferences.md → chapters/05-preferences.md
@@ -1,3 +1,9 @@
---
prev-chapter: "Problem Formulation"
prev-url: "04-optimization.html"
next-chapter: "Preference Data"
next-url: "06-preference-data.html"
---

# The Nature of Preferences

7 changes: 7 additions & 0 deletions chapters/06-preference-data.md
@@ -1,3 +1,10 @@
---
prev-chapter: "The Nature of Preferences"
prev-url: "05-preferences.html"
next-chapter: "Reward Modeling"
next-url: "07-reward-models.html"
---

# [Incomplete] Preference Data

## Collecting Preference Data
7 changes: 7 additions & 0 deletions chapters/07-reward-models.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Preference Data"
prev-url: "06-preference-data.html"
next-chapter: "Regularization"
next-url: "08-regularization.html"
---

# Reward Modeling

Reward models are core to the modern approach to RLHF.
7 changes: 7 additions & 0 deletions chapters/08-regularization.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Reward Modeling"
prev-url: "07-reward-models.html"
next-chapter: "Instruction Tuning"
next-url: "09-instruction-tuning.html"
---

# Regularization

Throughout the RLHF optimization, many regularization steps are used to prevent over-optimization of the reward model.
7 changes: 7 additions & 0 deletions chapters/09-instruction-tuning.md
@@ -1 +1,8 @@
---
prev-chapter: "Regularization"
prev-url: "08-regularization.html"
next-chapter: "Rejection Sampling"
next-url: "10-rejection-sampling.html"
---

# Instruction Tuning
7 changes: 7 additions & 0 deletions chapters/10-rejection-sampling.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Instruction Tuning"
prev-url: "09-instruction-tuning.html"
next-chapter: "Policy Gradients"
next-url: "11-policy-gradients.html"
---

# Rejection Sampling

Rejection Sampling (RS) is a popular and simple baseline for performing preference fine-tuning.
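As an illustrative sketch of this baseline (best-of-N selection against a reward model; `generate` and `reward` are hypothetical stand-ins for a policy and a reward model, not APIs from this repository):

```python
def best_of_n(prompt, generate, reward, n=8):
    """Rejection sampling / best-of-N: draw n candidate completions from
    the policy and keep the one the reward model scores highest.

    `generate(prompt)` and `reward(prompt, completion)` are placeholders
    for a language-model sampler and a trained reward model."""
    completions = [generate(prompt) for _ in range(n)]
    return max(completions, key=lambda c: reward(prompt, c))
```

In practice the kept completions are then used for further fine-tuning rather than returned directly.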
8 changes: 8 additions & 0 deletions chapters/11-policy-gradients.md
@@ -1,3 +1,10 @@
---
prev-chapter: "Rejection Sampling"
prev-url: "10-rejection-sampling.html"
next-chapter: "Direct Alignment Algorithms"
next-url: "12-direct-alignment.html"
---

# [Incomplete] Policy Gradient Algorithms


@@ -25,6 +32,7 @@ $$\nabla_\theta J(\pi_\theta) = \mathbb{E}_\tau \left[ \sum_{t=0}^T \nabla_\theta \log \pi_\theta(a_t \mid s_t) R(\tau) \right]$$

REINFORCE is a specific implementation of the vanilla policy gradient that uses a Monte Carlo estimator of the gradient.
[@ahmadian2024back]
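A minimal sketch of this Monte Carlo estimator on a toy softmax bandit (illustrative only, not the book's implementation; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def reinforce_step(theta, rewards, lr=0.1):
    """One REINFORCE update on a 3-armed bandit.

    For a softmax policy, grad log pi(a) = one_hot(a) - pi, so the
    Monte Carlo gradient estimate is R * (one_hot(a) - pi)."""
    pi = softmax(theta)
    a = rng.choice(len(theta), p=pi)   # sample an action from the policy
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0
    return theta + lr * rewards[a] * grad_log_pi  # ascend the estimated gradient

theta = np.zeros(3)
true_rewards = np.array([0.0, 0.0, 1.0])  # arm 2 is the only rewarding arm
for _ in range(500):
    theta = reinforce_step(theta, true_rewards)
# the policy concentrates its probability mass on the rewarding arm
```

Sampling full language-model trajectories replaces the single bandit action in the RLHF setting, but the estimator has the same form.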

### Proximal Policy Optimization

## Computing Policy Gradients with a Language Model
7 changes: 7 additions & 0 deletions chapters/12-direct-alignment.md
@@ -1 +1,8 @@
---
prev-chapter: "Policy Gradients"
prev-url: "11-policy-gradients.html"
next-chapter: "Constitutional AI"
next-url: "13-cai.html"
---

# [Incomplete] Direct Alignment Algorithms
7 changes: 7 additions & 0 deletions chapters/13-cai.md
@@ -1 +1,8 @@
---
prev-chapter: "Direct Alignment"
prev-url: "12-direct-alignment.html"
next-chapter: "Reasoning Models"
next-url: "14-reasoning.html"
---

# [Incomplete] Constitutional AI and AI Feedback
9 changes: 8 additions & 1 deletion chapters/14-reasoning.md
@@ -1 +1,8 @@
# [Incomplete] Constitutional AI
---
prev-chapter: ""
prev-url: ""
next-chapter: ""
next-url: ""
---

# [Incomplete] Reasoning Training and Models
7 changes: 7 additions & 0 deletions chapters/15-synthetic.md
@@ -1 +1,8 @@
---
prev-chapter: ""
prev-url: ""
next-chapter: ""
next-url: ""
---

# [Incomplete] Synthetic Data
7 changes: 7 additions & 0 deletions chapters/16-evaluation.md
@@ -1 +1,8 @@
---
prev-chapter: ""
prev-url: ""
next-chapter: ""
next-url: ""
---

# [Incomplete] Evaluation
7 changes: 7 additions & 0 deletions chapters/17-over-optimization.md
@@ -1 +1,8 @@
---
prev-chapter: ""
prev-url: ""
next-chapter: ""
next-url: ""
---

# [Incomplete] Over Optimization
19 changes: 18 additions & 1 deletion templates/chapter.html
@@ -51,7 +51,7 @@
$endfor$
$if(title)$
<header id="title-block-header">
<h1 class="title"><a href="www.rlhfbook.com" style="color: inherit; text-decoration: none;">$title$</a></h1>
<h1 class="title"><a href="https://rlhfbook.com/" style="color: inherit; text-decoration: none;">$title$</a></h1>
$if(subtitle)$
<p class="subtitle">$subtitle$</p>
$endif$
@@ -76,6 +76,23 @@ <h2 id="$idprefix$toc-title">$toc-title$</h2>
<div id="content">
$body$
</div>

<div id="chapter-navigation" style="display: flex; justify-content: space-between; padding: 2em 0;">
$if(prev-url)$
<a href="$prev-url$" class="prev-chapter">
← Previous: $prev-chapter$
</a>
$else$
<div></div>
$endif$

$if(next-url)$
<a href="$next-url$" class="next-chapter">
Next: $next-chapter$ →
</a>
$endif$
</div>

$for(include-after)$
$include-after$
$endfor$
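Because the prev/next chain added above is maintained by hand across files that this commit also renames, a small consistency check can catch broken links. The sketch below assumes chapters live in `chapters/` with the frontmatter format shown in the diffs; the script itself is hypothetical and not part of this commit:

```python
import pathlib
import re

def read_frontmatter(path):
    """Parse the simple `key: "value"` YAML frontmatter block used above."""
    text = path.read_text(encoding="utf-8")
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

def check_chain(chapter_dir="chapters"):
    """Return (file, actual next-url, expected next-url) for every chapter
    whose next-url does not point at the file that follows it."""
    files = sorted(pathlib.Path(chapter_dir).glob("*.md"))
    problems = []
    for current, following in zip(files, files[1:]):
        meta = read_frontmatter(current)
        expected = following.with_suffix(".html").name
        if meta.get("next-url", "") != expected:
            problems.append((current.name, meta.get("next-url", ""), expected))
    return problems
```

Running such a check in CI would have flagged the stale links that the renames in this commit (e.g. `04-related-works.md` → `02-related-works.md`) could otherwise leave behind.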
2 changes: 1 addition & 1 deletion templates/html.html
Expand Up @@ -71,7 +71,7 @@ <h2>Abstract</h2>
<body>
<section id="acknowledgements" style="padding: 20px; text-align: center;">
<h2>Acknowledgements</h2>
<p>I would like to thank the following people who helped me with this project: Costa Huang, </p>
<p>I would like to thank the following people who helped me with this project: Costa Huang, (and of course Claude)</p>
<p>Additionally, thank you to the <a href="https://github.com/natolambert/rlhf-book/graphs/contributors">contributors on GitHub</a> who helped improve this project.</p>
</section>
<footer style="padding: 20px; text-align: center;">
