Add RTD config and automatic API doc generation.
Ported from #646.

PiperOrigin-RevId: 704739557
iindyk authored and copybara-github committed Dec 10, 2024
1 parent 34ecff1 commit 1ed92a1
Showing 18 changed files with 725 additions and 95 deletions.
19 changes: 19 additions & 0 deletions .github/workflows/preview_docs.yml
@@ -0,0 +1,19 @@
# Add a link to preview the documentation on Read the Docs for every pull request.
name: "RTD preview"

on:
  pull_request_target:
    types:
      - opened

permissions:
  pull-requests: write

jobs:
  documentation-links:
    runs-on: ubuntu-latest
    steps:
      - uses: readthedocs/actions/preview@v1
        with:
          project-slug: "readthedocs-preview"
          single-version: true
59 changes: 59 additions & 0 deletions .gitignore
@@ -0,0 +1,59 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Sphinx documentation
docs/_build/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Environments
.venv
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: check-ast
      - id: check-merge-conflict
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/mwouts/jupytext
    rev: v1.15.2
    hooks:
      - id: jupytext
        files: docs/tutorials/
        args: [--sync]
15 changes: 15 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,15 @@
version: 2

build:
  os: ubuntu-lts-latest
  tools:
    python: "3.12"

sphinx:
  configuration: docs/conf.py
  # Note this is set to false for now while the warnings are resolved
  fail_on_warning: false

python:
  install:
    - requirements: docs/requirements.txt
1 change: 0 additions & 1 deletion README.md
@@ -9,4 +9,3 @@ open source, fast and deterministic.

Check out [`tutorials/`](./tutorials) for more information on how to use Grain!


106 changes: 106 additions & 0 deletions docs/CONTRIBUTING.md
@@ -0,0 +1,106 @@
# Contributing to Grain



## Contributing to the Grain project documentation

### Pre-requisites

To contribute to the documentation, you will need to set up your development environment.

You can create a virtual environment or conda environment and install the packages in
`docs/requirements.txt`.

```bash
# Create a virtual environment
python -m venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install the requirements
pip install -r docs/requirements.txt
```

Or, with conda:

```bash
# Create a conda environment
conda create -n "grain-docs" python=3.12
# Activate the conda environment
conda activate grain-docs
# Install the requirements
python -m pip install -r docs/requirements.txt
```

### Building the documentation locally

To build the documentation locally, run the following commands:

```bash
# Change to the docs/ directory
cd docs
sphinx-build -b html . _build/html
```

You can then open the generated HTML files in your browser by opening
`docs/_build/html/index.html`.

## Documentation via Jupyter notebooks

The `pygrain` documentation includes Jupyter notebooks that are rendered
directly into the website via the [myst-nb](https://myst-nb.readthedocs.io/) extension.
To ease review and diff of notebooks, we keep markdown versions of the content
synced via [jupytext](https://jupytext.readthedocs.io/).

Note that you will need to install `jupytext` to sync the notebooks with Markdown files:

```bash
# With pip
python -m pip install jupytext

# With conda
conda install -c conda-forge jupytext
```

### Adding a new notebook

We aim to have one notebook per topic or tutorial covered.
To add a new notebook to the repository, first move the notebook into the appropriate
location in the `docs` directory:

```bash
mv ~/new-tutorial.ipynb docs/tutorials/new_tutorial.ipynb
```

Next, we use `jupytext` to mark the notebook for syncing with Markdown:

```bash
jupytext --set-formats ipynb,md:myst docs/tutorials/new_tutorial.ipynb
```

Finally, we can sync the notebook and markdown source:

```bash
jupytext --sync docs/tutorials/new_tutorial.ipynb
```

To ensure that the new notebook is rendered as part of the site, add a
reference to it in a `toctree` declaration somewhere in the source tree, for
example in `docs/index.md`. You will also need to add entries in `docs/conf.py`
to specify whether the notebook should be executed, and to specify which file
Sphinx should use when generating the site.
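
As a rough sketch of what the `docs/conf.py` entries mentioned above might look like: this assumes the site relies on the standard myst-nb options `nb_execution_mode` and `nb_execution_excludepatterns` together with Sphinx's `exclude_patterns`. The values and paths are illustrative assumptions built around the hypothetical `new_tutorial` notebook, not Grain's actual settings.

```python
# docs/conf.py (excerpt): illustrative values only, not Grain's actual settings.

# myst-nb: execute every notebook on each build.
# Other accepted modes include "off", "auto", and "cache".
nb_execution_mode = "force"

# myst-nb: notebooks matching these glob patterns are rendered but not executed.
nb_execution_excludepatterns = [
    "tutorials/new_tutorial.ipynb",  # hypothetical: skip a slow notebook
]

# Sphinx: ignore the jupytext-synced Markdown copy so that each tutorial is
# built from a single source file (here, the .ipynb).
exclude_patterns = [
    "_build",
    "tutorials/new_tutorial.md",  # hypothetical path
]
```

Excluding one of the two paired files avoids Sphinx seeing the same document twice; which of the pair to exclude is a project-level choice.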

### Editing an existing notebook

When editing the text of an existing notebook, it is recommended to edit the
markdown file only, and then automatically sync using `jupytext` via the
`pre-commit` framework, which we use to check in GitHub CI that notebooks are
properly synced.
For example, say you have edited `docs/tutorials/new_tutorial.md`, then
you can do the following:

```bash
pip install pre-commit
git add docs/tutorials/new_tutorial.* # stage the new changes
pre-commit run # run pre-commit checks on added files
git add docs/tutorials/new_tutorial.* # stage the files updated by pre-commit
git commit -m "Update new tutorial" # commit to the branch
```
13 changes: 4 additions & 9 deletions docs/README.md
@@ -2,10 +2,6 @@



-https://github.com/google/grain/tree/main/docs



PyGrain is the pure Python backend for Grain, primarily targeted at JAX users.
PyGrain is designed to be:

@@ -26,19 +22,19 @@ of dependencies when possible. For example, it should not depend on TensorFlow.

## High Level Idea

-The PyGrain backend differs from traditional tf.data pipelines. Instead of
+The PyGrain backend differs from traditional `tf.data` pipelines. Instead of
starting from filenames that need to be shuffled and interleaved to shuffle the
data, PyGrain pipeline starts by sampling indices.

Indices are globally unique, monotonically increasing values used to track
progress of the pipeline (for checkpointing). These indices are then mapped into
-record keys in the range [0, len(dataset)]. Doing so enables *global
+record keys in the range `[0, len(dataset)]`. Doing so enables *global
transformations* to be performed (e.g. global shuffling, mixing, repeating for
multiple epochs, sharding across multiple machines) before reading any records.
*Local transformations* that map/filter (aka preprocessing) a single example or
combine multiple consecutive records happen after reading.
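
The index-first idea described above can be illustrated with a small sketch. It is not PyGrain's implementation: `record_key` and `shard_indices` are hypothetical helpers, and the per-epoch seeded shuffle is an assumed strategy chosen only to show that shuffling, repeating, and sharding can all be decided from indices before any record is read.

```python
import random


def record_key(index: int, num_records: int, seed: int = 0) -> int:
    """Maps a monotonically increasing index to a valid record key."""
    epoch, position = divmod(index, num_records)
    # Build a reproducible permutation that differs from epoch to epoch.
    # (Rebuilding it on every call is O(num_records); fine for a sketch.)
    permutation = list(range(num_records))
    random.Random(seed + epoch).shuffle(permutation)
    return permutation[position]


def shard_indices(num_steps: int, shard_id: int, num_shards: int) -> list[int]:
    """Assigns every num_shards-th index to the given shard."""
    return list(range(shard_id, num_steps, num_shards))


num_records = 5
# Two epochs of globally shuffled record keys, computed without reading data.
keys = [record_key(i, num_records) for i in range(2 * num_records)]
```

Because the key is a pure function of the index and the seed, storing the last index is enough to resume the same sequence after a restart.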

-![Difference between typical tf.data pipeline and a PyGrain pipeline](grain_pipeline.svg)
+![Difference between typical tf.data pipeline and a PyGrain pipeline](./images/grain_pipeline.svg)

Steps in the pipeline:

@@ -55,7 +51,7 @@ Steps in the pipeline:

## Training Loop

-*PyGrain* has no opinion on how you write your training loop. Instead PyGrain
+*PyGrain* has no opinion on how you write your training loop. Instead, PyGrain
will return an iterator that implements:

* `next(ds_iter)` returns the element as NumPy arrays.
@@ -99,4 +95,3 @@ order defined by the user. The first of these transformations needs to be able
to process the raw records as read by the data source. The second transformation
needs to be able to process the elements produced by the first transformation
and so on.
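
The ordering constraint described above can be sketched in plain Python. This assumes each transformation is an ordinary callable; `apply_transformations`, `parse`, and `total` are hypothetical names used for illustration and are not part of the PyGrain API.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence


def apply_transformations(
    raw_records: Iterable[Any],
    transformations: Sequence[Callable[[Any], Any]],
) -> Iterator[Any]:
    """Applies the transformations to each record in the given order."""
    for element in raw_records:
        for transform in transformations:
            # Each transformation receives the output of the previous one.
            element = transform(element)
        yield element


def parse(record: bytes) -> list[int]:
    """First step: must understand the raw record format."""
    return [int(value) for value in record.decode().split(",")]


def total(values: list[int]) -> int:
    """Second step: consumes what `parse` produced."""
    return sum(values)


raw_records = [b"1,2", b"3,4"]
results = list(apply_transformations(raw_records, [parse, total]))
```

Swapping the order to `[total, parse]` would fail, since `total` cannot process raw bytes records.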
