Commit
Cleaned up formatting so that it displays well on the hosted website. This mostly involved adding carriage returns, changing indenting, and fixing headers.
rachelfenn committed Nov 22, 2024
1 parent d51e572 commit cd21072
Showing 2 changed files with 68 additions and 53 deletions.
88 changes: 49 additions & 39 deletions README.md
@@ -26,47 +26,57 @@ This library supports an automated build using [GNU Make](https://www.gnu.org/software/make/).

### Steps

#### 1. Clone this repository:

```bash
git clone git@github.com:probcomp/genparse.git
cd genparse
```

#### 2. Create and activate a virtual environment

Using Conda (recommended):

```bash
conda create -n genparse python=3.10
conda activate genparse
```

Using Python's `venv` module:

```bash
python -m venv genparse
source genparse/bin/activate
```

> **💡Tip**: On Windows, use `genparse\Scripts\activate`
#### 3. Install the package in editable mode with pre-commit hooks:

```bash
make env
```

GenParse optionally depends on Rust for faster parsing. If you do not have Rust installed, you will be prompted to install it. If you prefer not to install Rust, you can install the library without the Rust dependency via:

```bash
make env-no-rust
```

#### 4. Test your installation by running the following example:

```python
>>> from genparse import InferenceSetup
>>> grammar = 'start: "Sequential Monte Carlo is " ( "good" | "bad" )'
>>> m = InferenceSetup('gpt2', grammar, proposal_name='character')
>>> m(' ', n_particles=15)
{
'Sequential Monte Carlo is good▪': 0.7770842914205952,
'Sequential Monte Carlo is bad▪': 0.22291570857940482,
}
```

Or simply by running:

```bash
python genparse_tiny_example.py
```
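If you capture the return value instead of printing it, you can also inspect the approximate posterior programmatically. A minimal sketch, assuming the result supports dict-style access over completions, as the printed output above suggests:

```python
from genparse import InferenceSetup

grammar = 'start: "Sequential Monte Carlo is " ( "good" | "bad" )'
m = InferenceSetup('gpt2', grammar, proposal_name='character')

result = m(' ', n_particles=15)

# Assuming dict-style access (completion -> probability), as printed above:
best = max(result, key=result.get)
print(best, result[best])  # most probable completion and its probability
```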


@@ -106,10 +116,10 @@ This project is licensed under the MIT License - see the LICENSE file for details

## Acknowledgments

Thanks to VLLM, Lark, Hugging Face, and all of the teams whose projects we depend on.

## Licensing

This project makes use of several open-source dependencies. We have done our best to represent this correctly, but please do your own due diligence:

- **Arsenal**: GNU General Public License v3.0
- **FrozenDict**: MIT License
33 changes: 19 additions & 14 deletions docs/getting_started.md
@@ -36,9 +36,10 @@ setup = InferenceSetup('gpt2', grammar)
- **model_name** (str): Name of the language model to use. See the README for the list of models currently supported by GenParse.
- **grammar** (str): The grammar specification in Lark format.
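
For reference, a grammar is an ordinary Lark string. A small illustrative sketch (the rule and terminal names here are arbitrary, not part of the GenParse API):

```python
# An illustrative Lark-format grammar; rule names are arbitrary.
grammar = r"""
start: "The answer is " answer
answer: "yes" | "no"
"""
```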

`InferenceSetup` also accepts the following optional arguments:

- **proposal_name** (str): The type of proposal to use. Options include 'character' and 'token'. Default is 'character'.
- **num_processes** (int): The number of processes to use for parallel proposals. This can help speed up the inference process by utilizing multiple CPU cores. Defaults to `min(mp.cpu_count(), 2)`.
- **use_rust_parser** (bool): Whether to use the Rust implementation of the Earley parser for faster inference. If False, the Python implementation is used. Defaults to True.
- **use_vllm** (bool or None): Whether to use VLLM for LLM next-token probability computations. If None, VLLM is used when possible (i.e., if the vllm library is available and CUDA is enabled). Default is None.
@@ -48,6 +49,7 @@ setup = InferenceSetup('gpt2', grammar)
- **llm_opts** (dict or None): Additional options for the language model, such as temperature or top-p settings for sampling.
- **vllm_engine_opts** (dict or None): Additional options for the VLLM engine, such as data type (dtype). These options are ignored if VLLM is not used.

> **💡Tip:** To try different grammars without having to instantiate new `InferenceSetup` objects each time, use the `update_grammar` method; `setup.update_grammar(new_grammar)` will replace the existing grammar in `setup` with `new_grammar`.
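
Putting a few of these options together, a sketch with illustrative values (the grammar string is a placeholder):

```python
from genparse import InferenceSetup

grammar = 'start: "GenParse is " ( "fast" | "flexible" )'

# Illustrative values for the optional arguments described above:
setup = InferenceSetup(
    'gpt2',
    grammar,
    proposal_name='character',  # default proposal type
    use_rust_parser=True,       # Rust Earley parser for faster inference
    num_processes=2,
)

# Swap in a new grammar without building a new InferenceSetup:
setup.update_grammar('start: "GenParse is " ( "good" | "great" )')
```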

@@ -65,7 +67,7 @@ When calling `InferenceSetup`, the following arguments are required:
- **prompt** (str): The input prompt to generate samples from. This is the starting text for the language model.
- **n_particles** (int): The number of particles (samples) to generate.

We also highlight the following optional arguments:

- **method** (str): The sampling method to use. Options include 'smc' for Sequential Monte Carlo and 'is' for importance sampling. Defaults to 'smc'.
- **max_tokens** (int): The maximum number of tokens to generate. Defaults to 500.
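
A sketch of a call combining these arguments (values are illustrative; `setup` is the `InferenceSetup` instance created above):

```python
result = setup(
    ' ',              # prompt: starting text for the language model
    n_particles=10,   # number of particles (samples)
    method='smc',     # 'smc' or 'is'
    max_tokens=100,   # cap on generated tokens
)
```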
@@ -101,19 +103,22 @@ GenParse additionally provides methods to visualize inference runs.

1. Specify `return_record=True` when calling `InferenceSetup`:

   ```python
   result = setup(' ', n_particles=10, return_record=True)
   ```

2. Save the SMC record in `notes/smc_viz/`:

   ```python
   import json

   with open('notes/smc_viz/record.json', 'w') as f:
       f.write(json.dumps(result.record))
   ```

3. Run a server in `notes/smc_viz/`:

   ```bash
   python -m http.server --directory notes/smc_viz 8000
   ```

4. Navigate to [localhost:8000/](http://localhost:8000/).
