Merge pull request #43 from pjlab-sys4nlp/data_mix
PUBLISH: filename refactors and readme preparation
DaizeDong authored Dec 24, 2023
2 parents 8245759 + 96f04b8 commit 251405e
Showing 84 changed files with 219 additions and 50,200 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -173,3 +173,4 @@ smoe/utils/gpu_diag.py
/logs/
/logs-cpt/
/tensorboard/
models/
36 changes: 29 additions & 7 deletions README.md
@@ -4,7 +4,7 @@
<span style="color:red">📢 <strong><i>A SMALLER AFFORDABLE MoE MODEL FOR EVERYONE!!</i></strong></span>
<div>
<a href="https://huggingface.co/llama-moe" target="_blank">🤗 Model Weights</a> | <a href="#" target="_blank">📃 Technical Report</a> | <a href="#quick-start">🚀 Quick Start</a><br />
<a href="docs/Installation.md">⚙️ Installation Guide</a> | <a href="#expert-construction">🚧 Expert Construction</a> | <a href="#continual-pretraining">🚅 Continual Pre-training</a> | <a href="#evaluation">💎 Evaluation</a>
<a href="#installation">⚙️ Installation Guide</a> | <a href="#expert-construction">🚧 Expert Construction</a> | <a href="#continual-pretraining">🚅 Continual Pre-training</a> | <a href="#evaluation">💎 Evaluation</a>
</div>
</div>

@@ -19,7 +19,7 @@ We build LLaMA-MoE with the following two steps:

<h2 id="features">🔥 Features</h2>

1. **Lightweight Models**: The total number of model parameters is only 6.7B, which is friendly for deployment and research usage.
1. **Lightweight Models**: The number of activated model parameters is only 3.0~3.5B, which is friendly for deployment and research usage.
2. **Multiple Expert Construction Methods**:
1. Neuron-Independent: Random, Clustering, Co-activation Graph, Gradient ([Zhang et al., 2022](http://arxiv.org/abs/2110.01786), [Zuo et al., 2022](http://arxiv.org/abs/2204.07675))
2. Neuron-Sharing: Inner, Inter (residual)
@@ -42,6 +42,8 @@ We build LLaMA-MoE with the following two steps:
<h2 id="quick-start">🚀 QuickStart</h2>

```python
# python>=3.10

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

@@ -60,6 +62,26 @@ print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
# Suzhou is famous of its beautiful gardens. The most famous one is the Humble Administrator's Garden. It is a classical Chinese garden with a history of more than 600 years. The garden is divided into three
```

<h2 id="installation">⚙️ Installation</h2>

1. Prepare the conda environment: `conda create -n smoe python=3.11` (If your environment name is not `smoe`, you may need to change the environment name in the launch scripts.)
2. Add the proper environment variables to `~/.bashrc` (`gcc` is set to a newer version for installing `flash-attn`), e.g.:
```bash
export PATH=/mnt/petrelfs/share/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/mnt/petrelfs/share/cuda-11.8/lib64:$LD_LIBRARY_PATH
export PATH=/mnt/petrelfs/share/gcc-10.1.0/bin:$PATH
export LD_LIBRARY_PATH=/mnt/petrelfs/share/gcc-10.1.0/lib64:$LD_LIBRARY_PATH
```
3. Apply the variables: `source ~/.bashrc`
4. Install PyTorch (CUDA-11.8): `pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
5. Install dependencies: `pip install -r requirements.txt`
6. Install `flash-attn`: `pip install flash-attn==2.0.1 --no-build-isolation`. You may need to follow the [flash-attn installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) to avoid some errors.
7. Install the latest Git: `conda install git`
8. Clone the repo: `git clone git@github.com:pjlab-sys4nlp/llama-moe.git` (If you haven't set up an SSH key for GitHub, you may not be able to clone via SSH. Check the [docs](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for details.)
9. Change current directory: `cd llama-moe`
10. Install `smoe` in [editable mode](https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-e): `pip install -e .[dev]`
11. Set up `pre-commit` hooks: `pre-commit install`
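
After installation, it is worth verifying that the CUDA build of PyTorch and `flash-attn` actually import. A quick sanity check we suggest (not an official step of the repo):

```python
# Hypothetical post-install sanity check (not part of the repo).
import torch

print(torch.__version__, torch.version.cuda)  # expect a cu118 build
assert torch.cuda.is_available(), "no CUDA device visible"

import flash_attn  # raises ImportError if the build failed

print(flash_attn.__version__)  # expect 2.0.1
```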
<h2 id="performance">📊 Model Performance</h2>
| Model | \#Activated Experts | \#Experts | \#Activated Params | Links |
@@ -83,13 +105,13 @@
<h2 id="expert-construction">🚧 Expert Construction</h2>
- Neuron-Independent
- Independent<sub>Random</sub>: `bash ./scripts/moefication/split/run_split_random.sh`
- Independent<sub>Clustering</sub>: `bash ./scripts/moefication/split/run_split_clustering.sh`
- Independent<sub>Random</sub>: `bash ./scripts/expert_construction/split/run_split_random.sh`
- Independent<sub>Clustering</sub>: `bash ./scripts/expert_construction/split/run_split_clustering.sh`
- Neuron-Sharing
- Sharing<sub>Inner</sub>: `bash ./scripts/moefication/split/run_split_gradient.sh`
- Sharing<sub>Inter</sub>: `bash ./scripts/moefication/split/run_split_gradient_residual.sh`
- Sharing<sub>Inner</sub>: `bash ./scripts/expert_construction/split/run_split_gradient.sh`
- Sharing<sub>Inter</sub>: `bash ./scripts/expert_construction/split/run_split_gradient_residual.sh`
For more information, please refer to [Expert Construction docs](docs/moefication/README.md).
For more information, please refer to [Expert Construction docs](docs/expert_construction/README.md).
<h2 id="continual-pretraining">🚅 Continual Pre-training</h2>
4 changes: 2 additions & 2 deletions docs/Installation.md
@@ -13,7 +13,7 @@
5. Install dependencies: `pip install -r requirements.txt`
6. Install `flash-attn`: `pip install flash-attn==2.0.1 --no-build-isolation`. You may need to follow the [flash-attn installation instructions](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#installation-and-features) to avoid some errors.
7. Install the latest Git: `conda install git`
8. Clone the repo: `git clone git@github.com:pjlab-sys4nlp/train-moe.git` (If you haven't set up an SSH key for GitHub, you may not be able to clone via SSH. Check the [docs](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for details.)
9. Change current directory: `cd train-moe`
8. Clone the repo: `git clone git@github.com:pjlab-sys4nlp/llama-moe.git` (If you haven't set up an SSH key for GitHub, you may not be able to clone via SSH. Check the [docs](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account) for details.)
9. Change current directory: `cd llama-moe`
10. Install `smoe` in [editable mode](https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-e): `pip install -e .[dev]`
11. Set up `pre-commit` hooks: `pre-commit install`
2 changes: 1 addition & 1 deletion docs/continual_pretraining/README.md
@@ -58,7 +58,7 @@ model_type="llama"
pretrained_model=/mnt/petrelfs/share_data/quxiaoye/models/llama_7B
```

For LLaMA with MoEfication, use the following settings:
For LLaMA-MoE, use the following settings:
```bash
model_type="llama_moe"
pretrained_model=/mnt/petrelfs/share_data/quxiaoye/models/llama_7B_MoE_16Select4-l2_norm
49 changes: 22 additions & 27 deletions docs/moefication/README.md → docs/expert_construction/README.md
@@ -1,4 +1,4 @@
# MoEfication of LLaMA Model
# Expert Construction of LLaMA-MoE

This documentation provides the procedures to convert a LLaMA model to LLaMA-MoE.

@@ -20,15 +20,15 @@ The conversion from LLaMA to LLaMA-MoE consists of two steps:
To randomly split the intermediate neurons in FFNs, you can run:

```shell
bash ./scripts/moefication/split/run_split_random.sh
bash ./scripts/expert_construction/split/run_split_random.sh
```

Remember to change the following variables:

```shell
num_experts="" # number of experts in each MoE layer

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
save_path="" # path to save the indices sets
```
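
For intuition, the random split amounts to shuffling the FFN intermediate neuron indices and slicing them into `num_experts` equal groups. A minimal sketch (our illustration, not the repo's exact code):

```python
import numpy as np

def random_split(intermediate_size: int, num_experts: int, seed: int = 0):
    # Shuffle all intermediate neuron indices, then slice them into equal
    # groups; each group becomes the neuron index set of one expert.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(intermediate_size)
    return np.array_split(indices, num_experts)

# LLaMA-7B FFNs have 11008 intermediate neurons; 16 experts -> 688 each.
experts = random_split(11008, 16)
```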

@@ -39,15 +39,15 @@ save_path="" # path to save the indices sets
To split the intermediate neurons in FFNs by k-means clustering, you can run:

```shell
bash ./scripts/moefication/split/run_split_clustering.sh
bash ./scripts/expert_construction/split/run_split_clustering.sh
```
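
Conceptually, this treats each intermediate neuron's weight vector as a feature and groups similar neurons into the same expert. A rough sketch, assuming scikit-learn and omitting the size balancing the actual script performs:

```python
import torch
from sklearn.cluster import KMeans

def cluster_split(up_proj: torch.Tensor, num_experts: int):
    # up_proj: [intermediate_size, hidden_size]; row i holds neuron i's weights.
    feats = up_proj.detach().float().numpy()
    labels = KMeans(n_clusters=num_experts, n_init=10).fit_predict(feats)
    # Neurons sharing a cluster label form one expert (sizes may be uneven).
    return [(labels == e).nonzero()[0] for e in range(num_experts)]
```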

Remember to change the following variables:

```shell
num_experts="" # number of experts in each MoE layer

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
save_path="" # path to save the indices sets

metric="" # metric for clustering, choices: `l2` `cos`
@@ -65,15 +65,15 @@ We also implemented the co-activation graph based method in [MoEfication](https
You need to install [METIS](http://glaros.dtc.umn.edu/gkhome/metis/metis/download) first. Then you can run the following script to perform splitting:

```shell
bash ./scripts/moefication/split/run_split_graph.sh
bash ./scripts/expert_construction/split/run_split_graph.sh
```
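
The graph here has one node per intermediate neuron, with edge weights counting how often two neurons activate on the same token; METIS then partitions it into experts. A sketch of the graph-construction step (illustrative only; names and shapes are assumptions):

```python
import torch

def coactivation_graph(acts: torch.Tensor, threshold: float = 0.0):
    # acts: [num_tokens, intermediate_size] FFN activations collected offline.
    active = (acts > threshold).float()
    # adj[i, j] = number of tokens on which neurons i and j fire together;
    # this adjacency matrix is what METIS partitions.
    return active.T @ active
```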

Remember to change the following variables:

```shell
num_experts="" # number of experts in each MoE layer

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
save_path="" # path to save the indices sets

metric="" # metric to measure the sparsity, choices: `l1_norm` `l2_norm` `plain`
@@ -82,7 +82,7 @@ proj_type="" # weights to perform clustering, choices: `up_proj` `gate_proj`



#### Gradient Split
#### Gradient Split

Before performing gradient-based splitting (Eq. 8 in the technical report), you need to prepare a bunch of pretraining data and group them into different clusters by running:

@@ -93,15 +93,15 @@ python smoe/entrypoint/text_clustering.py
Then, you need to run the following script to get the importance vector $v$ for the intermediate neurons in each layer:

```shell
bash scripts/moefication/split/run_split_gradient_get_grads.sh
bash scripts/expert_construction/split/run_split_gradient_get_grads.sh
```
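
For the exact definition of $v$, see Eq. 8 in the technical report; as a rough intuition, it is a first-order (Taylor-style) estimate of each neuron's contribution to the loss, in the spirit of:

```python
import torch

def neuron_importance(acts: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    # acts, grads: [num_tokens, intermediate_size] — the FFN intermediate
    # activations and the gradients w.r.t. them.
    # |activation * gradient| approximates each neuron's effect on the loss.
    return (acts * grads).abs().sum(dim=0)
```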

Remember to change the following variables:

```shell
dataset_dir="" # path to clustered data
pretrained_model="" # path to the LLaMA checkpoint
tokenizer_path="" # path to the LLaMA tokenizer
tokenizer_path="" # path to the LLaMA tokenizer
save_path="" # path to save the indices sets

accumulate_level="" # should be set to `sample`
@@ -111,14 +111,14 @@ importance_type="" # should be set to `feature_change`



##### Neuron Independent
##### Neuron Independent

> This part is not included in our technical report.
You can also split the intermediate neurons in a neuron-independent manner by treating the expert split as a task assignment problem. To perform the split, you can run:

```shell
bash ./scripts/moefication/split/run_split_gradient.sh
bash ./scripts/expert_construction/split/run_split_gradient.sh
```
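
One simple way to realize the task assignment is a greedy balancing heuristic: repeatedly hand the most important unassigned neuron to the expert with the smallest accumulated importance. The sketch below illustrates the idea only and is not necessarily the repo's algorithm:

```python
import heapq

def assign_neurons(importance: list[float], num_experts: int):
    # Min-heap of (accumulated importance, expert id, member indices).
    heap = [(0.0, e, []) for e in range(num_experts)]
    heapq.heapify(heap)
    for idx in sorted(range(len(importance)), key=lambda i: -importance[i]):
        load, e, members = heapq.heappop(heap)
        members.append(idx)
        heapq.heappush(heap, (load + importance[idx], e, members))
    return [sorted(m) for _, _, m in sorted(heap, key=lambda t: t[1])]
```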

Remember to change the following variables:
@@ -128,7 +128,7 @@ expert_num="" # number of experts in each MoE layer
expert_size="" # intermediate neurons in each expert
share_neurons="False" ######### SET AS FLASE TO BE NEURON-INDEPENDENT #########

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
score_file_path="" # path to the score files generated above
save_path="" # path to save the indices sets
visualization_path="" # path to save the visualization results
@@ -144,7 +144,7 @@ proj_type="" # weights to perform clustering, choices: `up_proj` `gate_proj`
Here we use the same entry script as the **Neuron Independent** strategy above for gradient splitting.

```shell
bash ./scripts/moefication/split/run_split_gradient.sh
bash ./scripts/expert_construction/split/run_split_gradient.sh
```

Remember to change the following variables:
@@ -154,7 +154,7 @@ expert_num="" # number of experts in each MoE layer
expert_size="" # intermediate neurons in each expert
share_neurons="True" ######### SET AS TRUE TO BE INNER-SHARING #########

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
score_file_path="" # path to the score files generated above
save_path="" # path to save the indices sets
visualization_path="" # path to save the visualization results
@@ -170,7 +170,7 @@ proj_type="" # weights to perform clustering, choices: `up_proj` `gate_proj`
You can run the following script to perform inter-sharing split:

```shell
bash ./scripts/moefication/split/run_split_gradient_residual.sh
bash ./scripts/expert_construction/split/run_split_gradient_residual.sh
```
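
In the residual design, the residual experts are always active while the remaining experts are routed top-$k$. A per-token sketch of the resulting forward pass (a simplified assumption, not the actual model code):

```python
import torch

def moe_ffn_forward(x, residual_experts, routed_experts, router, k: int = 2):
    # x: [hidden_size] for one token; experts are callables (small FFNs).
    out = sum(expert(x) for expert in residual_experts)  # always-on experts
    scores = torch.softmax(router(x), dim=-1)            # [num_routed]
    weights, indices = scores.topk(k)                    # pick top-k experts
    for w, i in zip(weights, indices):
        out = out + w * routed_experts[int(i)](x)
    return out
```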

Remember to change the following variables:
@@ -181,7 +181,7 @@ expert_num_residual="" # number of residual experts
expert_size="" # intermediate neurons in each expert
share_neurons="" # Whether to share neurons in non-residual experts

model_path="" # path to the LLaMA checkpoint
model_path="" # path to the LLaMA checkpoint
score_file_path="" # path to the score files generated above
save_path="" # path to save the indices sets
visualization_path="" # path to save the visualization results
@@ -199,7 +199,7 @@ proj_type="" # weights to perform clustering, choices: `up_proj` `gate_proj`
Run the following script:

```shell
bash ./scripts/moefication/convert/run_convert.sh
bash ./scripts/expert_construction/convert/run_convert.sh
```
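
Conversion essentially slices the dense FFN weight matrices along the neuron index sets produced by splitting: the selected rows of `gate_proj`/`up_proj` and the matching columns of `down_proj`. A schematic sketch with assumed shapes:

```python
import torch

def build_expert(gate_proj, up_proj, down_proj, neuron_idx):
    # gate_proj, up_proj: [intermediate_size, hidden_size]
    # down_proj:          [hidden_size, intermediate_size]
    # neuron_idx: indices of the intermediate neurons owned by this expert.
    return {
        "gate_proj": gate_proj[neuron_idx, :].clone(),
        "up_proj": up_proj[neuron_idx, :].clone(),
        "down_proj": down_proj[:, neuron_idx].clone(),
    }
```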


@@ -209,7 +209,7 @@ bash ./scripts/moefication/convert/run_convert.sh
Run the following script:

```shell
bash ./scripts/moefication/convert/run_convert_gradient.sh
bash ./scripts/expert_construction/convert/run_convert_gradient.sh
```


@@ -219,7 +219,7 @@ bash ./scripts/moefication/convert/run_convert_gradient.sh
Run the following script:

```shell
bash ./scripts/moefication/convert/run_convert_gradient_residual.sh
bash ./scripts/expert_construction/convert/run_convert_gradient_residual.sh
```


@@ -229,18 +229,13 @@ bash ./scripts/moefication/convert/run_convert_gradient_residual.sh
```
--smoe
-- scripts
-- moefication
-- expert_construction
-- convert
-- get_hidden_features (deprecated)
-- prune (deprecated)
-- select (deprecated)
-- split
-- smoe
-- entrypoint
-- moefication
-- expert_construction
```





15 changes: 9 additions & 6 deletions example.py
@@ -1,11 +1,14 @@
import torch
from transformers import AutoTokenizer
# python>=3.10

from smoe.models.llama_moe import LlamaMoEForCausalLM
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/mnt/petrelfs/share_data/quxiaoye/runs/llama2_random_split_112gpus_16_2/outputs/cpt-llama2_random_split_112gpus_16_2_scale_factor_8-2342244/checkpoint-13600/"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = LlamaMoEForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)
model_dir = "llama-moe/LLaMA-MoE-v1-3_5B-2_8"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
model_dir, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()
model.to("cuda:0")

input_text = "Suzhou is famous of"
1 change: 0 additions & 1 deletion models/124M/encoder.json

This file was deleted.

