diff --git a/README.md b/README.md
index 708492a..05f2e86 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 ![](visuals/octopus.jpg)
 
-This repository provides all materials for the paper [Generative Representational Instruction Tuning](https://arxiv.org/abs/2202.08904). We continue developing the repository and welcome any contributions. If you want to use the code in the exact same way as in the paper, please use the 1.0.0 release at commit hash `TODO`.
+This repository provides all materials for the paper [Generative Representational Instruction Tuning](https://arxiv.org/abs/2402.09906). We continue developing the repository and welcome any contributions. If you want to use the code in the exact same way as in the paper, please use the 1.0.0 release at commit hash `TODO`.
 
 - [Inference](#inference)
 - [Training](#training)
@@ -119,6 +119,7 @@ Shortcuts:
 - emb/gen/gritlm = embedding, generative, unified
 - bf16c = embeddings are cast back to bf16 after pooling and similarity computation is also done in bf16 (simulating how cached embeddings would operate)
 - bb/cc/bbcc... = order of bidirectional vs causal attention
+- gendups = not using `--use_unique_indices` during training. If not used and training is unified, then data is duplicated worsening performance
 
 The most important ones are:
 
@@ -160,7 +161,9 @@ They are explained in more detail in the paper and its appendix. So to e.g. trai
 Setup:
 ```bash
 git clone https://github.com/ContextualAI/gritlm`
+cd gritlm
 pip install -e .
+cd gritlm
 ````
 
 Below are easy examples for getting started:
@@ -437,5 +440,12 @@ The code is inspired by:
 
 If useful please consider citing 😊
 ```bibtex
-TODO
+@misc{muennighoff2024generative,
+      title={Generative Representational Instruction Tuning},
+      author={Niklas Muennighoff and Hongjin Su and Liang Wang and Nan Yang and Furu Wei and Tao Yu and Amanpreet Singh and Douwe Kiela},
+      year={2024},
+      eprint={2402.09906},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL}
+}
 ```
\ No newline at end of file
diff --git a/gritlm/__init__.py b/gritlm/__init__.py
index 47bde56..c29eae3 100644
--- a/gritlm/__init__.py
+++ b/gritlm/__init__.py
@@ -1,3 +1,3 @@
-__version__ = "0.9.4"
+__version__ = "1.0.0"
 
 from .gritlm import GritLM
\ No newline at end of file
diff --git a/scripts/training/train_embonly.sh b/scripts/training/train_embonly.sh
index fe8d711..3784b2f 100644
--- a/scripts/training/train_embonly.sh
+++ b/scripts/training/train_embonly.sh
@@ -12,7 +12,7 @@
 ######################
 ### Set enviroment ###
 ######################
-cd /home/niklas/gritlm
+cd /home/niklas/gritlm/gritlm
 source /env/bin/start-ctx-user
 conda activate gritlm
 export WANDB_PROJECT="gritlm"
diff --git a/scripts/training/train_genonly.sh b/scripts/training/train_genonly.sh
index 40c2a64..751ca41 100644
--- a/scripts/training/train_genonly.sh
+++ b/scripts/training/train_genonly.sh
@@ -12,7 +12,7 @@
 ######################
 ### Set enviroment ###
 ######################
-cd /home/niklas/gritlm
+cd /home/niklas/gritlm/gritlm
 source /env/bin/start-ctx-user
 conda activate gritlm
 export WANDB_PROJECT="gritlm"
diff --git a/scripts/training/train_gritlm_7b.sh b/scripts/training/train_gritlm_7b.sh
index 59f42e9..ccc592c 100644
--- a/scripts/training/train_gritlm_7b.sh
+++ b/scripts/training/train_gritlm_7b.sh
@@ -12,10 +12,10 @@
 ######################
 ### Set enviroment ###
 ######################
-cd /home/niklas/gritlm
+cd /home/niklas/gritlm/gritlm
 source /env/bin/start-ctx-user
-conda activate gritlmt2
-NCCL_ASYNC_ERROR_HANDLING=1
+conda activate gritlm
+#NCCL_ASYNC_ERROR_HANDLING=1
 export WANDB_PROJECT="gritlm"
 # Training setup
 GPUS_PER_NODE=8
diff --git a/scripts/training/train_gritlm_8x7b.sh b/scripts/training/train_gritlm_8x7b.sh
index ceb679c..b029971 100644
--- a/scripts/training/train_gritlm_8x7b.sh
+++ b/scripts/training/train_gritlm_8x7b.sh
@@ -12,11 +12,11 @@
 ######################
 ### Set enviroment ###
 ######################
-cd /home/niklas/gritlm
+cd /home/niklas/gritlm/gritlm
 source /env/bin/start-ctx-user
 conda activate gritlm
 #NCCL_ASYNC_ERROR_HANDLING=1
-export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
+#export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
 export WANDB_PROJECT="gritlm"
 # Training setup
 GPUS_PER_NODE=8
diff --git a/scripts/training/train_test.sh b/scripts/training/train_test.sh
index b9bc7bd..1c3ba1f 100644
--- a/scripts/training/train_test.sh
+++ b/scripts/training/train_test.sh
@@ -12,10 +12,10 @@
 ######################
 ### Set enviroment ###
 ######################
-cd /home/niklas/gritlm
+cd /home/niklas/gritlm/gritlm
 source /env/bin/start-ctx-user
-conda activate gritlmt2
-NCCL_ASYNC_ERROR_HANDLING=1
+conda activate gritlm
+#NCCL_ASYNC_ERROR_HANDLING=1
 export WANDB_PROJECT="gritlm"
 # Training setup
 GPUS_PER_NODE=8
diff --git a/setup.py b/setup.py
index f624f60..a320de8 100644
--- a/setup.py
+++ b/setup.py
@@ -43,7 +43,7 @@ setup(
     name='gritlm',
-    version='0.9.4',
+    version='1.0.0',
     description='GritLM',
     long_description=readme,
     long_description_content_type="text/markdown",