This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Bug Fix] trainer.update(1) should be used after loss.mean() is called #1000
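
A minimal sketch of the pattern named in the title, not taken from this PR's diff: the model, data, and hyperparameters below are hypothetical. When the per-sample loss is reduced with `loss.mean()`, the resulting gradients are already normalized by the batch size, so the trainer should apply them with a rescale factor of 1 (`trainer.update(1)`, or equivalently `trainer.step(1)` when no manual gradient all-reduce is involved) rather than the batch size.

```python
import mxnet as mx
from mxnet.gluon import nn, Trainer
from mxnet.gluon.loss import SoftmaxCrossEntropyLoss

mx.npx.set_np()  # numpy-like interface, as used on this branch

net = nn.Dense(10)
net.initialize()
trainer = Trainer(net.collect_params(), 'adam', {'learning_rate': 1e-3})
loss_fn = SoftmaxCrossEntropyLoss()

data = mx.np.random.uniform(size=(32, 100))                         # hypothetical batch
label = mx.np.random.randint(0, 10, size=(32,)).astype('float32')   # hypothetical labels

with mx.autograd.record():
    loss = loss_fn(net(data), label).mean()  # average over the batch first
loss.backward()

trainer.step(1)  # rescale by 1: the loss is already a mean
# With manual gradient aggregation one would instead call
# trainer.allreduce_grads() followed by trainer.update(1).
```

Passing the batch size here would divide the effective learning rate by the batch size a second time, which appears to be the issue the title describes.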

Open · wants to merge 49 commits into base: v0.x

Changes from 1 commit
Commits (49)
de7b23d
clean slate for 1.x
szha Mar 18, 2020
01122db
[Numpy] Numpy version of GluonNLP (#1225)
sxjscience Jun 10, 2020
982a416
Fix bert cfg (#1245)
zheyuye Jun 11, 2020
789e2b9
fix download
sxjscience Jun 11, 2020
b714eac
[Numpy] Try to fix the CI (#1248)
sxjscience Jun 11, 2020
85b6f09
[Numpy] Add "match_tokens_with_char_spans" + Enable downloading from …
sxjscience Jun 16, 2020
ee1f0e3
[Numpy] Update QA Dataset and revise run_squad (#1250)
zheyuye Jun 18, 2020
e06ff01
Pin mxnet version range on CI (#1257)
leezu Jul 7, 2020
689eba9
[CI] AWS batch job tool for GluonNLP (Part I) (#1251)
szha Jul 7, 2020
cd48efd
Update codecov action to handle different OS and Python versions (#1254)
leezu Jul 8, 2020
83e1f13
Use Amazon S3 Transfer Acceleration (#1260)
leezu Jul 10, 2020
a646c34
[FEATURE] update backtranslation and add multinomial sampler (#1259)
hutao965 Jul 11, 2020
ea9152b
Fixes to make the CI more stable (#1265)
sxjscience Jul 16, 2020
70a1887
Update for Block API (#1261)
leezu Jul 17, 2020
9d83fe6
Fix parameter share regex (#1267)
leezu Jul 17, 2020
4743afc
Add fp16 support for Bert QA inference (#1264)
MoisesHer Jul 17, 2020
e78a24e
[CI] update batch to gluonnlp-dev (#1268)
szha Jul 18, 2020
3a0ed9f
[Numpy] Refactor Roberta (#1269)
zheyuye Jul 21, 2020
f407b8e
[CI] Batch cpu version (#1275)
szha Jul 22, 2020
57eb411
[Numpy] Fix conversion toolkits (#1274)
zheyuye Jul 23, 2020
74bd2ce
[Feature] Add FP16 inference support to NMT + Add BoundedBudgetSample…
hutao965 Jul 23, 2020
d76897b
Add embedding related methods in numpy version (#1263)
acphile Jul 28, 2020
4d43f82
add subversion/wget to docker, add readme (#1279)
szha Jul 28, 2020
3c87457
Add layout + compute_layout support: TransformerNMT, BERT, ALBERT, EL…
sxjscience Jul 29, 2020
033214e
[Numpy] Fix SQuAD + Fix GLUE downloading (#1280)
sxjscience Jul 29, 2020
2294421
[Numpy Refactor] BART (#1282)
zheyuye Jul 30, 2020
1f9ad44
Horovod support for pretraining and fine-tuning SQuAD (#1276)
zheyuye Aug 1, 2020
7e1f9d0
[DOC] Add the basic documentation for the embedding API (#1281)
acphile Aug 4, 2020
20af58f
Fix gelu (#1287)
zheyuye Aug 5, 2020
ded0f99
fix prepare_openwebtext (#1289)
ZiyueHuang Aug 6, 2020
c33e62e
[FEATURE]Horovod support for training transformer + add mirror data f…
hutao965 Aug 7, 2020
9e268c0
Fix electra (#1291)
zheyuye Aug 8, 2020
32e87d4
[Numpy] Benchmark the backbone models + Some fixes + Always use pytho…
sxjscience Aug 14, 2020
6ae558e
[FEATURE]Horovod support for training transformer (PART 2) (#1301)
hutao965 Aug 20, 2020
d8b68c6
[Numpy] Fix AWS Batch + Add Docker Support (#1302)
sxjscience Aug 20, 2020
d17ec4c
minor fix for run_electra.py & remove hybridization in the constructi…
ZiyueHuang Aug 22, 2020
99b35d8
Add Intro for batch + upload SQuAD training command (#1305)
zheyuye Aug 22, 2020
d93356f
[MODEL] make beam search a hybrid block (#1310)
szha Aug 23, 2020
210dd0c
[Numpy] [Fix] Update README.md (#1306)
sxjscience Aug 23, 2020
b324ee6
[CI] Add GPU pytest + Append AWS Batch job submission to current pipe…
barry-jin Aug 24, 2020
3b14d69
[CI] Update unittests-gpu (#1313)
barry-jin Aug 24, 2020
dca17ee
automatically generate date suffix for dev versions (#1314)
szha Aug 25, 2020
39ec921
fix typo (#1317)
liuzh47 Aug 26, 2020
970318d
fix typo (#1318)
liuzh47 Aug 26, 2020
bba8697
[CI] Update GPU Test Workflow + Update Some Tests and README (#1316)
barry-jin Aug 28, 2020
66e5e05
fix https://github.com/dmlc/gluon-nlp/issues/1315 (#1319)
ZiyueHuang Aug 28, 2020
ff95fb4
[CI] Fix Source Reference Issues (#1332)
barry-jin Sep 1, 2020
1bd85b6
[BUGFIX] fix valid candidates issue (#1323)
liuzh47 Sep 1, 2020
189bbdc
[MODEL] convert gpt2 model (#1328)
hutao965 Sep 1, 2020
[Numpy] [Fix] Update README.md (#1306)
* Update README.md

Update README.md

Update ubuntu18.04-devel-gpu.Dockerfile

Update README.md

update

Update README.md

Update README.md

Update README.md

use python3 -m

Update benchmark_utils.py

Update benchmark_utils.py

Update ubuntu18.04-devel-gpu.Dockerfile

Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update ubuntu18.04-devel-gpu.Dockerfile

* update

* Update README.md

* Update README.md

* Update ubuntu18.04-devel-gpu.Dockerfile

* Update README.md
sxjscience authored Aug 23, 2020

Verified: this commit was created on GitHub.com and signed with GitHub's verified signature (the key has expired).
commit 210dd0ca9be36fe82643d28a7e495e9647b09d5f
44 changes: 35 additions & 9 deletions README.md
@@ -1,15 +1,29 @@
# GluonNLP + Numpy
<h3 align="center">
GluonNLP: Your Choice of Deep Learning for NLP
</h3>

Implementing NLP algorithms using the new numpy-like interface of MXNet. It's also a testbed for the next-generation release of GluonNLP.

This is a work-in-progress.
<p align="center">
<a href="https://github.com/dmlc/gluon-nlp/actions"><img src="https://github.com/dmlc/gluon-nlp/workflows/continuous%20build/badge.svg"></a>
<a href="https://codecov.io/gh/dmlc/gluon-nlp"><img src="https://codecov.io/gh/dmlc/gluon-nlp/branch/master/graph/badge.svg"></a>
<a href="https://github.com/dmlc/gluonnlp/actions"><img src="https://img.shields.io/badge/python-3.6%2C3.8-blue.svg"></a>
<a href="https://pypi.org/project/gluonnlp/#history"><img src="https://img.shields.io/pypi/v/gluonnlp.svg"></a>
</p>

GluonNLP is a toolkit that enables easy text preprocessing, dataset
loading, and neural model building to help you speed up your Natural
Language Processing (NLP) research.

# Features

- Data Pipeline for NLP
- AutoML support (TODO)
For NLP Practitioners
- Easy-to-use Data Pipeline
- Automatically Train Models via AutoNLP (TODO)

For Researchers
- Pretrained Model Zoo
- Programming with numpy-like API

For Engineers
- Fast Deployment
- [TVM](https://tvm.apache.org/) (TODO)
- AWS Integration
@@ -70,6 +84,18 @@ python3 -m gluonnlp.cli.preprocess help

```

### Frequently Asked Questions
- **Question**: I cannot access the command line toolkits. Running `nlp_data` reports `nlp_data: command not found`.

This sometimes happens because gluonnlp was installed to the user folder and
the executables were placed in `~/.local/bin`. You can update the `PATH` variable to
also include `~/.local/bin`:

```
export PATH=${PATH}:~/.local/bin
```


# Run Unittests
You may go to [tests](tests) to see how to run the unittests.

@@ -78,8 +104,8 @@ You may go to [tests](tests) to see all how to run the unittests.
You can use Docker to launch a JupyterLab development environment with GluonNLP installed.

```
docker pull gluonai/gluon-nlp:v1.0.0
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 gluonai/gluon-nlp:v1.0.0
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
```

For more details, you can refer to the guidance in [tools/docker].
For more details, you can refer to the guidance in [tools/docker](tools/docker).
2 changes: 1 addition & 1 deletion scripts/benchmarks/benchmark_utils.py
@@ -91,7 +91,7 @@ def is_mxnet_available():


logger = logging.getLogger(__name__) # pylint: disable=invalid-name
logging_config(logger=logger)
logging_config(folder='gluonnlp_benchmark', name='benchmark', logger=logger)


_is_memory_tracing_enabled = False
1 change: 0 additions & 1 deletion scripts/machine_translation/train_transformer.py
@@ -526,7 +526,6 @@ def train(args):

if __name__ == '__main__':
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'
os.environ['MXNET_USE_FUSION'] = '0' # Manually disable pointwise fusion
args = parse_args()
np.random.seed(args.seed)
mx.random.seed(args.seed)
1 change: 1 addition & 0 deletions setup.py
@@ -39,6 +39,7 @@ def find_version(*file_paths):
'protobuf',
'pandas',
'tokenizers>=0.7.0',
'click>=7.0', # Dependency of youtokentome
'youtokentome>=1.0.6',
'fasttext>=0.9.2'
]
1 change: 0 additions & 1 deletion src/gluonnlp/data/tokenizers.py
@@ -30,7 +30,6 @@
from typing import List, Tuple, Union, NewType, Optional
from collections import OrderedDict

import jieba
import sacremoses

from .vocab import Vocab
4 changes: 2 additions & 2 deletions tests/README.md
@@ -3,13 +3,13 @@
To run the unittests, use the following command

```bash
pytest .
python3 -m pytest .
```

To test for certain file, e.g., the `test_models_transformer.py`, use the following command

```bash
pytest test_models_transformer
python3 -m pytest test_models_transformer
```

Refer to the [official guide of pytest](https://docs.pytest.org/en/latest/) for more details.
25 changes: 23 additions & 2 deletions tools/docker/README.md
@@ -9,14 +9,35 @@ You can run the docker with the following command.

```
docker pull gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=4g gluonai/gluon-nlp:gpu-latest
docker run --gpus all --rm -it -p 8888:8888 -p 8787:8787 -p 8786:8786 --shm-size=2g gluonai/gluon-nlp:gpu-latest
```

Here, we open the ports 8888, 8787, 8786, which are used for connecting to JupyterLab.
Also, we set `--shm-size` to `4g`. This sets the shared memory storage to 4GB. Since NCCL will
Also, we set `--shm-size` to `2g`. This sets the shared memory storage to 2GB. Since NCCL will
create shared memory segments, this argument is essential for the JupyterNotebook to work with NCCL.
(See also https://github.com/NVIDIA/nccl/issues/290).

The folder structure of the docker image will be
```
/workspace/
├── gluonnlp
├── horovod
├── mxnet
├── notebooks
├── data
```

If you have a multi-GPU instance, e.g., [g4dn.12xlarge](https://aws.amazon.com/ec2/instance-types/g4/),
[p2.8xlarge](https://aws.amazon.com/ec2/instance-types/p2/),
[p3.8xlarge](https://aws.amazon.com/ec2/instance-types/p3/), you can try to run the following
command to verify the installation of horovod + MXNet

```
docker run --gpus all --rm -it --shm-size=4g gluonai/gluon-nlp:gpu-latest \
horovodrun -np 2 python3 -m pytest /workspace/horovod/horovod/test/test_mxnet.py
```


## Build your own Docker Image
To build a docker image from the dockerfile, you may use the following command:

21 changes: 20 additions & 1 deletion tools/docker/ubuntu18.04-devel-gpu.Dockerfile
@@ -74,7 +74,7 @@ RUN echo "hwloc_base_binding_policy = none" >> /usr/local/etc/openmpi-mca-params
ENV LD_LIBRARY_PATH=/usr/local/openmpi/lib:$LD_LIBRARY_PATH
ENV PATH=/usr/local/openmpi/bin/:/usr/local/bin:/root/.local/bin:$PATH

RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
RUN ln -s $(which python3) /usr/local/bin/python

RUN mkdir -p ${WORKDIR}

@@ -144,6 +144,25 @@ WORKDIR ${WORKDIR}
# Debug horovod by default
RUN echo NCCL_DEBUG=INFO >> /etc/nccl.conf

# Install NodeJS + Tensorboard + TensorboardX
RUN curl -sL https://deb.nodesource.com/setup_14.x | bash - \
&& apt-get install -y nodejs

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
libsndfile1-dev

RUN pip3 install --no-cache --upgrade \
soundfile==0.10.2 \
ipywidgets==7.5.1 \
jupyter_tensorboard==0.2.0 \
widgetsnbextension==3.5.1 \
tensorboard==2.1.1 \
tensorboardX==2.1
RUN jupyter labextension install jupyterlab_tensorboard \
&& jupyter nbextension enable --py widgetsnbextension \
&& jupyter labextension install @jupyter-widgets/jupyterlab-manager

# Revise default shell to /bin/bash
RUN jupyter notebook --generate-config \
&& echo "c.NotebookApp.terminado_settings = { 'shell_command': ['/bin/bash'] }" >> /root/.jupyter/jupyter_notebook_config.py