Commit

chore(language): optimize Dockerfile
bouassaba committed Jun 14, 2024
1 parent 369013a commit 4b5bab2
Showing 10 changed files with 780 additions and 1,380 deletions.
2 changes: 1 addition & 1 deletion idp/package.json
@@ -1,7 +1,7 @@
 {
   "name": "voltaserve-idp",
   "version": "2.0.0",
-  "license": "MIT",
+  "license": "AGPL-3.0-only",
   "private": true,
   "scripts": {
     "start": "ts-node -r tsconfig-paths/register src/app.ts",
4 changes: 3 additions & 1 deletion language/.dockerignore
@@ -1 +1,3 @@
-__pycache__
+__pycache__
+.pdm-python
+.venv
3 changes: 2 additions & 1 deletion language/.gitignore
@@ -1,2 +1,3 @@
 /__pycache__
-/.idea
+/.venv
+/.pdm-python
20 changes: 10 additions & 10 deletions language/Dockerfile
@@ -1,21 +1,21 @@
-FROM python:3.11-slim-bookworm
+FROM python:3.12-alpine

 WORKDIR /app

 COPY . .

-RUN apt-get update
+RUN apk update
+RUN apk add --no-cache build-base

-RUN apt-get install -y curl build-essential python3-poetry rust-all
+RUN pip3 install pipx
+ENV PATH="/root/.local/bin:$PATH"

-RUN poetry install --no-root
+RUN pipx install pdm --python $(which python)

-RUN pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
+RUN pdm install --prod --no-editable
+RUN .venv/bin/python3 -m ensurepip
+RUN .venv/bin/python3 -m spacy download xx_ent_wiki_sm

-ENV PIP_DEFAULT_TIMEOUT=900
-
-RUN poetry run spacy download xx_ent_wiki_sm
-
-ENTRYPOINT ["poetry", "run", "flask", "run", "--host=0.0.0.0", "--port=8084"]
+ENTRYPOINT ["pdm", "run", "flask", "run", "--host=0.0.0.0", "--port=8084"]

 EXPOSE 8084
36 changes: 13 additions & 23 deletions language/README.md
@@ -2,47 +2,37 @@

 ## Getting Started

-Install [Poetry](https://python-poetry.org).
+Install [PDM](https://pdm-project.org).

+Install Python 3.12.
+
 Install dependencies:

 ```shell
-poetry install
+pdm install
 ```

-Spawn a shell within the project's virtual environment:
+Activate the virtual environment created by PDM:

 ```shell
-poetry shell
+source .venv/bin/activate
 ```

-Install spaCy model:
+Make sure PIP is available:

-```shell:
-poetry run spacy download xx_ent_wiki_sm
-poetry run spacy download zh_core_web_trf
-poetry run spacy download de_core_news_lg
-poetry run spacy download en_core_web_trf
-poetry run spacy download fr_core_news_lg
-poetry run spacy download it_core_news_lg
-poetry run spacy download ja_core_news_trf
-poetry run spacy download nl_core_news_lg
-poetry run spacy download pt_core_news_lg
-poetry run spacy download ru_core_news_lg
-poetry run spacy download es_core_news_lg
-poetry run spacy download sv_core_news_lg
+```shell
+python3 -m ensurepip
 ```

-On Apple Silicon or Intel Macs with supported AMD GPUs, do the following to install a hardware accelerated version of PyTorch:
+Install spaCy model:

-https://developer.apple.com/metal/pytorch
-```shell:
+```shell
-pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
+spacy download xx_ent_wiki_sm
 ```

 Run for development:

 ```shell
-poetry run flask run --host=0.0.0.0 --port=8084 --debug
+flask run --host=0.0.0.0 --port=8084 --debug
 ```
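
Note on the README steps above: they end with downloading the `xx_ent_wiki_sm` model. spaCy models downloaded this way are installed as regular Python packages, so one way to confirm the setup from inside the activated virtual environment is to check that the package can be located. The following is a minimal sketch using only the standard library; the helper name `is_model_installed` is hypothetical and not part of this repository.

```python
import importlib.util


def is_model_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found in the current environment."""
    # find_spec returns None when a top-level module is not installed
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: the model name matches the one used in the README and Dockerfile
    name = "xx_ent_wiki_sm"
    status = "installed" if is_model_installed(name) else "missing"
    print(f"{name}: {status}")
```

Running this with the interpreter in `.venv` checks that environment rather than the system Python, which is the one the `flask run` command uses once the virtual environment is activated.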
15 changes: 5 additions & 10 deletions language/app.py
@@ -1,14 +1,6 @@
 from flask import Flask, request, jsonify
 import string
 import spacy
-import torch
-
-if torch.backends.mps.is_available():
-    mps_device = torch.device("mps")
-    x = torch.ones(1, device=mps_device)
-    print(f"🔥 MPS device is available: {x}")
-else:
-    print ("MPS device not found.")

 app = Flask(__name__)
 nlp = None
@@ -75,9 +67,12 @@ def ner_entities():
             result[key]["frequency"] += 1
         else:
             result[key] = {"text": entity["text"], "frequency": 1}
-
+
     # Convert the dictionary back to a list of entities with the "frequency" field
-    result = [{"text": value["text"], "frequency": value["frequency"]} for value in result.values()]
+    result = [
+        {"text": value["text"], "frequency": value["frequency"]}
+        for value in result.values()
+    ]

     # Sort by descending order of frequency
     result.sort(key=lambda x: x["frequency"], reverse=True)
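
Note on the hunk above: it tallies how often each entity occurs, converts the tally into a list, and sorts that list by descending frequency. A minimal, self-contained sketch of that pattern follows. The function name `count_entities`, the use of the lowercased text as the grouping key, and the sample input are assumptions for illustration; the part of `app.py` that derives `key` is not shown in this diff.

```python
def count_entities(entities):
    """Aggregate entities by text and return them sorted by descending frequency."""
    result = {}
    for entity in entities:
        # Assumption: the lowercased text is the grouping key; app.py may derive it differently
        key = entity["text"].lower()
        if key in result:
            result[key]["frequency"] += 1
        else:
            result[key] = {"text": entity["text"], "frequency": 1}

    # Convert the dictionary back to a list of entities with the "frequency" field
    result = [
        {"text": value["text"], "frequency": value["frequency"]}
        for value in result.values()
    ]

    # Sort by descending order of frequency; the sort is stable, so ties keep first-seen order
    result.sort(key=lambda x: x["frequency"], reverse=True)
    return result


sample = [{"text": "Berlin"}, {"text": "Paris"}, {"text": "berlin"}]
print(count_entities(sample))
# prints [{'text': 'Berlin', 'frequency': 2}, {'text': 'Paris', 'frequency': 1}]
```

The first spelling seen for a key is the one kept in the output, which matches the `else` branch in the hunk.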