GitHub - grelade/vidcaption-ml: ML video captioning tool; flask/torchserve multi-gpu server

ML Video captioning tool

A tool for creating video captions in bulk. Uses the BLIP2 + OPT img2txt model as the captioning module.

Blogpost explaining the pipeline is here.

single-gpu server

The single-GPU captioning tool runs as a Flask HTTP server.

cd flask
pip install -r requirements.txt
./start.sh

(loading BLIP2 into memory takes several seconds)

Two example clients are provided:

cli-dummydata.py (finds captions for random-noise data)
cli-putinmask.py (finds a Putin mask in the example video sample_videos/putin_test.mp4 based on the provided captions)

multi-gpu server

The multi-GPU captioning tool runs on torchserve.

cd torchserve
pip install -r requirements-torchserve.txt
./server-create.sh
./start.sh

(takes some time to load BLIP2 into GPU memory)

Once the server is up and running, several example clients can be used:

cli-dummydata.py (finds captions for random-noise data)
cli-single.py (finds captions; uses sync video loading and async single-frame requests)
cli-str-single.py (finds captions; uses async stream-like video loading and async single-frame requests)
cli-str-prebatch.py (finds captions; uses async stream-like video loading and async prebatched requests)
cli-putinmask.py (finds a Putin mask in the example video sample_videos/putin_test.mp4 based on the provided captions)
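
The prebatched client pattern described above (stream-like loading combined with concurrent batched requests) can be sketched with asyncio alone. This is a minimal illustration, not the repository's client code: frame_stream and fake_caption_request are hypothetical stand-ins for video decoding and for the HTTP call to the server, and frames are represented by their indices.

```python
import asyncio
from typing import AsyncIterator, List


async def frame_stream(n_frames: int) -> AsyncIterator[int]:
    """Hypothetical stand-in for stream-like video loading: yields frame indices."""
    for i in range(n_frames):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield i


async def fake_caption_request(batch: List[int]) -> List[str]:
    """Hypothetical replacement for an HTTP request to the captioning server."""
    await asyncio.sleep(0.01)  # simulated server latency
    return [f"caption for frame {i}" for i in batch]


async def caption_prebatched(n_frames: int, prebatch_size: int) -> List[str]:
    """Group streamed frames into prebatches and send them concurrently."""
    tasks = []
    batch: List[int] = []
    async for frame in frame_stream(n_frames):
        batch.append(frame)
        if len(batch) == prebatch_size:
            # Start the request without awaiting it, so loading continues.
            tasks.append(asyncio.create_task(fake_caption_request(batch)))
            batch = []
    if batch:  # last, possibly smaller, prebatch
        tasks.append(asyncio.create_task(fake_caption_request(batch)))
    results = await asyncio.gather(*tasks)  # results come back in submission order
    return [caption for batch_result in results for caption in batch_result]


if __name__ == "__main__":
    captions = asyncio.run(caption_prebatched(n_frames=901, prebatch_size=32))
    print(len(captions), captions[0], captions[-1])
```

Setting prebatch_size to 1 reduces this sketch to the single-frame request pattern, which makes the two client styles easy to compare.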

client benchmark

client script	torchserve config file	captioning speed (frames/s)	video resolution	client params
cli-single.py	config-single.properties	34.6	480x270
cli-str-single.py	config-single.properties	64.5	480x270
cli-str-prebatch.py	config.properties	140.4	480x270	prebatch_size = 32
cli-single.py	config-single.properties	30.4	640x360
cli-str-single.py	config-single.properties	62.9	640x360
cli-str-prebatch.py	config.properties	125.1	640x360	prebatch_size = 32
cli-single.py	config-single.properties	12.7	1280x720
cli-str-single.py	config-single.properties	40.7	1280x720
cli-str-prebatch.py	config.properties	76.6	1280x720	prebatch_size = 16
cli-single.py	config-single.properties	6.4	1920x1080
cli-str-single.py	config-single.properties	29.1	1920x1080
cli-str-prebatch.py	config.properties	43.2	1920x1080

The last column lists non-default client parameters. All tests were run locally, i.e. without network latency. A single rescaled video with 901 frames was captioned.
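
To put the table in perspective, the captioning speeds can be converted into wall-clock times for the 901-frame test video. The short calculation below uses only numbers from the table; the helper names wall_time and speedup are illustrative.

```python
N_FRAMES = 901  # frames in the benchmark video

# (client script, resolution) -> captioning speed in frames/s, copied from the table
SPEEDS = {
    ("cli-single.py", "480x270"): 34.6,
    ("cli-str-single.py", "480x270"): 64.5,
    ("cli-str-prebatch.py", "480x270"): 140.4,
    ("cli-single.py", "1920x1080"): 6.4,
    ("cli-str-single.py", "1920x1080"): 29.1,
    ("cli-str-prebatch.py", "1920x1080"): 43.2,
}


def wall_time(speed_fps, n_frames=N_FRAMES):
    """Seconds needed to caption n_frames at the given speed."""
    return n_frames / speed_fps


def speedup(resolution, fast="cli-str-prebatch.py", slow="cli-single.py"):
    """Ratio of captioning speeds between two clients at one resolution."""
    return SPEEDS[(fast, resolution)] / SPEEDS[(slow, resolution)]


for (client, res), fps in SPEEDS.items():
    print(f"{client:22s} {res:>9s} {wall_time(fps):6.1f} s")
```

At 480x270 this gives roughly 26 s for cli-single.py versus about 6.4 s for cli-str-prebatch.py, a speedup of about 4x; at 1920x1080 the speed ratio between the same two clients is 6.75.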

config.properties

To change the number of GPUs, set the number_of_gpu parameter and adjust the minWorkers and maxWorkers parameters in the model specification accordingly. To use only a subset of the available GPUs, set the CUDA_VISIBLE_DEVICES variable to a list of device indices in the start.sh script.
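
The relationship between these settings can be sketched with a small helper that renders the relevant config.properties lines. This is only an illustration under stated assumptions: it assumes one worker per GPU, and the model name captioner and archive name captioner.mar are hypothetical placeholders, not the names used in this repository. The keys number_of_gpu, models, marName, minWorkers and maxWorkers are standard torchserve configuration keys.

```python
import json


def render_config(n_gpus, model_name="captioner", mar_name="captioner.mar"):
    """Return config.properties lines for n_gpus, assuming one worker per GPU.

    model_name and mar_name are hypothetical placeholders.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    model_spec = {
        model_name: {
            "1.0": {
                "marName": mar_name,
                "minWorkers": n_gpus,  # assumption: one worker per GPU
                "maxWorkers": n_gpus,
            }
        }
    }
    return "\n".join([
        f"number_of_gpu={n_gpus}",
        "models=" + json.dumps(model_spec),
    ])


def visible_devices(gpu_ids):
    """Format GPU indices for CUDA_VISIBLE_DEVICES, e.g. [0, 2] -> "0,2"."""
    return ",".join(str(i) for i in gpu_ids)


print(render_config(2))
print("export CUDA_VISIBLE_DEVICES=" + visible_devices([0, 2]))
```

The export line printed last is the kind of statement that would go into start.sh to restrict the server to GPUs 0 and 2.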

Technology used

huggingface transformers - BLIP2 model
flask - used by the single-gpu server
torchserve - used by the multi-gpu server
aiohttp, asyncio - concurrent client requests for bulk captioning
