Add 'run_batch' mode for GPU encoding and decoding with batch_size >= 1 #1534

veelion · 2022-11-03T09:12:11Z

This mode improves the throughput of the websocket server.

Test results:

hardware-1:
Platinum 8358P CPU @ 2.60GHz, 15 cores, 80 GB memory; 1 x A5000 GPU with 24 GB memory
hardware-2:
Platinum 8369B CPU @ 2.90GHz, 32 cores, 120 GB memory; 1 x A100-SXM4-80GB GPU with 80 GB memory
data:
3000 wavs with durations in the range [0.6, 15] seconds.

| hardware | websocket_server | concurrency | batch_size | RTF | CER |
|---|---|---|---|---|---|
| hardware-1 | libtorch(CPU) | 30 | 1 | 0.01666 | 8.90 |
| hardware-1 | libtorch(GPU) | 10 | 1 | 0.00831 | 8.90 |
| hardware-1 | libtorch(GPU+batch) | 20 | 8 | 0.00339 | 9.61 |
| hardware-2 | libtorch(CPU) | 48 | 1 | 0.00753 | 8.90 |
| hardware-2 | libtorch(GPU) | 48 | 1 | 0.00234 | 8.90 |
| hardware-2 | libtorch(GPU+batch) | 48 | 8 | 0.00110 | 9.61 |

On the same machine, GPU decoding is about 2 to 3 times faster than CPU decoding, and run_batch mode is a further 2.1 to 2.5 times faster than GPU decoding without run_batch. The cost is a slightly higher CER (9.61 vs. 8.90).
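The speedups quoted here follow directly from the RTF column: a lower RTF means faster decoding, so the speedup is the baseline RTF divided by the candidate RTF. A minimal sketch of that arithmetic is below. The names `Speedup`, `Row`, `kRows` and `PrintSpeedups` are hypothetical helpers for illustration and are not part of this PR; the RTF values are copied from the results table. Note that the hardware-1 rows were measured at different concurrency levels, so the ratios are indicative rather than a strict like-for-like comparison.

```cpp
#include <cstdio>

// Speedup of a candidate configuration over a baseline, both given as
// real-time factors (RTF). Lower RTF means faster decoding.
constexpr double Speedup(double baseline_rtf, double candidate_rtf) {
  return baseline_rtf / candidate_rtf;
}

// RTF values copied from the results table in this PR description.
struct Row {
  const char* hardware;
  double cpu;        // libtorch(CPU)
  double gpu;        // libtorch(GPU), batch_size = 1
  double gpu_batch;  // libtorch(GPU+batch), batch_size = 8
};

constexpr Row kRows[] = {
    {"hardware-1", 0.01666, 0.00831, 0.00339},
    {"hardware-2", 0.00753, 0.00234, 0.00110},
};

// Prints one line per machine with the two ratios discussed above.
void PrintSpeedups() {
  for (const Row& r : kRows) {
    std::printf("%s: GPU vs CPU %.2fx, GPU+batch vs GPU %.2fx\n",
                r.hardware, Speedup(r.cpu, r.gpu),
                Speedup(r.gpu, r.gpu_batch));
  }
}
```

Calling `PrintSpeedups()` from a `main` function reproduces the ratios summarized above for both machines.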

WangGewu · 2022-11-17T11:55:52Z

The libtorch-gpu code does not explicitly release GPU memory. As the number of calls increases, could this lead to an out-of-memory problem?

veelion · 2022-11-21T01:00:10Z

runtime/core/decoder/batch_torch_asr_model.cc

+      r_hyps_pad_sos_eos, ctc_scores_tensor).toTuple()->elements();
+  auto rescores = outputs[1].toTensor().to(at::kCPU);
+#ifdef USE_GPU
+  c10::cuda::CUDACachingAllocator::emptyCache();


#1534 clears the GPU memory cache here, so it can support much higher concurrency.
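For readers following this reply, here is a minimal sketch of the guarded call in isolation. Only `c10::cuda::CUDACachingAllocator::emptyCache()` and the `USE_GPU` macro come from the diff above; the wrapper name `ReleaseCachedGpuMemory` is a hypothetical helper and is not part of this PR. `emptyCache()` hands cached blocks that are not currently in use back to the driver. It does not free memory held by live tensors, so it lowers the process's idle footprint rather than the peak usage of a single batch.

```cpp
#ifdef USE_GPU
#include <c10/cuda/CUDACachingAllocator.h>
#endif

// Hypothetical helper (not in the PR): releases unused cached GPU memory
// when built with USE_GPU, and does nothing in CPU-only builds.
// Returns true if the cache was actually flushed.
inline bool ReleaseCachedGpuMemory() {
#ifdef USE_GPU
  // Return unoccupied cached blocks to the CUDA driver.
  c10::cuda::CUDACachingAllocator::emptyCache();
  return true;
#else
  return false;
#endif
}
```

Keeping the call behind the macro lets the same source file build for CPU-only targets, which matches the "only emptyCache() if USE_GPU" commit in this PR.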

veelion added 30 commits July 20, 2022 11:10

add concurrent performance testing for websocket_server_main (non-stream)

33910f8

Merge branch 'main' of https://github.com/wenet-e2e/wenet

90c1d75

Merge branch 'main' of https://github.com/wenet-e2e/wenet

4d52c27

Merge branch 'main' of https://github.com/wenet-e2e/wenet

4f573b5

Merge branch 'main' of https://github.com/wenet-e2e/wenet

fda37b7

add batch processing to decoder

3f199f5

add api_batch_main

455c870

add BatchRecognizer to api

a67bc93

add batch_model pointer to DecodeResource

44501ae

add batch processing source to decoder_srcs

7aabd7b

jit export forward_encoder_batch()

7a28701

add batch processing to Python binding

19114ee

Merge branch 'main' of https://github.com/wenet-e2e/wenet into batch

7fe373a

before change attention-scoring

0fd6c89

add multi-threads for computing fbank, ctc searching

44392e6

to call jit script which support batch_forward_attention_decoder()

e1e597e

add run_batch flag to support BatchTorchAsrModel

c25afd8

replace UpdateResult with decoder's get_batch_result()

63c42f4

add FLAGS_enable_timestamp

238fc8e

add FLAGS_run_batch for running batch decoding

3576cb4

fix: nbsdx/SimpleJSON#4

c3a17b1

support run_batch

7adcd74

add batch_connection_handler.h

a3045fc

jit export batch_forward_attention_decoder()

7bc634a

add to decoder_srcs with batch_torch_asr_model.cc, batch_onnx_asr_model.cc

35e8d1a

remove log msg

d172dd3

add is_fp16_

d710a54

add is_fp16 to Read()

1329090

add BatchOnnxAsrModel on GPU

3aa76c7

add Yaml reader

816a13a

veelion added 25 commits October 21, 2022 17:16

only emptyCache() if USE_GPU

2ba32c3

support GPU

e2259db

add more pytorch version

457d9b0

save eos, sos to onnx_config for onnxruntime of C++

14ffae6

transformer decoder has no 'reverse_weight' in confi

1e9faaf

fix rescore_inputs

a73b792

release GPU memory

65eb608

add onnx_version 1.13.1

7d1700a

replace GetInputName() with GetInputNameAllocated(), because GetInputName() deprecated in 1.13.1

14d6cd5

Merge branch 'main' of https://github.com/wenet-e2e/wenet

65a88f9

Merge branch 'wenet-e2e:main' into main

cf50ad0

merge

c64906a

add description of 'run_batch' mode

4ae1c65

Merge run_batch mode to main branch

a0fb171

fix batch_size

068e4a7

notes for a little bigger CER

aa1ac47

remove trailing whitespace

ba93bd9

fix flake8 error

bbc7c15

fix cpplint error

fb9e436

fix flake8 error

cd85c84

pytorch version back to 1.10.0

e8a0a24

change reference to pointer of non-const object

1e2af87

fix github action build error

acea6da

Merge branch 'main' into main

8a7ac0a

Merge branch 'main' of https://github.com/wenet-e2e/wenet

b61fae1

veelion added 3 commits November 30, 2022 15:57

supported GPU-compute feature(fbank) by kaldifeat

8fda52a

add fbank_cuda.h

e0b4e42

Merge branch 'main' into vee-main

f3e2aee
