decoder for open vocabulary keyword spotting #505

Merged on Jan 20, 2024 (32 commits)

@pkufool pkufool commented Dec 27, 2023

No description provided.

@pkufool pkufool marked this pull request as draft December 27, 2023 02:39

pkufool commented Jan 2, 2024

I am busy these days, so this PR won't be merged for several days. Here is the current progress, in case someone wants to try it.

The C++ binary (sherpa-onnx-keyword-spotter) is working now. You have to build the project yourself; the binary will then be in build/bin.

I uploaded one Chinese model to https://www.modelscope.cn/models/pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01/summary. You can try it as follows:

Clone the model:

git lfs install
git clone https://www.modelscope.cn/pkufool/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.git
ln -s sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01 exp-ppinyin

Prepare the keywords file (ppy_keywords.txt), which looks like this:

w én s ēn t è k ǎ s uǒ  @文森特卡索
zh ōu w àng j ūn @周望军
zh ū l ì n án @朱丽楠
j iǎng y ǒu b ó @蒋友伯
n ǚ ér @女儿
f ǎ g uó @法国
j iàn m iàn h uì @见面会
l uò sh í @落实
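
To make the file format concrete, here is a rough Python sketch (not part of this PR) that reads one line back into its token sequence and display string. `parse_keyword_line` is a hypothetical helper name, and the sketch assumes only what the example shows: whitespace-separated tokens followed by an optional `@`-prefixed keyword.

```python
from typing import List, Optional, Tuple


def parse_keyword_line(line: str) -> Tuple[List[str], Optional[str]]:
    """Split one keywords-file line into (tokens, label).

    Hypothetical helper: assumes whitespace-separated tokens followed by
    an optional "@label" field, as in the example file above.
    """
    tokens: List[str] = []
    label: Optional[str] = None
    for field in line.split():
        if field.startswith("@"):
            # The "@" prefix marks the human-readable keyword.
            label = field[1:]
        else:
            tokens.append(field)
    return tokens, label


if __name__ == "__main__":
    print(parse_keyword_line("f ǎ g uó @法国"))
```

Because the line is split on any run of whitespace, the double space before `@文森特卡索` in the first example line does not matter to this sketch.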

For the pinyin part, you can use scripts/text2token.py to generate the tokens:

python scripts/text2token.py \
--text keywords_raw.txt \
--tokens exp-ppinyin/tokens.txt \
--tokens-type ppinyin \
--output ppy_keywords.txt

where keywords_raw.txt contains:

文森特卡索
周望军
朱丽楠
蒋友伯
女儿
法国
见面会
落实

Note: for now, you have to add the @xxx part by hand (in ppy_keywords.txt), because the output of text2token.py does not contain it. I will improve this later.
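
Until the script handles this, the manual step can be scripted. The following is a minimal sketch, not part of this PR: `add_labels` is a hypothetical helper, and it assumes that line i of the text2token.py output corresponds to line i of keywords_raw.txt, which the PR does not state explicitly.

```python
from typing import Iterable, List


def add_labels(token_lines: Iterable[str], raw_keywords: Iterable[str]) -> List[str]:
    """Append "@keyword" to each token line.

    Assumption: token lines and raw keywords are in the same order,
    one keyword per line; blank lines are ignored on both sides.
    """
    tokens = [t.strip() for t in token_lines if t.strip()]
    keywords = [k.strip() for k in raw_keywords if k.strip()]
    if len(tokens) != len(keywords):
        raise ValueError(
            f"{len(tokens)} token lines but {len(keywords)} keywords"
        )
    return [f"{t} @{k}" for t, k in zip(tokens, keywords)]


if __name__ == "__main__":
    # Inline demo data; in practice, pass two open file objects instead.
    demo_tokens = ["n ǚ ér", "f ǎ g uó"]
    demo_keywords = ["女儿", "法国"]
    for line in add_labels(demo_tokens, demo_keywords):
        print(line)
```

Since iterating over an open text file yields its lines, the same function can be called with the text2token.py output file and keywords_raw.txt opened with `encoding="utf-8"`. The length check is there so that a mismatch fails loudly instead of silently pairing the wrong keyword with a token sequence.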

Run the keyword spotter:

./build/bin/sherpa-onnx-keyword-spotter \
  --tokens=exp-ppinyin/tokens.txt \
  --encoder=exp-ppinyin/encoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --decoder=exp-ppinyin/decoder-epoch-12-avg-2-chunk-16-left-64.onnx \
  --joiner=exp-ppinyin/joiner-epoch-12-avg-2-chunk-16-left-64.onnx \
  --keywords-file=ppy_keywords.txt \
  --max-active-paths=4 \
  --keywords-score=1.0 \
  --keywords-threshold=0.25 \
  --num-threads=8 \
  exp-ppinyin/test_wavs/3.wav exp-ppinyin/test_wavs/4.wav exp-ppinyin/test_wavs/5.wav exp-ppinyin/test_wavs/6.wav

The outputs are:

/star-kw/kangwei/code/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:361 ./build/bin/sherpa-onnx-keyword-spotter --tokens=exp-ppinyin/tokens.txt --encoder=exp-ppinyin/encoder-epoch-12-avg-2-chunk-16-left-64.onnx --decoder=exp-ppinyin/decoder-epoch-12-avg-2-chunk-16-left-64.onnx --joiner=exp-ppinyin/joiner-epoch-12-avg-2-chunk-16-left-64.onnx --keywords-file=ppy_keywords.txt --max-active-paths=4 --keywords-score=1.0 --keywords-threshold=0.25 --num-threads=8 exp-ppinyin/test_wavs/3.wav exp-ppinyin/test_wavs/4.wav exp-ppinyin/test_wavs/5.wav exp-ppinyin/test_wavs/6.wav

KeywordSpotterConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=OnlineModelConfig(transducer=OnlineTransducerModelConfig(encoder="exp-ppinyin/encoder-epoch-12-avg-2-chunk-16-left-64.onnx", decoder="exp-ppinyin/decoder-epoch-12-avg-2-chunk-16-left-64.onnx", joiner="exp-ppinyin/joiner-epoch-12-avg-2-chunk-16-left-64.onnx"), paraformer=OnlineParaformerModelConfig(encoder="", decoder=""), wenet_ctc=OnlineWenetCtcModelConfig(model="", chunk_size=16, num_left_chunks=4), zipformer2_ctc=OnlineZipformer2CtcModelConfig(model=""), tokens="exp-ppinyin/tokens.txt", num_threads=8, debug=False, provider="cpu", model_type=""), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.2, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=True, max_active_paths=4, num_trailing_blanks=1, keywords_score=1, keywords_threshold=0.25, keywords_file="ppy_keywords.txt",
2024-01-02 09:39:07.708196448 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3556038, index: 15, mask: {16, 52, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2024-01-02 09:39:07.709450780 [E:onnxruntime:, env.cc:254 ThreadMain] pthread_setaffinity_np failed for thread: 3556039, index: 16, mask: {17, 53, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
exp-ppinyin/test_wavs/4.wav
{"start_time":0.00, "keyword": "蒋友伯", "timestamps": [0.64, 0.68, 0.84, 0.96, 1.12, 1.16], "tokens":["j", "iǎng", "y", "ǒu", "b", "ó"]}

exp-ppinyin/test_wavs/5.wav
{"start_time":0.00, "keyword": "周望军", "timestamps": [0.64, 0.68, 0.76, 0.84, 1.00, 1.08], "tokens":["zh", "ōu", "w", "àng", "j", "ūn"]}

exp-ppinyin/test_wavs/6.wav
{"start_time":0.00, "keyword": "朱丽楠", "timestamps": [0.64, 0.68, 0.76, 0.80, 1.00, 1.04], "tokens":["zh", "ū", "l", "ì", "n", "án"]}

exp-ppinyin/test_wavs/3.wav
{"start_time":0.00, "keyword": "文森特卡索", "timestamps": [0.32, 0.72, 0.96, 1.00, 1.28, 1.36, 1.52, 1.60, 1.92, 1.96], "tokens":["w", "én", "s", "ēn", "t", "è", "k", "ǎ", "s", "uǒ"]}

exp-ppinyin/test_wavs/5.wav
{"start_time":0.00, "keyword": "落实", "timestamps": [1.80, 1.92, 2.12, 2.20], "tokens":["l", "uò", "sh", "í"]}

exp-ppinyin/test_wavs/6.wav
{"start_time":0.00, "keyword": "见面会", "timestamps": [2.16, 2.24, 2.28, 2.36, 2.48, 2.52], "tokens":["j", "iàn", "m", "iàn", "h", "uì"]}

exp-ppinyin/test_wavs/4.wav
{"start_time":0.00, "keyword": "女儿", "timestamps": [3.08, 3.20, 3.24], "tokens":["n", "ǚ", "ér"]}

exp-ppinyin/test_wavs/3.wav
{"start_time":0.00, "keyword": "法国", "timestamps": [4.56, 4.64, 4.80, 4.88], "tokens":["f", "ǎ", "g", "uó"]}
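
Each result line in the output above is a JSON object, so it can be post-processed with the standard library. This is a small sketch, not part of this PR: `token_times` is a hypothetical helper, and it assumes that `tokens` and `timestamps` have the same length and that timestamps are in seconds (the output does not say so explicitly).

```python
import json
from typing import List, Tuple


def token_times(result_line: str) -> Tuple[str, List[Tuple[str, float]]]:
    """Return (keyword, [(token, timestamp), ...]) for one result line.

    Assumption: "tokens" and "timestamps" are parallel lists, as they are
    in every result shown above.
    """
    result = json.loads(result_line)
    pairs = list(zip(result["tokens"], result["timestamps"]))
    return result["keyword"], pairs


if __name__ == "__main__":
    line = (
        '{"start_time":0.00, "keyword": "法国", '
        '"timestamps": [4.56, 4.64, 4.80, 4.88], '
        '"tokens":["f", "ǎ", "g", "uó"]}'
    )
    keyword, pairs = token_times(line)
    print(keyword, pairs)
```

The first and last timestamps of a hit give a rough span for where the keyword was detected in the wave file.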

@pkufool pkufool marked this pull request as ready for review January 11, 2024 03:13
@pkufool pkufool changed the title [WIP] decoder for open vocabulary keyword spotting decoder for open vocabulary keyword spotting Jan 11, 2024
@pkufool pkufool requested a review from csukuangfj January 13, 2024 03:46

pkufool commented Jan 16, 2024

@csukuangfj Could you have a look again?

@pkufool pkufool merged commit b6c0209 into k2-fsa:master Jan 20, 2024
175 of 181 checks passed
XiaYucca pushed a commit to XiaYucca/sherpa-onnx that referenced this pull request Jan 9, 2025
* various fixes to ContextGraph to support open vocabulary keywords decoder

* Add keyword spotter runtime

* Add binary

* First version works

* Minor fixes

* update text2token

* default values

* Add jni for kws

* add kws android project

* Minor fixes

* Remove unused interface

* Minor fixes

* Add workflow

* handle extra info in texts

* Minor fixes

* Add more comments

* Fix ci

* fix cpp style

* Add input box in android demo so that users can specify their keywords

* Fix cpp style

* Fix comments

* Minor fixes

* Minor fixes

* minor fixes

* Minor fixes

* Minor fixes

* Add CI

* Fix code style

* cpplint

* Fix comments

* Fix error
2 participants