Support Transformers in the Wav2Vec2 Encoder for the ASR Inference #1520
I see failures on the recent pull requests, and I think the following check has been causing them since this commit. Any suggestions?
@vince62s, thanks for suggesting ONEAPI_VERSION. It did fix the issue. I struggled with the test environment, where I download and read the audio file. I tried several options and ended up using the audio file already used in the Whisper test. Now everything is in good shape and good to go!
LGTM, but @nguyendc-systran, if you can have a look, thanks.
@vince62s It is good for me.
…rence (OpenNMT#1520)" This reverts commit f92a8a2.
…ASR Inference (OpenNMT#1520)"" This reverts commit 7c60769.
Encodes the input features.

Arguments:
  features: Mel spectogram of the audio, as a float array with shape
Doesn't this one take raw audio, not a mel spectrogram?
That's correct. It should be raw audio, not a Mel spectrogram. How can we fix it? By making another PR for this?
I'm not a project maintainer/member/contributor, but I would guess so.
This PR adds support for Transformers in the Wav2Vec2 encoder for ASR inference. The implementation follows the approach already used for the Whisper model. In our in-house computing environment, it reduced runtime GPU memory usage from 3060 MB to 1897 MB, and on our in-house test data it reduced inference time from 9.2 sec to 5.48 sec. I hope this PR will be accepted and maintained for future use. The test script is in python/tests/test_transformers.py.
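For reviewers who want the improvement in relative terms, here is a minimal sketch that only redoes the arithmetic on the figures quoted above. The helper name `relative_reduction` and the variable names are hypothetical and not part of the PR. The numbers come from an in-house setup, so they may not carry over to other hardware or data.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Figures quoted in the PR description (in-house environment and test data).
gpu_mem_before_mb, gpu_mem_after_mb = 3060, 1897
latency_before_s, latency_after_s = 9.2, 5.48

mem_reduction = relative_reduction(gpu_mem_before_mb, gpu_mem_after_mb)
time_reduction = relative_reduction(latency_before_s, latency_after_s)
speedup = latency_before_s / latency_after_s

print(f"GPU memory reduction: {mem_reduction:.1%}")
print(f"Inference time reduction: {time_reduction:.1%}")
print(f"Speedup: {speedup:.2f}x")
```

On these figures, that works out to roughly 38% less GPU memory, roughly 40% less inference time, and a speedup of about 1.68x.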