Instant availability of speech input after opening #151
Comments
A possible solution would be to start listening as soon as the app starts, and then feed the input from the microphone to Vosk at maximum speed as soon as it is ready.
Out of curiosity, does the responsiveness change when battery optimization is enabled or disabled, e.g. when the app is set to unrestricted in the Android settings? I'm just curious whether being optimized for Doze affects this or not.
Sounds like a good option, not just as a workaround but also for the very first start of the service. I didn't expect that increasing the speed is possible without affecting Vosk's voice recognition.
I tried it, but it seems to make no difference. However, I think the initialization of the Vosk model depends more on which thread / context it is running in. E.g. I observed the following behaviour:
=> So I think there needs to be a single main instance of a Dicio service / process / background thread that the Dicio main app, the app overlay and a system-registered speech recognizer service all access for voice input. But I am not sure whether defining a service in the manifest means that it is instantiated only once or anew for each requesting app. (The latter seems to happen with the overlay, which runs in a different context than the Dicio main app, although it is defined and provided by Dicio.)
I don't actually mean increasing the speed of the audio, but rather just feeding audio samples to the Vosk recognizer as fast as possible. The recognizer will still interpret each sample as if it were some small number of milliseconds long.
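To make the buffering idea above more concrete, here is a minimal sketch in plain Java. It is not Dicio's or Vosk's code: `PreloadAudioBuffer`, `onAudioChunk` and `onRecognizerReady` are hypothetical names, and the `Consumer<short[]>` merely stands in for whatever call hands samples to the recognizer. Chunks that arrive while the model is still loading are queued; once the recognizer is ready, the backlog is replayed without waiting, and later chunks pass straight through.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/** Hypothetical buffer between the microphone and a recognizer that is still loading. */
final class PreloadAudioBuffer {
    // Chunks captured before the recognizer became available, in arrival order.
    private final Queue<short[]> pending = new ArrayDeque<>();
    // Stand-in for the real recognizer; null until the model has finished loading.
    private Consumer<short[]> recognizer;

    /** Called by the recording thread for every captured chunk of samples. */
    synchronized void onAudioChunk(short[] chunk) {
        if (recognizer == null) {
            // Copy the chunk, since recording code typically reuses its read buffer.
            pending.add(chunk.clone());
        } else {
            recognizer.accept(chunk);
        }
    }

    /** Called once, as soon as the model has been loaded. */
    synchronized void onRecognizerReady(Consumer<short[]> readyRecognizer) {
        // Replay the backlog as fast as the recognizer accepts it (no real-time pacing).
        short[] chunk;
        while ((chunk = pending.poll()) != null) {
            readyRecognizer.accept(chunk);
        }
        recognizer = readyRecognizer;
    }
}
```

Because both methods are synchronized, the recording thread is blocked while the backlog is replayed. That keeps the chunk order intact, but a real implementation would probably also want to cap the queue size so that a very slow model load cannot exhaust memory.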
I have uploaded a first draft in my fork. It's too early for merging, but maybe you can test how it behaves on your phones? Regarding instant availability: switching between the Dicio main app and the Dicio overlay now works instantly for some time, until the system shuts down the background service due to inactivity. (@AyoungDukie unfortunately battery optimization does not seem to influence this.) The initialization, however, still takes its time. @Stypox I looked at the way Vosk is started and at what other options for Vosk initialization may be possible, but I still don't really have a starting point for how to implement your idea
with Vosk. In order to do so: does this mean that some kind of buffering would have to be implemented in the SttService?
I built an APK if you want to test it: app-debug.zip. I will test it a bit tomorrow.
Overall it looks promising, thanks for working on this. I can finally see the service appear in "Voice input", though I don't know how to test whether it actually works in other apps, too. Feel free to open a PR from that branch, you will still be able to push commits without issues. I found these problems:
Also, maybe it would be a good idea to completely extract STT from Dicio and build a separate app Dicio can interact with through
I think so, yeah. But for now, don't worry about that. The background service is already more responsive than before.
OK, then I will open a PR so that it shows up directly in the main repository here.
Indeed, this (a simple STT service) was actually the starting point of what I was looking for when I found Dicio, in which the whole thing of downloading the STT service etc. was already implemented. However, I now think there are multiple points of view on this:
To sum up, I also think that once the STT is reliably separated within Dicio, separating it completely would be the most flexible option for everyone. But for Dicio, the users would then need a good step-by-step guide to install and activate the standalone vosk/dicio STT service app on their system.
I don't think that developing a standalone STT service is really needed. Maintaining and developing both would be a big job for one developer. There are many offline STT and TTS engines in progress, which I have listed here. None of them is actually useful yet, but many are in active development.
Hi,
now that #109 (speech input recognition for other apps) is implemented (@Stypox thanks for your positive feedback :D), I have realized over time that reloading the speech model anew each time is annoying (my hardware needs about 10 s). This also applies to the main app, at least on first opening (and after the system has shut it down because it was in the background for too long), but for the "external" speech service it is even more critical.
So, in order to improve this, I wondered whether a continuously running background service holding the loaded database might be an option, which Dicio then calls for speech processing. I assume this could be connected with #126 and #54, referenced from your roadmap #129. Maybe a background service is even the best way to implement this (because, being defined as a speech recognition service, it is hopefully not stopped so easily by the system).
But before I try to do so: Does anyone have hints / thoughts about this?
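As a rough illustration of the "load once, keep it loaded" idea in this post, the following plain-Java sketch shares one expensive model between several callers. It is only a sketch under assumptions: `SharedModelHolder` is a hypothetical name, the `Supplier` stands in for the actual model loading, and it only helps if the main app, the overlay and the recognizer service really run in the same process, which is exactly the open question in this thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical process-wide holder that loads an expensive model once and shares it. */
final class SharedModelHolder<M> {
    // Stand-in for the slow loading step (about 10 s on the hardware mentioned above).
    private final Supplier<M> loader;
    // Null until the first caller asks for the model; afterwards always the same future.
    private CompletableFuture<M> model;

    SharedModelHolder(Supplier<M> loader) {
        this.loader = loader;
    }

    /** Starts loading in the background on the first call; later calls reuse the result. */
    synchronized CompletableFuture<M> get() {
        if (model == null) {
            model = CompletableFuture.supplyAsync(loader);
        }
        return model;
    }
}
```

Callers would attach a callback to the returned future (for example with `thenAccept`) instead of blocking, so that the UI stays responsive while the first load is still running. Releasing the model again after a period of inactivity is left out here.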