How can I implement a wake word for an embedded system for the hard of hearing? #141
-
I am using Deepgram for voice-to-text conversion for my VOIP business. Deepgram is a vast improvement over many other engines that I have subscribed to: it is fast and accurate. But for a new project that I will implement under Debian Linux, I need a really reliable wake-word engine to trigger voice-command transcription for the set of commands that a user will speak.
Replies: 1 comment 1 reply
-
Hey @marksymmes! Glad you're liking Deepgram - we always like to hear it :)

You have a couple of options here.

You can use Deepgram's API to listen for wake words by hooking a websocket up to the streaming API. The websocket transfers the incoming audio to Deepgram, Deepgram transcribes it, and your code listens for the specific wake word you choose. This solution will get you up and running quickly, and I've included some code at the end of this comment that you can use right now.

Alternatively, wake-word functionality can be implemented with an embedded hardware/software combination, typically using an ASIC chip (the relevant Google search is "asic embedded system"). This is how products such as Amazon Alexa work, because they process the incoming audio on the device itself. Unfortunately, these types of systems are much more difficult to set up and may require a non-trivial amount of dedicated resources to implement.

So there are some tradeoffs. Given that this is a new project/product, you may want to get a prototype off the ground quickly. Deepgram's API should get you going in a very short period of time, and you'll be able to prototype and continue development without the wake word being a bottleneck on the rest of the product. If you're able to limit the amount of time the audio is being processed -- which I hope is definitely true for the initial product offering -- then Deepgram's API solution will not only be fast to implement, but also cost-effective. If you need "always on, 24/7" functionality, where the audio stream is constantly receiving input, then we may want to figure out a way to keep the Deepgram solution in the price range you need. We're happy to work with you on this as you spin up, so feel free to keep the conversation going.

Please see below for the Python code to listen for a wake word using Deepgram's streaming API:

```python
from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import aiohttp
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API key
deepgram = Deepgram(DEEPGRAM_API_KEY)


async def listen(audio_stream_url: str, wake_word: str):
    # Define the callback function to handle Deepgram's streaming response.
    def handler(data):
        for alternative in data["channel"]["alternatives"]:
            for word in alternative["words"]:
                if word["word"].lower() == wake_word.lower():
                    # TODO: Put your wake-word callback here
                    print(f'{word["word"]} spoken at {word["start"]}-{word["end"]}')

    # Create a websocket connection to Deepgram with the parameters you want.
    try:
        live = await deepgram.transcription.live(
            {"punctuate": True, "interim_results": False, "language": "en-US"}
        )
    except Exception as e:
        print(f"Could not open socket to Deepgram: {e}")
        return

    # Listen for the connection to close.
    live.registerHandler(
        live.event.CLOSE, lambda c: print(f"Connection closed with code {c}.")
    )

    # Listen for any transcripts received from Deepgram and pass them to the handler.
    live.registerHandler(live.event.TRANSCRIPT_RECEIVED, handler)

    # Open the audio URL and send the streaming audio to Deepgram.
    async with aiohttp.ClientSession() as session:
        async with session.get(audio_stream_url) as audio:
            print(f"Listening on {audio_stream_url} for `{wake_word}`...")
            while True:
                data = await audio.content.readany()
                # If there's no data coming from the livestream, break out of the loop.
                if not data:
                    break
                live.send(data)

    # Indicate that we've finished sending data by sending the customary
    # zero-byte message to the Deepgram streaming endpoint, and wait until
    # we get back the final summary metadata object.
    await live.finish()


def main():
    # Specify the URL for the audio you would like to stream.
    url = "http://stream.live.vc.bbcmedia.co.uk/bbc_radio_fourlw_online_nonuk"  # Outside the UK
    # url = "http://stream.live.vc.bbcmedia.co.uk/bbc_radio_fourfm"  # Inside the UK

    # `was` gets said often enough in the audio stream to see how this works.
    wake_word = "was"
    asyncio.run(listen(audio_stream_url=url, wake_word=wake_word))


if __name__ == "__main__":
    main()
```
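One note on extending this: the handler above compares a single transcribed word against the wake word, so a multi-word phrase like "hey deepgram" won't match, since Deepgram delivers words one at a time in the `words` array. Below is a minimal sketch of how you might detect a multi-word wake phrase instead; `make_phrase_detector` is a hypothetical helper (not part of the Deepgram SDK) that keeps a sliding window of recent words, and you would call its returned function once per word inside the handler's inner loop.

```python
from collections import deque


def make_phrase_detector(wake_phrase: str, window: int = 10):
    """Return a callable that is fed transcribed words one at a time and
    returns True when the full wake phrase has just been spoken."""
    target = wake_phrase.lower().split()
    # Keep at least enough recent words to hold the whole phrase.
    recent = deque(maxlen=max(window, len(target)))

    def feed(word: str) -> bool:
        # Normalize case and trailing punctuation (punctuate=True can add it).
        recent.append(word.lower().strip(".,!?"))
        return list(recent)[-len(target):] == target

    return feed


# Example: feed words as they would arrive from the transcript handler.
detect = make_phrase_detector("hey deepgram")
for w in ["okay", "hey", "Deepgram,", "please", "listen"]:
    if detect(w):
        print("wake phrase detected")  # fires once, after "Deepgram,"
```

The sliding window means the phrase still matches even if its words arrive in separate transcript messages, as long as they arrive in order.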