How can I implement a wake word for an embedded system for the hard of hearing? #141
-
I am using Deepgram for voice-to-text conversion for my VOIP business. Deepgram is a vast improvement over many other engines that I have subscribed to: it is fast and accurate. But for a new project that I will implement under Debian Linux, I need a really reliable wake-word engine to trigger voice-command transcription for the set of commands that a user will speak.
Replies: 1 comment 1 reply
-
Hey @marksymmes! Glad you're liking Deepgram - we always like to hear it :)

You have a couple of options here.

You can use Deepgram's API to listen for wake words by hooking a websocket up to the streaming API. The websocket transfers the incoming audio to Deepgram, Deepgram transcribes it, and your code listens for the specific wake word you choose. This solution will get you up and running quickly, and I've included some code at the end of this comment that you can use right now.

Alternatively, wake-word functionality can be implemented with an embedded hardware/software combination, typically using an ASIC chip (the relevant Google search is "asic embedded system"). This is how products such as Amazon Alexa work, because they process the incoming audio on the device itself. Unfortunately, these types of systems are much more difficult to set up and may require a non-trivial amount of dedicated resources to implement.

So there are some tradeoffs. Given that this is a new project/product, you may want to get a prototype off the ground quickly. Deepgram's API should get you going in a very short period of time, and you'll be able to prototype and continue development without the wake word being a bottleneck on the rest of the product. If you're able to limit the amount of time the audio is being processed -- which I hope is definitely true for the initial product offering -- then Deepgram's API solution will not only be fast to implement, but also cost-effective. If you need "always on, 24/7" functionality, where the audio stream is constantly receiving input, then we may want to figure out a way to keep the Deepgram solution in the price range you need. We're happy to work with you on this as you spin up, so feel free to keep the conversation going.

Please see below for the Python code to listen for a wake word using Deepgram's streaming API:

```python
from deepgram import Deepgram  # pip install deepgram-sdk
import asyncio
import aiohttp
import os

# Initialize the Deepgram SDK.
DEEPGRAM_API_KEY = os.environ["DEEPGRAM_API_KEY"]  # Your Deepgram API key
deepgram = Deepgram(DEEPGRAM_API_KEY)


async def listen(audio_stream_url: str, wake_word: str):
    # Define the callback function to handle Deepgram's streaming response.
    def handler(data):
        for alternative in data["channel"]["alternatives"]:
            for word in alternative["words"]:
                if word["word"].lower() == wake_word.lower():
                    # TODO: Put your wake-word callback here
                    print(f'{word["word"]} spoken at {word["start"]}-{word["end"]}')

    # Create a websocket connection to Deepgram with the parameters you want.
    try:
        live = await deepgram.transcription.live(
            {"punctuate": True, "interim_results": False, "language": "en-US"}
        )
    except Exception as e:
        print(f"Could not open socket to Deepgram: {e}")
        return

    # Listen for the connection to close.
    live.registerHandler(
        live.event.CLOSE, lambda c: print(f"Connection closed with code {c}.")
    )

    # Listen for any transcripts received from Deepgram and pass them to the handler.
    live.registerHandler(live.event.TRANSCRIPT_RECEIVED, handler)

    # Open the audio URL and send the streaming audio to Deepgram.
    async with aiohttp.ClientSession() as session:
        async with session.get(audio_stream_url) as audio:
            print(f"Listening on {audio_stream_url} for `{wake_word}`...")
            while True:
                data = await audio.content.readany()
                # If there's no data coming from the livestream, break out of the loop.
                if not data:
                    break
                live.send(data)

    # Indicate that we've finished sending data by sending the customary
    # zero-byte message to the Deepgram streaming endpoint, and wait until
    # we get back the final summary metadata object.
    await live.finish()


def main():
    # Specify the URL for the audio you would like to stream.
    url = "http://stream.live.vc.bbcmedia.co.uk/bbc_radio_fourlw_online_nonuk"  # Outside the UK
    # url = "http://stream.live.vc.bbcmedia.co.uk/bbc_radio_fourfm"  # Inside the UK

    # `was` gets said often enough in the audio stream to see how this works.
    wake_word = "was"
    asyncio.run(listen(audio_stream_url=url, wake_word=wake_word))


if __name__ == "__main__":
    main()
```
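One note on extending this: the handler above compares a single transcribed word against the wake word, so a multi-word phrase like "hey deepgram" won't match, since Deepgram delivers words one at a time in the `words` array. Below is a minimal sketch of how you might detect a multi-word wake phrase instead; `make_phrase_detector` is a hypothetical helper (not part of the Deepgram SDK) that keeps a sliding window of recent words, and you would call its returned function once per word inside the handler's inner loop.

```python
from collections import deque


def make_phrase_detector(wake_phrase: str, window: int = 10):
    """Return a callable that is fed transcribed words one at a time and
    returns True when the full wake phrase has just been spoken."""
    target = wake_phrase.lower().split()
    # Keep at least enough recent words to hold the whole phrase.
    recent = deque(maxlen=max(window, len(target)))

    def feed(word: str) -> bool:
        # Normalize case and trailing punctuation (punctuate=True can add it).
        recent.append(word.lower().strip(".,!?"))
        return list(recent)[-len(target):] == target

    return feed


# Example: feed words as they would arrive from the transcript handler.
detect = make_phrase_detector("hey deepgram")
for w in ["okay", "hey", "Deepgram,", "please", "listen"]:
    if detect(w):
        print("wake phrase detected")  # fires once, after "Deepgram,"
```

The sliding window means the phrase still matches even if its words arrive in separate transcript messages, as long as they arrive in order.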