Getting input from user's microphone and send the data to Deepgram #171

weilirs · 2023-05-26T21:49:07Z

Hi, I am learning web development and want to build an application in which users speak into their microphones. The client side will send the audio data to the server side, and the server side will use the Deepgram API to transcribe it. What's the best way to achieve this?
Thanks!

jjmaldonis · 2023-05-26T22:30:35Z

Maintainer

Hey @weilirs! Check out this blog post to get started: https://blog.deepgram.com/how-to-transcribe-only-what-you-need-with-python-listening-before-connected/

The code is written in Python. It hooks up to your computer's microphone, listens to the audio, streams the audio to Deepgram, and gets the transcription back. This is the quickest way to achieve what you're looking for. Let me know if you'd like more info or would like to go in another direction. I'm happy to keep talking about what you're building.

6 replies

jjmaldonis May 30, 2023
Maintainer

I was able to get my mic to pick up what I was saying using the Python code in the blog post I linked, so I will need your full code to reproduce the issue you're seeing.

Are you using our Node SDK? https://github.com/deepgram/deepgram-node-sdk

Also, have you seen this set of example code? https://github.com/deepgram-devs/node-live-example It is JavaScript code for a server and a client, and it should do something very similar to what you're looking for.
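To illustrate the general server-and-client streaming idea, here is a minimal sketch of a relay that forwards browser audio to one long-lived Deepgram live connection. It is not a copy of the linked repository. The name relayAudio is hypothetical, clientSocket is assumed to be a Node WebSocket-style object with on('message'), on('close') and send() (for example from the ws package), and deepgramLive is assumed to be the object returned by deepgram.transcription.live(...) in the Node SDK. The finish() call is also an assumption about that object.

```javascript
// Forward every audio chunk from one browser connection to one Deepgram
// live connection, buffering chunks until Deepgram reports "open".
function relayAudio(clientSocket, deepgramLive) {
  const pending = [];
  // Ready state 1 means the connection is already open.
  let isOpen = deepgramLive.getReadyState() === 1;

  deepgramLive.addListener('open', () => {
    isOpen = true;
    // Flush anything that arrived before the connection was ready.
    while (pending.length > 0) {
      deepgramLive.send(pending.shift());
    }
  });

  // Pass each transcription message straight back to the browser.
  deepgramLive.addListener('transcriptReceived', (message) => {
    clientSocket.send(message);
  });

  clientSocket.on('message', (chunk) => {
    if (isOpen) {
      deepgramLive.send(chunk);
    } else {
      pending.push(chunk);
    }
  });

  // Assumption: finish() signals that no more audio will be sent.
  clientSocket.on('close', () => deepgramLive.finish());
}
```

With this shape, a whole recording session shares a single Deepgram connection, so the audio reaches Deepgram as one continuous stream rather than as separate uploads.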

weilirs Jun 2, 2023
Author

Yes, I am using your Node SDK, and I was able to transcribe a portion of the audio but not all of it.

My client-side code:

import { useEffect, useState } from 'react';
import './App.css';

function App() {
  const [isRecording, setIsRecording] = useState(false);
  const [mediaRecorder, setMediaRecorder] = useState(null);

  useEffect(() => {
    // Get audio stream from user's microphone
    navigator.mediaDevices.getUserMedia({ audio: true, video: false })
      .then(stream => {
        const recorder = new MediaRecorder(stream);
        setMediaRecorder(recorder);
      })
      .catch(() => {
        console.error("fail to get audio stream");
      });
  }, []);

  const startRecording = () => {
    if (mediaRecorder) {
      mediaRecorder.start(10000);
      setIsRecording(true);

      mediaRecorder.ondataavailable = async (event) => {
        if (event.data.size > 0) {
          // Send audio data to server
          console.log("sending audio data to server");
          const response = await fetch('http://localhost:8000/api/transcribe', {
            method: 'POST',
            body: event.data
          });

          if (!response.ok) {
            console.error(`Server response: ${response.status}`);
            return;
          }

          const transcription = await response.json();
          console.log(transcription);
        }
      };
    }
  };

  const stopRecording = () => {
    if (mediaRecorder && isRecording) {
      mediaRecorder.stop();
      setIsRecording(false);
    }
  };

  return (
    <>
      <div>
        <button onClick={startRecording} disabled={isRecording}>Start Recording</button>
        <button onClick={stopRecording} disabled={!isRecording}>Stop Recording</button>
      </div>
    </>
  );
}

export default App;

My server-side code:

const { Deepgram } = require('@deepgram/sdk');
require('dotenv').config();

const deepgramApiKey = process.env.DEEPGRAM_API_KEY;
const deepgram = new Deepgram(deepgramApiKey);
const express = require('express');

const transcribeRoute = (router) => {
    router.post('/transcribe', express.raw({ type: 'audio/*' }), async (req) => {
        const audioData = req.body; // Get raw audio data from request body
        console.log("audio data received");

        // Open a new live transcription connection for this request
        const deepgramLive = deepgram.transcription.live({
            punctuate: true,
            model: 'nova',
            language: 'en-US'
        });

        deepgramLive.addListener('transcriptReceived', (transcription) => {
            // Log the transcription result (nothing is sent back to the client yet)
            console.log("transcription received");
            const data = JSON.parse(transcription);
            console.dir(data, { depth: null });
        });

        deepgramLive.addListener('close', () => {
            console.log('WebSocket connection to Deepgram closed');
        });

        deepgramLive.addListener('error', (error) => {
            console.error('An error occurred with the WebSocket connection:', error);
        });

        // Send audio data to Deepgram once the connection is open
        deepgramLive.addListener('open', () => {
            if (deepgramLive.getReadyState() === 1) {
                console.log("sending audio data");
                deepgramLive.send(audioData);
            }
        });
    });
    return router;
};
module.exports = transcribeRoute;

Thank you for your help!
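One detail visible in the route above is that the handler only takes req and never writes a response, so the client's await fetch(...) has nothing to resolve with. A minimal sketch of one way to answer each request follows. The names transcribeOnce, handleTranscribe and createLive are hypothetical, and the finish() call is an assumption about the object returned by deepgram.transcription.live(...). This illustrates the request and response flow only. It is not a confirmed fix for the partial transcription.

```javascript
// Collect every transcript that one live connection produces for one upload.
// `deepgramLive` is expected to behave like the object returned by
// deepgram.transcription.live(...) in the route above.
function transcribeOnce(deepgramLive, audioData) {
  return new Promise((resolve, reject) => {
    const transcripts = [];

    deepgramLive.addListener('transcriptReceived', (message) => {
      transcripts.push(JSON.parse(message));
    });
    deepgramLive.addListener('error', reject);
    // Resolve only when the connection closes, so late results are kept.
    deepgramLive.addListener('close', () => resolve(transcripts));

    deepgramLive.addListener('open', () => {
      deepgramLive.send(audioData);
      // Assumption: finish() tells Deepgram that no more audio is coming.
      deepgramLive.finish();
    });
  });
}

// Hypothetical Express-style handler that always answers the request.
// `createLive` is a callback that opens a new live connection.
async function handleTranscribe(req, res, createLive) {
  try {
    const transcripts = await transcribeOnce(createLive(), req.body);
    res.json({ transcripts });
  } catch (error) {
    res.status(500).json({ error: String(error) });
  }
}
```

In the route above, this could be wired up roughly as router.post('/transcribe', express.raw({ type: 'audio/*' }), (req, res) => handleTranscribe(req, res, () => deepgram.transcription.live({ punctuate: true, model: 'nova', language: 'en-US' }))).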

weilirs Jun 3, 2023
Author

I tried the example code you gave me, and I think it does what I expected. I would still like to know why my implementation doesn't work as expected, so it would be great if you could shed some light on it.

I really appreciate all the help you've given.
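One possible factor, which nothing in this thread confirms, is how MediaRecorder chunks are used. With mediaRecorder.start(10000), the browser delivers one recording split into pieces, and only the concatenation of all pieces is guaranteed to be a playable file. Posting each piece to a separate Deepgram connection may therefore leave later pieces undecodable. A minimal sketch of collecting the pieces and uploading one complete Blob when recording stops follows. The name createRecordingBuffer is hypothetical.

```javascript
// Accumulate MediaRecorder chunks so they can be uploaded as one file.
function createRecordingBuffer(mimeType) {
  const chunks = [];
  return {
    // Call from mediaRecorder.ondataavailable with event.data.
    add(chunk) {
      if (chunk.size > 0) {
        chunks.push(chunk);
      }
    },
    // Call from mediaRecorder.onstop to get the complete recording.
    toBlob() {
      return new Blob(chunks, { type: mimeType });
    },
  };
}
```

In the React component above, ondataavailable would call buffer.add(event.data), and an onstop handler would send buffer.toBlob() as the fetch body, using mediaRecorder.mimeType as the type.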

jjmaldonis Jun 5, 2023
Maintainer

Yeah I can try to help figure out what's going on. I've created a local React app and copy-pasted your code into it. It compiles and it looks good by eye, and I can run it as the frontend.

For the backend/server, can you please send me a .zip file with the code and some instructions on how to boot it up?

For example, it would be helpful to have the code that hosts the /transcribe API endpoint that is called in the React app via const response = await fetch('http://localhost:8000/api/transcribe', { .... I see the definition of that endpoint - and everything looks good by eye - but I can't run it without additional code.

Answer selected by jpvajda
