Diarization Inconsistencies with Nova-2 and Limited Language Support in Other Models #1033

rizz-official · 2024-12-24T10:07:58Z

The diarization feature in Deepgram has the following issues:

Inconsistent Diarization with Nova-2:

Speaker detection is unreliable when using the nova-2 model with diarization enabled for supported languages.
Sometimes speakers are detected, but the diarization accuracy is low.
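
When speaker labels look unreliable, one way to judge how far off they are on a given file is to collapse the diarized word list into speaker turns and read them against the audio. The sketch below is a minimal Python example. It assumes the pre-recorded JSON response shape, in which each entry of `results.channels[N].alternatives[0].words` carries an integer `speaker` field when `diarize=true` is set. `speaker_turns` is a hypothetical helper, not part of any Deepgram SDK.

```python
from typing import Any


def speaker_turns(response: dict[str, Any], channel: int = 0) -> list[tuple[int, str]]:
    """Collapse consecutive words that share a speaker label into (speaker, text) turns.

    Hypothetical helper; assumes the diarized pre-recorded response layout.
    """
    words = response["results"]["channels"][channel]["alternatives"][0]["words"]
    turns: list[tuple[int, list[str]]] = []
    for w in words:
        # Use the punctuated form when punctuation/smart formatting was requested.
        text = w.get("punctuated_word", w["word"])
        speaker = w.get("speaker", -1)  # -1 marks a word with no speaker label
        if turns and turns[-1][0] == speaker:
            turns[-1][1].append(text)
        else:
            turns.append((speaker, [text]))
    return [(speaker, " ".join(texts)) for speaker, texts in turns]
```

Printing these turns, along with `len({s for s, _ in turns})` for the number of distinct speakers, gives a quick read on whether speakers are being merged or split.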

Diarization Limitations in Other Models:

Models like nova-phonecall and nova-medical do not support diarization for languages other than English.
Attempting diarization with non-English languages on these models results in a 400 Bad Request error.
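
To reproduce the behavior described above, a request only needs the `model`, `language`, and `diarize` query parameters on the pre-recorded `/v1/listen` endpoint, with the API key sent as `Authorization: Token <key>`. The Python sketch below uses only the standard library. Assumptions: the audio is WAV (hence the `Content-Type` header), and the retry-without-diarization fallback in `transcribe` is a hypothetical workaround policy, not documented Deepgram behavior. A 400 can have other causes, so the error body is printed before deciding.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Deepgram's pre-recorded transcription endpoint.
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def listen_url(model: str, language: str, diarize: bool = True) -> str:
    """Build the request URL with model, language, and diarize query options."""
    params = {"model": model, "language": language, "diarize": str(diarize).lower()}
    return f"{LISTEN_URL}?{urllib.parse.urlencode(params)}"


def transcribe(audio: bytes, api_key: str, model: str, language: str) -> dict:
    """POST WAV audio; on a 400 with diarization on, retry once with it off.

    The fallback is a hypothetical policy, not documented Deepgram behavior.
    """
    for diarize in (True, False):
        req = urllib.request.Request(
            listen_url(model, language, diarize),
            data=audio,
            headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # A 400 can have other causes; show the body so the reason is visible.
            body = err.read().decode(errors="replace")
            print(f"HTTP {err.code} (diarize={diarize}): {body}")
            # Only a 400 on the diarized attempt triggers the undiarized retry.
            if err.code != 400 or not diarize:
                raise
```

If the report above holds, `transcribe(audio, key, "nova-phonecall", "es")` would first try with `diarize=true` and, when that is rejected with a 400, fall back to an undiarized transcript.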

Answered by deepgram-community[bot] · Jan 9, 2025

deepgram-community[bot] · 2024-12-24T10:08:00Z

Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Consider joining our Discord community for more opportunities to engage with your fellow Deepgram users. You can earn points, redeemable for cool stuff, by being active in our communities!

deepgram-community[bot] · 2024-12-24T10:08:09Z

Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com; being verified through this process lets our team help you in a much more streamlined fashion.

deepgram-community[bot] · 2024-12-24T10:08:10Z

It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?

The programming language you are working in (e.g., JavaScript, Python).
The Deepgram product you are using (e.g., Speech to Text, Agent API).
The request ID of a request that triggered your error or issue.
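
For the request ID item, the ID comes back in the JSON body of the API response. A minimal Python sketch, assuming successful transcriptions report it under `metadata.request_id`. The top-level `request_id` lookup for error bodies (such as the 400 responses mentioned in this issue) is an assumption, and `request_id` here is a hypothetical helper.

```python
from typing import Any, Optional


def request_id(body: dict[str, Any]) -> Optional[str]:
    """Return the request ID from a parsed Deepgram JSON body, if present."""
    # Successful transcriptions report it under metadata.request_id.
    meta = body.get("metadata") or {}
    if "request_id" in meta:
        return meta["request_id"]
    # Assumption: error bodies (e.g. a 400) carry it at the top level.
    return body.get("request_id")
```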

deepgram-community[bot] · 2025-01-09T00:23:15Z

Yes, we've heard this a lot and understand the limits of diarization. I know our product team has been discussing improving it in 2025, but I don't have an ETA yet for when that improvement might be released.

Here are some suggestions to try:

Prepend audio from the primary speaker: For short audio files (under 3 minutes), prepend a 30-second clip of the primary speaker's voice before the full audio. This gives the diarization model a reference point to more reliably identify that speaker throughout the rest of the audio. (Help Center: https://deepgram.gitbook.io/help-center/faq/improving-diarization-by-prepending-audio-from-the-primary-speaker)
Use multichannel audio: When possible, use multichannel audio where each speaker is on their own channel. This can significantly improve diarization accuracy compared to mono audio with multiple speakers. (Deepgram Docs: https://developers.deepgram.com/docs/diarization)
Combine multichannel with diarization: For better performance, try using the multichannel feature together with diarization. This can help in cases where diarization alone struggles. (GitHub Discussion: https://github.com/orgs/deepgram/discussions/939)
Use longer audio files: Diarization generally improves with longer audio files. If possible, provide longer audio samples to get better results.
Ensure good audio quality: Poor audio quality can significantly impact diarization performance. Avoid situations like holding a microphone up to a computer speaker, which can degrade the audio signal. (GitHub Discussion: https://github.com/orgs/deepgram/discussions/283)
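
The first suggestion (prepending a clip of the primary speaker) can be done with Python's standard `wave` module when both files are uncompressed WAV. A minimal sketch, assuming the reference clip and the main recording already share channel count, sample width, and sample rate; it refuses mismatched files rather than resampling. `prepend_clip` and `drop_reference_words` are hypothetical helpers. The returned offset lets you discard the words that fall inside the prepended clip and shift the remaining timestamps back to the original recording.

```python
import wave
from typing import Any


def prepend_clip(reference_path: str, main_path: str, out_path: str) -> float:
    """Write the reference clip followed by the main audio; return the clip length in seconds."""
    with wave.open(reference_path, "rb") as ref, wave.open(main_path, "rb") as main:
        ref_format = (ref.getnchannels(), ref.getsampwidth(), ref.getframerate())
        main_format = (main.getnchannels(), main.getsampwidth(), main.getframerate())
        # Raw frame concatenation is only valid when both files share a format.
        if ref_format != main_format:
            raise ValueError(f"format mismatch: {ref_format} vs {main_format}")
        offset = ref.getnframes() / ref.getframerate()
        with wave.open(out_path, "wb") as out:
            out.setnchannels(ref_format[0])
            out.setsampwidth(ref_format[1])
            out.setframerate(ref_format[2])
            out.writeframes(ref.readframes(ref.getnframes()))
            out.writeframes(main.readframes(main.getnframes()))
    return offset


def drop_reference_words(words: list[dict[str, Any]], offset: float) -> list[dict[str, Any]]:
    """Discard words from the prepended clip and shift the rest back to original timestamps."""
    return [
        {**w, "start": w["start"] - offset, "end": w["end"] - offset}
        for w in words
        if w["start"] >= offset
    ]
```

The speaker label attached to the dropped words indicates which label the model assigned to the primary speaker in the rest of the file.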

This message was sent by John Vajda from Deepgram, via our community automation.
