Facing accuracy issues while transcribing Hindi (Indian regional language) #147

nagarajuprimefocus · 2023-05-11T17:46:39Z

Currently, transcription accuracy for Hindi is not good. Any suggestions for moving in the right direction to achieve better accuracy?

SandraRodgers (Maintainer) · 2023-05-12T13:39:09Z

Hi @nagarajuprimefocus,

I recommend you try changing the tier you are using. Hindi is available on the base or enhanced tier.

https://api.deepgram.com/v1/listen?tier=enhanced&language=hi

The other option would be to try Deepgram's Whisper Cloud offering:

https://api.deepgram.com/v1/listen?model=whisper&detect_language=true

or try:

https://api.deepgram.com/v1/listen?model=whisper&language=hi
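
For anyone calling these endpoints from code, here is a minimal Python sketch of how such a request might be sent. It is not official Deepgram client code: the helper names `listen_url` and `transcribe` are made up for illustration, and the `Authorization: Token ...` header and the `audio/wav` content type are assumptions you should check against the Deepgram documentation for your account and audio format.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.deepgram.com/v1/listen"


def listen_url(**params: str) -> str:
    """Build a listen URL from query parameters (hypothetical helper)."""
    # Pass values as strings, e.g. detect_language="true", so they are
    # encoded exactly as they appear in the URLs above.
    return f"{BASE_URL}?{urlencode(params)}"


def transcribe(audio_path: str, api_key: str, **params: str) -> dict:
    """POST a local audio file and return the parsed JSON response.

    Assumptions: token-style Authorization header and WAV audio.
    """
    with open(audio_path, "rb") as audio_file:
        body = audio_file.read()
    request = urllib.request.Request(
        listen_url(**params),
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `transcribe("call.wav", api_key, tier="enhanced", language="hi")` would hit the first URL above, and `transcribe("call.wav", api_key, model="whisper", language="hi")` the last one, which makes it easy to compare the tiers on the same file.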

rahulbansal16 · 2023-07-03T05:25:25Z

Hi @SandraRodgers
Is it possible to set up Deepgram for this workflow?

The input audio can be in either Hindi or English.
If the input is in English, output the English transcript,
but if the input is Hindi, output the Hindi transcript in Latin script.

I don't want to manually pass the language for each audio file.

jkroll-deepgram (Collaborator) · 2023-07-06T21:37:47Z

Hi @rahulbansal16, you can use detect_language=true with both Deepgram's enhanced and base model tiers and with our managed Whisper offering. The model will detect whether each file is in Hindi or English and transcribe it in that language.

https://api.deepgram.com/v1/listen?tier=enhanced&detect_language=true
https://api.deepgram.com/v1/listen?model=whisper&detect_language=true

Unfortunately, however, Hindi is currently transcribed in Devanagari script by default, so if Hindi is detected, the transcript will be in Devanagari (hi language code). The only way to get a Latin-alphabet transcription is to know in advance that the file is in Hindi and specify language=hi-Latn.

We understand that this is not ideal, and are exploring a future feature that will give customers more control over the languages and scripts that can be detected.
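
Until such a feature exists, one possible client-side workaround is a two-pass approach: run detection first, then re-request with language=hi-Latn only when Hindi is detected. The Python sketch below is an illustration under assumptions, not a Deepgram-provided method. The function name and the `transcribe` callable are hypothetical, and reading the detected language from `results.channels[0].detected_language` is an assumption about the response shape that should be verified against a real response.

```python
from typing import Callable, Dict, Optional

# A callable that takes query parameters and returns the parsed JSON
# response. It is supplied by the caller (hypothetical interface).
Transcriber = Callable[[Dict[str, str]], dict]


def transcribe_hindi_in_latin(
    transcribe: Transcriber,
    detect_params: Dict[str, str],
    latin_params: Optional[Dict[str, str]] = None,
) -> dict:
    """Detect the language first; re-transcribe Hindi with hi-Latn."""
    if latin_params is None:
        latin_params = {"language": "hi-Latn"}

    # Pass 1: let the model detect Hindi vs. English.
    first = transcribe({**detect_params, "detect_language": "true"})

    # Assumed response field; adjust if the actual payload differs.
    detected = first["results"]["channels"][0].get("detected_language")

    if detected == "hi":
        # Pass 2: the file is known to be Hindi, so request Latin script.
        return transcribe(latin_params)

    # English (or anything else): keep the first-pass transcript.
    return first
```

Note that every Hindi file is transcribed twice with this approach, so it roughly doubles the request count for Hindi audio. Whether hi-Latn is available on a given tier or model is not stated in this thread, which is why the second-pass parameters are left configurable.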
