ASR Enhancer

Overview

ASR Enhancer is a tool designed to improve the accuracy of automatic speech recognition (ASR) systems, particularly for voice-enabled assistants. The system leverages phoneme-based corrections and hill-climbing algorithms to optimize the output of ASR models, correcting misrecognized words and phrases to deliver higher-quality transcriptions.

Key Features

Phoneme-Based Corrections: Improves recognition accuracy by utilizing an inverse phoneme table to fix ASR errors.
Hill-Climbing Algorithm: Iteratively refines sentence outputs by exploring and selecting optimal corrections based on a defined cost function.
Bigram and Unigram Analysis: Improves correction quality by checking both single characters (unigrams) and common character pairs (bigrams, e.g., "SH") against the inverse phoneme table.
Flexible Algorithms: Various approaches, including greedy and hill-climbing methods, were explored and evaluated for efficiency and effectiveness.

Algorithm Overview

Core Steps

State Definition:
The current best-corrected sentence is considered the "state" at each iteration.
Neighbor Generation:
- For each character in the sentence, the algorithm checks whether it appears in the inverse phoneme table.
- The phoneme table maps erroneous phonemes to their corrected forms.
- Replacements are made for single characters or bigrams, generating a list of potential corrections, each associated with a cost.
Best Neighbor Selection:
- Among the generated neighbors, the one with the lowest cost is selected as the next state.
- The process continues iteratively until no further improvement is possible.
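
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the table entries, the `toy_cost` function, and the names `neighbors` and `hill_climb` are all hypothetical. A real cost function would presumably score each candidate with an ASR or language model rather than compare it with a known target.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical inverse phoneme table: a substring seen in the ASR output
# maps to the substrings it may have been misrecognized from.
INVERSE_PHONEME_TABLE: Dict[str, List[str]] = {
    "S": ["SH", "Z"],
    "SH": ["S"],
    "K": ["C"],
}


def neighbors(sentence: str, table: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every sentence reachable by one unigram or bigram replacement."""
    for i in range(len(sentence)):
        for width in (1, 2):  # single characters, then character pairs
            chunk = sentence[i:i + width]
            if len(chunk) < width:
                continue  # no full bigram left at the end of the sentence
            for replacement in table.get(chunk, []):
                yield sentence[:i] + replacement + sentence[i + width:]


def hill_climb(sentence: str,
               table: Dict[str, List[str]],
               cost: Callable[[str], float]) -> str:
    """Move to the lowest-cost neighbor until no neighbor improves the cost."""
    state, state_cost = sentence, cost(sentence)
    while True:
        candidates = list(neighbors(state, table))
        if not candidates:
            return state
        best = min(candidates, key=cost)
        best_cost = cost(best)
        if best_cost >= state_cost:
            return state  # local minimum reached
        state, state_cost = best, best_cost


def toy_cost(sentence: str, target: str = "SHE CAN") -> int:
    """Stand-in cost (assumption): mismatched positions plus length gap."""
    mismatches = sum(a != b for a, b in zip(sentence, target))
    return mismatches + abs(len(sentence) - len(target))


print(hill_climb("SE KAN", INVERSE_PHONEME_TABLE, toy_cost))  # SHE CAN
```

Because a move is accepted only when it strictly lowers the cost, the loop stops at the first local minimum, which is not guaranteed to be the best possible correction.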

Installation

Prerequisites

Python 3.x
Conda (recommended for environment management)

Steps to Set Up

Clone the repository:

```
git clone https://github.com/adityjhaa/asr-enhancer.git
cd asr-enhancer
```

Install required dependencies using Conda:
```
conda env create -f environment.yml  
```
Activate the environment:
```
conda activate asr-enhancer  
```
Run the ASR Enhancer:
```
python asr_enhancer.py  
```

Algorithm Variants and Performance

1. Greedy Algorithm Without Word Correction

Description: Updates characters from left to right, replacing them with the lowest-cost neighbors.
Results:
- Average Loss: 2.1136
- Average Time per Sentence: 13 seconds
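
A left-to-right greedy pass of this kind might look like the sketch below. It is an illustration only: the table entries, `toy_cost`, and the function name `greedy_pass` are assumptions, and the actual implementation may differ in how it advances through the sentence.

```python
from typing import Callable, Dict, List

# Hypothetical single-character table; the project's real table is not shown here.
TABLE: Dict[str, List[str]] = {"S": ["SH", "Z"], "K": ["C"]}


def toy_cost(sentence: str, target: str = "SHE CAN") -> int:
    """Stand-in cost (assumption): mismatched positions plus length gap."""
    mismatches = sum(a != b for a, b in zip(sentence, target))
    return mismatches + abs(len(sentence) - len(target))


def greedy_pass(sentence: str,
                table: Dict[str, List[str]],
                cost: Callable[[str], float]) -> str:
    """Visit characters left to right, committing the best replacement at once."""
    current, current_cost = sentence, cost(sentence)
    i = 0
    while i < len(current):
        best, best_cost, step = current, current_cost, 1
        for replacement in table.get(current[i], []):
            candidate = current[:i] + replacement + current[i + 1:]
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                best, best_cost, step = candidate, candidate_cost, len(replacement)
        # Commit immediately and skip past whatever was written at position i.
        current, current_cost = best, best_cost
        i += step
    return current


print(greedy_pass("SE KAN", TABLE, toy_cost))  # SHE CAN
```

Each position is decided once and never revisited, which is why a pass like this is cheap but can miss corrections that only pay off in combination.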

2. Hill Climbing Without Word Correction

Description: Examines all characters before making updates but does not consider bigrams.
Results:
- Average Loss: 2.0243
- Average Time per Sentence: 50 seconds

3. Greedy Algorithm with Word Updates Before Character Correction

Description: Adds missing words to the beginning and end of the sentence, then performs character corrections.
Results:
- Average Loss: 1.8454
- Average Time per Sentence: 25 seconds

4. Greedy Algorithm with Word Updates After Character Correction

Description: Performs word corrections after character corrections, avoiding unnecessary modifications.
Results:
- Average Loss: 1.8058
- Average Time per Sentence: 25 seconds
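
The word-update step shared by variants 3 and 4 could be sketched as below. The vocabulary, the set-based `toy_cost`, the function name `add_boundary_words`, and the choice to try at most one word at each end are all assumptions made for illustration. Whether the step runs before or after character correction is decided by the caller.

```python
from typing import Callable, Iterable


def add_boundary_words(sentence: str,
                       vocabulary: Iterable[str],
                       cost: Callable[[str], float]) -> str:
    """Try one vocabulary word at the front, then one at the end; keep improvements."""
    words = list(vocabulary)
    best, best_cost = sentence, cost(sentence)

    # Front of the sentence: keep the prepended word that lowers the cost most.
    for word in words:
        candidate = f"{word} {sentence}"
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost

    # End of the sentence: extend whatever the front step settled on.
    with_front = best
    for word in words:
        candidate = f"{with_front} {word}"
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost

    return best


def toy_cost(sentence: str, target: str = "I SAW THE CAT") -> int:
    """Stand-in cost (assumption): words present in only one of the two sentences."""
    return len(set(sentence.split()) ^ set(target.split()))


print(add_boundary_words("SAW THE", ["I", "A", "CAT"], toy_cost))  # I SAW THE CAT
```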

5. Hill Climbing with Word Updates After Character Correction

Description: Combines hill climbing with word corrections applied post character correction.
Results:
- Average Loss: 1.7099
- Average Time per Sentence: 55 seconds

6. Hill Climbing with Unigram and Bigram Corrections

Description: Integrates bigram checks to address errors in common character pairings (e.g., "SH").
Results:
- Average Loss: 1.5158
- Average Time per Sentence: 60 seconds

Analysis and Insights

Bigram Corrections: Adding bigram checks to hill climbing reduced the average loss from 1.7099 to 1.5158, highlighting the importance of contextual analysis in phoneme corrections.
Word Updates After Character Correction: For the greedy algorithm, applying word updates after character correction gave a lower loss than applying them before (1.8058 vs. 1.8454) at the same time per sentence, which suggests that broader context is best corrected only after the finer details are addressed.
Algorithm Choice: Hill climbing with word and bigram updates achieved the lowest loss (1.5158) but took about 60 seconds per sentence, compared with 13 to 25 seconds for the greedy variants.
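
The trade-off between loss and time can be made concrete with a little arithmetic on the reported figures. The snippet below only restates the numbers from the results above; the helper name `loss_reduction_percent` and the choice of variant 1 as the baseline are assumptions for illustration.

```python
# Figures copied from the results above: (average loss, seconds per sentence).
RESULTS = {
    "1. Greedy, no word correction": (2.1136, 13),
    "2. Hill climbing, no word correction": (2.0243, 50),
    "3. Greedy, word updates before": (1.8454, 25),
    "4. Greedy, word updates after": (1.8058, 25),
    "5. Hill climbing, word updates after": (1.7099, 55),
    "6. Hill climbing, unigram and bigram": (1.5158, 60),
}


def loss_reduction_percent(baseline: float, loss: float) -> float:
    """Relative loss reduction with respect to a baseline, in percent."""
    return 100.0 * (baseline - loss) / baseline


baseline_loss, baseline_time = RESULTS["1. Greedy, no word correction"]
for name, (loss, seconds) in RESULTS.items():
    reduction = loss_reduction_percent(baseline_loss, loss)
    slowdown = seconds / baseline_time
    print(f"{name}: {reduction:.1f}% lower loss, {slowdown:.1f}x time")
```

By these figures, variant 6 lowers the average loss by roughly 28% relative to variant 1 while taking about 4.6 times as long per sentence.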

Future Improvements

Dynamic Phoneme Correction: Enhance the inverse phoneme table with adaptive learning to handle rare or context-specific errors.
Deep Learning Integration: Incorporate neural networks to predict corrections based on semantic understanding.
Performance Optimization: Reduce time complexity by parallelizing bigram and unigram analyses.
Real-World Integration: Extend support to process real-time ASR outputs from popular systems like Google ASR or Alexa.

Acknowledgments

This project was developed under the guidance of the COL333: Artificial Intelligence faculty at IIT Delhi. It builds upon foundational ideas in ASR error correction, phonetics, and heuristic algorithms.
