Support OpenAI Whisper API Integration for Enhanced Transcription Flexibility #139

agentmarketbot

Pull Request: Support Additional Whisper Services (OpenAI Whisper API Integration)

Overview

This pull request addresses issue #137 in the grouplang secretary bot repository. It adds the OpenAI Whisper API as a second voice transcription service alongside AWS Transcribe, so users can pick the service that best fits their flexibility, performance, and cost requirements.

Summary of Changes

The following key changes have been made:

  1. Integration of OpenAI Whisper API:

    • Added functionality to support the OpenAI Whisper API as a new transcription option.
    • Developed a dedicated OpenAITranscriber class to manage audio transcription using the OpenAI Whisper API.
  2. Abstract Base Class for Transcription Services:

    • Introduced an abstract class named BaseTranscriber that defines the common interface and shared functionality for various transcription services, including both AWS and OpenAI.
  3. Refactor of AWS Transcription Logic:

    • The existing logic for AWS transcription has been refactored into a structured AWSTranscriber class, enhancing code maintainability and clarity.
  4. Factory Pattern for Service Selection:

    • Implemented a TranscriberFactory that uses the factory pattern to instantiate the transcriber matching the service chosen in the configuration.
  5. Configuration Updates:

    • Updated the configuration system to allow users to select between the AWS and OpenAI transcription services through the TRANSCRIPTION_SERVICE environment variable. Users can now easily switch between options ('aws' or 'openai') as needed.
  6. Bot Handler Modifications:

    • Updated the bot handlers to obtain their transcriber from TranscriberFactory when transcribing audio messages.
  7. Documentation Revamp:

    • Revised the README.md file to thoroughly document the new features, configuration settings, and API references, covering both the existing AWS service and the newly added OpenAI service.
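
To make the architecture above easier to picture, here is a minimal sketch of how the pieces might fit together. It is an illustration rather than the code in this pull request: the class names and the TRANSCRIPTION_SERVICE variable come from the description, while the `transcribe` method, the `create` class method, the case-insensitive lookup, and the error raised for an unknown service are assumptions. The service calls themselves are left as placeholders.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class BaseTranscriber(ABC):
    """Common interface for transcription backends (method name is assumed)."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Return the transcript of the audio file at audio_path."""


class AWSTranscriber(BaseTranscriber):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: the real class would call AWS Transcribe here.
        raise NotImplementedError("AWS call omitted in this sketch")


class OpenAITranscriber(BaseTranscriber):
    def transcribe(self, audio_path: str) -> str:
        # Placeholder: the real class would call the OpenAI Whisper API here.
        raise NotImplementedError("OpenAI call omitted in this sketch")


class TranscriberFactory:
    """Maps a service name ('aws' or 'openai') to a transcriber class."""

    _services: Dict[str, Type[BaseTranscriber]] = {
        "aws": AWSTranscriber,
        "openai": OpenAITranscriber,
    }

    @classmethod
    def create(cls, service: Optional[str] = None) -> BaseTranscriber:
        # Fall back to the environment variable when no name is passed in.
        if service is None:
            service = os.environ.get("TRANSCRIPTION_SERVICE", "")
        # Assumption: names are matched case-insensitively.
        name = service.strip().lower()
        if name not in cls._services:
            # Assumption: unknown or unset values fail loudly instead of
            # silently picking a default service.
            raise ValueError(
                f"Unknown TRANSCRIPTION_SERVICE {service!r}; "
                f"expected one of {sorted(cls._services)}"
            )
        return cls._services[name]()
```

With this shape, a bot handler would only need `TranscriberFactory.create().transcribe(path)` and would not have to know which service is configured.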

Next Steps

  • Ensure that all environment configurations are correctly set prior to deployment.
  • Conduct thorough testing to verify the functionality and reliability of both transcription services.
  • Gather performance metrics and user feedback to inform potential future enhancements.
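
The first step above could be automated with a small startup check. The sketch below is hypothetical: `missing_settings` and `REQUIRED_VARS` are illustrative names, and the per-service variables are assumptions based on the variable names the AWS and OpenAI SDKs conventionally read (AWS credentials can also come from profiles or instance roles, which this check does not cover). Only TRANSCRIPTION_SERVICE and its two values are taken from the description.

```python
import os
from typing import Dict, List, Mapping

# Assumed per-service requirements; adjust to match the real configuration.
REQUIRED_VARS: Dict[str, List[str]] = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "openai": ["OPENAI_API_KEY"],
}


def missing_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of settings that are unset, empty, or invalid."""
    service = env.get("TRANSCRIPTION_SERVICE", "").strip().lower()
    if service not in REQUIRED_VARS:
        # The selector itself is unset or not one of 'aws' / 'openai'.
        return ["TRANSCRIPTION_SERVICE"]
    # Report every required variable that is absent or empty.
    return [name for name in REQUIRED_VARS[service] if not env.get(name)]
```

Calling `missing_settings()` before the bot starts and refusing to run when the list is non-empty would surface configuration mistakes before the first voice message arrives.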

Please review the changes and provide feedback or approval to merge this integration into the main codebase. Thank you!

Add support for using OpenAI's Whisper API as an alternative to AWS 
Transcribe for voice message transcription. Changes include:

- Create abstract BaseTranscriber class for transcription service interface
- Implement OpenAITranscriber class for Whisper API integration
- Add TranscriberFactory for dynamic service selection
- Update configuration to support service selection via TRANSCRIPTION_SERVICE
- Update documentation with new configuration options and prerequisites
- Refactor bot handlers to use the new transcription architecture

These changes allow users to choose their preferred transcription service 
while maintaining the existing AWS Transcribe functionality.
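
For readers unfamiliar with the Whisper API, the following is a rough sketch of what an OpenAITranscriber could look like. It is not the implementation from this pull request. It assumes the `openai` Python package (version 1.x) and the hosted `whisper-1` model; the `client` and `model` parameters and the `transcribe` method name are illustrative. In the PR the class would derive from BaseTranscriber, which is left out here so the example stands alone.

```python
from typing import Any, Optional


class OpenAITranscriber:
    """Sketch of a Whisper-backed transcriber with an injectable client."""

    def __init__(self, client: Optional[Any] = None, model: str = "whisper-1") -> None:
        if client is None:
            # Imported lazily so this module loads even when the openai
            # package is not installed (for example in AWS-only deployments).
            from openai import OpenAI

            # OpenAI() reads the API key from the OPENAI_API_KEY variable.
            client = OpenAI()
        self._client = client
        self._model = model

    def transcribe(self, audio_path: str) -> str:
        # The API expects the audio as a binary file object.
        with open(audio_path, "rb") as audio_file:
            result = self._client.audio.transcriptions.create(
                model=self._model,
                file=audio_file,
            )
        # The default response format exposes the transcript as .text.
        return result.text
```

Accepting the client as a parameter keeps the class testable without network access: a test can pass a stub object that mimics `audio.transcriptions.create`.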
vadanrod14 closed this on Jan 27, 2025