This project was bootstrapped with Create React App.
- YouTube Video Downloader
- Automatic Transcription (Using Video Model from Google)
- Personalized Voice Cloning and Lip Syncing
- Speaker Diarization
- PII Redaction (Using Text Davinci Model from OpenAI)
- Video Concatenation
- Modify Resolution
- Video Trimming
- Easy Video Cutting and Editing Using the Transcription
- URL Shortening
- SRT file download
- Remove Filler Words
- Deploy the Backend API folder on Google Cloud or create a Flask application, then update the respective API calls.
- Set up Firebase for the register/login API, enable the email/password sign-in feature, and update the API keys. [Reference Link: https://console.firebase.google.com/project/aieditorv1/authentication]
- Frontend is a React Application.
- The Tortoise TTS and Wav2Lip models are deployed on Beam.cloud for seamless integration and scalability. [Reference link: https://docs.beam.cloud/getting-started/quickstart]
- Get API keys for your OpenAI account and replace them in the extract_pii.py file to use the GPT-3.5 Turbo model. [Reference Link: https://platform.openai.com/docs/introduction]
- To run this project, simply clone the repo and make sure you have Node.js installed.
- In the terminal, type "npm install react-scripts" followed by "npm start" to run the React application on your localhost.
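
The OpenAI step above asks you to paste your key into extract_pii.py. As an optional alternative that keeps the key out of the repository, the sketch below reads it from an environment variable instead. It is only an illustration: `load_openai_key` is a hypothetical helper that does not exist in this repo, and using the `OPENAI_API_KEY` variable name here is an assumption about how you choose to configure the backend.

```python
import os


def load_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper (not part of this repo): raises a clear error
    instead of letting the backend start with a missing key.
    """
    # Strip whitespace so a key copied with a trailing newline still works.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before starting the backend."
        )
    return key
```

If you go this route, extract_pii.py could call `load_openai_key()` once at startup and pass the result to the OpenAI client, rather than storing the key in the source file.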
This project set out to build an AI-based audio and video editing tool: an all-in-one collaborative editing solution that makes editing easier for users of all skill levels. The work included a thorough review of the state of the art, a close examination of the system architecture, the application of client-tier and data-tier technologies, performance benchmarks, and considerations for deployment, operations, and maintenance. The resulting tool lets users modify their recordings as quickly as editing a Google Doc, with features including real-time voice cloning, automatic transcription, and text-to-speech conversion.