Civil STT Controller

Description

This repository contains the controller for a Speech to Text (STT) service. The controller is responsible for receiving MP3 audio files, handing them over to the STT service, and processing the generated transcripts.

The controller uses a local SQLite database and maintains a remote GraphQL connection to the CMS of a radio station's archive website. It updates the MP3's path in the CMS along with the corresponding transcript.

The filenames of the MP3 audio files contain important metadata, including the slug of the program and the release date of the episode.
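The exact filename convention is not documented here, so the following is only a minimal sketch of how the slug and release date could be extracted. It assumes a hypothetical pattern of `<program-slug>_<YYYY-MM-DD>.mp3`; the names `FILENAME_PATTERN`, `EpisodeMeta`, and `parse_episode_filename` are illustrative and not taken from the repository.

```python
import re
from datetime import date
from pathlib import Path
from typing import NamedTuple

# Hypothetical naming convention: "<program-slug>_<YYYY-MM-DD>.mp3".
# The real pattern used by this project may differ.
FILENAME_PATTERN = re.compile(r"^(?P<slug>[a-z0-9-]+)_(?P<date>\d{4}-\d{2}-\d{2})$")


class EpisodeMeta(NamedTuple):
    slug: str
    release_date: date


def parse_episode_filename(filename: str) -> EpisodeMeta:
    """Extract the program slug and release date from an MP3 filename."""
    stem = Path(filename).stem  # drop the directory part and the extension
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unrecognized filename format: {filename!r}")
    return EpisodeMeta(match["slug"], date.fromisoformat(match["date"]))
```

Under this assumed convention, `parse_episode_filename("morning-show_2023-05-14.mp3")` yields the slug `morning-show` and the date 14 May 2023. A filename that does not match raises `ValueError`.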

Features

- Monitors a designated folder for incoming MP3 files.
- Moves incoming MP3 files to the "processing" folder (configured in the `.env` file under the variable `PROCESSING_FOLDER`).
- Adds a record to the local SQLite database indicating that the file is being processed, with the status field set to "processing".
- Resolves metadata from the filename and stores it in the database.
- Adds a record to the remote Hygraph CMS via a GraphQL mutation, updating the MP3's path and associated metadata.
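The local part of this flow (move the file, then record it as "processing") can be sketched with the standard library alone. This is an illustration under stated assumptions, not the repository's implementation: the table name `audio_files`, its columns, and the functions `init_db` and `start_processing` are hypothetical, and the GraphQL call is left out because the CMS schema is not described here.

```python
import shutil
import sqlite3
from pathlib import Path


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the project's real table and columns may differ.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS audio_files (
               filename   TEXT PRIMARY KEY,
               status     TEXT NOT NULL,
               transcript TEXT
           )"""
    )


def start_processing(conn: sqlite3.Connection, mp3: Path, processing_dir: Path) -> Path:
    """Move an incoming MP3 to the processing folder and record it in SQLite."""
    processing_dir.mkdir(parents=True, exist_ok=True)
    target = processing_dir / mp3.name
    shutil.move(str(mp3), str(target))
    with conn:  # commits on success, rolls back if the INSERT fails
        conn.execute(
            "INSERT INTO audio_files (filename, status) VALUES (?, ?)",
            (target.name, "processing"),
        )
    return target
```

Moving the file before inserting the record means a file is never marked "processing" while it still sits in the input folder. The trade-off is that a failed insert leaves a moved file without a database record.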

How to Use

Clone the repository to your local machine.
Go up to the parent folder of the cloned repository with `cd ..` and create a folder for the virtual environment with `mkdir venv`, then create the virtual environment with `python -m venv venv/`.
Activate the virtual environment with the `source venv/bin/activate` command.
Go back to the project folder with `cd civil-stt-controller` and install the required dependencies by running the following command:
```
pip install -r requirements.txt
```
Configure the necessary environment variables in the .env file:
- INPUT_FOLDER: Path to the folder where incoming MP3 files are placed.
- PROCESSING_FOLDER: Path to the folder where files are moved for processing.
- OUTPUT_FOLDER: Path to the folder where the STT-generated transcripts are stored.
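As a rough sketch, the three folder variables could be read and validated at startup like this. It assumes the values from `.env` have already been loaded into the process environment (for example by a loader such as python-dotenv; whether this project uses one is an assumption). The names `REQUIRED_FOLDERS` and `load_folders` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

REQUIRED_FOLDERS = ("INPUT_FOLDER", "PROCESSING_FOLDER", "OUTPUT_FOLDER")


def load_folders(env: Mapping[str, str] = os.environ) -> Dict[str, Path]:
    """Return the configured folders, failing early if a variable is missing."""
    missing = [name for name in REQUIRED_FOLDERS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: Path(env[name]) for name in REQUIRED_FOLDERS}
```

Failing at startup with the list of missing variables is usually easier to diagnose than a path error that only appears when the first file arrives.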
Open a terminal window and activate the virtual environment with the following commands:
```
cd stt-py-env
source bin/activate
```
Start the input watcher by running the following command:
```
python inputwatch.py
```
or
```
python3 inputwatch.py
```
This will begin monitoring the designated input folder for incoming MP3 files. When a new file arrives, it will be moved to the processing folder, a processing record will be added to the local SQLite database, and the metadata will be updated in the remote Hygraph CMS.
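How `inputwatch.py` detects new files is not described here. One simple way to get the same behaviour is to poll the folder, sketched below with the standard library only. The functions `pending_mp3s` and `watch`, and the `handle` callback, are hypothetical names; the actual script may use a different mechanism, such as file system events.

```python
import time
from pathlib import Path
from typing import Callable, List


def pending_mp3s(input_dir: Path) -> List[Path]:
    """List the MP3 files currently waiting in the input folder."""
    return sorted(
        path
        for path in input_dir.iterdir()
        if path.is_file() and path.suffix.lower() == ".mp3"
    )


def watch(input_dir: Path, handle: Callable[[Path], None], interval: float = 2.0) -> None:
    """Poll the input folder forever and pass each waiting MP3 to `handle`.

    `handle` is expected to move the file out of the input folder;
    otherwise the same file would be handled again on the next pass.
    """
    while True:
        for mp3 in pending_mp3s(input_dir):
            handle(mp3)
        time.sleep(interval)
```

A polling loop like this can pick up a file that is still being copied, so a production watcher would typically wait until the file size stops changing before handling it.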
Start the output watcher by running the following command:
```
python outputwatch.py
```
or
```
python3 outputwatch.py
```
This will monitor the output folder for the STT-generated transcripts. When a corresponding TXT file is detected (with the same name as the original MP3 file), the content of the TXT file will be read. The corresponding record in the SQLite database will be updated with the transcript content, the status will be set to "done", and the transcript will be added to the Hygraph CMS.
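The database side of this step can be sketched as follows. It assumes a hypothetical table `audio_files` with `filename`, `status`, and `transcript` columns, keyed by the MP3 filename, and it assumes the transcripts are UTF-8 text. The function `store_transcript` is illustrative, and the CMS update is omitted.

```python
import sqlite3
from pathlib import Path


def store_transcript(conn: sqlite3.Connection, txt_path: Path) -> bool:
    """Attach a transcript to the MP3 record that shares its base name.

    Returns True if a matching record was found and updated.
    """
    transcript = txt_path.read_text(encoding="utf-8")  # encoding is an assumption
    mp3_name = txt_path.with_suffix(".mp3").name  # "show.txt" -> "show.mp3"
    with conn:  # commit the update, or roll back on error
        cursor = conn.execute(
            "UPDATE audio_files SET transcript = ?, status = 'done' WHERE filename = ?",
            (transcript, mp3_name),
        )
    return cursor.rowcount == 1
```

Returning `False` for a transcript with no matching record lets the caller log or skip stray TXT files instead of failing.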

Requirements

Python (version 3.x)


REST API

The repository also includes a FastAPI engine that provides a REST API interface to interact with the local SQLite database. While not mandatory for enabling audio and transcript processing, starting the FastAPI server allows you to monitor the progress of the ongoing processes.

To start the FastAPI server, run the following command:

```
uvicorn main:app --reload
```

Once the server is running, you can access the API at http://localhost:8000.

Endpoint

The following GET endpoint is available:

GET /: Retrieves a list of all audio files in the database, with the status of the transcription process and the transcript for each.
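To monitor progress from a script, the endpoint can be queried and summarized roughly as below. The response shape is an assumption: the sketch expects a JSON list of objects that each carry a `status` field. The functions `fetch_records` and `count_by_status` are hypothetical helpers, not part of the repository.

```python
import json
from collections import Counter
from typing import Any, Dict, List
from urllib.request import urlopen


def fetch_records(base_url: str = "http://localhost:8000") -> List[Dict[str, Any]]:
    """Call GET / and decode the JSON body (assumed to be a list of objects)."""
    with urlopen(f"{base_url}/") as response:
        return json.load(response)


def count_by_status(records: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count records per status; records without a status count as "unknown"."""
    return dict(Counter(record.get("status", "unknown") for record in records))
```

With the server running, `count_by_status(fetch_records())` gives a quick overview of how many files are still "processing" and how many are "done".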

Handle running processes

If you access your server via SSH and want to stop the running Python processes, do the following:
```
ps -ef | grep python
kill -9 <PID>
```
