Speech emotion recognition remains a difficult task, with several open problems: which input features work best and which neural architecture is most effective. I have adopted a combination of input features that includes the Mel spectrogram, Mel-frequency cepstral coefficients (MFCCs), the chromagram, spectral contrast, and the Tonnetz representation. I propose an architecture based on bidirectional long short-term memory (LSTM) layers, which fully exploits the temporal information in the audio recordings. I have trained the network on audio files from four different sources: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the Toronto Emotional Speech Set (TESS).
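
Below is a minimal sketch of the feature extraction and BiLSTM classifier described above, assuming librosa for the audio features and Keras for the model. The function names (`extract_features`, `build_model`), layer sizes, and feature dimensions are illustrative assumptions, not the exact configuration used in this repository.

```python
import numpy as np
import librosa
import tensorflow as tf

def extract_features(path, sr=22050):
    """Stack the five frame-level feature sets into a (frames, channels) matrix."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # The Tonnetz frame count can differ slightly from the STFT-based features; trim to the shortest.
    n = min(f.shape[1] for f in (mel, mfcc, chroma, contrast, tonnetz))
    feats = np.vstack([f[:, :n] for f in (mel, mfcc, chroma, contrast, tonnetz)])
    return feats.T  # shape: (time frames, feature channels)

def build_model(n_features, n_classes=8):
    """Two stacked bidirectional LSTM layers followed by a softmax classifier."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, n_features)),  # variable-length feature sequences
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# 64 mel bands + 40 MFCCs + 12 chroma bins + 7 contrast bands + 6 Tonnetz dimensions
model = build_model(n_features=64 + 40 + 12 + 7 + 6)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```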
Dundalia/LoopQPrize_2022
About
Solution for the Loop Q Prize 2022: A speech emotion recognition DL model