Speech emotion recognition remains a difficult task, with several open problems: which input features work best and which neural architecture is most effective. I have adopted a combination of input features that includes the Mel spectrogram, Mel-frequency cepstral coefficients (MFCCs), the chromagram, spectral contrast, and the Tonnetz representation. I propose an architecture based on bidirectional long short-term memory (LSTM) layers, which fully exploits the temporal information in the audio recordings. I have trained the network on audio files from four different sources: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Crowd-Sourced Emotional Multimodal Actors Dataset (CREMA-D), the Surrey Audio-Visual Expressed Emotion (SAVEE) database, and the Toronto Emotional Speech Set (TESS).
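
Below is a minimal sketch of the feature extraction and BiLSTM classifier described above, assuming librosa for the audio features and Keras for the model. The function names (`extract_features`, `build_model`), layer sizes, and feature dimensions are illustrative assumptions, not the exact configuration used in this repository.

```python
import numpy as np
import librosa
import tensorflow as tf

def extract_features(path, sr=22050):
    """Stack the five frame-level feature sets into a (frames, channels) matrix."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    tonnetz = librosa.feature.tonnetz(y=librosa.effects.harmonic(y), sr=sr)
    # The Tonnetz frame count can differ slightly from the STFT-based features; trim to the shortest.
    n = min(f.shape[1] for f in (mel, mfcc, chroma, contrast, tonnetz))
    feats = np.vstack([f[:, :n] for f in (mel, mfcc, chroma, contrast, tonnetz)])
    return feats.T  # shape: (time frames, feature channels)

def build_model(n_features, n_classes=8):
    """Two stacked bidirectional LSTM layers followed by a softmax classifier."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(None, n_features)),  # variable-length feature sequences
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# 64 mel bands + 40 MFCCs + 12 chroma bins + 7 contrast bands + 6 Tonnetz dimensions
model = build_model(n_features=64 + 40 + 12 + 7 + 6)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```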
Dundalia/LoopQPrize_2022
About
Solution for the Loop Q Prize 2022: A speech emotion recognition DL model