Speech emotion recognition using deep learning
Date
2023-09-14
Authors
Σκουλίδης, Γεώργιος
Abstract
This thesis aims to build a robust speech emotion recognition system using labeled
data and supervised deep-learning techniques based on image classification. The core
objective is to construct a model that can accurately distinguish between emotional
classes for English speakers.
The investigation also analyzes the influence of three specific spectral features.
These features are treated as images, each rendered in four different output sizes
resulting from a quadratic transformation achieved through bilinear image
interpolation. We also evaluate three custom abstract
Convolutional Neural Network (CNN) architectures. Each architecture comprises three
convolutional layers and three fully-connected layers, among other components. We use
parameter tuning both to identify the optimal internal parameters that concretize the
CNN structures and to adjust the batch size and learning rate to enhance performance.
Furthermore, to
improve generalization, a custom early-stopping algorithm is integrated with the
5-fold cross-validation method. Specific pre-processing steps are employed, along
with some audio-based techniques for cross-validation data.
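The speech-to-image resizing step described above can be sketched as follows. This is a minimal illustration, not the implementation used in the thesis: the function name `resize_bilinear`, the corner-aligned sampling convention, the reading of the transformation as producing a square image of side `out_size`, and the example sizes are all assumptions made for this sketch.

```python
def resize_bilinear(image, out_size):
    """Resize a 2-D list of floats to out_size x out_size using bilinear interpolation.

    Assumption: the target is square and the corners of the input and output
    grids are aligned. The thesis does not state which convention it uses.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for i in range(out_size):
        # Position of output row i on the input grid.
        y = i * (in_h - 1) / (out_size - 1) if out_size > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_size):
            # Position of output column j on the input grid.
            x = j * (in_w - 1) / (out_size - 1) if out_size > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 2 x 3 "spectral feature" standing in for a real spectrogram.
spectrogram = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0]]

# Four hypothetical output sizes; the sizes used in the thesis are not given here.
images = {size: resize_bilinear(spectrogram, size) for size in (2, 3, 4, 5)}
```

In practice an image library would perform this step; the explicit loops only make the interpolation weights visible.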
An additional objective is to study how the optimized, pre-trained English Speech
Emotion Recognition model performs when applied to speech samples from Greek
speakers. A limited dataset of Greek speech is used to train, validate, and test the
model, while we assess the knowledge captured by the model's pre-trained
layers.
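The idea of reusing pre-trained layers while fine-tuning on a small target dataset can be illustrated with a toy example. This is only a sketch under stated assumptions: the thesis works with a CNN, whereas the model below is a hypothetical two-parameter-layer scalar network, and the names `fine_tune`, `frozen`, `w1`, `w2`, `b2`, and the synthetic data are all invented for illustration.

```python
def fine_tune(params, frozen, data, lr=0.05, epochs=200):
    """Run SGD on y_hat = w2 * (w1 * x) + b2, skipping parameters listed in `frozen`.

    Assumption: squared-error loss on a scalar toy model, standing in for the
    pre-trained feature layers (w1) and the re-trained head (w2, b2).
    """
    p = dict(params)  # copy so the pre-trained values are left untouched
    for _ in range(epochs):
        for x, y in data:
            h = p["w1"] * x                    # output of the pre-trained layer
            err = (p["w2"] * h + p["b2"]) - y  # prediction error
            grads = {
                "w1": 2 * err * p["w2"] * x,
                "w2": 2 * err * h,
                "b2": 2 * err,
            }
            for name, g in grads.items():
                if name not in frozen:         # frozen layers receive no update
                    p[name] -= lr * g
    return p


# Hypothetical pre-trained weights and a small synthetic target dataset.
pretrained = {"w1": 2.0, "w2": 1.0, "b2": 0.0}
target_data = [(x, 3.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Keep the first layer frozen and adapt only the head.
tuned = fine_tune(pretrained, frozen={"w1"}, data=target_data)
```

Comparing runs with different `frozen` sets is one simple way to probe how much the pre-trained layers already contribute on the new data.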
Keywords
Deep learning, Speech emotion recognition, Image classification, Ekman model, Transfer learning, Convolutional neural networks, Machine learning, Acoustic spectral features, Parameter tuning, Frozen layers, Speech to image, Back-propagation, Activation function