Abstract:
In human-computer interaction, understanding human behavior is one of the main tasks a
machine must perform to make the interaction as natural as possible. Conversation is one of
the quickest and most natural forms of communication between humans. By understanding
the emotion behind speech, machines can better assist humans. The purpose of this
project is to build a machine learning model that recognizes emotion in audio containing
speech. Speech emotion recognition (SER) can be modeled either with or without regard to the
semantic content of the speech itself. This research focuses on recognizing the tone and
pitch of the voice, which help identify the emotional state of the speaker from the
non-linguistic aspects of speech. Because human emotions can become so complex that even
humans cannot always precisely identify a speaker's emotion, we summarize them into four
main categories: neutral, positive, angry, and sad.
These four categories will be the prediction targets of the developed model. Various
implementations will be compared to determine the best-fitting model for the non-linguistic
approach.