Abstract:
The objective of speech separation is to distinguish and separate the utterances of different speakers from one another and, often, from background noise as well. Speech separation is one of the fundamental problems in the signal processing domain, with a wide variety of applications including hearing prostheses, mobile telecommunications, and robust automatic speech and speaker recognition. The ability of human hearing to isolate one sound source from a mixture of several sounds coming from multiple sources is exceptional. Humans appear to be capable of following the utterances of one speaker in the presence of other speakers and background noise with little to no effort, even in noisy environments such as parties. The problem of separating multiple speakers and noise into individual utterances is referred to as the “cocktail party problem”. This problem has been investigated worldwide, but there is no evidence of an Azerbaijani dataset being used to solve it. This paper aims to investigate how the language of the data affects the solution of this problem and proposes several solutions as well. The first approach uses a Support Vector Machine, a Multi-Layer Perceptron, a Decision Tree classifier, a Random Forest classifier, and K-Nearest Neighbors. Additionally, pretrained models are applied to the custom dataset as an alternative to the proposed solution. All models are evaluated on several metrics, such as accuracy, SNR, SDR, precision, and recall.