Abstract:
Speech carries paralinguistic information such as the speaker's identity, age, gender,
and accent. The objective of this paper is to identify the age group and gender of the
speaker from Azerbaijani speech. The paper covers both adult and child speech.
Identifying age and gender from children's
speech is more complex than for adult speech, because the voices of boys and girls
before puberty are similar. In addition, puberty blurs the distinction between a
teenager and an adult, which can cause errors in age-group identification.
Moreover, the numerous accents present in Azerbaijani speech data pose an additional
challenge. To identify age and gender from speech, the data are first pre-processed and
features are extracted. The next step is classification based on the extracted features.
Various approaches are going to be tested, including x-vectors, which are embeddings
extracted from a deep neural network, and i-vectors, which are derived from a factor
analysis model built on a Gaussian mixture model. Mel-frequency cepstral coefficients
(MFCC), a feature extraction technique widely used in automatic speech processing, are
also extracted. On top of these features, a GMM-SVM model, which combines a Gaussian
mixture model with a support vector machine, is run. K-nearest neighbors (KNN) and
multilayer perceptron (MLP) are other prominent classifiers to be used for the age and
gender identification problem. Another feature extraction technique, shifted delta
cepstral (SDC) coefficients, will be tested and compared with the MFCC results. The music and
audio analysis package Librosa and the PyAudio library are used to enable recording
and playback of audio for future demonstration purposes. The output of the model is
classified into four age groups, Children (7-14), Young (15-24), Middle-aged (25-54),
and Seniors (55-80), and two genders, Male and Female.
Correctly identifying age and gender is sometimes hard even for a human listener, which
indicates how difficult the task is for a machine.
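
The target classes named above can be written down as a small lookup. The following is
only a minimal sketch: the age boundaries come from the abstract, while the identifiers
(AGE_GROUPS, age_group, target_labels) and the lowercase group keys are hypothetical
names chosen for illustration. Whether age and gender are predicted jointly or by
separate classifiers is not stated here, so the sketch simply returns both labels.

```python
# Age-group boundaries as listed in the abstract (inclusive, in years).
# All names below are illustrative assumptions, not the paper's code.
AGE_GROUPS = [
    ("children", 7, 14),
    ("young", 15, 24),
    ("middle_aged", 25, 54),
    ("seniors", 55, 80),
]
GENDERS = ("Male", "Female")


def age_group(age):
    """Return the age-group key for an integer age, or None if out of range."""
    for name, low, high in AGE_GROUPS:
        if low <= age <= high:
            return name
    return None


def target_labels(age, gender):
    """Return the (age group, gender) pair used as classification targets."""
    group = age_group(age)
    if group is None:
        raise ValueError(f"age {age} is outside the 7-80 range covered")
    if gender not in GENDERS:
        raise ValueError(f"unknown gender label: {gender!r}")
    return group, gender
```

For example, target_labels(30, "Male") yields the middle-aged group paired with Male,
while ages outside the 7-80 range are rejected rather than assigned to the nearest group.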