Baby cry recognition based on SLGAN model data generation and deep feature fusion
Document Type
Article
Publication Date
5-15-2024
Abstract
Deep learning models have been applied to baby cry recognition to improve recognition accuracy. However, current research still suffers from a data imbalance problem, which biases model learning. A Sparse Autoencoder Long Short-Term Memory based Generative Adversarial Network (SLGAN) is proposed to solve this problem. The proposed SLGAN model generates new baby cry data so that every cry class has the same number of samples. Speech features are extracted using Mel-spectrograms and the Short-Time Fourier Transform (STFT). Two deep learning models, VGG16 and VGG19, are used to extract deep features. The dimensionality of the deep features is then reduced using Principal Component Analysis (PCA), and a sparse autoencoder is used to fuse them. Finally, a deep neural network is trained on the fused features to classify the cries. The experimental results show that the proposed method outperforms existing methods.
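The PCA reduction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature matrices are random stand-ins for VGG16 and VGG19 outputs, the sizes (40 samples, 512 features, 16 components) and the helper name `pca_reduce` are assumptions chosen for illustration, and plain concatenation stands in for the input to the fusion model, not for the sparse autoencoder itself.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
# Hypothetical stand-ins for deep features from two networks (e.g. VGG16, VGG19).
feats_a = rng.normal(size=(40, 512))
feats_b = rng.normal(size=(40, 512))

reduced_a = pca_reduce(feats_a, 16)
reduced_b = pca_reduce(feats_b, 16)

# Assumption: the reduced features are concatenated before being passed to a
# fusion model; the fusion model itself is not implemented here.
fusion_input = np.hstack([reduced_a, reduced_b])
print(fusion_input.shape)
```

Each sample goes from 2 x 512 raw deep features to 2 x 16 reduced ones, which is the kind of compact input a fusion model would then receive.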
Keywords
Baby cry, Data generation, Generative adversarial networks (GANs), Sparse autoencoder, Feature fusion
Divisions
fac_eng,biomedengine,paediatrics
Funders
Universiti Malaya [GPF074A-2018]
Publication Title
Expert Systems with Applications
Volume
242
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Publisher Location
THE BOULEVARD, LANGFORD LANE, KIDLINGTON, OXFORD OX5 1GB, ENGLAND