Baby cry recognition based on SLGAN model data generation and deep feature fusion

Document Type

Article

Publication Date

5-15-2024

Abstract

Deep learning models have been applied to baby cry recognition to improve recognition accuracy. However, current research still suffers from data imbalance, which biases model learning. To address this problem, a Sparse Autoencoder Long Short-Term Memory based Generative Adversarial Network (SLGAN) is proposed. The SLGAN model generates new baby cry data so that every cry class contains the same number of samples. Speech features are extracted using Mel-spectrograms and the Short-Time Fourier Transform (STFT). Two deep learning models, VGG16 and VGG19, are used to extract deep features, whose dimensionality is then reduced with Principal Component Analysis (PCA). A sparse autoencoder fuses the reduced deep features, and the fused features are finally used to train a Deep Neural Network (DNN) classifier. The experimental results show that the proposed method outperforms existing methods.
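The class-balancing step described in the abstract (generating new samples until every cry class has the same count) and the sparsity constraint of a sparse autoencoder can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the target count is assumed to be the size of the largest class, `generate` is a hypothetical stand-in for a trained SLGAN generator, the cry-class names in the usage note are hypothetical, and the KL-divergence penalty is the common textbook sparse-autoencoder formulation, which the paper may not use.

```python
import math
from collections import Counter


def generation_quota(labels):
    """Return how many synthetic samples each class needs so that every
    class reaches the size of the largest class (assumed balancing target)."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance_dataset(samples, labels, generate):
    """Append generated samples until all classes have equal counts.

    `generate(cls, n)` is a hypothetical callable standing in for a trained
    generator; it must return a list of n new samples of class `cls`.
    """
    out_x, out_y = list(samples), list(labels)
    for cls, n in generation_quota(labels).items():
        if n > 0:
            new = generate(cls, n)
            out_x.extend(new)
            out_y.extend([cls] * len(new))
    return out_x, out_y


def kl_sparsity_penalty(mean_activations, rho=0.05):
    """Common KL-divergence sparsity penalty for a sparse autoencoder.

    mean_activations: average activation of each hidden unit over a batch,
    each strictly between 0 and 1; rho: target average activation.
    The penalty is zero when every unit's mean activation equals rho.
    """
    penalty = 0.0
    for rho_hat in mean_activations:
        penalty += rho * math.log(rho / rho_hat)
        penalty += (1 - rho) * math.log((1 - rho) / (1 - rho_hat))
    return penalty
```

For example, with hypothetical labels `["hungry"] * 50 + ["pain"] * 12 + ["tired"] * 20`, `generation_quota` returns `{"hungry": 0, "pain": 38, "tired": 30}`, so 68 synthetic samples would be requested and every class ends with 50. Balancing to the majority class avoids discarding real recordings, which undersampling would require.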

Keywords

Baby cry, Data generation, Generative adversarial networks (GANs), Sparse autoencoder, Feature fusion

Divisions

fac_eng,biomedengine,paediatrics

Funders

Universiti Malaya [GPF074A-2018]

Publication Title

Expert Systems with Applications

Volume

242

Publisher

Pergamon-Elsevier Science Ltd

Publisher Location

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, England
