An Improved Convolutional Neural Network For Speech Detection

PROLAY GHOSH

Published: Apr 11, 2025

DOI: https://doi.org/10.52783/jisem.v10i3.5951

Keywords:

Emotion Detection, CNN, RMSProp, RAVDESS, Machine Learning.

PROLAY GHOSH, TANUSREE SAHA, DEBASHIS SANKI, SHIBRAJ BASAK

Abstract

This paper addresses the detection of emotions from speech. Speech expressing anger, joy, or fear tends to have a high pitch that varies over a wide range, whereas speech expressing sadness or tiredness tends to have a low pitch. Speech emotion detection technology recognizes human emotions so that machines can better understand a user's intentions, which improves human-computer interaction. A Convolutional Neural Network (CNN) classification model that detects emotion mainly from Mel Frequency Cepstral Coefficient (MFCC) features is presented here. Several approaches, using different combinations of parameters, are discussed and compared to find the best CNN model. The models are trained to distinguish eight emotions: calm, neutral, angry, sad, happy, disgust, fear, and surprise. The results show that a 3-layer CNN with the RMSprop optimizer, trained for 80 epochs, performs best among the CNN models evaluated on the RAVDESS dataset.
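The abstract names RMSprop as the optimizer of the best-performing model but does not describe it. As a rough illustration only, the standard RMSprop update rule can be sketched in plain Python for a single scalar parameter. This is not the authors' training code: the function names, the toy quadratic objective, and the hyperparameter values (lr, rho, eps) are assumptions chosen for the example, and the paper's actual settings are not stated in the abstract.

```python
import math


def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """Apply one RMSprop update to a scalar parameter (illustrative helper).

    cache is the running average of squared gradients; dividing by its
    square root rescales the step so that its size depends mostly on lr,
    not on the raw gradient magnitude.
    """
    cache = rho * cache + (1.0 - rho) * grad * grad
    w = w - lr * grad / (math.sqrt(cache) + eps)
    return w, cache


def minimize_quadratic(target=3.0, steps=2000, lr=0.01):
    """Minimize the toy objective f(w) = (w - target)^2 with RMSprop."""
    w, cache = 0.0, 0.0
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)^2
        w, cache = rmsprop_step(w, grad, cache, lr=lr)
    return w


if __name__ == "__main__":
    # The result should end up close to the target value of 3.0.
    print(minimize_quadratic())
```

In a real CNN the same rule is applied element-wise to every weight, usually through a deep learning framework's built-in optimizer rather than hand-written code.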

Issue

Vol. 10 No. 3 (2025)

Section

Articles

Journal of Information Systems Engineering and Management
