A Hybrid LSTM-CNN Model Approach for Recognition of Mishing Language Vowel Phonemes

S. K. Saikia

doi:10.52783/jisem.v10i36s.6555


Published: Apr 16, 2025


Keywords:

Mishing language, vowel recognition, speech recognition, deep neural networks, hybrid LSTM-CNN, MFCC, Mel spectrogram, acoustic modeling

S. K. Saikia, D. J. Borah, S. Kalita

Abstract

This paper proposes a hybrid deep learning architecture that combines Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) to improve vowel recognition in the Mishing language, an under-resourced Tibeto-Burman language of Northeast India. An LSTM branch models temporal features extracted from Mel Frequency Cepstral Coefficients (MFCCs), while a CNN branch learns spatial features from Mel spectrograms. Experiments on a Mishing vowel dataset yield a test accuracy of 95%, with precision, recall, and F1-score of 0.94 each. Visualizations, including training curves, precision/recall trends, and a confusion matrix, support the effectiveness of the proposed model. Our experimental study highlights potential improvements in Mishing vowel recognition accuracy and provides a pathway for future research in Mishing speech recognition.
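For readers unfamiliar with how accuracy, precision, recall, and F1-score relate to a confusion matrix, the following is a minimal sketch in plain Python. It assumes macro averaging across vowel classes; the abstract does not state which averaging scheme the authors used. The function name `classification_metrics` and the three-class confusion matrix are hypothetical and purely illustrative; they are not the paper's code or data.

```python
from typing import Dict, List


def classification_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] counts samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption here.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))

    precisions, recalls, f1_scores = [], [], []
    for k in range(n_classes):
        true_pos = confusion[k][k]
        # Column sum: every sample the model labelled as class k.
        predicted_k = sum(confusion[i][k] for i in range(n_classes))
        # Row sum: every sample that truly belongs to class k.
        actual_k = sum(confusion[k])
        p = true_pos / predicted_k if predicted_k else 0.0
        r = true_pos / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }


# Hypothetical 3-vowel confusion matrix (illustrative counts only).
example = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]
metrics = classification_metrics(example)
print(metrics)
```

With weighted or micro averaging the precision, recall, and F1 values can differ from the macro figures, particularly when the vowel classes are imbalanced.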

Issue

Vol. 10 No. 36s (2025)

Section

Articles

Journal of Information Systems Engineering and Management
