A Hybrid LSTM-CNN Model Approach for Recognition of Mishing Language Vowel Phonemes

S. K. Saikia, D. J. Borah, S. Kalita

Abstract

This paper proposes a hybrid deep learning architecture that combines Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs) to improve vowel recognition in the Mishing language, an under-resourced Tibeto-Burman language of Northeast India. The model has two branches: an LSTM branch that extracts temporal features from Mel Frequency Cepstral Coefficients (MFCCs), and a CNN branch that extracts spatial features from Mel spectrograms. Experiments on a Mishing vowel dataset yield a test accuracy of 95%, with precision, recall, and F1-score of 0.94 each. Training curves, precision/recall trends, and a confusion matrix support the effectiveness of the proposed model. The experimental study indicates that the hybrid approach can improve Mishing vowel recognition accuracy and provides a pathway for future research in Mishing speech recognition.
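
As an illustration of how the reported accuracy, precision, recall, and F1-score relate to a confusion matrix, the following is a minimal sketch in plain Python. It is not the authors' evaluation code. The abstract does not state how the per-class scores were averaged, so macro averaging is an assumption here, and the three-class matrix `toy` is invented for demonstration and does not come from the Mishing dataset. The function names are hypothetical.

```python
def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def per_class_metrics(confusion):
    """Return (precision, recall, F1) for each class.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 over classes (assumed averaging)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Invented toy matrix for three vowel classes; not data from the paper.
toy = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 1, 9],
]

if __name__ == "__main__":
    p, r, f1 = macro_average(per_class_metrics(toy))
    print(f"accuracy={accuracy(toy):.3f} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Macro averaging weights every vowel class equally, which is a common choice when classes are roughly balanced; a weighted average would give different numbers on an imbalanced dataset.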
