A Comparative Study of Feature Extraction and Classification Techniques for Punjabi Speech Recognition
Abstract
Punjabi is the second most spoken language in north India. A communication system in the local language is desirable because it lets ordinary people interact with machines through a speech interface to retrieve information or carry out their daily activities. Most conventional Automatic Speech Recognition (ASR) systems, however, target English or European languages. They typically rely on features such as Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction (PLP), but do not perform well in real-world situations. This study evaluates different feature extraction techniques to identify the best-performing one for Punjabi in clean and noisy environments. It also compares the performance of ASR systems built on different acoustic models for Punjabi. Previous studies have used both Context-Independent (CI) and Context-Dependent (CD) acoustic models, whereas this study focuses on CD models. The findings should help researchers understand how different feature extraction techniques and acoustic models behave on a Punjabi speech dataset in clean and noisy environments. Experimental results show that MFCCs perform well in the clean environment and Gammatone Frequency Cepstral Coefficients (GFCCs) perform well in the noisy environment. The best Word Error Rates (WERs) are 12% in the clean environment, achieved with MFCC features, and 14.8% in the noisy environment, achieved with GFCC features, in both cases with a Bidirectional Long Short-Term Memory (BLSTM) acoustic model.
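For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition with a word-level edit distance. It is not the scoring tool used in the study, and the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length.

    Assumption: words are separated by whitespace; no text normalization
    is applied. This is an illustrative sketch, not the study's scorer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 12% means that roughly 12 word-level errors occur per 100 reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.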