Hybrid CNN–Vision Transformer Framework for Human Metapneumovirus Detection: A Comparative Study of Deep Learning Models

Ravi Babu Birudugadda, Seshasai Priya Sadam

Abstract

Human metapneumovirus (hMPV) is a respiratory virus that can cause serious infections in children, elderly individuals, and patients with weakened immune systems. Early detection is important because the symptoms of hMPV often resemble those of other respiratory diseases, which makes clinical diagnosis difficult. Artificial intelligence has recently become a useful tool for analyzing medical images and supporting disease detection. This study presents a comparative analysis of deep learning techniques for identifying hMPV from medical imaging data. Rather than training models on a new dataset, the study analyzes experimental results reported in previous work and evaluates the effectiveness of different deep learning models. Traditional Convolutional Neural Network (CNN) models are examined first as the baseline approach for image classification. However, CNN models mainly capture local spatial features and may miss global relationships in medical images. To address this limitation, the study proposes a hybrid deep learning framework that combines a CNN with a Vision Transformer architecture. In this framework, CNN layers perform local feature extraction, while the Vision Transformer module captures global contextual information through an attention mechanism. Performance values are collected from published research papers and compared using accuracy, precision, recall, and F1-score. The analysis shows that hybrid CNN–Transformer models are reported to achieve detection accuracies of approximately 91–94%, higher than those reported for conventional CNN approaches. These results highlight the potential of hybrid deep learning architectures for improving AI-assisted respiratory disease diagnosis.
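
For readers unfamiliar with the four evaluation metrics named in the abstract, the following minimal Python sketch shows how they are commonly computed for a binary classifier from true and predicted labels. The function name `binary_metrics`, the `BinaryMetrics` container, and the toy label lists are illustrative assumptions; they are not taken from the study or from any of the papers it reviews, and the numbers are not experimental results.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; labels are compared to `positive`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


# Toy labels (made up): 1 = hMPV-positive image, 0 = negative.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
toy_result = binary_metrics(toy_true, toy_pred)
print(toy_result)
```

Accuracy alone can be misleading when positive cases are rare, which is one reason comparisons such as this one usually report precision, recall, and F1-score alongside it.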
