Multimodal Approach in Prediction of Alzheimer’s Disease Using Voice, Transcript Dataset

Manisha Chaudhari, Mukta Agrawal

Abstract

Introduction: Alzheimer’s disease (AD) is a progressive neurodegenerative disorder characterized by cognitive decline, memory impairment, and deterioration of language abilities. Early and accurate prediction of AD is critical for effective intervention and management. This study proposes a multimodal approach that integrates heterogeneous data sources, including voice recordings, transcribed speech, textual metadata, and neuroimaging, to enhance prediction accuracy.


Objectives: The primary objective of this study is to develop and evaluate a multimodal machine learning framework that combines acoustic features from voice recordings with linguistic features from speech transcripts to improve the accuracy and reliability of early Alzheimer’s disease prediction. By utilizing both speech and textual data, the approach aims to capture subtle cognitive and behavioral patterns that may not be evident through a single modality, ultimately contributing to earlier, non-invasive, and scalable diagnostic tools.


Methods: The proposed framework integrates heterogeneous data sources, including voice recordings, transcribed speech, textual metadata, and neuroimaging. By leveraging the complementary strengths of each modality, the system captures linguistic and paralinguistic features from speech, semantic and syntactic cues from transcripts, and structural biomarkers from MRI scans.
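The fusion strategy described above can be sketched in miniature. The snippet below is an illustrative example only, not the authors' implementation: it simulates per-recording acoustic features (e.g., MFCC summaries) and per-transcript linguistic features (e.g., TF-IDF vectors) with synthetic data, concatenates them (feature-level fusion), and classifies with a simple nearest-centroid model as a stand-in for the paper's classifier. All feature dimensions and the classifier choice are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated modality features (real systems would extract these from
# audio and transcripts; dimensions here are illustrative assumptions).
acoustic = rng.normal(0, 1, size=(n, 13))    # e.g., 13 MFCC means per recording
linguistic = rng.normal(0, 1, size=(n, 50))  # e.g., 50-dim TF-IDF vector per transcript
labels = rng.integers(0, 2, size=n)          # 0 = control, 1 = AD

# Shift the AD class so the two groups are separable in both modalities.
acoustic[labels == 1] += 1.0
linguistic[labels == 1] += 0.5

# Early (feature-level) fusion: concatenate per-modality feature vectors.
fused = np.hstack([acoustic, linguistic])

# Minimal nearest-centroid classifier on the fused representation.
train, test = slice(0, 150), slice(150, n)
centroids = {c: fused[train][labels[train] == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    # Assign the class whose training centroid is closest in fused space.
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

preds = np.array([predict(x) for x in fused[test]])
accuracy = (preds == labels[test]).mean()
print(f"fused test accuracy: {accuracy:.2f}")
```

In practice the acoustic branch would be fed by a speech-feature extractor and the linguistic branch by a text vectorizer or language-model embedding; the concatenation step is the same, which is the core idea of feature-level multimodal fusion.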


Results: Experimental results on benchmark datasets demonstrate that the multimodal fusion approach significantly outperforms unimodal baselines, offering a more robust and holistic understanding of AD-related indicators. Train and test accuracies are nearly identical, indicating minimal overfitting, and both datasets achieve accuracy above 70%. This approach underscores the potential of multimodal machine learning in advancing non-invasive, early-stage Alzheimer’s diagnosis.


Conclusions: This study demonstrates the effectiveness of a multimodal approach that integrates both voice and transcript data for the prediction of Alzheimer's disease. By leveraging linguistic features from transcripts alongside acoustic features from voice recordings, the model achieves a more comprehensive understanding of cognitive decline indicators. The fusion of these modalities not only improves prediction accuracy but also provides a non-invasive, cost-effective alternative for early detection. Future work may focus on expanding datasets, incorporating other modalities (e.g., facial expressions or brain imaging), and enhancing real-time clinical applicability to support early intervention and better patient outcomes.
