MO-DiPredict: Multi-Omics Data Integration Framework for Early Detection and Subtype Prediction of Blood Cancers
Abstract
This study introduces MO-DiPredict, a framework that integrates multi-omics data to detect blood cancers at an early stage and to predict their subtypes. It combines genomic, transcriptomic, proteomic, clinical, and imaging data while addressing noise, inconsistencies, and differing data scales across modalities. The framework aligns features across modalities using Canonical Correlation Analysis (CCA) and captures relationships between modalities with Graph Neural Networks (GNNs). Machine learning models, including XGBoost, convolutional neural networks (CNNs), and Transformer networks, process the data, and feature engineering refines input variables such as tumor mutation scores, pathway activity, and radiomics features. The framework was evaluated on datasets from TCGA, GEO, CPTAC, SEER, and TCIA. MO-DiPredict outperformed MRGCN and MODILM on metrics such as accuracy and recall and achieved an AUC-ROC of 0.93. Ablation studies confirmed incremental improvements from feature engineering and multi-modal integration. Clinical features contributed most to the predictions, followed by genomic and transcriptomic data. Scalability tests indicated consistent performance as dataset size increased. The study thus provides a method for integrating diverse biological and clinical data to improve cancer detection and classification, and the findings show that the framework can handle complex datasets, making it a practical tool for exploring multi-omics in cancer research.
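For readers unfamiliar with the AUC-ROC metric reported above, the following is a minimal sketch of how such a score can be computed from predicted probabilities. It uses the standard rank-based definition (the fraction of positive/negative pairs in which the positive sample receives the higher score) and is not taken from the authors' evaluation code. The function name `auc_roc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC-ROC.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (1 = cancer, 0 = control); not results from the study.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(auc_roc(y_true, y_score))
```

In this toy case, 8 of the 9 positive/negative pairs are ranked correctly, so the score is about 0.89. The pairwise loop is quadratic in the number of samples, which is fine for illustration; for datasets of realistic size, a sorted-rank implementation or an established library routine would be the more sensible choice.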