Early Detection and Improved Outcomes in Lung Cancer: Leveraging Machine Learning for Predictive Insights
Abstract
Lung cancer (LC) remains one of the leading causes of cancer-related mortality worldwide, largely because it is usually diagnosed at a late stage. This study examines the potential of predictive machine learning (ML) techniques to enable early, non-invasive detection of LC using lifestyle and environmental factors such as alcohol consumption, smoking, and air pollution. A robust dataset was used to develop and compare a diverse set of ML models, including logistic regression (LR) and non-linear methods such as Gradient Boosting Machines (GBM), CatBoost, and Random Forests (RF). Key predictors were isolated using Recursive Feature Elimination (RFE) and correlation analysis, which improved the interpretability and efficiency of the models. In the evaluation, CatBoost was the best-performing model, achieving a mean cross-validation score of 95.53% with a margin of error of 1.37% and an AUC of 0.98 on the benchmark dataset. Logistic regression, with an AUC of 0.90, offered a strong balance between interpretability and accuracy. These results highlight the potential of ML for the early detection of LC and support its integration into clinical workflows. The comparative analysis of linear and non-linear approaches also offers insight into their respective strengths and limitations. This work underscores the importance of further research into AI-driven healthcare solutions aimed at improving early diagnosis and, ultimately, patient outcomes through timely intervention.
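The two kinds of evaluation figure reported above, a mean cross-validation score with a margin of error and an AUC, can be illustrated with a minimal standard-library sketch. It assumes that the "margin of error" is the sample standard deviation of the per-fold scores, which the abstract does not state. The function names `roc_auc` and `summarise_cv` and all example numbers are hypothetical; they are not the study's data or code. The AUC is computed with the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Rank-based (Mann-Whitney U) estimate of the ROC AUC.

    labels: 0/1 ground-truth classes (1 = positive case, e.g. LC; hypothetical)
    scores: predicted probabilities or risk scores, same length as labels
    A tie between a positive and a negative score counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def summarise_cv(fold_scores):
    """Return (mean, sample standard deviation) of per-fold scores.

    Assumption: the reported margin of error is the fold-to-fold standard deviation.
    """
    return mean(fold_scores), stdev(fold_scores)


# Hypothetical toy predictions, not taken from the study.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)  # 8 of 9 positive/negative pairs ranked correctly -> 0.889
print(f"AUC = {auc:.3f}")

# Hypothetical accuracies from a 5-fold cross-validation run.
cv_mean, cv_std = summarise_cv([0.94, 0.96, 0.95, 0.97, 0.955])
print(f"CV accuracy = {cv_mean:.2%} +/- {cv_std:.2%}")
```

The pairwise loop is quadratic in the number of cases and is meant only to make the definition explicit; for a real evaluation, a library routine such as scikit-learn's `roc_auc_score` would normally be used.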