Interpretable Machine Learning Models with Attention-Based Feature Attribution for High-Dimensional Tabular Data
Abstract
Interpretability is a critical requirement for machine learning models in domains where transparency and trust are paramount. While deep neural networks (DNNs) offer powerful representation capabilities, their adoption for tabular data has been limited due to a lack of transparency and interpretability. In this work, we propose an attention-based neural architecture tailored for high-dimensional tabular data, which explicitly generates feature-level attributions as part of the prediction process. By treating input features as tokens and applying attention mechanisms, our model learns to assign interpretable importance weights to each feature per instance. We formalize this attention as an additive feature attribution model, providing insight into the decision-making process of the network. Experimental results on synthetic high-dimensional datasets demonstrate that our model achieves competitive accuracy while correctly identifying the truly informative features, outperforming classical interpretable models such as logistic regression and random forests in both predictive performance and clarity of explanation. Our approach bridges the gap between model performance and interpretability, offering a transparent alternative for deep learning on tabular data.
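To make the core idea concrete, the sketch below illustrates (in PyTorch) how treating each scalar feature as a learned token and attending over those tokens yields per-instance importance weights alongside the prediction. This is a minimal illustration of the general mechanism described in the abstract, not the authors' implementation; the class name AttentiveTabularNet, the single-query attention, and all dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentiveTabularNet(nn.Module):
        """Illustrative sketch: each scalar feature becomes a learned token;
        a single-query attention layer produces per-instance feature weights
        that can be read as additive attributions."""

        def __init__(self, n_features: int, d_model: int = 32, n_classes: int = 2):
            super().__init__()
            # One learned embedding per feature; the feature value scales it.
            self.feature_embed = nn.Parameter(torch.randn(n_features, d_model))
            self.attn_query = nn.Parameter(torch.randn(d_model))
            self.value_proj = nn.Linear(d_model, d_model)
            self.head = nn.Linear(d_model, n_classes)

        def forward(self, x):                                  # x: (batch, n_features)
            tokens = x.unsqueeze(-1) * self.feature_embed      # (batch, n_features, d_model)
            scores = tokens @ self.attn_query                  # (batch, n_features)
            alpha = F.softmax(scores, dim=-1)                  # per-feature attention weights
            context = (alpha.unsqueeze(-1) * self.value_proj(tokens)).sum(dim=1)
            return self.head(context), alpha                   # logits + attributions

    # Usage: attributions are produced jointly with the prediction.
    model = AttentiveTabularNet(n_features=100)
    logits, alpha = model(torch.randn(8, 100))
    print(alpha.shape)  # torch.Size([8, 100]) -- one importance weight per feature, per instance

Because the weights alpha sum to one for each instance and multiply the per-feature value vectors before pooling, the pooled representation decomposes into per-feature contributions, which is the sense in which the attention can be read as an additive feature attribution.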