Privacy Preservation in Data-Hungry Deep Learning: A Comprehensive Review of Attacks and Techniques
Abstract
Deep learning (DL) has achieved remarkable success across domains such as healthcare, finance, and natural language processing; however, its reliance on sensitive data poses significant privacy risks. Privacy-preserving deep learning (PPDL) has therefore emerged as a critical research direction, integrating cryptographic techniques, statistical privacy mechanisms, and distributed training paradigms. This survey reviews state-of-the-art PPDL techniques centered on homomorphic encryption (HE), secure multi-party computation (SMPC) and hybrid protocols, differential privacy (DP), and secure enclaves (SE/TEEs). We also position federated learning (FL) as an orchestration paradigm that composes these techniques at scale. We systematically analyze the trade-offs among efficacy, privacy, and efficiency, and map common attack vectors, such as reconstruction, model inversion, membership inference, poisoning, and hardware-level side channels, to representative defenses. A bibliometric analysis using VOSviewer further highlights the thematic structure of the field, with strong clusters around cryptography, differential privacy, and system-level optimization. Our findings reveal that no single paradigm suffices in practice: HE and SMPC provide strong confidentiality but incur high computational and communication costs; DP offers formal guarantees at the expense of accuracy; and FL reduces raw-data exposure but introduces novel vulnerabilities. We conclude that hybrid, layered strategies combining DP, cryptography, and robust aggregation offer the most promising path toward scalable, trustworthy PPDL in real-world deployments.