Secure Document Automation Using Blockchain Anchors and AI-Validated Semantic Hashing for Invoice Integrity


Ranadheer Reddy Charabuddi

Abstract

In the age of digital transformation, invoice automation is central to operational efficiency but remains vulnerable to fraud, semantic manipulation, and structural tampering. Legacy document processing systems based on Optical Character Recognition (OCR) and rule-based field extraction typically lack both the semantic understanding and the cryptographic security needed to guarantee document integrity. To address these shortcomings, this research introduces a framework called Double-Layered Integrity Validation using Blockchain-Coupled Structural and Semantic Hashing (DLIV-BCSH). The system combines cryptographic and artificial intelligence-based methods to implement a tamper-evident verification process. It employs the BLAKE3 algorithm for fast, collision-resistant structural hashing and Sentence-BERT with SimHash to capture semantic similarity. The two hashes are then anchored on a blockchain, creating immutable, timestamped audit trails without exposing sensitive invoice content on-chain. The scheme is evaluated on the Customer Invoices Dataset from Kaggle, which contains realistic but synthetic structured invoices. A preprocessing pipeline consisting of field extraction, text normalisation, data imputation, and tokenisation prepares the data for secure hashing and embedding. Implemented in Python, the DLIV-BCSH system achieves 98.5% tamper-detection accuracy, 96.7% semantic sensitivity, and an average blockchain anchoring latency of 0.85 seconds. It detects both byte-level and meaning-level tampering and outperforms existing models such as ARCHANGEL and VBlock. This study highlights the synergy between blockchain technology and artificial intelligence, providing a secure, low-latency, and privacy-preserving approach to document integrity validation.
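
The two-layer hashing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and makes several assumptions so that it runs with the standard library alone: BLAKE2b from `hashlib` stands in for BLAKE3, a toy word-token SimHash stands in for SimHash over Sentence-BERT embeddings, and the function names (`structural_hash`, `simhash`, `anchor_record`) and the `description` field are hypothetical. The record returned by `anchor_record` contains only digests, which is the kind of payload that could be anchored without placing invoice content on-chain.

```python
import hashlib
import json


def structural_hash(invoice: dict) -> str:
    """Byte-level digest over a canonical serialization of the invoice fields."""
    canonical = json.dumps(invoice, sort_keys=True, separators=(",", ":"))
    # Assumption: BLAKE2b (standard library) is used here in place of BLAKE3.
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=32).hexdigest()


def simhash(text: str, bits: int = 64) -> int:
    """Toy SimHash over lowercase word tokens.

    Assumption: token hashes replace the Sentence-BERT embeddings named in
    the abstract, so this captures word overlap rather than true semantics.
    """
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when its weight is positive.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def anchor_record(invoice: dict, text_field: str = "description") -> dict:
    """Digest-only record; the raw invoice never leaves the caller."""
    return {
        "structural": structural_hash(invoice),
        "semantic": format(simhash(str(invoice.get(text_field, ""))), "016x"),
    }
```

In such a scheme, verification would recompute both values for a received invoice: any mismatch in the structural digest signals a byte-level change, while the Hamming distance between semantic fingerprints gives a graded signal of how far the text has drifted. The threshold for that distance is a tuning choice not specified in the abstract.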
