Hierarchical Vision Transformer Model-based Lung Cancer Detection with Multiscale Patch Embedding and Cross-Attention Fusion

K Yogeswara Rao, K Srinivasa Rao

Abstract

Lung cancer remains one of the most difficult malignancies to diagnose early, particularly with CT imaging, owing to the intricate and unpredictable appearance of malignant patterns. Vision Transformers (ViTs) substantially improve feature extraction by capturing the global context of complex images for accurate diagnosis. However, extracting local spatial features of small nodules while preserving global features is challenging, because the patch merging in hierarchical ViTs is ineffective for such diverse images. This work therefore introduces a hybrid model that combines a Convolutional Neural Network (CNN) with a hierarchical Vision Transformer (ViT) for lung cancer detection, enriched by multiscale patch embedding and cross-attention fusion to improve feature extraction and analysis of lung PET/CT images. The proposed approach first applies preprocessing and augmentation to improve generalization for the lung cancer detection task. In the hybrid model, the CNN extracts local spatial features from the fused multimodal PET/CT images and partitions the resulting feature maps into multiple scales, which are fed to the hierarchical ViT through multiscale patch embedding and positional encoding. The cross-attention fusion designed into the hierarchical ViT then combines the multiscale information, allowing the model to concentrate on relevant patterns and improve diagnostic accuracy. Experimental results show that, by efficiently merging multiscale embeddings, the proposed model outperforms existing lung cancer detection approaches, particularly in cases with small or indistinct lesions.
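
To make the described architecture concrete, the sketch below shows one plausible way to wire a CNN stem, multiscale patch embedding with positional encoding, and cross-attention fusion together in PyTorch. It is a minimal illustration under stated assumptions rather than the authors' implementation: the two-channel PET/CT input, the two patch scales (8 and 16), the embedding dimension, and the module names (CNNStem, MultiScalePatchEmbed, CrossAttentionFusion, HybridLungModel) are all hypothetical choices, and the per-scale transformer encoder blocks of a full hierarchical ViT are omitted for brevity.

```python
import torch
import torch.nn as nn

class CNNStem(nn.Module):
    """Small convolutional stem extracting local spatial features
    from a 2-channel (PET + CT) input slice."""
    def __init__(self, in_ch=2, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)  # (B, out_ch, H, W)

class MultiScalePatchEmbed(nn.Module):
    """Projects the CNN feature map into token sequences at several patch
    sizes and adds a learnable positional embedding per scale."""
    def __init__(self, in_ch=64, dim=128, img_size=128, patch_sizes=(8, 16)):
        super().__init__()
        self.projs = nn.ModuleList(
            nn.Conv2d(in_ch, dim, kernel_size=p, stride=p) for p in patch_sizes
        )
        self.pos = nn.ParameterList(
            nn.Parameter(torch.zeros(1, (img_size // p) ** 2, dim)) for p in patch_sizes
        )

    def forward(self, feat):
        tokens = []
        for proj, pos in zip(self.projs, self.pos):
            t = proj(feat).flatten(2).transpose(1, 2)  # (B, N_p, dim)
            tokens.append(t + pos)
        return tokens  # list of per-scale token sequences

class CrossAttentionFusion(nn.Module):
    """Fuses two scales: fine-scale tokens act as queries attending to
    coarse-scale tokens; the pooled result is classified."""
    def __init__(self, dim=128, heads=4, num_classes=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, fine, coarse):
        fused, _ = self.attn(query=fine, key=coarse, value=coarse)
        fused = self.norm(fine + fused)      # residual connection
        return self.head(fused.mean(dim=1))  # mean-pool tokens -> logits

class HybridLungModel(nn.Module):
    """CNN stem -> multiscale patch embedding -> cross-attention fusion."""
    def __init__(self, img_size=128, num_classes=2):
        super().__init__()
        self.stem = CNNStem()
        self.embed = MultiScalePatchEmbed(img_size=img_size)
        self.fusion = CrossAttentionFusion(num_classes=num_classes)

    def forward(self, x):
        fine, coarse = self.embed(self.stem(x))
        return self.fusion(fine, coarse)

model = HybridLungModel()
logits = model(torch.randn(2, 2, 128, 128))  # batch of fused PET/CT slices
print(logits.shape)  # torch.Size([2, 2])
```

In this sketch the fine scale supplies the queries so that small-nodule detail is preserved while the coarse scale contributes global context through the attention keys and values, mirroring the local/global trade-off discussed in the abstract.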
