Dynamic Gated Fusion with Cross-Modal Attention for Multimodal Tourism Sentiment Analysis
Abstract
To address the limitations of unimodal sentiment analysis in Heilongjiang tourism, this paper proposes a dynamic gated multimodal fusion model that integrates textual and visual features through a cross-modal attention mechanism, improving both the accuracy and interpretability of sentiment analysis. Building on previous unimodal studies (BiLSTM with FastText embeddings for text, ResNet50 for images), the model introduces a gating mechanism that dynamically adjusts the contribution of each modality, and a Transformer-based attention layer captures inter-modal dependencies. Experiments on a Heilongjiang tourism dataset (6,580 reviews and 5,976 images) show that the proposed model achieves 98.2% accuracy, a 1.2% improvement over the text-only unimodal baseline. Visualization of the gate values shows that the mechanism assigns greater weight to visual features in extreme sentiment cases (e.g., strongly negative reviews), with visual weights reaching up to 0.72. This study offers a transparent and interpretable framework for multimodal sentiment analysis in tourism.
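The abstract does not give the fusion equations, but the described design (pretrained unimodal encoders, a Transformer-style cross-modal attention layer, and a learned gate weighting the two modalities) can be sketched as below. This is a minimal PyTorch illustration under stated assumptions, not the authors' implementation: the feature dimensions, the three-class sentiment output, and the `GatedCrossModalFusion` module name are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class GatedCrossModalFusion(nn.Module):
    """Fuses one text vector and one image vector per sample:
    cross-modal attention first, then a learned scalar gate that
    weights the text branch against the attended visual branch."""

    def __init__(self, text_dim=256, image_dim=2048, hidden_dim=256, num_heads=4):
        super().__init__()
        # Project both modalities into a shared hidden space
        # (dimensions are assumptions; e.g. 2048 = ResNet50 pooled output).
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        # Transformer-style cross-modal attention: text queries attend
        # over image features to capture inter-modal dependencies.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # Gate maps the concatenated features to a scalar in (0, 1).
        self.gate = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )
        # Assumed three sentiment classes: negative / neutral / positive.
        self.classifier = nn.Linear(hidden_dim, 3)

    def forward(self, text_feat, image_feat):
        # text_feat: (batch, text_dim), e.g. a BiLSTM state over FastText embeddings.
        # image_feat: (batch, image_dim), e.g. ResNet50 global-pooled features.
        t = self.text_proj(text_feat).unsqueeze(1)    # (batch, 1, hidden)
        v = self.image_proj(image_feat).unsqueeze(1)  # (batch, 1, hidden)
        attended_v, _ = self.cross_attn(query=t, key=v, value=v)
        t, attended_v = t.squeeze(1), attended_v.squeeze(1)
        # g near 1 favors text; g near 0 favors the attended visual features.
        g = self.gate(torch.cat([t, attended_v], dim=-1))
        fused = g * t + (1 - g) * attended_v
        return self.classifier(fused), g  # expose the gate for interpretability


# Toy usage with random tensors standing in for encoder outputs.
model = GatedCrossModalFusion()
logits, gate = model(torch.randn(8, 256), torch.randn(8, 2048))
print(logits.shape, gate.squeeze(-1))
```

Returning the gate value alongside the logits is what makes such a fusion inspectable: per-sample modality weights, like the visual weight of up to 0.72 the paper reports for extreme negative reviews, can be read off directly rather than inferred post hoc.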