Topic Driven Text Extraction for Kannada Document Summarization Using LDA
Abstract
Automatic Text Summarization (ATS) compacts source content into a concise format while preserving core information. While extensively studied for resource-rich languages, ATS remains challenging for low-resource languages like Kannada due to limited corpora and NLP tools. This work introduces an extractive, topic-driven method for summarizing Kannada news articles from multiple documents. We developed a custom dataset of 100 Kannada news story sets (3 articles per set) to address the lack of standardized benchmarks. The proposed approach leverages Latent Dirichlet Allocation (LDA) to identify latent themes across documents, followed by sentence selection using vector-space modeling. Sentences are scored by their relevance to the identified topics (via cosine similarity) and prioritized to maximize informational value while minimizing redundancy through Maximum Marginal Relevance (MMR). Evaluations using ROUGE metrics demonstrate that the LDA-based method outperforms existing summarization algorithms, producing summaries closer to human-generated references. The system achieves higher F-scores (e.g., 0.68 at 40% compression) than baseline models such as TextRank and than approaches reported for other Indian languages, validating its efficacy for low-resource linguistic contexts.
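To make the pipeline concrete, the sketch below illustrates the general flow the abstract describes: infer topic distributions with LDA, score sentences by cosine similarity to the document-level topic profile, and select them greedily with MMR. It is a minimal illustration using scikit-learn; the function name summarize and the parameters n_topics, compression, and lambda_ are illustrative assumptions, not the paper's actual components or settings, and a proper Kannada tokenizer and stop-word list would normally replace the default tokenization.

```python
# Minimal sketch of a topic-driven extractive summarizer (assumed scikit-learn
# implementation; topic count, compression ratio, and MMR trade-off lambda_
# are illustrative, not the paper's reported settings).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity


def summarize(sentences, n_topics=5, compression=0.4, lambda_=0.7):
    """Select sentences whose topic profiles best cover the document set."""
    # Bag-of-words over sentences (a Kannada tokenizer / stop-word list
    # would normally be plugged in here).
    vec = CountVectorizer()
    counts = vec.fit_transform(sentences)

    # LDA infers latent topics; each sentence gets a topic distribution.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    sent_topics = lda.fit_transform(counts)               # (n_sent, n_topics)
    doc_topics = sent_topics.mean(axis=0, keepdims=True)  # document-level profile

    # Relevance: cosine similarity of each sentence to the document profile.
    relevance = cosine_similarity(sent_topics, doc_topics).ravel()
    pairwise = cosine_similarity(sent_topics)              # for redundancy penalty

    # MMR: greedily pick sentences that are relevant but not redundant.
    k = max(1, int(compression * len(sentences)))
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(pairwise[i][j] for j in selected) if selected else 0.0
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in sorted(selected)]        # keep source order
```

In this reading, the compression parameter controls summary length (e.g., 0.4 for a 40% compression ratio) and lambda_ balances topic relevance against redundancy during MMR selection.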