Analyzing Performance and Accuracy in Complaint Classification using Spark MLlib

Mohammed Bachir MAHDJOUB


Published: Aug 20, 2025

Keywords:

Spark MLlib, Complaints Classification, Customer Relationship Management (CRM), Big Data, Machine Learning, Performance Analysis.

Mohammed Bachir MAHDJOUB, Fatima Zohra LAALLAM, Messaoud MEZATI

Abstract

The rapid growth of customer complaint data creates a need for scalable, efficient complaint classification methods to support Customer Relationship Management (CRM). This work addresses the problem of correctly classifying unstructured customer complaints using Apache Spark and its machine learning library, MLlib. A multi-stage PySpark pipeline classifies the text of Amazon product reviews into "Highly Dissatisfied" and "Mildly Dissatisfied" classes. Three popular classification algorithms (Naive Bayes, Logistic Regression, and RandomForestClassifier) were evaluated on a set of metrics comprising accuracy, macro F1-score, weighted recall, and weighted precision. Our experiments show that, under otherwise equal conditions, RandomForestClassifier produced the best accuracy, while Naive Bayes delivered the most balanced performance, with the best macro F1-score (0.6884) and the highest weighted precision (0.7022). This trade-off makes Naive Bayes the best suited of the three models for practical deployment. We conclude that, for this specific classification task, Naive Bayes is the most effective solution for consistently and correctly classifying customer complaints at large scale.
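
The metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' PySpark code: the function name `classification_metrics` and the ten toy labels are assumptions made up for illustration, and the resulting numbers are unrelated to the results reported above. Only the two class names come from the abstract.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Compute accuracy, macro F1, weighted precision and weighted recall.

    Hypothetical helper for illustration; not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    pairs = list(zip(y_true, y_pred))

    per_class = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = (precision, recall, f1)

    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / n,
        # Macro F1: unweighted mean of per-class F1, so each class counts equally.
        "macro_f1": sum(f1 for _, _, f1 in per_class.values()) / len(labels),
        # Weighted metrics: per-class values averaged by class support.
        "weighted_precision": sum(support[l] * per_class[l][0] for l in labels) / n,
        "weighted_recall": sum(support[l] * per_class[l][1] for l in labels) / n,
    }

# Toy labels, invented for this example only.
H, M = "Highly Dissatisfied", "Mildly Dissatisfied"
y_true = [H, H, H, H, H, H, M, M, M, M]
y_pred = [H, H, H, H, H, M, M, M, H, H]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because macro F1 gives both classes equal weight regardless of their size, a model can lead on accuracy yet trail on macro F1 when it favours the larger class, which is the kind of trade-off the abstract describes. Note also that support-weighted recall is always equal to accuracy, so it adds no independent information in a comparison.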

Issue

Vol. 10 No. 58s (2025)

Section

Articles

Journal of Information Systems Engineering and Management
