A Natural Language Processing Model for Early Detection of Suicidal Ideation in Textual Data

Jocelyn M. Beriña, Thelma D. Palaoag

Abstract

According to the World Health Organization, suicide is one of the top 10 causes of death. An estimated 138 people are significantly affected by every suicide death, and practically every other statistic on suicide fatalities is equally concerning. The widespread use of social media, together with the near-universal mobile devices used to access it, presents new opportunities for preventive intervention as well as new forms of data for studying the behavior of those who take, or attempt to take, their own lives. We show that it is feasible to identify individuals at risk of suicide using social media data. Specifically, we propose concepts for an automated system that detects measurable signals around suicide attempts using natural language processing (NLP) and machine learning, particularly deep learning. The goal of this project is to improve the automatic identification and reporting of suicidal posts. It offers a method that examines Twitter to find warning signs of suicide, with the main aim of automatically detecting anomalous changes in a user's online behavior. Suicide prevention is difficult because the complex risk factors and warning signs that may precede an incident are hard to understand and identify. To address this, several NLP techniques are used to measure textual variations and pass them through a unique framework that can be applied broadly. Machine learning and deep learning classification algorithms are then used to identify suicidal thoughts at an early stage by analyzing tweets. For both kinds of classifier, we first performed data pre-processing, followed by feature extraction, and then applied the machine learning and deep learning classifiers, respectively. We use a CNN-LSTM model for this purpose and evaluate it against other classification methods.
The study demonstrates that, in comparison to earlier CNN-LSTM systems, the CNN-LSTM framework using word embedding techniques achieves 94% classification accuracy.
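
The abstract does not spell out the pre-processing and feature-extraction steps, so the following Python fragment is only a minimal sketch of what such a stage might look like before a word-embedding layer. The cleaning rules (dropping URLs and @mentions, keeping hashtag words), the function names `preprocess`, `build_vocab`, and `encode`, and the reserved ids `PAD` and `UNK` are all assumptions made for illustration and are not taken from the study.

```python
import re
from collections import Counter

# Illustrative patterns (assumed, not from the study).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z']+")

# Reserved ids for padding and out-of-vocabulary tokens (assumed convention).
PAD, UNK = 0, 1


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return TOKEN_RE.findall(text)


def build_vocab(token_lists, min_count=1):
    """Map each sufficiently frequent token to an integer id starting at 2."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {}
    # Most frequent first; ties broken alphabetically for reproducibility.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens, vocab, max_len=8):
    """Convert tokens to a fixed-length id sequence, truncating or padding."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

Fixed-length integer sequences of this kind are the usual input to an embedding layer, which a CNN-LSTM classifier could then consume; the actual tokenizer, vocabulary size, and sequence length used by the authors are not stated here.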
