Novel Approach for Effective Automatic Story Generation in English Language
Abstract
This study investigates the correlations among automatic evaluation metrics for short-story generation. It uses several language models to generate short-story texts: an N-gram model, a Continuous Bag-of-Words (CBOW) model, a Gated Recurrent Unit (GRU) model, and the Generative Pre-trained Transformer 2 (GPT-2). All models are trained on Aesop's short fables. The generated texts are evaluated with several metrics: Word Mover's Distance (WMD), BERTScore, perplexity, BLEU score, the number of grammatical errors, Self-BLEU score, and ROUGE score. Correlating these metrics reveals four distinct clusters of significantly related measures. The first cluster shows a moderate correlation between perplexity and the number of grammatical errors. The second cluster shows a strong correlation among BLEU, ROUGE, and BERTScore. In contrast, WMD correlates negatively with BLEU, ROUGE, and BERTScore. Self-BLEU, which measures the diversity of the generated texts, shows no significant correlation with any other metric. The study concludes that a comprehensive evaluation of generated text requires multiple metrics, each capturing a different aspect of text quality.
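The correlation analysis described above can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' code: it scores a hypothetical set of generated stories with BLEU, ROUGE-L, and Self-BLEU using the NLTK and rouge-score libraries, then computes a Pearson correlation between per-story metric values with SciPy, mirroring the clustering step the abstract describes. The reference and generated stories are invented placeholders.

```python
# Minimal sketch of the metric-correlation analysis (assumed setup, not the
# authors' implementation). The texts below are hypothetical placeholders.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from scipy.stats import pearsonr

references = ["the fox praised the crow and stole the cheese".split()]
generated = [
    "the fox flattered the crow and took the cheese".split(),
    "a lion spared the mouse that later freed him".split(),
    "the tortoise kept a steady pace and won the race".split(),
]

smooth = SmoothingFunction().method1
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

bleu, rouge_l, self_bleu = [], [], []
for i, story in enumerate(generated):
    # BLEU and ROUGE-L compare each generated story against the reference.
    bleu.append(sentence_bleu(references, story, smoothing_function=smooth))
    rouge_l.append(
        scorer.score(" ".join(references[0]), " ".join(story))["rougeL"].fmeasure
    )
    # Self-BLEU treats the *other* generated stories as references;
    # lower values indicate a more diverse set of outputs.
    others = [s for j, s in enumerate(generated) if j != i]
    self_bleu.append(sentence_bleu(others, story, smoothing_function=smooth))

# Pairwise Pearson correlation between two metrics, as in the clustering step.
r, p = pearsonr(bleu, rouge_l)
print(f"BLEU vs ROUGE-L: r={r:.2f}, p={p:.2f}")
```

In the study's setup, this pairwise correlation would be computed for every pair of metrics over the full set of generated stories, and metrics with significant mutual correlations would then be grouped into the clusters reported above.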