Summarizing Software Programming Logic Using Word2Vec Embeddings: A Representation Learning Approach

Rajesh Unnikrishna Menon

Abstract

Understanding the logic of large software systems is a central challenge in modern software engineering. Traditional code summarization methods rely largely on static analysis and manually developed heuristics, and often fail to capture the semantic relationships between code elements. The proposed approach addresses this gap by using Word2Vec embeddings to represent code tokens as dense vectors in a continuous semantic space. By learning from the contextual co-occurrence of tokens, Word2Vec can capture latent patterns in program structure and behavior. Experiments on open-source repositories show that embedding-based models produce meaningful summaries of code logic and outperform frequency-based baselines on both quantitative and qualitative metrics. The findings indicate that Word2Vec embeddings provide an effective, semantically aware foundation for automated code summarization workflows: they are computationally efficient and scale well while remaining competitive with supervised neural architectures. Because embedding training is unsupervised, it does not depend on large labeled datasets, which makes the technique especially suitable for a wide range of programming tasks and for new languages where annotated training data is scarce.
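
To make the idea of contextual co-occurrence among code tokens concrete, the following minimal sketch extracts the (target, context) pairs that a skip-gram style Word2Vec model would typically be trained on. It is an illustration only and not the pipeline used in the article: the regular-expression tokenizer, the function names `tokenize` and `skipgram_pairs`, the window size, and the sample snippet are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the article's): identifiers,
# integer literals, and any other single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Split a source snippet into a flat list of code tokens."""
    return TOKEN_RE.findall(code)


def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and each neighbor
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# Illustrative input; any source snippet could be used here.
snippet = "total = total + price"
tokens = tokenize(snippet)
pairs = skipgram_pairs(tokens, window=1)

# Co-occurrence counts summarize which tokens appear in each other's context.
cooc = Counter(pairs)
print(tokens)
print(cooc.most_common(3))
```

In a full workflow, pairs like these (or the token sequences themselves) would be passed to a Word2Vec implementation, which learns dense vectors such that tokens sharing similar contexts end up close together in the embedding space.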
