SFTCS: Multiform Semantic Fusion Based on Transformer for Code Summarization

Jun Yang; Longhao Ao; Rongzhi Qi

doi:10.1109/ICSTE63875.2024.00028

Abstract

Automatic code summarization aims to create coherent natural language descriptions for code snippets. Recent studies indicate that integrating additional code representation structures improves the quality of generated summaries beyond token sequence-based approaches. Current research mainly concentrates on the graph structure information of code snippets. However, graph-based methods make it challenging to extract meaningful semantic information from the code snippets. To address this issue, we investigate how to learn more code semantics and information from the perspective of code text feature representation. Consequently, we propose a novel model named SFTCS. SFTCS employs a conjoint feature fusion strategy, focusing on two types of information in code snippets: code token sequence features and FGT features based on Abstract Syntax Tree (AST) sequences and graph sequences. An N-adjacency matrix based on AST features is constructed, and the corresponding AST block features are optimized through a graph-based structure. Subsequently, an FGC decoder processes the two types of features. With the aid of an attention mechanism for feature enhancement, the SFTCS model ultimately predicts the summary. Evaluations on two publicly available Java datasets validate the effectiveness of SFTCS, which outperforms six advanced code summarization models. Ablation studies further elucidate the contributions of each component within the SFTCS model.
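The abstract does not define the N-adjacency matrix, so the following is only a minimal sketch under one plausible reading: two AST nodes are marked as adjacent when they lie within N edges of each other in the tree. The function name `n_adjacency_matrix`, the undirected treatment of parent-child edges, the inclusion of self-connections, and the toy AST are all assumptions made for illustration, not the authors' construction.

```python
from collections import deque


def n_adjacency_matrix(num_nodes, edges, n):
    """Build a 0/1 matrix whose entry [i][j] is 1 when node j is within
    n hops of node i. Edges are treated as undirected (assumption), and
    every node is adjacent to itself (assumption)."""
    neighbors = [[] for _ in range(num_nodes)]
    for parent, child in edges:
        neighbors[parent].append(child)
        neighbors[child].append(parent)

    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for start in range(num_nodes):
        # Breadth-first search from `start`, stopping at depth n.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == n:
                continue  # do not expand beyond n hops
            for nxt in neighbors[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for node in dist:
            matrix[start][node] = 1
    return matrix


# Hypothetical toy AST (node ids are illustrative):
# 0 MethodDeclaration -> 1 Parameter, 2 Block; 2 Block -> 3 ReturnStatement
edges = [(0, 1), (0, 2), (2, 3)]
one_hop = n_adjacency_matrix(4, edges, 1)
two_hop = n_adjacency_matrix(4, edges, 2)
```

With N = 1 this reduces to the ordinary adjacency matrix plus self-loops; larger N lets each node see a wider neighborhood of the tree, which is one way such a matrix could serve as a mask or edge set for a graph-based layer.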
