10.1109/ACCESS.2025.3542566

Nizam, Ali; İslamoğlu, Ertuğrul; Adalı, Ömer Kerem; Aydın, Musa

doi:10.1109/ACCESS.2025.3542566

Published February 14, 2025 | Version v1

Dataset Open

1. Fatih Sultan Mehmet Vakıf Üniversitesi

Code embedding represents code semantics in vector form. Although code embedding-based systems have been successfully applied to various source code analysis tasks, further research is required to enhance code embedding so that its analysis capabilities surpass the performance and functionality of static code analysis tools. In addition, standard methods for improving code embedding, similar to augmentation techniques in the image processing domain, are essential for developing more effective embedding-based systems. This study aims to create a contrastive learning-based system to explore the potential of a generic method for enhancing code embedding for code classification tasks. A triplet loss-based deep learning network is designed to optimize in-class similarity and increase the distance between classes. An experimental dataset that contains code in the Java, Python, and PHP programming languages and covers four different code smells is created by collecting code from open-source repositories on GitHub. We evaluate the proposed system’s effectiveness with the widely used BERT, CodeBERT, and GraphCodeBERT pretrained models, which create the code embedding for the code classification task of code smell detection. Our findings indicate that the proposed system may improve accuracy by 8% on average and by up to 13% across the evaluated models. These results suggest that incorporating contrastive learning techniques into the generation of code representations as a preprocessing step can enhance performance in code analysis.
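The triplet loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the distance function, the margin, or the network architecture, so the Euclidean distance, the `margin=1.0` default, and the function names below are assumptions chosen only for illustration; this is the textbook form of the loss, not the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are embeddings of samples from the same class
    (for example, the same code smell); negative comes from another class.
    The margin value is a hypothetical default, not taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings: the positive is close to the anchor, the negative is far.
anchor = [0.0, 0.0]
same_class = [0.0, 1.0]
other_class = [3.0, 4.0]

well_separated = triplet_loss(anchor, same_class, other_class)
badly_separated = triplet_loss(anchor, other_class, same_class)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, and positive otherwise, which is what pushes classes apart while pulling same-class samples together during training.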

Files (18.4 MB)

Name	Size
9000_smells.json (md5:0507d809ce0be13e9ab00c8335f17009)	18.4 MB
ReadMe.pdf (md5:7c857ca2415edce574a7846718edf2c6)	40.2 kB
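The MD5 values listed with the files can be used to check a download for corruption. The sketch below is one way to do that with Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` and the local file paths are hypothetical, and only the two checksums come from this record.

```python
import hashlib

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "9000_smells.json": "0507d809ce0be13e9ab00c8335f17009",
    "ReadMe.pdf": "7c857ca2415edce574a7846718edf2c6",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """True if the file at `path` matches the expected MD5 (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

Assuming the file was saved under its original name in the working directory, `verify_download("9000_smells.json", EXPECTED_MD5["9000_smells.json"])` should return `True` for an intact copy.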

	All versions	This version
Views	13	13
Downloads	29	29
Data volume	386.5 MB	386.5 MB

Created

July 30, 2025

DOI

Resource type

Dataset

Published in

IEEE Access, 13(1), 31335-31350, 2025.

Subject areas

Technical Sciences > Computer Science

Technical Sciences > Computer Science > Software > Software Engineering

Technical Sciences > Computer Science > Artificial Intelligence, Machine Learning and Pattern Recognition > Natural Language Processing

Keywords

Code embedding, contrastive learning, triplet loss, code smell detection.

Rights

Creative Commons Attribution Share Alike 4.0 International

Permits almost any use subject to providing credit and license notice. Frequently used for media assets and educational materials. The most common license for Open Access scientific publications. Not recommended for software.

TÜBİTAK ULAKBİM
