Effort-Aware Just-in-Time Software Defect Prediction

Type: Thesis

Level: Master's

Title: Effort-Aware Just-in-Time Software Defect Prediction

Presenter: Sadra Goudarzdashti

Supervisor: Dr. Morteza Yousef Sanati

Advisor: Dr. Muharram Mansoorizadeh

Examiners: Dr. Reza Mohammadi, Dr. Shakoor Vakilian

Date of presentation: 2025

Place of presentation: Seminar

Abstract: Effort-aware Just-in-Time Software Defect Prediction (JIT-SDP) is one of the key challenges in software engineering, aiming to identify defective code changes at the moment of commit. This task plays a crucial role in reducing maintenance costs, improving software quality, and optimizing resource management. Given the limited resources available for code inspection, achieving high accuracy in detecting defective changes with minimal inspection effort is of particular importance. In this research, a novel approach based on language models is proposed to establish a semantic alignment between commit messages and code changes, enabling the model to gain a deeper understanding of the intent behind each change. To this end, a two-stage framework consisting of pre-training and fine-tuning phases was designed. In the pre-training phase, two complementary methods were employed: Masked Language Modeling (MLM) to extract semantic and structural representations from each component independently, and Contrastive Learning to bring together the embeddings of related commit–code pairs while separating unrelated samples. In the fine-tuning phase, the model was trained on labeled data containing code changes, commit messages, and handcrafted features to predict defective changes. Experimental results on the JIT-Defects4J dataset demonstrated that the proposed method outperforms existing baselines across all evaluation metrics. Specifically, it achieved an improvement of 7% in F1 score, 1% in AUC, and 4.9% in Recall@20%Effort compared to the strongest baseline. These results indicate that leveraging semantic pre-training based on language models to jointly represent commit messages and code changes can effectively enhance prediction accuracy, improve model generalization, and ultimately contribute to the advancement of software quality assurance processes.
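The contrastive pre-training objective described in the abstract — pulling together embeddings of matched commit–code pairs while pushing apart unrelated ones — is commonly implemented with a symmetric InfoNCE loss over in-batch negatives. The sketch below is a minimal NumPy illustration of that idea, not the thesis's actual implementation: the embeddings, batch size, and temperature value are hypothetical, and a real system would compute the embeddings with the language-model encoders and backpropagate through them.

```python
import numpy as np

def info_nce_loss(commit_emb, code_emb, temperature=0.07):
    """Symmetric InfoNCE contrastive loss for commit/code embeddings.

    Row i of commit_emb and row i of code_emb are assumed to be a
    matched pair; every other row in the batch acts as a negative.
    """
    # L2-normalize so dot products become cosine similarities
    commit = commit_emb / np.linalg.norm(commit_emb, axis=1, keepdims=True)
    code = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = commit @ code.T / temperature   # (batch, batch) similarity matrix
    idx = np.arange(len(logits))             # diagonal entries are the positives

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)   # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()        # negative log-prob of positives

    # Average the commit->code and code->commit directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

# Toy data: matched pairs are near-identical vectors (hypothetical embeddings)
rng = np.random.default_rng(0)
commits = rng.normal(size=(4, 8))
codes = commits + 0.01 * rng.normal(size=(4, 8))   # well-aligned pairs
loss_aligned = info_nce_loss(commits, codes)
loss_random = info_nce_loss(commits, rng.normal(size=(4, 8)))
```

Minimizing this loss drives the diagonal (matched-pair) similarities above the off-diagonal ones, so aligned commit–code batches score a lower loss than randomly paired ones.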