Program Repair using Large Language Models

Type: Thesis

Level: Master's

Title: Program Repair using Large Language Models

Presenter: mohammadhosein molavvani

Supervisors: Dr Morteza yuosef sanati and Dr Muharram Mansoorizadeh

Advisors:

Examiners: Dr Reza Mohammadi and Dr Mehdi Sakhaei nia

Date of presentation: 2025

Place of presentation: seminar

Abstract: As software spreads across domains such as industry, finance, healthcare, and infrastructure, security has become a critical challenge in developing and maintaining software systems. Vulnerabilities in source code can lead to severe consequences, including data breaches, system instability, and loss of organizational credibility. Automatic detection and repair of software vulnerabilities has therefore become an important research direction in recent years. Advances in Large Language Models (LLMs) have opened new possibilities for program repair, yet existing approaches still struggle with insufficient understanding of syntactic structure, an inability to capture complex semantic dependencies, and unstable performance in real-world scenarios. This research proposes an approach based on the CodeT5 language model that aims to improve the accuracy of vulnerability repair by integrating structural code information. Specifically, the Abstract Syntax Tree (AST) of each function is extracted with the Tree-sitter parser and added to the training data as an additional structured textual feature. Experiments use the CVEFixes-BigVul dataset, which contains more than 8,400 pairs of vulnerable and fixed C/C++ functions. Fine-tuning CodeT5 on the enriched dataset gives the model deeper structural awareness of code syntax and dependencies. Experimental results show that integrating AST information significantly improves the Perfect Prediction metric, enabling the model to generate more precise and structurally consistent patches. These findings suggest that combining syntactic knowledge with LLM-based approaches is an effective strategy for automated vulnerability repair.
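The two ideas in the abstract, adding a linearized AST to the model input and scoring patches by Perfect Prediction, can be illustrated with a minimal Python sketch. It is not the thesis implementation. The `Node` class is a hypothetical, hand-built stand-in for a parser tree (the thesis uses Tree-sitter, whose API is not reproduced here). The `<ast>` separator and the S-expression format are assumptions. Perfect Prediction is assumed to mean an exact match with the reference fix after whitespace normalization.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy syntax-tree node (hypothetical; not the Tree-sitter API)."""
    type: str
    children: List["Node"] = field(default_factory=list)


def to_sexp(node: Node) -> str:
    """Linearize a tree into an S-expression string of node types."""
    if not node.children:
        return f"({node.type})"
    inner = " ".join(to_sexp(child) for child in node.children)
    return f"({node.type} {inner})"


def build_input(code: str, ast: Node, sep: str = " <ast> ") -> str:
    """Append the linearized AST to the source text.

    The separator token is an assumption made for this sketch.
    """
    return code + sep + to_sexp(ast)


def _normalize(text: str) -> str:
    """Collapse all runs of whitespace into single spaces."""
    return " ".join(text.split())


def perfect_prediction(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to the reference fix.

    Equality is checked after whitespace normalization (an assumption).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(
        _normalize(p) == _normalize(r) for p, r in zip(predictions, references)
    )
    return hits / len(references)
```

For a statement such as `return buf[i];` with a tree of `return_statement` over `subscript_expression` over two `identifier` leaves, `build_input` yields the source text followed by ` <ast> (return_statement (subscript_expression (identifier) (identifier)))`. A sequence-to-sequence model could be fine-tuned on strings of this kind paired with the fixed function.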
