Detection of Vulnerabilities at the Source Code Level Using Embedding Methods

Type: Thesis

Degree level: Master's

Title: Detection of Vulnerabilities at the Source Code Level Using Embedding Methods

Presenter: Mohammad Fasihi

Supervisor: Dr. Morteza Yousef Sanati

Advisor: Dr. Muharram Mansoorizadeh

Examiners: Dr. Mehdi Sakhaei Nia, Dr. Reza Mohammadi

Date of presentation: 2026

Place of presentation: Seminar room

Abstract: Early detection of vulnerabilities in source code is crucial to the security of modern software systems. In recent years, large language models (LLMs) have shown promising capabilities in code understanding and vulnerability pattern recognition. However, their limited input token capacity often forces long functions to be truncated, which can discard critical code segments and introduce code-length bias into vulnerability detection models. This thesis proposes a framework for binary vulnerability detection in source code using language models. The approach consists of two phases. First, an intelligent code reduction strategy removes redundant elements and increases the semantic density of the input code. Second, instead of destructive truncation, a flexible processing mechanism based on a sliding window analyzes long code segments while preserving their logical context. Extensive experiments on two widely used datasets, REVEAL and Devign, show that the proposed framework improves evaluation metrics, particularly the F1-score, compared with baseline models. The results also indicate that the approach mitigates code-length bias and improves the fairness of vulnerability detection models. Cross-project evaluation further shows that the framework remains stable and generalizes well across projects with different structural characteristics.
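The two phases described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the abstract does not specify the reduction rules, window size, stride, or how per-window predictions are combined. The sketch assumes C-like source code, treats comment and blank-line removal as the "redundant elements", and flags a function as vulnerable when any window scores above a threshold. The names `reduce_code`, `sliding_windows`, `classify_long_function`, and the `score_window` callback (a stand-in for a language-model classifier) are all hypothetical.

```python
import re
from typing import Callable, List, Sequence


def reduce_code(source: str) -> str:
    """Assumed reduction step: strip C-style comments, blank lines, extra spaces.

    Naive on purpose: it does not protect comment markers inside string literals.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    lines = (re.sub(r"\s+", " ", line).strip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


def sliding_windows(tokens: Sequence[str], window: int, stride: int) -> List[List[str]]:
    """Split tokens into overlapping windows that together cover every token."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(tokens) <= window:
        return [list(tokens)]
    starts = list(range(0, len(tokens) - window + 1, stride))
    # Add a final window aligned to the end so the tail is never dropped.
    if starts[-1] + window < len(tokens):
        starts.append(len(tokens) - window)
    return [list(tokens[s:s + window]) for s in starts]


def classify_long_function(
    tokens: Sequence[str],
    score_window: Callable[[List[str]], float],
    window: int = 512,      # assumed model input limit
    stride: int = 256,      # assumed 50% overlap between windows
    threshold: float = 0.5,
) -> bool:
    """Assumed aggregation: vulnerable if the highest window score passes the threshold."""
    scores = [score_window(w) for w in sliding_windows(tokens, window, stride)]
    return max(scores) >= threshold
```

Max-pooling over windows is only one possible aggregation; averaging the scores or pooling window embeddings before a single classifier head are equally plausible choices, and the abstract does not say which one the framework uses.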