Software Defect Prediction
Submitted To: Priya Singh, Department of Software Engineering, Delhi Technological University, Delhi, India (priya.singh.academia@gmail.com)
Submitted By: Sahil Verma, Department of Software Engineering, Delhi Technological University, Delhi, India (Sahilverma.1802@gmail.com)

INTRODUCTION
Software Defect Prediction (SDP) is a critical research area in software engineering focused on identifying software defects before deployment.
The primary objectives of SDP include improving software quality, enhancing reliability, and reducing maintenance costs through early detection of faulty components.
Traditional manual testing and debugging processes have become challenging due to the growing complexity of software systems.
These models analyze data patterns to classify software components into defective and non-defective categories.

OBJECTIVES
SDP is a specialized domain in software engineering focused on identifying defects in software systems before production.
The primary goal is to detect code sections more likely to contain
Provides insights for selecting the most suitable embedding techniques for SDP tasks.
Helps create more robust, stable, and high-performing software applications.

SOFTWARE DEFECT PREDICTION
Software Defect Prediction is essential for the quality, reliability, and efficiency of software systems. It allows defects in a software system to be identified early, which is becoming increasingly important due to the growing complexity of software applications and the demand for faster delivery.
Figure 1. Flowchart of SDP

DATASET USED
METHODOLOGY
The methodology involves the use of a large-scale dataset:
PyTraceBugs Dataset
Unique Code Study
Plagiarism Detection in Documents
Duplication and Near-Duplication Detection
The following figure illustrates the process followed during this study.
Figure 2. Flowchart for predicting bugs

PyTraceBugs Dataset
Collected the large-scale PyTraceBugs dataset, which consists of over 24,000 buggy snippets and 5.7 million repaired ones.
Fine-tuned Python-specific BERT embeddings while classifying buggy and correct code in a binary defect prediction setup.
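In a binary defect prediction setup, the buggy class is usually treated as the positive label, and a classifier is scored by counting its true positives, false positives, and false negatives. The following is a minimal sketch in plain Python, assuming labels encoded as 1 (buggy) and 0 (correct). The names `BinaryScores` and `score_binary` and the toy labels are illustrative only and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class BinaryScores:
    precision: float
    recall: float
    f1: float


def score_binary(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (buggy) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return BinaryScores(precision, recall, f1)


# Toy labels (assumed for illustration): 1 = buggy, 0 = correct.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = score_binary(y_true, y_pred)
print(scores)
```

With a dataset as imbalanced as one holding far fewer buggy than non-buggy snippets, these per-class scores are generally more informative than plain accuracy, because a model that predicts "correct" for everything would still reach high accuracy.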
Evaluated model performance based on precision, recall, and F1-score metrics.

Unique Code Study
Analyzed syntactic redundancy using token-based measures and Hamming distance.
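As a minimal sketch of a token-based Hamming measure: two code snippets are split into tokens and compared position by position, counting the positions where the tokens differ. The regex tokenizer and the function names `tokens` and `hamming_distance` are assumptions made for illustration; the study's actual tokenizer and any normalization steps are not specified here.

```python
import re

# Assumed tokenizer: runs of word characters, or any single
# non-space symbol. A real study would likely use a language-aware lexer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokens(source):
    """Split a code snippet into a flat list of lexical tokens."""
    return TOKEN_RE.findall(source)


def hamming_distance(a, b):
    """Number of positions at which two equal-length token sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


# Two toy snippets that differ in a single operator token.
left = tokens("total = price + tax")
right = tokens("total = price * tax")
distance = hamming_distance(left, right)
print(distance, "of", len(left), "tokens differ")
```

Because Hamming distance is only defined for sequences of equal length, a comparison like this is typically applied to fixed-size token windows rather than to whole files.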
Analyzed around 420 million lines of code belonging to about 6,000 software projects.
Used statistical redundancy metrics to explore patterns and similarities.

Plagiarism Detection in Documents
Reviewed systems such as Turnitin, SafeAssign, and EVE.
String matching, tokenization, and heuristic-based comparisons against proprietary databases have been used.
Plagiarism is further detected through similarity scores and percentage matches.

Duplication and Near-Duplication Detection
Used a watermark-based system to identify plagiarized code in programming assignments.
Classified students into copier or supplier categories using
binary processing.
Benchmarked against systems like MOSS to highlight advantages of direct plagiarism detection.

RESULT
The following pie chart and graph show the distribution of publications by year and publications by source.

CONCLUSION
Machine learning and natural language processing are playing a significant role in software engineering processes.
Large, diverse corpora such as PyTraceBugs are a breakthrough in defect prediction because they address limitations of earlier datasets.
Innovations in plagiarism detection and text summarization bridge NLP and software engineering.
Research on detecting redundancy and ensuring uniqueness improves software quality and maintainability.

FUTURE WORK
Enhanced Defect Detection
Develop models that better understand code semantics for enhanced
accuracy in plagiarism detection and defect prediction.
Design computationally efficient algorithms and cloud-based
solutions for large-scale data processing in defect prediction and redundancy detection.
Advanced techniques and diverse datasets can significantly improve SDP.