Software Defect Prediction

Submitted To: Priya Singh
Department of Software Engineering, Delhi Technological University, Delhi, India
priya.singh.academia@gmail.com

Submitted By: Sahil Verma
Department of Software Engineering, Delhi Technological University, Delhi, India
Sahilverma.1802@gmail.com

INTRODUCTION
• Software Defect Prediction (SDP) is a critical research area in software engineering focused on identifying software defects before deployment.
• The primary objectives of SDP include improving software quality, enhancing reliability, and reducing maintenance costs through early detection of faulty components.
• Traditional manual testing and debugging processes have become challenging due to the growing complexity of software systems.
• SDP models analyze data patterns to classify software components as defective or non-defective.
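
The classification step described above can be illustrated with a minimal sketch. It is not the model used in this study: it assumes a simple nearest-centroid classifier, and the two metrics (lines of code and cyclomatic complexity), the sample values, and the function names are all hypothetical.

```python
from math import dist
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [mean(column) for column in zip(*rows)]


def train(samples):
    """Build one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (lines of code, cyclomatic complexity).
# The features are left unscaled, so lines of code dominates the distance.
TRAINING = [
    ((120, 4), "non-defective"),
    ((90, 3), "non-defective"),
    ((800, 25), "defective"),
    ((650, 30), "defective"),
]

model = train(TRAINING)
print(predict(model, (700, 22)))
print(predict(model, (100, 5)))
```

In practice the features would be scaled and the classifier replaced by a learned model; the sketch only shows the shape of the task: feature vectors in, one of two labels out.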
OBJECTIVES
• SDP is a specialized domain in software engineering focused on identifying defects in software systems before production.
• The primary goal is to detect the code sections most likely to contain errors.
• Utilizes advanced techniques, including machine learning and statistical analysis.
• Enables more effective software development practices.
• Provides insights for selecting the most suitable embedding techniques for SDP tasks.
• Helps create more robust, stable, and high-performing software applications.
SOFTWARE DEFECT PREDICTION
Software Defect Prediction is essential for the quality, reliability, and efficiency of software systems. It allows defects in a software system to be identified early, which is becoming increasingly important as software applications grow more complex and the demand for faster delivery increases.

Figure 1. Flowchart of SDP
DATASET USED
METHODOLOGY
The methodology involves the use of a large-scale dataset and a review of related studies:
• PyTraceBugs Dataset
• Unique Code Study
• Plagiarism Detection in Documents
• Duplication and Near-Duplication Detection

The following figure illustrates the process followed during this study.

Figure 2. Flowchart for predicting bugs
PyTraceBugs Dataset
• Collected the large-scale PyTraceBugs dataset, which consists of over 24,000 buggy snippets and 5.7 million correct ones.
• Fine-tuned Python-specific BERT embeddings to classify code as buggy or correct in a binary defect prediction setup.
• Evaluated model performance using precision, recall, and F1-score.
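
As a minimal sketch of how the three evaluation metrics are computed for a binary defect classifier, the following assumes that label 1 means "buggy" and 0 means "correct". The label vectors are made-up illustration data, not results from this study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical ground truth and predictions (1 = buggy, 0 = correct).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With heavily imbalanced data such as 24,000 buggy versus 5.7 million correct snippets, these metrics are more informative than plain accuracy, because a model that labels everything "correct" would still score very high accuracy.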
Unique Code Study
• Analyzed syntactic redundancy using token-based measures and Hamming distance.
• Analyzed around 420 million lines of code from about 6,000 software projects.
• Examined statistical redundancy metrics to explore patterns and similarities.
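
The combination of token-based comparison and Hamming distance can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the study itself: the regular-expression tokenizer, the `max_diff` threshold, and the sample lines are all hypothetical.

```python
import re

# Simplified tokenizer: identifiers, integer literals, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split a line of code into a list of tokens."""
    return TOKEN_RE.findall(line)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def is_redundant(line, corpus_lines, max_diff=1):
    """True if some corpus line has the same token count and differs
    from `line` in at most `max_diff` token positions."""
    tokens = tokenize(line)
    for other in corpus_lines:
        other_tokens = tokenize(other)
        if (len(other_tokens) == len(tokens)
                and hamming(tokens, other_tokens) <= max_diff):
            return True
    return False


corpus = ["total = total + price", "for i in range(10):"]
print(hamming(tokenize("count = count + 1"), tokenize("total = total + 1")))
print(is_redundant("total = total + cost", corpus))
print(is_redundant("return x", corpus))
```

Hamming distance is only defined for sequences of equal length, which is why the sketch skips corpus lines with a different token count rather than padding them.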
Plagiarism Detection in Documents
• Reviewed systems such as Turnitin, SafeAssign, and EVE.
• These systems use string matching, tokenization, and heuristic-based comparisons against proprietary databases.
• Plagiarism is then reported through similarity scores and percentage matches.
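
A percentage match can be illustrated with a minimal sketch based on overlapping word n-grams (shingles). This is an assumed, generic technique chosen for illustration; the reviewed systems are proprietary, and nothing here describes how they actually compute their scores. The function names and sample texts are hypothetical.

```python
def shingles(text, n=3):
    """Set of consecutive n-word tuples in the lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_match(submission, source, n=3):
    """Percentage of the submission's shingles that also occur in the source."""
    submitted = shingles(submission, n)
    if not submitted:
        # Submission is shorter than n words, so there is nothing to compare.
        return 0.0
    return 100.0 * len(submitted & shingles(source, n)) / len(submitted)


source = "the quick brown fox jumps over the lazy dog"
submission = "the quick brown fox sleeps"

print(f"{percent_match(submission, source):.1f}% of the submission matches")
```

The sketch splits on whitespace only and does not strip punctuation, so a production system would need more careful normalization before comparing texts.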
Duplication and Near-Duplication Detection
• Used a watermark-based system to identify plagiarized code in programming assignments.
• Classified students as copiers or suppliers using binary processing.
• Benchmarked against systems like MOSS to highlight the advantages of direct plagiarism detection.
RESULT
The following pie chart and graph show the distribution of publications by year and by source.
CONCLUSION
• Machine learning and natural language processing are playing a significant role in software engineering processes.
• PyTraceBugs marks a breakthrough in defect prediction because its large, diverse corpus addresses limitations of earlier datasets.
• Innovations in plagiarism detection and text summarization connect NLP and software engineering.
• Research on detecting redundancy and ensuring uniqueness improves software quality and maintainability.
FUTURE WORK
• Enhanced Defect Detection
• Develop models that better understand code semantics to improve accuracy in plagiarism detection and defect prediction.
• Design computationally efficient algorithms and cloud-based solutions for large-scale data processing in defect prediction and redundancy detection.
• Advanced techniques and diverse datasets can significantly improve SDP models.
THANK-YOU
