This document discusses using data mining techniques to improve software maintenance and reliability. It provides an overview of applying techniques like classification, association rule mining, and clustering to mine software engineering data from code bases, change histories, and bug reports. Specifically, it describes mining frequent patterns and rules from source code and revision histories to detect bugs as deviations from these patterns. A methodology is presented that involves parsing source code to build an itemset database, applying frequent itemset mining to extract programming patterns and rules, and detecting violations of rules as potential bugs. Challenges and limitations of these approaches are also discussed.
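To make that pipeline concrete, below is a minimal sketch in Python of the mine-then-flag idea: treat each function's set of API calls as a transaction, mine frequent itemsets and simple association rules, and report functions that violate a high-confidence rule. The transactions, call names, and thresholds are hypothetical, invented for illustration; the document describes the approach only at the level of building an itemset database, mining patterns and rules, and flagging rule violations as potential bugs.

```python
# Minimal sketch of frequent-itemset-based bug detection (hypothetical data).
from itertools import combinations
from collections import defaultdict

# Each "transaction" is the set of API calls made by one function,
# as might be produced by parsing a code base.
transactions = [
    {"lock", "unlock", "read"},
    {"lock", "unlock", "write"},
    {"lock", "unlock", "read", "write"},
    {"lock", "read"},              # suspicious: lock without unlock
    {"open", "close", "read"},
    {"open", "close", "write"},
]

MIN_SUPPORT = 3        # itemset must appear in at least 3 functions
MIN_CONFIDENCE = 0.66  # rule must hold in at least 2/3 of its opportunities

def support_counts(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = defaultdict(int)
    for t in transactions:
        for combo in combinations(sorted(t), size):
            counts[frozenset(combo)] += 1
    return counts

singles = support_counts(transactions, 1)
pairs = support_counts(transactions, 2)

# Derive rules {a} -> {b} from frequent pairs, then flag deviations:
# a function that calls `a` but not `b` departs from the common pattern.
for pair, pair_count in pairs.items():
    if pair_count < MIN_SUPPORT:
        continue
    for a in sorted(pair):
        (b,) = pair - {a}
        confidence = pair_count / singles[frozenset({a})]
        if confidence >= MIN_CONFIDENCE:
            for i, t in enumerate(transactions):
                if a in t and b not in t:
                    print(f"function {i}: calls {a!r} but not {b!r} "
                          f"(rule confidence {confidence:.2f}) - potential bug")
```

Run on this toy data, the rule lock -> unlock (confidence 0.75) correctly flags the function that locks without unlocking, but the equally confident rule lock -> read also flags a harmless writer. Such false positives from coincidental co-occurrence are one of the limitations these pattern-mining approaches must contend with.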
Improvement of Software Maintenance and Reliability using Data Mining Techniques
Integrated Intelligent Research (IIR), International Journal of Data Mining Techniques and Applications, Volume 01, Issue 02, December 2012, pp. 36-39, ISSN: 2278-2419
Yethiraj N G
Assistant Professor, Department of Computer Science
Maharani’s Science College for Women, Bangalore, India
Abstract- Software is ubiquitous in our daily life. It brings us great convenience, and a big headache about software reliability as well: software is never bug-free, and software bugs keep incurring monetary loss or even catastrophes. In the pursuit of better reliability, software engineering researchers have found that a huge amount of data in various forms can be collected from software systems, and these data, when properly analyzed, can help improve software reliability. Unfortunately, the huge volume of complex data renders simple analysis techniques inadequate; consequently, studies have been resorting to data mining for more effective analysis. In the past few years, we have witnessed many studies on mining for software reliability reported in data mining as well as software engineering forums. These studies either develop new or apply existing data mining techniques to tackle reliability problems from different angles. In order to keep data mining researchers abreast of the latest developments in this growing research area, we propose this paper on data mining for software reliability. In this paper, we present a comprehensive overview of this area, examine representative studies, and lay out challenges to data mining researchers.
Keywords- Software, Software Reliability, Data Mining, Frequent Itemset, Extracting Rules.
I. INTRODUCTION
The economies of all developed nations are dependent on software, and more and more systems are software controlled. Software Engineering is concerned with theories, methods and tools for professional software development; it is an engineering discipline concerned with all aspects of software production. Software engineers should adopt a systematic and organized approach to their work and use appropriate tools and techniques depending on the problem to be solved, the development constraints and the resources available. Software reliability, unlike many other quality factors, can be measured directly and estimated using historical and developmental data [1]. Software reliability is defined in statistical terms as "the probability of failure-free operation of a computer program in a specified environment for a specified time". Measures of reliability: if we consider a computer-based system, a simple measure of reliability is mean-time-between-failure (MTBF), where MTBF = MTTF + MTTR; the acronyms MTTF and MTTR stand for mean-time-to-failure and mean-time-to-repair respectively [2].
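As a quick illustration of this measure, consider a system whose failures and repairs have been logged over time. The following is a minimal worked sketch; the numeric values are assumptions chosen only for the example, not data from the paper.

# Minimal worked example of the reliability measure defined above.
# The numeric values are illustrative assumptions, not measurements.
mttf_hours = 500.0   # mean-time-to-failure: average operating time before a failure
mttr_hours = 2.0     # mean-time-to-repair: average time to restore operation

mtbf_hours = mttf_hours + mttr_hours   # MTBF = MTTF + MTTR
print(f"MTBF = {mtbf_hours} hours")    # prints: MTBF = 502.0 hours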
Software reliability specification: Reliability is a complex concept that should always be considered at the system rather than the individual component level. Because the components in a system are interdependent, a failure in one component can be propagated through the system and affect the operation of other components. In a computer-based system, we have to consider three dimensions when specifying the overall system reliability: (i) Hardware reliability: what is the probability of a hardware component failing, and how long would it take to repair that component? (ii) Software reliability: how likely is it that a software component will produce an incorrect output? Software failures are different from hardware failures in that software does not wear out: it can continue operating correctly after producing an incorrect result. (iii) Operator reliability: how likely is it that the operator of a system will make an error? [1].
The following basic terminologies are frequently used for reliability (Table 1).
Table 1: Basic reliability terminology
System Failure: occurs when the system does not perform as per the user's expectations.
System Error: occurs when the system gives a result in an unexpected manner.
System Fault: a defect in the system that can lead to a system error.
Human Error: a human activity that causes a system fault to occur.
Mining Software Engineering Data: The main goal is to transform static, record-keeping software engineering data into active data so that hidden patterns and trends can be explored. Typically, software is "full of bugs": Windows 2000, containing 35 million lines of code, had 63,000 known bugs at the time of release, about 2 per 1000 lines. Software failure costs are becoming very high; a study by the National Institute of Standards and Technology found that software errors cost the U.S. economy about $59.5 billion annually. Testing and debugging are therefore laborious and expensive. "50% of my company employees are testers, and the rest spend 50% of their time testing!" said Bill Gates in 1995. In general, software is complex: for example, MySQL has 1.2 million lines of code, and its runtime data is larger and more complex still. In fact, finding bugs is challenging, since it requires specifications/properties, which often do not exist, and substantial human effort in analysing the data [3].
Software reliability methods are:
Static bug detection: detect bugs in code without running it.
Dynamic bug detection (aka testing): run the code with some test inputs and detect failures/bugs.
Debugging: given known test failures (symptoms), pinpoint the bug locations in the code.
Mining for software reliability is needed because:
i. Finding bugs is challenging: it requires specifications/properties, which often do not exist, and substantial human effort in analyzing data.
ii. We can mine common patterns as likely specifications/properties and detect violations of those patterns as likely bugs.
iii. We can mine huge data for patterns or locations to narrow down the scope of human inspection.
II. TECHNIQUES
The software engineering tasks helped by data mining are: (i) programming, (ii) defect detection, (iii) testing, (iv) debugging and (v) maintenance. The data mining techniques are: (i) classification, (ii) association, (iii) pattern detection and (iv) clustering [4]. The software engineering data considered are: (i) code bases, (ii) change history, (iii) program states, (iv) structural entities and (v) bug reports [5].
III. ANALYSIS
Data mining for software bug detection needs frequent pattern mining; automated debugging in software programs is then carried out, moving from frequent patterns to software bugs and to statistical debugging. Further, automated debugging in computer systems is carried out through (i) automated diagnosis of system misconfigurations and (ii) performance debugging [6].
A. Software Bug Detection
Common approach: mining rules/patterns from source code/revision histories and detecting bugs as rule/pattern violations.
B. Mining rules from source code
i. Bugs as deviant behaviour [Engler et al., SOSP'01]
ii. Mining programming rules with PR-Miner [Li et al., FSE'05]
iii. Mining function precedence protocols [Ramanathan et al., ICSE'07]
iv. Revealing neglected conditions [Chang et al., ISSTA'07]
C. Mining rules from revision histories
i. DynaMine [Livshits & Zimmermann, FSE'05]
D. Mining copy-paste patterns from source code
i. CP-Miner [Li et al., OSDI'04] to find copy-paste bugs [7].
Bugs as Deviant Behaviour
Static verification tools need rules to check against program code. How can we find errors without knowing the truth?
Contradiction in belief: to find lies, cross-examine one witness or many witnesses; any contradiction is an error (internal consistency).
Deviation from common behaviour: to infer correct behaviour, note that if one person does X, it might be right or a coincidence; if thousands do X and one does Y, Y is probably an error (statistical analysis).
IV. A BRIEF METHODOLOGY: SOFTWARE BUG DETECTION
Based on the discussion in the previous section, the following steps for software bug detection are presented.
Step 1: Mining rules from source code [8]
Bugs as deviant behaviour [Engler et al., SOSP'01]; mining technique: statistical analysis
Mining programming rules with PR-Miner [Li et al., FSE'05]
Mining function precedence protocols [Ramanathan et al., ICSE'07]
Revealing neglected conditions [Chang et al., ISSTA'07]
Step 2: Mining copy-paste patterns from source code
CP-Miner [Li et al., OSDI'04] to find copy-paste bugs
An overview of extracting rules: the observation is that elements are usually used together. The idea is to find associations among elements that are frequently used together in source code, which implies frequent itemset mining [9]. Example: spin_lock_irqsave and spin_unlock_irqrestore appear together within the same function more than 3600 times.
Step 3: Mining programming patterns and generation of rules
Parsing source code: the purpose is to build an itemset database. Each element (function call, variable, data type, etc.) is mapped to a number, so the source code is mapped to an itemset database. A frequent sub-itemset then corresponds to a programming pattern, obtained by applying a frequent itemset mining algorithm to the itemset database.
E.g., {39, 68, 36, 92}:27 corresponds to the pattern {Scsi_Host, host_alloc, add_host, scan_host}.
Tradeoff: whether or not to consider the order of elements.
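To make Step 3 concrete, the following is a minimal sketch of building such an itemset database. The parser is simulated: in a real system the elements would come from an abstract syntax tree, and the function and element names below are illustrative assumptions, not taken from the paper.

# Minimal sketch: mapping program elements to numbered items and
# turning each function body into one itemset, as described in Step 3.
from itertools import count

element_ids = {}        # element name -> integer item id
next_id = count(1)

def to_item(name: str) -> int:
    """Map a program element (function call, type, ...) to a stable number."""
    if name not in element_ids:
        element_ids[name] = next(next_id)
    return element_ids[name]

# Hypothetical parsed functions; each body becomes one itemset.
functions = {
    "probe_adapter": ["Scsi_Host", "host_alloc", "add_host", "scan_host"],
    "init_adapter":  ["Scsi_Host", "host_alloc", "add_host"],
}
itemset_db = {fn: frozenset(to_item(e) for e in elems)
              for fn, elems in functions.items()}
print(itemset_db)   # e.g. {'probe_adapter': frozenset({1, 2, 3, 4}), ...}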
Step 4: Generating programming rules from programming patterns.
E.g., patterns: {a, b, d} : 3, {a} : 4
Fig. 1 (flowchart of extracting rules): source files are parsed and hashed (pre-processing) into itemsets; mining the itemsets yields programming patterns; post-processing (generating rules) turns the patterns into programming rules.
Rules:
{a} => {b,d} with confidence = 3/4 = 75%
{b} => {a,d} with confidence = 100%
{d} => {a,b} with confidence = 100%
{a,b} => {d} with confidence = 100%
{a,d} => {b} with confidence = 100%
{b,d} => {a} with confidence = 100%
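These confidences can be derived mechanically. The following minimal sketch enumerates every rule of the pattern {a,b,d}:3 from the supports given in Step 4; the support table is copied from the example above.

# Minimal sketch of Step 4: turning a frequent pattern into rules with
# confidence = support(pattern) / support(left-hand side).
from itertools import combinations

support = {
    frozenset("abd"): 3,
    frozenset("ab"): 3, frozenset("ad"): 3, frozenset("bd"): 3,
    frozenset("a"): 4, frozenset("b"): 3, frozenset("d"): 3,
}

pattern = frozenset("abd")
for k in range(1, len(pattern)):
    for lhs in map(frozenset, combinations(sorted(pattern), k)):
        conf = support[pattern] / support[lhs]
        rhs = pattern - lhs
        print(f"{set(lhs)} => {set(rhs)}  confidence = {conf:.0%}")
# {a} => {b,d} gets 3/4 = 75%; every other rule gets 100%.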
Rule explosion problem: an exponential number of rules can be generated. The solution is closed mining: report only closed patterns, i.e., patterns with no strict superset of equal support.
Example: given {a,b,d}:3 and {a}:4, the sub-patterns {a,b}:3, {a,d}:3 and {b,d}:3 are not closed; the closed patterns are {a,b,d}:3 and {a}:4.
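A minimal sketch of the closed-pattern filter follows, using the same supports as the example; this is one straightforward reading of closed mining, not the paper's implementation.

# A pattern is closed when no strict superset has the same support;
# reporting only closed patterns avoids the rule explosion.
support = {
    frozenset("abd"): 3,
    frozenset("ab"): 3, frozenset("ad"): 3, frozenset("bd"): 3,
    frozenset("a"): 4,
}

closed = [p for p in support
          if not any(p < q and support[p] == support[q] for q in support)]
print([set(p) for p in closed])   # only {a,b,d}:3 and {a}:4 survive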
Detection of violations: a violation of a programming rule is flagged when (i) the rule holds for most cases (confidence > threshold) and (ii) the rule is violated in a few cases (confidence < 100%).
Step 5: Example of detecting violations. Given the programming patterns
{Scsi_Host, host_alloc, add_host, scan_host}: 27
{Scsi_Host, host_alloc, add_host}: 29
we obtain the programming rule
{Scsi_Host, host_alloc, add_host} => {scan_host}
with confidence 27/29 = 93%. The two functions in which scan_host is missing are reported as potential bugs.
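The following minimal sketch mirrors this detection step. The function names are illustrative assumptions; a toy population of nine conforming functions and one deviant function reproduces a 90% confidence rule.

# Minimal sketch of violation detection: a rule that holds with high but
# imperfect confidence flags the deviating functions as potential bugs.
rule_lhs = {"Scsi_Host", "host_alloc", "add_host"}
rule_rhs = {"scan_host"}

functions = {f"probe_{i}": {"Scsi_Host", "host_alloc", "add_host", "scan_host"}
             for i in range(9)}
functions["bad_probe"] = {"Scsi_Host", "host_alloc", "add_host"}  # scan_host missing

matches = {f for f, elems in functions.items() if rule_lhs <= elems}
violations = [f for f in matches if not rule_rhs <= functions[f]]
confidence = (len(matches) - len(violations)) / len(matches)
if 0.9 <= confidence < 1.0:            # holds for most cases, violated in a few
    print("potential bugs:", violations)   # -> ['bad_probe']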
Table 2: Software evaluated for bug detection
Software     #C files   LOC         #functions
Linux        3,538      3,037,403   73,607
PostgreSQL   409        381,192     6,964
Apache       160        84,724      1,912
Table 3: Results among the top 60 inspected violations
Software     Bugs   Anomalies   False Positives
Linux        16     20          24
PostgreSQL   6      9           45
Apache       1      0           6
V. LIMITATIONS OF PR-MINER
PR-Miner has two main limitations: it misses rules that span multiple functions, since it does not use inter-procedural analysis, and it produces false negatives for violations along control paths, since it does not use sophisticated analysis techniques. Inter-procedural, path-sensitive inference of function precedence protocols addresses these limitations [Ramanathan et al., ICSE'07] [10].
We shall now discuss mining function precedence protocols, using the pair
fp = fopen(…);
fclose(…);
a) Definition (precedence protocol): a call to fclose is always preceded by a call to fopen.
b) Definition (successor protocol): a call to fopen is always succeeded by a call to fclose.
c) Violation of precedence protocols:
fp = fopen(…);
if (fp == NULL)
exit(-1);
fclose(…);
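As a toy illustration of checking a precedence protocol, the sketch below scans a single linear sequence of call names. This is only a sketch under that simplifying assumption: real tools such as CHRONICLER infer and check protocols inter-procedurally over control-flow paths.

# Check one precedence protocol on a linear call trace:
# every call to `later` must be preceded by a call to `earlier`.
def check_precedence(trace, earlier="fopen", later="fclose"):
    seen_earlier = False
    for call in trace:
        if call == earlier:
            seen_earlier = True
        elif call == later and not seen_earlier:
            return False
    return True

print(check_precedence(["fopen", "read", "fclose"]))   # True
print(check_precedence(["read", "fclose"]))            # False: protocol violated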
d) Tool implementation/evaluation: CHRONICLER, a tool implemented in C, was tested on open-source C programs (Apache, Linux, OpenSSH, GIMP, PostgreSQL) whose lines of code vary from 66K to 2M and whose number of call-sites varies from 10K to 110K.
e) Some results of precedence-related bug detection, case study on Linux: a hardware bug, difficult to detect using traditional testing techniques because it is a platform-dependent error, was transparently identified using CHRONICLER.
A performance bug was also found: a cache lookup operation was absent, a deviation that delays data write flushes and is not easily specified as a bug for testing [11].
f) Limitation of precedence-related bug detection: it does not take data flow or data dependency into account. A newer approach to discovering neglected conditions [Chang et al., ISSTA'07], based on dependence analysis, frequent itemset mining and frequent subgraph mining, addresses this issue.
g) Crucial observation: things that are frequently changed together often form a pattern, also known as co-change; co-changed items are patterns.
h) Finding patterns: find "frequent itemsets" (with Apriori; a toy sketch follows below). For example, the co-changed calls
o.enterAlignment()
o.exitAlignment()
o.redoAlignment()
yield the pattern {enterAlignment(), exitAlignment(), redoAlignment()}, just as iter.hasNext() and iter.next() form a pattern.
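The sketch below is a minimal, brute-force variant of Apriori over co-change transactions (it omits the classic subset-pruning optimization but computes the same frequent itemsets). The transactions and the minimum support of 2 are illustrative assumptions echoing the method calls above.

# Minimal Apriori-style frequent itemset mining over co-change transactions.
from itertools import combinations

transactions = [
    {"enterAlignment()", "exitAlignment()", "redoAlignment()"},
    {"enterAlignment()", "exitAlignment()", "redoAlignment()"},
    {"iter.hasNext()", "iter.next()"},
    {"iter.hasNext()", "iter.next()"},
]

def apriori(transactions, min_support):
    items = {i for t in transactions for i in t}
    frequent, size = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        size += 1
        # grow candidates by joining frequent sets from the previous level
        candidates = {a | b for a, b in combinations(list(level), 2)
                      if len(a | b) == size}
    return frequent

for pattern, count in apriori(transactions, min_support=2).items():
    print(sorted(pattern), count)
# The size-3 alignment pattern and the size-2 iterator pattern both
# reach support 2 and are reported.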
i) Ranking patterns: the support count is the number of occurrences of a pattern; the confidence is the strength of a pattern, P(A|B).
j) Pattern classification (post-processing): with v validations and e violations of a pattern, patterns are classified as usage patterns when e < v/10, error patterns when v/10 <= e <= 2v, and unlikely patterns otherwise (Fig. 2).
Results of mining patterns: 15 usage patterns, 8 error patterns, 11 unlikely patterns and 24 patterns not hit, out of 56 patterns in total.
Mining into computer systems: a huge volume of data comes from computer systems, such as persistent state interactions, event logs, network logs and CPU usage. This system data can be mined for reliability, performance, manageability and more.
VI. CONCLUSION
Challenges in data mining include the statistical modelling of computer systems, along with issues of online operation, scalability and interpretability. Data mining for software bug detection relies on frequent pattern mining; automated debugging in software programs moves from frequent patterns to software bugs and statistical debugging, while automated debugging in computer systems includes the automated diagnosis of system misconfigurations. The limitations of the bugs-as-deviant-behaviour approach are its fixed rule templates, its need for specific knowledge about the software, and its restriction to rules over two elements. PR-Miner [Li et al., FSE'05], which mines implicit programming rules, was developed to address these limitations: it is a general method (no prior knowledge, no templates) that produces general rules (different types of elements such as functions, variables and data types, and multiple elements per rule). Ubiquitous computing demands reliable software, and hence mining for software reliability: mining program source code and version histories to find bugs, mining program runtime data to locate why an execution fails, and mining system snapshots to diagnose misconfigurations and performance problems. This is an active and rewarding research area, as witnessed by the International Workshop on Mining Software Repositories since 2004, the SIGCOMM Workshop on Mining Network Data since 2005, the Systems and Machine Learning Workshop since 2006, and the Workshop on Statistical Learning Techniques for Solving Systems Problems, co-located with NIPS.
REFERENCES
[1] Ian Sommerville, Software Engineering, 8th edition, Pearson Education, 2007.
[2] Roger S. Pressman, Software Engineering: A Practitioner's Approach, 6th edition, McGraw-Hill International Edition, 2005.
[3] James F. Peters & Witold Pedrycz, Software Engineering: An Engineering Approach, Wiley, 2000.
[4] Jiawei Han & Micheline Kamber, Data Mining: Concepts and Techniques, 2nd edition, Elsevier, March 2006.
[5] Chao Liu, Long Fei, Xifeng Yan, Jiawei Han and Samuel P. Midkiff, "Statistical Debugging: A Hypothesis Testing-Based Approach", IEEE Transactions on Software Engineering, 2006.
[6] Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou and Benjamin Chelf, "Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code", SOSP 2001.
[7] Zhenmin Li, Shan Lu, Suvda Myagmar and Yuanyuan Zhou, "CP-Miner: A Tool for Finding Copy-Paste and Related Bugs in Operating System Code", OSDI 2004.
[8] S. Chitra & M. Rajaram, "A Software Reliability Estimation Tool using Artificial Immune Recognition System", Proceedings of the International MultiConference of Engineers and Computer Scientists 2008, Vol. 1, IMECS 2008, 19-21 March 2008, Hong Kong.
[9] Leon Wu, Boyi Xie, Gail Kaiser & Rebecca Passonneau, "BUGMINER: Software Reliability Analysis via Data Mining of Bug Reports", Department of Computer Science, Columbia University, New York, NY 10027, USA, 2007.
[10] Swapna S. Gokhale, "A Simulation Approach to Structure-Based Software Reliability Analysis", IEEE Transactions on Software Engineering, Vol. 31, No. 8, August 2005.
[11] Simon P. Wilson and Francisco J. Samaniego, "Nonparametric Analysis of the Order-Statistic Model in Software Reliability", IEEE Transactions on Software Engineering, Vol. 33, No. 3, March 2007.