2021 IEEE/ACIS 19th International Conference on Software Engineering Research, Management and Applications (SERA)
Class Imbalance Problem is a major issue in machine intelligence, producing biased classifiers that work well for the majority class but perform relatively poorly for the minority class. To ensure the development of accurate prediction models, it is essential to deal with the class imbalance problem. In this paper, the class imbalance problem is handled using focused undersampling techniques, viz. Cluster Based, Tomek Link and Condensed Nearest Neighbours, which equalize the number of instances of the two classes by undersampling the majority class based on a particular criterion. This is in contrast to random undersampling, where data samples are selected randomly from the majority class, leading to underfitting and the loss of important datapoints. To fairly compare and evaluate the performance of the focused undersampling approaches, prediction models are constructed using popular machine learning classifiers: K-Nearest Neighbor, Decision Tree and Naive Bayes. The results show that Decision Tree outperformed the other machine learning techniques. Comparing the undersampling approaches for Decision Tree showed Condensed Nearest Neighbours to be the best among them.
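The three focused undersampling techniques named above are all available in the imbalanced-learn library; the following is a minimal illustrative sketch (not the paper's own implementation) showing how each resamples a synthetic imbalanced dataset.

```python
# Sketch of the three focused undersampling techniques on synthetic data.
# Assumes the imbalanced-learn (imblearn) library; dataset is invented.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.under_sampling import (
    ClusterCentroids,           # cluster-based: replaces majority class with KMeans centroids
    TomekLinks,                 # removes majority-class points that form Tomek links
    CondensedNearestNeighbour,  # keeps a minimal subset consistent under 1-NN
)

# Synthetic imbalanced dataset: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
print("original:", Counter(y))

for sampler in (ClusterCentroids(random_state=42),
                TomekLinks(),
                CondensedNearestNeighbour(random_state=42)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, Counter(y_res))
```

Note that Tomek Links only removes borderline majority points, so unlike the other two it does not fully equalize the class counts on its own.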
In today’s era of continuous virus alerts and threats from cyber terrorism and malicious crackers, defects which hamper security are bound to enter the system, thereby exposing it to security vulnerabilities. Dealing with security-related defects is therefore a necessity that must be handled with utmost caution for failure-free functioning of the software, which in turn benefits the organization as a whole. But due to limited resources, it is not possible to pay equal attention to all the security-related defects introduced in the software. Thus, security-related defects need to be assigned an appropriate severity level signifying the extent to which a particular defect can be harmful to the system. Assigning severity levels to security-related defects can prove very useful, as it helps industry professionals prioritize defects, thereby allocating available resources and man-power to the defects which have...
International Journal of Information Technologies and Systems Approach
The COVID-19 pandemic has greatly affected patients who are in immediate need of medical care. Nowadays people rely on online reviews shared on review websites to gather information about hospitals, such as the availability of beds, ventilators, etc. However, these reviews are large in number and unstructured, which makes them difficult to interpret. Hence, the authors have proposed a methodology that applies aspect-based sentiment analysis to the reviews to gain meaningful insights about a hospital based on different aspects like doctor, staff and facilities. For the purpose of empirical validation, a total of 26,071 reviews pertaining to 325 hospitals were scraped. The authors concluded that the study can be useful for patients, as it helps them select the hospital which best suits them. The study also helps hospital administration improve current services according to the needs of the patients.
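As a rough sketch of what aspect-based sentiment analysis on such reviews can look like (the paper's actual pipeline is not reproduced here), one can match each review sentence to an aspect by keyword lookup and score it with NLTK's VADER analyzer; the aspect lexicon and sample review below are invented for illustration.

```python
# Minimal aspect-based sentiment sketch using NLTK's VADER.
# Requires: nltk.download('vader_lexicon'). Aspect keywords are hypothetical.
from nltk.sentiment import SentimentIntensityAnalyzer

ASPECTS = {
    "doctor": {"doctor", "physician", "surgeon"},
    "staff": {"staff", "nurse", "reception"},
    "facilities": {"bed", "ventilator", "room", "equipment"},
}

def aspect_sentiment(review: str, sia: SentimentIntensityAnalyzer) -> dict:
    """Return the mean compound VADER score per aspect mentioned in the review."""
    scores = {}
    for sentence in review.split("."):
        words = set(sentence.lower().split())
        for aspect, keywords in ASPECTS.items():
            if words & keywords:  # sentence mentions this aspect
                scores.setdefault(aspect, []).append(
                    sia.polarity_scores(sentence)["compound"])
    return {a: sum(v) / len(v) for a, v in scores.items()}

sia = SentimentIntensityAnalyzer()
print(aspect_sentiment("The doctor was excellent. No free bed was available.", sia))
```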
Background: Non-Functional Requirements (NFRs) have a direct impact on the architecture of the system; thus it is essential to identify NFRs in the initial phases of software development. Aim: The work is based on extracting relevant keywords from NFR descriptions by employing text mining steps and thereafter classifying these descriptions into one of nine types of NFRs. Method: For each NFR type, keywords are extracted from a set of pre-categorized specifications using the Information-Gain measure. Models using 8 Machine Learning (ML) techniques are then developed for classification of NFR descriptions. A set of 15 projects (containing 326 NFR descriptions) developed by MS students at DePaul University is used to evaluate the models. The study analyzes the performance of the ML models in terms of classification and misclassification rate to determine the best model for predicting each type of NFR description. The Naïve Bayes model performed best in predicting "maintainability" and "availability" types of NFRs. The NFR descriptions should be analyzed and mapped to their corresponding NFR types during the initial phases. The authors conducted a cost-benefit analysis to demonstrate the advantage of using the proposed models.
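A hedged sketch of this kind of pipeline follows: keyword selection by an information-gain-style score (mutual information is used here as a stand-in) feeding a Naive Bayes classifier. The tiny corpus is invented; the paper's 326 real descriptions are not bundled with this example.

```python
# Keyword extraction + Naive Bayes classification of NFR descriptions (sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

descriptions = [
    "The system shall recover from failure within five minutes",  # availability
    "The product shall be available 99.9 percent of the time",    # availability
    "Modules shall be easy to modify and extend",                  # maintainability
    "Code changes shall require minimal regression effort",        # maintainability
]
labels = ["availability", "availability", "maintainability", "maintainability"]

model = make_pipeline(
    CountVectorizer(stop_words="english"),
    SelectKBest(mutual_info_classif, k=5),  # keep the k most informative keywords
    MultinomialNB(),
)
model.fit(descriptions, labels)
print(model.predict(["The service shall stay up 24/7"]))
```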
International Journal of Information System Modeling and Design, 2021
Huge and reputed software industries are expected to deliver quality products. However, the industry suffers losses of approximately $500 billion due to shoddy software quality. The quality of a product in terms of its accuracy, efficiency, and reliability can be revamped through testing, by focusing attention on effective test-case generation and prioritization. The authors have proposed a test-case generation technique based on an iterative listener genetic algorithm that generates test cases automatically. The proposed technique uses its adaptive nature to solve issues like redundant test cases, inefficient test coverage percentage, high execution time, and increased computation complexity by maintaining the diversity of the population, which decreases redundancy in the test cases. The performance of the technique is compared with four existing test-case generation algorithms in terms of computational complexity, execution time, coverage, and i...
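The core evolve-and-select loop of genetic test-case generation can be sketched as below; this is a deliberately simplified steady-state GA with a coverage-based fitness on a toy function, and does not reproduce the paper's iterative listener variant.

```python
# Simplified genetic-algorithm sketch for test-case generation (toy example).
import random

def branches_hit(x: int, y: int) -> set:
    """Return the set of branches of a toy function exercised by input (x, y)."""
    hit = set()
    hit.add("x_pos" if x > 0 else "x_nonpos")
    hit.add("y_even" if y % 2 == 0 else "y_odd")
    if x > 100 and y < -5:
        hit.add("deep")  # a hard-to-reach branch
    return hit

def suite_coverage(pop) -> int:
    """Branches covered by the whole test suite."""
    return len(set().union(*(branches_hit(*t) for t in pop)))

random.seed(0)
population = [(random.randint(-200, 200), random.randint(-200, 200))
              for _ in range(20)]
for gen in range(50):
    parents = random.sample(population, 4)          # tournament-style selection
    child = (parents[0][0], parents[1][1])          # single-point crossover
    if random.random() < 0.3:                       # mutation
        child = (child[0] + random.randint(-50, 50), child[1])
    # replace the individual that covers the fewest branches on its own
    worst = min(range(len(population)),
                key=lambda i: len(branches_hit(*population[i])))
    population[worst] = child
print("branches covered:", suite_coverage(population), "of 5")
```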
Defect severity assessment is highly essential for software practitioners so that they can focus their attention and resources on the defects having a higher priority than others. This directly impacts resource allocation and the planning of subsequent defect-fixing activities. In this paper, we build a model that assigns a severity level to each defect found during testing. The model is based on text mining and machine learning techniques. We have used the KNN machine learning method, employed on an open-source NASA dataset available in the PITS database. The Area Under the Curve (AUC) obtained from Receiver Operating Characteristics (ROC) analysis is used as the performance measure to validate and analyze the results. The obtained results show that the KNN technique performs exceptionally well in predicting the defects corresponding to the top 100 words for all severity levels. Its performance is lower for top 5...
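A minimal sketch of this setup, assuming a bag-of-words representation restricted to the top terms: defect-report text is vectorized, a KNN classifier is fit, and AUC is computed. The reports and labels below are synthetic stand-ins for the PITS data.

```python
# KNN severity prediction from defect-report text, evaluated with ROC AUC (sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

reports = ["system crash on null pointer", "typo in help dialog",
           "memory leak under load", "misaligned button label"] * 25
severe = [1, 0, 1, 0] * 25  # 1 = high severity (synthetic labels)

# max_features keeps only the top corpus terms, loosely mirroring "top 100 words".
X = TfidfVectorizer(max_features=100).fit_transform(reports)
X_tr, X_te, y_tr, y_te = train_test_split(X, severe, test_size=0.3, random_state=1)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, knn.predict_proba(X_te)[:, 1]))
```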
Automated classification of text into predefined categories has always been considered a vital method to manage and process the vast number of documents in digital form that are widespread and continuously increasing. This kind of web information, popularly known as digital/electronic information, is in the form of documents, conference material, publications, journals, editorials, web pages, e-mail, etc. People largely access information from these online sources rather than being limited to archaic paper sources like books, magazines and newspapers. But the main problem is that this enormous information lacks organization, which makes it difficult to manage. Text classification is recognized as one of the key techniques for organizing such digital data. In this paper we study the existing work in the area of text classification, allowing a fair evaluation of the progress made in this field to date. We have investigated the papers to the ...
International Journal of System Dynamics Applications, 2022
The need of customers to be connected to the network at all times has led to the evolution of mobile technology. Operating systems play a vital role when we talk of technology, and nowadays Android is one of the most popular operating systems for mobile phones. The authors have analysed three stable versions of Android: 6.0, 7.0 and 8.0. Incorporating a change in a version after it is released requires a lot of rework, and thus huge costs are incurred. In this paper, the aim is to reduce this rework by identifying, during the early phases of development, certain parts of a version which need careful attention. Machine learning prediction models are developed to identify the parts which are more prone to changes. The accuracy of such models should be high, as developers rely heavily on them. The high dimensionality of the dataset may hamper the accuracy of the models. Thus, the authors explore four dimensionality reduction techniques, which are unexplored in the field of network ...
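The excerpt truncates before naming the four techniques, so as one plausible illustration only, the sketch below applies PCA ahead of a change-proneness classifier; the metrics and labels are synthetic.

```python
# Dimensionality reduction (PCA, as an example) before change-proneness prediction.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 40))                  # 40 static code metrics per class
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # change-prone or not (synthetic)

model = make_pipeline(PCA(n_components=10), LogisticRegression())
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```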
2019 International Conference on Computing, Power and Communication Technologies (GUCON), 2019
Software keeps evolving due to the changing requirements and demands of customers; as a result, we nowadays have multiple versions of a software product. Incorporating a change at a later phase of the software development life cycle requires the investment of substantial resources. In this study, we develop models to predict the classes which are more prone to changes at a very early stage in the software development life cycle. This allows developers to focus their attention and resources on a limited number of classes rather than all of them. In addition, rigorous verification activities can be focused on such classes, leading to fewer bugs/errors at a later stage. However, datasets in the software engineering domain suffer from the class imbalance problem, leading to inaccurate or biased model prediction. In this study, we have used a sampling technique to balance the dataset. The aim of this study is to use meta-heuristic algorithms for model prediction using imbalanc...
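The excerpt cuts off before the sampling technique is named; purely as one common possibility, SMOTE oversampling from imbalanced-learn can balance such a dataset before any classifier is fit, as sketched below on synthetic data.

```python
# Balancing an imbalanced dataset with SMOTE (one possible sampling technique).
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=500, weights=[0.85, 0.15], random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print("before:", Counter(y), "after:", Counter(y_bal))
```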
2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2016
Requirement engineers are often unable to clearly elicit and analyze the security requirements that are essential for the development of secure and reliable software. Proper identification of the security requirements present in the Software Requirement Specification (SRS) document has been a problem faced by developers; as a result, they are not able to deliver software free from threats and vulnerabilities. Thus, in this paper, we mine the descriptions of security requirements present in the SRS document and thereafter develop classification models. The security-based descriptions are analyzed using text mining techniques and then classified into four types of security requirements, viz. authentication-authorization, access control, cryptography-encryption and data integrity, using the J48 decision tree method. Corresponding to each type of security requirement, a prediction model is developed. The effectiveness of the prediction models is evaluated against requirement specifications collected from 15 projects developed by MS students at DePaul University. The result analysis indicates that all four models performed very well in predicting their respective types of security requirements.
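J48 is Weka's implementation of C4.5; scikit-learn offers only a CART tree, so the sketch below with an entropy criterion is an approximation of the paper's classifier, not a reimplementation, and the requirement snippets are invented.

```python
# Decision-tree classification of security requirement descriptions (sketch;
# CART with entropy splitting as a rough stand-in for J48/C4.5).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

reqs = [
    "Users must log in with a password before access",      # authentication-authorization
    "Only admins may view the audit records",               # access control
    "All traffic shall be encrypted with AES",              # cryptography-encryption
    "Checksums shall verify stored records are unaltered",  # data integrity
]
types = ["auth", "access-control", "crypto", "integrity"]

clf = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(criterion="entropy"))
clf.fit(reqs, types)
print(clf.predict(["Sessions require password verification"]))
```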
International Journal of Reliability, Quality and Safety Engineering, 2016
Changes in software are unavoidable due to an ever-changing, dynamic and active environment wherein the expectations and requirements of users tend to change rapidly. As a result, software needs to be upgraded from its previous version to the next in order to meet user expectations. The upgradation of the software is measured in terms of the total number of Lines of Code (LOC) inserted, deleted or modified in moving from one version to the next. These changes are maintained in change reports, which consist of the defect ID and defect description. The defect description gives the cause of a defect which occurred in the previous version of the software, due to which new LOC needs to be inserted or existing LOC needs to be deleted or modified. A lot of effort is required to correct the defects identified in software at the maintenance phase, i.e., when the software has been delivered at the customer's end. Thus, in this paper, we intend ...
2015 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), 2015
Software Maintenance is an important phase of the software development life cycle, which starts once the software has been deployed at the customer's end. A lot of maintenance effort is required to change the software after it is in operation. Therefore, predicting the effort and cost associated with maintenance activities such as correcting and fixing defects has become one of the key issues that needs to be analyzed for effective resource allocation and decision-making. In view of this issue, we have developed a model based on text mining techniques using a machine learning method, namely the Radial Basis Function (RBF) neural network. We apply text mining techniques to identify the relevant attributes from defect reports and relate these attributes to software maintenance effort prediction. The proposed model is validated using the 'Browser' application package of the Android Operating System. Receiver Operating Characteristics (ROC) analysis is done to interpret the results obtained from model prediction by using the value of the Area Under the Curve (AUC), sensitivity and a suitable threshold criterion known as the cut-off point. It is evident from the results that the performance of the model depends on the number of words considered for classification; it shows the best results with respect to the top 100 words, irrespective of the type of effort category.
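A hedged sketch of this kind of model follows: scikit-learn has no RBF network proper, so a Gaussian RBF feature map (RBFSampler) feeding a linear layer approximates one, with the ROC cut-off point chosen by Youden's J statistic. Synthetic data stands in for the Android 'Browser' defect reports.

```python
# RBF-style effort classifier with an AUC and ROC cut-off evaluation (sketch).
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))               # term weights of top words (synthetic)
y = (X[:, :3].sum(axis=1) > 0).astype(int)   # high vs. low maintenance effort

model = make_pipeline(RBFSampler(gamma=0.5, random_state=3), LogisticRegression())
model.fit(X, y)

fpr, tpr, thresholds = roc_curve(y, model.predict_proba(X)[:, 1])
cutoff = thresholds[np.argmax(tpr - fpr)]    # Youden's J picks the cut-off point
print("AUC:", auc(fpr, tpr), "cut-off:", cutoff)
```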