DOI: 10.1145/2970276.2970300
research-article

Local-based active classification of test report to assist crowdsourced testing

Published: 25 August 2016

Abstract

In crowdsourced testing, an important task is to identify the test reports that actually reveal faults (true faults) from the large number of reports submitted by crowd workers. Most existing approaches to this problem rely on supervised machine learning techniques, which often require users to manually label a large amount of training data. Such a process is time-consuming and labor-intensive. Thus, reducing the onerous burden of manual labeling while still achieving good performance is crucial. Active learning, which aims at training a good classifier with as little labeled data as possible, is one potential technique to address this challenge. Nevertheless, our observation on real industrial data reveals that existing active learning approaches yield poor and unstable performance on crowdsourced testing data. We analyze the underlying reason and find that the dataset has significant local bias. To address these problems, we propose LOcal-based Active ClassiFication (LOAF) to classify true faults from crowdsourced test reports. LOAF recommends a small portion of instances that are most informative within their local neighborhood, asks the user for their labels, and then learns classifiers based on the local neighborhood. Our evaluation on 14,609 test reports from 34 commercial projects on one of the largest crowdsourced testing platforms in China shows that LOAF generates promising results. In addition, its performance is even better than that of existing supervised learning approaches built on large amounts of labeled historical data. Moreover, we implement our approach and evaluate its usefulness in real-world case studies; the feedback from testers demonstrates its practical value.
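
The abstract describes LOAF only at a high level, so the following is a minimal sketch, not the authors' implementation, of the general idea: iteratively query the labels of the unlabeled reports that look most informative within their local neighborhood, then classify the remaining reports with a purely local (kNN-style) model. The TF-IDF features, the uncertainty heuristic, the neighborhood size k, the query budget, and the use of scikit-learn are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of local-neighborhood-based active classification of test reports.
# All modeling choices here are assumptions for illustration, not the paper's algorithm.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

def local_active_classification(reports, ask_label, budget=20, k=5):
    """reports: list of raw test-report texts.
    ask_label: human oracle; returns 1 (true fault) or 0 for a report index.
    Returns predicted labels for the reports that were never manually labeled."""
    X = TfidfVectorizer().fit_transform(reports)   # bag-of-words features (assumed)
    labeled, labels = [], []

    for _ in range(min(budget, len(reports))):
        unlabeled = [i for i in range(len(reports)) if i not in labeled]
        if not unlabeled:
            break
        if len(set(labels)) < 2:
            # bootstrap: query arbitrarily until both classes have been seen
            query = unlabeled[0]
        else:
            knn = KNeighborsClassifier(n_neighbors=min(k, len(labeled)))
            knn.fit(X[labeled], labels)
            proba = knn.predict_proba(X[unlabeled])
            # "most informative within its local neighborhood" is approximated here
            # as the instance whose neighborhood vote is closest to 50/50
            query = unlabeled[int(np.argmin(np.abs(proba[:, 1] - 0.5)))]
        labeled.append(query)
        labels.append(ask_label(query))            # ask the human for this label

    # classify the remaining reports with a local (kNN) model over the labeled set
    rest = [i for i in range(len(reports)) if i not in labeled]
    if not rest:
        return {}
    knn = KNeighborsClassifier(n_neighbors=min(k, len(labeled)))
    knn.fit(X[labeled], labels)
    return dict(zip(rest, knn.predict(X[rest]).tolist()))
```

A kNN-style classifier is used here only because it makes the "learn from the local neighborhood" idea concrete with little code; the paper's evaluation and design details are not reproduced by this sketch.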



Information

Published In

ASE '16: Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
August 2016
899 pages
ISBN:9781450338455
DOI:10.1145/2970276
  • General Chair: David Lo
  • Program Chairs: Sven Apel, Sarfraz Khurshid
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2016


Author Tags

  1. Active Learning
  2. Crowdsourced Testing
  3. Test Report Classification

Qualifiers

  • Research-article

Conference

ASE'16

Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%


Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)1
Reflects downloads up to 23 Dec 2024


Cited By

  • (2024) Clustering and Prioritization of Web Crowdsourced Test Reports Based on Text Classification. International Journal of Web Services Research, 21(1), 1-19. DOI: 10.4018/IJWSR.357999. Online publication date: 7-Nov-2024
  • (2024) Optimizing Prioritization of Crowdsourced Test Reports of Web Applications through Image-to-Text Conversion. Symmetry, 16(1), 80. DOI: 10.3390/sym16010080. Online publication date: 8-Jan-2024
  • (2024) Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules. Proceedings of the ACM on Software Engineering, 1(FSE), 1540-1563. DOI: 10.1145/3660776. Online publication date: 12-Jul-2024
  • (2023) Mobile App Crowdsourced Test Report Consistency Detection via Deep Image-and-Text Fusion Understanding. IEEE Transactions on Software Engineering, 1-20. DOI: 10.1109/TSE.2023.3285787. Online publication date: 2023
  • (2022) A New Text-Mining–Bayesian Network Approach for Identifying Chemical Safety Risk Factors. Mathematics, 10(24), 4815. DOI: 10.3390/math10244815. Online publication date: 18-Dec-2022
  • (2022) Predictive Models in Software Engineering: Challenges and Opportunities. ACM Transactions on Software Engineering and Methodology, 31(3), 1-72. DOI: 10.1145/3503509. Online publication date: 9-Apr-2022
  • (2022) Context- and Fairness-Aware In-Process Crowdworker Recommendation. ACM Transactions on Software Engineering and Methodology, 31(3), 1-31. DOI: 10.1145/3487571. Online publication date: 7-Mar-2022
  • (2022) Identifying High-impact Bug Reports with Imbalance Distribution by Instance Fuzzy Entropy. International Journal of Software Engineering and Knowledge Engineering, 32(09), 1389-1417. DOI: 10.1142/S021819402250053X. Online publication date: 28-Sep-2022
  • (2022) Context-Aware Personalized Crowdtesting Task Recommendation. IEEE Transactions on Software Engineering, 48(8), 3131-3144. DOI: 10.1109/TSE.2021.3081171. Online publication date: 1-Aug-2022
  • (2022) Estimate the Precision of Defects Based on Reports Duplication in Crowdsourced Testing. IEEE Access, 10, 130415-130423. DOI: 10.1109/ACCESS.2022.3227930. Online publication date: 2022
