En-LDA: An Novel Approach to Automatic Bug Report Assignment with Entropy Optimized Latent Dirichlet Allocation
Abstract
1. Introduction
2. Background
2.1. Entropy
2.2. Open Bug Repository
2.2.1. Bug Report
2.2.2. Bug Life-Cycle
3. The Proposed Approach
3.1. LDA Topic Model
3.2. Entropy Optimized LDA Model
3.3. Associating Developers and Topics
3.4. Recommendation
4. Experiments
4.1. The Data
4.2. Experiments Setup
4.3. Evaluation
4.4. Experimental Results
5. Conclusions
Acknowledgments
Author Contributions
Conflicts of Interest
Rank | Topic-1 | Topic-2 | Topic-3 | Topic-4 |
---|---|---|---|---|
1 | menu | source | launch | error |
2 | action | package | debug | compiler |
3 | selection | folder | run | interface |
4 | view | jar | context | annotation |
5 | context | files | default | quick |
6 | show | create | config | warning |
7 | editor | src | resource | method |
8 | clean | explorer | dialog | override |
9 | open | path | remote | problem |
10 | add | copy | tab | missing |
Rank | Topic-1 | Topic-2 | Topic-3 | Topic-4 |
---|---|---|---|---|
1 | return | error | image | cache |
2 | const | fail | video | parser |
3 | null | log | background | time |
4 | string | unit | color | leak |
5 | static | dom | border | document |
6 | class | fix | media | content |
7 | type | check | frame | html |
8 | check | pass | element | event |
9 | fix | content | box | cycle |
10 | method | reply | canvas | thread |
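Ranked word lists like those in the two tables above are typically read off a fitted topic model by sorting each topic's word probabilities. The following is a minimal sketch under that assumption; the function name `top_words`, the toy vocabulary, and the probability values are hypothetical and are not taken from the paper.

```python
def top_words(topic_word_probs, vocabulary, n=10):
    """Return the n highest-probability words for each topic, best first."""
    ranked = []
    for probs in topic_word_probs:
        # Sort word indices by descending probability.
        # sorted() is stable, so ties keep vocabulary order.
        order = sorted(range(len(vocabulary)), key=lambda i: -probs[i])
        ranked.append([vocabulary[i] for i in order[:n]])
    return ranked


# Hypothetical toy values, chosen only to illustrate the ranking step.
vocabulary = ["menu", "action", "error", "compiler", "view"]
topic_word_probs = [
    [0.40, 0.30, 0.05, 0.05, 0.20],  # one row per topic
    [0.05, 0.05, 0.50, 0.35, 0.05],
]

ranked = top_words(topic_word_probs, vocabulary, n=3)
for topic_id, words in enumerate(ranked, start=1):
    print(f"Topic-{topic_id}: {', '.join(words)}")
```

Each row of `topic_word_probs` plays the role of one topic column in the tables, and the rank is simply the position in the sorted list.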
Precision (%)/Recall (%)

| | Top 1 | Top 2 | Top 3 | Top 4 | Top 5 |
---|---|---|---|---|---|
0 | 5/4 | 9/13 | 24/53 | 22/73 | 18/82 |
0.1 | 6/5 | 9/16 | 25/61 | 24/76 | 18/82 |
0.2 | 6/6 | 9/17 | 28/71 | 22/77 | 22/84 |
0.3 | 7/6 | 11/19 | 22/68 | 23/80 | 19/81 |
0.4 | 8/7 | 13/21 | 21/63 | 24/82 | 18/81 |
0.5 | 12/8 | 15/23 | 22/63 | 23/74 | 20/73 |
0.6 | 17/11 | 16/25 | 23/62 | 21/70 | 19/76 |
0.7 | 18/15 | 18/30 | 22/61 | 19/71 | 18/71 |
0.8 | 16/13 | 21/38 | 21/55 | 19/63 | 16/72 |
0.9 | 13/11 | 17/32 | 20/43 | 18/51 | 14/71 |
1 | 10/9 | 10/19 | 19/26 | 12/39 | 12/53 |
Precision (%)/Recall (%)

| | Top 1 | Top 2 | Top 3 | Top 4 | Top 5 | Top 6 | Top 7 |
---|---|---|---|---|---|---|---|
0 | 29/8 | 33/19 | 31/24 | 28/28 | 26/32 | 26/37 | 25/43 |
0.1 | 31/10 | 34/18 | 32/25 | 29/29 | 29/35 | 26/39 | 25/42 |
0.2 | 31/11 | 35/19 | 32/23 | 31/30 | 28/35 | 29/39 | 25/43 |
0.3 | 32/10 | 36/20 | 31/24 | 32/30 | 28/36 | 28/41 | 27/46 |
0.4 | 33/11 | 37/20 | 33/26 | 33/31 | 27/35 | 28/44 | 25/48 |
0.5 | 35/10 | 37/21 | 34/28 | 31/31 | 28/37 | 31/43 | 29/52 |
0.6 | 33/9 | 41/24 | 37/28 | 32/33 | 32/43 | 32/48 | 32/58 |
0.7 | 33/9 | 39/22 | 34/29 | 33/45 | 31/42 | 30/48 | 29/52 |
0.8 | 32/10 | 38/21 | 35/31 | 35/48 | 31/42 | 29/48 | 28/53 |
0.9 | 31/11 | 32/20 | 32/32 | 31/47 | 29/41 | 28/47 | 25/51 |
1 | 32/9 | 27/16 | 27/21 | 22/29 | 24/33 | 23/41 | 24/48 |
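The Top-k columns in the result tables pair a precision value with a recall value. The exact definitions used in the evaluation are not reproduced in this excerpt, so the following is only a sketch of one common way to compute precision and recall for a Top-k developer recommendation list. It assumes that each bug report comes with a set of developers who actually worked on it; the function name and the developer names are hypothetical.

```python
def precision_recall_at_k(recommended, actual, k):
    """Precision and recall of the first k recommendations.

    recommended: ranked list of developer ids, best first.
    actual: set of developers who actually worked on the report.
    """
    top_k = recommended[:k]
    hits = sum(1 for dev in top_k if dev in actual)
    # Precision: share of the recommended developers that were correct.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of the actual developers that were recommended.
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example data, not taken from the experiments.
recommended = ["alice", "bob", "carol", "dave", "erin"]
actual = {"bob", "erin", "frank"}

p3, r3 = precision_recall_at_k(recommended, actual, k=3)
p5, r5 = precision_recall_at_k(recommended, actual, k=5)
print(f"Top 3: precision={p3:.2f}, recall={r3:.2f}")
print(f"Top 5: precision={p5:.2f}, recall={r5:.2f}")
```

Under this definition, lengthening the list can only keep or raise recall for a single report, while precision may fall as less certain candidates are added, which is the usual trade-off when choosing k.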
© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zhang, W.; Cui, Y.; Yoshida, T. En-LDA: An Novel Approach to Automatic Bug Report Assignment with Entropy Optimized Latent Dirichlet Allocation. Entropy 2017, 19, 173. https://doi.org/10.3390/e19050173