DOI: 10.5555/2820518.2820530
research-article

Sameness: an experiment in code search

Published: 16 May 2015

Abstract

To date, most dedicated code search engines use ranking algorithms that focus only on the relevance of the results to the query. In practice, this means that a developer may receive search results that are all drawn from the same project, all implement the same algorithm using the same external library, or all exhibit the same complexity or size, among other less-than-ideal outcomes. In this paper, we propose that code search engines should also locate both diverse and concise (brief but complete) sets of code results. We present four novel algorithms that use relevance, diversity, and conciseness in ranking code search results. To evaluate these algorithms and the value of diversity and conciseness in code search, twenty-one professional programmers were asked to compare pairs of top-ten result lists produced by competing algorithms. We found that two of our new algorithms produce top-ten results that are strongly preferred by the programmers.
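The abstract does not describe the four algorithms themselves. As a rough illustration of how relevance, diversity, and conciseness can be traded off inside a single ranking, here is a minimal sketch adapted from greedy maximal marginal relevance (MMR) as described by Carbonell and Goldstein. It is not the paper's method. Everything specific is an assumption: the hypothetical `Result` fields, the "sameness" measure (token Jaccard overlap plus a same-project term), the conciseness term `1 / (1 + loc / loc_scale)`, and the weights `lam` and `mu`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """A hypothetical code search hit; every field is an illustrative assumption."""
    name: str
    project: str               # originating project/repository
    tokens: frozenset[str]     # identifiers/terms extracted from the snippet
    relevance: float           # query-result score from the base engine, assumed in [0, 1]
    loc: int                   # lines of code, a crude size proxy


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Token-set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(r1: Result, r2: Result) -> float:
    """Hypothetical 'sameness' score in [0, 1]: half token overlap, half same-project."""
    same_project = 1.0 if r1.project == r2.project else 0.0
    return 0.5 * jaccard(r1.tokens, r2.tokens) + 0.5 * same_project


def conciseness(r: Result, loc_scale: float) -> float:
    """Assumed conciseness term: 1 for an empty snippet, decaying as it grows longer."""
    return 1.0 / (1.0 + r.loc / loc_scale)


def rerank(candidates: list[Result], k: int = 10, lam: float = 0.7,
           mu: float = 0.1, loc_scale: float = 50.0) -> list[Result]:
    """Greedy MMR-style selection of up to k results.

    score = lam * relevance
            - (1 - lam) * max similarity to already-chosen results
            + mu * conciseness
    """
    selected: list[Result] = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(r: Result) -> float:
            # Redundancy is the closest match among results already picked.
            redundancy = max((similarity(r, s) for s in selected), default=0.0)
            return (lam * r.relevance
                    - (1.0 - lam) * redundancy
                    + mu * conciseness(r, loc_scale))
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return selected


# Toy data (made up for illustration): two near-identical quicksorts from one
# project and a merge sort from another project.
SAMPLE = [
    Result("QuickSortA", "proj1", frozenset({"sort", "pivot", "partition", "swap"}), 0.95, 40),
    Result("QuickSortB", "proj1", frozenset({"sort", "pivot", "partition", "swap"}), 0.93, 42),
    Result("MergeSort", "proj2", frozenset({"sort", "merge", "split"}), 0.85, 35),
]

print([r.name for r in rerank(SAMPLE, k=2)])  # prints ['QuickSortA', 'MergeSort']
```

With `lam=1.0` and `mu=0.0` the score reduces to pure relevance, and the same call returns both quicksorts from `proj1`. That is the kind of "sameness" the abstract describes. Lowering `lam` lets a slightly less relevant but different result displace a near-duplicate.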


Cited By

  • (2021) CodeMatcher: Searching Code Based on Sequential Semantics of Important Query Words. ACM Transactions on Software Engineering and Methodology 31:1 (1-37). DOI: 10.1145/3465403. Online publication date: 28-Sep-2021.
  • (2017) Understanding the impact of support for iteration on code search. Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (774-785). DOI: 10.1145/3106237.3106293. Online publication date: 21-Aug-2017.


Published In

MSR '15: Proceedings of the 12th Working Conference on Mining Software Repositories
May 2015
542 pages
ISBN:9780769555942


Publisher

IEEE Press


Author Tags

  1. code
  2. concise
  3. diversity
  4. results
  5. sameness
  6. search
  7. similarity
  8. top ten

Qualifiers

  • Research-article

Conference

ICSE '15

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 13 Jan 2025
