Towards a Model-based Software Mining Infrastructure

Published: 06 February 2015

Abstract

Software mining is concerned with two primary goals: the extraction of basic facts from software repositories and the derivation of knowledge from the assessment of those facts. Fact extraction approaches rely on custom, task-specific infrastructures and tools. The resulting fact assets are usually represented in heterogeneous formats at a low level of abstraction. As a result, facts extracted from different sources are not well integrated, even when they are related. To manage this, existing infrastructures often aim to support all-in-one information meta-structures that try to integrate all facts into one connected whole. We propose a generic infrastructure that translates extracted facts to homogeneous high-level representations conforming to domain-specific metamodels, and then transforms these high-level model instances to instances of domain-specific models related to a particular assessment task. These task-specific models can be incrementally enriched with additional facts as these become available or necessary. This allows researchers and practitioners to focus on the assessment task at hand, without being concerned with low-level representation details or complex data models containing large amounts of often irrelevant data. We present an example scenario with a concrete instantiation of the proposed infrastructure targeting the assessment of developer behaviour.
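
The two-step idea in the abstract (lift raw facts into homogeneous, metamodel-conforming instances, then transform those into a task-specific model that can be enriched later) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Commit`, `Issue`, and `DeveloperProfile` classes, the `author|files` log format, and the function names are hypothetical and are not taken from the paper, which may well realise these steps with dedicated model transformation tooling instead.

```python
from dataclasses import dataclass


# Hypothetical domain-specific "metamodels": one class per fact source.
@dataclass(frozen=True)
class Commit:
    author: str
    files_changed: int


@dataclass(frozen=True)
class Issue:
    reporter: str
    resolved: bool


# Hypothetical task-specific model for assessing developer behaviour.
@dataclass
class DeveloperProfile:
    name: str
    commits: int = 0
    files_touched: int = 0
    issues_reported: int = 0


def lift_commit(raw: str) -> Commit:
    """Step 1: translate a low-level 'author|files' log line (assumed format)."""
    author, files = raw.split("|")
    return Commit(author=author.strip(), files_changed=int(files))


def lift_issue(raw: dict) -> Issue:
    """Step 1: translate a raw issue-tracker record (assumed keys)."""
    return Issue(reporter=raw["reporter"], resolved=raw["status"] == "closed")


def to_profiles(commits) -> dict:
    """Step 2: transform high-level commit instances into the task model."""
    profiles = {}
    for c in commits:
        p = profiles.setdefault(c.author, DeveloperProfile(c.author))
        p.commits += 1
        p.files_touched += c.files_changed
    return profiles


def enrich_with_issues(profiles: dict, issues) -> dict:
    """Incremental enrichment: add issue facts once they become available."""
    for i in issues:
        p = profiles.setdefault(i.reporter, DeveloperProfile(i.reporter))
        p.issues_reported += 1
    return profiles


if __name__ == "__main__":
    raw_log = ["alice|3", "bob|1", "alice|2"]
    raw_issues = [{"reporter": "alice", "status": "closed"}]
    profiles = to_profiles(lift_commit(r) for r in raw_log)
    enrich_with_issues(profiles, (lift_issue(r) for r in raw_issues))
    for profile in profiles.values():
        print(profile)
```

In this sketch the assessment code only ever sees `DeveloperProfile` objects, so it stays independent of how the version control log or the issue tracker happens to encode its data.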

Published In

ACM SIGSOFT Software Engineering Notes  Volume 40, Issue 1
January 2015
237 pages
ISSN:0163-5948
DOI:10.1145/2693208

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Infrastructure
  2. Mining
  3. Modeling
  4. data integration
  5. data mining
  6. domain modeling
  7. facts extraction

Qualifiers

  • Research-article

Cited By

  • (2019) Abstract Layers and Generic Elements as a Basis for Expressing Multidimensional Software Knowledge. New Trends in Databases and Information Systems, pages 232-242. DOI: 10.1007/978-3-030-30278-8_26. Online publication date: 1-Sep-2019
  • (2018) Addressing problems with replicability and validity of repository mining studies through a smart data platform. Empirical Software Engineering, 23:2, pages 1036-1083. DOI: 10.1007/s10664-017-9537-x. Online publication date: 1-Apr-2018
  • (2017) GitcProc: a tool for processing and classifying GitHub commits. Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 396-399. DOI: 10.1145/3092703.3098230. Online publication date: 10-Jul-2017
  • (2016) Adressing problems with external validity of repository mining studies through a smart data platform. Proceedings of the 13th International Conference on Mining Software Repositories, pages 97-108. DOI: 10.1145/2901739.2901753. Online publication date: 14-May-2016
  • (2015) CrossPare. Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering Workshop (ASEW), pages 90-96. DOI: 10.1109/ASEW.2015.8. Online publication date: 9-Nov-2015
