research-article

An Information Retrieval Approach for Automatically Constructing Software Libraries

Published: 01 August 1991

Abstract

A technology for automatically assembling large software libraries which promote software reuse by helping the user locate the components closest to her/his needs is described. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using an indexing scheme based on the notions of lexical affinities and quantity of information. Then a hierarchy for browsing is automatically generated using a clustering technique which draws only on the information provided by the attributes. Due to the free-text indexing scheme, tools following this approach can accept free-style natural language queries.
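The two-step construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give formulas, so everything concrete here is an assumption made for illustration: the five-word co-occurrence window, the stop-word list, the weighting of a word pair by its count times the summed information content (-log2 of relative frequency) of its two words, cosine similarity between profiles, and average-link agglomerative clustering. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumptions for illustration only: stop-word list and window size.
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "for", "from", "by", "on"}
WINDOW = 5


def content_words(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def lexical_affinities(text):
    """Count unordered pairs of distinct content words at most WINDOW words apart."""
    words = content_words(text)
    pairs = Counter()
    for i, w in enumerate(words):
        for v in words[i + 1 : i + 1 + WINDOW]:
            if v != w:
                pairs[tuple(sorted((w, v)))] += 1
    return pairs


def build_profiles(docs):
    """Step 1: map each component name to {word pair: weight} from its documentation."""
    counts = Counter()
    for text in docs.values():
        counts.update(content_words(text))
    total = sum(counts.values())
    # Quantity of information of a word: rarer words carry more information.
    info = {w: -math.log2(c / total) for w, c in counts.items()}
    return {
        name: {p: n * (info[p[0]] + info[p[1]]) for p, n in lexical_affinities(text).items()}
        for name, text in docs.items()
    }


def similarity(p, q):
    """Cosine similarity between two sparse pair-weight profiles."""
    dot = sum(w * q[k] for k, w in p.items() if k in q)
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def average_link(a, b, profiles):
    """Mean pairwise similarity between the members of two clusters."""
    sims = [similarity(profiles[x], profiles[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def build_hierarchy(profiles):
    """Step 2: merge the two closest clusters repeatedly; return a nested tuple tree."""
    clusters = [(name, [name]) for name in sorted(profiles)]
    if not clusters:
        return None
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_link(clusters[ij[0]][1], clusters[ij[1]][1], profiles),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


def rank(query, profiles):
    """Order component names by similarity to a free-text query."""
    q = {p: float(n) for p, n in lexical_affinities(query).items()}
    return sorted(profiles, key=lambda name: similarity(q, profiles[name]), reverse=True)
```

Given a dictionary that maps component names to their documentation text, `build_profiles` yields the attribute profiles, `build_hierarchy` turns them into a browsable tree, and `rank` orders components against a plain-English query. Because queries are indexed the same way as the documentation, no controlled vocabulary is needed, which is the property the abstract attributes to free-text indexing.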


Reviews

Roger William Elliott

The idea of reusing software has a great deal of appeal as a technique for improving programmer productivity. The utility of the concept is as yet unproven, because libraries of reusable software are not available. This paper addresses this problem. The authors describe a system that uses information retrieval techniques to automatically construct and search software libraries. The system makes use of automatic indexing of written documentation associated with modules of code to be included in the library. A powerful retrieval function that allows multiple search strategies is incorporated into the system. The system has been implemented in C under AIX. The authors have demonstrated empirical results using documentation totaling more than 800,000 words for 1100 modules. The empirical results are encouraging, and I see no reason to believe that they would not be duplicated in other settings. The paper is carefully written, and the authors have taken pains to provide a context for their design decisions. Numerous examples are provided, and the paper is comprehensively referenced. It is a must read for anyone who wants to design a software library retrieval system.


Published In

IEEE Transactions on Software Engineering  Volume 17, Issue 8
August 1991
113 pages
ISSN:0098-5589

Publisher

IEEE Press

Author Tags

  1. attributes
  2. automatic programming
  3. browsing
  4. clustering technique
  5. free-style natural language queries
  6. free-text indexing scheme
  7. indexing scheme
  8. information retrieval approach
  9. information retrieval systems
  10. large software libraries
  11. lexical affinities
  12. natural language documentation
  13. natural languages
  14. software reusability
  15. software reuse
  16. subroutines

Qualifiers

  • Research-article

Cited By

  • (2022) Automated assertion generation via information retrieval and its integration with deep learning. Proceedings of the 44th International Conference on Software Engineering, pp. 163-174. DOI: 10.1145/3510003.3510149. Online publication date: 21-May-2022.
  • (2022) Learning I/O Variables from Scientific Software’s User Manuals. Computational Science – ICCS 2022, pp. 503-516. DOI: 10.1007/978-3-031-08760-8_42. Online publication date: 21-Jun-2022.
  • (2021) A unified framework for semantic similarity computation of concepts. Multimedia Tools and Applications 80:21-23, pp. 32335-32378. DOI: 10.1007/s11042-021-10966-1. Online publication date: 1-Sep-2021.
  • (2021) The nature of build changes. Empirical Software Engineering 26:3. DOI: 10.1007/s10664-020-09926-4. Online publication date: 16-Mar-2021.
  • (2020) Feature Terms Prediction. Proceedings of the 24th International Conference on Evaluation and Assessment in Software Engineering, pp. 90-99. DOI: 10.1145/3383219.3383229. Online publication date: 15-Apr-2020.
  • (2020) Taming behavioral backward incompatibilities via cross-project testing and analysis. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, pp. 112-124. DOI: 10.1145/3377811.3380436. Online publication date: 27-Jun-2020.
  • (2020) An automated approach to assess the similarity of GitHub repositories. Software Quality Journal 28:2, pp. 595-631. DOI: 10.1007/s11219-019-09483-0. Online publication date: 1-Jun-2020.
  • (2019) Similarity reasoning in formal concept analysis. Knowledge and Information Systems 60:2, pp. 715-739. DOI: 10.1007/s10115-018-1252-4. Online publication date: 1-Aug-2019.
  • (2018) Automatic generation of text descriptive comments for code blocks. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, pp. 5229-5236. DOI: 10.5555/3504035.3504676. Online publication date: 2-Feb-2018.
  • (2018) A language-agnostic model for semantic source code labeling. Proceedings of the 1st International Workshop on Machine Learning and Software Engineering in Symbiosis, pp. 36-44. DOI: 10.1145/3243127.3243132. Online publication date: 3-Sep-2018.
