Abstract
This paper surveys applications of data mining techniques to large text collections, and illustrates how those techniques can be used to support the management of science and technology research. Specific issues that arise repeatedly in the conduct of research management are described, and a textual data mining architecture that extends a classic paradigm for knowledge discovery in databases is introduced. That architecture integrates information retrieval from text collections, information extraction to obtain data from individual texts, data warehousing for the extracted data, data mining to discover useful patterns in the data, and visualization of the resulting patterns. At the core of this architecture is a broad view of data mining—the process of discovering patterns in large collections of data—and that step is described in some detail. The final section of the paper illustrates how these ideas can be applied in practice, drawing upon examples from the recently completed first phase of the textual data mining program at the Office of Naval Research. The paper concludes by identifying some research directions that offer significant potential for improving the utility of textual data mining for research management applications.
Losiewicz, P., Oard, D.W. & Kostoff, R.N. Textual Data Mining to Support Science and Technology Management. Journal of Intelligent Information Systems 15, 99–119 (2000). https://doi.org/10.1023/A:1008777222412