DOI: 10.1145/1244002.1244182
Article

Graph-based text representation and knowledge discovery

Published: 11 March 2007

Abstract

Information retrieval and text mining require a robust, scalable framework that represents the information extracted from documents and supports visualization and querying of that information. One very widely used model is the vector space model, which is based on the bag-of-words approach. However, it loses important information about the original text, such as the order of the terms or the boundaries between sentences and paragraphs. In this paper, we propose a graph-based text representation that is capable of capturing (i) term order, (ii) term frequency, (iii) term co-occurrence, and (iv) term context in documents. We also apply the graph model to our text-mining task, which is to discover unapparent associations between two or more concepts (e.g., individuals) in a large text corpus. A counterterrorism corpus is used to evaluate the performance of various retrieval models, and the results demonstrate the feasibility and effectiveness of graph-based text representation in information retrieval and text mining.
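
To make the four properties listed in the abstract concrete, the following is a minimal sketch of one possible term graph. It is not the paper's construction, which the abstract does not specify. The function name `build_term_graph`, the regex tokenizer, the choice of adjacent-term edges within a sentence, and the use of sentence indices as "context" are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def build_term_graph(text):
    """Build a simple directed term graph from raw text (illustrative only).

    Returns three structures:
      node_freq  - term -> number of occurrences (term frequency)
      edge_count - (left, right) -> how often `right` directly follows `left`
                   inside one sentence (term order and co-occurrence)
      contexts   - term -> set of sentence indices containing it (term context)
    """
    node_freq = Counter()
    edge_count = Counter()
    contexts = defaultdict(set)

    # Assumption: sentences end at '.', '!' or '?'. Edges never cross a
    # sentence boundary, so the boundary information is preserved.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    for idx, sentence in enumerate(sentences):
        terms = re.findall(r"[a-z0-9]+", sentence.lower())
        node_freq.update(terms)
        for term in terms:
            contexts[term].add(idx)
        # Directed edge from each term to its immediate successor.
        for left, right in zip(terms, terms[1:]):
            edge_count[(left, right)] += 1

    return node_freq, edge_count, contexts


nodes, edges, contexts = build_term_graph(
    "The graph model keeps term order. The vector model loses term order."
)
print(nodes["model"])             # occurrences of "model"
print(edges[("term", "order")])   # times "order" directly follows "term"
print(edges[("order", "term")])   # the reverse direction is counted separately
print(sorted(contexts["graph"]))  # sentences in which "graph" appears
```

Because the edges are directed, the pair ("term", "order") and the pair ("order", "term") are distinct, which is exactly the ordering information a bag-of-words vector discards.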

    Published In

    SAC '07: Proceedings of the 2007 ACM symposium on Applied computing
    March 2007
    1688 pages
    ISBN:1595934804
    DOI:10.1145/1244002

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information retrieval
    2. knowledge discovery
    3. text representation

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
