DOI: 10.1145/1148170.1148179
Article

Contextual search and name disambiguation in email using graphs

Published: 06 August 2006
    Abstract

    Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.
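
The lazy graph walk named in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy graph, the `term:`/`msg:`/`person:` node names, the uniform edge weights, the stay probability `gamma`, and the step count. "Lazy" is read in the common sense that the walker stays at its current node with probability `gamma` at each step; the paper's exact formulation may differ.

```python
from collections import defaultdict

# Hypothetical toy email graph: a name term, two messages, two people.
# Edges are undirected and unweighted (an assumption for this sketch).
GRAPH = {
    "term:andy": ["msg:1", "msg:2"],
    "msg:1": ["term:andy", "person:andrew", "person:bob"],
    "msg:2": ["term:andy", "person:andrew"],
    "person:andrew": ["msg:1", "msg:2"],
    "person:bob": ["msg:1"],
}

def lazy_walk(graph, start, gamma=0.5, steps=2):
    """Return the node distribution after `steps` steps of a lazy walk.

    At each step the walker stays put with probability `gamma` and
    otherwise moves to a uniformly chosen neighbor.
    """
    dist = {start: 1.0}
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, p in dist.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                nxt[node] += p  # dead end: all probability mass stays
                continue
            nxt[node] += gamma * p
            share = (1.0 - gamma) * p / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        dist = dict(nxt)
    return dist

def rank_by_type(dist, prefix):
    """Rank nodes whose name starts with `prefix` by walk probability."""
    hits = [(n, p) for n, p in dist.items() if n.startswith(prefix)]
    return sorted(hits, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    dist = lazy_walk(GRAPH, "term:andy", gamma=0.5, steps=2)
    for node, p in rank_by_type(dist, "person:"):
        print(f"{node}\t{p:.4f}")
```

In this toy setup, ranking `person:` nodes by the probability mass that reaches them from a name term is one way to read "disambiguating names" as a graph-similarity query: the person connected to the term through more messages ends up ranked higher.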

        Published In

        SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
        August 2006
        768 pages
        ISBN:1595933697
        DOI:10.1145/1148170
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. email
        2. graph-based retrieval
        3. name disambiguation
        4. threading

        Qualifiers

        • Article

        Conference

        SIGIR06: The 29th Annual International SIGIR Conference
        August 6 - 11, 2006
        Seattle, Washington, USA

        Acceptance Rates

        Overall Acceptance Rate 792 of 3,983 submissions, 20%

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months): 10
        • Downloads (Last 6 weeks): 0

        Cited By

        • (2024) Robust Chinese Short Text Entity Disambiguation Method Based on Feature Fusion and Contrastive Learning. Information, 15:3 (139). DOI: 10.3390/info15030139. Online publication date: 29-Feb-2024
        • (2024) Adaptive deep learning for entity disambiguation via knowledge-based risk analysis. Expert Systems with Applications: An International Journal, 238:PE. DOI: 10.1016/j.eswa.2023.122342. Online publication date: 27-Feb-2024
        • (2021) It Runs in the Family: Unsupervised Algorithm for Alternative Name Suggestion Using Digitized Family Trees. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3096670. Online publication date: 2021
        • (2021) Information Extraction. Text Data Mining (227-283). DOI: 10.1007/978-981-16-0100-2_10. Online publication date: 21-Jan-2021
        • (2019) Morphological Disambiguation of Turkish with Free-order Co-occurrence Statistics. Gümüşhane Üniversitesi Fen Bilimleri Enstitüsü Dergisi. DOI: 10.17714/gumusfenbil.430034. Online publication date: 31-Jan-2019
        • (2019) Debiasing Vandalism Detection Models at Wikidata. The World Wide Web Conference (670-680). DOI: 10.1145/3308558.3313507. Online publication date: 13-May-2019
        • (2018) GraPU. Proceedings of the ACM Symposium on Cloud Computing (301-312). DOI: 10.1145/3267809.3267811. Online publication date: 11-Oct-2018
        • (2018) Ontology-based Adaptive e-Textbook Platform for Student and Machine Co-Learning. 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (1-7). DOI: 10.1109/FUZZ-IEEE.2018.8491480. Online publication date: Jul-2018
        • (2018) A Smart Email Client Prototype for Effective Reuse of Past Replies. IEEE Access, 6 (69453-69471). DOI: 10.1109/ACCESS.2018.2878523. Online publication date: 2018
        • (2017) Person entity linking in email with NIL detection. Journal of the Association for Information Science and Technology, 68:10 (2412-2424). DOI: 10.5555/3204593.3204603. Online publication date: 1-Oct-2017
