DOI:10.1145/1951365.1951391
research-article

Unified structure and content search for personal information management systems

Published: 21 March 2011

Abstract

User data stored in personal information systems is growing massively. Simultaneously, this data is increasingly distributed across multiple organizational domains such as email, music databases, and photo albums, some of which are structured automatically by applications. Powerful search tools are needed to help users locate data in these expanding yet fragmented data sets. In this paper, we present a novel fuzzy search approach that considers approximate matches to structure and content query conditions. Our framework uses unified data and query processing models so that structure conditions can be approximately matched by content and vice versa. Our models also unify external structure (e.g., directories) with internal structure (e.g., XML structure), supporting integrated queries matched to a single data domain. We propose indexes and algorithms for efficient query processing. We evaluate our approach using a real data set, showing that it can leverage structure information to significantly improve search accuracy, yet is robust to mistakes in query conditions.
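
The idea of letting structure conditions match content, and vice versa, can be illustrated with a toy scoring function. This is a minimal sketch under stated assumptions, not the indexes or algorithms proposed in the paper: the `Item` class, the `unified_path` and `score` functions, and the `cross_weight` penalty are all hypothetical names and choices introduced here for illustration. External structure (directories) and internal structure (element names) are simply concatenated into one path, and a query term that misses in its stated role still earns partial credit if it appears in the other role.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical data item with external and internal structure plus content."""
    directories: list[str]  # external structure, e.g. folder names
    elements: list[str]     # internal structure, e.g. XML element names
    terms: set[str]         # content terms


def unified_path(item: Item) -> list[str]:
    """Concatenate external and internal structure into one lowercase path."""
    return [d.lower() for d in item.directories] + [e.lower() for e in item.elements]


def score(item: Item, structure_terms: list[str], content_terms: list[str],
          cross_weight: float = 0.5) -> float:
    """Average per-condition score in [0, 1].

    A condition scores 1.0 when matched in its stated role and
    cross_weight (an assumed penalty) when matched only in the other role.
    """
    path = set(unified_path(item))
    content = {t.lower() for t in item.terms}
    conditions = ([(t.lower(), "structure") for t in structure_terms]
                  + [(t.lower(), "content") for t in content_terms])
    if not conditions:
        return 0.0
    total = 0.0
    for term, kind in conditions:
        own, other = (path, content) if kind == "structure" else (content, path)
        if term in own:
            total += 1.0           # exact match in the stated role
        elif term in other:
            total += cross_weight  # approximate match across roles
    return total / len(conditions)


item = Item(["Home", "Music"], ["album", "track"], {"beatles", "yesterday"})
exact = score(item, ["music", "album"], ["beatles"])
swapped = score(item, ["beatles"], ["music"])
```

In this sketch, a query whose structure and content conditions are swapped by mistake still receives a nonzero score, which mirrors the robustness to mistaken query conditions that the abstract describes.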

Cited By

  • (2012) A Novel PIM System and its Effective Storage Compression Scheme. Journal of Software 7:6. DOI:10.4304/jsw.7.6.1385-1392. Online publication date: 1-Jun-2012
  • (2011) A survey on information re‐finding techniques. International Journal of Web Information Systems 7:4 (313-332). DOI:10.1108/17440081111187538. Online publication date: 22-Nov-2011

Published In

EDBT/ICDT '11: Proceedings of the 14th International Conference on Extending Database Technology
March 2011
587 pages
ISBN:9781450305280
DOI:10.1145/1951365
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • Microsoft Research

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. personal information search
  2. query path matching
  3. query processing
  4. structure and content search

Qualifiers

  • Research-article

Conference

EDBT/ICDT '11
Sponsor:
  • Microsoft Research
EDBT/ICDT '11: EDBT/ICDT '11 joint conference
March 21 - 24, 2011
Uppsala, Sweden

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
