DOI: 10.1145/1989323.1989391
research-article

TI: an efficient indexing mechanism for real-time search on tweets

Published: 12 June 2011

Abstract

Real-time search requires that new content be made available for search immediately after it is created. From a database perspective, this requirement can be met fairly easily by maintaining an up-to-date index over the content and measuring search quality by the time gap between when content is inserted and when it becomes available through the index. This approach, however, poses new challenges for microblogging systems, where thousands of users may upload microblogs (tweets) concurrently. Under such high update and query loads, conventional approaches either fail to index the huge volume of newly created content in real time or fall short of providing a scalable indexing service.
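The latency criterion described above (the gap between when content is inserted and when it becomes searchable) can be made concrete with a toy inverted index. This is a minimal sketch, not TI itself: the class name `TimedInvertedIndex`, its methods, and the caller-supplied timestamps are hypothetical choices made so the example stays deterministic.

```python
from collections import defaultdict


class TimedInvertedIndex:
    """Toy inverted index that records when each tweet becomes searchable.

    Hypothetical sketch: times are supplied by the caller (e.g. seconds)
    so runs are deterministic; a real system would read a clock instead.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of indexed tweets
        self.texts = {}                   # tweet id -> raw text
        self.inserted_at = {}             # tweet id -> time the tweet was stored
        self.indexed_at = {}              # tweet id -> time it became searchable

    def insert(self, tweet_id, text, now):
        # Store the tweet; it is not searchable until index() is called.
        self.texts[tweet_id] = text
        self.inserted_at[tweet_id] = now

    def index(self, tweet_id, now):
        # Add every lower-cased whitespace-separated term to the inverted lists.
        for term in self.texts[tweet_id].lower().split():
            self.postings[term].add(tweet_id)
        self.indexed_at[tweet_id] = now

    def search(self, term):
        # Only tweets that have been indexed can be returned.
        return sorted(self.postings.get(term.lower(), set()))

    def latency(self, tweet_id):
        # Gap between insertion and searchability; None if not yet indexed.
        if tweet_id not in self.indexed_at:
            return None
        return self.indexed_at[tweet_id] - self.inserted_at[tweet_id]

    def mean_latency(self):
        # Average gap over all tweets indexed so far.
        gaps = [self.latency(t) for t in self.indexed_at]
        return sum(gaps) / len(gaps) if gaps else 0.0


if __name__ == "__main__":
    idx = TimedInvertedIndex()
    idx.insert(1, "Earthquake in Tokyo", now=0.0)
    idx.insert(2, "lunch time", now=0.5)
    idx.index(1, now=0.2)
    print(idx.search("earthquake"))  # [1]
    print(idx.latency(1))            # 0.2
    print(idx.latency(2))            # None (stored but not yet searchable)
```

Under this metric, a system that indexes every tweet on arrival drives the gap toward zero but pays the indexing cost for every update, which is the tension the paragraph above describes.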
In this paper, we propose TI (Tweet Index), an adaptive indexing scheme for microblogging systems such as Twitter. The intuition behind TI is to immediately index the tweets that are likely to appear in search results and to delay indexing other tweets. This strategy significantly reduces the indexing cost without compromising the quality of the search results. TI also includes a new ranking scheme that combines the relationships between users and tweets. We group tweets into topics and update the ranking of each topic dynamically. Experiments on a real Twitter dataset confirm the efficiency of TI.
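The central idea in the paragraph above, indexing likely search results at once and deferring the rest, can be illustrated with a small selective index. This is a hedged sketch, not the TI algorithm: the rule used here (a tweet counts as a likely search result if it shares a term with a fixed set of popular query terms), the batch size, and all names such as `SelectiveTweetIndex` and `batch_size` are hypothetical stand-ins for whatever criterion and schedule TI actually uses.

```python
from collections import defaultdict


class SelectiveTweetIndex:
    """Sketch of adaptive indexing: index likely hits now, defer the rest.

    Hypothetical rule (not taken from the paper): a tweet is "likely to be
    a search result" if it contains at least one popular query term. All
    other tweets wait in a buffer and are indexed together once the buffer
    holds `batch_size` tweets.
    """

    def __init__(self, popular_terms, batch_size=3):
        self.popular_terms = {t.lower() for t in popular_terms}
        self.batch_size = batch_size
        self.postings = defaultdict(set)  # term -> ids of searchable tweets
        self.pending = []                 # (tweet_id, terms) awaiting batch indexing
        self.immediate_count = 0          # tweets indexed on arrival
        self.batched_count = 0            # tweets indexed by a later flush

    def _add_postings(self, tweet_id, terms):
        for term in terms:
            self.postings[term].add(tweet_id)

    def insert(self, tweet_id, text):
        terms = set(text.lower().split())
        if terms & self.popular_terms:
            # Likely to match a popular query: make it searchable right away.
            self._add_postings(tweet_id, terms)
            self.immediate_count += 1
        else:
            # Defer the indexing cost; flush once the buffer is full.
            self.pending.append((tweet_id, terms))
            if len(self.pending) >= self.batch_size:
                self.flush()

    def flush(self):
        # Index every deferred tweet in one pass and empty the buffer.
        for tweet_id, terms in self.pending:
            self._add_postings(tweet_id, terms)
        self.batched_count += len(self.pending)
        self.pending.clear()

    def search(self, term):
        # Deferred tweets are invisible until the next flush.
        return sorted(self.postings.get(term.lower(), set()))


if __name__ == "__main__":
    idx = SelectiveTweetIndex(popular_terms={"earthquake"}, batch_size=2)
    idx.insert(1, "Earthquake near Tokyo")  # indexed immediately
    idx.insert(2, "my cat is asleep")       # deferred
    print(idx.search("earthquake"))         # [1]
    print(idx.search("cat"))                # []
    idx.insert(3, "cat videos")             # buffer reaches batch_size, flush
    print(idx.search("cat"))                # [2, 3]
```

The trade-off is visible in the example: deferred tweets stay unsearchable until the buffer is flushed, which is acceptable only if, as the abstract argues, such tweets rarely appear in search results anyway.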

Information

Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. index
  2. ranking
  3. real-time search

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '11

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 3
Reflects downloads up to 23 Dec 2024

Citations

Cited By

  • (2023) Khronos: A Real-Time Indexing Framework for Time Series Databases on Large-Scale Performance Monitoring Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 1607-1616. DOI: 10.1145/3583780.3614944. Online publication date: 21-Oct-2023.
  • (2022) A Sketching Approach for Obtaining Real-Time Statistics Over Data Streams in Cloud. IEEE Transactions on Cloud Computing 10(2), pp. 1462-1475. DOI: 10.1109/TCC.2020.2987023. Online publication date: 1-Apr-2022.
  • (2022) Immediate Text Search on Streams Using Apoptosic Indexes. Advances in Information Retrieval, pp. 157-169. DOI: 10.1007/978-3-030-99736-6_11. Online publication date: 5-Apr-2022.
  • (2021) Event Related Data Collection from Microblog Streams. Database and Expert Systems Applications, pp. 319-331. DOI: 10.1007/978-3-030-86475-0_31. Online publication date: 1-Sep-2021.
  • (2020) Microblogs. SIGSPATIAL Special 12(1), pp. 41-52. DOI: 10.1145/3404820.3404827. Online publication date: 8-Jul-2020.
  • (2020) A Data Indexing Technique to Improve the Search Latency of AND Queries for Large Scale Textual Documents. 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), pp. 37-46. DOI: 10.1109/BDCAT50828.2020.00019. Online publication date: Dec-2020.
  • (2020) SAMA: a real-time Web search architecture. International Journal of Computers and Applications 44(7), pp. 633-640. DOI: 10.1080/1206212X.2020.1859245. Online publication date: 22-Dec-2020.
  • (2020) Identification of influential users on Twitter: A novel weighted correlated influence measure for Covid-19. Chaos, Solitons & Fractals 139, article 110037. DOI: 10.1016/j.chaos.2020.110037. Online publication date: Oct-2020.
  • (2019) Ring: Real-Time Emerging Anomaly Monitoring System Over Text Streams. IEEE Transactions on Big Data 5(4), pp. 506-519. DOI: 10.1109/TBDATA.2017.2672672. Online publication date: 1-Dec-2019.
  • (2019) Towards Longitudinal Analytics on Social Media Data. 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 350-361. DOI: 10.1109/ICDE.2019.00039. Online publication date: Apr-2019.
