Abstract
Web 2.0 streams, such as blog postings, micro-blogging tweets, or RSS feeds from online communities, offer a wealth of the latest news about real-world events and societal discussion. From a user’s perspective, it becomes harder and harder to get a decent overview of recent events, given these massive, continuously flowing streams of information. Ideally, a system would continuously assemble recent information, ranked by its current social impact but also weighted by the users’ personal interests. In this work, we develop methods to meet these requirements. The presented approach continuously tracks the most popular tags attached to the incoming items and, based on these tags, constructs a dynamic top-k query. By continuously evaluating this query on the incoming stream, we are able to retrieve the currently hottest items. These hottest items are then fed into an engine that re-ranks them with respect to user-specified interests, given in the form of term-based topic descriptions. Given the high rate of incoming data and the immense number of user profiles, this calls for high-performance algorithms that efficiently retrieve hot documents and subsequently personalize them based on user profiles. In this work we present a combined solution that builds on our prior work on information filtering and shows how it can be used together with the current work on continuously determining the hottest documents. To demonstrate the suitability of our approach, we conduct a performance evaluation using a real-world dataset obtained from a weblog crawl.
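The pipeline outlined in the abstract (track popular tags, derive a top-k query from them, retrieve the hottest items, re-rank per user) can be illustrated with a small sketch. This is not the authors' algorithm: the count-based sliding window, the scoring function, the linear combination weight `alpha`, and all names (`HotItemTracker`, `personalize`) are assumptions chosen for illustration, and the brute-force scan stands in for the efficient index structures the paper is actually about.

```python
from collections import Counter, deque
import heapq


class HotItemTracker:
    """Toy sliding-window tracker of popular tags (illustrative assumption only)."""

    def __init__(self, window_size, num_hot_tags):
        self.window_size = window_size    # number of most recent items kept
        self.num_hot_tags = num_hot_tags  # how many popular tags form the query
        self.window = deque()             # (item_id, tags, terms) in arrival order
        self.tag_counts = Counter()       # tag -> occurrences inside the window

    def add(self, item_id, tags, terms):
        """Insert a new item and expire the oldest one once the window is full."""
        tags, terms = frozenset(tags), frozenset(terms)
        self.window.append((item_id, tags, terms))
        self.tag_counts.update(tags)
        if len(self.window) > self.window_size:
            _, old_tags, _ = self.window.popleft()
            self.tag_counts.subtract(old_tags)
            for tag in old_tags:
                if self.tag_counts[tag] <= 0:
                    del self.tag_counts[tag]

    def hot_query(self):
        """The dynamic query: the currently most popular tags with their counts."""
        return dict(self.tag_counts.most_common(self.num_hot_tags))

    def hottest(self, k):
        """Score each windowed item by the popularity of its tags; return the top k."""
        query = self.hot_query()
        scored = [
            (sum(query.get(tag, 0) for tag in tags), item_id, terms)
            for item_id, tags, terms in self.window
        ]
        return heapq.nlargest(k, scored, key=lambda entry: entry[0])


def personalize(hot_items, profile_terms, alpha=0.5):
    """Re-rank hot items by mixing normalized hotness with profile term overlap."""
    profile = set(profile_terms)
    max_hot = max((score for score, _, _ in hot_items), default=0) or 1
    ranked = []
    for score, item_id, terms in hot_items:
        overlap = len(profile & terms) / len(profile) if profile else 0.0
        combined = alpha * score / max_hot + (1 - alpha) * overlap
        ranked.append((combined, item_id))
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return [item_id for _, item_id in ranked]
```

A caller would feed each arriving item to `add`, ask `hottest(k)` for the current candidates, and pass them to `personalize` once per user profile. Rescanning the whole window on every request is affordable only at toy scale, which is why the abstract stresses high-performance algorithms for both steps.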
Additional information
This work is partially supported by NCCR-MICS (grant number 5005-67322), the FP7 EU Project OKKAM (contract no. ICT-215032), and the German Research Foundation (DFG) Cluster of Excellence “Multimodal Computing and Interaction” (MMCI).
Cite this article
Haghani, P., Michel, S. & Aberer, K. Efficient monitoring of personalized hot news over Web 2.0 streams. Comput Sci Res Dev 27, 81–92 (2012). https://doi.org/10.1007/s00450-011-0178-9
DOI: https://doi.org/10.1007/s00450-011-0178-9