Efficient monitoring of personalized hot news over Web 2.0 streams

Haghani, Parisa; Michel, Sebastian; Aberer, Karl

doi:10.1007/s00450-011-0178-9

Special Issue Paper
Published: 21 May 2011

Volume 27, pages 81–92 (2012)

Computer Science - Research and Development

Parisa Haghani¹,
Sebastian Michel² &
Karl Aberer¹

Abstract

Web 2.0 streams, such as blog postings, micro-blogging tweets, and RSS feeds from online communities, offer a wealth of up-to-date news about real-world events and societal discussion. From a user’s perspective, it becomes increasingly hard to get a decent overview of recent events, given the massive streams of information that flow in continuously. Ideally, a system would continuously compile recent information, ranked by its current social impact and weighted by the user’s personal interests. In this work, we develop methods to meet these requirements. The presented approach continuously tracks the most popular tags attached to the incoming items and, based on these tags, constructs a dynamic top-k query. By continuously evaluating this query on the incoming stream, we are able to retrieve the currently hottest items. These items are then fed into an engine that re-ranks them with respect to user-specified interests, given in the form of term-based topic descriptions. Given the high rate of incoming data and the immense number of user profiles, this calls for high-performance algorithms that retrieve hot documents efficiently and then personalize them based on user profiles. We present a combined solution that shows how our prior work on information filtering can be used together with the current work on continuously determining the hottest documents. To demonstrate the suitability of our approach, we conduct a performance evaluation using a real-world dataset obtained from a weblog crawl.
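
The pipeline outlined in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm, which the abstract does not specify. The names `HotTagTracker`, `hotness`, `hottest_items`, and `personalize`, the count-based sliding window, the scoring of an item by the summed counts of its hot tags, and the term-overlap re-ranking are all assumptions chosen to keep the example short.

```python
from collections import Counter, deque
import heapq


class HotTagTracker:
    """Counts tags over the most recent `window` items (assumed count-based window)."""

    def __init__(self, window):
        self.window = window
        self.items = deque()     # tag sets of the items currently in the window
        self.counts = Counter()  # tag -> number of windowed items carrying it

    def add(self, tags):
        tags = frozenset(tags)
        self.items.append(tags)
        self.counts.update(tags)
        if len(self.items) > self.window:
            expired = self.items.popleft()
            self.counts.subtract(expired)
            for tag in expired:
                if self.counts[tag] <= 0:
                    del self.counts[tag]  # drop tags that left the window

    def hot_query(self, m):
        """Return the m most popular tags with their counts (the dynamic query)."""
        return dict(self.counts.most_common(m))


def hotness(tags, query):
    """Assumed scoring: sum of the query weights of the tags an item carries."""
    return sum(query.get(tag, 0) for tag in tags)


def hottest_items(items, query, k):
    """Top-k items under the current hot-tag query."""
    return heapq.nlargest(k, items, key=lambda item: hotness(item["tags"], query))


def personalize(items, profile):
    """Re-rank hot items by term overlap with a user's topic description."""
    def overlap(item):
        return len(profile & set(item["text"].lower().split()))
    return sorted(items, key=overlap, reverse=True)


# Hypothetical stream of tagged items.
stream = [
    {"id": 1, "tags": {"election", "economy"},
     "text": "election puts economy in focus"},
    {"id": 2, "tags": {"election"},
     "text": "polling stations open early"},
    {"id": 3, "tags": {"football"},
     "text": "local club wins the cup"},
    {"id": 4, "tags": {"election", "economy", "debate"},
     "text": "candidates debate the economy before the election"},
]

tracker = HotTagTracker(window=4)
for item in stream:
    tracker.add(item["tags"])

query = tracker.hot_query(2)                # two most popular tags
hot = hottest_items(stream, query, k=2)     # currently hottest items
personalized = personalize(hot, {"debate"})  # one user's term-based profile
print([item["id"] for item in personalized])  # [4, 1]
```

In this toy setting the query is rebuilt from scratch and every item is rescored on each call. A system facing high-rate streams and very many user profiles would need incremental maintenance of both steps, which is the efficiency problem the paper addresses.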

Author information

Authors and Affiliations

EPFL IC ISC LSIR, Station 14, 1015, Lausanne, Switzerland
Parisa Haghani & Karl Aberer
Saarland University, Campus El.7, 66123, Saarbrücken, Germany
Sebastian Michel

Corresponding author

Correspondence to Sebastian Michel.

Additional information

This work is partially supported by NCCR-MICS (grant number 5005-67322), the FP7 EU Project OKKAM (contract no. ICT-215032), and the German Research Foundation (DFG) Cluster of Excellence “Multimodal Computing and Interaction” (MMCI).

About this article

Cite this article

Haghani, P., Michel, S. & Aberer, K. Efficient monitoring of personalized hot news over Web 2.0 streams. Comput Sci Res Dev 27, 81–92 (2012). https://doi.org/10.1007/s00450-011-0178-9

Published: 21 May 2011
Issue Date: February 2012
DOI: https://doi.org/10.1007/s00450-011-0178-9
