Abstract
This paper presents GeoTrend+, a system that supports scalable local trend discovery on recent microblogs that arrive in real time, e.g., tweets, comments, online reviews, and check-ins. GeoTrend+ discovers the top-k trending keywords in arbitrary spatial regions from recent microblogs that arrive continuously at high rates, a significant portion of which have uncertain geolocations. GeoTrend+ distinguishes itself from existing techniques in several aspects: (1) It discovers trends in arbitrary spatial regions, e.g., city blocks. (2) It considers both exact geolocations, e.g., accurate latitude/longitude coordinates, and uncertain geolocations, e.g., district-level or city-level locations, which represent a significant portion of microblogs in recent years. (3) It treats recent microblogs as first-class citizens and optimizes its components to digest a continuous flow of fast data in main memory while removing old data efficiently. (4) It provides main-memory optimization techniques that distinguish useful from useless data, making effective use of tight memory resources while maintaining accurate query results on relatively large amounts of data. (5) It supports various trending measures that capture trending items under a variety of definitions suited to different applications. GeoTrend+ limits its scope to real-time data posted during the last T time units. To support its queries efficiently, GeoTrend+ employs an in-memory spatial index that efficiently digests incoming data and expires data older than the last T time units. The index also materializes the top-k keywords in different spatial regions so that incoming queries can be processed with low latency. At peak times, the main-memory optimization techniques shed less important data to sustain high query accuracy with limited memory resources.
Experimental results based on real data and queries show that GeoTrend+ scales to high arrival rates, achieves low query response time, and maintains at least 90% query accuracy even under limited memory resources.
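The index behavior summarized above (digest new microblogs, expire those older than T, answer top-k keyword queries over a spatial region) can be illustrated with a small Python sketch. It is not GeoTrend+'s index: it assumes a uniform grid of square cells (the class `GridTrendIndex` and the parameters `cell_size` and `window_T` are hypothetical), keeps keyword counts per cell, approximates a query rectangle by the cells it overlaps, and computes the top-k at query time instead of materializing it. Uncertain geolocations and memory shedding are not modeled.

```python
from collections import Counter, deque


class GridTrendIndex:
    """Hypothetical sliding-window keyword index over a uniform grid (illustrative only)."""

    def __init__(self, cell_size, window_T):
        self.cell_size = cell_size  # side length of a square grid cell (assumption)
        self.window_T = window_T    # keep only microblogs from the last T time units
        self.cells = {}             # (cx, cy) -> Counter of keyword frequencies
        self.arrivals = deque()     # (timestamp, cell, keywords), oldest first

    def _cell(self, x, y):
        # Map a point to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def expire(self, now):
        # Drop microblogs with timestamp <= now - T; assumes nondecreasing timestamps.
        while self.arrivals and self.arrivals[0][0] <= now - self.window_T:
            _, cell, keywords = self.arrivals.popleft()
            counts = self.cells[cell]
            counts.subtract(keywords)
            for kw in set(keywords):
                if counts[kw] <= 0:
                    del counts[kw]
            if not counts:
                del self.cells[cell]  # free empty cells

    def insert(self, t, x, y, keywords):
        # Expire old data first, then count the new microblog's keywords in its cell.
        self.expire(t)
        keywords = list(keywords)
        if not keywords:
            return  # nothing to count
        cell = self._cell(x, y)
        self.cells.setdefault(cell, Counter()).update(keywords)
        self.arrivals.append((t, cell, keywords))

    def top_k(self, x_min, y_min, x_max, y_max, k):
        # Sum counts over every cell that overlaps the query rectangle.
        cx0, cy0 = self._cell(x_min, y_min)
        cx1, cy1 = self._cell(x_max, y_max)
        total = Counter()
        for (cx, cy), counts in self.cells.items():
            if cx0 <= cx <= cx1 and cy0 <= cy <= cy1:
                total.update(counts)
        return total.most_common(k)


idx = GridTrendIndex(cell_size=1.0, window_T=10)
idx.insert(0, 0.5, 0.5, ["flood", "rain"])
idx.insert(1, 0.6, 0.4, ["flood"])
idx.insert(2, 5.5, 5.5, ["concert"])
print(idx.top_k(0, 0, 1, 1, 1))  # [('flood', 2)]
```

Because arrivals are assumed to be in timestamp order, expiration only pops from the front of a queue. A design that materializes top-k lists per region, as the abstract describes for GeoTrend+, would additionally have to refresh those lists whenever counts change.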
Acknowledgments
Amr Magdy acknowledges the support of the National Science Foundation under Grant Numbers IIS-1849971, SES-1831615, and CNS-1837577. Mohamed Mokbel acknowledges the support of the National Science Foundation under Grant Numbers IIS-1525953, CNS-1512877, and IIS-1907855. Walid Aref acknowledges the support of the National Science Foundation under Grant Numbers III-1815796 and IIS-1910216.
Appendix: Trend Line Slope
GeoTrend+ uses the slope of a statistical linear regression line to measure the trendiness of a keyword. The following lemma derives the equation that determines a keyword's trendiness:
Lemma 1
Given a keyword's vector of consecutive frequencies \(f = [f_0, f_1, \ldots, f_N]\), the keyword's trend line can be estimated with the following formula:
Proof
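The simple linear regression slope \(Trend_{reg}\) of x and y is given by the following equation:

\[ Trend_{reg} = \frac{\mathrm{Mean}(xy) - \mathrm{Mean}(x)\,\mathrm{Mean}(y)}{\mathrm{Mean}(x^{2}) - \mathrm{Mean}(x)^{2}} \qquad (1) \]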
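where Mean(v) denotes the average value of a vector v, and xy is the vector that results from element-wise multiplication of the vectors x and y. In GeoTrend+, the values of vector x are fixed, while vector y contains the frequencies of a keyword W. Thus, vector x is always [1, 2, 3, ..., N], while vector y is \([f_1, f_2, f_3, \ldots, f_N]\). Hence, \(\mathrm{Mean}(x) = \frac{N+1}{2}\) and \(\mathrm{Mean}(x^{2})\) simplifies to \(\frac{(N+1)(2N+1)}{6}\), so the denominator of Equation 1 becomes \(\frac{(N+1)(2N+1)}{6} - \frac{(N+1)^{2}}{4} = \frac{N^{2}-1}{12}\). On the other hand, \(\mathrm{Mean}(xy)\) can be calculated as \(\frac{\sum_{i=1}^{N} i \times f_{i}}{N}\), and \(\mathrm{Mean}(y) = \frac{\sum_{i=1}^{N} f_{i}}{N}\). Substituting these terms into Equation 1 gives:

\[ Trend_{reg} = \frac{12\sum_{i=1}^{N} i \times f_{i} - 6(N+1)\sum_{i=1}^{N} f_{i}}{N(N^{2}-1)} \]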
The equation above assumes that measurement starts at the beginning of the stream and that each keyword W starts with frequency 0. However, GeoTrend+ needs to account for the starting position of a keyword W by using its previous frequency, namely \(f_0\). Thus, the equation above can be modified to:
□
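As a numerical check of the derivation, the Python sketch below evaluates the mean-based regression slope of Equation 1 directly and compares it with the closed form \(\frac{12\sum_{i=1}^{N} i \times f_{i} - 6(N+1)\sum_{i=1}^{N} f_{i}}{N(N^{2}-1)}\) that follows from x = [1, 2, ..., N]. The function names and sample frequencies are hypothetical, and the sketch covers only the unmodified case, without the \(f_0\) adjustment.

```python
def _mean(values):
    # Arithmetic mean of a non-empty sequence.
    return sum(values) / len(values)


def trend_slope_closed_form(f):
    """Closed-form least-squares slope of f_1..f_N against x = 1..N (no f0 adjustment)."""
    n = len(f)
    if n < 2:
        raise ValueError("need at least two frequencies")
    sum_f = sum(f)
    sum_if = sum(i * fi for i, fi in enumerate(f, start=1))
    return (12 * sum_if - 6 * (n + 1) * sum_f) / (n * (n * n - 1))


def trend_slope_direct(f):
    """Equation 1 evaluated directly: (Mean(xy) - Mean(x)Mean(y)) / (Mean(x^2) - Mean(x)^2)."""
    xs = list(range(1, len(f) + 1))
    mean_x, mean_y = _mean(xs), _mean(f)
    mean_xy = _mean([x * y for x, y in zip(xs, f)])
    mean_x2 = _mean([x * x for x in xs])
    return (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x * mean_x)


freqs = [3, 1, 4, 1, 5]  # hypothetical per-time-unit counts of one keyword
print(trend_slope_closed_form(freqs))  # 0.4
print(abs(trend_slope_closed_form(freqs) - trend_slope_direct(freqs)) < 1e-12)  # True
```

The closed form needs only the two sums \(\sum f_i\) and \(\sum i \times f_i\), rather than the four separate means of Equation 1.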
Cite this article
Almaslukh, A., Magdy, A., Aly, A.M. et al. Local trend discovery on real-time microblogs with uncertain locations in tight memory environments. Geoinformatica 24, 301–337 (2020). https://doi.org/10.1007/s10707-019-00380-z
DOI: https://doi.org/10.1007/s10707-019-00380-z