Abstract
The last two decades have witnessed tremendous and astonishing developments in technology. These developments drove a visible revolution in communication and electronics design, leading to the production of computing devices of various sizes and capabilities, ranging from tiny sensors with limited specifications to mobile devices with considerable power and rich functionality. This has stimulated researchers and practitioners to work hard to obtain the best possible benefit from such novel devices to serve humanity. Gathering huge amounts of data is far easier and more affordable than ever before. Indeed, there is a clear shift from paper-based manual data collection to fully automated data collection, even under severe conditions that were never feasible to consider before. Data is captured as a stream that may encapsulate trends revealing certain aspects essential to our daily life. Identifying such trends in data streams is the main theme of the study described in this chapter. We mainly concentrate on real-time stream data analysis to better serve time-critical applications where instant decision making is crucial. This study builds on our methodology described in (Xylogiannopoulos et al. Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in social networks analysis and mining (ASONAM), pp. 931–938, 2016), which detects all repeated patterns in a big data stream. In the new dynamic approach, a sliding window is employed with the LERP Reduced Suffix Array and the ARPaD algorithm to analyze one trillion digits composed of one million subsequences of one million digits each. We achieved a throughput equivalent to a stream generating one data point every 300 ns.
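The abstract names the data structures (LERP Reduced Suffix Array, ARPaD) without showing how all repeated patterns can be read off a suffix array. The Python sketch below illustrates the general idea under stated assumptions: it is not the chapter's LERP-RSA or ARPaD, but a standard substitute that builds a plain suffix array and a Kasai LCP array for each window and enumerates every substring occurring at least twice. The function names `repeated_patterns` and `sliding_window_patterns` and the `window`/`step` parameters are hypothetical, and each window is re-analyzed from scratch rather than updated incrementally.

```python
from collections import deque


def suffix_array(s):
    # Naive O(n^2 log n) construction; adequate only for small illustrative windows.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    # Kasai et al. linear-time LCP: lcp[k] = longest common prefix of sa[k-1] and sa[k].
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def repeated_patterns(s):
    """Hypothetical helper: {pattern: occurrences} for every substring seen >= 2 times."""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    n = len(s)
    counts = {}
    # Suffixes sharing a length-L prefix are contiguous in the suffix array,
    # so each maximal run with lcp >= L is one distinct repeated pattern.
    for length in range(1, max(lcp, default=0) + 1):
        k = 1
        while k < n:
            if lcp[k] >= length:
                start = k - 1
                while k < n and lcp[k] >= length:
                    k += 1
                counts[s[sa[start]:sa[start] + length]] = k - start
            else:
                k += 1
    return counts


def sliding_window_patterns(stream, window, step):
    """Hypothetical driver: yield (window_offset, patterns) every `step` symbols.

    Assumes single-character string symbols; each window is recomputed from scratch.
    """
    buf = deque(maxlen=window)  # oldest symbol is dropped automatically
    seen = 0
    for symbol in stream:
        buf.append(symbol)
        seen += 1
        if seen >= window and (seen - window) % step == 0:
            yield seen - window, repeated_patterns("".join(buf))
```

For example, `repeated_patterns("banana")` reports `a` three times and `n`, `an`, `na` and `ana` twice each (occurrences may overlap). The naive suffix sort makes this sketch suitable only for small windows; the experiments on one trillion digits rely on the authors' LERP Reduced Suffix Array and ARPaD instead.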
References
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Repeated patterns detection in big data using classification and parallelism on LERP reduced suffix arrays. Appl. Intell. 45(3), 567–597 (2016). https://doi.org/10.1007/s10489-016-0766-2
Xylogiannopoulos, K.F.: Data structures, algorithms and applications for big data analytics: single, multiple and all repeated patterns detection in discrete sequences. Unpublished PhD thesis, University of Calgary (2017)
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Analyzing very large time series using suffix arrays. Appl. Intell. 41(3), 941–955 (2014). https://doi.org/10.1007/s10489-014-0553-x
Apostolico, A., Preparata, F.P.: Optimal off-line detection of repetitions in a string. Theor. Comput. Sci. 22, 297–315 (1983)
Weiner, P.: Linear pattern matching algorithms. In: SWAT '73: Proceedings of the 14th Annual Symposium on Switching and Automata Theory (SWAT 1973), pp. 1–11 (1973)
Guo, D., Hu, X., Xie, F., Wu, X.: Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph. Appl. Intell. 39, 57–74 (2013)
Wu, Y., Wang, L., Ren, J., Ding, W., Wu, X.: Mining sequential patterns with periodic wildcards. Appl. Intell. 41, 99–116 (2014)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. In: Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 319–327 (1990)
Franek, F., Smyth, W.F., Tang, Y.: Computing all repeats using suffix arrays. J. Autom. Lang. Comb. 8(4), 579–591 (2003)
Puglisi, S.J., Smyth, W.F., Yusufu, M.: Fast optimal algorithms for computing all the repeats in a string. In: Proceedings of PSC, pp. 161–169 (2008)
Cormode, G., Hadjieleftheriou, M.: Methods for finding frequent items in data streams. VLDB J. 19(1), 3–20 (2009). https://doi.org/10.1007/s00778-009-0172-z
Boyer, R.S., Moore, J.: A fast majority vote algorithm. Technical Report ICSCA-CMP-32, Institute for Computer Science, University of Texas (1981)
Demaine, E., López-Ortiz, A., Munro, J.I.: Frequency estimation of internet packet streams with limited space. In: European Symposium on Algorithms (ESA) (2002)
Karp, R., Papadimitriou, C., Shenker, S.: A simple algorithm for finding frequent elements in sets and bags. ACM Trans. Database Syst. 28, 51–55 (2003)
Manku, G., Motwani, R.: Approximate frequency counts over data streams. In: International Conference on Very Large Data Bases, pp. 346–357 (2002)
Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient computation of frequent and top-k elements in data streams. In: International Conference on Database Theory (2005)
Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: ACM SIGMOD International Conference on Management of Data (2001)
Shrivastava, N., Buragohain, C., Agrawal, D., Suri, S.: Medians and beyond: new aggregation techniques for sensor networks. In: Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, pp. 239–249. ACM (2004)
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 137–147 (1999)
Cormode, G., Muthukrishnan, S.: An improved data stream summary: the count-min sketch and its applications. J. Algorithms 55(1), 58–75 (2005)
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Sequential all frequent itemsets detection – a method to detect all frequent sequential itemsets using LERP–reduced suffix array data structure and ARPaD algorithm. In: Proceedings of International Conference on Advances in Social Networks Analysis and Mining, pp. 1141–1148 (2015). https://doi.org/10.1145/2808797.2809301
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Real time early warning DDoS attack detection. In: Proceedings of the 11th International Conference on Cyber Warfare and Security, pp. 344–351 (2016)
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Pattern detection and analysis in financial time series using suffix arrays. In: Doumpos, M., Zopounidis, C., Pardalos, P.M. (eds.) Financial Decision Making Using Computational Intelligence, pp. 129–157 (2012). https://doi.org/10.1007/978-1-4614-3773-4_5
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R.: Frequent and non-frequent pattern detection in big data streams: an experimental simulation in 1 trillion data points. In: Advances in Social Networks Analysis and Mining (ASONAM), pp. 931–938 (2016). https://doi.org/10.1109/ASONAM.2016.7752351
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Xylogiannopoulos, K.F., Karampelas, P., Alhajj, R. (2018). Dynamic Pattern Detection for Big Data Stream Analytics. In: Kaya, M., Kawash, J., Khoury, S., Day, MY. (eds) Social Network Based Big Data Analysis and Applications. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-78196-9_9
DOI: https://doi.org/10.1007/978-3-319-78196-9_9
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78195-2
Online ISBN: 978-3-319-78196-9
eBook Packages: Social Sciences, Social Sciences (R0)