research-article

Mining Profitable and Concise Patterns in Large-Scale Internet of Things Environments

Published: 01 January 2021

Abstract

In recent years, high-utility itemset mining (HUIM) has been investigated extensively and applied in many domains, especially market-basket analysis and related applications. Since current market-basket scenarios also involve IoT equipment, such as sensors or smart devices, to collect information, it is necessary to consider the mining of high-utility itemsets (HUIs) in large-scale databases, especially in IoT settings. In this work, a GA-based MapReduce model called GMR-Miner is presented for mining closed high-utility patterns in large-scale databases. The k-means model is first adopted to group transactions according to their correlation, based on the frequency factor. A genetic algorithm (GA) is then utilized in the developed MapReduce framework to explore promising candidates within a limited time. In addition, the developed 3-tier MapReduce model can be easily deployed in Spark to handle large-scale databases for knowledge discovery of closed high-utility patterns. Extensive experiments were conducted to evaluate GMR-Miner against the well-known, state-of-the-art CLS-Miner. The in-depth results show that GMR-Miner outperforms CLS-Miner on several criteria, namely memory usage, scalability, and runtime.
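To make the target pattern type concrete, the following is a minimal sketch of what a closed high-utility itemset is, assuming the definitions commonly used in the HUIM literature: the utility of an itemset is the sum of quantity times unit profit over every transaction that contains it, and an itemset is closed when no proper superset has the same support. This is not GMR-Miner: it enumerates every candidate by brute force, with no k-means grouping, genetic search, or MapReduce stage. The toy transactions, unit profits, threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its purchased quantity.
TRANSACTIONS = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "b": 1},
    {"b": 3, "c": 2},
    {"a": 1, "b": 1, "c": 1},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"a": 5, "b": 2, "c": 1}


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


def utility(itemset, transactions, profit):
    """Sum of quantity * unit profit over all transactions containing the itemset."""
    return sum(
        t[item] * profit[item]
        for t in transactions
        if itemset.issubset(t)
        for item in itemset
    )


def closed_high_utility_itemsets(transactions, profit, min_util):
    """Brute-force enumeration; only feasible for a handful of distinct items."""
    items = sorted(profit)
    candidates = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    sup = {x: support(x, transactions) for x in candidates}
    result = {}
    for x in candidates:
        if sup[x] == 0:
            continue  # the itemset never occurs
        u = utility(x, transactions, profit)
        if u < min_util:
            continue  # not a high-utility itemset
        # Closed: every proper superset occurs in strictly fewer transactions.
        if all(sup[y] < sup[x] for y in candidates if x < y):
            result[x] = u
    return result


if __name__ == "__main__":
    found = closed_high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, min_util=15)
    for itemset, u in found.items():
        print(sorted(itemset), u)
```

The exhaustive loop visits all 2^n - 1 non-empty itemsets over n items, which is the cost that motivates heuristic and distributed approaches on large-scale databases.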

Published In

Wireless Communications & Mobile Computing, Volume 2021, Issue 2021
14355 pages
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd., United Kingdom
