tutorial

A Review on OLAP Technologies Applied to Information Networks

Published: 13 December 2019

Abstract

Many real-world systems produce network data, or highly interconnected data, which can be called information networks. These information networks are a critical component of modern information infrastructure and account for a large volume of graph data. Analyzing information network data draws on several technological areas, among them OLAP. OLAP is a technology that enables multi-dimensional, multi-level analysis of large volumes of data and provides aggregated views of the data from different perspectives. This article presents a literature review of the main applications of OLAP technology to the analysis of information network data. To this end, it conducts a systematic review to identify the works that apply OLAP technologies to graph data. It defines seven comparison criteria (Materialization, Network, Selection, Aggregation, Model, OLAP Operations, Analytics) to classify the works found according to their functionality. The works are analyzed under each criterion and discussed to identify trends and challenges in applying OLAP to information networks.
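To make the idea of multi-dimensional aggregation over graph data more concrete, the following is a minimal sketch of a roll-up on a small attributed network. It is an illustration only and is not taken from the article: the toy data, the dimension names (`country`, `field`), and the function `roll_up` are all hypothetical, and the surveyed systems may define aggregation differently.

```python
from collections import Counter

# Hypothetical toy network: each node carries attribute values (dimensions).
nodes = {
    "a": {"country": "BR", "field": "DB"},
    "b": {"country": "BR", "field": "ML"},
    "c": {"country": "US", "field": "DB"},
    "d": {"country": "US", "field": "DB"},
}
# Undirected edges between node ids.
edges = [("a", "b"), ("a", "c"), ("c", "d"), ("b", "d")]


def roll_up(nodes, edges, dims):
    """Aggregate the network along the chosen dimensions.

    Nodes sharing the same values on `dims` collapse into one group;
    edges are counted between (and within) the resulting groups.
    """
    def key(node_id):
        return tuple(nodes[node_id][d] for d in dims)

    node_counts = Counter(key(n) for n in nodes)
    # Sort the endpoint groups so (X, Y) and (Y, X) count as the same edge.
    edge_counts = Counter(tuple(sorted((key(u), key(v)))) for u, v in edges)
    return node_counts, edge_counts


# View the same network from two perspectives (coarse and fine).
by_country_nodes, by_country_edges = roll_up(nodes, edges, ["country"])
by_both_nodes, by_both_edges = roll_up(nodes, edges, ["country", "field"])
```

Grouping by `country` alone gives the coarser view, while grouping by both `country` and `field` gives a finer one. Moving between such views is, under the assumptions of this sketch, the graph analogue of roll-up and drill-down.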

Information & Contributors

Information

Published In

ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 1
February 2020
325 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3375789
© 2019 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 December 2019
Accepted: 01 October 2019
Revised: 01 October 2019
Received: 01 September 2018
Published in TKDD Volume 14, Issue 1

Author Tags

  1. Graph-database
  2. OLAP
  3. cube
  4. data warehousing
  5. graph analytics
  6. information network

Qualifiers

  • Tutorial
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 95
  • Downloads (last 6 weeks): 16

Reflects downloads up to 11 Jan 2025.

Cited By

  • (2024) Sustainable Information System for Enhancing Virtual Company Resilience Through Machine Learning in Smart City Socio-Economic Scenarios. ECONOMICS 12:2 (69-96). DOI: 10.2478/eoik-2024-0022. Online publication date: 7-Jun-2024
  • (2024) The GraphTempo Framework for Exploring the Evolution of a Graph Through Pattern Aggregation. IEEE Transactions on Knowledge and Data Engineering 36:11 (7143-7156). DOI: 10.1109/TKDE.2024.3410647. Online publication date: Nov-2024
  • (2024) Novel Defense Method of Malicious Code Injection in High Concurrency Database. 2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL) (481-487). DOI: 10.1109/ICSADL61749.2024.00084. Online publication date: 13-Mar-2024
  • (2024) Data Cube Technology for Accessing of Large Database. Proceedings of Fifth International Conference on Computer and Communication Technologies (39-48). DOI: 10.1007/978-981-99-9704-6_4. Online publication date: 14-Feb-2024
  • (2023) Information Technology Job Profile Using Average-Linkage Hierarchical Clustering Analysis. IEEE Access 11 (94647-94663). DOI: 10.1109/ACCESS.2023.3311203. Online publication date: 2023
  • (2023) Multi-dimensional Data Optimal Classification Algorithm for Quality Evaluation of Distance Teaching in Universities. Mobile Networks and Applications 28:3 (889-899). DOI: 10.1007/s11036-023-02186-8. Online publication date: 1-Jun-2023
  • (2023) Raven: Benchmarking Monetary Expense and Query Efficiency of OLAP Engines on the Cloud. Database Systems for Advanced Applications (593-605). DOI: 10.1007/978-3-031-30678-5_45. Online publication date: 14-Apr-2023
  • (2023) Designing Hybrid Storage Architectures with RDBMS and NoSQL Systems: A Survey. International Conference on Advanced Intelligent Systems for Sustainable Development (332-343). DOI: 10.1007/978-3-031-26384-2_29. Online publication date: 10-Jun-2023
  • (2022) Modelo de Data Mart para mejorar la productividad de las empresas privadas, Caso empresa inmobiliaria. EDUCATECONCIENCIA 30:37 (28-43). DOI: 10.58299/edu.v30i37.574. Online publication date: 6-Oct-2022
  • (2022) Sosyal Bilimlerde Veri Madenciliğinin Pazarlama Alanında Kullanımı. Anadolu Üniversitesi Sosyal Bilimler Dergisi 22:Özel Sayı 2 (197-212). DOI: 10.18037/ausbd.1227342. Online publication date: 31-Dec-2022
