DOI: 10.1145/3581783.3613828
research-article

GraphMedia: Communication-balanced Graph Searching for Billion-scale Social Media Access

Published: 27 October 2023

Abstract

Graphs have recently enabled substantial advances in big data analysis. As graph sizes grow from the billion to the trillion scale, efficient graph processing requires large-scale distributed clusters with up to thousands of nodes. In such big data applications the computation is relatively simple, while communication, especially imbalanced communication, is the bottleneck on distributed clusters, where huge numbers of small messages are transferred through 2D-topology networks. Graph partitioning is the dominant factor affecting the performance of large-scale distributed graph processing. Current graph partitioning policies pay extensive attention to exploiting the power law of big graphs but fail to exploit the architectural benefits of 2D topology. To address this problem, this paper presents GraphMedia, a communication-balanced graph partitioning scheme for distributed search at scale. The key idea of GraphMedia is to balance communication through hardware/software co-design: the power law of graphs is exploited to even out communication among nodes, and communication is balanced between rows and columns by leveraging knowledge of the 2D topology. We use both benchmarks and real-world graphs to validate GraphMedia. Specifically, GraphMedia-based Graph500 tests on the Tianhe supercomputer outperform the fastest systems in the latest Graph500 lists (June 2022). Finally, we apply GraphMedia to real-world graphs for online graph media access, where it outperforms state-of-the-art graph partitioning and graph systems by orders of magnitude.
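To make the notion of row and column communication balance concrete, here is a minimal Python sketch. It is not GraphMedia's algorithm, which the abstract does not specify. It assumes a generic 2D edge partitioning in which a directed edge (u, v) is placed on cell (u mod R, v mod C) of an R x C process grid, and it uses the per-row and per-column edge counts as a rough stand-in for communication volume. The function names `partition_2d`, `imbalance`, and `row_col_imbalance` are hypothetical.

```python
def partition_2d(edges, rows, cols):
    """Place each directed edge (u, v) on grid cell (u % rows, v % cols).

    Returns a rows x cols table of edge counts. This modulo placement is an
    illustrative assumption, not the policy used by GraphMedia.
    """
    load = [[0] * cols for _ in range(rows)]
    for u, v in edges:
        load[u % rows][v % cols] += 1
    return load


def imbalance(values):
    """Ratio of the maximum to the mean; 1.0 means perfectly balanced."""
    values = list(values)
    mean = sum(values) / len(values)
    return max(values) / mean if mean else 1.0


def row_col_imbalance(load):
    """Return (row imbalance, column imbalance) for a 2D load table."""
    row_totals = [sum(row) for row in load]
    col_totals = [sum(col) for col in zip(*load)]
    return imbalance(row_totals), imbalance(col_totals)


if __name__ == "__main__":
    # A star graph: one high-degree hub (vertex 0) pointing at 8 leaves,
    # a tiny stand-in for the skew of a power-law graph.
    star = [(0, v) for v in range(1, 9)]
    table = partition_2d(star, rows=2, cols=2)
    print(table)
    print(row_col_imbalance(table))
```

On the star graph every edge starts at the hub, so all edges land in a single grid row while the columns stay even. This is the kind of skew between rows and columns that a communication-balanced partitioning aims to remove.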


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. communication balance
    2. graph partitioning
    3. graph search
    4. graph500
    5. social media

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
