Showing 1–50 of 60 results for author: Besta, M

  1. arXiv:2406.12841  [pdf, other]

    cs.LG cs.AI cs.SI

    Demystifying Higher-Order Graph Neural Networks

    Authors: Maciej Besta, Florian Scheidl, Lukas Gianinazzi, Shachar Klaiman, Jürgen Müller, Torsten Hoefler

    Abstract: Higher-order graph neural networks (HOGNNs) are an important class of GNN models that harness polyadic relations between vertices beyond plain edges. They have been used to eliminate issues such as over-smoothing or over-squashing, to significantly enhance the accuracy of GNN predictions, to improve the expressiveness of GNN architectures, and for numerous other goals. A plethora of HOGNN models h…

    Submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.05085  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

    Authors: Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler

    Abstract: Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embed…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2406.02524  [pdf, other]

    cs.CL

    CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks

    Authors: Maciej Besta, Lorenzo Paleari, Ales Kubicek, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, Hubert Niewiadomski, Torsten Hoefler

    Abstract: Large Language Models (LLMs) are revolutionizing various domains, yet verifying their answers remains a significant challenge, especially for intricate open-ended tasks such as consolidation, summarization, and extraction of knowledge. In this work, we propose CheckEmbed: an accurate, scalable, and simple LLM verification approach. CheckEmbed is driven by a straightforward yet powerful idea: in or…

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  4. arXiv:2404.12953  [pdf, ps, other]

    cs.DC cs.DS

    Low-Depth Spatial Tree Algorithms

    Authors: Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski

    Abstract: Contemporary accelerator designs exhibit a high degree of spatial localization, wherein two-dimensional physical distance determines communication costs between processing elements. This situation presents considerable algorithmic challenges, particularly when managing sparse data, a pivotal component in progressing data science. The spatial computer model quantifies communication locality by weig…

    Submitted 7 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: to appear at IPDPS 2024

    ACM Class: F.2.2

  5. Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication

    Authors: Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboos, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler

    Abstract: We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single compute node, data transfer becomes a bottleneck. An approach based on dense matrix multiplication algorithms leads to suboptimal scalability and fails to exploit th…

    Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    ACM Class: F.2.1

    Journal ref: PPoPP'24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (2024) 404-416

  6. arXiv:2401.14295  [pdf, other]

    cs.CL cs.AI cs.LG

    Demystifying Chains, Trees, and Graphs of Thoughts

    Authors: Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler

    Abstract: The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language models' (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the…

    Submitted 5 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  7. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  8. arXiv:2311.18526  [pdf, other]

    cs.LG cs.SI

    HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers

    Authors: Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler

    Abstract: Many graph representation learning (GRL) problems are dynamic, with millions of edges added or removed per second. A fundamental workload in this setting is dynamic link prediction: using a history of graph updates to predict whether a given pair of vertices will become connected. Recent schemes for link prediction in such dynamic settings employ Transformers, modeling individual graph updates as…

    Submitted 13 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Journal ref: Proceedings of Learning on Graphs (LOG), 2023

  9. arXiv:2311.06081  [pdf, other]

    cs.AR

    RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures

    Authors: Patrick Iff, Benigna Bruggmann, Maciej Besta, Luca Benini, Torsten Hoefler

    Abstract: Chiplet architectures are a promising paradigm to overcome the scaling challenges of monolithic chips. Chiplets offer heterogeneity, modularity, and cost-effectiveness. The design space of chiplet architectures is huge as there are many degrees of freedom such as the number, size and placement of chiplets, the topology of the inter-chiplet interconnect and many more. Existing tools for cost and pe…

    Submitted 10 November, 2023; originally announced November 2023.

  10. arXiv:2310.03742  [pdf, other]

    cs.NI

    A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network

    Authors: Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler

    Abstract: Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully a…

    Submitted 21 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Journal ref: Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24) Santa Clara, CA, USA April 16-18, 2024

  11. arXiv:2308.12093  [pdf, other]

    cs.LG cs.PF

    Cached Operator Reordering: A Unified View for Fast GNN Training

    Authors: Julia Bazinska, Andrei Ivanov, Tal Ben-Nun, Nikoli Dryden, Maciej Besta, Siyuan Shen, Torsten Hoefler

    Abstract: Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering. However, the sparse nature of GNN computation poses new challenges for performance optimization compared to traditional deep neural networks. We address these challenges by providing a unified view of GNN computation, I/O, and m…

    Submitted 23 August, 2023; originally announced August 2023.

  12. Graph of Thoughts: Solving Elaborate Problems with Large Language Models

    Authors: Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler

    Abstract: We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges co…

    Submitted 6 February, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 2024 (AAAI'24)

  13. The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores

    Authors: Maciej Besta, Robert Gerstenberger, Marc Fischer, Michał Podstawski, Nils Blach, Berke Egeli, Georgy Mitenkov, Wojciech Chlapek, Marek Michalewicz, Hubert Niewiadomski, Jürgen Müller, Torsten Hoefler

    Abstract: Graph databases (GDBs) are crucial in academic and industry applications. The key challenges in developing GDBs are achieving high performance, scalability, programmability, and portability. To tackle these challenges, we harness established practices from the HPC landscape to build a system that outperforms all past GDBs presented in the literature by orders of magnitude, for both OLTP and OLAP w…

    Submitted 20 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Best Paper Finalist at ACM Supercomputing '23 (SC '23)

    Journal ref: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2023 (SC '23)

  14. arXiv:2302.07217  [pdf, other]

    cs.NI cs.DC math.CO

    PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks

    Authors: Kartik Lakhotia, Laura Monroe, Kelly Isham, Maciej Besta, Nils Blach, Torsten Hoefler, Fabrizio Petrini

    Abstract: In this paper, we present PolarStar, a novel family of diameter-3 network topologies derived from the star product of two low-diameter factor graphs. The proposed PolarStar construction gives the largest known diameter-3 network topologies for almost all radixes. When compared to state-of-the-art diameter-3 networks, PolarStar achieves 31% geometric mean increase in scale over Bundlefly, 91% over…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 13 pages, 13 figures, 4 tables

    ACM Class: B.4.3; B.4.4; G.2.2

  15. arXiv:2211.13989  [pdf, other]

    cs.AR cs.DC cs.NI

    HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

    Authors: Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

    Abstract: 2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of providing high-performance inter-chiplet interconnects (ICIs). As the number of chiplets grows to tens or hundreds, it becomes infeasible to hand-optimize their arrangement in a way that maximizes the ICI performance. In this paper, we propose HexaM…

    Submitted 8 October, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

  16. arXiv:2211.13980  [pdf, other]

    cs.AR cs.DC cs.NI

    Sparse Hamming Graph: A Customizable Network-on-Chip Topology

    Authors: Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

    Abstract: Chips with hundreds to thousands of cores require scalable networks-on-chip (NoCs). Customization of the NoC topology is necessary to reach the diverse design goals of different chips. We introduce sparse Hamming graph, a novel NoC topology with an adjustable cost-performance trade-off that is based on four NoC topology design principles we identified. To efficiently customize this topology, we dev…

    Submitted 28 June, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

  17. arXiv:2209.09732  [pdf, other]

    cs.LG cs.DB

    Neural Graph Databases

    Authors: Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler

    Abstract: Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large significance of GDBs in both academia and industry, little effort has been made into integrating them with the predictive power of graph neural networks (GNNs). In this work, we show how to seamlessly combine nearly any GNN model with the computational capabiliti…

    Submitted 24 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

    Journal ref: Learning on Graphs (LOG) 2022

  18. arXiv:2208.11469  [pdf, other]

    cs.DC cs.DS

    ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations

    Authors: Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tětek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler

    Abstract: Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as…

    Submitted 21 November, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

    Comments: Best Paper Award at ACM/IEEE Supercomputing'22 (SC22)

    Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022

  19. PolarFly: A Cost-Effective and Flexible Low-Diameter Topology

    Authors: Kartik Lakhotia, Maciej Besta, Laura Monroe, Kelly Isham, Patrick Iff, Torsten Hoefler, Fabrizio Petrini

    Abstract: In this paper we present PolarFly, a diameter-2 network topology based on the Erdos-Renyi family of polarity graphs from finite geometry. This is a highly scalable low-diameter topology that asymptotically reaches the Moore bound on the number of nodes for a given network degree and diameter. PolarFly achieves high Moore bound efficiency even for the moderate radixes commonly seen in current and…

    Submitted 2 May, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: In Proceedings of International Conference for High Performance Computing, Networking, Storage, and Analysis (SC) 2022

    ACM Class: B.4.3; B.4.4

  20. arXiv:2206.10007  [pdf, other]

    cs.NI

    Building Blocks for Network-Accelerated Distributed File Systems

    Authors: Salvatore Di Girolamo, Daniele De Sensi, Konstantin Taranov, Milos Malesevic, Maciej Besta, Timo Schneider, Severin Kistler, Torsten Hoefler

    Abstract: High-performance clusters and datacenters pose increasingly demanding requirements on storage systems. If these systems do not operate at scale, applications are doomed to become I/O bound and waste compute cycles. To accelerate the data path to remote storage nodes, remote direct memory access (RDMA) has been embraced by storage systems to let data flow from the network to storage targets, reduci…

    Submitted 20 June, 2022; originally announced June 2022.

  21. arXiv:2205.09702  [pdf, other]

    cs.LG cs.AR cs.DC

    Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis

    Authors: Maciej Besta, Torsten Hoefler

    Abstract: Graph neural networks (GNNs) are among the most powerful tools in deep learning. They routinely solve complex problems on unstructured networks, such as node classification, graph classification, or link prediction, with high accuracy. However, both inference and training of GNNs are complex, and they uniquely combine the features of irregular graph processing with dense and regular computations.…

    Submitted 17 August, 2023; v1 submitted 19 May, 2022; originally announced May 2022.

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

  22. arXiv:2205.04934  [pdf, other]

    cs.DS cs.DC

    The spatial computer: A model for energy-efficient parallel computation

    Authors: Lukas Gianinazzi, Tal Ben-Nun, Maciej Besta, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler

    Abstract: We present a new parallel model of computation suitable for spatial architectures, for which the energy used for communication heavily depends on the distance of the communicating processors. In our model, processors have locations on a conceptual two-dimensional grid, and their distance therein determines their communication cost. In particular, we introduce the energy cost of a spatial computati…

    Submitted 17 January, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    ACM Class: F.2.0

  23. arXiv:2202.13976  [pdf, other]

    cs.DC

    Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching

    Authors: András Strausz, Flavio Vella, Salvatore Di Girolamo, Maciej Besta, Torsten Hoefler

    Abstract: Triangle count and local clustering coefficient are two core metrics for graph analysis. They find broad application in analyses such as community detection and link recommendation. Current state-of-the-art solutions suffer from synchronization overheads or expensive pre-computations needed to distribute the graph, achieving limited scaling capabilities. We propose a fully asynchronous implementat…

    Submitted 1 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 11 pages, 10 figures, to be published at IPDPS'22

  24. Parallel Algorithms for Finding Large Cliques in Sparse Graphs

    Authors: Lukas Gianinazzi, Maciej Besta, Yannick Schaffner, Torsten Hoefler

    Abstract: We present a parallel k-clique listing algorithm with improved work bounds (for the same depth) in sparse graphs with low degeneracy or arboricity. We achieve this by introducing and analyzing a new pruning criterion for a backtracking search. Our algorithm has better asymptotic performance, especially for larger cliques (when k is not constant), where we avoid the straightforwardly exponential ru…

    Submitted 20 September, 2021; originally announced September 2021.

    ACM Class: F.2.2

    Journal ref: SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, Virtual Event, USA, 6-8 July, 2021, 243-253

  25. arXiv:2108.09337  [pdf, other]

    cs.DC cs.CC cs.PF

    On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations

    Authors: Grzegorz Kwasniewski, Marko Kabić, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler

    Abstract: Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving p…

    Submitted 25 April, 2023; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: 15 pages (including references), 11 figures. arXiv admin note: substantial text overlap with arXiv:2010.05975

    Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, November 2021 (SC'21)

  26. arXiv:2106.03594  [pdf, other]

    cs.LG

    Learning Combinatorial Node Labeling Algorithms

    Authors: Lukas Gianinazzi, Maximilian Fries, Nikoli Dryden, Tal Ben-Nun, Maciej Besta, Torsten Hoefler

    Abstract: We present a novel neural architecture to solve graph optimization problems where the solution consists of arbitrary node labels, allowing us to solve hard problems like graph coloring. We train our model using reinforcement learning, specifically policy gradients, which gives us both a greedy and a probabilistic policy. Our architecture builds on a graph attention network and uses several inducti…

    Submitted 10 May, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

    ACM Class: I.2.2; I.2.8

  27. arXiv:2106.00761  [pdf, other]

    cs.SI cs.LG

    Motif Prediction with Graph Neural Networks

    Authors: Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler

    Abstract: Link prediction is one of the central problems in graph mining. However, recent studies highlight the importance of higher-order network analysis, where complex structures called motifs are the first-class citizens. We first show that existing link prediction schemes fail to effectively predict motifs. To alleviate this, we establish a general motif prediction problem and we propose several heuris…

    Submitted 21 May, 2022; v1 submitted 26 May, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 28th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), 2022

  28. arXiv:2105.12663  [pdf, other]

    cs.NI cs.DC cs.PF

    Towards Million-Server Network Simulations on Just a Laptop

    Authors: Maciej Besta, Marcel Schneider, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler

    Abstract: The growing size of data center and HPC networks poses unprecedented requirements on the scalability of simulation infrastructure. The ability to simulate such large-scale interconnects on a simple PC would facilitate research efforts. Unfortunately, as we first show in this work, existing shared-memory packet-level simulators do not scale to the sizes of the largest networks considered today. We t…

    Submitted 26 May, 2021; originally announced May 2021.

  29. Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs

    Authors: Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler

    Abstract: Determining I/O lower bounds is a crucial step in obtaining communication-efficient parallel algorithms, both across the memory hierarchy and between processors. Current approaches either study specific algorithms individually, disallow programmatic motifs such as recomputation, or produce asymptotic bounds that exclude important constants. We propose a novel approach for obtaining precise I/O low…

    Submitted 15 May, 2021; originally announced May 2021.

    Comments: 13 pages, 4 figures, published at Proceedings of the 33rd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'21)

  30. arXiv:2104.07582  [pdf, other]

    cs.AR cs.DC cs.DS cs.PF

    SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems

    Authors: Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez Luna, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Marek Konieczny, Nils Blach, Onur Mutlu, Torsten Hoefler

    Abstract: Simple graph algorithms such as PageRank have been the target of numerous hardware accelerators. Yet, there also exist much more complex graph mining algorithms for problems such as clustering or maximal clique listing. These algorithms are memory-bound and thus could be accelerated by hardware techniques such as Processing-in-Memory (PIM). However, they also come with nonstraightforward paralleli…

    Submitted 25 October, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Proceedings of the 54th IEEE/ACM International Symposium on Microarchitecture (MICRO'21), 2021

  31. arXiv:2103.03653  [pdf, other]

    cs.DC cs.CV cs.DS cs.MS cs.PF

    GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

    Authors: Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler

    Abstract: We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of dif…

    Submitted 5 March, 2021; originally announced March 2021.

    Journal ref: International Conference on Very Large Data Bases (VLDB), 2021

  32. arXiv:2012.14132  [pdf, other]

    cs.DC

    SeBS: A Serverless Benchmark Suite for Function-as-a-Service Computing

    Authors: Marcin Copik, Grzegorz Kwasniewski, Maciej Besta, Michal Podstawski, Torsten Hoefler

    Abstract: Function-as-a-Service (FaaS) is one of the most promising directions for the future of cloud services, and serverless functions have immediately become a new middleware for building scalable and cost-efficient microservices and applications. However, the quickly moving technology hinders reproducibility, and the lack of a standardized benchmarking suite leads to ad-hoc solutions and microbenchmark…

    Submitted 1 July, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: Extended version of the paper accepted at Middleware 2021

  33. arXiv:2012.06171  [pdf, other]

    cs.DC cs.DB

    The Future is Big Graphs! A Community View on Graph Processing Systems

    Authors: Sherif Sakr, Angela Bonifati, Hannes Voigt, Alexandru Iosup, Khaled Ammar, Renzo Angles, Walid Aref, Marcelo Arenas, Maciej Besta, Peter A. Boncz, Khuzaima Daudjee, Emanuele Della Valle, Stefania Dumbrava, Olaf Hartig, Bernhard Haslhofer, Tim Hegeman, Jan Hidders, Katja Hose, Adriana Iamnitchi, Vasiliki Kalavri, Hugo Kapp, Wim Martens, M. Tamer Özsu, Eric Peukert, Stefan Plantikow, et al. (16 additional authors not shown)

    Abstract: Graphs are by nature unifying abstractions that can leverage interconnectedness to represent, explore, predict, and explain real- and digital-world phenomena. Although real users and consumers of graph instances and graph workloads understand these abstractions, future problems will require new abstractions and systems. What needs to happen in the next decade for big graph processing to continue t…

    Submitted 11 December, 2020; originally announced December 2020.

    Comments: 12 pages, 3 figures, collaboration between the large-scale systems and data management communities, work started at the Dagstuhl Seminar 19491 on Big Graph Processing Systems, to be published in the Communications of the ACM

    ACM Class: C.3; E.0; H.2; J.0

  34. arXiv:2010.16012  [pdf, other]

    cs.DC cs.DS

    To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations

    Authors: Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, Torsten Hoefler

    Abstract: We reduce the cost of communication and synchronization in graph processing by analyzing the fastest way to process graphs: pushing the updates to a shared state or pulling the updates to a private state. We investigate the applicability of this push-pull dichotomy to various algorithms and its impact on complexity, performance, and the amount of used locks, atomics, and reads/writes. We consider 1…

    Submitted 29 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 26th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'17), 2017

  35. arXiv:2010.15879  [pdf, other]

    cs.DS cs.DB cs.DC cs.IR

    Log(Graph): A Near-Optimal High-Performance Graph Representation

    Authors: Maciej Besta, Dimitri Stanojevic, Tijana Zivic, Jagpreet Singh, Maurice Hoerold, Torsten Hoefler

    Abstract: Today's graphs used in domains such as machine learning or social network analysis may contain hundreds of billions of edges. Yet, they are not necessarily stored efficiently, and standard graph representations such as adjacency lists waste a significant number of bits while graph compression schemes such as WebGraph often require time-consuming decompression. To address this, we propose Log(Graph…

    Submitted 29 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 27th International Conference on Parallel Architectures and Compilation (PACT'18), 2018

  36. arXiv:2010.14684  [pdf, other]

    cs.DC cs.AR cs.DS

    Substream-Centric Maximum Matchings on FPGA

    Authors: Maciej Besta, Marc Fischer, Tal Ben-Nun, Dimitri Stanojevic, Johannes De Fine Licht, Torsten Hoefler

    Abstract: Developing high-performance and energy-efficient algorithms for maximum matchings is becoming increasingly important in social network analysis, computational sciences, scheduling, and others. In this work, we propose the first maximum matching algorithm designed for FPGAs; it is energy-efficient and has provable guarantees on accuracy, performance, and storage utilization. To achieve this, we for…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Best Paper finalist at ACM FPGA'19, invited to special issue of ACM TRETS'20

    Journal ref: ACM Transactions on Reconfigurable Technology and Systems (TRETS), 2020; Proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019

  37. arXiv:2010.10683  [pdf, other]

    cs.AR cs.DC cs.NI

    Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability

    Authors: Maciej Besta, Syed Minhaj Hassan, Sudhakar Yalamanchili, Rachata Ausavarungnirun, Onur Mutlu, Torsten Hoefler

    Abstract: Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from graph and number theory, degree-diameter graphs combined…

    Submitted 20 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 23rd ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'18), 2018

  38. arXiv:2010.09913  [pdf, other]

    cs.DC cs.DS

    SlimSell: A Vectorizable Graph Representation for Breadth-First Search

    Authors: Maciej Besta, Florian Marending, Edgar Solomonik, Torsten Hoefler

    Abstract: Vectorization and GPUs will profoundly change graph processing. Traditional graph algorithms tuned for 32- or 64-bit based memory accesses will be inefficient on architectures with 512-bit wide (or larger) instruction units that are already present in the Intel Knights Landing (KNL) manycore CPU. Anticipating this shift, we propose SlimSell: a vectorizable graph representation to accelerate Breadt…

    Submitted 21 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS'17), 2017

  39. arXiv:2010.09854  [pdf, other]

    cs.DC

    High-Performance Distributed RMA Locks

    Authors: Patrick Schmid, Maciej Besta, Torsten Hoefler

    Abstract: We propose a topology-aware distributed Reader-Writer lock that accelerates irregular workloads for supercomputers and data centers. The core idea behind the lock is a modular design that is an interplay of three distributed data structures: a counter of readers/writers in the critical section, a set of queues for ordering writers waiting for the lock, and a tree that binds all the queues and sync…

    Submitted 23 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: Best Paper Award at ACM HPDC'16 (1/129)

    Journal ref: Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing (ACM HPDC'16), 2016

  40. arXiv:2010.09852  [pdf, other]

    cs.DC cs.AR cs.DS cs.PF

    Evaluating the Cost of Atomic Operations on Modern Architectures

    Authors: Hermann Schweizer, Maciej Besta, Torsten Hoefler

    Abstract: Atomic operations (atomics) such as Compare-and-Swap (CAS) or Fetch-and-Add (FAA) are ubiquitous in parallel programming. Yet, performance tradeoffs between these operations and various characteristics of such systems, such as the structure of caches, are unclear and have not been thoroughly analyzed. In this paper we establish an evaluation methodology, develop a performance model, and present a…

    Submitted 19 October, 2020; originally announced October 2020.

    Journal ref: Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques (PACT'15), 2015

  41. arXiv:2010.09135  [pdf, other]

    cs.DC

    Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages

    Authors: Maciej Besta, Torsten Hoefler

    Abstract: We propose Atomic Active Messages (AAM), a mechanism that accelerates irregular graph computations on both shared- and distributed-memory machines. The key idea behind AAM is that hardware transactional memory (HTM) can be used for simple and efficient processing of irregular structures in highly parallel environments. We illustrate techniques such as coarsening and coalescing that enable hardware…

    Submitted 29 October, 2020; v1 submitted 18 October, 2020; originally announced October 2020.

    Comments: Best Paper Award at ACM HPDC'15 (1/116)

    Journal ref: Proceedings of the 24th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'15), 2015

  42. arXiv:2010.09025  [pdf, other]

    cs.DC

    Fault Tolerance for Remote Memory Access Programming Models

    Authors: Maciej Besta, Torsten Hoefler

    Abstract: Remote Memory Access (RMA) is an emerging mechanism for programming high-performance computers and datacenters. However, little work exists on resilience schemes for RMA-based applications and systems. In this paper we analyze fault tolerance for RMA and show that it is fundamentally different from resilience mechanisms targeting the message passing (MP) model. We design a model for reasoning abou…

    Submitted 18 October, 2020; originally announced October 2020.

    Comments: Best Paper Finalist (3/130)

    Journal ref: Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'14), 2014

  43. arXiv:2010.05975  [pdf, other]

    cs.DC

    On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization

    Authors: Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler

    Abstract: Dense linear algebra kernels, such as linear solvers or tensor contractions, are fundamental components of many scientific computing applications. In this work, we present a novel method of deriving parallel I/O lower bounds for this broad family of programs. Based on the X-partitioning abstraction, our method explicitly captures inter-statement dependencies. Applying our analysis to LU factorizat…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: 13 pages without references, 12 figures, submitted to PPoPP 2021: 26th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

  44. arXiv:2008.11321  [pdf, other]

    cs.DS cs.DC cs.PF

    High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality

    Authors: Maciej Besta, Armon Carigiet, Zur Vonarburg-Shmaria, Kacper Janda, Lukas Gianinazzi, Torsten Hoefler

    Abstract: We develop the first parallel graph coloring heuristics with strong theoretical guarantees on work and depth and coloring quality. The key idea is to design a relaxation of the vertex degeneracy order, a well-known graph theory concept, and to color vertices in the order dictated by this relaxation. This introduces a tunable amount of parallelism into the degeneracy ordering that is otherwise hard…

    Submitted 11 November, 2020; v1 submitted 25 August, 2020; originally announced August 2020.

    Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis (SC20), November 2020

  45. arXiv:2007.03776  [pdf, other]

    cs.NI cs.DC cs.PF

    High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks

    Authors: Maciej Besta, Jens Domke, Marcel Schneider, Marek Konieczny, Salvatore Di Girolamo, Timo Schneider, Ankit Singla, Torsten Hoefler

    Abstract: The recent line of research into topology design focuses on lowering network diameter. Many low-diameter topologies such as Slim Fly or Jellyfish that substantially reduce cost, power consumption, and latency have been proposed. A key challenge in realizing the benefits of these topologies is routing. On one hand, these networks provide shorter path lengths than established topologies such as Clos…

    Submitted 29 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021

  46. Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided

    Authors: Robert Gerstenberger, Maciej Besta, Torsten Hoefler

    Abstract: Modern interconnects offer remote direct memory access (RDMA) features. Yet, most applications rely on explicit message passing for communications despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice. In this work, we develop scalable bufferless pr…

    Submitted 30 June, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: Best Paper Award at ACM/IEEE Supercomputing'13 (1/92), also Best Student Paper finalist (8/92); source code of foMPI can be downloaded from http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI

    ACM Class: C.5.1; J.2

    Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, pages 53:1–53:12, November 2013

  47. arXiv:1912.12740  [pdf, other]

    cs.DC cs.DB cs.DS cs.PF

    Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems

    Authors: Maciej Besta, Marc Fischer, Vasiliki Kalavri, Michael Kapralov, Torsten Hoefler

    Abstract: Graph processing has become an important part of various areas of computing, including machine learning, medical applications, social network analysis, computational sciences, and others. A growing amount of the associated graph processing workloads are dynamic, with millions of edges added or removed per second. Graph streaming frameworks are specifically crafted to enable the processing of such…

    Submitted 27 October, 2021; v1 submitted 29 December, 2019; originally announced December 2019.

    Journal ref: IEEE Transactions on Parallel and Distributed Systems (TPDS), 2022

  48. arXiv:1912.08968  [pdf, ps, other]

    cs.NI

    Slim Fly: A Cost Effective Low-Diameter Network Topology

    Authors: Maciej Besta, Torsten Hoefler

    Abstract: We introduce a high-performance cost-effective network topology called Slim Fly that approaches the theoretically optimal network diameter. Slim Fly is based on graphs that approximate the solution to the degree-diameter problem. We analyze Slim Fly and compare it to both traditional and state-of-the-art networks. Our analysis shows that Slim Fly has significant advantages over other topologies in…

    Submitted 30 June, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Best Student Paper Award at ACM/IEEE Supercomputing'14 (1/82)

    Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, November 2014

  49. arXiv:1912.08950  [pdf, other]

    cs.DS cs.DC cs.PF

    Slim Graph: Practical Lossy Graph Compression for Approximate Graph Processing, Storage, and Analytics

    Authors: Maciej Besta, Simon Weber, Lukas Gianinazzi, Robert Gerstenberger, Andrey Ivanov, Yishai Oltchik, Torsten Hoefler

    Abstract: We propose Slim Graph: the first programming model and framework for practical lossy graph compression that facilitates high-performance approximate graph processing, storage, and analytics. Slim Graph enables the developer to express numerous compression schemes using small and programmable compression kernels that can access and modify local parts of input graphs. Such kernels are executed in pa…

    Submitted 3 August, 2021; v1 submitted 18 December, 2019; originally announced December 2019.

    Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis (SC19), November 2019. Best Paper Finalist, Best Student Paper Finalist

  50. arXiv:1911.04200  [pdf, other]

    cs.CE cs.DC cs.PF q-bio.GN

    Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

    Authors: Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik

    Abstract: The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information retrieval, and many other areas. We design and implement SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. Our algorithm provides an efficient encoding of th…

    Submitted 11 November, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS'20), 2020