
RISC-V for HPC: Where we are and where we need to go

Abstract: Funded by the UK ExCALIBUR H&ES exascale programme, since early 2022 we have provided a RISC-V testbed for HPC to offer free access for scientific software developers to experiment with RISC-V for their workloads. Based upon our experiences of providing access to RISC-V for the HPC community, and our involvement with the RISC-V community at large, in this extended abstract we summarise the current state of RISC-V for HPC and consider the high priority areas that should be addressed to help drive adoption.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Extended abstract accepted to the RISC-V European Summit 2024

arXiv:2406.12394

Performance characterisation of the 64-core SG2042 RISC-V CPU for HPC

Authors: Nick Brown, Maurice Jamieson

Abstract: Whilst RISC-V has grown phenomenally quickly in embedded computing, it is yet to gain significant traction in High Performance Computing (HPC). However, as we move further into the exascale era, the flexibility offered by RISC-V has the potential to be very beneficial in future supercomputers, especially as the community places an increased emphasis on decarbonising its workloads. Sophon's SG2042 is the first mass-produced, commodity-available, high-core-count RISC-V CPU designed for high performance workloads. The SG2042 was first released in summer 2023 and, at the time of writing, is becoming widely available, so a key question is whether it is a realistic proposition for HPC applications. In this paper we use NASA's NAS Parallel Benchmark (NPB) suite to characterise performance of the SG2042 against other CPUs implementing the RISC-V, x86-64, and AArch64 ISAs. We find that the SG2042 consistently outperforms all other RISC-V solutions, delivering between a 2.6 and 16.7 times performance improvement at the single core level. When compared against the x86-64 and AArch64 CPUs, which are commonplace for high performance workloads, we find that the SG2042 performs comparatively well with computationally bound algorithms but decreases in relative performance when the algorithms are memory bandwidth or latency bound. Based on this work, we identify that performance of the SG2042's memory subsystem is the greatest bottleneck.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Preprint of paper submitted to RISC-V for HPC workshop at ISC

arXiv:2406.01943

Enhancing Trust in LLMs: Algorithms for Comparing and Interpreting LLMs

Authors: Nik Bear Brown

Abstract: This paper surveys evaluation techniques to enhance the trustworthiness and understanding of Large Language Models (LLMs). As reliance on LLMs grows, ensuring their reliability, fairness, and transparency is crucial. We explore algorithmic methods and metrics to assess LLM performance, identify weaknesses, and guide development towards more trustworthy applications. Key evaluation metrics include Perplexity Measurement, NLP metrics (BLEU, ROUGE, METEOR, BERTScore, GLEU, Word Error Rate, Character Error Rate), Zero-Shot and Few-Shot Learning Performance, Transfer Learning Evaluation, Adversarial Testing, and Fairness and Bias Evaluation. We introduce innovative approaches like LLMMaps for stratified evaluation, Benchmarking and Leaderboards for competitive assessment, Stratified Analysis for in-depth understanding, Visualization of Bloom's Taxonomy for cognitive level accuracy distribution, Hallucination Score for quantifying inaccuracies, Knowledge Stratification Strategy for hierarchical analysis, and Machine Learning Models for Hierarchy Generation. Human Evaluation is highlighted for capturing nuances that automated metrics may miss. These techniques form a framework for evaluating LLMs, aiming to enhance transparency, guide development, and establish user trust. Future papers will describe metric visualization and demonstrate each approach on practical examples.

Submitted 3 June, 2024; originally announced June 2024.

Comments: An extensive survey of the literature specifying algorithms and techniques enhancing the trustworthiness and understanding of Large Language Models (LLMs)

MSC Class: 2020: 68T50; 68Q25 ACM Class: I.2.7; F.2.2
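Perplexity, the first metric listed in the abstract above, is the exponential of the average negative log-likelihood a model assigns to the tokens of a held-out sequence. A minimal Python sketch, assuming the per-token probabilities have already been obtained from some model (the function name `perplexity` is illustrative and not taken from the paper):

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability the model assigned to each observed token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)

# A model that gives every observed token probability 0.25 is, on average, as
# uncertain as a uniform choice among 4 options, so its perplexity is about 4.
print(perplexity([0.25, 0.25, 0.25]))
```

Lower is better: a model that is certain of every token (probability 1.0) has perplexity 1.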

arXiv:2405.16021

VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Authors: Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson, Nikhil J Joshi, Daniel Lam, Tsang-Wei Edward Lee, Alex Luong, Sharath Maddineni, Harsh Patel, Jodilyn Peralta, Jornell Quiambao, Diego Reyes, Rosario M Jauregui Ruano, Dorsa Sadigh, Pannag Sanketi, Leila Takayama, Pavel Vodenski, Fei Xia

Abstract: Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP) which decides when to seek help from another robot or human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of VADER vs. a robot that did not ask for help. https://google-vader.github.io/

Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures

arXiv:2404.07302

Altruism Improves Congestion in Series-Parallel Nonatomic Congestion Games

Authors: Colton Hill, Philip N. Brown

Abstract: Self-interested routing policies from individual users in a system can collectively lead to poor aggregate congestion in routing networks. The introduction of altruistic agents, whose goal is to benefit other agents in the system, can seemingly improve aggregate congestion. However, it is known that in some network routing problems, altruistic agents can actually worsen congestion compared to that which would arise in the presence of a homogeneously selfish population. This paper provides a thorough investigation into the necessary conditions for altruists to be guaranteed to improve total congestion. In particular, we study the class of series-parallel non-atomic congestion games, where one sub-population is altruistic and the other is selfish. We find that a game is guaranteed to have improved congestion in the presence of altruistic agents (even if only a small part of the total population) compared to the homogeneously selfish version of the game, provided the network is symmetric, where all agents are given access to all paths in the network, and the series-parallel network for the game does not have sub-networks which emulate Braess's paradox; we refer to such a network as Braess-resistant. Our results appear to be the most complete characterization of when behavior that is designed to improve total congestion (which we refer to as altruism) is actually guaranteed to do so.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures
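As background on the paradox referred to above, the textbook nonatomic instance below (a standard example, not one from the paper, and not itself a series-parallel network) shows how adding a zero-latency link can worsen purely selfish routing. One unit of traffic travels from s to t, and each latency ℓ is a function of the flow x on its edge:

```latex
\begin{aligned}
&\ell_{sv}(x) = x, \quad \ell_{vt}(x) = 1, \quad \ell_{sw}(x) = 1, \quad \ell_{wt}(x) = x, \qquad \text{total flow } = 1\\
&\text{Without edge } (v,w)\text{: flow splits } \tfrac12/\tfrac12 \text{ over } s \to v \to t \text{ and } s \to w \to t, \text{ each costing } \tfrac12 + 1 = \tfrac32\\
&\text{With } \ell_{vw}(x) = 0\text{: all flow uses } s \to v \to w \to t, \text{ costing } 1 + 0 + 1 = 2 > \tfrac32
\end{aligned}
```

In the second case the alternatives s → v → t and s → w → t also cost 1 + 1 = 2, so no selfish agent gains by switching, yet every agent is worse off than before the link was added.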

arXiv:2404.02218

A shared compilation stack for distributed-memory parallelism in stencil DSLs

Authors: George Bisbas, Anton Lydike, Emilien Bauer, Nick Brown, Mathieu Fehr, Lawrence Mitchell, Gabriel Rodriguez-Canal, Maurice Jamieson, Paul H. J. Kelly, Michel Steuwer, Tobias Grosser

Abstract: Domain Specific Languages (DSLs) increase programmer productivity and provide high performance. Their targeted abstractions allow scientists to express problems at a high level, providing rich details that optimizing compilers can exploit to target current- and next-generation supercomputers. The convenience and performance of DSLs come with significant development and maintenance costs. The siloed design of DSL compilers and the resulting inability to benefit from shared infrastructure cause uncertainties around longevity and the adoption of DSLs at scale. By tailoring the broadly-adopted MLIR compiler framework to HPC, we bring the same synergies that the machine learning community already exploits across their DSLs (e.g. TensorFlow, PyTorch) to the finite-difference stencil HPC community. We introduce new HPC-specific abstractions for message passing targeting distributed stencil computations. We demonstrate the sharing of common components across three distinct HPC stencil-DSL compilers: Devito, PSyclone, and the Open Earth Compiler, showing that our framework generates high-performance executables based upon a shared compiler ecosystem.

Submitted 2 April, 2024; originally announced April 2024.
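The message-passing abstractions mentioned above target distributed stencil computations, where each process owns a sub-domain and needs a "halo" of neighbouring values before it can update its boundary cells. The plain-Python sketch below only simulates that pattern for a 1D 3-point stencil, with list slices standing in for ranks and list copies standing in for messages; it is not the paper's MLIR dialect, and `stencil_step` and `distributed_step` are hypothetical names:

```python
def stencil_step(u):
    """Serial reference: 3-point averaging on interior cells, fixed boundary cells."""
    if len(u) < 3:
        return list(u)
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]

def distributed_step(u, nranks):
    """Same update on nranks contiguous sub-domains, each padded with halo cells
    copied from its neighbours, mimicking what a message-passing halo exchange provides."""
    n = len(u)
    if not 1 <= nranks <= n:
        raise ValueError("need 1 <= nranks <= len(u)")
    bounds = [(r * n // nranks, (r + 1) * n // nranks) for r in range(nranks)]
    local = [u[lo:hi] for lo, hi in bounds]  # cells owned by each simulated rank
    result = []
    for r, (lo, hi) in enumerate(bounds):
        # Halo exchange: one cell from each neighbouring rank (none at physical edges).
        left_halo = [local[r - 1][-1]] if r > 0 else []
        right_halo = [local[r + 1][0]] if r < nranks - 1 else []
        padded = left_halo + local[r] + right_halo
        offset = len(left_halo)
        for k in range(hi - lo):
            g, p = lo + k, offset + k  # global index, index into padded
            if g == 0 or g == n - 1:
                result.append(padded[p])  # physical boundary stays fixed
            else:
                result.append((padded[p - 1] + padded[p] + padded[p + 1]) / 3.0)
    return result

u = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0]
print(distributed_step(u, nranks=3))  # [0.0, 3.0, 3.0, 3.0, 0.0, 0.0]
print(distributed_step(u, nranks=3) == stencil_step(u))  # True
```

Because each rank receives exactly the neighbour values the serial loop would read, the decomposed update matches the serial one; in a real MPI code the halo copies would be point-to-point messages.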

arXiv:2403.14011

A Unified Toll Lane Framework for Autonomous and High-Occupancy Vehicles in Interactive Mixed Autonomy

Authors: Ruolin Li, Philip N. Brown, Roberto Horowitz

Abstract: In this study, we introduce a toll lane framework that optimizes the mixed flow of autonomous and high-occupancy vehicles on freeways, where human-driven and autonomous vehicles of varying commuter occupancy share a segment. Autonomous vehicles, with their ability to maintain shorter headways, boost traffic throughput. Our framework designates a toll lane for autonomous vehicles with high occupancy to use free of charge, while others pay a toll. We explore the lane choice equilibria when all vehicles minimize travel costs, and characterize the equilibria by ranking vehicles by their mobility enhancement potential, a concept we term the mobility degree. Through numerical examples, we demonstrate the framework's utility in addressing design challenges such as setting optimal tolls, determining occupancy thresholds, and designing lane policies, showing how it facilitates the integration of high-occupancy and autonomous vehicles. We also propose an algorithm for assigning rational tolls to decrease total commuter delay and examine the effects of toll non-compliance. Our findings suggest that self-interest-driven behavior mitigates moderate non-compliance impacts, highlighting the framework's resilience. This work presents a pioneering comprehensive analysis of a toll lane framework that emphasizes the coexistence of autonomous and high-occupancy vehicles, offering insights for traffic management improvements and the integration of autonomous vehicles into existing transportation infrastructures.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.04087

The Cognitive Type Project -- Mapping Typography to Cognition

Authors: Nik Bear Brown

Abstract: The Cognitive Type Project is focused on developing computational tools to enable the design of typefaces with varying cognitive properties. This initiative aims to empower typographers to craft fonts that enhance click-through rates for online ads, improve reading levels in children's books, enable dyslexics to create personalized type, or provide insights into customer reactions to textual content in media. A significant challenge in research related to mapping typography to cognition is the creation of thousands of typefaces with minor variations, a process that is both labor-intensive and requires the expertise of skilled typographers. Cognitive science research highlights that the design and form of letters, along with the text's overall layout, are crucial in determining the ease of reading and other cognitive properties of type such as perceived beauty and memorability. These factors affect not only the legibility and clarity of information presentation but also the likability of a typeface.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.12111

doi 10.1145/3626202.3637578

Evaluating Versal AI Engines for option price discovery in market risk analysis

Authors: Mark Klaisoongnoen, Nick Brown, Tim Dykes, Jessica R. Jones, Utz-Uwe Haus

Abstract: Whilst Field-Programmable Gate Arrays (FPGAs) have been popular in accelerating high-frequency financial workloads for many years, their application in quantitative finance, the utilisation of mathematical models to analyse financial markets and securities, is less mature. Nevertheless, recent work has demonstrated the benefits that FPGAs can deliver to quantitative workloads, and in this paper, we study whether the Versal ACAP and its AI Engines (AIEs) can also deliver improved performance. We focus specifically on the industry standard Strategic Technology Analysis Center's (STAC) derivatives risk analysis benchmark STAC-A2. Porting a purely FPGA-based accelerator for the STAC-A2 inspired market risk (SIMR) benchmark to the Versal ACAP device by combining Programmable Logic (PL) and AIEs, we explore the development approach and techniques, before comparing performance across PL and AIEs. Ultimately, we found that our AIE approach is slower than a highly optimised existing PL-only version due to limits on both the AIE and PL that we explore and describe.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Author accepted version of paper accepted to the 32nd ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

arXiv:2401.14673

doi 10.1145/3610977.3634999

Generative Expressive Robot Behaviors using Large Language Models

Authors: Karthik Mahadevan, Jonathan Chien, Noah Brown, Zhuo Xu, Carolina Parada, Fei Xia, Andy Zeng, Leila Takayama, Dorsa Sadigh

Abstract: People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from large language models (LLMs) and their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, with behaviors building upon each other. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.

Submitted 30 January, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.14550

Interactive and Urgent HPC: Challenges and Opportunities

Authors: Albert Reuther, Nick Brown, William Arndt, Johannes Blaschke, Christian Boehme, Antony Chazapis, Bjoern Enders, Robert Henschel, Julian Kunkel, Maxime Martinasso

Abstract: As a broader set of applications, from simulations to data analysis and machine learning, requires more parallel computational capability, the demand for interactive and urgent high performance computing (HPC) continues to increase. This paper overviews the progress made so far and elucidates the challenges and opportunities for greater integration of interactive and urgent HPC policies, techniques, and technologies into HPC ecosystems.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, submitted to IEEE CiSE journal

ACM Class: K.6.4

arXiv:2401.13815

SoK: Game-Theoretic Cybersecurity: Assumptions, Models, Gaps, and Bridges

Authors: Brandon Collins, Shouhuai Xu, Philip N. Brown

Abstract: The discipline of game theory was introduced in the context of economics, and has been applied to study cyber attacker and defender behaviors. While adaptations have been made to accommodate features in the cyber domain, these studies are inherently limited by the root of game theory in economic systems where players (i.e., agents) may be selfish but not malicious. In this SoK, we systematize the major cybersecurity problems that have been studied with the game-theoretic approach, the assumptions that have been made, and the models and solution concepts that have been proposed. The systematization leads to a characterization of the technical gaps that must be addressed in order to make game-theoretic cybersecurity models truly useful. We explore bridges to address them.

Submitted 24 January, 2024; originally announced January 2024.

Comments: 21 pages, Finished October 17th, 2023

arXiv:2311.15377

Increased Compute Efficiency and the Diffusion of AI Capabilities

Authors: Konstantin Pilz, Lennart Heim, Nicholas Brown

Abstract: Training advanced AI models requires large investments in computational resources, or compute. Yet, as hardware innovation reduces the price of compute and algorithmic advances make its use more efficient, the cost of training an AI model to a given performance falls over time, a concept we describe as increasing compute efficiency. We find that while an access effect increases the number of actors who can train models to a given performance over time, a performance effect simultaneously increases the performance available to each actor. This potentially enables large compute investors to pioneer new capabilities, maintaining a performance advantage even as capabilities diffuse. Since large compute investors tend to develop new capabilities first, it will be particularly important that they share information about their AI models, evaluate them for emerging risks, and, more generally, make responsible development and release decisions. Further, as compute efficiency increases, governments will need to prepare for a world where dangerous AI capabilities are widely available, for instance by developing defenses against harmful AI models or by actively intervening in the diffusion of particularly dangerous capabilities.

Submitted 13 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.13657

Efficient Transformer Knowledge Distillation: A Performance Review

Authors: Nathan Brown, Ashton Williamson, Tahj Anderson, Logan Lawrence

Abstract: As pretrained transformer language models continue to achieve state-of-the-art performance, the Natural Language Processing community has pushed for advances in model compression and efficient attention mechanisms to address high computational requirements and limited input sequence length. Despite these separate efforts, no investigation has been done into the intersection of these two fields. In this work, we provide an evaluation of model compression via knowledge distillation on efficient attention transformers. We provide cost-performance trade-offs for the compression of state-of-the-art efficient attention architectures and the gains made in performance in comparison to their full attention counterparts. Furthermore, we introduce a new long-context Named Entity Recognition dataset, GONERD, to train and test the performance of NER models on long sequences. We find that distilled efficient attention transformers can preserve a significant amount of original model performance, preserving up to 98.6% across short-context tasks (GLUE, SQuAD, CoNLL-2003), up to 94.6% across long-context Question-and-Answering tasks (HotpotQA, TriviaQA), and up to 98.8% on long-context Named Entity Recognition (GONERD), while decreasing inference times by up to 57.8%. We find that, for most models on most tasks, performing knowledge distillation is an effective method to yield high-performing efficient attention models with low costs.

Submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023. 12 pages, 1 figure, 11 tables. Models and data available at https://huggingface.co/giant-oak
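The abstract above does not state which distillation objective is used. As a point of reference only, the sketch below assumes the common Hinton-style soft-target loss: the KL divergence from the teacher's temperature-softened output distribution to the student's, scaled by T². The function names and the default temperature are illustrative assumptions, written in plain Python for a single example:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() from overflowing."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T**2.

    Assumes a Hinton-style soft-target objective; in practice this term is
    usually mixed with an ordinary cross-entropy loss on the true labels.
    """
    if len(student_logits) != len(teacher_logits):
        raise ValueError("student and teacher must score the same classes")
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl

print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))  # positive: student disagrees
```

A higher temperature flattens both distributions so the student also learns from the teacher's relative scores for wrong classes; the T² factor keeps gradient magnitudes comparable across temperatures.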

arXiv:2311.04538

Faster Maximal Exact Matches with Lazy LCP Evaluation

Authors: Adrián Goga, Lore Depuydt, Nathaniel K. Brown, Jan Fostier, Travis Gagie, Gonzalo Navarro

Abstract: MONI (Rossi et al., JCB 2022) is a BWT-based compressed index for computing the matching statistics and maximal exact matches (MEMs) of a pattern (usually a DNA read) with respect to a highly repetitive text (usually a database of genomes) using two operations: LF-steps and longest common extension (LCE) queries on a grammar-compressed representation of the text. In practice, most of the operations are constant-time LF-steps but most of the time is spent evaluating LCE queries. In this paper we show how (a variant of) the latter can be evaluated lazily, so as to bound the total time MONI needs to process the pattern in terms of the number of MEMs between the pattern and the text, while maintaining logarithmic latency.

Submitted 8 November, 2023; originally announced November 2023.
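The two quantities MONI computes can be stated precisely with a brute-force Python sketch. The matching statistic MS[i] is the length of the longest prefix of pattern[i:] that occurs somewhere in the text and, under the usual definition, a MEM is a match that can be extended neither to the right nor to the left. This is a definitional illustration only (far slower than MONI, since it relies on repeated substring search) and uses none of MONI's BWT machinery, LF-steps, or LCE queries; the function names are hypothetical:

```python
def matching_statistics(pattern, text):
    """MS[i] = length of the longest prefix of pattern[i:] that occurs in text (brute force)."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms

def maximal_exact_matches(pattern, text):
    """Return (start, length) for every MEM of pattern against text."""
    ms = matching_statistics(pattern, text)
    mems = []
    for i, length in enumerate(ms):
        # Right-maximal by construction of MS[i]. Left-maximal unless
        # pattern[i-1 : i+length] also occurs, i.e. unless ms[i-1] > length.
        if length > 0 and (i == 0 or ms[i - 1] <= length):
            mems.append((i, length))
    return mems

print(matching_statistics("ACGT", "ACGAGT"))    # [3, 2, 2, 1]
print(maximal_exact_matches("ACGT", "ACGAGT"))  # [(0, 3), (2, 2)]
```

Here "ACG" and "GT" are the MEMs: the suffix matches "CG" and "T" are dropped because they extend to the left within the text.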

arXiv:2310.01914

doi 10.1145/3624062.362454

Stencil-HMLS: A multi-layered approach to the automatic optimisation of stencil codes on FPGA

Authors: Gabriel Rodriguez-Canal, Nick Brown, Maurice Jamieson, Emilien Bauer, Anton Lydike, Tobias Grosser

Abstract: The challenges associated with effectively programming FPGAs have been a major blocker in popularising reconfigurable architectures for HPC workloads. However, new compiler technologies, such as MLIR, are providing new capabilities which potentially deliver the ability to extract domain specific information and drive automatic structuring of codes for FPGAs. In this paper we explore domain specific optimisations for stencils, a fundamental access pattern in scientific computing, to obtain high performance on FPGAs via automated code structuring. We propose Stencil-HMLS, a multi-layered approach to automatic optimisation of stencil codes, and introduce the HLS dialect, which brings FPGA programming into the MLIR ecosystem. Using the PSyclone Fortran DSL, we demonstrate an improvement of 14 to 100 times with respect to the next best performant state-of-the-art tool. Furthermore, our approach is 14 to 92 times more energy efficient than the next most energy efficient approach.

Submitted 3 October, 2023; originally announced October 2023.

Comments: Author accepted version which appears in ACM Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023)

arXiv:2310.01882

doi 10.1145/3624062.3624167

Fortran performance optimisation and auto-parallelisation by leveraging MLIR-based domain specific abstractions in Flang

Authors: Nick Brown, Maurice Jamieson, Anton Lydike, Emilien Bauer, Tobias Grosser

Abstract: MLIR, a sub-project of LLVM, has become popular since it was open sourced in 2019. The flexibility it offers to represent Intermediate Representations (IR) as dialects at different abstraction levels, to mix these, and to leverage transformations between dialects creates opportunities for automated program optimisation and parallelisation. In addition to general purpose compilers built upon MLIR, domain specific abstractions have also been developed. In this paper we explore complementing the Flang MLIR general purpose compiler by combining it with the domain specific Open Earth Compiler's MLIR stencil dialect. By developing transformations to discover and extract stencils from Fortran, this specialisation delivers between a 2 and 10 times performance improvement for our benchmarks on a Cray supercomputer compared to using Flang alone. Furthermore, by leveraging existing MLIR transformations we develop an auto-parallelisation approach targeting multi-threaded and distributed memory parallelism, and optimised execution on GPUs, without any modifications to the serial Fortran source code.

Submitted 3 October, 2023; originally announced October 2023.

Comments: Author accepted version of paper in ACM Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023)

arXiv:2309.00381

doi 10.1145/3624062.3624234

Is RISC-V ready for HPC prime-time: Evaluating the 64-core Sophon SG2042 RISC-V CPU

Authors: Nick Brown, Maurice Jamieson, Joseph Lee, Paul Wang

Abstract: The Sophon SG2042 is the world's first commodity 64-core RISC-V CPU for high performance workloads, and an important question is whether the SG2042 has the potential to encourage the HPC community to embrace RISC-V. In this paper we undertake a performance exploration of the SG2042 against existing RISC-V hardware and high performance x86 CPUs in use by modern supercomputers. Leveraging the RAJAPerf benchmarking suite, we find that, on average, the SG2042 delivers between five and ten times the per-core performance of the nearest widely available RISC-V hardware. We also find that, on average, the x86 high performance CPUs under test outperform the SG2042 by between four and eight times for multi-threaded workloads, although some individual kernels do perform faster on the SG2042. The result of this work is a performance study that not only contrasts this new RISC-V CPU against existing technologies, but also shares performance best practice.

Submitted 3 October, 2023; v1 submitted 1 September, 2023; originally announced September 2023.

Comments: Author accepted version of paper in ACM Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis (SC-W 2023)

arXiv:2308.13274

Fortran High-Level Synthesis: Reducing the barriers to accelerating HPC codes on FPGAs

Authors: Gabriel Rodriguez-Canal, Nick Brown, Tim Dykes, Jessica R. Jones, Utz-Uwe Haus

Abstract: In recent years the use of FPGAs to accelerate scientific applications has grown, with numerous applications demonstrating the benefit of FPGAs for high performance workloads. However, whilst High Level Synthesis (HLS) has significantly lowered the barrier to entry in programming FPGAs by enabling programmers to use C++, a major challenge is that most often these codes are not originally written in C++. Instead, Fortran is the lingua franca of scientific computing, and so a complex and time-consuming initial step is required to convert codes into C++ even before considering the FPGA. In this paper we describe work enabling Fortran for AMD Xilinx FPGAs by connecting the LLVM Flang front end to AMD Xilinx's LLVM back end. This enables programmers to use Fortran as a first-class language for programming FPGAs and, as we demonstrate, to enjoy all the tuning and optimisation opportunities that HLS C++ provides. Furthermore, we demonstrate that certain language features of Fortran make it especially beneficial for programming FPGAs compared to C++. The result of this work is a lowering of the barrier to entry in using FPGAs for scientific computing, enabling programmers to leverage their existing codebase and language of choice on the FPGA directly.

Submitted 25 August, 2023; originally announced August 2023.

Comments: Author accepted version to appear in the 33rd International Conference on Field-Programmable Logic and Applications

arXiv:2307.15818

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, et al. (29 additional authors not shown)

Abstract: We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to this category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
These include significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).

Submitted 28 July, 2023; originally announced July 2023.

Comments: Website: https://robotics-transformer.github.io/
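The abstract states only that actions are expressed as text tokens. One simple way to do that, sketched here under stated assumptions, is uniform discretisation: clip each continuous action dimension to a range, map it to one of a fixed number of bins, and write the bin indices out as an integer string. The bin count (256), the [-1, 1] range and the helper names `encode_action` and `decode_action` are assumptions for illustration, not details taken from this abstract.

```python
NUM_BINS = 256          # assumed number of bins per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalised action range


def encode_action(action):
    """Map a continuous action vector to a space-separated string of bin indices."""
    tokens = []
    for value in action:
        clipped = min(max(value, LOW), HIGH)
        # scale into [0, NUM_BINS - 1] and round to the nearest bin index
        index = round((clipped - LOW) / (HIGH - LOW) * (NUM_BINS - 1))
        tokens.append(str(index))
    return " ".join(tokens)


def decode_action(text):
    """Invert encode_action, returning the grid value for each bin index."""
    return [LOW + int(tok) / (NUM_BINS - 1) * (HIGH - LOW) for tok in text.split()]


encoded = encode_action([-1.0, 0.0, 1.0])
print(encoded)  # 0 128 255
```

Because the result is an ordinary string, it can sit in the same training stream as natural-language answers; the price is a quantisation error of at most half a bin width per dimension.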

arXiv:2307.01928

Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

Authors: Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar

Abstract: Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model fine-tuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io

Submitted 4 September, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: Conference on Robot Learning (CoRL) 2023, Oral Presentation
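The statistical guarantee rests on conformal prediction. Below is a minimal sketch of standard split conformal prediction over multiple-choice options, assuming the LLM's probability for each candidate action is already available: calibrate a threshold on held-out examples, keep every option whose score clears it, and ask for help whenever the resulting set does not contain exactly one option. The function names and all numbers are hypothetical, and this is not KnowNo's own code.

```python
import math


def calibrate(true_option_probs, epsilon):
    """Return the conformal threshold qhat from calibration data.

    true_option_probs: probability the model assigned to the correct option
    on each calibration example. Nonconformity score = 1 - probability.
    """
    scores = sorted(1.0 - p for p in true_option_probs)
    n = len(scores)
    # finite-sample corrected rank: ceil((n + 1)(1 - epsilon))
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return math.inf  # too little calibration data: include every option
    return scores[rank - 1]


def prediction_set(option_probs, qhat):
    """Options whose nonconformity score is within the calibrated threshold."""
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]


qhat = calibrate([0.9, 0.8, 0.6, 0.4, 0.95, 0.7, 0.35, 0.85, 0.65], epsilon=0.2)
chosen = prediction_set({"A": 0.52, "B": 0.45, "C": 0.03}, qhat)
print(chosen, "ask for help" if len(chosen) != 1 else "act")  # ['A', 'B'] ask for help
```

Under the usual exchangeability assumption, the set contains the correct option with probability at least 1 - epsilon, which is what allows help requests to be limited to genuinely ambiguous cases.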

arXiv:2306.11722

doi 10.1145/3636515

Automated Grading and Feedback Tools for Programming Education: A Systematic Review

Authors: Marcus Messer, Neil C. C. Brown, Michael Kölling, Miaojing Shi

Abstract: We conducted a systematic literature review on automated grading and feedback tools for programming education. We analysed 121 research papers from 2017 to 2021 inclusive and categorised them based on skills assessed, approach, language paradigm, degree of automation and evaluation techniques. Most papers assess the correctness of assignments in object-oriented languages. Typically, these tools use a dynamic technique, primarily unit testing, to provide grades and feedback to the students, or static analysis techniques to compare a submission with a reference solution or with a set of correct student submissions. However, these techniques' feedback is often limited to whether the unit tests have passed or failed, the expected and actual output, or how they differ from the reference solution. Furthermore, few tools assess the maintainability, readability or documentation of the source code, and those that do mostly use static analysis techniques, such as code quality metrics, in conjunction with grading correctness. Additionally, we found that most tools offered fully automated assessment to allow for near-instantaneous feedback and multiple resubmissions, which can increase student satisfaction and provide them with more opportunities to succeed. In terms of techniques used to evaluate the tools' performance, most papers primarily use student surveys or compare the automatic assessment tools to grades or feedback provided by human graders.
However, because the evaluation dataset is frequently unavailable, it is more difficult to reproduce results and compare tools on a common collection of assignments.

Submitted 5 December, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

Comments: Accepted version of the manuscript
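The dynamic, unit-test-based approach that the review finds most common can be sketched in a few lines: run a submission against input/expected-output cases, award a proportional grade, and report expected versus actual output as feedback. The function `grade`, the test-case format and the sample submission are hypothetical. Real tools also sandbox untrusted code rather than calling it directly.

```python
def grade(submission, cases):
    """Run `submission` on each (args, expected) pair.

    Returns (score out of 100, list of feedback strings). Feedback is limited
    to pass/fail plus expected and actual output, the typical style the
    review describes.
    """
    feedback = []
    passed = 0
    for args, expected in cases:
        try:
            actual = submission(*args)
        except Exception as exc:  # a crash counts as a failed case
            feedback.append(f"{args}: raised {type(exc).__name__}")
            continue
        if actual == expected:
            passed += 1
        else:
            feedback.append(f"{args}: expected {expected!r}, got {actual!r}")
    return round(100 * passed / len(cases)), feedback


def student_max(a, b):  # a buggy student submission: returns the minimum
    return a if a < b else b


score, notes = grade(student_max, [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)])
print(score, notes)
```

Here the buggy submission passes only the tie case and scores 33, and the feedback shows exactly the limitation the review highlights: it reports what went wrong but not why.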

arXiv:2305.03270

Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

Authors: Alexander Herzog, Kanishka Rao, Karol Hausman, Yao Lu, Paul Wohlhart, Mengyuan Yan, Jessica Lin, Montserrat Gonzalez Arenas, Ted Xiao, Daniel Kappler, Daniel Ho, Jarek Rettinghouse, Yevgen Chebotar, Kuang-Huei Lee, Keerthana Gopalakrishnan, Ryan Julian, Adrian Li, Chuyuan Kelly Fu, Bob Wei, Sangeetha Ramesh, Khem Holden, Kim Kleiven, David Rendleman, Sean Kirmani, Jeff Bingham , et al. (15 additional authors not shown)

Abstract: We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects. The project's website and videos can be found at http://rl-at-scale.github.io.

Submitted 5 May, 2023; originally announced May 2023.

Comments: Published at Robotics: Science and Systems 2023

arXiv:2305.00512

Experiences of running an HPC RISC-V testbed

Authors: Nick Brown, Maurice Jamieson, Joseph K. L. Lee

Abstract: Funded by the UK ExCALIBUR H&ES exascale programme, a RISC-V testbed for HPC was stood up in early 2022 to provide free access for scientific software developers to experiment with RISC-V for their workloads. Here we report on successes, challenges, and lessons learnt from this activity with a view to better understanding the suitability of RISC-V for HPC and the important areas on which to focus RISC-V HPC community efforts.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Author accepted version of extended abstract in RISC-V Summit Europe

arXiv:2304.13138

The Update-Equivalence Framework for Decision-Time Planning

Authors: Samuel Sokota, Gabriele Farina, David J. Wu, Hengyuan Hu, Kevin A. Wang, J. Zico Kolter, Noam Brown

Abstract: The process of revising (or constructing) a policy at execution time -- known as decision-time planning -- has been key to achieving superhuman performance in perfect-information games like chess and Go. A recent line of work has extended decision-time planning to imperfect-information games, leading to superhuman performance in poker. However, these methods involve solving subgames whose sizes grow quickly with the amount of non-public information, making them unhelpful when the amount of non-public information is large. Motivated by this issue, we introduce an alternative framework for decision-time planning that is not based on solving subgames, but rather on update equivalence. In this update-equivalence framework, decision-time planning algorithms replicate the updates of last-iterate algorithms, which need not rely on public information. This facilitates scalability to games with large amounts of non-public information. Using this framework, we derive a provably sound search algorithm for fully cooperative games based on mirror descent and a search algorithm for adversarial games based on magnetic mirror descent. We validate the performance of these algorithms in cooperative and adversarial domains, notably in Hanabi, the standard benchmark for search in fully cooperative imperfect-information games. Here, our mirror descent approach exceeds or matches the performance of public information-based search while using two orders of magnitude less search time.
This is the first instance of a non-public-information-based algorithm outperforming public-information-based approaches in a domain they have historically dominated.

Submitted 13 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.
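For intuition about the kind of update these search algorithms replicate, here is a minimal sketch of textbook mirror descent with a negative-entropy regulariser (the multiplicative-weights form) applied to a single agent's policy, given fixed action values. This is not the paper's search algorithm, and the step size and action values are illustrative assumptions.

```python
import math


def mirror_descent_step(policy, q_values, step_size):
    """One entropy-regularised mirror descent update: pi'(a) ∝ pi(a) * exp(eta * q(a))."""
    weights = [p * math.exp(step_size * q) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]  # renormalise onto the probability simplex


policy = [1 / 3, 1 / 3, 1 / 3]
q_values = [1.0, 0.0, -1.0]  # hypothetical, fixed action values
for _ in range(50):
    policy = mirror_descent_step(policy, q_values, step_size=0.1)
print([round(p, 3) for p in policy])  # [0.993, 0.007, 0.0]
```

With fixed values, 50 steps of size 0.1 compound to a single softmax with temperature 1/5, so the policy shifts smoothly toward the best action rather than jumping to it.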

arXiv:2304.10324

Backporting RISC-V Vector assembly

Authors: Joseph K. L. Lee, Maurice Jamieson, Nick Brown

Abstract: Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use an older, vendor-provided compiler. In this paper we introduce the rvv-rollback tool, which translates compiler-generated assembly code using v1.0 vector instructions into v0.7.1. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (which supports v0.7.1) against LLVM 15.0 (which supports only v1.0), and find that the LLVM compiler is capable of auto-vectorising more computational kernels, and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Preprint of paper accepted to First International Workshop on RISC-V for HPC (2023)
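To illustrate the kind of rewriting involved, here is a minimal, hypothetical sketch of a line-by-line translator covering two differences between the specifications: v1.0 encodes the element width in unit-stride load/store mnemonics (for example `vle32.v`), whereas v0.7.1 uses `vle.v`/`vse.v` with the width taken from SEW; and v1.0's `vsetvli` accepts tail/mask policy flags (`ta`, `ma`) that v0.7.1 lacks. This is not the rvv-rollback tool's actual code, and it ignores many other differences between the two versions.

```python
import re

# Unit-stride loads/stores such as "vle32.v v8, (a0)" or "vse64.v v8, (a1)"
UNIT_STRIDE = re.compile(r"^(\s*)v(l|s)e(8|16|32|64)\.v\b")
# "vsetvli rd, rs1, eSEW, mLMUL" followed by anything (e.g. ", ta, ma")
VSETVLI = re.compile(r"^(\s*vsetvli\s+[^,]+,\s*[^,]+,\s*e(\d+),\s*m\w+)(.*)$")


def rollback(lines):
    """Rewrite a small subset of RVV v1.0 assembly into v0.7.1 syntax."""
    sew = None  # element width set by the most recent vsetvli
    out = []
    for line in lines:
        m = VSETVLI.match(line)
        if m:
            sew = int(m.group(2))
            out.append(m.group(1))  # keep "vsetvli rd, rs1, eN, mN"; drop policy flags
            continue
        m = UNIT_STRIDE.match(line)
        if m:
            width = int(m.group(3))
            # vle.v/vse.v use SEW, so the rewrite is only valid when widths agree
            if width != sew:
                raise ValueError(f"EEW {width} differs from SEW {sew}: {line!r}")
            line = UNIT_STRIDE.sub(r"\1v\2e.v", line, count=1)
        out.append(line)
    return out


v1_asm = [
    "    vsetvli t0, a0, e32, m1, ta, ma",
    "    vle32.v v8, (a1)",
    "    vfadd.vv v8, v8, v8",
    "    vse32.v v8, (a2)",
]
print("\n".join(rollback(v1_asm)))
```

Tracking SEW matters: blindly rewriting `vle64.v` to `vle.v` under an `e32` setting would silently change the element width, so the sketch refuses such lines instead.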

arXiv:2304.10319

Test-driving RISC-V Vector hardware for HPC

Authors: Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus

Abstract: Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open source software support for vectorisation on RISC-V are still limited. This is important because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting the landscape of both the hardware and software ecosystem. Drawing on our experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using the vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Preprint of paper accepted to First International Workshop on RISC-V for HPC (2023)

arXiv:2304.09511

Morpheus unleashed: Fast cross-platform SpMV on emerging architectures

Authors: Christodoulos Stylianou, Mark Klaisoongnoen, Ricardo Jesus, Nick Brown, Michele Weiland

Abstract: Sparse matrices and linear algebra are at the heart of scientific simulations. Over the years, more than 70 sparse matrix storage formats have been developed, targeting a wide range of hardware architectures and matrix types, each of which exploits the particular strengths of an architecture or the specific sparsity patterns of the matrices. In this work, we explore the suitability of storage formats such as COO, CSR and DIA for emerging architectures such as AArch64 CPUs and FPGAs. In addition, we detail hardware-specific optimisations to these targets and evaluate the potential of each contribution to be integrated into Morpheus, a modern library that (currently) provides an abstraction of sparse matrices across x86 CPUs and NVIDIA/AMD GPUs. Finally, we validate our work by comparing the performance of the Morpheus-enabled HPCG benchmark against vendor-optimised implementations.

Submitted 19 April, 2023; originally announced April 2023.
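For readers less familiar with these formats, the sketch below converts a small matrix from COO (row, column, value triplets) to CSR (row pointers plus column indices and values) and then computes y = Ax using the CSR layout. It is a plain Python illustration of two of the formats named in the abstract, not Morpheus's API, and the function names are hypothetical.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets (assumed already sorted by row) to CSR arrays."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1             # count non-zeros in each row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]    # prefix sum gives each row's start offset
    return row_ptr, list(cols), list(vals)


def csr_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x with A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


# A = [[4, 0, 1],
#      [0, 2, 0],
#      [3, 0, 5]] in COO form
row_ptr, col_idx, vals = coo_to_csr(3, [0, 0, 1, 2, 2], [0, 2, 1, 0, 2], [4.0, 1.0, 2.0, 3.0, 5.0])
print(row_ptr)                                            # [0, 2, 3, 5]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # [5.0, 2.0, 8.0]
```

CSR compresses the row indices into offsets, so each row's non-zeros are contiguous and the inner loop streams through memory. DIA would instead store each occupied diagonal as a dense array, which suits banded matrices. Which layout performs best depends on both the sparsity pattern and the target hardware.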

arXiv:2303.13525

Forecasting Workload in Cloud Computing: Towards Uncertainty-Aware Predictions and Transfer Learning

Authors: Andrea Rossi, Andrea Visentin, Diego Carraro, Steven Prestwich, Kenneth N. Brown

Abstract: Predicting future resource demand in Cloud Computing is essential for optimizing the trade-off between serving customers' requests efficiently and minimizing the provisioning cost. Modelling prediction uncertainty is also desirable to better inform the resource decision-making process, but this area remains under-investigated. In this paper, we propose univariate and bivariate Bayesian deep learning models that provide predictions of future workload demand and its uncertainty. We run extensive experiments on Google and Alibaba clusters, where we first train our models with datasets from different cloud providers and compare them with LSTM-based baselines. Results show that modelling the uncertainty of predictions has a positive impact on performance, especially on service level metrics, because uncertainty quantification can be tailored to desired target service levels that are critical in cloud applications. Moreover, we investigate whether our models benefit from transfer learning across different domains, i.e. dataset distributions. Experiments on the same workload datasets reveal that acceptable transfer learning performance can be achieved within the same provider (because distributions are more similar). Also, domain knowledge does not transfer when the source and target domains are very different (e.g. from different providers), but this performance degradation can be mitigated by increasing the training set size of the source domain.

Submitted 12 November, 2023; v1 submitted 24 February, 2023; originally announced March 2023.
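One way uncertainty quantification can be tailored to a target service level is to provision capacity at a quantile of the predictive distribution rather than at its mean. A minimal sketch, assuming a Gaussian predictive distribution whose mean and standard deviation come from some forecasting model; the function name and the numbers are hypothetical, and this is not the paper's Bayesian model.

```python
from statistics import NormalDist


def capacity_for_service_level(mean, std, service_level):
    """Capacity that covers demand with probability `service_level`,
    assuming demand ~ Normal(mean, std)."""
    return NormalDist(mean, std).inv_cdf(service_level)


# hypothetical forecast: 100 CPU cores expected, standard deviation 10
for level in (0.5, 0.9, 0.99):
    print(level, round(capacity_for_service_level(100, 10, level), 1))
```

A point forecast alone would provision 100 cores regardless of the target. With an uncertainty estimate, a 99% service level calls for roughly 123 cores, and a tighter predictive distribution would reduce that margin.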

arXiv:2301.13016

doi 10.1145/3543622.3573047

Exploring the Versal AI engines for accelerating stencil-based atmospheric advection simulation

Authors: Nick Brown

Abstract: AMD Xilinx's new Versal Adaptive Compute Acceleration Platform (ACAP) is an FPGA architecture combining reconfigurable fabric with other on-chip hardened compute resources. AI engines are one of these and, by operating in a highly vectorized manner, they provide significant raw compute that is potentially beneficial for a range of workloads including HPC simulation. However, this technology is still at an early stage and as yet unproven for accelerating HPC codes, with a lack of benchmarking and best practice. This paper presents an experience report, exploring the porting of the Piacsek and Williams (PW) advection scheme onto the Versal ACAP, using the chip's AI engines to accelerate the compute. A stencil-based algorithm, advection is commonplace in atmospheric modelling, including in several codes from the Met Office, who initially developed this scheme. Using this algorithm as a vehicle, we explore optimal approaches for structuring AI engine compute kernels and how best to interface the AI engines with programmable logic. Evaluating performance using a VCK5000 against non-AI engine FPGA configurations on the VCK5000 and Alveo U280, as well as a 24-core Xeon Platinum Cascade Lake CPU and Nvidia V100 GPU, we found that whilst the number of channels between the fabric and AI engines is a limitation, by leveraging the ACAP we can double performance compared to an Alveo U280.

Submitted 28 December, 2022; originally announced January 2023.

Comments: Accepted version of paper appearing in the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '23), February 12--14, 2023, Monterey, CA, US

arXiv:2301.09159

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Authors: Samuel Sokota, Ryan D'Orazio, Chun Kai Ling, David J. Wu, J. Zico Kolter, Noam Brown

Abstract: In their seminal work, Nayyar et al. (2013) showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equilibria of the game with public policy announcements may not correspond to Nash equilibria of the original game. As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not possess the aforementioned non-correspondence problem -- thus, computing them can be treated as perfect-information problems. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in two-player zero-sum games, free of the unappealing properties that plague existing decision-time planning approaches.

Submitted 31 July, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

arXiv:2301.07615

Task-based preemptive scheduling on FPGAs leveraging partial reconfiguration

Authors: Gabriel Rodriguez-Canal, Nick Brown, Yuri Torres, Arturo Gonzalez-Escribano

Abstract: FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, especially in scenarios where diverse types of tasks must be executed dynamically. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to an improved use of the FPGA resources, allowing the execution of several different kernels concurrently and deploying the most urgent ones as fast as possible. The results of our experimental study show that our approach incurs only a 10% overhead in the worst case when using two reconfigurable regions, whilst providing a significant performance improvement of at least 24% over the traditional full reconfiguration approach.

Submitted 18 January, 2023; originally announced January 2023.

Comments: Article preprint. arXiv admin note: text overlap with arXiv:2209.04410
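The scheduling policy can be sketched as a priority queue feeding a fixed number of reconfigurable regions: when a more urgent task arrives and every region is busy, the least urgent running task is preempted and requeued. This is a hypothetical Python model of the policy only. It does not model High-Level Synthesis, partial reconfiguration or the synchronisation mechanisms the paper relies on, and the class and method names are invented for illustration.

```python
import heapq
import itertools


class RegionScheduler:
    """Toy model: lower number = more urgent; ties broken by arrival order."""

    def __init__(self, num_regions):
        self.num_regions = num_regions
        self.running = {}              # task name -> (priority, arrival seq)
        self.waiting = []              # min-heap of (priority, seq, name)
        self._seq = itertools.count()

    def submit(self, name, priority):
        """Queue a task; return the name of any task it preempted, else None."""
        entry = (priority, next(self._seq))
        if len(self.running) < self.num_regions:
            self.running[name] = entry
            return None
        # least urgent running task: largest priority, then latest arrival
        victim = max(self.running, key=lambda n: self.running[n])
        if entry < self.running[victim]:
            v_prio, v_seq = self.running.pop(victim)
            heapq.heappush(self.waiting, (v_prio, v_seq, victim))  # requeue victim
            self.running[name] = entry
            return victim
        heapq.heappush(self.waiting, (*entry, name))
        return None

    def finish(self, name):
        """Free a region and start the most urgent waiting task, if any."""
        del self.running[name]
        if self.waiting:
            prio, seq, nxt = heapq.heappop(self.waiting)
            self.running[nxt] = (prio, seq)
            return nxt
        return None


sched = RegionScheduler(num_regions=2)
sched.submit("blur", priority=5)
sched.submit("fft", priority=3)
print(sched.submit("alert", priority=1))  # blur
print(sched.finish("alert"))              # blur
```

Requeuing the preempted task, rather than discarding it, keeps its original arrival order, so it resumes ahead of later tasks of equal priority.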

arXiv:2212.13981

doi 10.1109/CIW-IUS56691.2022.00007

Web-based volunteer distributed computing for handling time-critical urgent workloads

Authors: Nick Brown, Simon Newby

Abstract: Urgent computing workloads are time critical, unpredictable, and highly dynamic. Whilst efforts are ongoing to run these on traditional HPC machines, another option is to leverage the computing power donated by volunteers. Volunteer computing, where members of the public donate some of their CPU time to large scale projects, has been popular for many years because it is a powerful way of delivering compute for specific problems, with the public often eager to contribute to a good cause with societal benefits. However, traditional volunteer computing has required user installation of specialist software, which is a barrier to entry, and the development of the software itself by the projects, even on top of existing frameworks, is non-trivial. As such, the number of users donating CPU time to these volunteer computing projects has decreased in recent years, and this comes at a time when the frequency of disasters, often driven by climate change, is rising fast. We believe that an alternative approach, where visitors to websites donate some of their CPU time whilst they are browsing, has the potential to address these issues. However, web-based distributed computing is an immature field and there are numerous questions that must be answered to fully understand the viability of leveraging the large scale parallelism that website visitors represent.
In this paper we describe our web-based distributed computing framework, Panther, and perform in-depth performance experiments for two benchmarks using real world hardware and real world browsing habits for the first time. By exploring the performance characteristics of our approach we demonstrate that this is viable for urgent workloads, but that there are numerous caveats, not least the most appropriate visitor patterns to a website, that must be considered.

Submitted 28 December, 2022; originally announced December 2022.

Comments: Accepted version of paper presented at 2022 First Combined International Workshop on Interactive Urgent Supercomputing (CIW-IUS)

arXiv:2212.13977

doi 10.1109/H2RC56700.2022.00008

Fast and energy-efficient derivatives risk analysis: Streaming option Greeks on Xilinx and Intel FPGAs

Authors: Mark Klaisoongnoen, Nick Brown, Oliver Brown

Abstract: Whilst FPGAs have enjoyed success in accelerating high-frequency financial workloads for some time, their use for quantitative finance, which is the use of mathematical models to analyse financial markets and securities, has been far more limited to date. Currently, CPUs are the most common architecture for such workloads, and an important question is whether FPGAs can ameliorate some of the bottlenecks encountered on those architectures. In this paper we extend our previous work accelerating the industry-standard Securities Technology Analysis Center's (STAC®) derivatives risk analysis benchmark STAC-A2™, by first porting this from our previous Xilinx implementation to an Intel Stratix-10 FPGA, exploring the challenges encountered when moving from one FPGA architecture to another and the suitability of techniques. We then present a host-data-streaming approach that ultimately outperforms our previous version on a Xilinx Alveo U280 FPGA by up to 4.6 times and requires 9 times less energy at the largest problem size, while outperforming the CPU and GPU versions by up to 8.2 and 5.2 times respectively. The result of this work is a significant enhancement in FPGA performance against the previous version for this industry-standard benchmark running on both Xilinx and Intel FPGAs, and furthermore an exploration of optimisation and porting techniques that can be applied to other HPC workloads.

Submitted 2 February, 2024; v1 submitted 28 December, 2022; originally announced December 2022.

Comments: This work uses a STAC benchmark. Whilst this was approved at the time, STAC have asked that we remove the paper, as it needs to be made more explicit that these are unofficial ports that are entirely independent of any vendor and do not follow STAC rules. As we compare vendor hardware in the paper, it was felt that this could easily be mistaken as representing something that the paper is not.

arXiv:2212.06817

RT-1: Robotics Transformer for Real-World Control at Scale

Authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Joseph Dabis, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Tomas Jackson, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Kuang-Huei Lee, Sergey Levine, Yao Lu, Utsav Malla, Deeksha Manjunath, et al. (26 additional authors not shown)

Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io.

Submitted 11 August, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: See website at robotics-transformer1.github.io

arXiv:2211.14907

Strategically revealing capabilities in General Lotto games

Authors: Keith Paarporn, Philip N. Brown

Abstract: Can revealing one's competitive capabilities to an opponent offer strategic benefits? In this paper, we address this question in the context of General Lotto games, a class of two-player competitive resource allocation models. We consider an asymmetric information setting where the opponent is uncertain about the resource budget of the other player and holds a prior belief on its value. We assume the other player, called the signaler, is able to send a noisy signal about its budget to the opponent. With its updated belief, the opponent then must decide to invest in costly resources that it will deploy against the signaler's resource budget in a General Lotto game. We derive the subgame perfect equilibrium of this extensive-form game. In particular, we identify necessary and sufficient conditions under which a signaling policy improves the signaler's resulting performance in comparison to the scenario where it does not send any signal. Moreover, we provide the optimal signaling policy when these conditions are met. Notably, we find that for some scenarios, the signaler can effectively double its performance.

Submitted 6 April, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: 8 pages

arXiv:2211.07794

Augmented Thresholds for MONI

Authors: César Martínez-Guardiola, Nathaniel K. Brown, Fernando Silva-Coira, Dominik Köppl, Travis Gagie, Susana Ladra

Abstract: MONI (Rossi et al., 2022) can store a pangenomic dataset T in small space and later, given a pattern P, quickly find the maximal exact matches (MEMs) of P with respect to T. In this paper we consider its one-pass version (Boucher et al., 2021), whose query times are dominated in our experiments by longest common extension (LCE) queries. We show how a small modification lets us avoid most of these queries and thus significantly speed up MONI in practice while only slightly increasing its size.

Submitted 14 November, 2022; originally announced November 2022.

Comments: 10 pages, 2 figures, preprint

arXiv:2210.05492

Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

Authors: Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

Abstract: No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model. We used RL-DiL-piKL to train an agent we name Diplodocus. In a 200-game no-press Diplomacy tournament involving 62 human participants spanning skill levels from beginner to expert, two Diplodocus agents both achieved a higher average score than all other participants who played more than two games, and ranked first and third according to an Elo ratings model.

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2210.05125

Human-AI Coordination via Human-Regularized Search and Learning

Authors: Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown

Abstract: We consider the problem of making AI agents that collaborate well with humans in partially observable, fully cooperative environments given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far away from it, we develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark. We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Then, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large-scale experiments with humans. First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. Second, we show that our method beats a vanilla best response to behavioral cloning baseline by having experts play repeatedly with the two agents.

Submitted 10 October, 2022; originally announced October 2022.

arXiv:2210.04795

TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer Scale Engine

Authors: Nick Brown, Brandon Echols, Justs Zarins, Tobias Grosser

Abstract: The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw compute means that it is also a very interesting potential target for accelerating traditional HPC computational codes. Many of these algorithms are stencil-based, where update operations involve contributions from neighbouring elements, and in this paper we explore the suitability of this technology for such codes from the perspective of an early adopter of the technology, compared to CPUs and GPUs. Using TensorFlow as the interface, we explore the performance and demonstrate that, whilst there is still work to be done around exposing the programming interface to users, the performance of the WSE is impressive, as it outperforms four V100 GPUs by two and a half times and two Intel Xeon Platinum CPUs by around 114 times in our experiments. There is therefore significant potential for this technology to play an important role in accelerating HPC codes on future exascale supercomputers.

Submitted 26 August, 2022; originally announced October 2022.

Comments: This preprint has not undergone any post-submission improvements or corrections. Preprint of paper submitted to Euro-Par DSL-HPC workshop

arXiv:2209.07055

Valid Utility Games with Information Sharing Constraints

Authors: David Grimsman, Philip N. Brown, Jason R. Marden

Abstract: The use of game-theoretic methods for control in multiagent systems has been an important topic in recent research. Valid utility games in particular have been used to model real-world problems; such games have the convenient property that the value of any decision set which is a Nash equilibrium of the game is guaranteed to be at least 1/2 of the value of the optimal decision set. However, an implicit assumption in this guarantee is that each agent is aware of the decisions of all other agents. In this work, we first describe how this guarantee degrades when agents are aware of only a subset of the decisions of other agents. We then show that this loss can be mitigated by restriction to a relevant subclass of games.

Submitted 15 September, 2022; originally announced September 2022.

arXiv:2209.04410

Programming abstractions for preemptive scheduling in FPGAs using partial reconfiguration

Authors: Gabriel Rodriguez-Canal, Nick Brown, Yuri Torres, Arturo Gonzalez-Escribano

Abstract: FPGAs are an attractive type of accelerator for all-purpose HPC systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use, especially in scenarios where diverse types of tasks should be dynamically executed. In this work we present a programming abstraction with a simple interface that internally leverages High-Level Synthesis, Dynamic Partial Reconfiguration and synchronisation mechanisms to use an FPGA as a multi-tasking server with preemptive scheduling and priority queues. This leads to better use of the FPGA resources, allowing several kernels to execute at the same time and the most urgent ones to be deployed as fast as possible. The results of our experimental study show that our approach incurs only a 1.66% overhead when using only one Reconfigurable Region (RR), and 4.04% when using two RRs, whilst delivering a significant performance improvement over the traditional non-preemptive full reconfiguration approach.

Submitted 26 August, 2022; originally announced September 2022.

Comments: This preprint to the HeteroPar workshop has not undergone any post-submission improvements or corrections

arXiv:2209.02847

DC-Art-GAN: Stable Procedural Content Generation using DC-GANs for Digital Art

Authors: Rohit Gandikota, Nik Bear Brown

Abstract: Digital art is an artistic method of using digital technologies as a part of the generative or creative process. With the advent of digital currency and NFTs (Non-Fungible Tokens), the demand for digital art is growing rapidly. In this manuscript, we advocate the concept of using deep generative networks with adversarial training for stable and varied art generation. The work mainly focuses on using the Deep Convolutional Generative Adversarial Network (DC-GAN) and explores techniques to address the common pitfalls in GAN training. We compare various architectures and designs of DC-GANs to arrive at a recommended design choice for stable and realistic generation. The main focus of the work is to generate realistic images that do not exist in reality but are synthesised from random noise by the proposed model. We provide visual results of generated animal face images (some showing evidence of a blend of species) along with recommendations for training, architecture and design choices. We also show how training image preprocessing plays a major role in GAN training.

Submitted 13 March, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: The project was done as an undergraduate report. In hindsight, it does not contain a full and exhaustive analysis.

arXiv:2209.00894

Performance of the Vipera framework for DSLs on micro-core architectures

Authors: Maurice Jamieson, Nick Brown

Abstract: Vipera provides a compiler and runtime framework for implementing dynamic Domain-Specific Languages on micro-core architectures. The performance and code size of the generated code are critical on these architectures. In this paper we present the results of our investigations into the efficiency of Vipera in terms of code performance and size.

Submitted 2 September, 2022; originally announced September 2022.

Comments: This preprint to the DSL-HPC workshop has not undergone any post-submission improvements or corrections

arXiv:2207.06504

A Coupling Approach to Analyzing Games with Dynamic Environments

Authors: Brandon C. Collins, Shouhuai Xu, Philip N. Brown

Abstract: The theory of learning in games has extensively studied situations where agents respond dynamically to each other by optimizing a fixed utility function. However, in real situations, the strategic environment varies as a result of past agent choices. Unfortunately, the analysis techniques that enabled a rich characterization of the emergent behavior in static environment games fail to cope with dynamic environment games. To address this, we develop a general framework using probabilistic couplings to extend the analysis of static environment games to dynamic ones. Using this approach, we obtain sufficient conditions under which traditional characterizations of Nash equilibria with best response dynamics and stochastic stability with log-linear learning can be extended to dynamic environment games. As case studies, we pose a model of cyber threat intelligence sharing between firms and a simple dynamic game-theoretic model of social precautions in an epidemic, both of which feature dynamic environments. For both examples, we obtain conditions under which the emergent behavior is characterized in the dynamic game by performing the traditional analysis on a reference static environment game.

Submitted 13 July, 2022; originally announced July 2022.

Comments: arXiv admin note: text overlap with arXiv:2103.13475

arXiv:2207.06411

Information Design for Vehicle-to-Vehicle Communication

Authors: Brendan T. Gould, Philip N. Brown

Abstract: The emerging technology of Vehicle-to-Vehicle (V2V) communication over vehicular ad hoc networks promises to improve road safety by allowing vehicles to autonomously warn each other of road hazards. However, research on other transportation information systems has shown that informing only a subset of drivers of road conditions may have the perverse effect of increasing congestion. In the context of a simple (yet novel) model of V2V hazard information sharing, we ask whether partial adoption of this technology can similarly lead to undesirable outcomes. In our model, drivers individually choose how recklessly to behave as a function of information received from other V2V-enabled cars, and the resulting aggregate behavior influences the likelihood of accidents (and thus the information propagated by the vehicular network). We fully characterize the game-theoretic equilibria of this model using our new equilibrium concept. Our model indicates that for a wide range of the parameter space, V2V information sharing surprisingly increases the equilibrium frequency of accidents relative to no V2V information sharing, and that it may increase equilibrium social cost as well.

Submitted 12 July, 2022; originally announced July 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2207.05846

arXiv:2207.05846

On Partial Adoption of Vehicle-to-Vehicle Communication: When Should Cars Warn Each Other of Hazards?

Authors: Brendan T. Gould, Philip N. Brown

Abstract: The emerging technology of Vehicle-to-Vehicle (V2V) communication over vehicular ad hoc networks promises to improve road safety by allowing vehicles to autonomously warn each other of road hazards. However, research on other transportation information systems has shown that informing only a subset of drivers of road conditions may have the perverse effect of increasing congestion. In the context of a simple (yet novel) model of V2V hazard information sharing, we ask whether partial adoption of this technology can similarly lead to undesirable outcomes. In our model, drivers individually choose how recklessly to behave as a function of information received from other V2V-enabled cars, and the resulting aggregate behavior influences the likelihood of accidents (and thus the information propagated by the vehicular network). We fully characterize the game-theoretic equilibria of this model. Our model indicates that for a wide range of our parameter space, V2V information sharing surprisingly increases the equilibrium frequency of accidents relative to no V2V information sharing, and that it may increase equilibrium social cost as well.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Presented at 2022 American Control Conference

arXiv:2207.05608

Inner Monologue: Embodied Reasoning through Planning with Language Models

Authors: Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Abstract: Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them, answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real tabletop rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Project website: https://innermonologue.github.io

arXiv:2207.01721

Providing slowdown information to improve selfish routing

Authors: Philip N. Brown

Abstract: Recent research in the social sciences has identified situations in which small changes in the way that information is provided to consumers can have large aggregate effects on behavior. This has been promoted in popular media in areas of public health and wellness, but its application to other areas has not been broadly studied. This paper presents a simple model which expresses the effect of providing commuters with carefully curated information regarding aggregate traffic "slowdowns" on the various roads in a transportation network. Much of the work on providing information to commuters focuses specifically on travel-time information. However, the model in the present paper allows a system planner to provide slowdown information as well; that is, commuters are additionally told how much slower each route is compared to its uncongested state. We show that providing this additional information can improve equilibrium routing efficiency compared with the case where commuters are given only travel-time information, but that these improvements in congestion are not universal. That is, there exist transportation networks on which any provision of slowdown information can worsen equilibrium congestion. In addition, this paper illuminates a deep connection between the effects of commuter slowdown-sensitivity and the study of marginal-cost pricing and altruism in congestion games.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.14103

Workflows to driving high-performance interactive supercomputing for urgent decision making

Authors: Nick Brown, Rupert Nash, Gordon Gibb, Evgenij Belikov, Artur Podobas, Wei Der Chien, Stefano Markidis, Markus Flatken, Andreas Gerndt

Abstract: Interactive urgent computing is a small but growing user of supercomputing resources. However, there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload, namely the users, the simulation codes, and external data sources, together in a structured and accessible manner. In this paper we explore the role of workflows both from the perspective of marshalling and controlling urgent workloads and at the individual HPC machine level. This ultimately requires two workflow systems, and, using a space weather prediction urgent use-case, we explore the benefit that these two workflow systems provide, especially when one exploits the flexibility enabled by their interoperation.

Submitted 28 June, 2022; originally announced June 2022.

Comments: Pre-print of paper accepted to the InteractiveHPC workshop of ISC2022

Showing 1–50 of 132 results for author: Brown, N