Search | arXiv e-print repository

Development of REGAI: Rubric Enabled Generative Artificial Intelligence

Abstract: This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both cla… ▽ More This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance and discusses several possible application areas for the technology. △ Less

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2406.11272 [pdf]

Development of an Adaptive Multi-Domain Artificial Intelligence System Built using Machine Learning and Expert Systems Technologies

Authors: Jeremy Straub

Abstract: Producing an artificial general intelligence (AGI) has been an elusive goal in artificial intelligence (AI) research for some time. An AGI would have the capability, like a human, to be exposed to a new problem domain, learn about it and then use reasoning processes to make decisions. While AI techniques have been used across a wide variety of problem domains, an AGI would require an AI that could… ▽ More Producing an artificial general intelligence (AGI) has been an elusive goal in artificial intelligence (AI) research for some time. An AGI would have the capability, like a human, to be exposed to a new problem domain, learn about it and then use reasoning processes to make decisions. While AI techniques have been used across a wide variety of problem domains, an AGI would require an AI that could reason beyond its programming and training. This paper presents a small step towards producing an AGI. It describes a mechanism for an AI to learn about and develop reasoning pathways to make decisions in an a priori unknown domain. It combines a classical AI technique, the expert system, with a its modern adaptation - the gradient descent trained expert system (GDTES) - and utilizes generative artificial intelligence (GAI) to create a network and training data set for this system. These can be created from available sources or may draw upon knowledge incorporated in a GAI's own pre-trained model. The learning process in GDTES is used to optimize the AI's decision-making. While this approach does not meet the standards that many have defined for an AGI, it provides a somewhat similar capability, albeit one which requires a learning process before use. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10224 [pdf, other]

EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

Authors: Julian Straub, Daniel DeTone, Tianwei Shen, Nan Yang, Chris Sweeney, Richard Newcombe

Abstract: The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D,… ▽ More The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high quality annotated egocentric data of Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. This model, trained on a large simulated dataset, outperforms existing methods on the EFM3D benchmark. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2405.00694 [pdf]

Analysis of the Efficacy of the Use of Inertial Measurement and Global Positioning System Data to Reverse Engineer Automotive CAN Bus Steering Signals

Authors: Kevin Setterstrom, Jeremy Straub

Abstract: Autonomous vehicle control is growing in availability for new vehicles and there is a potential need to retrofit older vehicles with this capability. Additionally, automotive cybersecurity has become a significant concern in recent years due to documented attacks on vehicles. As a result, researchers have been exploring reverse engineering techniques to automate vehicle control and improve vehicle… ▽ More Autonomous vehicle control is growing in availability for new vehicles and there is a potential need to retrofit older vehicles with this capability. Additionally, automotive cybersecurity has become a significant concern in recent years due to documented attacks on vehicles. As a result, researchers have been exploring reverse engineering techniques to automate vehicle control and improve vehicle security and threat analysis. In prior work, a vehicle's accelerator and brake pedal controller area network (CAN) channels were identified using reverse engineering techniques without prior knowledge of the vehicle. However, the correlation results for deceleration were lower than those for acceleration, which may be able to be improved by incorporating data from an additional telemetry device. In this paper, a method that uses IMU and GPS data to reverse-engineer a vehicle's steering wheel position CAN channels, without prior knowledge of the vehicle, is presented. Using GPS data is shown to greatly improve correlation values for deceleration, particularly for the brake pedal CAN channels. This work demonstrates the efficacy of using these data sources for automotive CAN reverse engineering. This has potential uses in automotive vehicle control and for improving vehicle security and threat analysis. △ Less

Submitted 27 March, 2024; originally announced May 2024.

arXiv:2404.11714 [pdf]

Implementation and Evaluation of a Gradient Descent-Trained Defensible Blackboard Architecture System

Authors: Jordan Milbrath, Jonathan Rivard, Jeremy Straub

Abstract: A variety of forms of artificial intelligence systems have been developed. Two well-known techniques are neural networks and rule-fact expert systems. The former can be trained from presented data while the latter is typically developed by human domain experts. A combined implementation that uses gradient descent to train a rule-fact expert system has been previously proposed. A related system typ… ▽ More A variety of forms of artificial intelligence systems have been developed. Two well-known techniques are neural networks and rule-fact expert systems. The former can be trained from presented data while the latter is typically developed by human domain experts. A combined implementation that uses gradient descent to train a rule-fact expert system has been previously proposed. A related system type, the Blackboard Architecture, adds an actualization capability to expert systems. This paper proposes and evaluates the incorporation of a defensible-style gradient descent training capability into the Blackboard Architecture. It also introduces the use of activation functions for defensible artificial intelligence systems and implements and evaluates a new best path-based training algorithm. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.18118 [pdf, other]

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Authors: Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney

Abstract: In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and… ▽ More In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale. △ Less

Submitted 22 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: ECCV 2024 camera ready version. Project page: https://egolifter.github.io/

arXiv:2403.14712 [pdf, other]

AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions

Authors: Vincent J. Straub, Youmna Hashem, Jonathan Bright, Satyam Bhagwanani, Deborah Morgan, John Francis, Saba Esnaashari, Helen Margetts

Abstract: There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK centra… ▽ More There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provide, meaning that automation efforts should focus on general procedures rather than services themselves which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2401.16840 [pdf, other]

Towards Large-scale Network Emulation on Analog Neuromorphic Hardware

Authors: Elias Arnold, Philipp Spilger, Jan V. Straub, Eric Müller, Dominik Dold, Gabriele Meoni, Johannes Schemmel

Abstract: We present a novel software feature for the BrainScaleS-2 accelerated neuromorphic platform that facilitates the emulation of partitioned large-scale spiking neural networks. This approach is well suited for many deep spiking neural networks, where the constraint of the largest recurrent subnetwork fitting on the substrate or the limited fan-in of neurons is often not a limitation in practice. We… ▽ More We present a novel software feature for the BrainScaleS-2 accelerated neuromorphic platform that facilitates the emulation of partitioned large-scale spiking neural networks. This approach is well suited for many deep spiking neural networks, where the constraint of the largest recurrent subnetwork fitting on the substrate or the limited fan-in of neurons is often not a limitation in practice. We demonstrate the training of two deep spiking neural network models, using the MNIST and EuroSAT datasets, that exceed the physical size constraints of a single-chip BrainScaleS-2 system. The ability to emulate and train networks larger than the substrate provides a pathway for accurate performance evaluation in planned or scaled systems, ultimately advancing the development and understanding of large-scale models and neuromorphic computing architectures. △ Less

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.16027 [pdf, other]

Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data

Authors: Sascha Jecklin, Youyang Shen, Amandine Gout, Daniel Suter, Lilian Calvet, Lukas Zingg, Jennifer Straub, Nicola Alessandro Cavalcanti, Mazda Farshad, Philipp Fürnstahl, Hooman Esfandiari

Abstract: This study tackles key obstacles in adopting surgical navigation in orthopedic surgeries, including time, cost, radiation, and workflow integration challenges. Recently, our work X23D showed an approach for generating 3D anatomical models of the spine from only a few intraoperative fluoroscopic images. This negates the need for conventional registration-based surgical navigation by creating a dire… ▽ More This study tackles key obstacles in adopting surgical navigation in orthopedic surgeries, including time, cost, radiation, and workflow integration challenges. Recently, our work X23D showed an approach for generating 3D anatomical models of the spine from only a few intraoperative fluoroscopic images. This negates the need for conventional registration-based surgical navigation by creating a direct intraoperative 3D reconstruction of the anatomy. Despite these strides, the practical application of X23D has been limited by a domain gap between synthetic training data and real intraoperative images. In response, we devised a novel data collection protocol for a paired dataset consisting of synthetic and real fluoroscopic images from the same perspectives. Utilizing this dataset, we refined our deep learning model via transfer learning, effectively bridging the domain gap between synthetic and real X-ray data. A novel style transfer mechanism also allows us to convert real X-rays to mirror the synthetic domain, enabling our in-silico-trained X23D model to achieve high accuracy in real-world settings. Our results demonstrated that the refined model can rapidly generate accurate 3D reconstructions of the entire lumbar spine from as few as three intraoperative fluoroscopic shots. It achieved an 84% F1 score, matching the accuracy of our previous synthetic data-based research. Additionally, with a computational time of only 81.1 ms, our approach provides real-time capabilities essential for surgery integration. Through examining ideal imaging setups and view angle dependencies, we've further confirmed our system's practicality and dependability in clinical settings. Our research marks a significant step forward in intraoperative 3D reconstruction, offering enhancements to surgical planning, navigation, and robotics. △ Less

Submitted 18 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2310.01401 [pdf, other]

Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection

Authors: Yiming Xie, Huaizu Jiang, Georgia Gkioxari, Julian Straub

Abstract: We present PARQ - a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. Unlike previous works that use learnable features or only encode 3D point positions as queries in the decoder, PARQ leverages appearance-enhanced queries initialized from reference points in 3D space and updates their 3D location with recurrent cross-attention operations. Incorporating pixel-ali… ▽ More We present PARQ - a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. Unlike previous works that use learnable features or only encode 3D point positions as queries in the decoder, PARQ leverages appearance-enhanced queries initialized from reference points in 3D space and updates their 3D location with recurrent cross-attention operations. Incorporating pixel-aligned features and cross attention enables the model to encode the necessary 3D-to-2D correspondences and capture global contextual information of the input images. PARQ outperforms prior best methods on the ScanNet and ARKitScenes datasets, learns and detects faster, is more robust to distribution shifts in reference points, can leverage additional input views without retraining, and can adapt inference compute by changing the number of recurrent iterations. △ Less

Submitted 2 October, 2023; originally announced October 2023.

Comments: ICCV 2023. Project page: https://ymingxie.github.io/parq

arXiv:2308.13561 [pdf, other]

Project Aria: A New Tool for Egocentric Multi-Modal AI Research

Authors: Jakob Engel, Kiran Somasundaram, Michael Goesele, Albert Sun, Alexander Gamino, Andrew Turner, Arjang Talattof, Arnie Yuan, Bilal Souti, Brighid Meredith, Cheng Peng, Chris Sweeney, Cole Wilson, Dan Barnes, Daniel DeTone, David Caruso, Derek Valleroy, Dinesh Ginjupalli, Duncan Frost, Edward Miller, Elias Mueggler, Evgeniy Oleinik, Fan Zhang, Guruprasad Somasundaram, Gustavo Solaira , et al. (49 additional authors not shown)

Abstract: Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, mul… ▽ More Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data. △ Less

Submitted 1 October, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2307.11781 [pdf]

Development of an Autonomous Reverse Engineering Capability for Controller Area Network Messages to Support Autonomous Control Retrofits

Authors: Kevin Setterstrom, Jeremy Straub

Abstract: As the autonomous vehicle industry continues to grow, various companies are exploring the use of aftermarket kits to retrofit existing vehicles with semi-autonomous capabilities. However, differences in implementation of the controller area network (CAN) used by each vehicle manufacturer poses a significant challenge to achieving large-scale implementation of retrofits. To address this challenge,… ▽ More As the autonomous vehicle industry continues to grow, various companies are exploring the use of aftermarket kits to retrofit existing vehicles with semi-autonomous capabilities. However, differences in implementation of the controller area network (CAN) used by each vehicle manufacturer poses a significant challenge to achieving large-scale implementation of retrofits. To address this challenge, this research proposes a method for reverse engineering the CAN channels associated with a vehicle's accelerator and brake pedals, without any prior knowledge of the vehicle. By simultaneously recording inertial measurement unit (IMU) and CAN data during vehicle operation, the proposed algorithms can identify the CAN channels that correspond to each control. During testing of six vehicles from three manufacturers, the proposed method was shown to successfully identify the CAN channels for the accelerator pedal and brake pedal for each vehicle tested. These promising results demonstrate the potential for using this approach for developing aftermarket autonomous vehicle kits - potentially with additional research to facilitate real-time use. Notably, the proposed system has the potential to maintain its effectiveness despite changes in vehicle CAN standards, and it could potentially be adapted to function with any vehicle communications medium. △ Less

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2306.11210 [pdf]

Analysis of the Benefits and Efficacy of the Addition of Variants and Reality Paths to the Blackboard Architecture

Authors: Ben Clark, Matthew Tassava, Cameron Kolodjski, Jeremy Straub

Abstract: While the Blackboard Architecture has been in use since the 1980s, it has recently been proposed for modeling computer networks to assess their security. To do this, it must account for complex network attack patterns involving multiple attack routes and possible mid-attack system state changes. This paper proposes a data structure which can be used to model paths from an ingress point to a given… ▽ More While the Blackboard Architecture has been in use since the 1980s, it has recently been proposed for modeling computer networks to assess their security. To do this, it must account for complex network attack patterns involving multiple attack routes and possible mid-attack system state changes. This paper proposes a data structure which can be used to model paths from an ingress point to a given egress point in Blackboard Architecture-modeled computer networks. It is designed to contain the pertinent information required for a systematic traversal through a changing network. This structure, called a reality path, represents a single potential pathway through the network with a given set of facts in a particular sequence of states. Another structure, called variants, is used during traversal of nodes (called containers) modeled in the network. The two structures - reality paths and variants - facilitate the use of a traversal algorithm, which will find all possible attack paths in Blackboard Architecture-modeled networks. This paper introduces and assesses the efficacy of variants and reality paths △ Less

Submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.04289 [pdf]

Introduction and Assessment of the Addition of Links and Containers to the Blackboard Architecture

Authors: Jordan Milbrath, Jeremy Straub

Abstract: The Blackboard Architecture provides a mechanism for storing data and logic and using it to make decisions that impact the application environment that the Blackboard Architecture network models. While rule-fact-action networks can represent numerous types of data, the relationships that can be easily modeled are limited by the propositional logic nature of the rule-fact network structure. This pa… ▽ More The Blackboard Architecture provides a mechanism for storing data and logic and using it to make decisions that impact the application environment that the Blackboard Architecture network models. While rule-fact-action networks can represent numerous types of data, the relationships that can be easily modeled are limited by the propositional logic nature of the rule-fact network structure. This paper proposes and evaluates the inclusion of containers and links in the Blackboard Architecture. These objects are designed to allow them to model organizational, physical, spatial and other relationships that cannot be readily or efficiently implemented as Boolean logic rules. Containers group related facts together and can be nested to implement complex relationships. Links interconnect containers that have a relationship that is relevant to their organizational purpose. Both objects, together, facilitate new ways of using the Blackboard Architecture and enable or simply its use for complex tasks that have multiple types of relationships that need to be considered during operations. △ Less