-
Development of REGAI: Rubric Enabled Generative Artificial Intelligence
Authors:
Zach Johnson,
Jeremy Straub
Abstract:
This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance and discusses several possible application areas for the technology.
Submitted 5 August, 2024;
originally announced August 2024.
-
Development of an Adaptive Multi-Domain Artificial Intelligence System Built using Machine Learning and Expert Systems Technologies
Authors:
Jeremy Straub
Abstract:
Producing an artificial general intelligence (AGI) has been an elusive goal in artificial intelligence (AI) research for some time. An AGI would have the capability, like a human, to be exposed to a new problem domain, learn about it and then use reasoning processes to make decisions. While AI techniques have been used across a wide variety of problem domains, an AGI would require an AI that could reason beyond its programming and training. This paper presents a small step towards producing an AGI. It describes a mechanism for an AI to learn about and develop reasoning pathways to make decisions in an a priori unknown domain. It combines a classical AI technique, the expert system, with its modern adaptation - the gradient descent trained expert system (GDTES) - and utilizes generative artificial intelligence (GAI) to create a network and training data set for this system. These can be created from available sources or may draw upon knowledge incorporated in a GAI's own pre-trained model. The learning process in GDTES is used to optimize the AI's decision-making. While this approach does not meet the standards that many have defined for an AGI, it provides a somewhat similar capability, albeit one which requires a learning process before use.
Submitted 17 June, 2024;
originally announced June 2024.
-
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models
Authors:
Julian Straub,
Daniel DeTone,
Tianwei Shen,
Nan Yang,
Chris Sweeney,
Richard Newcombe
Abstract:
The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D, a benchmark with two core 3D egocentric perception tasks. EFM3D is the first benchmark for 3D object detection and surface regression on high quality annotated egocentric data of Project Aria. We propose Egocentric Voxel Lifting (EVL), a baseline for 3D EFMs. EVL leverages all available egocentric modalities and inherits foundational capabilities from 2D foundation models. This model, trained on a large simulated dataset, outperforms existing methods on the EFM3D benchmark.
Submitted 14 June, 2024;
originally announced June 2024.
-
Analysis of the Efficacy of the Use of Inertial Measurement and Global Positioning System Data to Reverse Engineer Automotive CAN Bus Steering Signals
Authors:
Kevin Setterstrom,
Jeremy Straub
Abstract:
Autonomous vehicle control is growing in availability for new vehicles and there is a potential need to retrofit older vehicles with this capability. Additionally, automotive cybersecurity has become a significant concern in recent years due to documented attacks on vehicles. As a result, researchers have been exploring reverse engineering techniques to automate vehicle control and improve vehicle security and threat analysis. In prior work, a vehicle's accelerator and brake pedal controller area network (CAN) channels were identified using reverse engineering techniques without prior knowledge of the vehicle. However, the correlation results for deceleration were lower than those for acceleration, which may be able to be improved by incorporating data from an additional telemetry device. In this paper, a method that uses IMU and GPS data to reverse-engineer a vehicle's steering wheel position CAN channels, without prior knowledge of the vehicle, is presented. Using GPS data is shown to greatly improve correlation values for deceleration, particularly for the brake pedal CAN channels. This work demonstrates the efficacy of using these data sources for automotive CAN reverse engineering. This has potential uses in automotive vehicle control and for improving vehicle security and threat analysis.
Submitted 27 March, 2024;
originally announced May 2024.
-
Implementation and Evaluation of a Gradient Descent-Trained Defensible Blackboard Architecture System
Authors:
Jordan Milbrath,
Jonathan Rivard,
Jeremy Straub
Abstract:
A variety of forms of artificial intelligence systems have been developed. Two well-known techniques are neural networks and rule-fact expert systems. The former can be trained from presented data while the latter is typically developed by human domain experts. A combined implementation that uses gradient descent to train a rule-fact expert system has been previously proposed. A related system type, the Blackboard Architecture, adds an actualization capability to expert systems. This paper proposes and evaluates the incorporation of a defensible-style gradient descent training capability into the Blackboard Architecture. It also introduces the use of activation functions for defensible artificial intelligence systems and implements and evaluates a new best path-based training algorithm.
Submitted 17 April, 2024;
originally announced April 2024.
-
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Authors:
Qiao Gu,
Zhaoyang Lv,
Duncan Frost,
Simon Green,
Julian Straub,
Chris Sweeney
Abstract:
In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale.
Submitted 22 July, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
AI for bureaucratic productivity: Measuring the potential of AI to help automate 143 million UK government transactions
Authors:
Vincent J. Straub,
Youmna Hashem,
Jonathan Bright,
Satyam Bhagwanani,
Deborah Morgan,
John Francis,
Saba Esnaashari,
Helen Margetts
Abstract:
There is currently considerable excitement within government about the potential of artificial intelligence to improve public service productivity through the automation of complex but repetitive bureaucratic tasks, freeing up the time of skilled staff. Here, we explore the size of this opportunity, by mapping out the scale of citizen-facing bureaucratic decision-making procedures within UK central government, and measuring their potential for AI-driven automation. We estimate that UK central government conducts approximately one billion citizen-facing transactions per year in the provision of around 400 services, of which approximately 143 million are complex repetitive transactions. We estimate that 84% of these complex transactions are highly automatable, representing a huge potential opportunity: saving even an average of just one minute per complex transaction would save the equivalent of approximately 1,200 person-years of work every year. We also develop a model to estimate the volume of transactions a government service undertakes, providing a way for government to avoid conducting time consuming transaction volume measurements. Finally, we find that there is high turnover in the types of services government provide, meaning that automation efforts should focus on general procedures rather than services themselves which are likely to evolve over time. Overall, our work presents a novel perspective on the structure and functioning of modern government, and how it might evolve in the age of artificial intelligence.
Submitted 18 March, 2024;
originally announced March 2024.
-
Towards Large-scale Network Emulation on Analog Neuromorphic Hardware
Authors:
Elias Arnold,
Philipp Spilger,
Jan V. Straub,
Eric Müller,
Dominik Dold,
Gabriele Meoni,
Johannes Schemmel
Abstract:
We present a novel software feature for the BrainScaleS-2 accelerated neuromorphic platform that facilitates the emulation of partitioned large-scale spiking neural networks. This approach is well suited for many deep spiking neural networks, where the constraint of the largest recurrent subnetwork fitting on the substrate or the limited fan-in of neurons is often not a limitation in practice. We demonstrate the training of two deep spiking neural network models, using the MNIST and EuroSAT datasets, that exceed the physical size constraints of a single-chip BrainScaleS-2 system. The ability to emulate and train networks larger than the substrate provides a pathway for accurate performance evaluation in planned or scaled systems, ultimately advancing the development and understanding of large-scale models and neuromorphic computing architectures.
Submitted 30 January, 2024;
originally announced January 2024.
-
Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data
Authors:
Sascha Jecklin,
Youyang Shen,
Amandine Gout,
Daniel Suter,
Lilian Calvet,
Lukas Zingg,
Jennifer Straub,
Nicola Alessandro Cavalcanti,
Mazda Farshad,
Philipp Fürnstahl,
Hooman Esfandiari
Abstract:
This study tackles key obstacles in adopting surgical navigation in orthopedic surgeries, including time, cost, radiation, and workflow integration challenges. Recently, our work X23D showed an approach for generating 3D anatomical models of the spine from only a few intraoperative fluoroscopic images. This negates the need for conventional registration-based surgical navigation by creating a direct intraoperative 3D reconstruction of the anatomy. Despite these strides, the practical application of X23D has been limited by a domain gap between synthetic training data and real intraoperative images.
In response, we devised a novel data collection protocol for a paired dataset consisting of synthetic and real fluoroscopic images from the same perspectives. Utilizing this dataset, we refined our deep learning model via transfer learning, effectively bridging the domain gap between synthetic and real X-ray data. A novel style transfer mechanism also allows us to convert real X-rays to mirror the synthetic domain, enabling our in-silico-trained X23D model to achieve high accuracy in real-world settings.
Our results demonstrated that the refined model can rapidly generate accurate 3D reconstructions of the entire lumbar spine from as few as three intraoperative fluoroscopic shots. It achieved an 84% F1 score, matching the accuracy of our previous synthetic data-based research. Additionally, with a computational time of only 81.1 ms, our approach provides real-time capabilities essential for surgery integration.
Through examining ideal imaging setups and view angle dependencies, we've further confirmed our system's practicality and dependability in clinical settings. Our research marks a significant step forward in intraoperative 3D reconstruction, offering enhancements to surgical planning, navigation, and robotics.
Submitted 18 June, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection
Authors:
Yiming Xie,
Huaizu Jiang,
Georgia Gkioxari,
Julian Straub
Abstract:
We present PARQ - a multi-view 3D object detector with transformer and pixel-aligned recurrent queries. Unlike previous works that use learnable features or only encode 3D point positions as queries in the decoder, PARQ leverages appearance-enhanced queries initialized from reference points in 3D space and updates their 3D location with recurrent cross-attention operations. Incorporating pixel-aligned features and cross attention enables the model to encode the necessary 3D-to-2D correspondences and capture global contextual information of the input images. PARQ outperforms prior best methods on the ScanNet and ARKitScenes datasets, learns and detects faster, is more robust to distribution shifts in reference points, can leverage additional input views without retraining, and can adapt inference compute by changing the number of recurrent iterations.
Submitted 2 October, 2023;
originally announced October 2023.
-
Project Aria: A New Tool for Egocentric Multi-Modal AI Research
Authors:
Jakob Engel,
Kiran Somasundaram,
Michael Goesele,
Albert Sun,
Alexander Gamino,
Andrew Turner,
Arjang Talattof,
Arnie Yuan,
Bilal Souti,
Brighid Meredith,
Cheng Peng,
Chris Sweeney,
Cole Wilson,
Dan Barnes,
Daniel DeTone,
David Caruso,
Derek Valleroy,
Dinesh Ginjupalli,
Duncan Frost,
Edward Miller,
Elias Mueggler,
Evgeniy Oleinik,
Fan Zhang,
Guruprasad Somasundaram,
Gustavo Solaira
, et al. (49 additional authors not shown)
Abstract:
Egocentric, multi-modal data as available on future augmented reality (AR) devices provides unique challenges and opportunities for machine perception. These future devices will need to be all-day wearable in a socially acceptable form-factor to support always available, context-aware and personalized AI applications. Our team at Meta Reality Labs Research built the Aria device, an egocentric, multi-modal data recording and streaming device with the goal to foster and accelerate research in this area. In this paper, we describe the Aria device hardware including its sensor configuration and the corresponding software tools that enable recording and processing of such data.
Submitted 1 October, 2023; v1 submitted 24 August, 2023;
originally announced August 2023.
-
Development of an Autonomous Reverse Engineering Capability for Controller Area Network Messages to Support Autonomous Control Retrofits
Authors:
Kevin Setterstrom,
Jeremy Straub
Abstract:
As the autonomous vehicle industry continues to grow, various companies are exploring the use of aftermarket kits to retrofit existing vehicles with semi-autonomous capabilities. However, differences in implementation of the controller area network (CAN) used by each vehicle manufacturer pose a significant challenge to achieving large-scale implementation of retrofits. To address this challenge, this research proposes a method for reverse engineering the CAN channels associated with a vehicle's accelerator and brake pedals, without any prior knowledge of the vehicle. By simultaneously recording inertial measurement unit (IMU) and CAN data during vehicle operation, the proposed algorithms can identify the CAN channels that correspond to each control. During testing of six vehicles from three manufacturers, the proposed method was shown to successfully identify the CAN channels for the accelerator pedal and brake pedal for each vehicle tested. These promising results demonstrate the potential for using this approach for developing aftermarket autonomous vehicle kits - potentially with additional research to facilitate real-time use. Notably, the proposed system has the potential to maintain its effectiveness despite changes in vehicle CAN standards, and it could potentially be adapted to function with any vehicle communications medium.
Submitted 20 July, 2023;
originally announced July 2023.
-
Analysis of the Benefits and Efficacy of the Addition of Variants and Reality Paths to the Blackboard Architecture
Authors:
Ben Clark,
Matthew Tassava,
Cameron Kolodjski,
Jeremy Straub
Abstract:
While the Blackboard Architecture has been in use since the 1980s, it has recently been proposed for modeling computer networks to assess their security. To do this, it must account for complex network attack patterns involving multiple attack routes and possible mid-attack system state changes. This paper proposes a data structure which can be used to model paths from an ingress point to a given egress point in Blackboard Architecture-modeled computer networks. It is designed to contain the pertinent information required for a systematic traversal through a changing network. This structure, called a reality path, represents a single potential pathway through the network with a given set of facts in a particular sequence of states. Another structure, called variants, is used during traversal of nodes (called containers) modeled in the network. The two structures - reality paths and variants - facilitate the use of a traversal algorithm, which will find all possible attack paths in Blackboard Architecture-modeled networks. This paper introduces and assesses the efficacy of variants and reality paths.
Submitted 19 June, 2023;
originally announced June 2023.
-
Introduction and Assessment of the Addition of Links and Containers to the Blackboard Architecture
Authors:
Jordan Milbrath,
Jeremy Straub
Abstract:
The Blackboard Architecture provides a mechanism for storing data and logic and using it to make decisions that impact the application environment that the Blackboard Architecture network models. While rule-fact-action networks can represent numerous types of data, the relationships that can be easily modeled are limited by the propositional logic nature of the rule-fact network structure. This paper proposes and evaluates the inclusion of containers and links in the Blackboard Architecture. These objects are designed to allow them to model organizational, physical, spatial and other relationships that cannot be readily or efficiently implemented as Boolean logic rules. Containers group related facts together and can be nested to implement complex relationships. Links interconnect containers that have a relationship that is relevant to their organizational purpose. Both objects, together, facilitate new ways of using the Blackboard Architecture and enable or simplify its use for complex tasks that have multiple types of relationships that need to be considered during operations.
Submitted 7 June, 2023;
originally announced June 2023.
-
Extension of the Blackboard Architecture with Common Properties and Generic Rules
Authors:
Jonathan Rivard,
Jeremy Straub
Abstract:
The Blackboard Architecture provides a mechanism for embodying data, decision making and actuation. Its versatility has been demonstrated across a wide number of application areas. However, it lacks the capability to directly model organizational, spatial and other relationships which may be useful in decision-making, in addition to the propositional logic embodied in the rule-fact-action network. Previous work has proposed the use of container objects and links as a mechanism to simultaneously model these organizational and other relationships, while leaving the operational logic modeled in the rules, facts and actions. While containers facilitate this modeling, their utility is limited by the need to manually define them. For systems which may have multiple instances of a particular type of object and which may build their network autonomously, based on sensing, the reuse of logical structures facilitates operations and reduces storage and processing needs. This paper, thus, presents and assesses two additional concepts to add to the Blackboard Architecture: common properties and generic rules. Common properties are facts associated with containers which are defined as representing the same information across the various objects that they are associated with. Generic rules provide logical propositions that use these common properties across links and apply to any objects matching their definition. The potential uses of these two new concepts are discussed herein and their impact on system performance is characterized.
Submitted 7 June, 2023;
originally announced June 2023.
-
Development of a Multi-purpose Fuzzer to Perform Assessment as Input to a Cybersecurity Risk Assessment and Analysis System
Authors:
Jack Hance,
Jeremy Straub
Abstract:
Fuzzing is utilized for testing software and systems for cybersecurity risk via the automated adaptation of inputs. It facilitates the identification of software bugs and misconfigurations that may create vulnerabilities, cause abnormal operations or result in systems' failure. While many fuzzers have been purpose-developed for testing specific systems, this paper proposes a generalized fuzzer that provides a specific capability for testing software and cyber-physical systems which utilize configuration files. While this fuzzer facilitates the detection of system and software defects and vulnerabilities, it also facilitates the determination of the impact of settings on device operations. This latter capability facilitates the modeling of the devices in a cybersecurity risk assessment and analysis system. This paper describes and assesses the performance of the proposed fuzzer technology. It also details how the fuzzer operates as part of the broader cybersecurity risk assessment and analysis system.
Submitted 7 June, 2023;
originally announced June 2023.
-
Development of a System Vulnerability Analysis Tool for Assessment of Complex Mission Critical Systems
Authors:
Matthew Tassava,
Cameron Kolodjski,
Jeremy Straub
Abstract:
A system vulnerability analysis technique (SVAT) for complex mission critical systems (CMCS) was developed in response to the need to be able to conduct penetration testing on large industrial systems which cannot be taken offline or risk disablement or impairment for conventional penetration testing. SVAT-CMCS facilitates the use of known vulnerability and exploit information, incremental testing of system components and data analysis techniques to identify attack pathways in CMCSs. This data can be utilized for corrective activities or to target controlled manual follow-up testing. This paper presents the SVAT-CMCS paradigm and describes its implementation in a software tool, which was built using the Blackboard Architecture, that can be utilized for attack pathway identification. The performance of this tool is characterized using three example models. In particular, it explores the path generation speed and the impact of link cap restrictions on system operations, under different levels of network size and complexity. Accurate fact-rule processing is also tested using these models. The results show significant decreases in path generation efficiency as the link cap and network complexity increase; however, rule processing accuracy is not impacted.
Submitted 7 June, 2023;
originally announced June 2023.
-
Development and Analysis of P2SCP: A Paradigm for Penetration Testing of Systems that Cannot be Subjected to the Risk of Penetration Testing
Authors:
Jeremy Straub
Abstract:
Penetration testing increases the security of systems through tasking testers to 'think like the adversary' and attempt to find the ways that an attacker would break into the system. For many systems, this can be conducted in a safe and controlled way; however, some systems are so critical to human life and safety that the risk of their failure or disablement due to active penetration testing cannot be assumed. These systems are also critical to evaluate the security of, to prevent attackers from disabling them or causing their maloperation; however, this must be done in a manner that doesn't risk the very malady that testing seeks to avoid through the testing process itself. This paper presents P2SCP, a paradigm for penetration testing of systems that cannot be subjected to the risk of penetration testing. It discusses how data collection, the creation of digital twins and cousins and evaluative analysis can be utilized to conduct virtual penetration tests on critical infrastructure systems. This proposed paradigm is analyzed through the use of several case studies.
Submitted 7 June, 2023;
originally announced June 2023.
-
OrienterNet: Visual Localization in 2D Public Maps with Neural Matching
Authors:
Paul-Edouard Sarlin,
Daniel DeTone,
Tsun-Yi Yang,
Armen Avetisyan,
Julian Straub,
Tomasz Malisiewicz,
Samuel Rota Bulo,
Richard Newcombe,
Peter Kontschieder,
Vasileios Balntas
Abstract:
Humans can orient themselves in their 3D environments using simple 2D maps. Differently, algorithms for visual localization mostly rely on complex 3D point clouds that are expensive to build, store, and maintain over time. We bridge this gap by introducing OrienterNet, the first deep neural network that can localize an image with sub-meter accuracy using the same 2D semantic maps that humans use. OrienterNet estimates the location and orientation of a query image by matching a neural Bird's-Eye View with open and globally available maps from OpenStreetMap, enabling anyone to localize anywhere such maps are available. OrienterNet is supervised only by camera poses but learns to perform semantic matching with a wide range of map elements in an end-to-end manner. To enable this, we introduce a large crowd-sourced dataset of images captured across 12 cities from the diverse viewpoints of cars, bikes, and pedestrians. OrienterNet generalizes to new datasets and pushes the state of the art in both robotics and AR scenarios. The code and trained model will be released publicly.
Submitted 4 April, 2023;
originally announced April 2023.
-
'Team-in-the-loop': Ostrom's IAD framework 'rules in use' to map and measure contextual impacts of AI
Authors:
Deborah Morgan,
Youmna Hashem,
John Francis,
Saba Esnaashari,
Vincent J. Straub,
Jonathan Bright
Abstract:
This article explores how the 'rules in use' from Ostrom's Institutional Analysis and Development Framework (IAD) can be developed as a context analysis approach for AI. AI risk assessment frameworks increasingly highlight the need to understand existing contexts. However, these approaches do not frequently connect with established institutional analysis scholarship. We outline a novel direction illustrated through a high-level example to understand how clinical oversight is potentially impacted by AI. Much current thinking regarding oversight for AI revolves around the idea of decision makers being in-the-loop and, thus, having capacity to intervene to prevent harm. However, our analysis finds that oversight is complex, frequently made by teams of professionals and relies upon explanation to elicit information. Professional bodies and liability also function as institutions of polycentric oversight. These are all impacted by the challenge of oversight of AI systems. The approach outlined has potential utility as a policy tool of context analysis aligned with the 'Govern and Map' functions of the National Institute of Standards and Technology (NIST) AI Risk Management Framework; however, further empirical research is needed. Our analysis illustrates the benefit of existing institutional analysis approaches in foregrounding team structures within oversight and, thus, in conceptions of 'human in the loop'.
Submitted 30 June, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
A multidomain relational framework to guide institutional AI research and adoption
Authors:
Vincent J. Straub,
Deborah Morgan,
Youmna Hashem,
John Francis,
Saba Esnaashari,
Jonathan Bright
Abstract:
Calls for new metrics, technical standards and governance mechanisms to guide the adoption of Artificial Intelligence (AI) in institutions and public administration are now commonplace. Yet, most research and policy efforts aimed at understanding the implications of adopting AI tend to prioritize only a handful of ideas; they do not fully connect all the different perspectives and topics that are potentially relevant. In this position paper, we contend that this omission stems, in part, from what we call the relational problem in socio-technical discourse: fundamental ontological issues have not yet been settled--including semantic ambiguity, a lack of clear relations between concepts and differing standard terminologies. This contributes to the persistence of disparate modes of reasoning to assess institutional AI systems, and the prevalence of conceptual isolation in the fields that study them including ML, human factors, social science and policy. After developing this critique, we offer a way forward by proposing a simple policy and research design tool in the form of a conceptual framework to organize terms across fields--consisting of three horizontal domains for grouping relevant concepts and related methods: Operational, Epistemic, and Normative. We first situate this framework against the backdrop of recent socio-technical discourse at two premier academic venues, AIES and FAccT, before illustrating how developing suitable metrics, standards, and mechanisms can be aided by operationalizing relevant concepts in each of these domains. Finally, we outline outstanding questions for developing this relational approach to institutional AI research and adoption.
Submitted 17 July, 2023; v1 submitted 17 March, 2023;
originally announced March 2023.
-
Artificial intelligence in government: Concepts, standards, and a unified framework
Authors:
Vincent J. Straub,
Deborah Morgan,
Jonathan Bright,
Helen Margetts
Abstract:
Recent advances in artificial intelligence (AI), especially in generative language modelling, hold the promise of transforming government. Given the advanced capabilities of new AI systems, it is critical that these are embedded using standard operational procedures, clear epistemic criteria, and behave in alignment with the normative expectations of society. Scholars in multiple domains have subsequently begun to conceptualize the different forms that AI applications may take, highlighting both their potential benefits and pitfalls. However, the literature remains fragmented, with researchers in social science disciplines like public administration and political science, and the fast-moving fields of AI, ML, and robotics, all developing concepts in relative isolation. Although there are calls to formalize the emerging study of AI in government, a balanced account that captures the full depth of theoretical perspectives needed to understand the consequences of embedding AI into a public sector context is lacking. Here, we unify efforts across social and technical disciplines by first conducting an integrative literature review to identify and cluster 69 key terms that frequently co-occur in the multidisciplinary study of AI. We then build on the results of this bibliometric analysis to propose three new multifaceted concepts for understanding and analysing AI-based systems for government (AI-GOV) in a more unified way: (1) operational fitness, (2) epistemic alignment, and (3) normative divergence. Finally, we put these concepts to work by using them as dimensions in a conceptual typology of AI-GOV and connecting each with emerging AI technical measurement standards to encourage operationalization, foster cross-disciplinary dialogue, and stimulate debate among those aiming to rethink government with AI.
Submitted 25 October, 2023; v1 submitted 31 October, 2022;
originally announced October 2022.
-
Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild
Authors:
Garrick Brazil,
Abhinav Kumar,
Julian Straub,
Nikhila Ravi,
Justin Johnson,
Georgia Gkioxari
Abstract:
Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets and scalable solutions have led to unprecedented advances. In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes. Motivated by the success of 2D recognition, we revisit the task of 3D object detection by introducing a large benchmark, called Omni3D. Omni3D re-purposes and combines existing datasets resulting in 234k images annotated with more than 3 million instances and 98 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types. We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks. Finally, we prove that Omni3D is a powerful dataset for 3D object recognition and show that it improves single-dataset performance and can accelerate learning on new smaller datasets via pre-training.
Submitted 23 March, 2023; v1 submitted 21 July, 2022;
originally announced July 2022.
-
Automating the Design and Development of Gradient Descent Trained Expert System Networks
Authors:
Jeremy Straub
Abstract:
Prior work introduced a gradient descent trained expert system that conceptually combines the learning capabilities of neural networks with the understandability and defensible logic of an expert system. This system was shown to be able to learn patterns from data and to perform decision-making at levels rivaling those reported by neural network systems. The principal limitation of the approach, though, was the necessity for the manual development of a rule-fact network (which is then trained using backpropagation). This paper proposes a technique for overcoming this significant limitation, as compared to neural networks. Specifically, this paper proposes the use of larger and denser-than-application need rule-fact networks which are trained, pruned, manually reviewed and then re-trained for use. Multiple types of networks are evaluated under multiple operating conditions and these results are presented and assessed. Based on these individual experimental condition assessments, the proposed technique is evaluated. The data presented shows that error rates as low as 3.9% (mean, 1.2% median) can be obtained, demonstrating the efficacy of this technique for many applications.
Submitted 4 July, 2022;
originally announced July 2022.
-
Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation
Authors:
Gil Avraham,
Julian Straub,
Tianwei Shen,
Tsun-Yi Yang,
Hugo Germain,
Chris Sweeney,
Vasileios Balntas,
David Novotny,
Daniel DeTone,
Richard Newcombe
Abstract:
This paper presents a framework that combines traditional keypoint-based camera pose optimization with an invertible neural rendering mechanism. Our proposed 3D scene representation, Nerfels, is locally dense yet globally sparse. As opposed to existing invertible neural rendering systems which overfit a model to the entire scene, we adopt a feature-driven approach for representing scene-agnostic, local 3D patches with renderable codes. By modelling a scene only where local features are detected, our framework effectively generalizes to unseen local regions in the scene via an optimizable code conditioning mechanism in the neural renderer, all while maintaining the low memory footprint of a sparse 3D map representation. Our model can be incorporated into existing state-of-the-art hand-crafted and learned local feature pose estimators, yielding improved performance when evaluating on ScanNet for wide camera baseline scenarios.
Submitted 4 June, 2022;
originally announced June 2022.
-
ODAM: Object Detection, Association, and Mapping using Posed RGB Video
Authors:
Kejie Li,
Daniel DeTone,
Steven Chen,
Minh Vo,
Ian Reid,
Hamid Rezatofighi,
Chris Sweeney,
Julian Straub,
Richard Newcombe
Abstract:
Localizing objects and estimating their extent in 3D is an important step towards high-level 3D scene understanding, which has many applications in Augmented Reality and Robotics. We present ODAM, a system for 3D Object Detection, Association, and Mapping using posed RGB videos. The proposed system relies on a deep learning front-end to detect 3D objects from a given RGB frame and associate them to a global object-based map using a graph neural network (GNN). Based on these frame-to-model associations, our back-end optimizes object bounding volumes, represented as super-quadrics, under multi-view geometry constraints and the object scale prior. We validate the proposed system on ScanNet where we show a significant improvement over existing RGB-only methods.
Submitted 23 August, 2021;
originally announced August 2021.
-
Fake News and Phishing Detection Using a Machine Learning Trained Expert System
Authors:
Benjamin Fitzpatrick,
Xinyu "Sherwin" Liang,
Jeremy Straub
Abstract:
Expert systems have been used to enable computers to make recommendations and decisions. This paper presents the use of a machine learning trained expert system (MLES) for phishing site detection and fake news detection. Both topics share a similar goal: to design a rule-fact network that allows a computer to make explainable decisions like domain experts in each respective area. The phishing website detection study uses an MLES to detect potential phishing websites by analyzing site properties (like URL length and expiration time). The fake news detection study uses an MLES rule-fact network to gauge news story truthfulness based on factors such as emotion, the speaker's political affiliation status, and job. The two studies use different MLES network implementations, which are presented and compared herein. The fake news study utilized a more linear design while the phishing project utilized a more complex connection structure. Both networks' inputs are based on commonly available data sets.
Submitted 4 August, 2021;
originally announced August 2021.
-
Determining Sentencing Recommendations and Patentability Using a Machine Learning Trained Expert System
Authors:
Logan Brown,
Reid Pezewski,
Jeremy Straub
Abstract:
This paper presents two studies that use a machine learning expert system (MLES). One focuses on a system to advise United States federal judges regarding consistent federal criminal sentencing, based on both the federal sentencing guidelines and offender characteristics. The other study aims to develop a system that could prospectively assist the U.S. Patent and Trademark Office in automating its patentability assessment process. Both studies use a machine learning-trained rule-fact expert system network to accept input variables for training and presentation and output a scaled variable that represents the system recommendation (e.g., the sentence length or the patentability assessment). This paper presents and compares the rule-fact networks that have been developed for these projects. It explains the decision-making process underlying the structures used for both networks and the pre-processing of data that was needed and performed. It also, through comparing the two systems, discusses how different methods can be used with the MLES system.
Submitted 5 August, 2021;
originally announced August 2021.
-
Consideration of the Need for Quantum Grid Computing
Authors:
Dominic Rosch-Grace,
Jeremy Straub
Abstract:
Quantum computing is poised to dramatically change the computational landscape, worldwide. Quantum computers can solve complex problems that are, at least in some cases, beyond the ability of even advanced future classical-style computers. In addition to being able to solve these classical computer-unsolvable problems, quantum computers have demonstrated a capability to solve some problems (such as prime factoring) much more efficiently than classical computing. This will create problems for encryption techniques, which depend on the difficulty of factoring for their security. Security, scientific, and other applications will require access to quantum computing resources to access their unique capabilities, speed and economic (aggregate computing time cost) benefits. Many scientific applications, as well as numerous other ones, use grid computing to provide benefits such as scalability and resource access. As these applications may benefit from quantum capabilities - and some future applications may require quantum capabilities - identifying how to integrate quantum computing systems into grid computing environments is critical. This paper discusses the benefits of grid-connected quantum computers and what is required to achieve this.
Submitted 21 June, 2021;
originally announced June 2021.
-
Approach for modeling single branches of meadow orchard trees with 3D point clouds
Authors:
Jonas Straub,
David Reiser,
Hans W. Griepentrog
Abstract:
The cultivation of orchard meadows provides an ecological benefit for biodiversity, which is significantly higher than in intensively cultivated orchards. The goal of this research is to create a tree model to automatically determine possible pruning points for stand-alone trees within meadows. The algorithm which is presented here is capable of building a skeleton model based on a pre-segmented photogrammetric 3D point cloud. Good results were achieved in assigning the points to their leading branches and building a virtual tree model, reaching an overall accuracy of 95.19 %. This model provided the necessary information about the geometry of the tree for automated pruning.
Submitted 12 April, 2021;
originally announced April 2021.
-
Defining, Evaluating, Preparing for and Responding to a Cyber Pearl Harbor
Authors:
Jeremy Straub
Abstract:
Despite not having a clear meaning, public perception and awareness make the term cyber Pearl Harbor an important part of the public discourse. This paper considers what the term has meant and proposes its decomposition based on three different aspects of the historical Pearl Harbor attack, allowing the lessons from Pearl Harbor to be applied to threats and subjects that may not align with all aspects of the 1941 attack. Using these three definitions, prior attacks and current threats are assessed and preparation for and response to cyber Pearl Harbor events is discussed.
Submitted 13 March, 2021;
originally announced March 2021.
-
Expert System Gradient Descent Style Training: Development of a Defensible Artificial Intelligence Technique
Authors:
Jeremy Straub
Abstract:
Artificial intelligence systems, which are designed with a capability to learn from the data presented to them, are used throughout society. These systems are used to screen loan applicants, make sentencing recommendations for criminal defendants, scan social media posts for disallowed content and more. Because these systems don't assign meaning to their complex learned correlation network, they can learn associations that don't equate to causality, resulting in non-optimal and indefensible decisions being made. In addition to making decisions that are sub-optimal, these systems may create legal liability for their designers and operators by learning correlations that violate anti-discrimination and other laws regarding what factors can be used in different types of decision making. This paper presents the use of a machine learning expert system, which is developed with meaning-assigned nodes (facts) and correlations (rules). Multiple potential implementations are considered and evaluated under different conditions, including different network error and augmentation levels and different training levels. The performance of these systems is compared to random and fully connected networks.
Submitted 7 March, 2021;
originally announced March 2021.
-
Beyond kinetic harm and towards a dynamic conceptualization of cyberterrorism
Authors:
Vince J. Straub
Abstract:
After more than two decades of discussion, the concept of cyberterrorism remains plagued by confusion. This article presents the result of an integrative review which maps the development of the term and situates the epistemic communities that have shaped the debate. After critically assessing existing accounts and highlighting the key ethical, social, and legal dimensions at stake in preventing cyberterrorist attacks, it calls for a more dynamic conceptualization that views cyberterrorism as more abstract, difficult to predict, and hard to isolate; and which embraces a different conception of sufficient harm. In concluding it proposes a novel definition of cyberterrorism, intended to catalyse a new research programme, and sketches a roadmap for further research.
Submitted 16 December, 2020;
originally announced December 2020.
-
The cost of coordination can exceed the benefit of collaboration in performing complex tasks
Authors:
Vince J. Straub,
Milena Tsvetkova,
Taha Yasseri
Abstract:
Humans and other intelligent agents often rely on collective decision making based on an intuition that groups outperform individuals. However, at present, we lack a complete theoretical understanding of when groups perform better. Here we examine performance in collective decision-making in the context of a real-world citizen science task environment in which individuals with manipulated differences in task-relevant training collaborated. We find 1) dyads gradually improve in performance but do not experience a collective benefit compared to individuals in most situations; 2) the cost of coordination to efficiency and speed that results when switching to a dyadic context after training individually is consistently larger than the leverage of having a partner, even if they are expertly trained in that task; and 3) on the most complex tasks having an additional expert in the dyad who is adequately trained improves accuracy. These findings highlight that the extent of training received by an individual, the complexity of the task at hand, and the desired performance indicator are all critical factors that need to be accounted for when weighing up the benefits of collective decision-making.
Submitted 27 January, 2023; v1 submitted 23 September, 2020;
originally announced September 2020.
-
FroDO: From Detections to 3D Objects
Authors:
Kejie Li,
Martin Rünz,
Meng Tang,
Lingni Ma,
Chen Kong,
Tanner Schmidt,
Ian Reid,
Lourdes Agapito,
Julian Straub,
Steven Lovegrove,
Richard Newcombe
Abstract:
Object-oriented maps are important for scene understanding since they jointly capture geometry and semantics, allow individual instantiation and meaningful reasoning about objects. We introduce FroDO, a method for accurate 3D reconstruction of object instances from RGB video that infers object location, pose and shape in a coarse-to-fine manner. Key to FroDO is to embed object shapes in a novel learnt space that allows seamless switching between sparse point cloud and dense DeepSDF decoding. Given an input sequence of localized RGB frames, FroDO first aggregates 2D detections to instantiate a category-aware 3D bounding box per object. A shape code is regressed using an encoder network before optimizing shape and pose further under the learnt shape priors using sparse and dense shape representations. The optimization uses multi-view geometric, photometric and silhouette losses. We evaluate on real-world datasets, including Pix3D, Redwood-OS, and ScanNet, for single-view, multi-view, and multi-object reconstruction.
Submitted 11 May, 2020;
originally announced May 2020.
-
Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction
Authors:
Rohan Chabra,
Jan Eric Lenssen,
Eddy Ilg,
Tanner Schmidt,
Julian Straub,
Steven Lovegrove,
Richard Newcombe
Abstract:
Efficiently reconstructing complex and intricate surfaces at scale is a long-standing goal in machine perception. To address this problem we introduce Deep Local Shapes (DeepLS), a deep shape representation that enables encoding and reconstruction of high-quality 3D shapes without prohibitive memory requirements. DeepLS replaces the dense volumetric signed distance function (SDF) representation used in traditional surface reconstruction systems with a set of locally learned continuous SDFs defined by a neural network, inspired by recent work such as DeepSDF. Unlike DeepSDF, which represents an object-level SDF with a neural network and a single latent code, we store a grid of independent latent codes, each responsible for storing information about surfaces in a small local neighborhood. This decomposition of scenes into local shapes simplifies the prior distribution that the network must learn, and also enables efficient inference. We demonstrate the effectiveness and generalization power of DeepLS by showing object shape encoding and reconstructions of full scenes, where DeepLS delivers high compression, accuracy, and local shape completion.
Submitted 21 August, 2020; v1 submitted 24 March, 2020;
originally announced March 2020.
-
Analyzing Visual Representations in Embodied Navigation Tasks
Authors:
Erik Wijmans,
Julian Straub,
Dhruv Batra,
Irfan Essa,
Judy Hoffman,
Ari Morcos
Abstract:
Recent advances in deep reinforcement learning require a large amount of training data and generally result in representations that are often over specialized to the target task. In this work, we present a methodology to study the underlying potential causes for this specialization. We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks.
We then leverage our proposed methodology to examine the task dependence of visual representations learned on related but distinct embodied navigation tasks. Surprisingly, we find that slight differences in task have no measurable effect on the visual representation for both SqueezeNet and ResNet architectures. We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.
Submitted 12 March, 2020;
originally announced March 2020.
-
The Replica Dataset: A Digital Replica of Indoor Spaces
Authors:
Julian Straub,
Thomas Whelan,
Lingni Ma,
Yufan Chen,
Erik Wijmans,
Simon Green,
Jakob J. Engel,
Raul Mur-Artal,
Carl Ren,
Shobhit Verma,
Anton Clarkson,
Mingfei Yan,
Brian Budge,
Yajie Yan,
Xiaqing Pan,
June Yon,
Yuyang Zou,
Kimberly Leon,
Nigel Carter,
Jesus Briales,
Tyler Gillingham,
Elias Mueggler,
Luis Pesqueira,
Manolis Savva,
Dhruv Batra
, et al. (5 additional authors not shown)
Abstract:
We introduce Replica, a dataset of 18 highly photo-realistic 3D indoor scene reconstructions at room and building scale. Each scene consists of a dense mesh, high-resolution high-dynamic-range (HDR) textures, per-primitive semantic class and instance information, and planar mirror and glass reflectors. The goal of Replica is to enable machine learning (ML) research that relies on visually, geometrically, and semantically realistic generative models of the world - for instance, egocentric computer vision, semantic segmentation in 2D and 3D, geometric inference, and the development of embodied agents (virtual robots) performing navigation, instruction following, and question answering. Due to the high level of realism of the renderings from Replica, there is hope that ML systems trained on Replica may transfer directly to real-world image and video data. Together with the data, we are releasing a minimal C++ SDK as a starting point for working with the Replica dataset. In addition, Replica is 'Habitat-compatible', i.e., it can be natively used with AI Habitat for training and testing embodied agents.
Submitted 13 June, 2019;
originally announced June 2019.
-
StereoDRNet: Dilated Residual Stereo Net
Authors:
Rohan Chabra,
Julian Straub,
Chris Sweeney,
Richard Newcombe,
Henry Fuchs
Abstract:
We propose a system that uses a convolutional neural network (CNN) to estimate depth from a stereo pair, followed by volumetric fusion of the predicted depth maps to produce a 3D reconstruction of a scene. Our proposed depth refinement architecture predicts view-consistent disparity and occlusion maps that help the fusion system produce geometrically consistent reconstructions. We utilize 3D dilated convolutions in our proposed cost filtering network, which yields better filtering while almost halving the computational cost in comparison to state-of-the-art cost filtering architectures. For feature extraction we use the Vortex Pooling architecture. The proposed method achieves state-of-the-art results on the KITTI 2012, KITTI 2015 and ETH 3D stereo benchmarks. Finally, we demonstrate that our system is able to produce high-fidelity 3D scene reconstructions that outperform the state-of-the-art stereo system.
Submitted 2 June, 2019; v1 submitted 3 April, 2019;
originally announced April 2019.
-
Habitat: A Platform for Embodied AI Research
Authors:
Manolis Savva,
Abhishek Kadian,
Oleksandr Maksymets,
Yili Zhao,
Erik Wijmans,
Bhavana Jain,
Julian Straub,
Jia Liu,
Vladlen Koltun,
Jitendra Malik,
Devi Parikh,
Dhruv Batra
Abstract:
We present Habitat, a platform for research in embodied artificial intelligence (AI). Habitat enables training embodied agents (virtual robots) in highly efficient photorealistic 3D simulation. Specifically, Habitat consists of: (i) Habitat-Sim: a flexible, high-performance 3D simulator with configurable agents, sensors, and generic 3D dataset handling. Habitat-Sim is fast -- when rendering a scene from Matterport3D, it achieves several thousand frames per second (fps) running single-threaded, and can reach over 10,000 fps multi-process on a single GPU. (ii) Habitat-API: a modular high-level library for end-to-end development of embodied AI algorithms -- defining tasks (e.g., navigation, instruction following, question answering), configuring, training, and benchmarking embodied agents.
These large-scale engineering contributions enable us to answer scientific questions requiring experiments that were till now impracticable or 'merely' impractical. Specifically, in the context of point-goal navigation: (1) we revisit the comparison between learning and SLAM approaches from two recent works and find evidence for the opposite conclusion -- that learning outperforms SLAM if scaled to an order of magnitude more experience than previous investigations, and (2) we conduct the first cross-dataset generalization experiments {train, test} x {Matterport3D, Gibson} for multiple sensors {blind, RGB, RGBD, D} and find that only agents with depth (D) sensors generalize across datasets. We hope that our open-source platform and these findings will advance research in embodied AI.
Submitted 24 November, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
Authors:
Jeong Joon Park,
Peter Florence,
Julian Straub,
Richard Newcombe,
Steven Lovegrove
Abstract:
Computer graphics, 3D computer vision and robotics communities have produced multiple approaches to representing 3D geometry for rendering and reconstruction. These provide trade-offs across fidelity, efficiency and compression capabilities. In this work, we introduce DeepSDF, a learned continuous Signed Distance Function (SDF) representation of a class of shapes that enables high quality shape representation, interpolation and completion from partial and noisy 3D input data. DeepSDF, like its classical counterpart, represents a shape's surface by a continuous volumetric field: the magnitude of a point in the field represents the distance to the surface boundary and the sign indicates whether the region is inside (-) or outside (+) of the shape. Hence, our representation implicitly encodes a shape's boundary as the zero-level-set of the learned function while explicitly representing the classification of space as being part of the shape's interior or not. While classical SDFs, in either analytical or discretized voxel form, typically represent the surface of a single shape, DeepSDF can represent an entire class of shapes. Furthermore, we show state-of-the-art performance for learned 3D shape representation and completion while reducing the model size by an order of magnitude compared with previous work.
△ Less
Submitted 15 January, 2019;
originally announced January 2019.
-
Probabilistic Analysis of the Dual-Pivot Quicksort "Count"
Authors:
Ralph Neininger,
Jasmin Straub
Abstract:
Recently, Aumüller and Dietzfelbinger proposed a version of a dual-pivot quicksort, called "Count", which is optimal among dual-pivot versions with respect to the average number of key comparisons required. In this note we provide further probabilistic analysis of "Count". We derive an exact formula for the average number of swaps needed by "Count" as well as an asymptotic formula for the variance of the number of swaps and a limit law. Also for the number of key comparisons the asymptotic variance and a limit law are identified. We also consider both complexity measures jointly and find their asymptotic correlation.
△ Less
Submitted 20 October, 2017;
originally announced October 2017.
-
Direction-Aware Semi-Dense SLAM
Authors:
Julian Straub,
Randi Cabezas,
John Leonard,
John W. Fisher III
Abstract:
To aid simultaneous localization and mapping (SLAM), future perception systems will incorporate forms of scene understanding. In a step towards fully integrated probabilistic geometric scene understanding, localization and mapping, we propose the first direction-aware semi-dense SLAM system. It jointly infers the directional Stata Center World (SCW) segmentation and a surfel-based semi-dense map while performing real-time camera tracking. The joint SCW map model connects a scene-wide Bayesian nonparametric Dirichlet Process von-Mises-Fisher mixture model (DP-vMF) prior on surfel orientations with the local surfel locations via a conditional random field (CRF). Camera tracking leverages the SCW segmentation to improve efficiency via guided observation selection. Results demonstrate improved SLAM accuracy and tracking efficiency at state-of-the-art performance.
△ Less
Submitted 18 September, 2017;
originally announced September 2017.
-
Small-Variance Nonparametric Clustering on the Hypersphere
Authors:
Julian Straub,
Trevor Campbell,
Jonathan P. How,
John W. Fisher III
Abstract:
Structural regularities in man-made environments are reflected in the distribution of their surface normals. Describing these surface normal distributions is important in many computer vision applications, such as scene understanding, plane segmentation, and regularization of 3D reconstructions. Based on the small-variance limit of Bayesian nonparametric von-Mises-Fisher (vMF) mixture distributions, we propose two new flexible and efficient k-means-like clustering algorithms for directional data such as surface normals. The first, DP-vMF-means, is a batch clustering algorithm derived from the Dirichlet process (DP) vMF mixture. Recognizing the sequential nature of data collection in many applications, we extend this algorithm to DDP-vMF-means, which infers temporally evolving cluster structure from streaming data. Both algorithms naturally respect the geometry of directional data, which lies on the unit sphere. We demonstrate their performance on synthetic directional data and real 3D surface normals from RGB-D sensors. While our experiments focus on 3D data, both algorithms generalize to high-dimensional directional data such as protein backbone configurations and semantic word vectors.
△ Less
Submitted 21 July, 2016;
originally announced July 2016.
-
Efficient Global Point Cloud Alignment using Bayesian Nonparametric Mixtures
Authors:
Julian Straub,
Trevor Campbell,
Jonathan P. How,
John W. Fisher III
Abstract:
Point cloud alignment is a common problem in computer vision and robotics, with applications ranging from 3D object recognition to reconstruction. We propose a novel approach to the alignment problem that utilizes Bayesian nonparametrics to describe the point cloud and surface normal densities, and branch and bound (BB) optimization to recover the relative transformation. BB uses a novel, refinable, near-uniform tessellation of rotation space using 4D tetrahedra, leading to more efficient optimization compared to the common axis-angle tessellation. We provide objective function bounds for pruning given the proposed tessellation, and prove that BB converges to the optimum of the cost function along with providing its computational complexity. Finally, we empirically demonstrate the efficiency of the proposed approach as well as its robustness to real-world conditions such as missing data and partial overlap.
△ Less
Submitted 21 November, 2016; v1 submitted 15 March, 2016;
originally announced March 2016.
-
Streaming, Distributed Variational Inference for Bayesian Nonparametrics
Authors:
Trevor Campbell,
Julian Straub,
John W. Fisher III,
Jonathan P. How
Abstract:
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayesian nonparametric (BNP) models. In the proposed framework, processing nodes receive a sequence of data minibatches, compute a variational posterior for each, and make asynchronous streaming updates to a central model. In contrast to previous algorithms, the proposed framework is truly streaming, distributed, asynchronous, learning-rate-free, and truncation-free. The key challenge in developing the framework, arising from the fact that BNP models do not impose an inherent ordering on their components, is finding the correspondence between minibatch and central BNP posterior components before performing each update. To address this, the paper develops a combinatorial optimization problem over component correspondences, and provides an efficient solution technique. The paper concludes with an application of the methodology to the DP mixture model, with experimental results demonstrating its practical scalability and performance.
△ Less
Submitted 30 October, 2015;
originally announced October 2015.