Search | arXiv e-print repository

arXiv:2403.02043

Iterative Occlusion-Aware Light Field Depth Estimation using 4D Geometrical Cues

Authors: Rui Lourenço, Lucas Thomaz, Eduardo A. B. Silva, Sergio M. M. Faria

Abstract: Light field cameras and multi-camera arrays have emerged as promising solutions for accurately estimating depth by passively capturing light information. This is possible because the 3D information of a scene is embedded in the 4D light field geometry. Commonly, depth estimation methods extract this information by relying on gradient information, heuristic-based optimisation models, or learning-based approaches. This paper focuses mainly on explicitly understanding and exploiting 4D geometrical cues for light field depth estimation. Thus, a novel method is proposed, based on a non-learning-based optimisation approach for depth estimation that explicitly considers surface normal accuracy and occlusion regions by utilising a fully explainable 4D geometric model of the light field. The 4D model performs depth/disparity estimation by determining the orientations and analysing the intersections of key 2D planes in 4D space, which are the images of 3D-space points in the 4D light field. Experimental results show that the proposed method outperforms both learning-based and non-learning-based state-of-the-art methods in terms of surface normal angle accuracy, achieving on planar surfaces a Median Angle Error that is, on average, 26.3% lower than the state of the art, while remaining competitive with state-of-the-art methods in terms of Mean Squared Error ×100 and Badpix 0.07.

Submitted 4 March, 2024; originally announced March 2024.
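The metrics named in the abstract above can be computed as follows, assuming their usual light field benchmark definitions as we understand them: Mean Squared Error ×100 is the mean squared disparity error multiplied by 100, Badpix 0.07 is the share of pixels whose absolute disparity error exceeds 0.07, and the Median Angle Error is the median angle between estimated and ground-truth surface normals. The function names, the flattened-list inputs and the omission of evaluation masks (for example, a planar-surface mask) are simplifying assumptions; this is a sketch, not the paper's evaluation code.

```python
import math
from statistics import median


def mse_x100(est, gt):
    """Mean squared disparity error over all pixels, scaled by 100."""
    if len(est) != len(gt) or not gt:
        raise ValueError("inputs must be non-empty and equally sized")
    return 100.0 * sum((e - g) ** 2 for e, g in zip(est, gt)) / len(gt)


def badpix(est, gt, threshold=0.07):
    """Fraction of pixels whose absolute disparity error exceeds `threshold`.

    Multiply by 100 to report it as a percentage.
    """
    if len(est) != len(gt) or not gt:
        raise ValueError("inputs must be non-empty and equally sized")
    return sum(abs(e - g) > threshold for e, g in zip(est, gt)) / len(gt)


def angle_deg(n1, n2):
    """Angle in degrees between two 3D normal vectors (need not be unit length)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norms = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))


def median_angle_error(normals_est, normals_gt):
    """Median angular error (degrees) over corresponding pairs of normals."""
    return median(angle_deg(a, b) for a, b in zip(normals_est, normals_gt))


# Tiny flattened example; the values are made up for illustration.
est = [1.0, 2.0, 3.0]
gt = [1.0, 2.1, 3.0]
print(mse_x100(est, gt), badpix(est, gt))
```

In a real evaluation the same functions would be applied to the pixels selected by each benchmark mask rather than to whole images.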

arXiv:2209.15585

Cloud Classification with Unsupervised Deep Learning

Authors: Takuya Kurihana, Ian Foster, Rebecca Willett, Sydney Jenkins, Kathryn Koenig, Ruby Werman, Ricardo Barros Lourenco, Casper Neo, Elisabeth Moyer

Abstract: We present a framework for cloud characterization that leverages modern unsupervised deep learning technologies. While previous neural network-based cloud classification models have used supervised learning methods, unsupervised learning allows us to avoid restricting the model to artificial categories based on historical cloud classification schemes and enables the discovery of novel, more detailed classifications. Our framework learns cloud features directly from radiance data produced by NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument, deriving cloud characteristics from millions of images without relying on pre-defined cloud types during the training process. We present preliminary results showing that our method extracts physically relevant information from radiance data and produces meaningful cloud classes.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 5 pages, 6 figures, Proceedings for Climate Informatics Workshop 2019 Paris

arXiv:2111.02508

AlphaD3M: Machine Learning Pipeline Synthesis

Authors: Iddo Drori, Yamuna Krishnamurthy, Remi Rampin, Raoni de Paula Lourenco, Jorge Piazentin Ono, Kyunghyun Cho, Claudio Silva, Juliana Freire

Abstract: We introduce AlphaD3M, an automatic machine learning (AutoML) system based on meta reinforcement learning using sequence models with self-play. AlphaD3M is based on edit operations performed over machine learning pipeline primitives, which provides explainability. We compare AlphaD3M with state-of-the-art AutoML systems (Autosklearn, Autostacker, and TPOT) on OpenML datasets. AlphaD3M achieves competitive performance while being an order of magnitude faster, reducing computation time from hours to minutes, and is explainable by design.

Submitted 3 November, 2021; originally announced November 2021.

Comments: ICML 2018 AutoML Workshop
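The idea of building pipelines through edit operations over primitives, described in the AlphaD3M abstract above, can be illustrated with a toy model. In this sketch a pipeline is a tuple of primitive names, and three edits (insert, delete, substitute) generate the neighbouring pipelines a search could explore. The primitive names and this particular action set are assumptions, not AlphaD3M's actual primitives, and the meta reinforcement learning that chooses among edits is not shown.

```python
from typing import Iterable, Set, Tuple

# A pipeline is modelled as an ordered tuple of primitive names (assumption).
Pipeline = Tuple[str, ...]


def insert(p: Pipeline, i: int, prim: str) -> Pipeline:
    """Insert primitive `prim` at position i."""
    return p[:i] + (prim,) + p[i:]


def delete(p: Pipeline, i: int) -> Pipeline:
    """Remove the primitive at position i."""
    return p[:i] + p[i + 1:]


def substitute(p: Pipeline, i: int, prim: str) -> Pipeline:
    """Replace the primitive at position i with `prim`."""
    return p[:i] + (prim,) + p[i + 1:]


def neighbours(p: Pipeline, primitives: Iterable[str]) -> Set[Pipeline]:
    """All distinct pipelines reachable from p with a single edit."""
    prims = list(primitives)
    out: Set[Pipeline] = set()
    for i in range(len(p) + 1):
        for prim in prims:
            out.add(insert(p, i, prim))
    for i in range(len(p)):
        out.add(delete(p, i))
        for prim in prims:
            if prim != p[i]:
                out.add(substitute(p, i, prim))
    return out


# Logging each applied edit gives a readable trace of how a pipeline was built.
p: Pipeline = ("imputer", "random_forest")
p = insert(p, 1, "standard_scaler")
print(p)  # ('imputer', 'standard_scaler', 'random_forest')
```

Because every candidate is reached by a named edit, the sequence of edits doubles as a human-readable explanation of the final pipeline.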

arXiv:2105.06058

DataExposer: Exposing Disconnect between Data and Systems

Authors: Sainyam Galhotra, Anna Fariha, Raoni Lourenço, Juliana Freire, Alexandra Meliou, Divesh Srivastava

Abstract: As data is a central component of many modern systems, the cause of a system malfunction may reside in the data, and, specifically, particular properties of the data. For example, a health-monitoring system that is designed under the assumption that weight is reported in imperial units (lbs) will malfunction when encountering weight reported in metric units (kilograms). Similar to software debugging, which aims to find bugs in the mechanism (source code or runtime conditions), our goal is to debug the data to identify potential sources of disconnect between the assumptions about the data and the systems that operate on that data. Specifically, we seek which properties of the data cause a data-driven system to malfunction. We propose DataExposer, a framework to identify data properties, called profiles, that are the root causes of performance degradation or failure of a system that operates on the data. Such identification is necessary to repair the system and resolve the disconnect between data and system. Our technique is based on causal reasoning through interventions: when a system malfunctions for a dataset, DataExposer alters the data profiles and observes changes in the system's behavior due to the alteration. Unlike statistical observational analysis that reports mere correlations, DataExposer reports causally verified root causes, in terms of data profiles, of the system malfunction. We empirically evaluate DataExposer on three real-world and several synthetic data-driven systems that fail on datasets for a diverse set of reasons. In all cases, DataExposer identifies the root causes precisely while requiring orders of magnitude fewer interventions than prior techniques.

Submitted 12 May, 2021; originally announced May 2021.
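The intervention idea in the DataExposer abstract above can be illustrated with its own weight-unit example. This sketch assumes a toy health monitor that expects pounds and two hand-written interventions; every name and threshold is hypothetical. It applies each intervention to the failing dataset on its own and reports those that make the system pass, which is intervention-based rather than correlational reasoning. It does not reproduce how DataExposer discovers profiles or how it keeps the number of interventions low.

```python
from typing import Callable, Dict, List

Dataset = List[float]


def health_monitor_ok(weights: Dataset) -> bool:
    """Hypothetical system that assumes weights are in pounds.

    It 'malfunctions' when more than half of the readings are below 90,
    which is implausible for adult weights in lbs but normal in kg.
    """
    low = sum(1 for w in weights if w < 90)
    return low <= len(weights) / 2


# Hypothetical interventions, each altering one property (profile) of the data.
INTERVENTIONS: Dict[str, Callable[[Dataset], Dataset]] = {
    "unit: kg -> lbs": lambda ws: [w * 2.20462 for w in ws],
    "drop values >= 1000": lambda ws: [w for w in ws if w < 1000],
}


def root_causes(data: Dataset,
                system: Callable[[Dataset], bool],
                interventions: Dict[str, Callable[[Dataset], Dataset]]) -> List[str]:
    """Return the interventions that, applied alone, make a failing run pass."""
    if system(data):
        return []  # the system works on this data; nothing to explain
    return [name for name, alter in interventions.items() if system(alter(data))]


print(root_causes([70.0, 82.0, 65.0, 90.0], health_monitor_ok, INTERVENTIONS))
# ['unit: kg -> lbs']
```

Trying interventions one at a time, as here, scales poorly with the number of candidate profiles, which is the cost the abstract says DataExposer reduces.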

arXiv:2004.06530

DOI: 10.1145/3318464.3389763

BugDoc: Algorithms to Debug Computational Processes

Authors: Raoni Lourenço, Juliana Freire, Dennis Shasha

Abstract: Data analysis for scientific experiments and enterprises, large-scale simulations, and machine learning tasks all entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous outputs, the pipeline may fail to execute or produce incorrect results. Inferring the root cause(s) of such failures is challenging, usually requiring time and much human thought, while still being error-prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our experimental data and processing software are available for use, reproducibility, and enhancement.

Submitted 12 April, 2020; originally announced April 2020.

Comments: To appear in SIGMOD 2020. arXiv admin note: text overlap with arXiv:2002.04640
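As a rough illustration of using provenance (records of past pipeline runs and their outcomes) to explain failures, as the BugDoc abstract above describes, this sketch flags parameter-value pairs that occur in every failing run and in no successful one. The intersection rule, the parameter names and the run history are assumptions made for illustration; BugDoc's actual algorithms are not reproduced here, and in particular this sketch does not choose new configurations to execute.

```python
from typing import Dict, List, Set, Tuple

# One recorded execution: parameter -> value, plus whether the run succeeded.
Run = Tuple[Dict[str, str], bool]


def suspect_pairs(runs: List[Run]) -> Set[Tuple[str, str]]:
    """Parameter=value pairs present in every failing run and in no passing run."""
    failing = [set(cfg.items()) for cfg, ok in runs if not ok]
    passing = [set(cfg.items()) for cfg, ok in runs if ok]
    if not failing:
        return set()
    common = set.intersection(*failing)
    seen_in_passing = set().union(*passing)  # empty set if there are no passing runs
    return common - seen_in_passing


history: List[Run] = [
    ({"scaler": "none", "model": "svm"}, False),
    ({"scaler": "std", "model": "svm"}, True),
    ({"scaler": "none", "model": "rf"}, False),
    ({"scaler": "std", "model": "rf"}, True),
]
print(suspect_pairs(history))  # {('scaler', 'none')}
```

A single shared pair is the simplest kind of explanation; failures caused by combinations of values would need a richer search than this intersection.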

arXiv:2002.04640

DOI: 10.1145/3329486.3329489

Debugging Machine Learning Pipelines

Authors: Raoni Lourenço, Juliana Freire, Dennis Shasha

Abstract: Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time-consuming and error-prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement.

Submitted 11 February, 2020; originally announced February 2020.

Comments: 10 pages

Journal ref: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, June 2019, Article No.: 3

arXiv:1905.10345

Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

Authors: Iddo Drori, Yamuna Krishnamurthy, Raoni Lourenco, Remi Rampin, Kyunghyun Cho, Claudio Silva, Juliana Freire

Abstract: Automatic machine learning is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently, AlphaD3M reached state-of-the-art results with an order-of-magnitude speedup using reinforcement learning with self-play. In this work, we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research, we make our data, models, and code publicly available.

Submitted 24 May, 2019; originally announced May 2019.

Comments: ICML Workshop on Automated Machine Learning
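A pipeline grammar, as mentioned in the abstract above, constrains the search to well-formed pipelines. This sketch enumerates every pipeline derivable from a small, made-up context-free grammar; the symbols and primitives are hypothetical and are not the grammar used in the paper. The paper pairs its grammar with a pre-trained model to guide the search, which is not shown here.

```python
from itertools import product
from typing import Dict, List

# Hypothetical pipeline grammar: upper-case symbols are non-terminals,
# lower-case symbols are concrete primitives.
GRAMMAR: Dict[str, List[List[str]]] = {
    "PIPELINE": [["IMPUTE", "SCALE", "ESTIMATOR"], ["IMPUTE", "ESTIMATOR"]],
    "IMPUTE": [["mean_imputer"], ["median_imputer"]],
    "SCALE": [["standard_scaler"]],
    "ESTIMATOR": [["decision_tree"], ["logistic_regression"]],
}


def expand(symbol: str) -> List[List[str]]:
    """Enumerate every primitive sequence derivable from `symbol`.

    Assumes a non-recursive grammar, so the enumeration terminates.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: a single primitive
    results: List[List[str]] = []
    for rhs in GRAMMAR[symbol]:
        # Cartesian product of the expansions of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([tok for part in parts for tok in part])
    return results


for pipeline in expand("PIPELINE"):
    print(" -> ".join(pipeline))
```

Exhaustive enumeration is only feasible for tiny grammars like this one; with realistic primitive sets, a learned model that ranks candidate expansions becomes necessary.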

arXiv:1710.00115

Running the Network Harder: Connection Provisioning under Resource Crunch

Authors: Rafael B. R. Lourenco, Massimo Tornatore, Charles U. Martel, Biswanath Mukherjee

Abstract: Traditionally, networks operate at a small fraction of their capacities; however, recent technologies, such as Software-Defined Networking, may let operators run their networks harder (i.e., at higher utilization levels). Higher utilization can increase the network operator's revenue, but this gain comes at a cost: daily traffic fluctuations and failures might occasionally overload the network. We call such situations Resource Crunch. Dealing with Resource Crunch requires certain types of flexibility in the system. We focus on scenarios with flexible bandwidth requirements, e.g., some connections can tolerate lower bandwidth allocation. This may free capacity to provision new requests that would otherwise be blocked. For that, the network operator needs to make an informed decision, since reducing the bandwidth of a high-paying connection to allocate a low-value connection is not sensible. We propose a strategy to decide whether or not to provision a request (and which other connections to degrade) focusing on maximizing profits during Resource Crunch. To address this problem, we use an abstraction of the network state, called a Connection Adjacency Graph (CAG). We propose PROVISIONER, which integrates our CAG solution with an efficient Linear Program (LP). We compare our method to existing greedy approaches and to LP-only solutions, and show that our method outperforms them during Resource Crunch.

Submitted 25 April, 2018; v1 submitted 29 September, 2017; originally announced October 2017.

Comments: 13 pages, 10 figures
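The profit test described in the abstract above (do not degrade a high-paying connection to admit a low-value one) can be sketched with a simple greedy rule. Assumptions: each flexible connection has a known revenue loss per unit of degraded bandwidth and a maximum amount by which it can be degraded, bandwidth is treated as a single shared pool (topology and routing are ignored), and a request is admitted only if the cheapest degradations cost less than the request's value. This is a greedy rule in the spirit of the baselines the abstract mentions, not PROVISIONER, which combines a Connection Adjacency Graph with a Linear Program.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Connection:
    name: str
    loss_per_unit: float  # revenue lost per unit of bandwidth taken away (assumed known)
    degradable: float     # bandwidth that can be removed without breaking its minimum


def admission_plan(req_value: float, req_bw: float, free_bw: float,
                   conns: List[Connection]) -> Optional[List[Tuple[str, float]]]:
    """Greedy toy: degrade the cheapest connections first to free bandwidth.

    Returns the degradations to apply if admitting the request is profitable,
    or None if the request should be blocked. Ignores topology and routing.
    """
    needed = max(0.0, req_bw - free_bw)
    plan: List[Tuple[str, float]] = []
    cost = 0.0
    for c in sorted(conns, key=lambda x: x.loss_per_unit):
        if needed <= 0:
            break
        take = min(c.degradable, needed)
        if take > 0:
            plan.append((c.name, take))
            cost += take * c.loss_per_unit
            needed -= take
    if needed > 0 or cost >= req_value:
        return None  # not enough capacity, or degrading costs more than the request earns
    return plan


conns_now = [Connection("gold", 10.0, 5.0), Connection("bronze", 1.0, 3.0)]
print(admission_plan(20.0, 4.0, 2.0, conns_now))  # [('bronze', 2.0)]
print(admission_plan(3.0, 6.0, 0.0, conns_now))   # None
```

The second call is blocked because admitting it would require degrading the "gold" connection, costing far more than the request is worth.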

Showing 1–8 of 8 results for author: Lourenço, R