Search | arXiv e-print repository

Recovering a Message from an Incomplete Set of Noisy Fragments

Authors: Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

Abstract: We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them out of order, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Precisely, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, where the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds on the capacity, both of which can be seen as F - A expressions. These bounds match for specific choices of fragment length distributions, and they are approximately tight in cases where there are not too many short fragments.
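To make the coverage fraction F concrete, the following sketch estimates it by simulation for a single codeword. It assumes i.i.d. geometric fragment lengths and that each fragment is deleted independently with probability d; these choices and the function name `coverage_fraction` are illustrative assumptions, not the paper's model. Under these assumptions the expected coverage is exactly 1 - d, since every symbol survives with its fragment's probability 1 - d. The alignment cost A is not modeled here.

```python
import random


def coverage_fraction(n, mean_len, delete_prob, seed=0):
    """Monte Carlo estimate of the coverage fraction F for one codeword.

    Illustrative assumptions: fragment lengths are i.i.d. geometric with
    mean `mean_len`, and each fragment is deleted independently with
    probability `delete_prob`.
    """
    rng = random.Random(seed)
    p = 1.0 / mean_len  # success probability of the geometric distribution
    covered = 0
    pos = 0
    while pos < n:
        # Geometric length >= 1: count Bernoulli(p) trials up to the first success.
        length = 1
        while rng.random() >= p:
            length += 1
        length = min(length, n - pos)  # truncate the last fragment at the block end
        if rng.random() >= delete_prob:  # the fragment survives deletion
            covered += length
        pos += length
    return covered / n
```

For example, `coverage_fraction(200_000, 50, 0.3)` should return a value close to 0.7.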

Submitted 7 July, 2024; originally announced July 2024.

Comments: 43 pages, 3 figures

arXiv:2407.02741 [pdf]

18 GHz Solidly Mounted Resonator in Scandium Aluminum Nitride on SiO2/Ta2O5 Bragg Reflector

Authors: Omar Barrera, Nishanth Ravi, Kapil Saha, Supratik Dasgupta, Joshua Campbell, Jack Kramer, Eugene Kwon, Tzu-Hsuan Hsu, Sinwoo Cho, Ian Anderson, Pietro Simeoni, Jue Hou, Matteo Rinaldi, Mark S. Goorsky, Ruochen Lu

Abstract: This work reports an acoustic solidly mounted resonator (SMR) at 18.64 GHz, among the highest operating frequencies reported. The device is built in scandium aluminum nitride (ScAlN) on top of silicon dioxide (SiO2) and tantalum pentoxide (Ta2O5) Bragg reflectors on a silicon (Si) wafer. The stack is analyzed with X-ray reflectivity (XRR) and high-resolution X-ray diffraction (HRXRD). The resonator shows a coupling coefficient (k2) of 2.0%, a high series quality factor (Qs) of 156, a shunt quality factor (Qp) of 142, and a maximum Bode quality factor (Qmax) of 210. The third-order harmonic at 59.64 GHz is also observed, with k2 around 0.6% and Q around 40. Upon further development, the reported acoustic resonator platform can enable various front-end signal-processing functions, e.g., filters and oscillators, in future frequency range 3 (FR3) bands.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 5 pages, 9 figures, 5 tables

arXiv:2406.17005 [pdf, other]

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.

Submitted 24 June, 2024; originally announced June 2024.

Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

arXiv:2404.08668 [pdf, other]

A Comprehensive Survey on AI-based Methods for Patents

Authors: Homaira Huda Shomee, Zhu Wang, Sathya N. Ravi, Sourav Medya

Abstract: Recent advancements in Artificial Intelligence (AI) and machine learning have demonstrated transformative capabilities across diverse domains. This progress extends to the field of patent analysis and innovation, where AI-based tools present opportunities to streamline and enhance important tasks in the patent cycle such as classification, retrieval, and valuation prediction. This not only improves the efficiency of patent researchers and applicants but also opens new avenues for technological innovation and discovery. Our survey provides a comprehensive summary of recent AI tools in patent analysis, drawn from more than 40 papers published in 26 venues between 2017 and 2023. Unlike existing surveys, we include methods that work on both patent image and text data. Furthermore, we introduce a novel taxonomy that categorizes methods by the tasks in the patent life cycle as well as by the specifics of the AI methods. This interdisciplinary survey aims to serve as a resource for researchers and practitioners working at the intersection of AI and patent analysis, as well as for patent offices that aim to build efficient patent systems.

Submitted 18 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.02324 [pdf, other]

Differentially Private Communication of Measurement Anomalies in the Smart Grid

Authors: Nikhil Ravi, Anna Scaglione, Sean Peisert, Parth Pradhan

Abstract: In this paper, we present a framework based on differential privacy (DP) for querying electric power measurements to detect system anomalies or bad data. Our DP approach conceals consumption and system matrix data, while simultaneously enabling an untrusted third party to test hypotheses of anomalies, such as the presence of bad data, by releasing a randomized sufficient statistic for hypothesis testing. We consider a measurement model corrupted by Gaussian noise and a sparse noise vector representing the attack, and we observe that the optimal test statistic is a chi-square random variable. To detect possible attacks, we propose a novel DP chi-square noise mechanism that ensures the test does not reveal private information about power injections or the system matrix. The proposed framework provides a robust solution for detecting bad data while preserving the privacy of sensitive power system data.
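The chi-square statistic referred to above is, in its classical form, the normalized residual sum of squares of a least-squares state estimate. The following non-private sketch illustrates that classical test only; it omits the paper's DP noise mechanism entirely. It assumes a scalar state x, a known noise standard deviation, and the Wilson-Hilferty approximation for the chi-square quantile; all function names are illustrative.

```python
from statistics import NormalDist


def chi2_quantile(p, k):
    """Wilson-Hilferty approximation to the p-quantile of a chi-square(k) variable."""
    z = NormalDist().inv_cdf(p)
    return k * (1 - 2 / (9 * k) + z * (2 / (9 * k)) ** 0.5) ** 3


def bad_data_test(z, h, sigma, alpha=0.05):
    """Classical (non-private) chi-square bad-data test for z = h * x + noise.

    The state x is a scalar to keep the sketch dependency-free.
    Returns (J, threshold, flagged).
    """
    # Least-squares estimate of the scalar state.
    x_hat = sum(hi * zi for hi, zi in zip(h, z)) / sum(hi * hi for hi in h)
    # Normalized residual sum of squares; chi-square with m - 1 degrees of freedom without an attack.
    J = sum(((zi - hi * x_hat) / sigma) ** 2 for hi, zi in zip(h, z))
    threshold = chi2_quantile(1 - alpha, len(z) - 1)
    return J, threshold, J > threshold
```

With five identical clean readings J is 0 and nothing is flagged, whereas corrupting one reading by a large amount pushes J far above the roughly 9.49 threshold for four degrees of freedom.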

Submitted 22 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures

arXiv:2401.03251 [pdf, other]

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR

Authors: Nagarathna Ravi, Thishyan Raj T, Vipul Arora

Abstract: Confidence estimation of predictions from an End-to-End (E2E) Automatic Speech Recognition (ASR) model benefits ASR's downstream and upstream tasks. Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions. An ancillary Confidence Estimation Model (CEM) calibrates the predictions. State-of-the-art (SOTA) solutions use binary target scores for CEM training. However, the binary labels do not reveal the granular information of predicted words, such as temporal alignment between reference and hypothesis and whether the predicted word is entirely incorrect or contains spelling errors. Addressing this issue, we propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train CEM. To address the data imbalance of target scores while training CEM, we use shrinkage loss to focus on hard-to-learn data points and minimise the impact of easily learned data points. We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes. Experiments show that TeLeS generalises well across domains. To demonstrate the applicability of the proposed method, we formulate a TeLeS-based Acquisition (TeLeS-A) function for sampling uncertainty in active learning. We observe a significant reduction in the Word Error Rate (WER) as compared to SOTA methods.
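Shrinkage loss, in its commonly used form, multiplies the squared error l^2 by a sigmoid-shaped factor 1 / (1 + exp(a(c - l))), where l is the absolute error, so that easy points with small l contribute little while hard points keep nearly their full squared error. A minimal sketch follows; the hyperparameter defaults a = 10 and c = 0.2 are assumptions for illustration, and the exact variant used to train the CEM with TeLeS targets may differ.

```python
import math


def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Squared error down-weighted for easy samples.

    l = |pred - target|. The factor 1 / (1 + exp(a * (c - l))) is close to 0
    for small l (easy points) and close to 1 for large l (hard points).
    """
    l = abs(pred - target)
    return l * l / (1.0 + math.exp(a * (c - l)))


def mean_shrinkage_loss(preds, targets, a=10.0, c=0.2):
    """Average shrinkage loss over paired predictions and target confidence scores."""
    pairs = list(zip(preds, targets))
    return sum(shrinkage_loss(p, t, a, c) for p, t in pairs) / len(pairs)
```

With these defaults, an error of 0.05 keeps less than a fifth of its squared error, while an error of 0.9 keeps almost all of it.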

Submitted 6 January, 2024; originally announced January 2024.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing

arXiv:2310.11422 [pdf, other]

doi 10.3847/1538-4357/ad36c2

A Full Accounting of the Visible Mass in SDSS MaNGA Disk Galaxies

Authors: Nitya Ravi, Kelly A. Douglass, Regina Demina

Abstract: We present a study of the ratio of visible mass to total mass in spiral galaxies to better understand the relative amount of dark matter present in galaxies of different masses and evolutionary stages. Using the velocities of the H-alpha emission line measured in spectroscopic observations from the Sloan Digital Sky Survey (SDSS) MaNGA Data Release 17 (DR17), we evaluate the rotational velocity of over 5500 disk galaxies at their 90% elliptical Petrosian radii, R90. We compare this to the velocity expected from the total visible mass, which we compute from the stellar, HI, molecular hydrogen, and heavy metals and dust masses. Molecular hydrogen mass measurements are available for only a small subset of galaxies observed in SDSS MaNGA DR17, so we derive a parameterization of the molecular hydrogen mass as a function of absolute magnitude in the r band using galaxies observed as part of SDSS DR7. With these parameterizations, we calculate the fraction of visible mass within R90 that corresponds to the observed velocity. Based on a statistical analysis of the likelihood of this fraction, we conclude that the null hypothesis (no dark matter) cannot be excluded at a confidence level better than 95% within the visible extent of the disk galaxies. We also find that when all mass components are included, the ratio of visible-to-total mass within the visible extent of star-forming disk galaxies increases with galaxy luminosity.
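The comparison between the observed velocity and the velocity expected from visible mass can be sketched under a simple spherical approximation, in which the mass enclosed within radius R implied by a circular velocity v is M = v^2 R / G, so the visible fraction equals (v_vis / v_obs)^2. The spherical geometry, the function names, and treating every input as a mass already measured within R90 are assumptions; the paper's actual mass model may differ.

```python
# Gravitational constant in kpc * (km/s)^2 / M_sun.
G_KPC = 4.30091e-6


def dynamical_mass(v_kms, r_kpc):
    """Mass (M_sun) enclosed within r implied by circular velocity v, assuming spherical symmetry."""
    return v_kms ** 2 * r_kpc / G_KPC


def visible_fraction(m_star, m_hi, m_h2, m_metals_dust, v_kms, r_kpc):
    """Ratio of summed visible mass to dynamical mass within r (spherical approximation)."""
    m_visible = m_star + m_hi + m_h2 + m_metals_dust
    return m_visible / dynamical_mass(v_kms, r_kpc)
```

For instance, a rotation speed of 200 km/s at 10 kpc implies roughly 9.3e10 solar masses within that radius; a visible fraction near 1 would mean no dark matter is needed inside R90.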

Submitted 20 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 16 pages, 11 figures, published in ApJ

Journal ref: (2024) ApJ, 967 (2): 135-146

arXiv:2310.04515 [pdf, other]

Utilizing Free Clients in Federated Learning for Focused Model Enhancement

Authors: Aditya Narayan Ravi, Ilan Shomorony

Abstract: Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data, without the need for clients to share their data. Many existing FL approaches assume that all clients have equal importance and construct a global objective based on all clients. We consider a version of FL we call Prioritized FL, where the goal is to learn a weighted mean objective of a subset of clients, designated as priority clients. An important question arises: How do we choose and incentivize well-aligned non-priority clients to participate in the federation, while discarding misaligned clients? We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge. The algorithm employs a matching strategy that chooses non-priority clients based on how similar the model's loss is on their data compared to the global data, thereby ensuring the use of non-priority client gradients only when it is beneficial for priority clients. This approach ensures mutual benefits, as non-priority clients are motivated to join when the model performs satisfactorily on their data, and priority clients can utilize their updates and computational resources when their goals align. We present a convergence analysis that quantifies the trade-off between client selection and speed of convergence. Our algorithm shows faster convergence and higher test accuracy than baselines for various synthetic and benchmark datasets.
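A loss-similarity selection rule of the kind described above could look like the following sketch. The fixed-tolerance criterion, the unweighted averaging, and the names `select_clients` and `aggregate` are hypothetical illustrations, not FedALIGN's actual matching strategy or weighting.

```python
def select_clients(priority_loss, client_losses, tol):
    """Hypothetical rule: keep non-priority clients whose loss on their own data
    is within `tol` of the model's loss on the priority (global) objective."""
    return [cid for cid, loss in client_losses.items() if abs(loss - priority_loss) <= tol]


def aggregate(priority_grads, client_grads, selected):
    """Average the priority clients' gradients with those of the selected non-priority clients."""
    grads = list(priority_grads) + [client_grads[cid] for cid in selected]
    dim = len(grads[0])
    return [sum(g[i] for g in grads) / len(grads) for i in range(dim)]
```

A client whose loss is far from the priority loss (for example, 1.4 against 0.5 with a tolerance of 0.1) is simply left out of the average, so its gradient never influences the priority objective.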

Submitted 6 October, 2023; originally announced October 2023.

Comments: 26 pages, 6 figures

arXiv:2310.03890 [pdf, other]

Accelerated Neural Network Training with Rooted Logistic Objectives

Authors: Zhu Wang, Praveen Raj Veluswami, Harsh Mishra, Sathya N. Ravi

Abstract: Many neural networks deployed in real-world scenarios are trained using cross-entropy-based loss functions. From the optimization perspective, it is known that the behavior of first-order methods such as gradient descent depends crucially on the separability of datasets. In fact, even in the simplest case of binary classification, the rate of convergence depends on two factors: (1) the condition number of the data matrix, and (2) the separability of the dataset. Without further pre-processing techniques such as over-parametrization or data augmentation, separability is an intrinsic quantity of the data distribution under consideration. We focus on the landscape design of the logistic function and derive a novel sequence of strictly convex functions that are at least as strict as logistic loss. The minimizers of these functions coincide with those of the minimum norm solution wherever possible. The strict convexity of the derived function can be extended to finetune state-of-the-art models and applications. In our experiments, we apply our proposed rooted logistic objective to multiple deep models, e.g., fully-connected neural networks and transformers, on various classification benchmarks. Our results show that training with the rooted loss function converges faster and improves performance. Furthermore, we illustrate applications of our novel rooted loss function in generative-modeling-based downstream applications, such as finetuning the StyleGAN model with the rooted loss.
The code implementing our losses and models is available as open-source software at https://anonymous.4open.science/r/rooted_loss.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.00035 [pdf, other]

FACET: Fairness in Computer Vision Evaluation Benchmark

Authors: Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross

Abstract: Computer vision models have known performance disparities across attributes such as gender and skin tone. This means that during tasks such as classification and detection, model performance differs for certain classes based on the demographics of the people in the image. These disparities have been shown to exist, but until now there has not been a unified approach to measure these differences for common use-cases of computer vision models. We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large, publicly available evaluation set of 32k images for some of the most common vision tasks: image classification, object detection and segmentation. For every image in FACET, we hired expert reviewers to manually annotate person-related attributes such as perceived skin tone and hair type, manually draw bounding boxes and label fine-grained person-related classes such as disk jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art vision models and present a deeper understanding of potential performance disparities and challenges across sensitive demographic attributes. With the exhaustive annotations collected, we probe models using single demographic attributes as well as multiple attributes using an intersectional approach (e.g., hair color and perceived skin tone). Our results show that classification, detection, segmentation, and visual grounding models exhibit performance disparities across demographic attributes and intersections of attributes.
These harms suggest that not all people represented in datasets receive fair and equitable treatment in these vision tasks. We hope current and future results using our benchmark will contribute to fairer, more robust vision models. FACET is available publicly at https://facet.metademolab.com/
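Performance disparities of this kind are often summarized as a difference in per-group recall for a given class. The following sketch assumes each ground-truth instance is tagged with a group label (which may be a tuple for an intersectional analysis, e.g., hair color and perceived skin tone) and with whether the model detected it; the function names and group labels are hypothetical, and FACET's exact metrics may differ.

```python
from collections import defaultdict


def recall_by_group(records):
    """records: iterable of (group, detected) pairs for ground-truth instances of one class."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in records:
        totals[group] += 1
        hits[group] += int(detected)
    return {g: hits[g] / totals[g] for g in totals}


def recall_disparity(records, group_a, group_b):
    """Difference in recall between two groups; 0 means parity on this metric."""
    recalls = recall_by_group(records)
    return recalls[group_a] - recalls[group_b]
```

For example, if a model finds 9 of 10 instances for one group and 7 of 10 for another, the recall disparity is 0.2.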

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2306.05578 [pdf, other]

Differential Privacy for Class-based Data: A Practical Gaussian Mechanism

Authors: Raksha Ramakrishna, Anna Scaglione, Tong Wu, Nikhil Ravi, Sean Peisert

Abstract: In this paper, we present a notion of differential privacy (DP) for data that comes from different classes. Here, the class membership is private information that needs to be protected. The proposed method is an output perturbation mechanism that adds noise to the released query response such that the analyst is unable to infer the underlying class label. The proposed DP method not only protects the privacy of class-based data but also meets accuracy requirements and is computationally efficient and practical. We demonstrate empirically that the proposed method outperforms the baseline additive Gaussian noise mechanism. We also examine a real-world application and apply the proposed DP method to the autoregressive moving average (ARMA) forecasting method, protecting the privacy of the underlying data source. Case studies on real-world advanced metering infrastructure (AMI) measurements of household power consumption validate the performance of the proposed DP method while also preserving the accuracy of the forecasted power consumption.
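For reference, the baseline additive Gaussian mechanism adds zero-mean Gaussian noise with the classical scale sigma = Delta_2 * sqrt(2 ln(1.25 / delta)) / epsilon, where Delta_2 is the L2 sensitivity of the query; this bound is valid for 0 < epsilon < 1. The sketch below implements only that baseline, not the proposed class-based mechanism, and the function names are illustrative.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0 < epsilon < 1:
        raise ValueError("the classical bound requires 0 < epsilon < 1")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def gaussian_mechanism(answer, sensitivity, epsilon, delta, rng=random):
    """Release a scalar query answer perturbed with calibrated Gaussian noise."""
    return answer + rng.gauss(0.0, gaussian_sigma(sensitivity, epsilon, delta))
```

With sensitivity 1, epsilon = 0.5 and delta = 1e-5, the noise standard deviation is about 9.69, which shows why the accuracy of the baseline degrades quickly as the privacy budget shrinks.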

Submitted 8 June, 2023; originally announced June 2023.

Comments: Under review in IEEE Transactions on Information Forensics & Security

arXiv:2304.03749 [pdf, other]

Solar Photovoltaic Systems Metadata Inference and Differentially Private Publication

Authors: Nikhil Ravi, Anna Scaglione, Julieta Giraldez, Parth Pradhan, Chuck Moran, Sean Peisert

Abstract: Stakeholders in electricity delivery infrastructure are amassing data about their system demand, use, and operations. Still, they are reluctant to share them, as even sharing aggregated or anonymized electric grid data risks the disclosure of sensitive information. This paper highlights how applying differential privacy to distributed energy resource production data can preserve the usefulness of that data for operations, planning, and research purposes without violating privacy constraints. Differentially private mechanisms can be optimized for queries of interest in the energy sector, with provable privacy and accuracy trade-offs, and can help design differentially private databases for further analysis and research. In this paper, we consider the problem of inference and publication of solar photovoltaic systems' metadata. Metadata such as nameplate capacity, surface azimuth and surface tilt may reveal personally identifiable information about the behind-the-meter installation. We describe a methodology to infer the metadata and propose a mechanism based on Bayesian optimization to publish the inferred metadata in a differentially private manner. The proposed mechanism is numerically validated using real-world solar power generation data.

Submitted 7 April, 2023; originally announced April 2023.

Comments: 10 pages, 6 figures

arXiv:2304.02643 [pdf, other]

Segment Anything

Authors: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, Ross Girshick

Abstract: We introduce the Segment Anything (SA) project: a new task, model, and dataset for image segmentation. Using our efficient model in a data collection loop, we built the largest segmentation dataset to date (by far), with over 1 billion masks on 11M licensed and privacy-respecting images. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks. We evaluate its capabilities on numerous tasks and find that its zero-shot performance is impressive -- often competitive with or even superior to prior fully supervised results. We are releasing the Segment Anything Model (SAM) and the corresponding dataset (SA-1B) of 1B masks and 11M images at https://segment-anything.com to foster research into foundation models for computer vision.

Submitted 5 April, 2023; originally announced April 2023.

Comments: Project web-page: https://segment-anything.com

arXiv:2302.05865 [pdf, other]

Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization

Authors: Hamidreza Almasi, Harsh Mishra, Balajee Vamanan, Sathya N. Ravi

Abstract: Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We define the quality of workers as reconstruction ratios $\in (0,1]$, and formulate aggregation as a Maximum Likelihood Estimation procedure using Beta densities. We show that the regularized form of the log-likelihood with respect to the subspace can be approximately solved using an iterative least-squares solver, and provide convergence guarantees using recent Convex Optimization landscape results. Our empirical findings demonstrate that our approach significantly enhances the robustness of state-of-the-art Byzantine-resilient aggregators. We evaluate our method in a distributed setup with a parameter server, and show simultaneous improvements in communication efficiency and accuracy across various tasks. The code is publicly available at https://github.com/hamidralmasi/FlagAggregator
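To see why robust aggregation matters, the following sketch contrasts plain averaging with the coordinate-wise median, a standard Byzantine-resilient baseline. It is not the Flag Aggregator, which instead solves a regularized maximum-likelihood problem over subspaces; the function names are illustrative.

```python
from statistics import median


def mean_aggregate(updates):
    """Coordinate-wise mean of worker updates (not robust to Byzantine workers)."""
    return [sum(coords) / len(coords) for coords in zip(*updates)]


def median_aggregate(updates):
    """Coordinate-wise median: a simple Byzantine-robust baseline aggregator."""
    return [median(coords) for coords in zip(*updates)]
```

With four honest workers sending [1.0, 2.0] and one faulty worker sending [100.0, -100.0], the mean is dragged to about [20.8, -18.4], while the median still returns [1.0, 2.0].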

Submitted 24 September, 2023; v1 submitted 12 February, 2023; originally announced February 2023.

arXiv:2302.05608 [pdf, other]

Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Authors: Zhu Wang, Sourav Medya, Sathya N. Ravi

Abstract: Often, deep network models are purely inductive during training and while performing inference on unseen data. Thus, when such models are used for predictions, it is well known that they often fail to capture the semantic information and implicit dependencies that exist among objects (or concepts) on a population level. Moreover, it is still unclear how domain or prior modal knowledge can be specified in a backpropagation-friendly manner, especially in large-scale and noisy settings. In this work, we propose an end-to-end vision and language model incorporating explicit knowledge graphs. We also introduce an interactive out-of-distribution (OOD) layer using an implicit network operator. The layer is used to filter noise introduced by the external knowledge base. In practice, we apply our model to several vision and language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval, on different datasets. Our experiments show that it is possible to design models that perform similarly to state-of-the-art results but with significantly fewer samples and less training time.

Submitted 11 February, 2023; originally announced February 2023.

arXiv:2302.02336 [pdf, other]

Using Intermediate Forward Iterates for Intermediate Generator Optimization

Authors: Harsh Mishra, Jurijs Nazarovs, Manmohan Dogra, Sathya N. Ravi

Abstract: Score-based models have recently been introduced as a richer framework to model distributions in high dimensions and are generally more suitable for generative tasks. In score-based models, a generative task is formulated using a parametric model (such as a neural network) to directly learn the gradient of such high-dimensional distributions, instead of the density functions themselves, as is done traditionally. From the mathematical point of view, such gradient information can be utilized in reverse by stochastic sampling to generate diverse samples. However, from a computational perspective, existing score-based models can be efficiently trained only if the forward or the corruption process can be computed in closed form. By using the relationship between the process and layers in a feed-forward network, we derive a backpropagation-based procedure, which we call Intermediate Generator Optimization (IGO), to utilize intermediate iterates of the process with negligible computational overhead. The main advantage of IGO is that it can be incorporated into any standard autoencoder pipeline for the generative task. We analyze the sample complexity properties of IGO to solve downstream tasks like Generative PCA. We show applications of IGO on two dense predictive tasks, viz., image extrapolation and point cloud denoising. Our experiments indicate that obtaining an ensemble of generators for various time points is possible using first-order methods.

Submitted 5 February, 2023; originally announced February 2023.

arXiv:2212.12008

A Novel Method for Lane-change Maneuver in Urban Driving Using Predictive Markov Decision Process

Authors: Avinash Prabu, Niranjan Ravi, Lingxi Li

Abstract: Lane-change maneuver has always been a challenging task for both manual and autonomous driving, especially in an urban setting. In particular, the uncertainty in predicting the behavior of other vehicles on the road leads to indecisive actions while changing lanes, which might result in traffic congestion and cause safety concerns. This paper analyzes the factors related to uncertainty such as speed range change and lane change so as to design a predictive Markov decision process for lane-change maneuver in the urban setting. A hidden Markov model is developed for modeling uncertainties of surrounding vehicles. The reward model uses the crash probabilities and the feasibility/distance to the goal as primary parameters. Numerical simulation and analysis of two traffic scenarios are completed to demonstrate the effectiveness of the proposed approach.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2211.01338

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Authors: Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, et al. (2 additional authors not shown)

Abstract: Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation with scores of 4.09 and 3.74, respectively. Human effort is also reduced by 75%.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2207.10660

Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

Authors: Garrick Brazil, Abhinav Kumar, Julian Straub, Nikhila Ravi, Justin Johnson, Georgia Gkioxari

Abstract: Recognizing scenes and objects in 3D from a single image is a longstanding goal of computer vision with applications in robotics and AR/VR. For 2D recognition, large datasets and scalable solutions have led to unprecedented advances. In 3D, existing benchmarks are small in size and approaches specialize in few object categories and specific domains, e.g. urban driving scenes. Motivated by the success of 2D recognition, we revisit the task of 3D object detection by introducing a large benchmark, called Omni3D. Omni3D re-purposes and combines existing datasets resulting in 234k images annotated with more than 3 million instances and 98 categories. 3D detection at such scale is challenging due to variations in camera intrinsics and the rich diversity of scene and object types. We propose a model, called Cube R-CNN, designed to generalize across camera and scene types with a unified approach. We show that Cube R-CNN outperforms prior works on the larger Omni3D and existing benchmarks. Finally, we prove that Omni3D is a powerful dataset for 3D object recognition and show that it improves single-dataset performance and can accelerate learning on new smaller datasets via pre-training.

Submitted 23 March, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: CVPR 2023, Project website: https://omni3d.garrickbrazil.com/

arXiv:2207.00611

doi 10.1038/s41597-022-01712-9

FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy

Authors: Nikil Ravi, Pranshu Chaturvedi, E. A. Huerta, Zhengchun Liu, Ryan Chard, Aristana Scourtas, K. J. Schmidt, Kyle Chard, Ben Blaiszik, Ian Foster

Abstract: A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.

Submitted 21 December, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: 11 pages, 3 figures; Accepted to Scientific Data; for press release see https://www.anl.gov/article/argonne-scientists-promote-fair-standards-for-managing-artificial-intelligence-models and https://www.ncsa.illinois.edu/ncsa-student-researchers-lead-authors-on-award-winning-paper; Received 2022 HPCwire Readers' Choice Award on Best Use of High Performance Data Analytics & Artificial Intelligence

MSC Class: 68T01; 68T05 ACM Class: I.2; J.2

Journal ref: Scientific Data 9, 657 (2022)

arXiv:2206.07028

Learning 3D Object Shape and Layout without 3D Supervision

Authors: Georgia Gkioxari, Nikhila Ravi, Justin Johnson

Abstract: A 3D scene consists of a set of objects, each with a shape and a layout giving their position in space. Understanding 3D scenes from 2D images is an important goal, with applications in robotics and graphics. While there have been recent advances in predicting 3D shape and layout from a single image, most approaches rely on 3D ground truth for training which is expensive to collect at scale. We overcome these limitations and propose a method that learns to predict 3D shape and layout for objects without any ground truth shape or layout information: instead we rely on multi-view images with 2D supervision which can more easily be collected at scale. Through extensive experiments on 3D Warehouse, Hypersim, and ScanNet we demonstrate that our approach scales to large datasets of realistic images, and compares favorably to methods relying on 3D ground truth. On Hypersim and ScanNet where reliable 3D ground truth is not available, our approach outperforms supervised approaches trained on smaller and less diverse datasets.

Submitted 14 June, 2022; originally announced June 2022.

Comments: CVPR 2022, project page: https://gkioxari.github.io/usl/

arXiv:2204.07655

Deep Unlearning via Randomized Conditionally Independent Hessians

Authors: Ronak Mehta, Sourav Pal, Vikas Singh, Sathya N. Ravi

Abstract: Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deleting the closest original sample can be effective. But this idea is inapplicable to models which learn richer representations. Recent ideas leveraging optimization-based updates scale poorly with the model dimension d, due to inverting the Hessian of the loss function. We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. Our approach completely avoids the need to invert a (possibly) huge matrix. By utilizing a Markov blanket selection, we premise that L-CODEC is also suitable for deep unlearning, as well as other applications in vision. Compared to alternatives, L-CODEC makes approximate unlearning possible in settings that would otherwise be infeasible, including vision models used for face recognition, person re-identification and NLP models that may require unlearning samples identified for exclusion. Code can be found at https://github.com/vsingh-group/LCODEC-deep-unlearning/

Submitted 13 July, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: CVPR 2022. Supplement appended to end of main paper (total 15 pages). Ronak Mehta and Sourav Pal equal contribution

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10422-10431

arXiv:2203.15234

Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets

Authors: Vishnu Suresh Lokhande, Rudrasis Chakraborty, Sathya N. Ravi, Vikas Singh

Abstract: Pooling multiple neuroimaging datasets across institutions often enables improvements in statistical power when evaluating associations (e.g., between risk factors and disease outcomes) that may otherwise be too weak to detect. When there is only a single source of variability (e.g., different scanners), domain adaptation and matching the distributions of representations may suffice in many scenarios. But in the presence of more than one nuisance variable which concurrently influence the measurements, pooling datasets poses unique challenges, e.g., variations in the data can come from both the acquisition method as well as the demographics of participants (gender, age). Invariant representation learning, by itself, is ill-suited to fully model the data generation process. In this paper, we show how bringing recent results on equivariant representation learning (for studying symmetries in neural networks) instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution. In particular, we demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted at 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:2202.09463

Mixed Effects Neural ODE: A Variational Approximation for Analyzing the Dynamics of Panel Data

Authors: Jurijs Nazarovs, Rudrasis Chakraborty, Songwong Tasneeyapant, Sathya N. Ravi, Vikas Singh

Abstract: Panel data involving longitudinal measurements of the same set of participants taken over multiple time points is common in studies to understand childhood development and disease modeling. Deep hybrid models that marry the predictive power of neural networks with physical simulators such as differential equations are starting to drive advances in such applications. The task of modeling not just the observations but the hidden dynamics that are captured by the measurements poses interesting statistical/computational questions. We propose a probabilistic model called ME-NODE to incorporate (fixed + random) mixed effects for analyzing such panel data. We show that our model can be derived using smooth approximations of SDEs provided by the Wong-Zakai theorem. We then derive evidence lower bounds (ELBOs) for ME-NODE, and develop (efficient) training algorithms using MC-based sampling methods and numerical ODE solvers. We demonstrate ME-NODE's utility on tasks spanning the spectrum from simulations and toy data to real longitudinal 3D imaging data from an Alzheimer's disease (AD) study, and study its performance in terms of accuracy of reconstruction for interpolation, uncertainty estimates and personalized prediction.

Submitted 18 February, 2022; originally announced February 2022.

Journal ref: Proceedings of Machine Learning Research; PMLR 161:107-117, 2021

arXiv:2201.08377

Omnivore: A Single Model for Many Visual Modalities

Authors: Rohit Girdhar, Mannat Singh, Nikhila Ravi, Laurens van der Maaten, Armand Joulin, Ishan Misra

Abstract: Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data. Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. Our 'Omnivore' model leverages the flexibility of transformer-based architectures and is trained jointly on classification tasks from different modalities. Omnivore is simple to train, uses off-the-shelf standard datasets, and performs on par with or better than modality-specific models of the same size. A single Omnivore model obtains 86.0% on ImageNet, 84.1% on Kinetics, and 67.1% on SUN RGB-D. After finetuning, our models outperform prior work on a variety of vision tasks and generalize across modalities. Omnivore's shared visual representation naturally enables cross-modal recognition without access to correspondences between modalities. We hope our results motivate researchers to model visual modalities together.

Submitted 30 March, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted at CVPR 2022 (Oral Presentation)

arXiv:2112.03801

doi 10.1109/TSG.2022.3184252

Differentially Private $K$-means Clustering Applied to Meter Data Analysis and Synthesis

Authors: Nikhil Ravi, Anna Scaglione, Sachin Kadam, Reinhard Gentz, Sean Peisert, Brent Lunghino, Emmanuel Levijarvi, Aram Shumavon

Abstract: The proliferation of smart meters has resulted in a large amount of data being generated. It is increasingly apparent that methods are required for allowing a variety of stakeholders to leverage the data in a manner that preserves the privacy of the consumers. The sector is scrambling to define policies, such as the so-called '15/15 rule', to respond to the need. However, the current policies fail to adequately guarantee privacy. In this paper, we address the problem of allowing third parties to apply $K$-means clustering, obtaining customer labels and centroids for a set of load time series by applying the framework of differential privacy. We leverage the method to design an algorithm that generates differentially private synthetic load data consistent with the labeled data. We test our algorithm's utility by answering summary-statistics queries, such as average daily load profiles, on a 2-dimensional synthetic dataset and a real-world power load dataset.

Submitted 22 April, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: 13 pages, 13 figures

arXiv:2112.01520

Recognizing Scenes from Novel Viewpoints

Authors: Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari

Abstract: Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we attempt to endow machines with this ability. We propose a model which takes as input a few RGB images of a new scene and recognizes the scene from novel viewpoints by segmenting it into semantic categories, all without access to the RGB images from those views. We pair 2D scene recognition with an implicit 3D representation and learn from multi-view 2D annotations of hundreds of scenes without any 3D supervision beyond camera poses. We experiment on challenging datasets and demonstrate our model's ability to jointly capture semantics and geometry of novel scenes with diverse layouts, object types and shapes.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2111.11661

Optimum Noise Mechanism for Differentially Private Queries in Discrete Finite Sets

Authors: Sachin Kadam, Anna Scaglione, Nikhil Ravi, Sean Peisert, Brent Lunghino, Aram Shumavon

Abstract: The Differential Privacy (DP) literature often centers on meeting privacy constraints by introducing noise to the query, typically using a pre-specified parametric distribution model with one or two degrees of freedom. However, this emphasis tends to neglect the crucial considerations of response accuracy and utility, especially in the context of categorical or discrete numerical database queries, where the parameters defining the noise distribution are finite and could be chosen optimally. This paper addresses this gap by introducing a novel framework for designing an optimal noise Probability Mass Function (PMF) tailored to discrete and finite query sets. Our approach considers the modulo summation of random noise as the DP mechanism, aiming to present a tractable solution that not only satisfies privacy constraints but also minimizes query distortion. Unlike existing approaches focused solely on meeting privacy constraints, our framework seeks to optimize the noise distribution under an arbitrary $(ε, δ)$ constraint, thereby enhancing the accuracy and utility of the response. We demonstrate that the optimal PMF can be obtained through solving a Mixed-Integer Linear Program (MILP). Additionally, closed-form solutions for the optimal PMF are provided, minimizing the probability of error for two specific cases. Numerical experiments highlight the superior performance of our proposed optimal mechanisms compared to state-of-the-art methods.
This paper contributes to the DP literature by presenting a clear and systematic approach to designing noise mechanisms that not only satisfy privacy requirements but also optimize query distortion. The framework introduced here opens avenues for improved privacy-preserving database queries, offering significant enhancements in response accuracy and utility.
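To make the modulo-summation mechanism concrete, here is a minimal sketch. It is not the paper's optimal (MILP-derived) PMF: it assumes, hypothetically, pure ε-DP (δ = 0) and neighboring databases whose query answers differ by at most one step modulo n, and it uses an illustrative "cyclic geometric" noise PMF. All function names are made up for this example.

```python
import math
import random

def cyclic_geometric_pmf(n, eps):
    """Illustrative noise PMF over Z_n: p[k] proportional to exp(-eps * d(k)),
    where d(k) = min(k, n - k) is the cyclic distance of k from 0."""
    weights = [math.exp(-eps * min(k, n - k)) for k in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def satisfies_eps_dp(pmf, eps, tol=1e-12):
    """Pure eps-DP check under the assumption that neighboring databases shift
    the answer by at most one step mod n: cyclically adjacent noise
    probabilities must stay within a factor e^eps of each other."""
    n = len(pmf)
    bound = math.exp(eps)
    for k in range(n):
        a, b = pmf[k], pmf[(k + 1) % n]
        if a > bound * b + tol or b > bound * a + tol:
            return False
    return True

def modulo_mechanism(q, pmf, rng):
    """Release (q + Z) mod n, with Z drawn from the noise PMF."""
    n = len(pmf)
    z = rng.choices(range(n), weights=pmf, k=1)[0]
    return (q + z) % n

def error_probability(pmf):
    """The released value differs from q unless Z = 0, so P(error) = 1 - p[0]."""
    return 1.0 - pmf[0]

rng = random.Random(0)
pmf = cyclic_geometric_pmf(8, eps=1.0)
print(satisfies_eps_dp(pmf, 1.0), round(error_probability(pmf), 3))
print(modulo_mechanism(3, pmf, rng))
```

Under this sketch, the design question the abstract describes becomes: among all PMFs that pass the privacy check, pick the one that minimizes a distortion measure such as `error_probability`.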

Submitted 8 April, 2024; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: Accepted for publication in the journal Cybersecurity (https://cybersecurity.springeropen.com/)

arXiv:2111.09887

PyTorchVideo: A Deep Learning Library for Video Understanding

Authors: Haoqi Fan, Tullie Murrell, Heng Wang, Kalyan Vasudev Alwala, Yanghao Li, Yilei Li, Bo Xiong, Nikhila Ravi, Meng Li, Haichuan Yang, Jitendra Malik, Ross Girshick, Matt Feiszli, Aaron Adcock, Wan-Yen Lo, Christoph Feichtenhofer

Abstract: We introduce PyTorchVideo, an open-source deep-learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration that enables real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework; for example, PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/

Submitted 18 November, 2021; originally announced November 2021.

Comments: Technical report

arXiv:2111.09714

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Authors: Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh

Abstract: Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hashing (LSH) decreases the quadratic complexity of such models to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (to enable deployment on GPU architectures). We evaluate our algorithm on the GLUE benchmark with standard 512 sequence length where we see favorable performance relative to a standard pretrained Transformer. On the Long Range Arena (LRA) benchmark, for evaluating performance on long sequences, our method achieves results consistent with softmax self-attention but with sizable speed-ups and memory savings and often outperforms other efficient self-attention methods. Our code is available at https://github.com/mlpen/YOSO
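The Bernoulli-sampling idea can be illustrated with a small sketch. It assumes random-hyperplane (SimHash) LSH with `tau` concatenated hyperplanes, for which two unit vectors at angle θ collide with probability (1 - θ/π)^tau, and treats that collision indicator as the Bernoulli attention weight. Values are scalars and attention is unnormalized for brevity; the paper's exact hash family and its GPU-oriented modifications are not reproduced, and every name below is illustrative.

```python
import math
import random
from collections import defaultdict

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def simhash(v, planes):
    """Concatenated sign bits of <v, r> for each random hyperplane r."""
    return tuple(sum(a * b for a, b in zip(v, r)) >= 0.0 for r in planes)

def collision_prob(q, k, tau):
    """Collision probability of tau-bit SimHash for unit vectors q and k."""
    cos = max(-1.0, min(1.0, sum(a * b for a, b in zip(q, k))))
    return (1.0 - math.acos(cos) / math.pi) ** tau

def exact_bernoulli_attention(q, keys, values, tau):
    """Expected output: sum_j P(q and k_j collide) * v_j (quadratic if done for every query)."""
    return sum(collision_prob(q, k, tau) * v for k, v in zip(keys, values))

def sampled_bernoulli_attention(queries, keys, values, tau, rounds, rng):
    """Monte Carlo estimate using hash buckets: each round hashes every key once
    and sums values per bucket, then each query reads its own bucket, so the
    cost per round is linear in the number of tokens."""
    dim = len(keys[0])
    out = [0.0] * len(queries)
    for _ in range(rounds):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(tau)]
        buckets = defaultdict(float)
        for k, v in zip(keys, values):
            buckets[simhash(k, planes)] += v
        for i, q in enumerate(queries):
            out[i] += buckets.get(simhash(q, planes), 0.0)
    return [o / rounds for o in out]

rng = random.Random(0)
keys = [normalize([rng.gauss(0.0, 1.0) for _ in range(4)]) for _ in range(5)]
values = [0.1, 0.2, 0.3, 0.4, 0.5]
queries = [keys[0], normalize([rng.gauss(0.0, 1.0) for _ in range(4)])]
print([exact_bernoulli_attention(q, keys, values, 4) for q in queries])
print(sampled_bernoulli_attention(queries, keys, values, 4, 2000, rng))
```

The two printed lists should agree up to sampling noise; the abstract's remark that a single hash suffices "in principle" corresponds to `rounds=1`, at the price of a much noisier estimate.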

Submitted 18 November, 2021; originally announced November 2021.

Comments: Proceedings of the 38th ICML (2021)

arXiv:2111.07850

Colored Noise Mechanism for Differentially Private Clustering

Authors: Nikhil Ravi, Anna Scaglione, Sean Peisert

Abstract: The goal of this paper is to propose and analyze a differentially private randomized mechanism for the $K$-means query, ensuring that the information released about the cluster centroids is differentially private. The method consists of adding Gaussian noise with an optimum covariance. The main result of the paper is the analytical solution for the optimum covariance as a function of the database. Comparisons with the state of the art demonstrate the efficacy of our approach.
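A minimal sketch of the release step, assuming 2-D data and a caller-supplied covariance matrix `cov`: it computes the centroids of labeled points and perturbs them with correlated ("colored") Gaussian noise drawn via a Cholesky factor. The paper's analytical optimum covariance, and the calibration of that covariance to a privacy budget and query sensitivity, are not reproduced here; `cov` is a placeholder and the function names are hypothetical.

```python
import math
import random

def centroids(points, labels, k):
    """Per-cluster means of 2-D points; assumes every label 0..k-1 is used."""
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for (x, y), c in zip(points, labels):
        sums[c][0] += x
        sums[c][1] += y
        counts[c] += 1
    return [(s[0] / m, s[1] / m) for s, m in zip(sums, counts)]

def cholesky_2x2(cov):
    """Factor a 2x2 positive-definite covariance as L L^T; returns (l11, l21, l22)."""
    (a, b), (_, d) = cov
    if a <= 0:
        raise ValueError("covariance is not positive definite")
    l11 = math.sqrt(a)
    l21 = b / l11
    rest = d - l21 * l21
    if rest <= 0:
        raise ValueError("covariance is not positive definite")
    return l11, l21, math.sqrt(rest)

def sample_noise(chol, rng):
    """One zero-mean Gaussian vector with covariance L L^T."""
    l11, l21, l22 = chol
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2

def noisy_centroids(points, labels, k, cov, rng):
    """Release the centroids, each perturbed by colored Gaussian noise."""
    chol = cholesky_2x2(cov)
    released = []
    for cx, cy in centroids(points, labels, k):
        nx, ny = sample_noise(chol, rng)
        released.append((cx + nx, cy + ny))
    return released

pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (12.0, 14.0)]
print(noisy_centroids(pts, [0, 0, 1, 1], 2, [[1.0, 0.5], [0.5, 2.0]], random.Random(1)))
```

Using a full covariance rather than a scaled identity is what lets the noise be shaped per direction; choosing that shape optimally is the contribution the abstract describes.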

Submitted 15 November, 2021; originally announced November 2021.

Comments: 5 pages, 3 figures, preprint

arXiv:2110.02868

Coded Shotgun Sequencing

Authors: Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

Abstract: Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random unknown locations in the DNA sequence. A fundamental question, studied in arXiv:1203.6233, is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem, i.e., the scenario where the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that, while in the uncoded case, $O(n)$ reads of length greater than $2\log{n}$ are needed for reliable reconstruction of a length-$n$ binary sequence, in the coded case, only $O(n/\log{n})$ reads of length greater than $\log{n}$ are needed for the capacity to be arbitrarily close to $1$.
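To make the $O(n/\log n)$ scaling concrete, the sketch below looks only at coverage, not at the capacity expression derived in the paper. It assumes a circular sequence, read start positions drawn i.i.d. uniformly, and base-2 logarithms. With $N$ reads of length $L$, a given position is missed with probability $(1 - L/n)^N \approx e^{-NL/n}$, so with $N = c\,n/\log n$ and $L = a\log n$ the expected covered fraction is about $1 - e^{-ca}$, which does not depend on $n$. The function names are illustrative.

```python
import math
import random

def expected_coverage(n, num_reads, read_len):
    """Expected covered fraction of a circular length-n sequence when reads
    start at i.i.d. uniform positions: 1 - (1 - L/n)^N."""
    return 1.0 - (1.0 - read_len / n) ** num_reads

def simulate_coverage(n, num_reads, read_len, rng):
    """Monte Carlo version: mark every position touched by some read."""
    covered = bytearray(n)
    for _ in range(num_reads):
        start = rng.randrange(n)
        for offset in range(read_len):
            covered[(start + offset) % n] = 1
    return sum(covered) / n

n = 2 ** 16
log_n = int(math.log2(n))        # 16
num_reads = 2 * n // log_n       # c = 2, i.e. N = 2 n / log n
read_len = 2 * log_n             # a = 2, i.e. L = 2 log n
print(expected_coverage(n, num_reads, read_len))
print(simulate_coverage(n, num_reads, read_len, random.Random(0)))
```

Here $ca = 4$, so both printed values should come out near $1 - e^{-4} \approx 0.98$.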

Submitted 7 February, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: 35 pages, 4 figures, 8 appendices

arXiv:2109.10472

doi 10.3389/fphys.2021.782176

Rotor Localization and Phase Mapping of Cardiac Excitation Waves using Deep Neural Networks

Authors: Jan Lebert, Namita Ravi, Flavio Fenton, Jan Christoph

Abstract: The analysis of electrical impulse phenomena in cardiac muscle tissue is important for the diagnosis of heart rhythm disorders and other cardiac pathophysiology. Cardiac mapping techniques acquire local temporal measurements and combine them to visualize the spread of electrophysiological wave phenomena across the heart surface. However, low spatial resolution, sparse measurement locations, noise and other artifacts make it challenging to accurately visualize spatio-temporal activity. For instance, electro-anatomical catheter mapping is severely limited by the sparsity of the measurements, and optical mapping is prone to noise and motion artifacts. In the past, several approaches have been proposed to obtain more reliable maps from noisy or sparse mapping data. Here, we demonstrate that deep learning can be used to compute phase maps and detect phase singularities in optical mapping videos of ventricular fibrillation, as well as in very noisy, low-resolution and extremely sparse simulated data of reentrant wave chaos mimicking catheter mapping data. The deep learning approach learns to directly associate phase maps and the positions of phase singularities with short spatio-temporal sequences of electrical data. We tested several neural network architectures, based on a convolutional neural network with an encoding and decoding structure, to predict phase maps or rotor core positions either directly or indirectly via the prediction of phase maps and a subsequent classical calculation of phase singularities.
Predictions can be performed across different data, with models being trained on one species and then successfully applied to another, or being trained solely on simulated data and then applied to experimental data. Future uses may include the analysis of optical mapping studies in basic cardiovascular research, as well as the mapping of atrial fibrillation in the clinical setting.
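The abstract mentions computing phase singularities classically from predicted phase maps. A common classical approach, assumed here and not necessarily the authors' exact procedure, sums wrapped phase differences around each 2×2 loop of grid points; a nonzero integer result (the topological charge) marks a candidate rotor core. The function names and the synthetic spiral test field are hypothetical.

```python
import numpy as np


def wrap(d):
    """Wrap phase differences into the interval [-pi, pi)."""
    return (d + np.pi) % (2 * np.pi) - np.pi


def topological_charge(phase):
    """Integer topological charge of every 2x2 plaquette of a phase map.

    phase: 2D array of phases in radians, shape (H, W).
    Returns an int array of shape (H - 1, W - 1); nonzero entries mark
    candidate phase singularities.
    """
    p00 = phase[:-1, :-1]
    p01 = phase[:-1, 1:]
    p11 = phase[1:, 1:]
    p10 = phase[1:, :-1]
    # Walk the closed loop (0,0) -> (0,1) -> (1,1) -> (1,0) -> (0,0).
    loop = wrap(p01 - p00) + wrap(p11 - p01) + wrap(p10 - p11) + wrap(p00 - p10)
    return np.rint(loop / (2 * np.pi)).astype(int)


# Synthetic spiral-wave phase map whose core sits between grid points.
rows, cols = np.mgrid[0:20, 0:20]
phase = np.arctan2(rows - 9.5, cols - 9.5)
charge = topological_charge(phase)
print(np.argwhere(charge != 0))  # plaquette containing the rotor core
```

Wrapping each difference before summing is what lets the loop sum pick up a full 2π turn around the core while cancelling to zero everywhere else.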

Submitted 8 November, 2021; v1 submitted 21 September, 2021; originally announced September 2021.

Journal ref: Front. Physiol. 12 (2021) 782176

arXiv:2108.08891 [pdf, other]

Neural TMDlayer: Modeling Instantaneous flow of features via SDE Generators

Authors: Zihang Meng, Vikas Singh, Sathya N. Ravi

Abstract: We study how stochastic differential equation (SDE) based ideas can inspire new modifications to existing algorithms for a set of problems in computer vision. Loosely speaking, our formulation is related to both explicit and implicit strategies for data augmentation and group equivariance, but is derived from new results in the SDE literature on estimating infinitesimal generators of a class of stochastic processes. If and when there is nominal agreement between the needs of an application/task and the inherent properties and behavior of the types of processes that we can efficiently handle, we obtain a very simple and efficient plug-in layer that can be incorporated within any existing network architecture, with minimal modification and only a few additional parameters. We show promising experiments on a number of vision tasks, including few-shot learning, point cloud transformers and deep variational segmentation, obtaining efficiency or performance improvements.

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2102.08343 [pdf, other]

Learning Invariant Representations using Inverse Contrastive Loss

Authors: Aditya Kumar Akash, Vishnu Suresh Lokhande, Sathya N. Ravi, Vikas Singh

Abstract: Learning invariant representations is a critical first step in a number of machine learning tasks. A common approach corresponds to the so-called information bottleneck principle, in which an application-dependent function of mutual information is carefully chosen and optimized. Unfortunately, in practice, these functions are not suitable for optimization purposes, since these losses are agnostic of the metric structure of the parameters of the model. We introduce a class of losses for learning representations that are invariant to some extraneous variable of interest by inverting the class of contrastive losses, i.e., inverse contrastive loss (ICL). We show that if the extraneous variable is binary, then optimizing ICL is equivalent to optimizing a regularized MMD divergence. More generally, we also show that if we are provided a metric on the sample space, our formulation of ICL can be decomposed into a sum of convex functions of the given distance metric. Our experimental results indicate that models obtained by optimizing ICL achieve significantly better invariance to the extraneous variable for a fixed desired level of accuracy. In a variety of experimental settings, we show applicability of ICL for learning invariant representations for both continuous and discrete extraneous variables.
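Since the abstract relates ICL to a regularized MMD divergence when the extraneous variable is binary, a minimal sketch of the biased squared-MMD estimate between the representations of the two groups may help make that quantity concrete. This is not ICL itself; the Gaussian RBF kernel, its bandwidth `sigma`, and the function names are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2_biased(z0, z1, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of representations.

    z0: representations of samples whose binary extraneous variable is 0.
    z1: representations of samples whose binary extraneous variable is 1.
    Values near zero mean the two groups look alike (more invariance).
    """
    k00 = rbf_kernel(z0, z0, sigma).mean()
    k11 = rbf_kernel(z1, z1, sigma).mean()
    k01 = rbf_kernel(z0, z1, sigma).mean()
    return k00 + k11 - 2.0 * k01


rng = np.random.default_rng(0)
z_a = rng.normal(size=(200, 4))
z_b = rng.normal(size=(200, 4))   # same distribution as z_a
z_shifted = z_b + 2.0             # group-dependent shift, i.e. not invariant
print(mmd2_biased(z_a, z_b), mmd2_biased(z_a, z_shifted))
```

Used as a penalty on an encoder's outputs, a quantity like this pushes the two groups' representation distributions together.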

Submitted 16 February, 2021; originally announced February 2021.

Comments: Accepted to AAAI-21

arXiv:2012.09854 [pdf, other]

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

Authors: Ronghang Hu, Nikhila Ravi, Alexander C. Berg, Deepak Pathak

Abstract: We present Worldsheet, a method for novel view synthesis using just a single RGB image as input. The main insight is that simply shrink-wrapping a planar mesh sheet onto the input image, consistent with the learned intermediate depth, captures underlying geometry sufficient to generate photorealistic unseen views with large viewpoint changes. To operationalize this, we propose a novel differentiable texture sampler that allows our wrapped mesh sheet to be textured and rendered differentiably into an image from a target viewpoint. Our approach is category-agnostic, end-to-end trainable without using any 3D supervision, and requires a single image at test time. We also explore a simple extension by stacking multiple layers of Worldsheets to better handle occlusions. Worldsheet consistently outperforms prior state-of-the-art methods on single-image view synthesis across several datasets. Furthermore, this simple idea captures novel views surprisingly well on a wide range of high-resolution in-the-wild images, converting them into navigable 3D pop-ups. Video results and code are available at https://worldsheet.github.io.

Submitted 18 August, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: ICCV 2021; 17 pages

arXiv:2011.06983 [pdf, other]

A Secure Distributed Ledger for Transactive Energy: The Electron Volt Exchange (EVE) Blockchain

Authors: Shammya Saha, Nikhil Ravi, Kari Hreinsson, Jaejong Baek, Anna Scaglione, Nathan G. Johnson

Abstract: The adoption of blockchain for Transactive Energy has gained significant momentum as it allows mutually non-trusting agents to trade energy services in a trustless energy market. Research to date has assumed that the built-in Byzantine Fault Tolerance in recording transactions in a ledger is sufficient to ensure integrity. Such work must be extended to address security gaps, including random bilateral transactions that do not guarantee reliable and efficient market operation, and market participants having incentives to cheat when reporting actual production/consumption figures. Work herein introduces the Electron Volt Exchange framework with the following characteristics: 1) a distributed protocol for pricing and scheduling prosumers' production/consumption while keeping constraints and bids private, and 2) a distributed algorithm to prevent theft that verifies prosumers' compliance with scheduled transactions using information from grid sensors (such as smart meters) and mitigates the impact of false data injection attacks. Flexibility and robustness of the approach are demonstrated through simulation and implementation using Hyperledger Fabric.

Submitted 13 November, 2020; originally announced November 2020.

Comments: Accepted for Applied Energy

arXiv:2009.08765 [pdf, ps, other]

On the Capacity Enlargement of Gaussian Broadcast Channels with Passive Noisy Feedback

Authors: Aditya Narayan Ravi, Sibi Raj B. Pillai, Vinod Prabhakaran, Michèle Wigger

Abstract: It is well known that the capacity region of an average-transmit-power-constrained Gaussian Broadcast Channel (GBC) with independent noise realizations at the receivers is enlarged by the presence of causal noiseless feedback. Capacity region enlargement is also known to be possible by using only passive noisy feedback, when the GBC has identical noise variances at the receivers. The last fact remains true even when the feedback noise variance is very high, and feedback is available only from one of the receivers. While such capacity enlargements are feasible for several other feedback models in the Gaussian BC setting, it is also known that feedback does not change the capacity region for physically degraded broadcast channels. In this paper, we consider a two-user GBC with independent noise realizations at the receivers, where the feedback links from the receivers are corrupted by independent additive Gaussian noise processes. We investigate the set of four noise variances, two forward and two feedback, for which no capacity enlargement is possible. A sharp characterization of this region is derived, i.e., any quadruple outside the presented region will lead to a capacity enlargement, whereas quadruples inside will leave the capacity region unchanged. Our results lead to the conclusion that when the forward noise variances are different, overly noisy feedback from one of the receivers alone is not always beneficial for enlarging the capacity region, be it from the stronger user or the weaker one, in sharp contrast to the case of equal forward noise variances.
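For reference, the baseline that feedback may enlarge is the no-feedback capacity region of a two-user GBC, which is achieved by superposition coding with successive decoding at the stronger receiver. The sketch below traces its boundary using the standard textbook expressions, assuming receiver 1 is the stronger user (N1 <= N2) and using a hypothetical power-split parameter `alpha`; it does not implement the paper's feedback characterization.

```python
import numpy as np


def gbc_boundary(P, N1, N2, alpha):
    """Rate pair (bits per channel use) on the no-feedback two-user GBC boundary.

    Assumes N1 <= N2, so receiver 1 is the stronger user; alpha in [0, 1] is
    the fraction of the power P spent on the stronger user's message.
    """
    if N1 > N2:
        raise ValueError("label the users so that N1 <= N2")
    # Stronger user decodes and cancels the weaker user's layer first.
    r1 = 0.5 * np.log2(1.0 + alpha * P / N1)
    # Weaker user treats the stronger user's layer as extra noise.
    r2 = 0.5 * np.log2(1.0 + (1.0 - alpha) * P / (alpha * P + N2))
    return r1, r2


for a in (0.0, 0.25, 0.5, 1.0):
    print(a, gbc_boundary(P=10.0, N1=1.0, N2=4.0, alpha=a))
```

Sweeping `alpha` from 0 to 1 moves along the boundary from the weaker user's single-user rate to the stronger user's single-user rate.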

Submitted 18 September, 2020; originally announced September 2020.

Comments: 23 single column pages, 4 Figures

arXiv:2007.08501 [pdf, other]

Accelerating 3D Deep Learning with PyTorch3D

Authors: Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

Abstract: Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications, including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However, despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state of the art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning.

Submitted 16 July, 2020; originally announced July 2020.

Comments: tech report

arXiv:2004.14539 [pdf, other]

Physarum Powered Differentiable Linear Programming Layers and Applications

Authors: Zihang Meng, Sathya N. Ravi, Vikas Singh

Abstract: Consider a learning algorithm that involves an internal call to an optimization routine such as a generalized eigenvalue problem, a cone programming problem or even sorting. Integrating such a method as one or more layers within a trainable deep neural network (DNN) in an efficient and numerically stable way is not straightforward; for instance, only recently have strategies emerged for eigendecomposition and differentiable sorting. We propose an efficient and differentiable solver for general linear programming problems which can be used in a plug-and-play manner within DNNs as a layer. Our development is inspired by a fascinating but not widely used link between dynamics of slime mold (physarum) and optimization schemes such as steepest descent. We describe our development and show the use of our solver in a video segmentation task and meta-learning for few-shot learning. We review the existing results and provide a technical analysis describing its applicability for our use cases. Our solver performs comparably with a customized projected gradient descent method on the first task and outperforms the differentiable CVXPY-SCS solver on the second task. Experiments show that our solver converges quickly without the need for a feasible initial point. Our proposal is easy to implement and can easily serve as a layer whenever a learning procedure needs a fast approximate solution to an LP within a larger network.
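The abstract does not spell out the update rule. A minimal sketch, assuming the standard discretized Physarum dynamics for a linear program in standard form (minimize c^T x subject to Ax = b, x >= 0, with c > 0), is shown below; the paper's layer may differ in its discretization, stopping rule, and how gradients are propagated. Each step solves one weighted least-squares system, and the starting point need not be feasible. The function name and defaults are hypothetical.

```python
import numpy as np


def physarum_lp(A, b, c, steps=200, h=0.5):
    """Approximately solve min c^T x s.t. A x = b, x >= 0 via Physarum dynamics.

    Assumes c > 0 and A has full row rank. Each iteration computes the
    "flow" q = W A^T (A W A^T)^{-1} b with W = diag(x / c), which always
    satisfies A q = b, and then moves x toward |q| with step size h.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    x = np.ones(A.shape[1])  # strictly positive start; need not be feasible
    for _ in range(steps):
        w = x / c
        L = (A * w) @ A.T  # A W A^T without forming diag(w)
        q = w * (A.T @ np.linalg.solve(L, b))
        x = (1.0 - h) * x + h * np.abs(q)
    return x


# min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0   (optimum: x = [1, 0])
x = physarum_lp(A=[[1.0, 1.0]], b=[1.0], c=[1.0, 2.0])
print(np.round(x, 6))
```

On this toy problem the cheaper variable x1 gradually absorbs all of the flow, matching the LP optimum even though the start point x = [1, 1] is infeasible.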

Submitted 10 May, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

arXiv:2004.01355 [pdf, other]

FairALM: Augmented Lagrangian Method for Training Fair Models with Little Regret

Authors: Vishnu Suresh Lokhande, Aditya Kumar Akash, Sathya N. Ravi, Vikas Singh

Abstract: Algorithmic decision making based on computer vision and machine learning technologies continues to permeate our lives. But issues related to biases of these models, and the extent to which they treat certain segments of the population unfairly, have led to concern in the general public. It is now accepted that, because of biases in the datasets we present to the models, fairness-oblivious training will lead to unfair models. An interesting topic is the study of mechanisms via which the de novo design or training of the model can be informed by fairness measures. Here, we study mechanisms that impose fairness concurrently while training the model. While existing fairness-based approaches in vision have largely relied on training adversarial modules together with the primary classification/regression task, in an effort to remove the influence of the protected attribute or variable, we show how ideas based on well-known optimization concepts can provide a simpler alternative. In our proposed scheme, imposing fairness just requires specifying the protected attribute and utilizing our optimization routine. We provide a detailed technical analysis and present experiments demonstrating that various fairness measures from the literature can be reliably imposed on a number of training tasks in vision in a manner that is interpretable.

Submitted 23 June, 2020; v1 submitted 2 April, 2020; originally announced April 2020.

arXiv:2003.03808 [pdf, other]

PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models

Authors: Sachit Menon, Alexandru Damian, Shijia Hu, Nikhil Ravi, Cynthia Rudin

Abstract: The primary aim of single-image super-resolution is to construct high-resolution (HR) images from corresponding low-resolution (LR) inputs. In previous approaches, which have generally been supervised, the training objective typically measures a pixel-wise average distance between the super-resolved (SR) and HR images. Optimizing such metrics often leads to blurring, especially in high variance (detailed) regions. We propose an alternative formulation of the super-resolution problem based on creating realistic SR images that downscale correctly. We present an algorithm addressing this problem, PULSE (Photo Upsampling via Latent Space Exploration), which generates high-resolution, realistic images at resolutions previously unseen in the literature. It accomplishes this in an entirely self-supervised fashion and is not confined to a specific degradation operator used during training, unlike previous methods (which require supervised training on databases of LR-HR image pairs). Instead of starting with the LR image and slowly adding detail, PULSE traverses the high-resolution natural image manifold, searching for images that downscale to the original LR image. This is formalized through the "downscaling loss," which guides exploration through the latent space of a generative model. By leveraging properties of high-dimensional Gaussians, we restrict the search space to guarantee realistic outputs. PULSE thereby generates super-resolved images that both are realistic and downscale correctly.
We show proof of concept of our approach in the domain of face super-resolution (i.e., face hallucination). We also present a discussion of the limitations and biases of the method as currently implemented with an accompanying model card with relevant metrics. Our method outperforms state-of-the-art methods in perceptual quality at higher resolutions and scale factors than previously possible.
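The latent-space search described in the abstract can be illustrated with a toy stand-in: minimize a downscaling loss ||DS(G(z)) - I_LR||^2 while keeping the latent z on the sphere of radius sqrt(d), where most of the mass of a d-dimensional standard Gaussian lies. Everything here is hypothetical: `W` is a random linear "generator" instead of a trained generative model, `DS` is 2x block averaging, and the optimizer is a plain Riemannian gradient step followed by renormalization. It is a sketch of the idea, not the PULSE implementation.

```python
import numpy as np


def downscale(img, factor):
    """Block-average an (H, W) image by an integer factor."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def project_to_sphere(z):
    """Rescale z onto the radius-sqrt(d) sphere where N(0, I_d) concentrates."""
    return z * np.sqrt(z.size) / np.linalg.norm(z)


rng = np.random.default_rng(0)
d, hr, factor = 32, 8, 2
W = rng.normal(size=(hr * hr, d)) / np.sqrt(d)  # hypothetical linear "generator"


def generate(z):
    return (W @ z).reshape(hr, hr)


def downscaling_loss(z, lr_img):
    return float(np.sum((downscale(generate(z), factor) - lr_img) ** 2))


# Matrix of the linear map z -> downscale(generate(z)), flattened.
M = np.stack([downscale(generate(e), factor).ravel() for e in np.eye(d)], axis=1)

lr_img = downscale(generate(project_to_sphere(rng.normal(size=d))), factor)
z = project_to_sphere(rng.normal(size=d))
initial = downscaling_loss(z, lr_img)
step = 0.1 / np.linalg.norm(M, 2) ** 2
for _ in range(500):
    grad = 2.0 * M.T @ (M @ z - lr_img.ravel())
    grad -= (grad @ z) / (z @ z) * z        # keep only the tangent component
    z = project_to_sphere(z - step * grad)  # retract back onto the sphere
final = downscaling_loss(z, lr_img)
print(initial, final)
```

Restricting z to the sphere keeps the search in the region a Gaussian prior actually samples from, which is the intuition behind using high-dimensional Gaussian properties to favor realistic outputs.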

Submitted 20 July, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

Comments: Sachit Menon and Alexandru Damian contributed equally. Computer Vision and Pattern Recognition (CVPR) 2020

arXiv:1911.06239 [pdf, other]

Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems

Authors: Aditya Narayan Ravi, Pranav Poduval, Dr. Sharayu Moharir

Abstract: We use a novel modification of Multi-Armed Bandits to create a new model for recommendation systems. We model the recommendation system as a bandit seeking to maximize reward by pulling on arms with unknown rewards. The catch, however, is that this bandit can only access these arms through an unreliable intermediate that has some level of autonomy while choosing its arms. For example, on a streaming website the user has a lot of autonomy while choosing content they want to watch. The streaming site can use targeted advertising as a means to bias the opinions of these users. Here the streaming site is the bandit aiming to maximize reward and the user is the unreliable intermediate. We model the intermediate as accessing states via a Markov chain. The bandit is allowed to perturb this Markov chain. We prove fundamental theorems for this setting and then present a close-to-optimal Explore-Commit algorithm.
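As a point of reference only, here is a sketch of plain explore-then-commit for Bernoulli arms that the bandit pulls directly. It deliberately omits the paper's unreliable Markov-chain intermediary, and the arm means, horizon, exploration length, and function name are hypothetical.

```python
import numpy as np


def explore_then_commit(means, horizon, explore_per_arm, rng):
    """Plain explore-then-commit on Bernoulli arms.

    Pulls every arm explore_per_arm times, then commits to the arm with the
    highest empirical mean for the remaining rounds. Returns the committed
    arm index and the total reward collected.
    """
    means = np.asarray(means, dtype=float)
    k = len(means)
    remaining = horizon - k * explore_per_arm
    if remaining < 0:
        raise ValueError("exploration budget exceeds the horizon")
    totals = np.zeros(k)
    for arm in range(k):
        pulls = rng.random(explore_per_arm) < means[arm]  # Bernoulli rewards
        totals[arm] = pulls.sum()
    best = int(np.argmax(totals / explore_per_arm))
    committed = rng.random(remaining) < means[best]
    return best, float(totals.sum() + committed.sum())


rng = np.random.default_rng(1)
arm, total = explore_then_commit([0.2, 0.5, 0.8], horizon=10_000,
                                 explore_per_arm=200, rng=rng)
print(arm, total)
```

The exploration length trades off the risk of committing to the wrong arm against the reward lost while exploring; in the paper's setting the bandit additionally acts only through a perturbed Markov chain.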

Submitted 14 November, 2019; originally announced November 2019.

Comments: 4 pages, 4 figures, Aditya Narayan Ravi and Pranav Poduval contributed equally

arXiv:1910.13020 [pdf, other]

Detection and Isolation of Adversaries in Decentralized Optimization for Non-Strongly Convex Objectives

Authors: Nikhil Ravi, Anna Scaglione

Abstract: Decentralized optimization has found significant utility in recent years as a promising technique to overcome the curse of dimensionality when dealing with large-scale inference and decision problems in big data. While these algorithms are resilient to node and link failures, they are not, however, inherently Byzantine fault-tolerant towards insider data injection attacks. This paper proposes a decentralized robust subgradient push (RSGP) algorithm for detection and isolation of malicious nodes in the network when optimizing non-strongly convex objectives. In the attack considered in this work, the malicious nodes follow the algorithmic protocols, but can alter their local functions arbitrarily. However, we show that in sufficiently structured problems, the proposed method is effective in revealing their presence. The algorithm isolates detected nodes from the regular nodes, thereby mitigating the ill-effects of malicious nodes. We also provide performance measures for the proposed method.

Submitted 28 October, 2019; originally announced October 2019.

Comments: 6 pages, 2 figures, 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems

arXiv:1910.10870 [pdf, other]

Keeping Them Honest: a Trustless Multi-Agent Algorithm to Validate Transactions Cleared on Blockchain using Physical Sensors

Authors: Nikhil Ravi, Shammya Saha, Anna Scaglione, Nathan G. Johnson

Abstract: In recent years, many Blockchain-based frameworks for transacting commodities on a congestible network have been proposed. In particular, as the number of controllable grid-connected assets increases, there is a need for a decentralized, coupled economic and control mechanism to dynamically balance the entire electric grid. Blockchain-based Transactive Energy (TE) systems have gained significant momentum as an approach to sustain the reliability and security of the power grid in order to support the flexibility of electricity demand. What is lacking in these designs, however, is a mechanism that physically verifies all the energy transactions, to keep the various inherently selfish players honest. In this paper, we introduce a secure peer-to-peer network mechanism for the physical validation of economic transactions cleared over a distributed ledger. The framework is secure in the sense that selfish and malicious agents that are trying to inject false data into the network are prevented from adversely affecting the optimal functionality of the verification process by detecting and isolating them from the communication network. Preliminary simulations focusing on TE show the workings of this framework.

Submitted 23 October, 2019; originally announced October 2019.

Comments: 8 pages, 5 figures, submitted to The 2020 American Control Conference

arXiv:1909.12398 [pdf, other]

Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains

Authors: Sathya N. Ravi, Abhay Venkatesh, Glenn Moo Fung, Vikas Singh

Abstract: Data-dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The F_β measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium to large sized datasets, scalability issues severely limit our ability to leverage the benefits of such regularizers. Importantly, the key technical impediment is that, despite some recent progress, such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem still remains elusive, in this paper we show that for many data-dependent nondecomposable regularizers that are relevant in applications, sizable gains in efficiency are possible with minimal code-level changes; in other words, no specialized tools or numerical schemes are needed. Our procedure involves a reparameterization followed by a partial dualization; this leads to a formulation that has provably cheap projection operators. We present a detailed analysis of runtime and convergence properties of our algorithm. On the experimental side, we show that a direct use of our scheme significantly improves the state-of-the-art IOU measures reported for the MSCOCO Stuff segmentation dataset.

Submitted 26 September, 2019; originally announced September 2019.

arXiv:1909.05479 [pdf, other]

Generating Accurate Pseudo-labels in Semi-Supervised Learning and Avoiding Overconfident Predictions via Hermite Polynomial Activations

Authors: Vishnu Suresh Lokhande, Songwong Tasneeyapant, Abhay Venkatesh, Sathya N. Ravi, Vikas Singh

Abstract: Rectified Linear Units (ReLUs) are among the most widely used activation functions in a broad variety of tasks in vision. Recent theoretical results suggest that despite their excellent practical performance, in various cases, a substitution with basis expansions (e.g., polynomials) can yield significant benefits from both the optimization and generalization perspectives. Unfortunately, the existing results remain limited to networks with a couple of layers, and the practical viability of these results is not yet known. Motivated by some of these results, we explore the use of Hermite polynomial expansions as a substitute for ReLUs in deep networks. While our experiments with supervised learning do not provide a clear verdict, we find that this strategy offers considerable benefits in semi-supervised learning (SSL) / transductive learning settings. We carefully develop this idea and show how the use of Hermite-polynomial-based activations can yield improvements in pseudo-label accuracies and sizable financial savings (due to concurrent runtime benefits). Further, we show via theoretical analysis that the networks (with Hermite activations) offer robustness to noise and other attractive mathematical properties.
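A minimal sketch of a Hermite expansion used as an elementwise activation, assuming normalized probabilists' Hermite polynomials He_k(x)/sqrt(k!) truncated after a few terms. In a network the coefficients would be trainable parameters; the values and the function name below are hypothetical. The sketch relies on NumPy's `numpy.polynomial.hermite_e.hermeval`, which evaluates a series in He_k.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_activation(x, coeffs):
    """Elementwise activation sum_k coeffs[k] * He_k(x) / sqrt(k!).

    He_k are probabilists' Hermite polynomials; dividing by sqrt(k!) makes
    them orthonormal under the standard Gaussian measure.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    scale = np.array([1.0 / math.sqrt(math.factorial(k)) for k in range(len(coeffs))])
    return hermeval(x, coeffs * scale)


x = np.linspace(-2.0, 2.0, 5)
print(hermite_activation(x, [0.0, 1.0]))            # He_1(x) = x
print(hermite_activation(x, [0.0, 0.0, 1.0]))       # (x**2 - 1) / sqrt(2)
print(hermite_activation(x, [0.1, 0.5, 0.3, 0.1]))  # hypothetical learned mix
```

The orthonormal scaling keeps each basis term at unit variance for Gaussian-distributed inputs, so no single coefficient dominates simply because of polynomial growth.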

Submitted 31 March, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

Comments: Accepted at 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

arXiv:1909.02533 [pdf, other]

C3DPO: Canonical 3D Pose Networks for Non-Rigid Structure From Motion

Authors: David Novotny, Nikhila Ravi, Benjamin Graham, Natalia Neverova, Andrea Vedaldi

Abstract: We propose C3DPO, a method for extracting 3D models of deformable objects from 2D keypoint annotations in unconstrained images. We do so by learning a deep network that reconstructs a 3D object from a single view at a time, accounting for partial occlusions, and explicitly factoring the effects of viewpoint changes and object deformations. In order to achieve this factorization, we introduce a novel regularization technique. We first show that the factorization is successful if, and only if, there exists a certain canonicalization function of the reconstructed shapes. Then, we learn the canonicalization function together with the reconstruction one, which constrains the result to be consistent. We demonstrate state-of-the-art reconstruction results for methods that do not use ground-truth 3D supervision for a number of benchmarks, including Up3D and PASCAL3D+. Source code has been made available at https://github.com/facebookresearch/c3dpo_nrsfm.

Submitted 15 October, 2019; v1 submitted 5 September, 2019; originally announced September 2019.

Comments: Added a link to the source code into the abstract

Journal ref: IEEE/CVF International Conference on Computer Vision 2019

arXiv:1805.03383 [pdf, other]

New Techniques for Preserving Global Structure and Denoising with Low Information Loss in Single-Image Super-Resolution

Authors: Yijie Bei, Alex Damian, Shijia Hu, Sachit Menon, Nikhil Ravi, Cynthia Rudin

Abstract: This work identifies and addresses two important technical challenges in single-image super-resolution: (1) how to upsample an image without magnifying noise and (2) how to preserve large-scale structure when upsampling. We summarize the techniques we developed for our second-place entry in Track 1 (Bicubic Downsampling), seventh-place entry in Track 2 (Realistic Adverse Conditions), and seventh-place entry in Track 3 (Realistic difficult) in the 2018 NTIRE Super-Resolution Challenge. Furthermore, we present new neural network architectures that specifically address the two challenges listed above: denoising and preservation of large-scale structure.

Submitted 15 June, 2018; v1 submitted 9 May, 2018; originally announced May 2018.

Comments: 8 pages, CVPR workshop 2018

arXiv:1803.08137 [pdf, other]

Robust Blind Deconvolution via Mirror Descent

Authors: Sathya N. Ravi, Ronak Mehta, Vikas Singh

Abstract: We revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations has recently received interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission-critical applications. Further, many blind deconvolution methods based on deep architectures internally make use of or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first-order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice, where on standard datasets our algorithm yields results competitive with or superior to the state of the art. Keywords: blind deconvolution, robust continuous optimization
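The abstract does not give the update rule, so the sketch below only illustrates the ingredient named in the title, mirror descent, on a simplified sub-problem: recovering a nonnegative 1D blur kernel that sums to one from a known sharp signal, using the entropic mirror map (the exponentiated-gradient update). The noiseless setup, step size `eta`, and function names are assumptions; a full blind deconvolution method would also update the image, and this is not the authors' algorithm.

```python
import numpy as np


def convolution_matrix(signal, kernel_len):
    """Matrix X with X @ k == np.convolve(signal, k) whenever len(k) == kernel_len."""
    n = len(signal)
    X = np.zeros((n + kernel_len - 1, kernel_len))
    for j in range(kernel_len):
        X[j:j + n, j] = signal
    return X


def estimate_kernel_md(signal, blurred, kernel_len, steps=3000, eta=0.002):
    """Entropic mirror descent (exponentiated gradient) for the blur kernel.

    Minimizes ||signal * k - blurred||^2 over the probability simplex
    {k >= 0, sum(k) = 1}; the multiplicative update keeps k on the simplex.
    """
    X = convolution_matrix(signal, kernel_len)
    k = np.full(kernel_len, 1.0 / kernel_len)  # uniform starting kernel
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ k - blurred)
        k = k * np.exp(-eta * grad)
        k /= k.sum()
    return k


rng = np.random.default_rng(0)
sharp = rng.normal(size=100)
true_kernel = np.array([0.2, 0.5, 0.3])
blurred = np.convolve(sharp, true_kernel)  # noiseless "full" convolution
k_hat = estimate_kernel_md(sharp, blurred, kernel_len=3)
print(np.round(k_hat, 4))
```

The entropic mirror map is a natural fit here because the multiplicative update enforces nonnegativity and the sum-to-one constraint without a separate projection step.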

Submitted 21 March, 2018; originally announced March 2018.

Showing 1–50 of 59 results for author: Ravi, N