Search | arXiv e-print repository

What kind of linearly distributive category do polynomial functors form?

Authors: David I. Spivak, Priyaa Varshinee Srinivasan

Abstract: This paper has two purposes. The first is to extend the theory of linearly distributive categories by considering the structures that emerge in a special case: the normal duoidal category $(\mathsf{Poly} ,\mathcal{y}, \otimes, \triangleleft )$ of polynomial functors under Dirichlet and substitution product. This is an isomix LDC which is neither $*$-autonomous nor fully symmetric. The additional structures of interest here are a closure for $\otimes$ and a co-closure for $\triangleleft$, making $\mathsf{Poly}$ a bi-closed LDC, which is a notion we introduce in this paper. The second purpose is to use $\mathsf{Poly}$ as a source of examples and intuition about various structures that can occur in the setting of LDCs, including duals, cores, linear monoids, and others, as well as how these generalize to the non-symmetric setting. To that end, we characterize the linearly dual objects in $\mathsf{Poly}$: every linear polynomial has a right dual which is a representable. It turns out that the linear and representable polynomials also form the left and right cores of $\mathsf{Poly}$. Finally, we provide examples of linear monoids, linear comonoids, and linear bialgebras in $\mathsf{Poly}$.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 32 pages

MSC Class: 18B99; ACM Class: F.4.1

arXiv:2406.06527 [pdf, other]

IllumiNeRF: 3D Relighting without Inverse Rendering

Authors: Xiaoming Zhao, Pratul P. Srinivasan, Dor Verbin, Keunhong Park, Ricardo Martin Brualla, Philipp Henzler

Abstract: Existing methods for relightable view synthesis -- using a set of images of an object under unknown lighting to recover a 3D representation that can be rendered from novel viewpoints under a target illumination -- are based on inverse rendering, and attempt to disentangle the object geometry, materials, and lighting that explain the input images. Furthermore, this typically involves optimization through differentiable Monte Carlo rendering, which is brittle and computationally-expensive. In this work, we propose a simpler approach: we first relight each input image using an image diffusion model conditioned on lighting and then reconstruct a Neural Radiance Field (NeRF) with these relit images, from which we render novel views under the target lighting. We demonstrate that this strategy is surprisingly competitive and achieves state-of-the-art results on multiple relighting benchmarks. Please see our project page at https://illuminerf.github.io/.

Submitted 10 June, 2024; originally announced June 2024.

Comments: Project page: https://illuminerf.github.io/

arXiv:2405.14871 [pdf, other]

NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections

Authors: Dor Verbin, Pratul P. Srinivasan, Peter Hedman, Ben Mildenhall, Benjamin Attal, Richard Szeliski, Jonathan T. Barron

Abstract: Neural Radiance Fields (NeRFs) typically struggle to reconstruct and render highly specular objects, whose appearance varies quickly with changes in viewpoint. Recent works have improved NeRF's ability to render detailed specular appearance of distant environment illumination, but are unable to synthesize consistent reflections of closer content. Moreover, these techniques rely on large computationally-expensive neural networks to model outgoing radiance, which severely limits optimization and rendering speed. We address these issues with an approach based on ray tracing: instead of querying an expensive neural network for the outgoing view-dependent radiance at points along each camera ray, our model casts reflection rays from these points and traces them through the NeRF representation to render feature vectors which are decoded into color using a small inexpensive network. We demonstrate that our model outperforms prior methods for view synthesis of scenes containing shiny objects, and that it is the only existing NeRF method that can synthesize photorealistic specular appearance and reflections in real-world scenes, while requiring comparable optimization time to current state-of-the-art view synthesis models.

Submitted 23 May, 2024; originally announced May 2024.

Comments: Project page: http://nerf-casting.github.io

arXiv:2405.13181 [pdf, other]

Comparative Analysis of Different Efficient Fine Tuning Methods of Large Language Models (LLMs) in Low-Resource Setting

Authors: Krishna Prasad Varadarajan Srinivasan, Prasanth Gumpena, Madhusudhana Yattapu, Vishal H. Brahmbhatt

Abstract: In the domain of large language models (LLMs), arXiv:2305.16938 showed that few-shot full-model fine-tuning -- namely Vanilla Fine Tuning (FT) and Pattern-Based Fine Tuning (PBFT) -- and In-Context Learning (ICL) generalize similarly on Out-Of-Domain (OOD) datasets, but vary in terms of task adaptation. However, they both pose challenges, especially in terms of memory requirements. In this paper, we further try to push the understanding of different fine-tuning strategies for LLMs and aim to bring a myriad of these on the same pedestal for an elaborate comparison with full-model fine-tuning on two diverse datasets. To that end, we conducted a series of experiments, beginning with state-of-the-art methods like vanilla fine-tuning and Pattern-Based Fine-Tuning (PBFT) on pre-trained models across two datasets, COLA and MNLI. We then investigate adaptive fine-tuning and the efficiency of LoRA adapters in a few-shot setting. Finally, we also compare an alternative approach that has gained recent popularity -- context distillation -- with the vanilla FT and PBFT with and without few-shot setup. Our findings suggest that these alternative strategies that we explored can exhibit out-of-domain generalization comparable to that of vanilla FT and PBFT. PBFT under-performs Vanilla FT on out-of-domain (OOD) data, emphasizing the need for effective prompts. Further, our adaptive fine-tuning and LoRA experiments perform comparably to or slightly worse than the standard fine-tunings as anticipated, since standard fine-tunings involve tuning the entire model. Finally, our context distillation experiments out-perform the standard fine-tuning methods. These findings underscore that eventually the choice of an appropriate fine-tuning method depends on the available resources (memory, compute, data) and task adaptability.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 9 pages of main paper, 1 page of references, 6 appendix pages, 11 figures, 18 tables

arXiv:2405.10314 [pdf, other]

CAT3D: Create Anything in 3D with Multi-View Diffusion Models

Authors: Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, Ben Poole

Abstract: Advances in 3D reconstruction have enabled high-quality 3D capture, but require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real-time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io .

Submitted 16 May, 2024; originally announced May 2024.

Comments: Project page: https://cat3d.github.io

arXiv:2405.05938 [pdf, other]

DOLOMITES: Domain-Specific Long-Form Methodical Tasks

Authors: Chaitanya Malaviya, Priyanka Agrawal, Kuzman Ganchev, Pranesh Srinivasan, Fantine Huot, Jonathan Berant, Mark Yatskar, Dipanjan Das, Mirella Lapata, Chris Alberti

Abstract: Experts in various fields routinely perform methodical writing tasks to plan, organize, and report their work. From a clinician writing a differential diagnosis for a patient, to a teacher writing a lesson plan for students, these tasks are pervasive, requiring to methodically generate structured long-form output for a given input. We develop a typology of methodical tasks structured in the form of a task objective, procedure, input, and output, and introduce DoLoMiTes, a novel benchmark with specifications for 519 such tasks elicited from hundreds of experts from across 25 fields. Our benchmark further contains specific instantiations of methodical tasks with concrete input and output examples (1,857 in total) which we obtain by collecting expert revisions of up to 10 model-generated examples of each task. We use these examples to evaluate contemporary language models highlighting that automating methodical tasks is a challenging long-form generation problem, as it requires performing complex inferences, while drawing upon the given context as well as domain knowledge.

Submitted 28 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Dataset now available at https://dolomites-benchmark.github.io

arXiv:2404.16399 [pdf, other]

Offline Reinforcement Learning with Behavioral Supervisor Tuning

Authors: Padmanaba Srinivasan, William Knottenbelt

Abstract: Offline reinforcement learning (RL) algorithms are applied to learn performant, well-generalizing policies when provided with a static dataset of interactions. Many recent approaches to offline RL have seen substantial success, but with one key caveat: they demand substantial per-dataset hyperparameter tuning to achieve reported performance, which requires policy rollouts in the environment to evaluate; this can rapidly become cumbersome. Furthermore, substantial tuning requirements can hamper the adoption of these algorithms in practical domains. In this paper, we present TD3 with Behavioral Supervisor Tuning (TD3-BST), an algorithm that trains an uncertainty model and uses it to guide the policy to select actions within the dataset support. TD3-BST can learn more effective policies from offline datasets compared to previous methods and achieves the best performance across challenging benchmarks without requiring per-dataset tuning.

Submitted 27 July, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.18226 [pdf, ps, other]

Drazin Inverses in Categories

Authors: Robin Cockett, Jean-Simon Pacaud Lemay, Priyaa Varshinee Srinivasan

Abstract: Drazin inverses are a fundamental algebraic structure which have been extensively deployed in semigroup theory and ring theory. Drazin inverses can also be defined for endomorphisms in any category. However, beyond a paper by Puystjens and Robinson from 1987, not much has been done with Drazin inverses in category theory. As such, here we provide a survey of the theory of Drazin inverses from a categorical perspective. We introduce Drazin categories, in which every endomorphism has a Drazin inverse, and provide various examples including the category of matrices over a field, the category of finite length modules over a ring, and finite set enriched categories. We also introduce the notion of expressive rank and prove that a category with expressive rank is Drazin. Moreover, we study Drazin inverses in mere categories, in additive categories, and in dagger categories. In an arbitrary category, we show how a Drazin inverse corresponds to an isomorphism in the idempotent splitting, as well as explain how Drazin inverses relate to Leinster's notion of eventual image duality. In additive categories, we explore the consequences of the core-nilpotent decomposition and the image-kernel decomposition, which we relate back to Fitting's famous results. We then develop the notion of Drazin inverses for pairs of opposing maps, generalizing the usual notion of Drazin inverse for endomorphisms. As an application of this new kind of Drazin inverse, for dagger categories, we provide a novel characterization of the Moore-Penrose inverse in terms of being a Drazin inverse of the pair of a map and its adjoint.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.12377 [pdf, other]

Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis

Authors: Christian Reiser, Stephan Garbin, Pratul P. Srinivasan, Dor Verbin, Richard Szeliski, Ben Mildenhall, Jonathan T. Barron, Peter Hedman, Andreas Geiger

Abstract: While surface-based view synthesis algorithms are appealing due to their low computational requirements, they often struggle to reproduce thin structures. In contrast, more expensive methods that model the scene's geometry as a volumetric density field (e.g. NeRF) excel at reconstructing fine geometric detail. However, density fields often represent geometry in a "fuzzy" manner, which hinders exact localization of the surface. In this work, we modify density fields to encourage them to converge towards surfaces, without compromising their ability to reconstruct thin structures. First, we employ a discrete opacity grid representation instead of a continuous density field, which allows opacity values to discontinuously transition from zero to one at the surface. Second, we anti-alias by casting multiple rays per pixel, which allows occlusion boundaries and subpixel structures to be modelled without using semi-transparent voxels. Third, we minimize the binary entropy of the opacity values, which facilitates the extraction of surface geometry by encouraging opacity values to binarize towards the end of training. Lastly, we develop a fusion-based meshing strategy followed by mesh simplification and appearance model fitting. The compact meshes produced by our model can be rendered in real-time on mobile devices and achieve significantly higher view synthesis quality compared to existing mesh-based approaches.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Project page at https://binary-opacity-grid.github.io

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.10003 [pdf, other]

ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Authors: Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, Sanjiv Kumar

Abstract: Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 19 pages, 4 figures, 4 tables, 8 listings

arXiv:2312.05283 [pdf, other]

Nuvo: Neural UV Mapping for Unruly 3D Representations

Authors: Pratul P. Srinivasan, Stephan J. Garbin, Dor Verbin, Jonathan T. Barron, Ben Mildenhall

Abstract: Existing UV mapping algorithms are designed to operate on well-behaved meshes, instead of the geometry representations produced by state-of-the-art 3D reconstruction and generation techniques. As such, applying these methods to the volume densities recovered by neural radiance fields and related techniques (or meshes triangulated from such fields) results in texture atlases that are too fragmented to be useful for tasks such as view synthesis or appearance editing. We present a UV mapping method designed to operate on geometry produced by 3D reconstruction and generation techniques. Instead of computing a mapping defined on a mesh's vertices, our method Nuvo uses a neural field to represent a continuous UV mapping, and optimizes it to be a valid and well-behaved mapping for just the set of visible points, i.e. only points that affect the scene's appearance. We show that our model is robust to the challenges posed by ill-behaved geometry, and that it produces editable UV mappings that can represent detailed appearance.

Submitted 11 December, 2023; originally announced December 2023.

Comments: Project page at https://pratulsrinivasan.github.io/nuvo

arXiv:2312.02981 [pdf, other]

ReconFusion: 3D Reconstruction with Diffusion Priors

Authors: Rundi Wu, Ben Mildenhall, Philipp Henzler, Keunhong Park, Ruiqi Gao, Daniel Watson, Pratul P. Srinivasan, Dor Verbin, Jonathan T. Barron, Ben Poole, Aleksander Holynski

Abstract: 3D reconstruction methods such as Neural Radiance Fields (NeRFs) excel at rendering photorealistic novel views of complex scenes. However, recovering a high-quality NeRF typically requires tens to hundreds of input images, resulting in a time-consuming capture process. We present ReconFusion to reconstruct real-world scenes using only a few photos. Our approach leverages a diffusion prior for novel view synthesis, trained on synthetic and multiview datasets, which regularizes a NeRF-based 3D reconstruction pipeline at novel camera poses beyond those captured by the set of input images. Our method synthesizes realistic geometry and texture in underconstrained regions while preserving the appearance of observed regions. We perform an extensive evaluation across various real-world datasets, including forward-facing and 360-degree scenes, demonstrating significant performance improvements over previous few-view NeRF reconstruction approaches.

Submitted 5 December, 2023; originally announced December 2023.

Comments: Project page: https://reconfusion.github.io/

arXiv:2312.02149 [pdf, other]

Generative Powers of Ten

Authors: Xiaojuan Wang, Janne Kontkanen, Brian Curless, Steve Seitz, Ira Kemelmacher, Ben Mildenhall, Pratul Srinivasan, Dor Verbin, Aleksander Holynski

Abstract: We present a method that uses a text-to-image model to generate consistent content across multiple image scales, enabling extreme semantic zooms into a scene, e.g., ranging from a wide-angle landscape view of a forest to a macro shot of an insect sitting on one of the tree branches. We achieve this through a joint multi-scale diffusion sampling approach that encourages consistency across different scales while preserving the integrity of each individual sampling process. Since each generated scale is guided by a different text prompt, our method enables deeper levels of zoom than traditional super-resolution methods that may struggle to create new contextual structure at vastly different scales. We compare our method qualitatively with alternative techniques in image super-resolution and outpainting, and show that our method is most effective at generating consistent multi-scale content.

Submitted 21 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Project page: https://powers-of-10.github.io/

arXiv:2311.03128 [pdf]

Benchmarking Differential Evolution on a Quantum Simulator

Authors: Parthasarathy Srinivasan

Abstract: The use of Evolutionary Algorithms (EA) for solving Mathematical/Computational Optimization Problems is inspired by the biological processes of Evolution. A few of the primitives involved in the Evolutionary process/paradigm are selection of 'Fit' individuals (from a population sample) for retention, cloning, mutation, discarding, breeding, crossover etc. In the Evolutionary Algorithm abstraction, the individuals are deemed to be solution candidates to an Optimization problem and additional solution(/sets) are built by applying analogies to the above primitives (cloning, mutation etc.) by means of evaluating a 'Fitness' function/criterion. One such algorithm is Differential Evolution (DE), which can be used to compute the minima of functions such as the Rastrigin function and the Rosenbrock function. This work is an attempt to study the result of applying the DE method on these functions with candidate individuals generated on classical Turing modeled computation and comparing the same with those on state-of-the-art Quantum computation. The study benchmarks the convergence of these functions by varying the parameters initialized and reports timing, convergence, and resource utilization results.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2310.07687 [pdf, other]

Orbital Polarimetric Tomography of a Flare Near the Sagittarius A* Supermassive Black Hole

Authors: Aviad Levis, Andrew A. Chael, Katherine L. Bouman, Maciek Wielgus, Pratul P. Srinivasan

Abstract: The interaction between the supermassive black hole at the center of the Milky Way, Sagittarius A*, and its accretion disk occasionally produces high-energy flares seen in X-ray, infrared, and radio. One proposed mechanism that produces flares is the formation of compact, bright regions that appear within the accretion disk and close to the event horizon. Understanding these flares provides a window into accretion processes. Although sophisticated simulations predict the formation of these flares, their structure has yet to be recovered by observations. Here we show the first three-dimensional (3D) reconstruction of an emission flare recovered from ALMA light curves observed on April 11, 2017. Our recovery shows compact, bright regions at a distance of roughly six times the event horizon. Moreover, it suggests a clockwise rotation in a low-inclination orbital plane, consistent with prior studies by GRAVITY and EHT. To recover this emission structure, we solve an ill-posed tomography problem by integrating a neural 3D representation with a gravitational model for black holes. Although the recovery is subject to, and sometimes sensitive to, the model assumptions, under physically motivated choices, our results are stable, and our approach is successful on simulated data.

Submitted 16 April, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2309.04437 [pdf, other]

Single View Refractive Index Tomography with Neural Fields

Authors: Brandon Zhao, Aviad Levis, Liam Connor, Pratul P. Srinivasan, Katherine L. Bouman

Abstract: Refractive Index Tomography is the inverse problem of reconstructing the continuously-varying 3D refractive index in a scene using 2D projected image measurements. Although a purely refractive field is not directly visible, it bends light rays as they travel through space, thus providing a signal for reconstruction. The effects of such fields appear in many scientific computer vision settings, ranging from refraction due to transparent cells in microscopy to the lensing of distant galaxies caused by dark matter in astrophysics. Reconstructing these fields is particularly difficult due to the complex nonlinear effects of the refractive field on observed images. Furthermore, while standard 3D reconstruction and tomography settings typically have access to observations of the scene from many viewpoints, many refractive index tomography problem settings only have access to images observed from a single viewpoint. We introduce a method that leverages prior knowledge of light sources scattered throughout the refractive medium to help disambiguate the single-view refractive index tomography problem. We differentiably trace curved rays through a neural field representation of the refractive field, and optimize its parameters to best reproduce the observed image. We demonstrate the efficacy of our approach by reconstructing simulated refractive fields, analyze the effects of light source distribution on the recovered field, and test our method on a simulated dark matter mapping problem where we successfully recover the 3D refractive field caused by a realistic dark matter distribution.

Submitted 1 December, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2305.16321 [pdf, other]

Eclipse: Disambiguating Illumination and Materials using Unintended Shadows

Authors: Dor Verbin, Ben Mildenhall, Peter Hedman, Jonathan T. Barron, Todd Zickler, Pratul P. Srinivasan

Abstract: Decomposing an object's appearance into representations of its materials and the surrounding illumination is difficult, even when the object's 3D shape is known beforehand. This problem is especially challenging for diffuse objects: it is ill-conditioned because diffuse materials severely blur incoming light, and it is ill-posed because diffuse materials under high-frequency lighting can be indistinguishable from shiny materials under low-frequency lighting. We show that it is possible to recover precise materials and illumination -- even from diffuse objects -- by exploiting unintended shadows, like the ones cast onto an object by the photographer who moves around it. These shadows are a nuisance in most previous inverse rendering pipelines, but here we exploit them as signals that improve conditioning and help resolve material-lighting ambiguities. We present a method based on differentiable Monte Carlo ray tracing that uses images of an object to jointly recover its spatially-varying materials, the surrounding illumination environment, and the shapes of the unseen light occluders who inadvertently cast shadows upon it.

Submitted 13 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Project page: https://dorverbin.github.io/eclipse/

arXiv:2304.06706 [pdf, other]

Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields

Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman

Abstract: Neural Radiance Field training can be accelerated through the use of grid-based representations in NeRF's learned mapping from spatial coordinates to colors and volumetric density. However, these grid-based approaches lack an explicit understanding of scale and therefore often introduce aliasing, usually in the form of jaggies or missing scene content. Anti-aliasing has previously been addressed by mip-NeRF 360, which reasons about sub-volumes along a cone rather than points along a ray, but this approach is not natively compatible with current grid-based techniques. We show how ideas from rendering and signal processing can be used to construct a technique that combines mip-NeRF 360 and grid-based models such as Instant NGP to yield error rates that are 8% - 77% lower than either prior technique, and that trains 24x faster than mip-NeRF 360.

Submitted 26 October, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Project page: https://jonbarron.info/zipnerf/

arXiv:2303.14231 [pdf, other]

Dagger linear logic and categorical quantum mechanics

Authors: Priyaa Varshinee Srinivasan

Abstract: This thesis develops the categorical proof theory for the non-compact multiplicative dagger linear logic, and investigates its applications to Categorical Quantum Mechanics (CQM). The existing frameworks of CQM are categorical proof theories of compact dagger linear logic, and are motivated by the interpretation of quantum systems in the category of finite dimensional Hilbert spaces. This thesis describes a new non-compact framework called Mixed Unitary Categories which can accommodate infinite dimensional systems, and develops models for the framework. To this end, it builds on linearly distributive categories, and $*$-autonomous categories which are categorical proof theories of (non-compact) multiplicative linear logic. The proof theory of non-compact dagger-linear logic is obtained from the basic setting of an LDC by adding a dagger functor satisfying appropriate coherences to give a dagger-LDC. From every (isomix) dagger-LDC one can extract a canonical "unitary core" which up to equivalence is the traditional CQM framework of dagger-monoidal categories. This leads to the framework of Mixed Unitary Categories (MUCs): every MUC contains a (compact) unitary core which is extended by a (non-compact) isomix dagger-LDC. Various models of MUCs based on Finiteness Spaces, Chu spaces, Hopf modules, etc., are developed in this thesis. This thesis also generalizes the key algebraic structures of CQM, such as observables, measurement, and complementarity, to MUC framework. Furthermore, using the MUC framework, this thesis establishes a connection between the complementary observables of quantum mechanics and the exponential modalities of linear logic.

Submitted 24 March, 2023; originally announced March 2023.

Comments: Thesis submitted for the degree of Doctor of Philosophy, University of Calgary, Fall 2021 (250 pages)

arXiv:2303.11839 [pdf, other]

doi 10.4204/EPTCS.397.5

Normalizing Resistor Networks

Authors: Robin Cockett, Amolak Ratan Kalra, Priyaa Varshinee Srinivasan

Abstract: Star to mesh transformations are well-known in electrical engineering, and are reminiscent of local complementation for graph states in qudit stabilizer quantum mechanics. This paper describes a rewriting system for resistor circuits over any positive division rig using general star to mesh transformations. We show how these transformations can be organized into a confluent and terminating rewriting system on the category of resistor circuits. Furthermore, based on the recently established connections between quantum and electrical circuits, this paper pushes forward the quest for approachable normal forms for stabilizer quantum circuits.

Submitted 14 December, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

Comments: In Proceedings ACT 2023, arXiv:2312.08138

Journal ref: EPTCS 397, 2023, pp. 70-83

arXiv:2302.14859 [pdf, other]

BakedSDF: Meshing Neural SDFs for Real-Time View Synthesis

Authors: Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard Szeliski, Jonathan T. Barron, Ben Mildenhall

Abstract: We present a method for reconstructing high-quality meshes of large unbounded real-world scenes suitable for photorealistic novel view synthesis. We first optimize a hybrid neural volume-surface scene representation designed to have well-behaved level sets that correspond to surfaces in the scene. We then bake this representation into a high-quality triangle mesh, which we equip with a simple and fast view-dependent appearance model based on spherical Gaussians. Finally, we optimize this baked representation to best reproduce the captured viewpoints, resulting in a model that can leverage accelerated polygon rasterization pipelines for real-time view synthesis on commodity hardware. Our approach outperforms previous scene representations for real-time rendering in terms of accuracy, speed, and power consumption, and produces high quality meshes that enable applications such as appearance editing and physical simulation.

Submitted 16 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: Video and interactive web demo available at https://bakedsdf.github.io/

arXiv:2302.12249 [pdf, other]

MERF: Memory-Efficient Radiance Fields for Real-time View Synthesis in Unbounded Scenes

Authors: Christian Reiser, Richard Szeliski, Dor Verbin, Pratul P. Srinivasan, Ben Mildenhall, Andreas Geiger, Jonathan T. Barron, Peter Hedman

Abstract: Neural radiance fields enable state-of-the-art photorealistic view synthesis. However, existing radiance field representations are either too compute-intensive for real-time rendering or require too much memory to scale to large scenes. We present a Memory-Efficient Radiance Field (MERF) representation that achieves real-time rendering of large-scale scenes in a browser. MERF reduces the memory consumption of prior sparse volumetric radiance fields using a combination of a sparse feature grid and high-resolution 2D feature planes. To support large-scale unbounded scenes, we introduce a novel contraction function that maps scene coordinates into a bounded volume while still allowing for efficient ray-box intersection. We design a lossless procedure for baking the parameterization used during training into a model that achieves real-time rendering while still preserving the photorealistic view synthesis quality of a volumetric radiance field.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Video and interactive web demo available at https://merf42.github.io

arXiv:2302.08504 [pdf, other]

PersonNeRF: Personalized Reconstruction from Photo Collections

Authors: Chung-Yi Weng, Pratul P. Srinivasan, Brian Curless, Ira Kemelmacher-Shlizerman

Abstract: We present PersonNeRF, a method that takes a collection of photos of a subject (e.g. Roger Federer) captured across multiple years with arbitrary body poses and appearances, and enables rendering the subject with arbitrary novel combinations of viewpoint, body pose, and appearance. PersonNeRF builds a customized neural volumetric 3D model of the subject that is able to render an entire space spanned by camera viewpoint, body pose, and appearance. A central challenge in this task is dealing with sparse observations; a given body pose is likely only observed by a single viewpoint with a single appearance, and a given appearance is only observed under a handful of different body poses. We address this issue by recovering a canonical T-pose neural volumetric representation of the subject that allows for changing appearance across different observations, but uses a shared pose-dependent motion field across all observations. We demonstrate that this approach, along with regularization of the recovered volumetric geometry to encourage smoothness, is able to recover a model that renders compelling images from novel combinations of viewpoint, pose, and appearance from these challenging unstructured photo collections, outperforming prior work for free-viewpoint human rendering.

Submitted 16 February, 2023; originally announced February 2023.

Comments: Project Page: https://grail.cs.washington.edu/projects/personnerf/

arXiv:2302.06833 [pdf, other]

VQ3D: Learning a 3D-Aware Generative Model on ImageNet

Authors: Kyle Sargent, Jing Yu Koh, Han Zhang, Huiwen Chang, Charles Herrmann, Pratul Srinivasan, Jiajun Wu, Deqing Sun

Abstract: Recent work has shown the possibility of training generative models of 3D content from 2D image collections on small datasets corresponding to a single object class, such as human faces, animal faces, or cars. However, these models struggle on larger, more complex datasets. To model diverse and unconstrained image collections such as ImageNet, we present VQ3D, which introduces a NeRF-based decoder into a two-stage vector-quantized autoencoder. Our Stage 1 allows for the reconstruction of an input image and the ability to change the camera position around the image, and our Stage 2 allows for the generation of new 3D scenes. VQ3D is capable of generating and reconstructing 3D-aware images from the 1000-class ImageNet dataset of 1.2 million training images. We achieve an ImageNet generation FID score of 16.8, compared to 69.8 for the next best baseline method.

Submitted 14 February, 2023; originally announced February 2023.

Comments: 15 pages. For visual results, please visit the project webpage at http://kylesargent.github.io/vq3d

arXiv:2301.02222 [pdf, other]

Computing nonsurjective primes associated to Galois representations of genus $2$ curves

Authors: Barinder S. Banwait, Armand Brumer, Hyun Jong Kim, Zev Klagsbrun, Jacob Mayle, Padmavathi Srinivasan, Isabel Vogt

Abstract: For a genus $2$ curve $C$ over $\mathbb{Q}$ whose Jacobian $A$ admits only trivial geometric endomorphisms, Serre's open image theorem for abelian surfaces asserts that there are only finitely many primes $\ell$ for which the Galois action on $\ell$-torsion points of $A$ is not maximal. Building on work of Dieulefait, we give a practical algorithm to compute this finite set. The key inputs are Mitchell's classification of maximal subgroups of $\mathrm{PSp_4}(\mathbb{F}_\ell)$, sampling of the characteristic polynomials of Frobenius, and the Khare--Wintenberger modularity theorem. The algorithm has been submitted for integration into Sage, executed on all of the genus~$2$ curves with trivial endomorphism ring in the LMFDB, and the results incorporated into the homepage of each such curve.

Submitted 10 July, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

MSC Class: 11F80(primary); 11G10; 11Y16

arXiv:2212.10385 [pdf, other]

doi 10.1038/s41524-022-00956-8

High-accuracy thermodynamic properties to the melting point from ab initio calculations aided by machine-learning potentials

Authors: Jong Hyun Jung, Prashanth Srinivasan, Axel Forslund, Blazej Grabowski

Abstract: Accurate prediction of thermodynamic properties requires an extremely accurate representation of the free energy surface. Requirements are twofold -- first, the inclusion of the relevant finite-temperature mechanisms, and second, a dense volume-temperature grid on which the calculations are performed. A systematic workflow for such calculations requires computational efficiency and reliability, and has not been available within an ab initio framework so far. Here, we elucidate such a framework involving direct upsampling, thermodynamic integration and machine-learning potentials, allowing us to incorporate, in particular, the full effect of anharmonic vibrations. The improved methodology has a five-times speed-up compared to state-of-the-art methods. We calculate equilibrium thermodynamic properties up to the melting point for bcc Nb, magnetic fcc Ni, fcc Al and hcp Mg, and find remarkable agreement with experimental data. Strong impact of anharmonicity is observed specifically for Nb. The introduced procedure paves the way for the development of ab initio thermodynamic databases.

Submitted 26 December, 2022; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: 6 figures, supplementary information, typos corrected

Journal ref: npj Comput. Mater. 9, 3 (2023)

arXiv:2210.03035 [pdf, ps, other]

Quadratic enrichment of the logarithmic derivative of the zeta function

Authors: Margaret Bilu, Wei Ho, Padmavathi Srinivasan, Isabel Vogt, Kirsten Wickelgren

Abstract: We define an enrichment of the logarithmic derivative of the zeta function of a variety over a finite field to a power series with coefficients in the Grothendieck--Witt group. We show that this enrichment is related to the topology of the real points of a lift. For cellular schemes over a field, we prove a rationality result for this enriched logarithmic derivative of the zeta function as an analogue of part of the Weil conjectures. We also compute several examples, including toric varieties, and show that the enrichment is a motivic measure.

Submitted 30 June, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

Comments: 41 pages, to appear in Transactions of the AMS

MSC Class: Primary 14G10; 14F42; Secondary 19D45; 55P25; 11G25

arXiv:2209.13114 [pdf, other]

Style Matters! Investigating Linguistic Style in Online Communities

Authors: Osama Khalid, Padmini Srinivasan

Abstract: Content has historically been the primary lens used to study language in online communities. This paper instead focuses on the linguistic style of communities. While we know that individuals have distinguishable styles, here we ask whether communities have distinguishable styles. Additionally, while prior work has relied on a narrow definition of style, we employ a broad definition involving 262 features to analyze the linguistic style of 9 online communities from 3 social media platforms discussing politics, television and travel. We find that communities indeed have distinct styles. Also, style is an excellent predictor of group membership (F-score 0.952 and Accuracy 96.09%). While on average it is statistically equivalent to predictions using content alone, it is more resilient to reductions in training data.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2209.12352 [pdf, other]

Smells like Teen Spirit: An Exploration of Sensorial Style in Literary Genres

Authors: Osama Khalid, Padmini Srinivasan

Abstract: It is well recognized that sensory perceptions and language have interconnections through numerous studies in psychology, neuroscience, and sensorial linguistics. Set in this rich context we ask whether the use of sensorial language in writings is part of linguistic style. This question is important from the view of stylometrics research where a rich set of language features have been explored, but with insufficient attention given to features related to sensorial language. Taking this as the goal we explore several angles about sensorial language and style in collections of lyrics, novels, and poetry. We find, for example, that individual use of sensorial language is not a random phenomenon; choice is likely involved. Also, sensorial style is generally stable over time - the shifts are extremely small. Moreover, style can be extracted from just a few hundred sentences that have sensorial terms. We also identify representative and distinctive features within each genre. For example, we observe that 4 of the top 6 representative features in novels collection involved individuals using olfactory language where we expected them to use non-olfactory language.

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2206.09784 [pdf, other]

doi 10.4204/EPTCS.380.12

Extending Resource Monotones using Kan Extensions

Authors: Robin Cockett, Isabelle Jianing Geng, Carlo Maria Scandolo, Priyaa Varshinee Srinivasan

Abstract: In this paper we generalize the framework proposed by Gour and Tomamichel regarding extensions of monotones for resource theories. A monotone for a resource theory assigns a real number to each resource in the theory signifying the utility or the value of the resource. Gour and Tomamichel studied the problem of extending monotones using set-theoretical framework when a resource theory embeds fully and faithfully into the larger theory. One can generalize the problem of computing monotone extensions to scenarios when there exists a functorial transformation of one resource theory to another instead of just a full and faithful inclusion. In this article, we show that (point-wise) Kan extensions provide a precise categorical framework to describe and compute such extensions of monotones. To set up monotone extensions using Kan extensions, we introduce partitioned categories (pCat) as a framework for resource theories and pCat functors to formalize relationship between resource theories. We describe monotones as pCat functors into the preorder of non-negative real numbers, and describe extending monotones along any pCat functor using Kan extensions. We show how our framework works by applying it to extend entanglement monotones for bipartite pure states to bipartite mixed states, to extend classical divergences to the quantum setting, and to extend a non-uniformity monotone from classical probabilistic theory to quantum theory.

Submitted 31 July, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: In Proceedings ACT 2022, arXiv:2307.15519

Journal ref: EPTCS 380, 2023, pp. 203-223

arXiv:2205.11164 [pdf, other]

Time-series Transformer Generative Adversarial Networks

Authors: Padmanaba Srinivasan, William J. Knottenbelt

Abstract: Many real-world tasks are plagued by limitations on data: in some instances very little data is available and in others, data is protected by privacy enforcing regulations (e.g. GDPR). We consider limitations posed specifically on time-series data and present a model that can generate synthetic time-series which can be used in place of real data. A model that generates synthetic time-series data has two objectives: 1) to capture the stepwise conditional distribution of real sequences, and 2) to faithfully model the joint distribution of entire real sequences. Autoregressive models trained via maximum likelihood estimation can be used in a system where previous predictions are fed back in and used to predict future ones; in such models, errors can accrue over time. Furthermore, a plausible initial value is required making MLE based models not really generative. Many downstream tasks learn to model conditional distributions of the time-series, hence, synthetic data drawn from a generative model must satisfy 1) in addition to performing 2). We present TsT-GAN, a framework that capitalises on the Transformer architecture to satisfy the desiderata and compare its performance against five state-of-the-art models on five datasets and show that TsT-GAN achieves higher predictive performance on all datasets.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.01714 [pdf, other]

Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks

Authors: Jonathan Rusert, Padmini Srinivasan

Abstract: Deep learning (DL) is being used extensively for text classification. However, researchers have demonstrated the vulnerability of such classifiers to adversarial attacks. Attackers modify the text in a way which misleads the classifier while keeping the original meaning close to intact. State-of-the-art (SOTA) attack algorithms follow the general principle of making minimal changes to the text so as to not jeopardize semantics. Taking advantage of this we propose a novel and intuitive defense strategy called Sample Shielding. It is attacker and classifier agnostic, does not require any reconfiguration of the classifier or external resources and is simple to implement. Essentially, we sample subsets of the input text, classify them and summarize these into a final decision. We shield three popular DL text classifiers with Sample Shielding, test their resilience against four SOTA attackers across three datasets in a realistic threat setting. Even when given the advantage of knowing about our shielding strategy the adversary's attack success rate is <=10% with only one exception and often < 5%. Additionally, Sample Shielding maintains near original accuracy when applied to original texts. Crucially, we show that the `make minimal changes' approach of SOTA attackers leads to critical vulnerabilities that can be defended against with an intuitive sampling strategy.

Submitted 3 May, 2022; originally announced May 2022.

Comments: 9 pages, 8 figures, Accepted to NAACL 2022

ACM Class: I.2.7

arXiv:2204.03715 [pdf, other]

Gravitationally Lensed Black Hole Emission Tomography

Authors: Aviad Levis, Pratul P. Srinivasan, Andrew A. Chael, Ren Ng, Katherine L. Bouman

Abstract: Measurements from the Event Horizon Telescope enabled the visualization of light emission around a black hole for the first time. So far, these measurements have been used to recover a 2D image under the assumption that the emission field is static over the period of acquisition. In this work, we propose BH-NeRF, a novel tomography approach that leverages gravitational lensing to recover the continuous 3D emission field near a black hole. Compared to other 3D reconstruction or tomography settings, this task poses two significant challenges: first, rays near black holes follow curved paths dictated by general relativity, and second, we only observe measurements from a single viewpoint. Our method captures the unknown emission field using a continuous volumetric function parameterized by a coordinate-based neural network, and uses knowledge of Keplerian orbital dynamics to establish correspondence between 3D points over time. Together, these enable BH-NeRF to recover accurate 3D emission fields, even in challenging situations with sparse measurements and uncertain orbital dynamics. This work takes the first steps in showing how future measurements from the Event Horizon Telescope could be used to recover evolving 3D emission around the supermassive black hole in our Galactic center.

Submitted 7 April, 2022; originally announced April 2022.

Comments: To appear in the IEEE Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Supplemental material including accompanying pdf, code, and video highlight can be found in the project page: http://imaging.cms.caltech.edu/bhnerf/

arXiv:2203.11849 [pdf, other]

A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation

Authors: Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan

Abstract: Recent advances in natural language processing have enabled powerful privacy-invasive authorship attribution. To counter authorship attribution, researchers have proposed a variety of rule-based and learning-based text obfuscation approaches. However, existing authorship obfuscation approaches do not consider the adversarial threat model. Specifically, they are not evaluated against adversarially trained authorship attributors that are aware of potential obfuscation. To fill this gap, we investigate the problem of adversarial authorship attribution for deobfuscation. We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate the effectiveness of adversarial training when the attributor makes incorrect assumptions about whether and which obfuscator was used. While there is a clear degradation in attribution accuracy, it is noteworthy that this degradation is still at or above the attribution accuracy of the attributor that is not adversarially trained at all. Our results underline the need for stronger obfuscation approaches that are resistant to deobfuscation.

Submitted 22 March, 2022; originally announced March 2022.

Comments: 9 pages, 7 figures, 3 tables, ACL 2022

arXiv:2203.11401 [pdf, other]

Suum Cuique: Studying Bias in Taboo Detection with a Community Perspective

Authors: Osama Khalid, Jonathan Rusert, Padmini Srinivasan

Abstract: Prior research has discussed and illustrated the need to consider linguistic norms at the community level when studying taboo (hateful/offensive/toxic etc.) language. However, a methodology for doing so that is firmly founded on community language norms is still largely absent. This can lead both to biases in taboo text classification and limitations in our understanding of the causes of bias. We propose a method to study bias in taboo classification and annotation where a community perspective is front and center. This is accomplished by using special classifiers tuned for each community's language. In essence, these classifiers represent community-level language norms. We use these to study bias and find, for example, biases are largest against African Americans (7/10 datasets and all 3 classifiers examined). In contrast to previous papers we also study other communities and find, for example, strong biases against South Asians. In a small-scale user study we illustrate our key idea, which is that common utterances, i.e., those with high alignment scores with a community (community classifier confidence scores), are unlikely to be regarded as taboo. Annotators who are community members contradict taboo classification decisions and annotations in a majority of instances. This paper is a significant step toward reducing false positive taboo decisions that over time harm minority communities.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 9 pages, 3 figures, Accepted to the Findings of ACL 2022

ACM Class: I.2.7

arXiv:2203.11331 [pdf, other]

On The Robustness of Offensive Language Classifiers

Authors: Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan

Abstract: Social media platforms are deploying machine learning based offensive language classification systems to combat hateful, racist, and other forms of offensive speech at scale. However, despite their real-world deployment, we do not yet comprehensively understand the extent to which offensive language classifiers are robust against adversarial attacks. Prior work in this space is limited to studying robustness of offensive language classifiers against primitive attacks such as misspellings and extraneous spaces. To address this gap, we systematically analyze the robustness of state-of-the-art offensive language classifiers against more crafty adversarial attacks that leverage greedy- and attention-based word selection and context-aware embeddings for word replacement. Our results on multiple datasets show that these crafty adversarial attacks can degrade the accuracy of offensive language classifiers by more than 50% while also being able to preserve the readability and meaning of the modified text.

Submitted 21 March, 2022; originally announced March 2022.

Comments: 9 pages, 2 figures, Accepted at ACL 2022

ACM Class: I.2.7

arXiv:2203.07693 [pdf, ps, other]

doi 10.1007/s12036-022-09842-7

An Automated Pipeline for Ultra-Violet Imaging Telescope (UVIT)

Authors: S. K. Ghosh, S. N. Tandon, S. K. Singh, D. S. Shelat, P. Tahlani, A. K. Singh, T. P. Srinivasan, P. Joseph, A. Devaraj, K. George, R. Mohan, J. Postma, C. S. Stalin

Abstract: We describe a versatile pipeline for processing the data collected by the Ultra-Violet Imaging Telescope (UVIT) on board the Indian multi-wavelength astronomical satellite AstroSat. The UVIT instrument carries out simultaneous astronomical imaging through selected filters / gratings in Far-Ultra-Violet (FUV), Near-Ultra-Violet & visible (VIS) bands of the targeted circular sky field (~ 0.5 deg dia). This pipeline converts the data (Level-1) emanating from UVIT in their raw primitive format, supplemented by inputs from the spacecraft sub-systems, into UV sky images (& slitless grating spectra) and associated products readily usable by astronomers (Level-2). The primary products include maps of Intensity (rate of photon arrival), error on Intensity and effective Exposure. The pipeline is open source, extensively user configurable with many selectable parameters, and its execution is fully automated. The key ingredients of the pipeline include: extraction of drift in pointing of the spacecraft, and disturbances in pointing due to internal movements; application of various corrections to measured position in the detector for each photon, e.g. differential pointing with respect to a reference frame for shift and add operation, systematic effects and artifacts in the optics of the telescopes and detectors; exposure tracking on the sky; alignment of sky products from multi-episode exposures to generate a consolidated set; and astrometry. Detailed logs of operations and intermediate products for every processing stage are accessible via user selectable options.
While a large number of selectable parameters are available for the user, a well characterized standard default set is used for executing this pipeline at the Payload Operation Centre (POC) for UVIT, and selected products are archived and disseminated by the Indian Space Research Organization (ISRO) through its ISSDC portal.

Submitted 15 March, 2022; originally announced March 2022.

Comments: Accepted for publication in Journal of Astrophysics & Astronomy, 50 pages, 16 figures

arXiv:2202.08805 [pdf, ps, other]

doi 10.1145/3555551

Making a Radical Misogynist: How online social engagement with the Manosphere influences traits of radicalization

Authors: Hussam Habib, Padmini Srinivasan, Rishab Nithyanand

Abstract: The algorithms and the interactions facilitated by online platforms have been used by radical groups to recruit vulnerable individuals to their cause. This has resulted in the sharp growth of violent events and deteriorating online discourse. The Manosphere, a collection of radical anti-feminist communities, is one such group which has attracted attention due to their rapid growth and increasingly violent real world outbursts. In this paper, we examine the social engagements between Reddit users who have participated in feminist discourse and the Manosphere communities on Reddit to understand the process of development of traits associated with the adoption of extremist ideologies. By using existing research on the psychology of radicalization we track how specific types of social engagement with the Manosphere influence the development of traits associated with radicalization. Our findings show that: (1) participation, even by the simple act of joining the Manosphere, has a significant influence on the language and outlook traits of a user, (2) Manosphere elites are extremely effective propagators of radical traits and cause their increase even outside the Manosphere, and (3) community perception can heavily influence a user's behavior. Finally, we examine how our findings can help draft community and platform moderation policies to help mitigate the problem of online radicalization.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2202.06212 [pdf, other]

Uni-Retriever: Towards Learning The Unified Embedding Based Retriever in Bing Sponsored Search

Authors: Jianjin Zhang, Zheng Liu, Weihao Han, Shitao Xiao, Ruicheng Zheng, Yingxia Shao, Hao Sun, Hanqing Zhu, Premkumar Srinivasan, Denvy Deng, Qi Zhang, Xing Xie

Abstract: Embedding based retrieval (EBR) is a fundamental building block in many web applications. However, EBR in sponsored search is distinguished from other generic scenarios and technically challenging due to the need of serving multiple retrieval purposes: firstly, it has to retrieve high-relevance ads, which may exactly serve user's search intent; secondly, it needs to retrieve high-CTR ads so as to maximize the overall user clicks. In this paper, we present a novel representation learning framework Uni-Retriever developed for Bing Search, which unifies two different training modes knowledge distillation and contrastive learning to realize both required objectives. On one hand, the capability of making high-relevance retrieval is established by distilling knowledge from the "relevance teacher model". On the other hand, the capability of making high-CTR retrieval is optimized by learning to discriminate user's clicked ads from the entire corpus. The two training modes are jointly performed as a multi-objective learning process, such that the ads of high relevance and CTR can be favored by the generated embeddings. Besides the learning strategy, we also elaborate our solution for EBR serving pipeline built upon the substantially optimized DiskANN, where massive-scale EBR can be performed with competitive time and memory efficiency, and accomplished in high-quality. We make comprehensive offline and online experiments to evaluate the proposed techniques, whose findings may provide useful insights for the future development of EBR systems.
Uni-Retriever has been mainstreamed as the major retrieval path in Bing's production thanks to the notable improvements on the representation and EBR serving quality.

Submitted 13 February, 2022; originally announced February 2022.

arXiv:2202.05263 [pdf, other]

Block-NeRF: Scalable Large Scene Neural View Synthesis

Authors: Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, Henrik Kretzschmar

Abstract: We present Block-NeRF, a variant of Neural Radiance Fields that can represent large-scale environments. Specifically, we demonstrate that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is vital to decompose the scene into individually trained NeRFs. This decomposition decouples rendering time from scene size, enables rendering to scale to arbitrarily large environments, and allows per-block updates of the environment. We adopt several architectural changes to make NeRF robust to data captured over months under different environmental conditions. We add appearance embeddings, learned pose refinement, and controllable exposure to each individual NeRF, and introduce a procedure for aligning appearance between adjacent NeRFs so that they can be seamlessly combined. We build a grid of Block-NeRFs from 2.8 million images to create the largest neural scene representation to date, capable of rendering an entire neighborhood of San Francisco.

Submitted 10 February, 2022; originally announced February 2022.

Comments: Project page: https://waymo.com/research/block-nerf/

arXiv:2201.08239 [pdf, other]

LaMDA: Language Models for Dialog Applications

Authors: Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, YaGuang Li, Hongrae Lee, Huaixiu Steven Zheng, Amin Ghafouri, Marcelo Menegali, Yanping Huang, Maxim Krikun, Dmitry Lepikhin, James Qin, Dehao Chen, Yuanzhong Xu, Zhifeng Chen, Adam Roberts, Maarten Bosma, Vincent Zhao, et al. (35 additional authors not shown)

Abstract: We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it shows less improvements on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.

Submitted 10 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

arXiv:2201.04127 [pdf, other]

HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video

Authors: Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, Ira Kemelmacher-Shlizerman

Abstract: We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.

Submitted 14 June, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

Comments: CVPR 2022 (oral). Project page with videos: https://grail.cs.washington.edu/projects/humannerf/

arXiv:2112.03907 [pdf, other]

Ref-NeRF: Structured View-Dependent Appearance for Neural Radiance Fields

Authors: Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, Pratul P. Srinivasan

Abstract: Neural Radiance Fields (NeRF) is a popular view synthesis technique that represents a scene as a continuous volumetric function, parameterized by multilayer perceptrons that provide the volume density and view-dependent emitted radiance at each location. While NeRF-based techniques excel at representing fine geometric structures with smoothly varying view-dependent appearance, they often fail to accurately capture and reproduce the appearance of glossy surfaces. We address this limitation by introducing Ref-NeRF, which replaces NeRF's parameterization of view-dependent outgoing radiance with a representation of reflected radiance and structures this function using a collection of spatially-varying scene properties. We show that together with a regularizer on normal vectors, our model significantly improves the realism and accuracy of specular reflections. Furthermore, we show that our model's internal representation of outgoing radiance is interpretable and useful for scene editing.

Submitted 7 December, 2021; originally announced December 2021.

Comments: Project page: https://dorverbin.github.io/refnerf/

arXiv:2112.03873 [pdf, ps, other]

p-adic adelic metrics and Quadratic Chabauty I

Authors: Amnon Besser, J. Steffen Müller, Padmavathi Srinivasan

Abstract: We give a new construction of $p$-adic heights on varieties over number fields using $p$-adic Arakelov theory. In analogy with Zhang's construction of real-valued heights in terms of adelic metrics, these heights are given in terms of $p$-adic adelic metrics on line bundles. In particular, we describe a construction of canonical $p$-adic heights on abelian varieties and we show that we recover the canonical Mazur--Tate height and, for Jacobians, the height constructed by Coleman and Gross. Our main application is a new and simplified approach to the Quadratic Chabauty method for the computation of rational points on certain curves over the rationals, by pulling back the canonical height on the Jacobian with respect to a carefully chosen line bundle. We show that our construction allows us to reprove, without using $p$-adic Hodge theory or arithmetic fundamental groups, several results due to Balakrishnan and Dogra. Our method also extends to primes $p$ of bad reduction. One consequence of our work is that for any canonical height ($p$-adic or $\mathbb{R}$-valued) on an abelian variety (and hence on pull-backs to other varieties), the local contribution at a finite prime $q$ can be constructed using $q$-analytic methods.

Submitted 23 November, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: Updates include a p-adic Arakelov-theory proof that the method extends when $r>g$ and $r<g+\mathrm{rank}(\mathrm{NS}(J))-1$ (Sections 3.4 and 7.2), comparison to Colmez's work (Sections 4.5 and 9.3), an updated comparison to Balakrishnan and Dogra's work (Section 8), and a new unified q-analytic construction of local heights (p-adic and real) on abelian varieties at a finite prime q (Section 9.2)

arXiv:2112.03288 [pdf, other]

Dense Depth Priors for Neural Radiance Fields from Sparse Input Views

Authors: Barbara Roessle, Jonathan T. Barron, Ben Mildenhall, Pratul P. Srinivasan, Matthias Nießner

Abstract: Neural radiance fields (NeRF) encode a scene into a neural representation that enables photo-realistic rendering of novel views. However, a successful reconstruction from RGB images requires a large number of input views taken under static conditions - typically up to a few hundred images for room-size scenes. Our method aims to synthesize novel views of whole rooms from an order of magnitude fewer images. To this end, we leverage dense depth priors in order to constrain the NeRF optimization. First, we take advantage of the sparse depth data that is freely available from the structure from motion (SfM) preprocessing step used to estimate camera poses. Second, we use depth completion to convert these sparse points into dense depth maps and uncertainty estimates, which are used to guide NeRF optimization. Our method enables data-efficient novel view synthesis on challenging indoor scenes, using as few as 18 images for an entire scene.

Submitted 7 April, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

Comments: CVPR 2022, project page: https://barbararoessle.github.io/dense_depth_priors_nerf/ , video: https://youtu.be/zzkvvdcvksc

arXiv:2111.14643 [pdf, other]

Urban Radiance Fields

Authors: Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Thomas Funkhouser, Vittorio Ferrari

Abstract: The goal of this work is to perform 3D reconstruction and novel view synthesis from data captured by scanning platforms commonly deployed for world mapping in urban outdoor environments (e.g., Street View). Given a sequence of posed RGB images and lidar sweeps acquired by cameras and scanners moving through an outdoor scene, we produce a model from which 3D surfaces can be extracted and novel RGB images can be synthesized. Our approach extends Neural Radiance Fields, which has been demonstrated to synthesize realistic novel images for small scenes in controlled settings, with new methods for leveraging asynchronously captured lidar data, for addressing exposure variation between captured images, and for leveraging predicted image segmentations to supervise densities on rays pointing at the sky. Each of these three extensions provides significant performance improvements in experiments on Street View data. Our system produces state-of-the-art 3D surface reconstructions and synthesizes higher quality novel views in comparison to both traditional methods (e.g., COLMAP) and recent neural representations (e.g., Mip-NeRF).

Submitted 29 November, 2021; originally announced November 2021.

Comments: Project: https://urban-radiance-fields.github.io/

arXiv:2111.13679 [pdf, other]

NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images

Authors: Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron

Abstract: Neural Radiance Fields (NeRF) is a technique for high quality novel view synthesis from a collection of posed input images. Like most view synthesis methods, NeRF uses tonemapped low dynamic range (LDR) as input; these images have been processed by a lossy camera pipeline that smooths detail, clips highlights, and distorts the simple noise distribution of raw sensor data. We modify NeRF to instead train directly on linear raw images, preserving the scene's full dynamic range. By rendering raw output images from the resulting NeRF, we can perform novel high dynamic range (HDR) view synthesis tasks. In addition to changing the camera viewpoint, we can manipulate focus, exposure, and tonemapping after the fact. Although a single raw image appears significantly more noisy than a postprocessed one, we show that NeRF is highly robust to the zero-mean distribution of raw noise. When optimized over many noisy raw inputs (25-200), NeRF produces a scene representation so accurate that its rendered novel views outperform dedicated single and multi-image deep raw denoisers run on the same wide baseline input images. As a result, our method, which we call RawNeRF, can reconstruct scenes from extremely noisy images captured in near-darkness.

Submitted 26 November, 2021; originally announced November 2021.

Comments: Project page: https://bmild.github.io/rawnerf/

arXiv:2111.12077 [pdf, other]

Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

Authors: Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, Peter Hedman

Abstract: Though neural radiance fields (NeRF) have demonstrated impressive view synthesis results on objects and small bounded regions of space, they struggle on "unbounded" scenes, where the camera may point in any direction and content may exist at any distance. In this setting, existing NeRF-like models often produce blurry or low-resolution renderings (due to the unbalanced detail and scale of nearby and distant objects), are slow to train, and may exhibit artifacts due to the inherent ambiguity of the task of reconstructing a large scene from a small set of images. We present an extension of mip-NeRF (a NeRF variant that addresses sampling and aliasing) that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes. Our model, which we dub "mip-NeRF 360" as we target scenes in which the camera rotates 360 degrees around a point, reduces mean-squared error by 57% compared to mip-NeRF, and is able to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes.

Submitted 25 March, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: https://jonbarron.info/mipnerf360/

Showing 1–50 of 99 results for author: Srinivasan, P