Search | arXiv e-print repository

doi 10.1145/3678884.3681833

Envisioning New Futures of Positive Social Technology: Beyond Paradigms of Fixing, Protecting, and Preventing

Authors: JaeWon Kim, Lindsay Popowski, Anna Fang, Cassidy Pyle, Guo Freeman, Ryan M. Kelly, Angela Y. Lee, Fannie Liu, Angela D. R. Smith, Alexandra To, Amy X. Zhang

Abstract: Social technology research today largely focuses on mitigating the negative impacts of technology and, therefore, often misses the potential of technology to enhance human connections and well-being. However, we see a potential to shift towards a holistic view of social technology's impact on human flourishing. We introduce Positive Social Technology (Positech), a framework that shifts emphasis toward leveraging social technologies to support and augment human flourishing. This workshop is organized around three themes relevant to Positech: 1) "Exploring Relevant and Adjacent Research" to define and widen the Positech scope with insights from related fields, 2) "Projecting the Landscape of Positech" for participants to outline the domain's key aspects and 3) "Envisioning the Future of Positech," anchored around strategic planning towards a sustainable research community. Ultimately, this workshop will serve as a platform to shift the narrative of social technology research towards a more positive, human-centric approach. It will foster research that goes beyond fixing technologies to protect humans from harm, to also pursue enriching human experiences and connections through technology.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2406.18019 [pdf, other]

Continuous Execution of High-Level Collaborative Tasks for Heterogeneous Robot Teams

Authors: Amy Fang, Tenny Yin, Jiawei Lin, Hadas Kress-Gazit

Abstract: We propose a control synthesis framework for a heterogeneous multi-robot system to satisfy collaborative tasks, where actions may take varying durations of time to complete. We encode tasks using the discrete logic LTL^ψ, which uses the concept of bindings to interleave robot actions and express information about the relationship between specific task requirements and robot assignments. We present a synthesis approach to automatically generate a teaming assignment and corresponding discrete behavior that is correct-by-construction for continuous execution, while also implementing synchronization policies to ensure collaborative portions of the task are satisfied. We demonstrate our approach on a physical multi-robot system.

Submitted 25 June, 2024; originally announced June 2024.

Comments: Under review in IEEE Transactions on Robotics

arXiv:2406.11794 [pdf, other]

DataComp-LM: In search of the next generation of training sets for language models

Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner, et al. (34 additional authors not shown)

Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline, enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.

Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Project page: https://www.datacomp.ai/dclm/

arXiv:2405.19547 [pdf, other]

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

Authors: Yiping Wang, Yifang Chen, Wendan Yan, Alex Fang, Wenjing Zhou, Kevin Jamieson, Simon Shaolei Du

Abstract: Data selection has emerged as a core issue for large-scale visual-language model pretraining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data than the original OpenAI CLIP model, and (3) designing better metrics or strategies universally applicable to any CLIP embedding without requiring specific model properties (e.g., CLIPScore is one popular metric). While the first two approaches have been extensively studied, the third remains under-explored. In this paper, we advance the third approach by proposing two new methods. Firstly, instead of classical CLIP scores that only consider the alignment between two modalities from a single sample, we introduce negCLIPLoss, a CLIP loss-inspired method that adds the alignment between one sample and its contrastive pairs as an extra normalization term for better quality measurement. Secondly, when downstream tasks are known, we propose a new norm-based metric, NormSim, to measure the similarity between pretraining data and target data. We test our methods on the data selection benchmark, DataComp (Gadre et al., 2023). Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3% improvement on ImageNet-1k and a 2.8% improvement on 38 downstream evaluation tasks. Moreover, both negCLIPLoss and NormSim are compatible with existing techniques. By combining our methods with the current best methods DFN (Fang et al., 2023) and HYPE (Kim et al., 2024), we can boost average performance on downstream tasks by 0.9%, achieving a new state-of-the-art.

Submitted 29 May, 2024; originally announced May 2024.

Comments: This paper supersedes our previous VAS paper (arXiv:2402.02055)

arXiv:2405.11656 [pdf, other]

URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images

Authors: Zoey Chen, Aaron Walsman, Marius Memmel, Kaichun Mo, Alex Fang, Karthikeya Vemuri, Alan Wu, Dieter Fox, Abhishek Gupta

Abstract: Constructing simulation scenes that are both visually and physically realistic is a problem of practical interest in domains ranging from robotics to computer vision. This problem has become even more relevant as researchers wielding large data-hungry learning methods seek new sources of training data for physical decision-making systems. However, building simulation models is often still done by hand. A graphic designer and a simulation engineer work with predefined assets to construct rich scenes with realistic dynamic and kinematic properties. While this may scale to small numbers of scenes, to achieve the generalization properties that are required for data-driven robotic control, we require a pipeline that is able to synthesize large numbers of realistic scenes, complete with 'natural' kinematic and dynamic structures. To attack this problem, we develop models for inferring structure and generating simulation scenes from natural images, allowing for scalable scene generation from web-scale datasets. To train these image-to-simulation models, we show how controllable text-to-image generative models can be used in generating paired training data that allows for modeling of the inverse problem, mapping from realistic images back to complete scene models. We show how this paradigm allows us to build large datasets of scenes in simulation with semantic and physical realism. We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images and use these for training robotic control policies. We then robustly deploy in the real world for tasks like articulated object manipulation. In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies in the resulting environments.

Submitted 31 May, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: Accepted at RSS2024

arXiv:2405.06786 [pdf, other]

SAM3D: Zero-Shot Semi-Automatic Segmentation in 3D Medical Images with the Segment Anything Model

Authors: Trevor J. Chan, Aarush Sahni, Jie Li, Alisha Luthra, Amy Fang, Alison Pouch, Chamith S. Rajapakse

Abstract: We introduce SAM3D, a new approach to semi-automatic zero-shot segmentation of 3D images building on the existing Segment Anything Model. We achieve fast and accurate segmentations in 3D images with a four-step strategy comprising: volume slicing along non-orthogonal axes, efficient prompting in 3D, slice-wise inference using the pretrained SAM, and recomposition and refinement in 3D. We evaluate SAM3D performance qualitatively on an array of imaging modalities and anatomical structures and quantify performance for specific organs in body CT and tumors in brain MRI. By enabling users to create 3D segmentations of unseen data quickly and with dramatically reduced manual input, these methods have the potential to aid surgical planning and education, diagnostic imaging, and scientific research.

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.02071 [pdf, other]

Spacelike initial data for black hole stability

Authors: Allen Juntao Fang, Jérémie Szeftel, Arthur Touati

Abstract: We construct initial data suitable for the Kerr stability conjecture, that is, solutions to the constraint equations on a spacelike hypersurface with boundary entering the black hole horizon that are arbitrarily decaying perturbations of a Kerr initial data set. This results from a more general perturbative construction on any asymptotically flat initial data set with the topology of $\mathbb{R}^3\setminus\{r<1\}$ enjoying some analyticity near and at the boundary. In particular, we design a suitable mixed boundary condition for the elliptic operator of the conformal method in order to exclude the Killing initial data sets (KIDS).

Submitted 3 May, 2024; originally announced May 2024.

Comments: 26 pages

arXiv:2404.02831 [pdf, other]

Empowering Biomedical Discovery with AI Agents

Authors: Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik

Abstract: We envision "AI scientists" as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate AI models and biomedical tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces, and execute repetitive tasks. AI agents are poised to be proficient in various tasks, planning discovery workflows and performing self-assessment to identify and mitigate gaps in their knowledge. These agents use large language models and generative models to feature structured memory for continual learning and use machine learning tools to incorporate scientific knowledge, biological principles, and theories. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.

Submitted 24 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.08540 [pdf, other]

Language models scale reliably with over-training and on downstream tasks

Authors: Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, Rulin Shao, Jean Mercat, Alex Fang, Jeffrey Li, Sedrick Keh, Rui Xin, Marianna Nezhurina, Igor Vasiljevic, Jenia Jitsev, Luca Soldaini, Alexandros G. Dimakis, Gabriel Ilharco, Pang Wei Koh, Shuran Song, Thomas Kollar, Yair Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt

Abstract: Scaling laws are useful guides for derisking expensive training runs, as they predict performance of large models using cheaper, small-scale experiments. However, there remain gaps between current scaling studies and how language models are ultimately trained and evaluated. For instance, scaling is usually studied in the compute-optimal training regime (i.e., "Chinchilla optimal" regime). In contrast, models are often over-trained to reduce inference costs. Moreover, scaling laws mostly predict loss on next-token prediction, but models are usually compared on downstream task performance. To address both shortcomings, we create a testbed of 104 models with 0.011B to 6.9B parameters trained with various numbers of tokens on three data distributions. First, we fit scaling laws that extrapolate in both the amount of over-training and the number of model parameters. This enables us to predict the validation loss of a 1.4B parameter, 900B token run (i.e., 32$\times$ over-trained) and a 6.9B parameter, 138B token run (i.e., a compute-optimal run), each from experiments that take 300$\times$ less compute. Second, we relate the perplexity of a language model to its downstream task performance by proposing a power law. We use this law to predict top-1 error averaged over downstream tasks for the two aforementioned models, using experiments that take 20$\times$ less compute. Our experiments are available at https://github.com/mlfoundations/scaling.

Submitted 14 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2402.00296 [pdf, other]

High-Level, Collaborative Task Planning Grammar and Execution for Heterogeneous Agents

Authors: Amy Fang, Hadas Kress-Gazit

Abstract: We propose a new multi-agent task grammar to encode collaborative tasks for a team of heterogeneous agents that can have overlapping capabilities. The grammar allows users to specify the relationship between agents and parts of the task without providing explicit assignments or constraints on the number of agents required. We develop a method to automatically find a team of agents and synthesize correct-by-construction control with synchronization policies to satisfy the task. We demonstrate the scalability of our approach through simulation and compare our method to existing task grammars that encode multi-agent tasks.

Submitted 31 January, 2024; originally announced February 2024.

Comments: To appear in the Proceedings of the 2024 International Conference on Autonomous Agents and Multiagent Systems (AAMAS)

arXiv:2401.14353 [pdf, ps, other]

Initial data for Minkowski stability with arbitrary decay

Authors: Allen Juntao Fang, Jérémie Szeftel, Arthur Touati

Abstract: We construct and parametrize solutions to the constraint equations of general relativity in a neighborhood of Minkowski spacetime with arbitrary prescribed decay properties at infinity. We thus provide a large class of initial data for the results on stability of Minkowski which include a mass term in the asymptotics. Due to the symmetries of Minkowski, a naive linear perturbation fails. Our construction is based on a simplified conformal method, a reduction to transverse traceless perturbations and a nonlinear fixed point argument where we face linear obstructions coming from the cokernels of both the linearized constraint operator and the Laplace operator. To tackle these obstructions, we introduce a well-chosen truncated black hole around which to perturb. The control of the parameters of the truncated black hole is the most technical part of the proof, since its center of mass and angular momentum could be arbitrarily large.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 86 pages

arXiv:2312.17213 [pdf, other]

Possible Unconventional Surface Superconductivity in the Half-Heusler YPtBi

Authors: Eylon Persky, Alan Fang, Xinyang Zhang, Carolina Adamo, Eli Levenson-Falk, Chandra Shekhar, Claudia Felser, Binghai Yan, Aharon Kapitulnik

Abstract: We report an extensive study of the noncentrosymmetric half-Heusler topological superconductor YPtBi, revealing an unusual relation between bulk superconductivity and the appearance of surface superconductivity at temperatures up to 3 times the bulk transition temperature. Transport measurements confirmed the low carrier density of the material and its bulk superconducting transition, which was also observed in ac susceptibility through mutual inductance (MI) measurements. However, a weak signature of superconductivity in the MI measurements appeared much above the bulk transition temperature, which was further observed in scanning tunneling spectroscopy. Polar Kerr effect measurements suggest that while the bulk superconductor may exhibit an unusual nodal superconducting state, only the surface state breaks time reversal symmetry. Complementary tunneling measurements on LuPtBi are used to establish the observations on YPtBi, while density-functional theory (DFT) calculations may shed light on the origin of this unusual surface state.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 11 pages, 10 figures

arXiv:2312.10775 [pdf, other]

What Makes Digital Support Effective? How Therapeutic Skills Affect Clinical Well-Being

Authors: Anna Fang, Wenjie Yang, Raj Sanjay Shah, Yash Mathur, Diyi Yang, Haiyi Zhu, Robert Kraut

Abstract: Online mental health support communities have grown in recent years for providing accessible mental and emotional health support through volunteer counselors. Despite millions of people participating in chat support on these platforms, the clinical effectiveness of these communities on mental health symptoms remains unknown. Furthermore, although volunteers receive some training based on established therapeutic skills studied in face-to-face environments such as active listening and motivational interviewing, it remains understudied how the usage of these skills in this online context affects people's mental health status. In our work, we collaborate with one of the largest online peer support platforms and use both natural language processing and machine learning techniques to measure how one-on-one support chats affect depression and anxiety symptoms. We measure how the techniques and characteristics of support providers, such as using affirmation, empathy, and past experience on the platform, affect support-seekers' mental health changes. We find that online peer support chats improve both depression and anxiety symptoms with a statistically significant but relatively small effect size. Additionally, support providers' techniques such as emphasizing the autonomy of the client lead to better mental health outcomes. However, we also found that some behaviors (e.g. persuading) are actually harmful to depression and anxiety outcomes. Our work provides key understanding for mental health care in the online setting and designing training systems for online support providers.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2310.17034 [pdf, other]

Follow-on Question Suggestion via Voice Hints for Voice Assistants

Authors: Besnik Fetahu, Pedro Faustini, Giuseppe Castellucci, Anjie Fang, Oleg Rokhlenko, Shervin Malmasi

Abstract: The adoption of voice assistants like Alexa or Siri has grown rapidly, allowing users to instantly access information via voice search. Query suggestion is a standard feature of screen-based search experiences, allowing users to explore additional topics. However, this is not trivial to implement in voice-based settings. To enable this, we tackle the novel task of suggesting questions with compact and natural voice hints to allow users to ask follow-up questions. We define the task, ground it in syntactic theory and outline linguistic desiderata for spoken hints. We propose baselines and an approach using sequence-to-sequence Transformers to generate spoken hints from a list of questions. Using a new dataset of 6681 input questions and human written hints, we evaluated the models with automatic metrics and human evaluation. Results show that a naive approach of concatenating suggested questions creates poor voice hints. Our approach, which applies a linguistically-motivated pretraining task, was strongly preferred by humans for producing the most natural hints.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted as Long Paper at EMNLP'23 Findings

arXiv:2309.17425 [pdf, other]

Data Filtering Networks

Authors: Alex Fang, Albin Madappally Jose, Amit Jain, Ludwig Schmidt, Alexander Toshev, Vaishaal Shankar

Abstract: Large training sets have become a cornerstone of machine learning and are the foundation for recent advances in language modeling and multimodal learning. While data curation for pre-training is often still ad-hoc, one common paradigm is to first collect a massive pool of data from the Web and then filter this candidate pool down to an actual training set via various heuristics. In this work, we study the problem of learning a data filtering network (DFN) for this second step of filtering a large uncurated dataset. Our key finding is that the quality of a network for filtering is distinct from its performance on downstream tasks: for instance, a model that performs well on ImageNet can yield worse training sets than a model with low ImageNet accuracy that is trained on a small amount of high-quality data. Based on our insights, we construct new data filtering networks that induce state-of-the-art image-text datasets. Specifically, our best performing dataset DFN-5B enables us to train state-of-the-art CLIP models for their compute budgets: among other improvements on a variety of tasks, a ViT-H trained on our dataset achieves 84.4% zero-shot transfer accuracy on ImageNet, out-performing models trained on other datasets such as LAION-2B, DataComp-1B, or OpenAI's WIT. In order to facilitate further research in dataset design, we also release a new 2 billion example dataset DFN-2B and show that high performance data filtering networks can be trained from scratch using only publicly available data.

Submitted 5 November, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

arXiv:2309.10089 [pdf, other]

HTEC: Human Transcription Error Correction

Authors: Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du

Abstract: High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. A recent study (LibriCrowd) found that every 1% worse transcription Word Error Rate (WER) increases ASR WER by approximately 2% when the transcriptions are used to train ASR models. Transcription errors are inevitable for even highly-trained annotators. However, few studies have explored human transcription correction. Error correction methods for other problems, such as ASR error correction and grammatical error correction, do not perform sufficiently for this problem. Therefore, we propose HTEC for Human Transcription Error Correction. HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills masked positions. We propose a holistic list of correction operations, including four novel operations handling deletion errors. We further propose a variant of embeddings that incorporates phoneme information into the input of the transformer. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we deployed HTEC to assist human annotators and showed HTEC is particularly effective as a co-pilot, which improves transcription quality by 15.1% without sacrificing transcription velocity.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 13 pages, 4 figures, 11 tables, AMLC 2023

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2308.01257 [pdf, other]

Shaping Online Dialogue: Examining How Community Rules Affect Discussion Structures on Reddit

Authors: Anna Fang, Wenjie Yang, Haiyi Zhu

Abstract: Community rules play a key part in enabling or constraining the behaviors of members in online communities. However, little is known regarding whether and to what degree changing rules actually affects community dynamics. In this paper, we seek to understand how these behavior-governing rules shape the interactions between users, as well as the structure of their discussion. Using the top communities on Reddit (i.e. subreddits), we first contribute a taxonomy of behavior-based rule categories across Reddit. Then, we use a network analysis perspective to discover how changing implementation of different rule categories affects subreddits' user interaction and discussion networks over a 1.5 year period. Our study finds several significant effects, including greater clustering among users when subreddits increase rules focused on structural regulation and how restricting allowable content surprisingly leads to more interactions between users. Our findings contribute to research in proactive moderation through rule setting, as well as lend valuable insights for online community designers and moderators to achieve desired community dynamics.

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.08423 [pdf, other]

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence , et al. (38 additional authors not shown)

Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification.
To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2306.10191 [pdf, other]

Neural Priming for Sample-Efficient Adaptation

Authors: Matthew Wallingford, Vivek Ramanujan, Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi, Ludwig Schmidt, Ali Farhadi

Abstract: We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and condition its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using Neural Priming at inference to adapt to distribution shift, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the challenge of limited labeled data and changing distributions. Code is available at github.com/RAIVNLab/neural-priming.

Submitted 4 December, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 18 pages, 7 figures, 9 tables

arXiv:2304.14108 [pdf, other]

DataComp: In search of the next generation of multimodal datasets

Authors: Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song , et al. (9 additional authors not shown)

Abstract: Multimodal datasets are a critical component in recent breakthroughs such as Stable Diffusion and GPT-4, yet their design does not receive the same research attention as model architectures or training algorithms. To address this shortcoming in the ML ecosystem, we introduce DataComp, a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl. Participants in our benchmark design new filtering techniques or curate new data sources and then evaluate their new dataset by running our standardized CLIP training code and testing the resulting model on 38 downstream test sets. Our benchmark consists of multiple compute scales spanning four orders of magnitude, which enables the study of scaling trends and makes the benchmark accessible to researchers with varying resources. Our baseline experiments show that the DataComp workflow leads to better training sets. In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet, outperforming OpenAI's CLIP ViT-L/14 by 3.7 percentage points while using the same training procedure and compute. We release DataComp and all accompanying code at www.datacomp.ai.

Submitted 20 October, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: NeurIPS 2023 Datasets and Benchmarks Track

arXiv:2304.06939 [pdf, other]

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text

Authors: Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi

Abstract: In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via interleaving independent supervised (image, text) examples, but also, more complex prompts involving interaction between images, e.g., "What do image A and image B have in common?" To support this interface, pretraining occurs over web corpora that similarly contain interleaved images+text. To date, however, large-scale data of this form have not been publicly available. We release Multimodal C4, an augmentation of the popular text-only C4 corpus with images interleaved. We use a linear assignment algorithm to place images into longer bodies of text using CLIP features, a process that we show outperforms alternatives. Multimodal C4 spans everyday topics like cooking, travel, technology, etc. A manual inspection of a random sample of documents shows that a vast majority (88%) of images are topically relevant, and that linear assignment frequently selects individual sentences specifically well-aligned with each image (80%). After filtering NSFW images, ads, etc., the resulting corpus consists of 101.2M documents with 571M images interleaved in 43B English tokens.

Submitted 28 October, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: NeurIPS D&B 2023. Project homepage: https://github.com/allenai/mmc4

arXiv:2303.11272 [pdf, other]

Agent-based Simulation for Online Mental Health Matching

Authors: Yuhan Liu, Anna Fang, Glen Moriarty, Robert Kraut, Haiyi Zhu

Abstract: Online mental health communities (OMHCs) are an effective and accessible channel to give and receive social support for individuals with mental and emotional issues. However, a key challenge on these platforms is finding suitable partners to interact with given that mechanisms to match users are currently underdeveloped. In this paper, we collaborate with one of the world's largest OMHCs to develop an agent-based simulation framework and explore the trade-offs in different matching algorithms. The simulation framework allows us to compare current mechanisms and new algorithmic matching policies on the platform, and observe their differing effects on a variety of outcome metrics. Our findings include that usage of the deferred-acceptance algorithm can significantly better the experiences of support-seekers in one-on-one chats while maintaining low waiting time. We note key design considerations that agent-based modeling reveals in the OMHC context, including the potential benefits of algorithmic matching on marginalized communities.

Submitted 20 March, 2023; originally announced March 2023.

arXiv:2303.01225 [pdf, other]

doi 10.1038/s41535-023-00555-w

Commensurate-to-incommensurate transition of charge-density-wave order and a possible quantum critical point in pressurized kagome metal CsV$_3$Sb$_5$

Authors: X. Y. Feng, Z. Zhao, J. Luo, J. Yang, A. F. Fang, H. T. Yang, H. J. Gao, R. Zhou, Guo-qing Zheng

Abstract: Clarifying the interplay between charge density waves (CDWs) and superconductivity is important in the kagome metal CsV$_3$Sb$_5$, and pressure ($P$) can play a crucial role. Here, we present $^{121/123}$Sb nuclear quadrupole resonance (NQR) measurements under hydrostatic pressures up to 2.43 GPa in CsV$_3$Sb$_5$ single crystals. We demonstrate that the CDW gradually changes from a commensurate modulation with a star-of-David (SoD) pattern to an incommensurate one with a superimposed SoD and Tri-hexagonal (TrH) pattern stacking along the $c$-axis. Moreover, the linewidth $δν$ of $^{121/123}$Sb-NQR spectra increases with cooling down to $T_{\rm CDW}$, indicating the appearance of a short-range CDW order due to CDW fluctuations pinned by quenched disorders. The $δν$ shows a Curie-Weiss temperature dependence and tends to diverge at $P_{\rm c} \sim$ 1.9 GPa, suggesting that a CDW quantum critical point (QCP) exists at $P_{\rm c}$ where $T_{\rm c}$ shows the maximum. For $P > P_{\rm c}$, spin fluctuations are enhanced when the CDW is suppressed. Our results suggest that the maximal $T_{\rm c}$ at $P_{\rm c} \sim$ 1.9 GPa is related to the CDW QCP and that the presence of spin fluctuations prevents $T_{\rm c}$ from decreasing rapidly after the CDW is completely suppressed.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 17 pages, 8 figures

Journal ref: npj Quantum Materials 8, 23 (2023)

arXiv:2301.06670 [pdf, ps, other]

Investigation of the laser-induced lineshape change in attosecond transient absorption spectra by employing a time-dependent generalized Floquet approach

Authors: Di Zhao, Chen-Wei Jiang, Ai-Ping Fang, Shao-Yan Gao, Fu-li Li

Abstract: We introduce a time-dependent generalized Floquet (TDGF) approach to calculate attosecond transient absorption spectra of helium atoms subjected to the combination of an attosecond extreme ultraviolet (XUV) pulse and a delayed few-cycle infrared (IR) laser pulse. This TDGF approach provides a Floquet understanding of the laser-induced change of resonant absorption lineshape. It is analytically demonstrated that the phase shift of the time-dependent dipole moment that results in the lineshape changes consists of the \emph{adiabatic} laser-induced phase (LIP) due to the IR-induced Stark shifts of adiabatic Floquet states and the \emph{non-adiabatic} phase correction due to the non-adiabatic IR-induced coupling between adiabatic Floquet states. Comparisons of the spectral lineshape calculated based on the TDGF approach with the results obtained with the LIP model [S. Chen \emph{et al.}, Phys. Rev. A \textbf{88}, 033409(2013)] and the rotating-wave approximation (RWA) are made in several typical cases. It is suggested in the picture of adiabatic Floquet states that the LIP model works as long as the generalized adiabatic theorem [A. Dodin \emph{et al.}, Phys. Rev. X Quantum \textbf{2}, 030302(2021)] is fulfilled, and the RWA works when the higher-order IR-coupling effect in the formation of adiabatic Floquet states is negligible.

Submitted 16 January, 2023; originally announced January 2023.

arXiv:2301.04644 [pdf, other]

Does progress on ImageNet transfer to real-world datasets?

Authors: Alex Fang, Simon Kornblith, Ludwig Schmidt

Abstract: Does progress on ImageNet transfer to real-world datasets? We investigate this question by evaluating ImageNet pre-trained models with varying accuracy (57% - 83%) on six practical image classification datasets. In particular, we study datasets collected with the goal of solving real-world tasks (e.g., classifying images from camera traps or satellites), as opposed to web-scraped benchmarks collected for comparing models. On multiple datasets, models with higher ImageNet accuracy do not consistently yield performance improvements. For certain tasks, interventions such as data augmentation improve performance even when architectures do not. We hope that future benchmarks will include more diverse datasets to encourage a more comprehensive approach to improving learning algorithms.

Submitted 11 January, 2023; originally announced January 2023.

arXiv:2301.04101 [pdf, other]

Neural Radiance Field Codebooks

Authors: Matthew Wallingford, Aditya Kusupati, Alex Fang, Vivek Ramanujan, Aniruddha Kembhavi, Roozbeh Mottaghi, Ali Farhadi

Abstract: Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes from novel views using a dictionary of object codes which are decoded through a volumetric renderer. This enables the discovery of reoccurring visual and geometric patterns across scenes which are transferable to downstream tasks. We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% success rate. We demonstrate that our approach is able to perform unsupervised segmentation for more complex synthetic (THOR) and real scenes (NYU Depth) better than prior methods (29% relative improvement). Finally, we show that NRC improves on the task of depth ordering by 5.5% accuracy in THOR.

Submitted 30 April, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: 19 pages, 8 figures, 9 tables

Journal ref: International Conference on Learning Representations 2023

arXiv:2210.15777 [pdf, other]

Reinforced Question Rewriting for Conversational Question Answering

Authors: Zhiyu Chen, Jie Zhao, Anjie Fang, Besnik Fetahu, Oleg Rokhlenko, Shervin Malmasi

Abstract: Conversational Question Answering (CQA) aims to answer questions contained within dialogues, which are not easily interpretable without context. Developing a model to rewrite conversational questions into self-contained ones is an emerging solution in industry settings as it allows using existing single-turn QA systems to avoid training a CQA model from scratch. Previous work trains rewriting models using human rewrites as supervision. However, such objectives are disconnected from QA models and therefore more human-like rewrites do not guarantee better QA performance. In this paper we propose using QA feedback to supervise the rewriting model with reinforcement learning. Experiments show that our approach can effectively improve QA performance over baselines for both extractive and retrieval QA. Furthermore, human evaluation shows that our method can generate more accurate and detailed rewrites when compared to human annotations.

Submitted 31 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

Comments: A cleaned version of our paper accepted by EMNLP 2022 (Industry Track)

arXiv:2208.14536 [pdf, other]

MultiCoNER: A Large-scale Multilingual dataset for Complex Named Entity Recognition

Authors: Shervin Malmasi, Anjie Fang, Besnik Fetahu, Sudipta Kar, Oleg Rokhlenko

Abstract: We present MultiCoNER, a large multilingual dataset for Named Entity Recognition that covers 3 domains (Wiki sentences, questions, and search queries) across 11 languages, as well as multilingual and code-mixing subsets. This dataset is designed to represent contemporary challenges in NER, including low-context scenarios (short and uncased text), syntactically complex entities like movie titles, and long-tail entity distributions. The 26M token dataset is compiled from public resources using techniques such as heuristic-based sentence sampling, template extraction and slotting, and machine translation. We applied two NER models on our dataset: a baseline XLM-RoBERTa model, and a state-of-the-art GEMNET model that leverages gazetteers. The baseline achieves moderate performance (macro-F1=54%), highlighting the difficulty of our data. GEMNET, which uses gazetteers, improves significantly on the baseline (average improvement of macro-F1=+30%). MultiCoNER poses challenges even for large pre-trained language models, and we believe that it can help further research in building robust NER systems. MultiCoNER is publicly available at https://registry.opendata.aws/multiconer/ and we hope that this resource will help advance research in various aspects of NER.

Submitted 30 August, 2022; originally announced August 2022.

Comments: Accepted at COLING 2022

arXiv:2207.08024 [pdf, other]

LAVA: Language Audio Vision Alignment for Contrastive Video Pre-Training

Authors: Sumanth Gurram, Andy Fang, David Chan, John Canny

Abstract: Generating representations of video data is of key importance in advancing the field of machine perception. Most current techniques rely on hand-annotated data, which can be difficult to work with, expensive to generate, and hard to scale. In this work, we propose a novel learning approach based on contrastive learning, LAVA, which is capable of learning joint language, audio, and video representations in a self-supervised manner. We pre-train LAVA on the Kinetics 700 dataset using transformer encoders to learn representations for each modality. We then demonstrate that LAVA performs competitively with the current state-of-the-art self-supervised and weakly-supervised pretraining techniques on UCF-101 and HMDB-51 video action recognition while using a fraction of the unlabeled data.

Submitted 16 July, 2022; originally announced July 2022.

Comments: Workshop Paper at ICML 2022

arXiv:2207.07902 [pdf, other]

Linear stability of the slowly-rotating Kerr-de Sitter family

Authors: Allen Juntao Fang

Abstract: In this paper, we prove that the slowly-rotating Kerr-de Sitter family of black holes is linearly stable as a family of solutions to the Einstein vacuum equations with $Λ>0$ in harmonic (wave) gauge. This article is part of a series that provides a novel proof of the full nonlinear stability of the slowly-rotating Kerr-de Sitter family. This paper and its follow-up offer a self-contained alternative approach to nonlinear stability of the Kerr-de Sitter family from the original work of Hintz and Vasy by interpreting quasinormal modes as $H^k$ eigenvalues of an operator on a Hilbert space, and using integrated local energy decay estimates to prove the existence of a spectral gap. In particular, we avoid the construction of a meromorphic continuation of the resolvent. We also do not compactify the spacetime, thus avoiding the use of $b$-calculus and instead using only standard pseudo-differential arguments in a neighborhood of the trapped set, and we avoid constraint damping altogether. The methods in the current paper offer an explicit example of how to use the vectorfield method to achieve resolvent estimates on a trapping background.

Submitted 16 July, 2022; originally announced July 2022.

arXiv:2205.01397 [pdf, other]

Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)

Authors: Alex Fang, Gabriel Ilharco, Mitchell Wortsman, Yuhao Wan, Vaishaal Shankar, Achal Dave, Ludwig Schmidt

Abstract: Contrastively trained language-image models such as CLIP, ALIGN, and BASIC have demonstrated unprecedented robustness to multiple challenging natural distribution shifts. Since these language-image models differ from previous training approaches in several ways, an important question is what causes the large robustness gains. We answer this question via a systematic experimental investigation. Concretely, we study five different possible causes for the robustness gains: (i) the training set size, (ii) the training distribution, (iii) language supervision at training time, (iv) language supervision at test time, and (v) the contrastive loss function. Our experiments show that the more diverse training distribution is the main cause for the robustness gains, with the other factors contributing little to no robustness. Beyond our experimental results, we also introduce ImageNet-Captions, a version of ImageNet with original text annotations from Flickr, to enable further controlled experiments of language-image training.

Submitted 22 August, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2204.05423 [pdf, other]

Automated Task Updates of Temporal Logic Specifications for Heterogeneous Robots

Authors: Amy Fang, Hadas Kress-Gazit

Abstract: Given a heterogeneous group of robots executing a complex task represented in Linear Temporal Logic, and a new set of tasks for the group, we define the task update problem and propose a framework for automatically updating individual robot tasks given their respective existing tasks and capabilities. Our heuristic, token-based, conflict resolution task allocation algorithm generates a near-optimal assignment for the new task. We demonstrate the scalability of our approach through simulations of multi-robot tasks.

Submitted 17 April, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2022

arXiv:2203.00079 [pdf, other]

doi 10.1103/PhysRevB.107.045120

Direct observation of discommensurate charge density wave modulation in the quasi-1D Weyl semimetal candidate NbTe$_4$

Authors: J. A. Galvis, A. Fang, D. Jimenez-Guerrero, J. Rojas-Castillo, J. Casas, O. Herrera, A. C. Garcia-Castro, E. Bousquet, I. R. Fisher, A. Kapitulnik, P. Giraldo-Gallo

Abstract: The transition-metal tetrachalcogenides are a model system to explore the conjunction of correlated electronic states, such as charge density waves (CDW), with topological phases of matter. Understanding the connection between these phases requires a thorough understanding of the individual states, which, for the case of the CDW in this system, is still missing. In this paper we combine phonon-structure calculations and scanning tunneling microscopy measurements of NbTe$_4$ in order to provide a full characterization of the CDW state. We find that, at short range, the superstructure formed by the CDW is fully commensurate with the lattice parameters. Moreover, our data reveal the presence of phase-slip domain walls separating regions of commensurate CDW at the nanoscale, indicating that the CDW in this compound is discommensurate at long range. Our results resolve a long-standing discussion about the nature of the CDW in these materials, and provide a strong basis for the study of the interplay between this state and other novel quantum electronic states.

Submitted 28 February, 2022; originally announced March 2022.

Comments: 9 pages, 8 figures (5 in main, 3 in appendix)

Journal ref: Phys. Rev. B 107, 045120 (2023)

arXiv:2112.07183 [pdf, ps, other]

Nonlinear stability of the slowly-rotating Kerr-de Sitter family

Authors: Allen Juntao Fang

Abstract: In this paper, we provide a new proof of nonlinear stability of the slowly-rotating Kerr-de Sitter family of black holes as a family of solutions to the Einstein vacuum equations with cosmological constant $Λ>0$, originally established by Hintz and Vasy in their seminal work [arXiv:1606.04014]. Using the linear theory developed in an upcoming companion paper, we prove the nonlinear stability of slowly-rotating Kerr-de Sitter using a bootstrap argument, avoiding the need for a Nash-Moser argument, and requiring initial data small only in the $H^6$ norm.

Submitted 17 July, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 54 pages

arXiv:2110.02184 [pdf, other]

doi 10.1103/PhysRevB.105.075111

Revealing a charge-density-wave gap in the predicted weak topological insulator HoSbTe

Authors: J. L. Liu, R. Liu, M. Yang, L. Y. Cao, B. X. Gao, L. Wang, A. F. Fang, Y. G. Shi, Z. P. Yin, R. Y. Chen

Abstract: HoSbTe was predicted to be a weak topological insulator, whose spin-orbit coupling (SOC) gaps are reported to be as large as hundreds of meV. Utilizing infrared spectroscopy, we find that the compound is of metallic nature from 350 K down to 10 K. In particular, both its itinerant carrier density and its scattering rate are demonstrated to decrease upon cooling, which is responsible for the appearance of a broad hump feature in the temperature-dependent resistivity around 200 K. More importantly, we reveal the appearance of a charge density wave (CDW) gap in addition to the SOC-related gap. The energy scale of the CDW gap is identified to be 364 meV at 10 K, which shifts to 252 meV at 350 K. The coexistence of CDW and SOC gaps in the same compound paves a new avenue to explore more intriguing physics.

Submitted 5 October, 2021; originally announced October 2021.

Journal ref: Phys. Rev. B 105, 075111 (2022)

arXiv:2108.10263 [pdf, other]

doi 10.1038/s41535-022-00437-7

Possible star-of-David pattern charge density wave with additional modulation in the kagome superconductor CsV$_3$Sb$_5$

Authors: J. Luo, Z. Zhao, Y. Z. Zhou, J. Yang, A. F. Fang, H. T. Yang, H. J. Gao, R. Zhou, Guo-qing Zheng

Abstract: $A$V$_3$Sb$_5$ ($A$ = K, Rb, Cs) is a novel kagome superconductor coexisting with the charge density wave (CDW) order. Identifying the structure of the CDW order is crucial for understanding the exotic normal state and superconductivity in this system. Here, we report $^{51}$V nuclear magnetic resonance (NMR) and $^{121/123}$Sb nuclear quadrupole resonance (NQR) studies on kagome-metal CsV$_3$Sb$_5$. Below the CDW transition temperature $T_\textrm{CDW} \sim$ 98 K, an abrupt change of spectra was observed, indicating that the transition is of the first order. By further analysing the spectra, we find that the CDW order is commensurate. And most remarkably, the obtained experimental results suggest that the charge modulation of the CDW order is of star-of-David pattern and accompanied by an additional charge modulation in bulk below $T^* \sim$ 40 K. Our results revealing the unconventional CDW order provide new insights into $A$V$_3$Sb$_5$.

Submitted 18 March, 2022; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: 15 pages, 4 figures

Journal ref: npj Quantum Materials 7, 30 (2022)

arXiv:2104.12347 [pdf, other]

Dynamic Image Restoration and Fusion Based on Dynamic Degradation

Authors: Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Yanning Zhang

Abstract: Deep-learning-based image restoration and fusion methods have achieved remarkable results. However, existing restoration and fusion methods have paid little research attention to the robustness problem caused by dynamic degradation. In this paper, we propose a novel dynamic image restoration and fusion neural network, termed DDRF-Net, which is capable of solving two problems, i.e., static restoration and fusion, and dynamic degradation. In order to solve the static fusion problem of existing methods, dynamic convolution is introduced to learn dynamic restoration and fusion weights. In addition, a dynamic degradation kernel is proposed to improve the robustness of image restoration and fusion. Our network framework can effectively combine image degradation with image fusion tasks, provide more detailed information for image fusion tasks through image restoration loss, and optimize image restoration tasks through image fusion loss. Therefore, the stumbling blocks of deep learning in image fusion, e.g., static fusion weight and specifically designed network architecture, are greatly mitigated. Extensive experiments show that our method is superior to the state-of-the-art methods.

Submitted 30 April, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

arXiv:2104.07489 [pdf, ps, other]

The group invertibility of matrices over Bézout domains

Authors: Dayong Liu, Aixiang Fang

Abstract: Let $R$ be a Bézout domain, and let $A,B,C\in R^{n\times n}$ with $ABA=ACA$. If $AB$ and $CA$ are group invertible, we prove that $AB$ is similar to $CA$. Moreover, $(AB)^{\#}$ is similar to $(CA)^{\#}$. This generalizes the main result of Cao and Li (Group inverses for matrices over a Bézout domain, Electronic J. Linear Algebra, 18 (2009), 600--612).

Submitted 4 February, 2022; v1 submitted 15 April, 2021; originally announced April 2021.

Comments: 11 pages

MSC Class: 15A09; 16E50; 16U90

arXiv:2103.02991 [pdf, ps, other]

doi 10.1088/1674-1056/abec37

Nodal superconducting gap in LiFeP revealed by NMR: contrast with LiFeAs

Authors: A. F. Fang, R. Zhou, H. Tukada, J. Yang, Z. Deng, X. C. Wang, C. Q. Jin, Guo-qing Zheng

Abstract: Identifying the uniqueness of FeP-based superconductors may shed new light on the mechanism of superconductivity in iron-pnictides. Here, we report nuclear magnetic resonance (NMR) studies on LiFeP and LiFeAs, which have the same crystal structure but different pnictogen atoms. The NMR spectrum is sensitive to inhomogeneous magnetic fields in the vortex state and can provide information on the superconducting pairing symmetry through the temperature dependence of the London penetration depth $λ_L$. We find that $λ_L$ saturates below $T \sim 0.2$ $T_c$ in LiFeAs, where $T_c$ is the superconducting transition temperature, indicating nodeless superconducting gaps. Furthermore, by using a two-gap model, we simulate the temperature dependence of $λ_L$ and obtain the superconducting gaps of LiFeAs as $Δ_1 = 1.2$ $k_B T_c$ and $Δ_2 = 2.8$ $k_B T_c$, in agreement with the previous result from spin-lattice relaxation. For LiFeP, in contrast, the London penetration depth $λ_L$ does not show any saturation down to $T \sim 0.03$ $T_c$, indicating nodes in the superconducting energy gap function. Finally, we demonstrate that strong spin fluctuations with diffusive characteristics exist in LiFeP, as in some cuprate high-temperature superconductors.

Submitted 4 March, 2021; originally announced March 2021.

Comments: 14 pages, 6 figures, to appear in Chinese Phys. B

Journal ref: Chinese Phys. B 30 047403 (2021)

arXiv:2012.01270 [pdf, ps, other]

The Generalized Flanders' Theorem in Unit-regular Rings

Authors: Dayong Liu, Aixiang Fang

Abstract: Let R be a unit-regular ring, and let a,b,c in R satisfy aba=aca. If ac and ba are group invertible, we prove that ac is similar to ba. Furthermore, if ac and ba are Drazin invertible, then their Drazin inverses are similar. For any $n\times n$ complex matrices A,B,C with ABA = ACA, we prove that AC and BA are similar if and only if their k-powers have the same rank. These generalize the known Flanders' theorem proved by Hartwig.

Submitted 2 December, 2020; originally announced December 2020.

Comments: 8 pages

MSC Class: 15A09; 16E50; 16U90

arXiv:2011.12617 [pdf, other]

Characterization of magnetic field noise in the ARIADNE source mass rotor

Authors: Nancy Aggarwal, Allard Schnabel, Jens Voigt, Alex Brown, Josh C Long, L. Trahms, A. Fang, Andrew Geraci, A. Kapitulnik, D. Kim, Y. Kim, I. Lee, Y. H. Lee, C. Y. Liu, C. Lohmeyer, A. Reid, Y. Semertzidis, Y. Shin, J. Shortino, E. Smith, W. M. Snow, E. Weisman

Abstract: ARIADNE is a nuclear-magnetic-resonance-based experiment that will search for novel axion-induced spin-dependent interactions between an unpolarized source mass rotor and a nearby sample of spin-polarized $^3$He gas. To detect feeble axion signals at the sub-atto-Tesla level, the experiment relies on low magnetic background and noise. We measure and characterize the magnetic field background from a prototype tungsten rotor. We show that the requirement is met with our current level of tungsten purity and demagnetization process. We further show that the noise is dominantly caused by a few discrete dipoles, likely due to a few impurities trapped inside the rotor during manufacturing. This is done via a numerical optimization pipeline which fits for the locations and magnetic moments of each dipole. We find that under the current demagnetization, the magnetic moment of trapped impurities is bounded at $10^{-9} \mathrm{A}\mathrm{m}^2$.

Submitted 25 November, 2020; originally announced November 2020.

Comments: 6 pages, 5 figures

arXiv:2011.10141 [pdf, other]

doi 10.1007/978-3-030-43761-9

Source mass characterization in the ARIADNE axion experiment

Authors: Chloe Lohmeyer, Nancy Aggarwal, Asimina Arvanitaki, Alex Brown, Alan Fang, Andrew A Geraci, Aharon Kapitulnik, Dongok Kim, Younggeun Kim, Inbum Lee, Yong Ho Lee, Eli Levenson-Falk, Chen Yu Liu, Josh C Long, Sam Mumford, Austin Reid, Allard Schnabel, Yannis Semertzidis, Yun Shin, Justin Shortino, Eric Smith, William M Snow, Lutz Trahms, Jens Voigt, Evan Weisman

Abstract: The Axion Resonant InterAction Detection Experiment (ARIADNE) is a collaborative effort to search for the QCD axion using nuclear magnetic resonance (NMR), where the axion acts as a mediator of spin-dependent forces between an unpolarized tungsten source mass and a sample of polarized helium-3 gas. Since the experiment involves precision measurement of a small magnetization, it relies on limiting ordinary magnetic noise with superconducting magnetic shielding. In addition to the shielding, proper characterization of the noise level from other sources is crucial. We investigate one such noise source in detail: the magnetic noise due to impurities and Johnson noise in the tungsten source mass.

Submitted 19 November, 2020; originally announced November 2020.

Comments: ARIADNE Collaboration

Journal ref: Proceedings of the 3rd International Workshop on Microwave Cavities and Detectors for Axion Research 2020

arXiv:2011.07293 [pdf, ps, other]

Dynamical Effective Field Model for Interacting Ferrofluids: II. The proper relaxation time and effects of dynamic correlations

Authors: Angbo Fang

Abstract: The recently proposed dynamical effective field model (DEFM) is quantitatively accurate for describing dynamical magnetic response of ferrofluids. In paper I it is derived under the framework of dynamical density functional theory, via which the original ensemble of bare Brownian particles is mapped to an ensemble of dressed particles. However, it remains to clarify how the characteristic rotational relaxation time of a dressed particle, denoted by $τ_r$, is quantitatively related to that of a bare particle, denoted by $τ^0_r$. By building macro-micro connections via two different routes, I reveal that under some gentle assumptions $τ_r$ can be identified with the long-time rotational self-diffusion time. I further introduce two simple but useful integrated correlation factors, describing the effects of quasi-static (adiabatic) and dynamic (nonadiabatic) inter-particle correlations, respectively. In terms of both correlation factors I reformulate the dynamic magnetic susceptibility in an illuminating and elegant form. Remarkably, it shows that the macro-micro connection is established via two successive steps: a dynamical coarse-graining with nonadiabatic effects accounted for by the dynamic factor, followed by equilibrium statistical mechanical averaging captured by the static factor. Surprisingly, $τ_r/τ^0_r$ is found insensitive to changes of particle volume fraction. I provide a physical picture to explain it. Furthermore, an empirical formula is proposed to characterize the dependence of $τ_r/τ^0_r$ on dipole-dipole interaction strength. The DEFM supplemented with this formula leads to parameter-free predictions in good agreement with results from Brownian dynamics simulations. The theoretical developments presented in this paper may have important consequences to studies of ferrofluid dynamics in particular and other systems modelled by DDFTs in general.

Submitted 14 November, 2020; originally announced November 2020.

Comments: 28 pages, 3 figures

arXiv:2011.07287 [pdf, ps, other]

Dynamical Effective Field Model for Interacting Ferrofluids: I. Derivations for homogeneous, inhomogeneous, and polydisperse cases

Authors: Angbo Fang

Abstract: Quite recently I have proposed a nonperturbative dynamical effective field model (DEFM) to quantitatively describe the dynamics of interacting ferrofluids. Its predictions compare very well with the results from simulations. In this paper I put the DEFM on firm theoretical ground by deriving it within the framework of dynamical density functional theory (DDFT), in which the relevant part of correlation-induced free energy is approximated by a function of the instantaneous magnetization. The DEFM is generalized to inhomogeneous finite-size samples for which the macroscopic and mesoscopic scale separation is nontrivial due to the presence of long-range dipole-dipole interactions. The demagnetizing field naturally emerges from microscopic considerations and is consistently accounted for. The resulting particle dynamics on the mesoscopic scale only involves macroscopically local quantities such as local magnetization and Maxwell field. Nevertheless, the local demagnetizing field essentially couples to magnetization at distant macroscopic locations. Thus, a two-scale parallel algorithm, involving information transfer between different macroscopic locations, can be applied to fully resolve particle rotational dynamics in an inhomogeneous sample. I also derive the DEFM for polydisperse ferrofluids, in which the dynamics of particles belonging to different species can be strongly coupled to each other. I discuss the underlying assumptions in obtaining a thermodynamically consistent polydisperse magnetization relaxation equation, which is of the same generic form as that for monodisperse ferrofluids. The theoretical advances presented in this paper are important for both qualitative understanding and quantitative modeling of ferrofluid dynamics.

Submitted 17 November, 2020; v1 submitted 14 November, 2020; originally announced November 2020.

Comments: 28 pages, no figures

arXiv:2011.07277 [pdf, ps, other]

doi 10.1039/C9SM02072A

Generic Theory of the Dynamic Magnetic Response of Ferrofluids

Authors: Angbo Fang

Abstract: Ferrofluids belong to an important class of highly functional soft matter, benefiting from their magnetically controllable physical properties. Therefore, it is of central importance to quantitatively predict the dynamic magnetic response of ferrofluids. Traditional dynamic theories, however, are often restricted to the near-equilibrium regime and/or only apply to nearly ideal ferrofluids that are monodisperse, dilute enough, and weakly interacting. In this paper I develop a self-consistent and nonperturbative dynamical mean field theory for typical ferrofluids which are often polydisperse, concentrated, and strongly interacting, possibly driven far from equilibrium. I obtain a general nonperturbative expression for the dynamic magnetic susceptibility, quantitatively agreeing with the spectra obtained from Brownian Dynamics simulations on both mono- and bidisperse samples. Furthermore, I derive a generic magnetization relaxation equation (MRE) for both mono- and polydisperse ferrofluids by employing the projection operator technique in nonequilibrium statistical mechanics. This MRE is in simple closed form and independent of which model is employed to approximate the equilibrium magnetization curve. Existing models can be recovered as low-order approximations of my generic and nonperturbative MRE. My theory can play a key role in studying the dynamics of ferrofluids and other polar fluids. It may also have substantial and immediate consequences to various ferrofluid applications.

Submitted 14 November, 2020; originally announced November 2020.

Comments: 8 pages, 3 figures, to appear in Soft Matter

arXiv:2010.01863 [pdf, other]

AE-Netv2: Optimization of Image Fusion Efficiency and Network Architecture

Authors: Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Beibei Qin, Yanning Zhang

Abstract: Existing image fusion methods pay little research attention to image fusion efficiency and network architecture. However, the efficiency and accuracy of image fusion have an important impact in practical applications. To solve this problem, we propose an efficient autonomous evolution image fusion method, dubbed AE-Netv2. Different from other image fusion methods based on deep learning, AE-Netv2 is inspired by the human brain cognitive mechanism. Firstly, we discuss the influence of different network architectures on image fusion quality and fusion efficiency, which provides a reference for the design of image fusion architecture. Secondly, we explore the influence of the pooling layer on the image fusion task and propose an image fusion method with a pooling layer. Finally, we explore the commonness and characteristics of different image fusion tasks, which provides a basis for further research on the continuous learning characteristics of the human brain in the field of image fusion. Comprehensive experiments demonstrate the superiority of AE-Netv2 compared with state-of-the-art methods in different fusion tasks at a real-time speed of 100+ FPS on GTX 2070. Among all tested methods based on deep learning, AE-Netv2 has the fastest speed, the smallest model size, and the best robustness.

Submitted 6 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: Some mistakes have been fixed

arXiv:2007.08763 [pdf, other]

AE-Net: Autonomous Evolution Image Fusion Method Inspired by Human Cognitive Mechanism

Authors: Aiqing Fang, Xinbo Zhao, Jiaqi Yang, Shihao Cao, Yanning Zhang

Abstract: In order to solve the robustness and generality problems of the image fusion task, inspired by the human brain cognitive mechanism, we propose a robust and general image fusion method with autonomous evolution ability, which is therefore denoted AE-Net. Through the collaborative optimization of multiple image fusion methods to simulate the cognitive process of the human brain, an unsupervised image fusion task can be transformed into a semi-supervised or supervised image fusion task, thus promoting the evolutionary ability of the network model weights. Firstly, the relationship between the human brain cognitive mechanism and the image fusion task is analyzed and a physical model is established to simulate the human brain cognitive mechanism. Secondly, we analyze existing image fusion methods and image fusion loss functions, select the image fusion methods with complementary features to construct the algorithm module, and establish the multi-loss joint evaluation function to obtain the optimal solution of the algorithm module. The optimal solution of each image is used to guide the weight training of the network model. Our image fusion method can effectively unify the cross-modal image fusion task and the same-modal image fusion task, and effectively overcome the difference of data distribution between different datasets. Finally, extensive numerical results verify the effectiveness and superiority of our method on a variety of image fusion datasets, including a multi-focus dataset, an infrared and visible dataset, a medical image dataset, and a multi-exposure dataset. Comprehensive experiments demonstrate the superiority of our image fusion method in robustness and generality. In addition, experimental results also demonstrate the effectiveness of the human brain cognitive mechanism in improving the robustness and generality of image fusion.

Submitted 17 July, 2020; originally announced July 2020.

arXiv:2006.14191 [pdf, other]

doi 10.1103/PhysRevResearch.2.043221

Robust superconductivity intertwined with charge density wave and disorder in Pd-intercalated ErTe$_3$

Authors: Alan Fang, Anisha G. Singh, Joshua A. W. Straquadine, Ian R. Fisher, Steven A. Kivelson, Aharon Kapitulnik

Abstract: Pd-intercalated ErTe$_3$ is studied as a model system to explore the effect of "intertwined" superconducting and charge density wave (CDW) orders. Despite the common wisdom that superconductivity emerges only when CDW is suppressed, we present data from STM and AC susceptibility measurements that show no direct competition between CDW order and superconductivity. Both coexist over most of the intercalation range, with uniform superconductivity over length scales that exceed the superconducting coherence length. This is despite persisting short-range CDW order and increased scattering from the Pd intercalation. While superconductivity is insensitive to local defects in either of the bi-directional CDWs, vestiges of the Fermi-level distortions are observed in the properties of the superconducting state.

Submitted 25 June, 2020; originally announced June 2020.

Comments: 10 pages, 6 figures

Journal ref: Phys. Rev. Research 2, 043221 (2020)

arXiv:2006.13331 [pdf, other]

Incorporating Music Knowledge in Continual Dataset Augmentation for Music Generation

Authors: Alisa Liu, Alexander Fang, Gaëtan Hadjeres, Prem Seetharaman, Bryan Pardo

Abstract: Deep learning has rapidly become the state-of-the-art approach for music generation. However, training a deep model typically requires a large training set, which is often not available for specific musical styles. In this paper, we present augmentative generation (Aug-Gen), a method of dataset augmentation for any music generation system trained on a resource-constrained domain. The key intuition of this method is that the training data for a generative system can be augmented by examples the system produces during the course of training, provided these examples are of sufficiently high quality and variety. We apply Aug-Gen to Transformer-based chorale generation in the style of J.S. Bach, and show that this allows for longer training and results in better generative output.

Submitted 20 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 2 pages, 2 figures, Machine Learning for Media Discovery (ML4MD) Workshop at ICML 2020

arXiv:2006.13329 [pdf, other]

Bach or Mock? A Grading Function for Chorales in the Style of J.S. Bach

Authors: Alexander Fang, Alisa Liu, Prem Seetharaman, Bryan Pardo

Abstract: Deep generative systems that learn probabilistic models from a corpus of existing music do not explicitly encode knowledge of a musical style, compared to traditional rule-based systems. Thus, it can be difficult to determine whether deep models generate stylistically correct output without expert evaluation, but this is expensive and time-consuming. Therefore, there is a need for automatic, interpretable, and musically-motivated evaluation measures of generated music. In this paper, we introduce a grading function that evaluates four-part chorales in the style of J.S. Bach along important musical features. We use the grading function to evaluate the output of a Transformer model, and show that the function is both interpretable and outperforms human experts at discriminating Bach chorales from model-generated ones.

Submitted 17 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

Comments: 2 pages, 3 figures, Machine Learning for Media Discovery (ML4MD) Workshop at ICML 2020

Showing 1–50 of 95 results for author: Fang, A