Search | arXiv e-print repository

Coupled Stochastic-Statistical Equations for Filtering Multiscale Turbulent Systems

Abstract: We present a new strategy for filtering high-dimensional multiscale systems characterized by high-order non-Gaussian statistics using observations from leading-order moments. A closed stochastic-statistical modeling framework suitable for systematic theoretical analysis and efficient numerical simulations is designed. Optimal filtering solutions are derived based on the explicit coupling structures of stochastic and statistical equations subject to linear operators, which satisfy an infinite-dimensional Kalman-Bucy filter with conditional Gaussian dynamics. To facilitate practical implementation, we develop a finite-dimensional stochastic filter model that approximates the optimal filter solution. We prove that this approximating filter effectively captures key non-Gaussian features, demonstrating consistent statistics with the optimal filter first in its analysis step update, then at the long-time limit guaranteeing stable convergence to the optimal filter. Finally, we build a practical ensemble filter algorithm based on the approximating filtering model, which enables accurate recovery of the true model statistics. The proposed modeling and filtering strategies are applicable to a wide range of challenging problems in science and engineering, particularly for statistical prediction and uncertainty quantification of multiscale turbulent states.
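For orientation, the classical finite-dimensional Kalman-Bucy filter named in the abstract can be written as below. This is the textbook linear Gaussian form, not the paper's coupled stochastic-statistical equations; the symbols $A$, $B$, $H$, $\Gamma$ (constant matrices) and the independent Wiener processes $W_t$, $V_t$ are assumptions introduced here for illustration.

```latex
% Signal and observation processes (assumed linear, with constant coefficients)
\begin{aligned}
  \mathrm{d}u_t &= A\,u_t\,\mathrm{d}t + B\,\mathrm{d}W_t, \\
  \mathrm{d}y_t &= H\,u_t\,\mathrm{d}t + \Gamma\,\mathrm{d}V_t.
\end{aligned}

% Filter mean \hat{u}_t and covariance P_t, with Kalman gain K_t
\begin{aligned}
  \mathrm{d}\hat{u}_t &= A\,\hat{u}_t\,\mathrm{d}t
      + K_t\left(\mathrm{d}y_t - H\,\hat{u}_t\,\mathrm{d}t\right),
      \qquad K_t = P_t H^{\top}\left(\Gamma\Gamma^{\top}\right)^{-1}, \\
  \frac{\mathrm{d}P_t}{\mathrm{d}t} &= A P_t + P_t A^{\top} + B B^{\top}
      - P_t H^{\top}\left(\Gamma\Gamma^{\top}\right)^{-1} H P_t.
\end{aligned}
```

The covariance equation is a matrix Riccati equation; the last term is the reduction in uncertainty due to the observations.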

Submitted 5 July, 2024; originally announced July 2024.

Comments: 35 pages

arXiv:2406.11434 [pdf, other]

DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

Abstract: Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper, we present DB-GPT-Hub, an open benchmark suite for LLM-empowered text-to-SQL, which primarily focuses on tuning LLMs at large scales. The proposed benchmark consists of: 1. a standardized and comprehensive evaluation of text-to-SQL tasks by fine-tuning medium to large-sized open LLMs; 2. a modularized and easy-to-extend codebase with mainstream LLMs and experimental scenarios supported, which prioritizes fine-tuning methods but can be easily extended to prompt-based settings. Our work investigates the potential gains and the performance boundaries of tuning approaches compared to prompting approaches, and explores optimal solutions tailored to specific scenarios. We hope DB-GPT-Hub, along with these findings, enables further research and broad applications that would otherwise be difficult owing to the absence of a dedicated open benchmark. The project code has been released at https://github.com/eosphoros-ai/DB-GPT-Hub.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10839 [pdf, other]

Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

Authors: Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li

Abstract: Despite recent advances in the general visual instruction-following ability of Multimodal Large Language Models (MLLMs), they still struggle with critical problems when required to provide a precise and detailed response to a visual instruction: (1) failure to identify novel objects or entities, (2) mention of non-existent objects, and (3) neglect of objects' attribute details. Intuitive solutions include improving the size and quality of data or using larger foundation models. They show effectiveness in mitigating these issues, but at the expensive cost of collecting a vast amount of new data and introducing a significantly larger model. Standing at the intersection of these approaches, we examine the three object-oriented problems from the perspective of the image-to-text mapping process by the multimodal connector. In this paper, we first identify the limitations of multimodal connectors stemming from insufficient training data. Driven by this, we propose to enhance the mapping with retrieval-augmented tag tokens, which contain rich object-aware information such as object names and attributes. With our Tag-grounded visual instruction tuning with retrieval Augmentation (TUNA), we outperform baselines that share the same language model and training data on 12 benchmarks. Furthermore, we show the zero-shot capability of TUNA when provided with specific datastores.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 18 pages, 11 figures

arXiv:2405.17790 [pdf, other]

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where 6 existing ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods, e.g., task-specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For the task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on the OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID

Submitted 27 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

arXiv:2405.12830 [pdf]

Pick-and-place transfer of arbitrary-metal electrodes for van der Waals device fabrication

Authors: Kaijian Xing, Daniel McEwen, Weiyao Zhao, Abdulhakim Bake, David Cortie, Jingying Liu, Thi-Hai-Yen Vu, James Hone, Alastair Stacey, Mark T. Edmonds, Kenji Watanabe, Takashi Taniguchi, Qingdong Ou, Dong-Chen Qi, Michael S. Fuhrer

Abstract: Van der Waals electrode integration is a promising strategy to create near-perfect interfaces between metals and two-dimensional materials, with advantages such as eliminating Fermi-level pinning and reducing contact resistance. However, the lack of a simple, generalizable pick-and-place transfer technology has greatly hampered the wide use of this technique. We demonstrate the pick-and-place transfer of pre-fabricated electrodes from reusable polished hydrogenated diamond substrates without the use of any surface treatments or sacrificial layers. The technique enables transfer of large-scale arbitrary metal electrodes, as demonstrated by successful transfer of eight different elemental metals with work functions ranging from 4.22 to 5.65 eV. The mechanical transfer of metal electrodes from diamond onto van der Waals materials creates atomically smooth interfaces with no interstitial impurities or disorder, as observed with cross-sectional high-resolution transmission electron microscopy and energy-dispersive X-ray spectroscopy. As a demonstration of its device application, we use the diamond-transfer technique to create metal contacts to monolayer transition metal dichalcogenide semiconductors with high-work-function Pd, low-work-function Ti, and semimetal Bi to create n- and p-type field-effect transistors with low Schottky barrier heights. We also extend this technology to other applications such as ambipolar transistors and optoelectronics, paving the way for new device architectures and high-performance devices.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.10209 [pdf, other]

Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interaction tasks to enhance user experience and accessibility. DB-GPT is designed to understand data interaction tasks described by natural language and provide context-aware responses powered by LLMs, making it an indispensable tool for users ranging from novice to expert. Its system design supports deployment across local, distributed, and cloud environments. Beyond handling basic data interaction tasks like Text-to-SQL with LLMs, it can handle complex tasks like generative data analysis through a Multi-Agents framework and the Agentic Workflow Expression Language (AWEL). The Service-oriented Multi-model Management Framework (SMMF) ensures data privacy and security, enabling users to employ DB-GPT with private LLMs. Additionally, DB-GPT offers a series of product-ready features designed to enable users to integrate DB-GPT within their product environments easily. The code of DB-GPT is available on GitHub (https://github.com/eosphoros-ai/DB-GPT), which already has over 10.7k stars.
Please install DB-GPT for your own usage with the instructions (https://github.com/eosphoros-ai/DB-GPT#install) and watch a 5-minute introduction video on YouTube (https://youtu.be/n_8RI1ENyl4) to further investigate DB-GPT.

Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.02617 [pdf, other]

Neural Radiance Fields with Torch Units

Authors: Bingnan Ni, Huanyu Wang, Dongfeng Bai, Minghe Weng, Dexin Qi, Weichao Qiu, Bingbing Liu

Abstract: Neural Radiance Fields (NeRF) give rise to learning-based 3D reconstruction methods widely used in industrial applications. Although prevalent methods achieve considerable improvements in small-scale scenes, accomplishing reconstruction in complex and large-scale scenes is still challenging. First, the background in complex scenes shows a large variance among different views. Second, the current inference pattern, i.e., a pixel only relies on an individual camera ray, fails to capture contextual information. To solve these problems, we propose to enlarge the ray perception field and build up interactions among sample points. In this paper, we design a novel inference pattern that encourages a single camera ray to possess more contextual information, and models the relationship among sample points on each camera ray. To hold contextual information, a camera ray in our proposed method can render a patch of pixels simultaneously. Moreover, we replace the MLP in neural radiance field models with distance-aware convolutions to enhance the feature propagation among sample points from the same camera ray. To summarize, like a torchlight, a ray in our proposed method renders a patch of the image. Thus, we call the proposed method Torch-NeRF. Extensive experiments on KITTI-360 and LLFF show that Torch-NeRF exhibits excellent performance.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.19369 [pdf, other]

RAIL: Robot Affordance Imagination with Large Language Models

Authors: Ceng Zhang, Xin Meng, Dongchen Qi, Gregory S. Chirikjian

Abstract: This paper introduces an automatic affordance reasoning paradigm tailored to minimal semantic inputs, addressing the critical challenges of classifying and manipulating unseen classes of objects in household settings. Inspired by human cognitive processes, our method integrates generative language models and physics-based simulators to foster analytical thinking and creative imagination of novel affordances. Structured with a tripartite framework consisting of analysis, imagination, and evaluation, our system "analyzes" the requested affordance names into interaction-based definitions, "imagines" the virtual scenarios, and "evaluates" the object affordance. If an object is recognized as possessing the requested affordance, our method also predicts the optimal pose for such functionality, and how a potential user can interact with it. Tuned on only a few synthetic examples across 3 affordance classes, our pipeline achieves a very high success rate on affordance classification and functional pose prediction of 8 classes of novel objects, outperforming learning-based baselines. Validation through real-robot manipulation experiments demonstrates the practical applicability of the imagined user interaction, showcasing the system's ability to independently conceptualize unseen affordances and interact with new objects and scenarios in everyday settings.

Submitted 7 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.08291 [pdf, other]

CleanAgent: Automating Data Standardization with LLM-based Agents

Authors: Danrui Qi, Jiannan Wang

Abstract: Data standardization is a crucial part of the data science life cycle. While tools like Pandas offer robust functionalities, their complexity and the manual effort required for customizing code to diverse column types pose significant challenges. Although large language models (LLMs) like ChatGPT have shown promise in automating this process through natural language understanding and code generation, it still demands expert-level programming knowledge and continuous interaction for prompt refinement. To solve these challenges, our key idea is to propose a Python library with declarative, unified APIs for standardizing column types, simplifying the code generation of LLMs with concise API calls. We first propose Dataprep.Clean, which is written as a component of the Dataprep library and offers a significant reduction in complexity by enabling the standardization of specific column types with a single line of code. Then we introduce the CleanAgent framework, integrating Dataprep.Clean and LLM-based agents to automate the data standardization process. With CleanAgent, data scientists need only provide their requirements once, allowing for a hands-free, automatic standardization process.

Submitted 24 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.06367 [pdf, other]

FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables

Authors: Danrui Qi, Weiling Zheng, Jiannan Wang

Abstract: Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development. To augment good features, data scientists need to come up with SQL queries manually, which is time-consuming. Featuretools [1] is a widely used tool by the data science community to automatically augment the training data by extracting new features from relevant tables. It represents each feature as a group-by aggregation SQL query on relevant tables and can automatically generate these SQL queries. However, it does not include predicates in these queries, which significantly limits its application in many real-world scenarios. To overcome this limitation, we propose FEATAUG, a new feature augmentation framework that automatically extracts predicate-aware SQL queries from one-to-many relationship tables. This extension is not trivial because considering predicates will exponentially increase the number of candidate queries. As a result, the original Featuretools framework, which materializes all candidate queries, will not work and needs to be redesigned. We formally define the problem and model it as a hyperparameter optimization problem. We discuss how Bayesian optimization can be applied here and propose a novel warm-up strategy to optimize it. To make our algorithm more practical, we also study how to identify promising attribute combinations for predicates. We show how the beam search idea can partially solve the problem and propose several techniques to further optimize it.
Our experiments on four real-world datasets demonstrate that FeatAug extracts more effective features compared to Featuretools and other baselines. The code is open-sourced at https://github.com/sfu-db/FeatAug
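To make the distinction in this abstract concrete, the following minimal sketch contrasts a plain group-by aggregation feature with a predicate-aware one, using Python's standard-library sqlite3 module. The tables, column names, and the chosen predicate are hypothetical and are not taken from FeatAug or Featuretools.

```python
import sqlite3

# Hypothetical one-to-many schema: each user has many orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (user_id INTEGER, amount REAL, category TEXT);
    INSERT INTO users VALUES (1), (2);
    INSERT INTO orders VALUES
        (1, 10.0, 'books'), (1, 30.0, 'games'),
        (2, 5.0, 'books'),  (2, 7.0, 'books');
""")

# Feature without a predicate: average order amount per user.
plain_feature = dict(conn.execute(
    "SELECT user_id, AVG(amount) FROM orders GROUP BY user_id"
).fetchall())

# Predicate-aware feature: the same aggregation, restricted by a WHERE clause.
predicate_feature = dict(conn.execute(
    "SELECT user_id, AVG(amount) FROM orders "
    "WHERE category = ? GROUP BY user_id",
    ("books",),
).fetchall())

print(plain_feature)
print(predicate_feature)
```

Each choice of predicate attribute and value yields a different candidate feature, which is why the abstract notes that the number of candidate queries grows rapidly once predicates are allowed.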

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2402.13942 [pdf, other]

doi 10.1063/5.0207687

The Maintenance of Coherent Vortex Topology by Lagrangian Chaos in Drift-Rossby Wave Turbulence

Authors: Norman M. Cao, Di Qi

Abstract: This work introduces the "potential vorticity bucket brigade," a mechanism for explaining the resilience of vortex structures in magnetically confined fusion plasmas and geophysical flows. Drawing parallels with zonal jet formation, we show how inhomogeneous patterns of mixing can reinforce, rather than destroy, non-zonal flow structure. We accomplish this through an exact stochastic Lagrangian representation of vorticity transport, together with a near-integrability property, which relates coherent flow topology to fluid relabeling symmetries. We demonstrate these ideas in the context of gradient-driven magnetized plasma turbulence, though the tools we develop here are model-agnostic and applicable beyond the system studied here.

Submitted 3 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Journal ref: Physics of Fluids 36, 061701 (2024)

arXiv:2401.10356 [pdf, ps, other]

Mean Field Games for Controlling Coherent Structures in Nonlinear Fluid Systems

Authors: Yuan Gao, Di Qi

Abstract: This paper discusses the control of coherent structures in turbulent flows, which has broad applications among complex systems in science and technology. Mean field games have proved to be a powerful tool and are proposed here to control the stochastic Lagrangian tracers as players tracking the flow field. We derive optimal control solutions for general nonlinear fluid systems using mean field game models, and develop computational algorithms to efficiently solve the resulting coupled forward and backward mean field system. A precise link is established for the control of Lagrangian tracers and the scalar vorticity field based on the functional Hamilton-Jacobi equations derived from the mean field models. A new iterative numerical strategy is then constructed to compute the optimal solution with fast convergence. We verify the skill of the mean field control models and illustrate their practical efficiency on a prototype model modified from the viscous Burgers' equation under various cost functions in both deterministic and stochastic formulations. The good model performance implies potential effectiveness of the strategy for more general high-dimensional turbulent systems.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 26 pages, 8 figures

arXiv:2401.02241 [pdf, other]

Slot-guided Volumetric Object Radiance Fields

Authors: Di Qi, Tong Yang, Xiangyu Zhang

Abstract: We present a novel framework for 3D object-centric representation learning. Our approach effectively decomposes complex scenes into individual objects from a single image in an unsupervised fashion. This method, called slot-guided Volumetric Object Radiance Fields (sVORF), composes volumetric object radiance fields with object slots as guidance to implement unsupervised 3D scene decomposition. Specifically, sVORF obtains object slots from a single image via a transformer module, maps these slots to volumetric object radiance fields with a hypernetwork, and composes object radiance fields with the guidance of object slots at a 3D location. Moreover, sVORF significantly reduces the memory requirement due to small-sized pixel rendering during training. We demonstrate the effectiveness of our approach by showing top results in scene decomposition and generation tasks on complex synthetic datasets (e.g., Room-Diverse). Furthermore, we also confirm the potential of sVORF to segment objects in real-world scenes (e.g., the LLFF dataset). We hope our approach can provide preliminary understanding of the physical world and help ease future research in 3D object-centric representation learning.

Submitted 4 January, 2024; originally announced January 2024.

Comments: NeurIPS 2023

arXiv:2312.17449 [pdf, other]

DB-GPT: Empowering Database Interactions with Private Large Language Models

Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user experience and accessibility. DB-GPT is designed to understand natural language queries, provide context-aware responses, and generate complex SQL queries with high accuracy, making it an indispensable tool for users ranging from novice to expert. The core innovation in DB-GPT lies in its private LLM technology, which is fine-tuned on domain-specific corpora to maintain user privacy and ensure data security while offering the benefits of state-of-the-art LLMs. We detail the architecture of DB-GPT, which includes a novel retrieval augmented generation (RAG) knowledge system, an adaptive learning mechanism to continuously improve performance based on user feedback, and a service-oriented multi-model framework (SMMF) with powerful data-driven agents. Our extensive experiments and user studies confirm that DB-GPT represents a paradigm shift in database interactions, offering a more natural, efficient, and secure way to engage with data repositories. The paper concludes with a discussion of the implications of the DB-GPT framework on the future of human-database interaction and outlines potential avenues for further enhancements and applications in the field.
The project code is available at https://github.com/eosphoros-ai/DB-GPT. Experience DB-GPT for yourself by installing it with the instructions https://github.com/eosphoros-ai/DB-GPT#install and view a concise 10-minute video at https://www.youtube.com/watch?v=KYs4nTDzEhk.

Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2310.18698 [pdf, other]

Triplet Attention Transformer for Spatiotemporal Predictive Learning

Authors: Xuesong Nie, Xi Chen, Haoyuan Jin, Zhihang Zhu, Yunfeng Yan, Donglian Qi

Abstract: Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maintaining computational efficiency, we propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features. Specifically, the model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions. In this configuration: (i) temporal tokens contain abstract representations of inter-frame, facilitating the capture of inherent temporal dependencies; (ii) spatial and channel attention combine to refine the intra-frame representation by performing fine-grained interactions across spatial and channel dimensions. Alternating temporal, spatial, and channel-level attention allows our approach to learn more complex short- and long-range spatiotemporal dependencies. Extensive experiments demonstrate performance surpassing existing recurrent-based and recurrent-free methods, achieving state-of-the-art performance under multi-scenario examination including moving object trajectory prediction, traffic flow prediction, driving scene prediction, and human motion capture.

Submitted 28 October, 2023; originally announced October 2023.

Comments: Accepted to WACV 2024

arXiv:2310.02540 [pdf, other]

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

Authors: Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang

Abstract: Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution; thus feature preprocessing, which transforms features from one distribution to another, is a crucial step to ensure good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists need to make difficult decisions about which preprocessors to select and in which order to compose them. In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Due to the large search space, a brute-force solution is prohibitively expensive. To address this challenge, we interestingly observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem. We conduct a comprehensive evaluation and analysis of 15 algorithms on 45 public ML datasets. Overall, evolution-based algorithms show the leading average ranking. Surprisingly, random search turns out to be a strong baseline. Many surrogate-model-based and bandit-based search algorithms, which achieve good performance for HPO and NAS, do not outperform random search for Auto-FP. We analyze the reasons for our findings and conduct a bottleneck analysis to identify the opportunities to improve these algorithms. Furthermore, we explore how to extend Auto-FP to support parameter search and compare two ways to achieve this goal.
In the end, we evaluate Auto-FP in an AutoML context and discuss the limitations of popular AutoML tools. To the best of our knowledge, this is the first study on automated feature preprocessing. We hope our work can inspire researchers to develop new algorithms tailored for Auto-FP.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2309.15764 [pdf, other]

doi 10.1063/5.0158013

Nearly integrable flows and chaotic tangles in the Dimits shift regime of plasma edge turbulence

Authors: Norman M. Cao, Di Qi

Abstract: Transitionally turbulent flows frequently exhibit spatiotemporal intermittency, reflecting a complex interplay between driving forces, dissipation, and transport present in these systems. When this intermittency manifests as observable structures and patterns in the flow, the characterization of turbulence in these systems becomes challenging due to the nontrivial correlations introduced into the statistics of the turbulence by these structures. In this work, we use tools from dynamical systems theory to study intermittency in the Dimits shift regime of the flux-balanced Hasegawa-Wakatani (BHW) equations, which models a transitional regime of resistive drift-wave turbulence relevant to magnetically confined fusion plasmas. First, we show in direct numerical simulations that turbulence in this regime is dominated by strong zonal flows and coherent drift-wave vortex structures which maintain a strong linear character despite their large amplitude. Using the framework of generalized Liouville integrability, we develop a theory of integrable Lagrangian flows in generic fluid and plasma systems and discuss how the observed zonal flows plus drift waves in the BHW system exhibit a form of ``near-integrability'' originating from a fluid element relabeling symmetry. We further demonstrate that the BHW flows transition from integrability to chaos via the formation of chaotic tangles in the aperiodic Lagrangian flow, and establish a direct link between the `lobes' associated with these tangles and intermittency in the observed turbulent dissipation. This illustrates how utilizing tools from deterministic dynamical systems theory to study convective nonlinearities can explain aspects of intermittent spatiotemporal structure exhibited by the statistics of turbulent fields.

Submitted 27 September, 2023; originally announced September 2023.

Journal ref: Phys. Plasmas 30, 092307 (2023)

arXiv:2309.06417 [pdf, other]

The trigger system for the CSR external-target experiment

Authors: Dong Guo, Haoqian Xyu, DongDong Qi, HeXiang Wang, Lei Zhang, Zhengyang Sun, Zhi Qin, Botan Wang, Yingjie Zhou, Zekun Wang, Yuansheng Yang, Yuhao Qin, Xianglun Wei, Herun Yang, Yuhong Yu, Lei Zhao, Zhigang Xiao

Abstract: A trigger system has been designed and implemented for the HIRFL-CSR external target experiment (CEE), the spectrometer for studying nuclear matter properties with heavy ion collisions in the GeV energy region. The system adopts master-slave structure and serial data transmission mode using optical fiber to deal with different types of detectors and long-distance signal transmission. The trigger logic can be accessed based on command register and controlled by a remote computer. The overall field programmable gate array (FPGA) logic can be flexibly reconfigured online to match the physical requirements of the experiment. The trigger system has been tested in beam experiment. It is demonstrated that the trigger system functions correctly and meets the physical requirements of CEE.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.02835 [pdf]

A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hundred, and plays a revolutionary role in single-shot ultrafast optical imaging. However, due to the ultra-high data compression ratio induced by the extremely large sequence depth as well as the limited fidelities of traditional reconstruction algorithms over the reconstruction process, HCUP suffers from a poor image reconstruction quality and fails to capture fine structures in complex transient scenes. To overcome these restrictions, we propose a flexible image reconstruction algorithm based on the total variation (TV) and cascaded denoisers (CD) for HCUP, named the TV-CD algorithm. It applies the TV denoising model cascaded with several advanced deep learning-based denoising models in the iterative plug-and-play alternating direction method of multipliers framework, which can preserve the image smoothness while utilizing the deep denoising networks to obtain more priori, and thus solving the common sparsity representation problem in local similarity and motion compensation. Both simulation and experimental results show that the proposed TV-CD algorithm can effectively improve the image reconstruction accuracy and quality of HCUP, and further promote the practical applications of HCUP in capturing high-dimensional complex physical, chemical and biological ultrafast optical scenes.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 25 pages, 5 figures and 1 table

arXiv:2308.12315 [pdf, other]

Trustworthy Representation Learning Across Domains

Authors: Ronghang Zhu, Dongliang Guo, Daiqing Qi, Zhixuan Chu, Xiang Yu, Sheng Li

Abstract: As AI systems have obtained significant performance to be deployed widely in our daily live and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, plenty of researches have been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important parts for AI systems and representation learning is the fundamental technology in machine learning. How to make the representation learning trustworthy in real-world application, e.g., cross domain scenarios, is very valuable and necessary for both machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we proposed the first trustworthy representation learning across domains framework which includes four concepts, i.e, robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from four concepts. Finally, we conclude this survey with insights and discussions on future research directions.

Submitted 29 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: 38 pages, 15 figures

ACM Class: A.1

arXiv:2307.15637 [pdf, other]

Effective Statistical Control Strategies for Complex Turbulent Dynamical Systems

Authors: Jeffrey Covington, Di Qi, Nan Chen

Abstract: Control of complex turbulent dynamical systems involving strong nonlinearity and high degrees of internal instability is an important topic in practice. Different from traditional methods for controlling individual trajectories, controlling the statistical features of a turbulent system offers a more robust and efficient approach. Crude first-order linear response approximations were typically employed in previous works for statistical control with small initial perturbations. This paper aims to develop two new statistical control strategies for scenarios with more significant initial perturbations and stronger nonlinear responses, allowing the statistical control framework to be applied to a much wider range of problems. First, higher-order methods, incorporating the second-order terms, are developed to resolve the full control-forcing relation. The corresponding changes to recovering the forcing perturbation effectively improve the performance of the statistical control strategy. Second, a mean closure model for the mean response is developed, which is based on the explicit mean dynamics given by the underlying turbulent dynamical system. The dependence of the mean dynamics on higher-order moments is closed using linear response theory but for the response of the second-order moments to the forcing perturbation rather than the mean response directly. The performance of these methods is evaluated extensively on prototype nonlinear test models, which exhibit crucial turbulent features, including non-Gaussian statistics and regime switching with large initial perturbations. The numerical results illustrate the feasibility of different approaches due to their physical and statistical structures and provide detailed guidelines for choosing the most suitable method based on the model properties.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2306.10026 [pdf, ps, other]

High-order Moment Closure Models with Random Batch Method for Efficient Computation of Multiscale Turbulent Systems

Authors: Di Qi, Jian-Guo Liu

Abstract: We propose a high-order stochastic-statistical moment closure model for efficient ensemble prediction of leading-order statistical moments and probability density functions in multiscale complex turbulent systems. The statistical moment equations are closed by a precise calibration of the high-order feedbacks using ensemble solutions of the consistent stochastic equations, suitable for modeling complex phenomena including non-Gaussian statistics and extreme events. To address challenges associated with closely coupled spatio-temporal scales in turbulent states and expensive large ensemble simulation for high-dimensional systems, we introduce efficient computational strategies using the random batch method (RBM). This approach significantly reduces the required ensemble size while accurately capturing essential high-order structures. Only a small batch of small-scale fluctuation modes is used for each time update of the samples, and exact convergence to the full model statistics is ensured through frequent resampling of the batches during time evolution. Furthermore, we develop a reduced-order model to handle systems with really high dimension by linking the large number of small-scale fluctuation modes to ensemble samples of dominant leading modes. The effectiveness of the proposed models is validated by numerical experiments on the one-layer and two-layer Lorenz '96 systems, which exhibit representative chaotic features and various statistical regimes. The full and reduced-order RBM models demonstrate uniformly high skill in capturing the time evolution of crucial leading-order statistics, non-Gaussian probability distributions, while achieving significantly lower computational cost compared to direct Monte-Carlo approaches.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 31 pages, 11 figures

arXiv:2306.07520 [pdf, other]

Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

Authors: Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Our instruct-ReID is a more general ReID setting, where existing 6 ReID tasks can be viewed as special cases by designing different instructions. We propose a large-scale OmniReID benchmark and an adaptive triplet loss as a baseline method to facilitate research in this new setting. Experimental results show that the proposed multi-purpose ReID model, trained on our OmniReID benchmark without fine-tuning, can improve +0.5%, +0.6%, +7.7% mAP on Market1501, MSMT17, CUHK03 for traditional ReID, +6.4%, +7.1%, +11.2% mAP on PRCC, VC-Clothes, LTCC for clothes-changing ReID, +11.7% mAP on COCAS+ real2 for clothes template based clothes-changing ReID when using only RGB images, +24.9% mAP on COCAS+ real2 for our newly defined language-instructed ReID, +4.3% on LLCM for visible-infrared ReID, +2.6% on CUHK-PEDES for text-to-image ReID. The datasets, the model, and code will be available at https://github.com/hwz-zju/Instruct-ReID.

Submitted 31 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.01966 [pdf, other]

doi 10.1364/OL.487582

Experimental upstream transmission of continuous variable quantum key distribution access network

Authors: Xiangyu Wang, Ziyang Chen, Zhenghua Li, Dengke Qi, Song Yu, Hong Guo

Abstract: Continuous-variable quantum key distribution which can be implemented using only low-cost and off-the-shelf components reveals great potential in the practical large-scale realization. Access network as a modern network necessity, connects multiple end-users to the network backbone. In this work, we demonstrate the first upstream transmission quantum access networks using continuous-variable quantum key distribution. A two-end-user quantum access network is then experimentally realized. Through phase compensation, data synchronization and other technical upgrades, we achieve 390kbps secret key rate of the total network. In addition, we extend the case of two-end-user quantum access network to the case of multiple users, and analyze the network capacity in the case of multiple users by measuring the additive excess noise from different time slots.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 4 pages, 3 figures

arXiv:2303.02936 [pdf, other]

UniHCP: A Unified Model for Human-Centric Perceptions

Authors: Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

Abstract: Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant semantic aspect to focus on, they also share the same underlying semantic structure of the human body. However, few works have attempted to exploit such homogeneity and design a general-propose model for human-centric tasks. In this work, we revisit a broad range of human-centric tasks and unify them in a minimalist manner. We propose UniHCP, a Unified Model for Human-Centric Perceptions, which unifies a wide range of human-centric tasks in a simplified end-to-end manner with the plain vision transformer architecture. With large-scale joint training on 33 human-centric datasets, UniHCP can outperform strong baselines on several in-domain and downstream tasks by direct evaluation. When adapted to a specific task, UniHCP achieves new SOTAs on a wide range of human-centric tasks, e.g., 69.8 mIoU on CIHP for human parsing, 86.18 mA on PA-100K for attribute prediction, 90.3 mAP on Market1501 for ReID, and 85.8 JI on CrowdHuman for pedestrian detection, performing better than specialized models tailored for each task.

Submitted 22 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted for publication at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR 2023)

arXiv:2302.13001 [pdf, other]

Better Generative Replay for Continual Federated Learning

Authors: Daiqing Qi, Handong Zhao, Sheng Li

Abstract: Federated learning is a technique that enables a centralized server to learn from distributed clients via communications without accessing the client local data. However, existing federated learning works mainly focus on a single task scenario with static data. In this paper, we introduce the problem of continual federated learning, where clients incrementally learn new tasks and history data cannot be stored due to certain reasons, such as limited storage and data retention policy. Generative replay based methods are effective for continual learning without storing history data, but adapting them for this setting is challenging. By analyzing the behaviors of clients during training, we find that the unstable training process caused by distributed training on non-IID data leads to a notable performance degradation. To address this problem, we propose our FedCIL model with two simple but effective solutions: model consolidation and consistency enforcement. Our experimental results on multiple benchmark datasets demonstrate that our method significantly outperforms baselines.

Submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.11461 [pdf, other]

Saliency Guided Contrastive Learning on Scene Images

Authors: Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

Abstract: Self-supervised learning holds promise in leveraging large numbers of unlabeled data. However, its success heavily relies on the highly-curated dataset, e.g., ImageNet, which still needs human cleaning. Directly learning representations from less-curated scene images is essential for pushing self-supervised learning to a higher level. Different from curated images which include simple and clear semantic information, scene images are more complex and mosaic because they often include complex scenes and multiple objects. Despite being feasible, recent works largely overlooked discovering the most discriminative regions for contrastive learning to object representations in scene images. In this work, we leverage the saliency map derived from the model's output during learning to highlight these discriminative regions and guide the whole contrastive learning. Specifically, the saliency map first guides the method to crop its discriminative regions as positive pairs and then reweighs the contrastive losses among different crops by its saliency scores. Our method significantly improves the performance of self-supervised learning on scene images by +1.1, +4.3, +2.2 Top1 accuracy in ImageNet linear evaluation, Semi-supervised learning with 1% and 10% ImageNet labels, respectively. We hope our insights on saliency maps can motivate future research on more general-purpose unsupervised representation learning from scene data.

Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2106.11952 by other authors

arXiv:2302.02100 [pdf]

Single-shot polarization-resolved ultrafast mapping photography

Authors: Pengpeng Ding, Dalong Qi, Yunhua Yao, Yilin He, Jiali Yao, Chengzhi Jin, Zihan Guo, Lianzhong Deng, Zhenrong Sun, Shian Zhang

Abstract: Single-shot ultrafast optical imaging plays a very important role in the detection of transient scenes, especially in capturing irreversible or stochastic dynamic scenes. To break the limit of time response speed of electronic devices, such as charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) detectors, ultrafast optical imaging techniques usually convert the time information of a transient scene into the wavelength, angle, space or spatial frequency of the illumination light in previous studies. In this work, we propose a novel polarization-resolved ultrafast mapping photography (PUMP) technique by converting the time information into the polarization. Here, the spatiotemporal information of a dynamic scene is loaded into a rotationally polarized illumination laser pulse, and a polarization filtering in imaging detection and a deconvolution algorithm in image reconstruction are used to extract the original dynamic scene. In our PUMP system, the temporal resolution is 850 fs, the spatial resolution is 28.5 lp/mm at 700 micrometer by 700 micrometer field of view, and the number of frames is 16. By using PUMP, a spatiotemporal dynamics of femtosecond laser ablation in an indium tin oxide film on glass substrate is successfully captured. PUMP provides a new solution for measuring the transient scenes in a snapshot, which will bring a very wide range of applications in the field of ultrafast science.

Submitted 4 February, 2023; originally announced February 2023.

arXiv:2210.00373 [pdf, other]

doi 10.1063/5.0129127

A Random Batch Method for Efficient Ensemble Forecasts of Multiscale Turbulent Systems

Authors: Di Qi, Jian-Guo Liu

Abstract: A new efficient ensemble prediction strategy is developed for a general turbulent model framework with emphasis on the nonlinear interactions between large and small scale variables. The high computational cost in running large ensemble simulations of high dimensional equations is effectively avoided by adopting a random batch decomposition of the wide spectrum of the fluctuation states which is a characteristic feature of the multiscale turbulent systems. The time update of each ensemble sample is then only subject to a small portion of the small-scale fluctuation modes in one batch, while the true model dynamics with multiscale coupling is respected by frequent random resampling of the batches at each time updating step. We investigate both theoretical and numerical properties of the proposed method. First, the convergence of statistical errors in the random batch model approximation is shown rigorously independent of the sample size and full dimension of the system. Then, the forecast skill of the computational algorithm is tested on two representative models of turbulent flows exhibiting many key statistical phenomena with direct link to realistic turbulent systems. The random batch method displays robust performance in capturing a series of crucial statistical features of general interests including highly non-Gaussian fat-tailed probability distributions and intermittent bursts of instability, while requires a much lower computational cost than the direct ensemble approach. The efficient random batch method also facilitates the development of new strategies in uncertainty quantification and data assimilation for a wide variety of complex turbulent systems in science and engineering.

Submitted 1 October, 2022; originally announced October 2022.

Comments: 23 pages, 6 figures

arXiv:2208.10612 [pdf, other]

doi 10.1016/j.jcp.2023.112085

A Data-Driven Statistical-Stochastic Surrogate Modeling Strategy for Complex Nonlinear Non-stationary Dynamics

Authors: Di Qi, John Harlim

Abstract: We propose a statistical-stochastic surrogate modeling approach to predict the response of the mean and variance statistics under various initial conditions and external forcing perturbations. The proposed modeling framework extends the purely statistical modeling approach that is practically limited to the homogeneous statistical regime for high-dimensional state variables. The new closure system allows one to overcome several practical issues that emerge in the non-homogeneous statistical regimes. First, the proposed ensemble modeling that couples the mean statistics and stochastic fluctuations naturally produces positive-definite covariance matrix estimation, which is a challenging issue that hampers the purely statistical modeling approaches. Second, the proposed closure model, which embeds a non-Markovian neural-network model for the unresolved fluxes such that the variance of the dynamics is consistent, overcomes the inherent instability of the stochastic fluctuation dynamics. Effectively, the proposed framework extends the classical stochastic parametric modeling paradigm for the unresolved dynamics to a semi-parametric parameterization with a residual Long-Short-Term-Memory neural network architecture. Third, based on empirical information metric, we provide an efficient and effective training procedure by fitting a loss function that measures the differences between response statistics. Supporting numerical examples are provided with the Lorenz-96 model, a system of ODEs that admits the characteristic of chaotic dynamics with both homogeneous and inhomogeneous statistical regimes. In the latter case, we will see the effectiveness of the statistical prediction even though the resolved Fourier modes corresponding to the leading mean energy and variance spectra do not coincide.

Submitted 1 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 11 figures

arXiv:2206.06293 [pdf, other]

Learning Domain Adaptive Object Detection with Probabilistic Teacher

Authors: Meilin Chen, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, Shiliang Pu

Abstract: Self-training for unsupervised domain adaptive object detection is a challenging task, of which the performance depends heavily on the quality of pseudo boxes. Despite the promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed as Probabilistic Teacher (PT), which aims to capture the uncertainty of unlabeled target data from a gradually evolving teacher and guides the learning of a student in a mutually beneficial manner. Specifically, we propose to leverage the uncertainty-guided consistency training to promote classification adaptation and localization adaptation, rather than filtering pseudo boxes via an elaborate confidence threshold. In addition, we conduct anchor adaptation in parallel with localization adaptation, since anchor can be regarded as a learnable parameter. Together with this framework, we also present a novel Entropy Focal Loss (EFL) to further facilitate the uncertainty-guided self-training. Equipped with EFL, PT outperforms all previous baselines by a large margin and achieve new state-of-the-arts.

Submitted 13 June, 2022; originally announced June 2022.

Comments: To appear in ICML 2022. Code is coming soon: https://github.com/hikvision-research/ProbabilisticTeacher

Journal ref: International Conference on Machine Learning (ICML), 2022

arXiv:2204.08547 [pdf, other]

A Physics-Informed Data-Driven Algorithm for Ensemble Forecast of Complex Turbulent Systems

Authors: Nan Chen, Di Qi

Abstract: A new ensemble forecast algorithm, named as the physics-informed data-driven algorithm with conditional Gaussian statistics (PIDD-CG), is developed to predict the time evolution of the probability density functions (PDFs) of complex turbulent systems with partial observations. The PIDD-CG algorithm integrates a unique multiscale statistical closure model with an extremely efficient nonlinear data assimilation scheme to represent the PDF as a mixture of conditional statistics, which overcomes the curse of dimensionality for high-dimensional systems. The multiscale features in the time evolution of each conditional statistics ensemble member are effectively captured by an appropriate combination of physics-informed analytic formulae and recurrent neural networks. An information metric is adopted as the loss function for the latter to more accurately calibrate the key turbulent signals with strong fluctuations. The proposed algorithm succeeds in forecasting both the transient and statistical equilibrium non-Gaussian PDFs of strongly turbulent systems with intermittency, regime switching and extreme events.

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2204.02574 [pdf, other]

FocalClick: Towards Practical Interactive Image Segmentation

Authors: Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao

Abstract: Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low power devices; second, they perform poorly when used to refine preexisting masks as they could not avoid destroying the correct part. FocalClick solves both issues at once by predicting and updating the mask in localized areas. For higher efficiency, we decompose the slow prediction on the entire image into two fast inferences on small crops: a coarse segmentation on the Target Crop, and a local refinement on the Focus Crop. To make the model work with preexisting masks, we formulate a sub-task termed Interactive Mask Correction, and propose Progressive Merge as the solution. Progressive Merge exploits morphological information to decide where to preserve and where to update, enabling users to refine any preexisting mask effectively. FocalClick achieves competitive results against SOTA methods with significantly smaller FLOPs. It also shows significant superiority when making corrections on preexisting masks. Code and data will be released at github.com/XavierCHEN34/ClickSEG

Submitted 17 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: CVPR2022

arXiv:2203.12301 [pdf]

Lane detection with Position Embedding

Authors: Jun Xie, Jiacheng Han, Dezhen Qi, Feng Chen, Kaer Huang, Jianwei Shuai

Abstract: Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. The Tusimple dataset does not contain overly complicated scenes, and its lanes have more prominent spatial features. On the basis of RESA, we introduce the method of position embedding to enhance the spatial features. The experimental results show that this method achieves the best accuracy of 96.93% on the Tusimple dataset.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2112.13809 [pdf, other]

Improving Deep Image Matting via Local Smoothness Assumption

Authors: Rui Wang, Jun Xie, Jiacheng Han, Dezhen Qi

Abstract: Natural image matting is a fundamental and challenging computer vision task. Conventionally, the problem is formulated as an underconstrained problem. Since the problem is ill-posed, further assumptions on the data distribution are required to make the problem well-posed. For classical matting methods, a commonly adopted assumption is the local smoothness assumption on foreground and background colors. However, the use of such assumptions was not systematically considered for deep learning based matting methods. In this work, we consider two local smoothness assumptions which can help improve deep image matting models. Based on the local smoothness assumptions, we propose three techniques, i.e., training set refinement, color augmentation and backpropagating refinement, which can improve the performance of the deep image matting model significantly. We conduct experiments to examine the effectiveness of the proposed algorithm. The experimental results show that the proposed method has favorable performance compared with existing matting methods.

Submitted 5 April, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

Comments: 9 pages, accepted by IEEE ICME 2022

arXiv:2112.09084 [pdf, ps, other]

doi 10.1063/5.0082718

Anomalous Statistics and Large Deviations of Turbulent Water Waves past a Step

Authors: Di Qi, Eric Vanden-Eijnden

Abstract: A computational strategy based on large deviation theory (LDT) is used to study the anomalous statistical features of turbulent surface waves propagating past an abrupt depth change created via a step in the bottom topography. The dynamics of the outgoing waves past the step are modeled using the truncated Korteweg-de Vries (TKdV) equation with random initial conditions at the step drawn from the system's Gibbs invariant measure of the incoming waves. Within the LDT framework, the probability distributions of the wave height can be obtained via the solution of a deterministic optimization problem. Detailed numerical tests show that this approach accurately captures the non-Gaussian features of the wave height distributions, in particular their asymmetric tails leading to high skewness. These calculations also give the spatio-temporal pattern of the anomalous waves most responsible for these non-Gaussian features. The strategy shows potential for a general class of nonlinear Hamiltonian systems with highly non-Gaussian statistics.

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2112.00496 [pdf, other]

Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

Authors: Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

Abstract: The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While previous works focus on the effectiveness of MLP on unsupervised image classification where pretraining and evaluation are conducted on the same dataset, we reveal that the MLP projector is also the key factor to better transferability of unsupervised pretraining methods than supervised pretraining methods. Based on this observation, we attempt to close the transferability gap between supervised and unsupervised pretraining by adding an MLP projector before the classifier in supervised pretraining. Our analysis indicates that the MLP projector can help retain intra-class variation of visual features, decrease the feature distribution distance between pretraining and evaluation datasets, and reduce feature redundancy. Extensive experiments on public benchmarks demonstrate that the added MLP projector significantly boosts the transferability of supervised pretraining, e.g. +7.2% top-1 accuracy on the concept generalization task, +5.8% top-1 accuracy for linear evaluation on 12-domain classification tasks, and +0.8% AP on COCO object detection task, making supervised pretraining comparable or even better than unsupervised pretraining.

Submitted 28 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted by CVPR 2022. [camera ready with supplement]

arXiv:2110.01521 [pdf, other]

Balanced Masked and Standard Face Recognition

Authors: Delong Qi, Kangli Hu, Weijun Tan, Qi Yao, Jingfeng Liu

Abstract: We present the improved network architecture, data augmentation, and training strategies for the Webface track and Insightface/Glint360K track of the masked face recognition challenge of ICCV2021. One of the key goals is to have a balanced performance of masked and standard face recognition. In order to prevent the overfitting for the masked face recognition, we control the total number of masked faces by not more than 10% of the total face recognition in the training dataset. We propose a few key changes to the face recognition network including a new stem unit, drop block, face detection and alignment using YOLO5Face, feature concatenation, a cycle cosine learning rate, etc. With this strategy, we achieve good and balanced performance for both masked and standard face recognition.

Submitted 4 October, 2021; originally announced October 2021.

Journal ref: 2021 ICCV Workshops

arXiv:2109.14811 [pdf, other]

Surveillance Evasion Through Bayesian Reinforcement Learning

Authors: Dongping Qi, David Bindel, Alexander Vladimirsky

Abstract: We consider a task of surveillance-evading path-planning in a continuous setting. An Evader strives to escape from a 2D domain while minimizing the risk of detection (and immediate capture). The probability of detection is path-dependent and determined by the spatially inhomogeneous surveillance intensity, which is fixed but a priori unknown and gradually learned in the multi-episodic setting. We introduce a Bayesian reinforcement learning algorithm that relies on a Gaussian Process regression (to model the surveillance intensity function based on the information from prior episodes), numerical methods for Hamilton-Jacobi PDEs (to plan the best continuous trajectories based on the current model), and Confidence Bounds (to balance the exploration vs exploitation). We use numerical experiments and regret metrics to highlight the significant advantages of our approach compared to traditional graph-based algorithms of reinforcement learning.

Submitted 23 February, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

Comments: 6 pages, 3 figures; accepted for presentation publication at AISTATS 2023

MSC Class: 93E35; 49L20; 68W27; 68T37; 60G15; 62N02

arXiv:2109.05270 [pdf]

doi 10.1063/5.0058989

Reversible modulation of metal-insulator transition in VO2 via chemically-induced oxygen migration

Authors: Kun Han, Hanyu Wang, Liang Wu, Yu Cao, Dong-Chen Qi, Changjian Li, Zhen Huang, Xiao Li, X. Renshaw Wang

Abstract: The metal-insulator transition (MIT), an intriguing correlated phenomenon induced by the subtle competition of the electrons' repulsive Coulomb interaction and kinetic energy, is of great potential use for electronic applications due to the dramatic change in resistivity. Here, we demonstrate a reversible control of MIT in VO2 films via oxygen stoichiometry engineering. By facilely depositing and dissolving a water-soluble yet oxygen-active Sr3Al2O6 capping layer atop the VO2 at room temperature, oxygen ions can reversibly migrate between VO2 and Sr3Al2O6, resulting in a gradual suppression and a complete recovery of MIT in VO2. The migration of the oxygen ions is evidenced in a combination of transport measurement, structural characterization and first-principles calculations. This approach of chemically-induced oxygen migration using a water-dissolvable adjacent layer could be useful for advanced electronic and iontronic devices and studying oxygen stoichiometry effects on the MIT.

Submitted 11 September, 2021; originally announced September 2021.

Comments: 16 pages, 4 figures

arXiv:2108.13220 [pdf, other]

doi 10.1098/rsta.2021.0205

Machine Learning-Based Statistical Closure Models for Turbulent Dynamical Systems

Authors: Di Qi, John Harlim

Abstract: We propose a Machine Learning (ML) non-Markovian closure modeling framework for accurate predictions of statistical responses of turbulent dynamical systems subjected to external forcings. One of the difficulties in this statistical closure problem is the lack of training data, which is a configuration that is not desirable in supervised learning with neural network models. In this study with the 40-dimensional Lorenz-96 model, the shortage of data (in time) is due to the stationarity of the statistics beyond the decorrelation time; thus, the only informative content in the training data is on short-time transient statistics. We adopted a unified closure framework on various truncation regimes, including and excluding the detailed dynamical equations for the variances. The closure frameworks employ a Long-Short-Term-Memory architecture to represent the higher-order unresolved statistical feedbacks with careful consideration to account for the intrinsic instability yet producing stable long-time predictions. We found that this unified agnostic ML approach performs well under various truncation scenarios. Numerically, the ML closure model can accurately predict the long-time statistical responses subjected to various time-dependent external forces that are not (and maximum forcing amplitudes that are relatively larger than those) in the training dataset.

Submitted 3 June, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 8 figures

arXiv:2106.11405 [pdf, other]

Optimality and robustness in path-planning under initial uncertainty

Authors: Dongping Qi, Adam Dhillon, Alexander Vladimirsky

Abstract: Classical deterministic optimal control problems assume full information about the controlled process. The theory of control for general partially-observable processes is powerful, but the methods are computationally expensive and typically address the problems with stochastic dynamics and continuous (directly unobserved) stochastic perturbations. In this paper we focus on path planning problems which are in between -- deterministic, but with an initial uncertainty on either the target or the running cost on parts of the domain. That uncertainty is later removed at some time $T$, and the goal is to choose the optimal trajectory until then. We address this challenge for three different models of information acquisition: with fixed $T$, discretely distributed and exponentially distributed random $T$. We develop models and numerical methods suitable for multiple notions of optimality: based on the average-case performance, the worst-case performance, the average constrained by the worst, the average performance with probabilistic constraints on the bad outcomes, risk-sensitivity, and distributional-robustness. We illustrate our approach using examples of pursuing random targets identified at a (possibly random) later time $T$.

Submitted 1 August, 2024; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 25 pages, 14 figures. Keywords: optimal control, path-planning, Hamilton-Jacobi PDEs, uncertainty, robustness, delayed information acquisition

MSC Class: 49L20; 49N90; 60J28; 35R35

arXiv:2105.12931 [pdf, other]

YOLO5Face: Why Reinventing a Face Detector

Authors: Delong Qi, Weijun Tan, Qi Yao, Jingfeng Liu

Abstract: Tremendous progress has been made on face detection in recent years using convolutional neural networks. While many face detectors use designs designated for detecting faces, we treat face detection as a generic object detection task. We implement a face detector based on the YOLOv5 object detector and call it YOLO5Face. We make a few key modifications to the YOLOv5 and optimize it for face detection. These modifications include adding a five-point landmark regression head, using a stem block at the input of the backbone, using smaller-size kernels in the SPP, and adding a P6 output in the PAN block. We design detectors of different model sizes, from an extra-large model to achieve the best performance to a super small model for real-time detection on an embedded or mobile device. Experiment results on the WiderFace dataset show that on VGA images, our face detectors can achieve state-of-the-art performance in almost all the Easy, Medium, and Hard subsets, exceeding the more complex designated face detectors. The code is available at https://github.com/deepcam-cn/yolov5-face

Submitted 27 January, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.01058 [pdf, other]

A Dataset and System for Real-Time Gun Detection in Surveillance Video Using Deep Learning

Authors: Delong Qi, Weijun Tan, Zhifu Liu, Qi Yao, Jingfeng Liu

Abstract: Gun violence is a severe problem in the world, particularly in the United States. Deep learning methods have been studied to detect guns in surveillance video cameras or smart IP cameras and to send a real-time alert to security personnel. One problem for the development of gun detection algorithms is the lack of large public datasets. In this work, we first publish a dataset with 51K annotated gun images for gun detection and other 51K cropped gun chip images for gun classification we collect from a few different sources. To our knowledge, this is the largest dataset for the study of gun detection. This dataset can be downloaded at www.linksprite.com/gun-detection-datasets. We present a gun detection system using a smart IP camera as an embedded edge device, and a cloud server as a manager for device, data, alert, and to further reduce the false positive rate. We study to find solutions for gun detection in an embedded device, and for gun classification on the edge device and the cloud server. This edge/cloud framework makes the deployment of gun detection in the real world possible.

Submitted 16 August, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: IEEE SMC 2021 Oral

arXiv:2105.00154 [pdf]

doi 10.1021/acsami.1c01581

Enhanced metal-insulator transition in freestanding VO2 down to 5 nm thickness

Authors: Kun Han, Liang Wu, Yu Cao, Hanyu Wang, Chen Ye, Ke Huang, M. Motapothula, Hongna Xing, Xinghua Li, Dong-Chen Qi, Xiao Li, X. Renshaw Wang

Abstract: Ultrathin freestanding membranes with a pronounced metal-insulator transition (MIT) provide huge potential in future flexible electronic applications as well as a unique aspect of the study of lattice-electron interplay. However, the reduction of the thickness to an ultrathin region (a few nm) is typically detrimental to the MIT in epitaxial films, and even catastrophic for their freestanding form. Here, we report an enhanced MIT in VO2-based freestanding membranes, with a lateral size up to millimetres and VO2 thickness down to 5 nm. The VO2-membranes were detached by dissolving a Sr3Al2O6 sacrificial layer between the VO2 thin film and c-Al2O3(0001) substrate, allowing a transfer onto arbitrary surfaces. Furthermore, the MIT in the VO2-membrane was greatly enhanced by inserting an intermediate Al2O3 buffer layer. In comparison to the best available ultrathin VO2-membranes, the enhancement of MIT is over 400% at 5 nm VO2 thickness and more than one order of magnitude for VO2 above 10 nm. Our study widens the spectrum of functionality in ultrathin and large-scale membranes, and enables the potential integration of MIT into flexible electronics and photonics.

Submitted 30 April, 2021; originally announced May 2021.

arXiv:2104.09322 [pdf]

doi 10.1002/admi.202002147

Bipolar conduction and giant positive magnetoresistance in doped metallic titanium oxide heterostructures

Authors: Ke Huang, Tao Wang, Mengjia Jin, Liang Wu, Junyao Floria Wang, Shengyao Li, Dong-chen Qi, Shuying Cheng, Yangyang Li, Jingsheng Chen, Xiaozhong He, Changjian Li, Stephen J. Pennycook, X. Renshaw Wang

Abstract: Empowering conventional materials with unexpected magnetoelectric properties is appealing to the multi-functionalization of existing devices and the exploration of future electronics. Recently, owing to their unique effect in modulating a material's properties, ultra-small dopants, e.g. H, D, and Li, attract enormous attention in creating emergent functionalities, such as superconductivity and metal-insulator transition. Here, we report an observation of bipolar conduction accompanied by a giant positive magnetoresistance in D-doped metallic Ti oxide (TiOxDy) films. To overcome the challenges in intercalating the D into a crystalline oxide, a series of TiOxDy were formed by sequentially doping Ti with D and surface/interface oxidation. Intriguingly, while the electron mobility of the TiOxDy increases by an order of magnitude after doping, the emergent holes also exhibit high mobility. Moreover, the bipolar conduction induces a giant magnetoresistance up to 900% at 6 T, which is ~6 times higher than its conventional phase. Our study paves a way to empower conventional materials in existing electronics and induce novel electronic phases.

Submitted 19 April, 2021; originally announced April 2021.

Journal ref: Advanced Materials Interfaces, 2021, 2002147

arXiv:2103.00301 [pdf, other]

Spline parameterization of neural network controls for deep learning

Authors: Stefanie Günther, Will Pazner, Dongping Qi

Abstract: Based on the continuous interpretation of deep learning cast as an optimal control problem, this paper investigates the benefits of employing B-spline basis functions to parameterize neural network controls across the layers. Rather than equipping each layer of a discretized ODE-network with a set of trainable weights, we choose a fixed number of B-spline basis functions whose coefficients are the trainable parameters of the neural network. Decoupling the trainable parameters from the layers of the neural network enables us to investigate and adapt the accuracy of the network propagation separated from the optimization learning problem. We numerically show that the spline-based neural network increases robustness of the learning problem towards hyperparameters due to increased stability and accuracy of the network propagation. Further, training on B-spline coefficients rather than layer weights directly enables a reduction in the number of trainable parameters.

Submitted 27 February, 2021; originally announced March 2021.

Comments: 19 pages, 9 figures

arXiv:2011.13106 [pdf]

doi 10.1016/j.actamat.2020.116516

A two-dimensional electron gas based on a 5s oxide with high room-temperature mobility and strain sensitivity

Authors: Zexin Feng, Peixin Qin, Yali Yang, Han Yan, Huixin Guo, Xiaoning Wang, Xiaorong Zhou, Yuyan Han, Jiabao Yi, Dongchen Qi, Xiaojiang Yu, Mark B. H. Breese, Xin Zhang, Haojiang Wu, Hongyu Chen, Hongjun Xiangb, Chengbao Jiang, Zhiqi Liu

Abstract: The coupling of optical and electronic degrees of freedom together with quantum confinement in low-dimensional electron systems is particularly interesting for achieving exotic functionalities in strongly correlated oxide electronics. Recently, high room-temperature mobility has been achieved for a large bandgap transparent oxide - BaSnO$_3$ upon extrinsic La or Sb doping, which has excited significant research attention. In this work, we report the observation of room-temperature ferromagnetism in BaSnO$_3$ thin films and the realization of a two-dimensional electron gas (2DEG) on the surface of transparent BaSnO$_3$ via oxygen vacancy creation, which exhibits a high carrier density of $\sim 7.72 \times 10^{14}/{\rm cm}^2$ and a high room-temperature mobility of ~18 cm$^2$/V/s. Such a 2DEG is rather sensitive to strain and a less than 0.1% in-plane biaxial compressive strain leads to a giant resistance enhancement of 350% (more than 540 kOhm/Square) at room temperature. Thus, this work creates a new path to exploring the physics of low-dimensional oxide electronics and devices applicable at room temperature.

Submitted 1 January, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: 28 pages, 7 figures

Journal ref: Acta Materialia 204, 116516 (2021)

arXiv:2010.15881 [pdf, other]

Less is More: Data-Efficient Complex Question Answering over Knowledge Bases

Authors: Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing Qi

Abstract: Question answering is an effective method for obtaining information from knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering by using only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectiv… ▽ More Question answering is an effective method for obtaining information from knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering by using only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectively, transforms a natural-language question into a sequence of primitive actions, and executes them over the knowledge base to compute the answer. We carefully formulate a set of primitive symbolic actions that allows us to not only simplify our neural network design but also accelerate model convergence. To reduce search space, we employ the copy and masking mechanisms in our encoder-decoder architecture to drastically reduce the decoder output vocabulary and improve model generalizability. We equip our model with a memory buffer that stores high-reward promising programs. Besides, we propose an adaptive reward function. By comparing the generated trial with the trials stored in the memory buffer, we derive the curriculum-guided reward bonus, i.e., the proximity and the novelty. To mitigate the sparse reward problem, we combine the adaptive reward and the reward bonus, reshaping the sparse reward into dense feedback. Also, we encourage the model to generate new trials to avoid imitating the spurious trials while making the model remember the past high-reward trials to improve data efficiency. 
Our NS-CQA model is evaluated on two datasets: CQA, a recent large-scale complex question answering dataset, and WebQuestionsSP, a multi-hop question answering dataset. On both datasets, our model outperforms the state-of-the-art models. Notably, on CQA, NS-CQA performs well on questions with higher complexity, while only using approximately 1% of the total training samples.

Submitted 29 October, 2020; originally announced October 2020.

Comments: 18 pages, 4 figures, published in JWS

arXiv:2006.10554 [pdf, other]

doi 10.1063/5.0018943

Dimits shift, avalanche-like bursts, and Solitary propagating structures in the two-field Flux-Balanced Hasegawa-Wakatani model for plasma edge turbulence

Authors: Di Qi, Andrew J. Majda, Antoine J. Cerfon

Abstract: We show that the recently introduced two-field flux-balanced Hasegawa-Wakatani (BHW) model captures the key features of drift-wave turbulent transport mediated by zonal flows observed in more complete and accurate gyrokinetic simulations, such as the existence of a nonlinear upshift of the threshold for drift wave turbulence driven transport, often called the Dimits shift, as well as non-local transport with avalanche bursts and solitary propagating structures. Because of the approximations made in the BHW model, these observations are made for the particle flux instead of the heat flux more commonly studied in ion temperature gradient (ITG) driven turbulence in fluid or gyrokinetic codes. Many of these features are not seen in other Hasegawa-Wakatani models, which confirms the critical role of the electron dynamics parallel to the magnetic field lines. To address questions regarding the role of boundary conditions on the drift-wave zonal flow dynamics, we apply our model to both a channel domain geometry and the more typical doubly periodic geometry. We only observe strong soliton-like solutions in the particle flux for the channel geometry, in the vicinity of the boundaries, where strong velocity shear and density gradients are generated which are absent in the doubly periodic simulations. Changing the aspect ratio of the simulation domain also has a significant effect. In domains which are elongated in the radial direction, more complex multiscale dynamics takes place, with multiple zonal jets interacting with each other, and large scale avalanches.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 24 pages, 11 figures

Showing 1–50 of 91 results for author: Qi, D