Search | arXiv e-print repository

VCoME: Verbal Video Composition with Multimodal Editing Effects

Authors: Weibo Gong, Xiaojie Jin, Xin Li, Dongliang He, Xinglong Wu

Abstract: Verbal videos, featuring voice-overs or text overlays, provide valuable content but present significant challenges in composition, especially when incorporating editing effects to enhance clarity and visual appeal. In this paper, we introduce the novel task of verbal video composition with editing effects. This task aims to generate coherent and visually appealing verbal videos by integrating multimodal editing effects across textual, visual, and audio categories. To achieve this, we curate a large-scale dataset of video effects compositions from publicly available sources. We then formulate this task as a generative problem, involving the identification of appropriate positions in the verbal content and the recommendation of editing effects for these positions. To address this task, we propose VCoME, a general framework that employs a large multimodal model to generate editing effects for video composition. Specifically, VCoME takes in the multimodal video context and autoregressively outputs where to apply effects within the verbal content and which effects are most appropriate for each position. VCoME also supports prompt-based control of composition density and style, providing substantial flexibility for diverse applications. Through extensive quantitative and qualitative evaluations, we clearly demonstrate the effectiveness of VCoME. A comprehensive user study shows that our method produces videos of professional quality while being 85$\times$ more efficient than professional editors.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.05666 [pdf, other]

General Distribution Learning: A theoretical framework for Deep Learning

Authors: Binchuan Qi, Li Li, Wei Gong

Abstract: There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures in solving physical problems. This paper introduces General Distribution Learning (GD Learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from traditional statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in classical statistical learning framework, is divided into fitting errors due to models and algorithms, as well as sampling errors introduced by limited sampling data. The framework significantly incorporates prior knowledge, especially in scenarios characterized by data scarcity, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solutions in non-convex optimization can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the development of the gradient structure control algorithm. GD Learning also offers fresh insights into the questions on deep learning, including overparameterization and non-convex optimization, bias-variance trade-off, and the mechanism of flat minima.

Submitted 26 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2105.04026 by other authors

arXiv:2406.04567 [pdf, other]

Error Bounds of Supervised Classification from Information-Theoretic Perspective

Authors: Binchuan Qi, Wei Gong, Li Li

Abstract: There remains a list of unanswered research questions on deep learning (DL), including the remarkable generalization power of overparametrized neural networks, the efficient optimization performance despite the non-convexity, and the mechanisms behind flat minima in generalization. In this paper, we adopt an information-theoretic perspective to explore the theoretical foundations of supervised classification using deep neural networks (DNNs). Our analysis introduces the concepts of fitting error and model risk, which, together with generalization error, constitute an upper bound on the expected risk. We demonstrate that the generalization errors are bounded by the complexity, influenced by both the smoothness of distribution and the sample size. Consequently, task complexity serves as a reliable indicator of the dataset's quality, guiding the setting of regularization hyperparameters. Furthermore, the derived upper bound on the fitting error links the back-propagated gradient, Neural Tangent Kernel (NTK), and the model's parameter count with the fitting error. Utilizing the triangle inequality, we establish an upper bound on the expected risk. This bound offers valuable insights into the effects of overparameterization, non-convex optimization, and the flat minima in DNNs. Finally, empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk, confirming the practical relevance of the theoretical findings.

Submitted 27 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.00734 [pdf, other]

GLADformer: A Mixed Perspective for Graph-level Anomaly Detection

Authors: Fan Xu, Nan Wang, Hao Wu, Xuezhi Wen, Dalin Zhang, Siyang Lu, Binyong Li, Wei Gong, Hai Wan, Xibin Zhao

Abstract: Graph-Level Anomaly Detection (GLAD) aims to distinguish anomalous graphs within a graph dataset. However, current methods are constrained by their receptive fields, struggling to learn global features within the graphs. Moreover, most contemporary methods are based on spatial domain and lack exploration of spectral characteristics. In this paper, we propose a multi-perspective hybrid graph-level anomaly detector namely GLADformer, consisting of two key modules. Specifically, we first design a Graph Transformer module with global spectrum enhancement, which ensures balanced and resilient parameter distributions by fusing global features and spectral distribution characteristics. Furthermore, to uncover local anomalous attributes, we customize a band-pass spectral GNN message passing module that further enhances the model's generalization capability. Through comprehensive experiments on ten real-world datasets from multiple domains, we validate the effectiveness and robustness of GLADformer. This demonstrates that GLADformer outperforms current state-of-the-art models in graph-level anomaly detection, particularly in effectively capturing global anomaly representations and spectral characteristics.

Submitted 3 July, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.00770 [pdf, other]

Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises

Authors: Zhihan Zhang, Weiyuan Gong, Weikang Li, Dong-Ling Deng

Abstract: We study quantum-classical separations between classical and quantum supervised learning models based on constant depth (i.e., shallow) circuits, in scenarios with and without noises. We construct a classification problem defined by a noiseless shallow quantum circuit and rigorously prove that any classical neural network with bounded connectivity requires logarithmic depth to output correctly with a larger-than-exponentially-small probability. This unconditional near-optimal quantum-classical separation originates from the quantum nonlocality property that distinguishes quantum circuits from their classical counterparts. We further derive the noise thresholds for demonstrating such a separation on near-term quantum devices under the depolarization noise model. We prove that this separation will persist if the noise strength is upper bounded by an inverse polynomial with respect to the system size, and vanish if the noise strength is greater than an inverse polylogarithmic function. In addition, for quantum devices with constant noise strength, we prove that no super-polynomial classical-quantum separation exists for any classification task defined by shallow Clifford circuits, independent of the structures of the circuits that specify the learning models.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 14 pages, 3 figures

arXiv:2404.19105 [pdf, other]

Optimal tradeoffs for estimating Pauli observables

Authors: Sitan Chen, Weiyuan Gong, Qi Ye

Abstract: We revisit the problem of Pauli shadow tomography: given copies of an unknown $n$-qubit quantum state $\rho$, estimate $\text{tr}(P\rho)$ for some set of Pauli operators $P$ to within additive error $\epsilon$. This has been a popular testbed for exploring the advantage of protocols with quantum memory over those without: with enough memory to measure two copies at a time, one can use Bell sampling to estimate $|\text{tr}(P\rho)|$ for all $P$ using $O(n/\epsilon^4)$ copies, but with $k\le n$ qubits of memory, $\Omega(2^{(n-k)/3})$ copies are needed. These results leave open several natural questions. How does this picture change in the physically relevant setting where one only needs to estimate a certain subset of Paulis? What is the optimal dependence on $\epsilon$? What is the optimal tradeoff between quantum memory and sample complexity? We answer all of these questions. For any subset $A$ of Paulis and any family of measurement strategies, we completely characterize the optimal sample complexity, up to $\log |A|$ factors. We show any protocol that makes $\text{poly}(n)$-copy measurements must make $\Omega(1/\epsilon^4)$ measurements. For any protocol that makes $\text{poly}(n)$-copy measurements and only has $k < n$ qubits of memory, we show that $\widetilde{\Theta}(\min\{2^n/\epsilon^2, 2^{n-k}/\epsilon^4\})$ copies are necessary and sufficient. The protocols we propose can also estimate the actual values $\text{tr}(P\rho)$, rather than just their absolute values as in prior work. Additionally, as a byproduct of our techniques, we establish tight bounds for the task of purity testing and show that it exhibits an intriguing phase transition not present in the memory-sample tradeoff for Pauli shadow tomography.

Submitted 29 April, 2024; originally announced April 2024.

Comments: 59 pages, 1 figure
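
The memory-sample tradeoff quoted in the abstract above can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' code: the function name `memory_sample_tradeoff` is hypothetical, and it evaluates only the leading expression min{2^n/eps^2, 2^(n-k)/eps^4}, ignoring the constants and logarithmic factors hidden by the tilde-Theta notation.

```python
def memory_sample_tradeoff(n: int, k: int, eps: float) -> float:
    """Leading-order scaling min{2^n / eps^2, 2^(n-k) / eps^4}.

    n   : number of qubits in the unknown state
    k   : qubits of quantum memory (the abstract states the bound for k < n)
    eps : additive error target

    Assumption: constants and log factors are dropped, so the result is a
    scaling indicator, not an actual copy count.
    """
    if not 0 <= k < n:
        raise ValueError("expected 0 <= k < n")
    if eps <= 0:
        raise ValueError("eps must be positive")
    no_memory_term = 2 ** n / eps ** 2       # 2^n / eps^2
    memory_term = 2 ** (n - k) / eps ** 4    # 2^(n-k) / eps^4
    return min(no_memory_term, memory_term)


if __name__ == "__main__":
    # With more memory (larger k) the second term shrinks and takes over.
    for k in (0, 2, 5):
        print(k, memory_sample_tradeoff(10, k, 0.5))
```

Under these assumptions the crossover between the two terms occurs where 2^k equals 1/eps^2, which is one way to read when extra memory starts to matter.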

arXiv:2404.12529 [pdf, other]

A Survey of Bluetooth Indoor Localization

Authors: Taolei Shi, Wei Gong

Abstract: Nowadays, indoor localization has received extensive research interest due to more and more applications' needs for location information to provide a more precise and effective service [1], [2]. There are various wireless techniques and mechanisms that have been proposed; some of them have been studied in depth and come into use, such as Wi-Fi, RFID, and sensor networks. In comparison, the development of Bluetooth location technology is slow and there are not many papers and surveys in this field, although the performance and market value of Bluetooth are increasing steadily. In this paper, we aim to provide a detailed survey of various indoor localization systems with Bluetooth. In contrast with the existing surveys, we categorize the exciting localization techniques that have been proposed in the literature in order to sketch the development of Bluetooth location compared to other technologies. We also evaluate different systems from the perspective of availability, cost, scalability, and accuracy. We also discuss remaining problems and challenges to accurate Bluetooth localization.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 8 pages, 2 figures

arXiv:2404.09622 [pdf, other]

DIDLM: A Comprehensive Multi-Sensor Dataset with Infrared Cameras, Depth Cameras, LiDAR, and 4D Millimeter-Wave Radar in Challenging Scenarios for 3D Mapping

Authors: WeiSheng Gong, Chen He, KaiJie Su, QingYong Li

Abstract: This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. SLAM comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities. Our dataset is available at https://github.com/GongWeiSheng/DIDLM.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2403.13869 [pdf, other]

Accurately Predicting Probabilities of Safety-Critical Rare Events for Intelligent Systems

Authors: Ruoxuan Bai, Jingxuan Yang, Weiduo Gong, Yi Zhang, Qiujing Lu, Shuo Feng

Abstract: Intelligent systems are increasingly integral to our daily lives, yet rare safety-critical events present significant latent threats to their practical deployment. Addressing this challenge hinges on accurately predicting the probability of safety-critical events occurring within a given time step from the current state, a metric we define as 'criticality'. The complexity of predicting criticality arises from the extreme data imbalance caused by rare events in high dimensional variables associated with the rare events, a challenge we refer to as the curse of rarity. Existing methods tend to be either overly conservative or prone to overlooking safety-critical events, thus struggling to achieve both high precision and recall rates, which severely limits their applicability. This study endeavors to develop a criticality prediction model that excels in both precision and recall rates for evaluating the criticality of safety-critical autonomous systems. We propose a multi-stage learning framework designed to progressively densify the dataset, mitigating the curse of rarity across stages. To validate our approach, we evaluate it in two cases: lunar lander and bipedal walker scenarios. The results demonstrate that our method surpasses traditional approaches, providing a more accurate and dependable assessment of criticality in intelligent systems.

Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.01736 [pdf, other]

Lightweight Object Detection: A Study Based on YOLOv7 Integrated with ShuffleNetv2 and Vision Transformer

Authors: Wenkai Gong

Abstract: As mobile computing technology rapidly evolves, deploying efficient object detection algorithms on mobile devices emerges as a pivotal research area in computer vision. This study zeroes in on optimizing the YOLOv7 algorithm to boost its operational efficiency and speed on mobile platforms while ensuring high accuracy. Leveraging a synergy of advanced techniques such as Group Convolution, ShuffleNetV2, and Vision Transformer, this research has effectively minimized the model's parameter count and memory usage, streamlined the network architecture, and fortified the real-time object detection proficiency on resource-constrained devices. The experimental outcomes reveal that the refined YOLO model demonstrates exceptional performance, markedly enhancing processing velocity while sustaining superior detection accuracy.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.12381 [pdf, other]

Constrained Multi-objective Optimization with Deep Reinforcement Learning Assisted Operator Selection

Authors: Fei Ming, Wenyin Gong, Ling Wang, Yaochu Jin

Abstract: Solving constrained multi-objective optimization problems with evolutionary algorithms has attracted considerable attention. Various constrained multi-objective optimization evolutionary algorithms (CMOEAs) have been developed with the use of different algorithmic strategies, evolutionary operators, and constraint-handling techniques. The performance of CMOEAs may be heavily dependent on the operators used; however, it is usually difficult to select suitable operators for the problem at hand. Hence, improving operator selection is promising and necessary for CMOEAs. This work proposes an online operator selection framework assisted by Deep Reinforcement Learning. The dynamics of the population, including convergence, diversity, and feasibility, are regarded as the state; the candidate operators are considered as actions; and the improvement of the population state is treated as the reward. By using a Q-Network to learn a policy to estimate the Q-values of all actions, the proposed approach can adaptively select an operator that maximizes the improvement of the population according to the current state and thereby improve the algorithmic performance. The framework is embedded into four popular CMOEAs and assessed on 42 benchmark problems. The experimental results reveal that the proposed Deep Reinforcement Learning-assisted operator selection significantly improves the performance of these CMOEAs and the resulting algorithm obtains better versatility compared to nine state-of-the-art CMOEAs.

Submitted 15 January, 2024; originally announced February 2024.
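
The state/action/reward framing in the abstract above can be illustrated with a small sketch. This is not the paper's method: a tabular Q-learning update stands in for the Q-Network, the discrete state keys are hypothetical placeholders for the population dynamics (convergence, diversity, feasibility), and the function names, learning rate, and discount factor are all assumptions chosen for illustration.

```python
import random


def select_operator(q_values, epsilon, rng):
    """Epsilon-greedy choice among candidate operators (the actions).

    q_values : list of estimated Q-values, one per candidate operator
    epsilon  : probability of exploring a random operator
    rng      : a random.Random instance, passed in for reproducibility
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Exploit: index of the operator with the highest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step (an assumed stand-in for Q-Network training).

    reward is meant to play the role of the improvement of the population
    state after applying the chosen operator.
    """
    best_next = max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical discretized population states, two candidate operators.
    q_table = {0: [0.0, 0.0], 1: [0.0, 1.0]}
    q_update(q_table, state=0, action=1, reward=1.0, next_state=1)
    print(q_table[0], select_operator(q_table[0], epsilon=0.0, rng=rng))
```

A table only works for a handful of discrete states; with continuous population features a function approximator would be the natural replacement, which is presumably why the abstract mentions a Q-Network.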

arXiv:2402.06665 [pdf, other]

The Essential Role of Causality in Foundation World Models for Embodied AI

Authors: Tarun Gupta, Wenbo Gong, Chao Ma, Nick Pawlowski, Agrin Hilmkil, Meyer Scetbon, Marc Rigter, Ade Famoti, Ashley Juan Llorens, Jianfeng Gao, Stefan Bauer, Danica Kragic, Bernhard Schölkopf, Cheng Zhang

Abstract: Recent advances in foundation models, especially in large multi-modal models and conversational agents, have ignited interest in the potential of generally capable embodied agents. Such agents will require the ability to perform new tasks in many different real-world environments. However, current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI. The study of causality lends itself to the construction of veridical world models, which are crucial for accurately predicting the outcomes of possible interactions. This paper focuses on the prospects of building foundation world models for the upcoming generation of embodied agents and presents a novel viewpoint on the significance of causality within these. We posit that integrating causal considerations is vital to facilitating meaningful physical interactions with the world. Finally, we demystify misconceptions about causality in this context and present our outlook for future research.

Submitted 29 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.00763 [pdf, other]

360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming

Authors: Jiayang Bai, Letian Huang, Jie Guo, Wen Gong, Yuanqi Li, Yanwen Guo

Abstract: 3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of ${360^\circ}$ images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel $360^{\circ}$ Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians. We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 11 pages, 10 figures

arXiv:2401.15635 [pdf, other]

doi 10.1145/3589334.3645533

RecDCL: Dual Contrastive Learning for Recommendation

Authors: Dan Zhang, Yangliao Geng, Wenwen Gong, Zhongang Qi, Zhiyu Chen, Xing Tang, Ying Shan, Yuxiao Dong, Jie Tang

Abstract: Self-supervised learning (SSL) has recently achieved great success in mining the user-item interactions for collaborative filtering. As a major paradigm, contrastive learning (CL) based SSL helps address data sparsity in Web platforms by contrasting the embeddings between raw and augmented data. However, existing CL-based methods mostly focus on contrasting in a batch-wise way, failing to exploit potential regularity in the feature dimension. This leads to redundant solutions during the representation learning of users and items. In this work, we investigate how to employ both batch-wise CL (BCL) and feature-wise CL (FCL) for recommendation. We theoretically analyze the relation between BCL and FCL, and find that combining BCL and FCL helps eliminate redundant solutions but never misses an optimal solution. We propose a dual contrastive learning recommendation framework -- RecDCL. In RecDCL, the FCL objective is designed to eliminate redundant solutions on user-item positive pairs and to optimize the uniform distributions within users and items using a polynomial kernel for driving the representations to be orthogonal; the BCL objective is utilized to generate contrastive embeddings on output vectors for enhancing the robustness of the representations. Extensive experiments on four widely-used benchmarks and one industry dataset demonstrate that RecDCL can consistently outperform the state-of-the-art GNNs-based and SSL-based models (with an improvement of up to 5.65\% in terms of Recall@20). The source code is publicly available (https://github.com/THUDM/RecDCL).

Submitted 18 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Accepted to WWW 2024

Journal ref: Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore

arXiv:2401.12247 [pdf]

doi 10.1108/INTR-08-2020-0460

Exploring consumers response to text-based chatbots in e-commerce: The moderating role of task complexity and chatbot disclosure

Authors: Xusen Cheng, Ying Bao, Alex Zarifis, Wankun Gong, Jian Mou

Abstract: Artificial intelligence-based chatbots have brought unprecedented business potential. This study aims to explore consumers' trust and response to a text-based chatbot in e-commerce, involving the moderating effects of task complexity and chatbot identity disclosure. A survey method with 299 useable responses was conducted in this research. This study adopted the ordinary least squares regression to test the hypotheses. First, the consumers' perception of both the empathy and friendliness of the chatbot positively impacts their trust in it. Second, task complexity negatively moderates the relationship between friendliness and consumers' trust. Third, disclosure of the text-based chatbot negatively moderates the relationship between empathy and consumers' trust, while it positively moderates the relationship between friendliness and consumers' trust. Fourth, consumers' trust in the chatbot increases their reliance on the chatbot and decreases their resistance to the chatbot in future interactions. Adopting the stimulus-organism-response framework, this study provides important insights on consumers' perception and response to the text-based chatbot. The findings of this research also make suggestions that can increase consumers' positive responses to text-based chatbots. Extant studies have investigated the effects of automated bots' attributes on consumers' perceptions. However, the boundary conditions of these effects are largely ignored. This research is one of the first attempts to provide a deep understanding of consumers' responses to a chatbot.

Submitted 20 January, 2024; originally announced January 2024.

Comments: Internet Research (2021)

arXiv:2312.08867 [pdf, other]

Complexity of Digital Quantum Simulation in the Low-Energy Subspace: Applications and a Lower Bound

Authors: Weiyuan Gong, Shuo Zhou, Tongyang Li

Abstract: Digital quantum simulation has broad applications in approximating unitary evolution of Hamiltonians. In practice, many simulation tasks for quantum systems focus on quantum states in the low-energy subspace instead of the entire Hilbert space. In this paper, we systematically investigate the complexity of digital quantum simulation based on product formulas in the low-energy subspace. We show that the simulation error depends on the effective low-energy norm of the Hamiltonian for a variety of digital quantum simulation algorithms and quantum systems, allowing improvements over the previous complexities for full unitary simulations even for imperfect state preparations due to thermalization. In particular, for simulating spin models in the low-energy subspace, we prove that randomized product formulas such as qDRIFT and random permutation require smaller Trotter numbers. Such improvement also persists in symmetry-protected digital quantum simulations. We prove a similar improvement in simulating the dynamics of power-law quantum interactions. We also provide a query lower bound for general digital quantum simulations in the low-energy subspace.

Submitted 1 July, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 34 pages, 4 figures, github repo: https://github.com/Qubit-Fernand/Digital-Quantum-Simulation

arXiv:2312.08134 [pdf, other]

MToP: A MATLAB Optimization Platform for Evolutionary Multitasking

Authors: Yanchi Li, Wenyin Gong, Fei Ming, Tingyu Zhang, Shuijia Li, Qiong Gu

Abstract: Evolutionary multitasking (EMT) has emerged as a popular topic of evolutionary computation over the past years. It aims to concurrently address multiple optimization tasks within limited computing resources, leveraging inter-task knowledge transfer techniques. Despite the abundance of multitask evolutionary algorithms (MTEAs) proposed for multitask optimization (MTO), there remains no comprehensive software platform to help researchers evaluate MTEA performance on benchmark MTO problems as well as explore real-world applications. To bridge this gap, we introduce the first open-source optimization platform, named MTO-Platform (MToP), for EMT. MToP incorporates over 40 MTEAs, more than 150 MTO problem cases with real-world applications, and over 10 performance metrics. Moreover, to facilitate comparative analyses between MTEAs and traditional evolutionary algorithms, we adapted over 40 popular single-task evolutionary algorithms to address MTO problems. MToP boasts a user-friendly graphical interface, facilitating results analysis, data export, and schematics visualization. More importantly, MToP is designed with extensibility in mind, allowing users to develop new algorithms and tackle emerging problem domains. The source code of MToP is available at https://github.com/intLyc/MTO-Platform.

Submitted 9 April, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.04109 [pdf, other]

doi 10.1109/TMC.2024.3412751

Bridge the Present and Future: A Cross-Layer Matching Game in Dynamic Cloud-Aided Mobile Edge Networks

Authors: Houyi Qi, Minghui Liwang, Xianbin Wang, Li Li, Wei Gong, Jian Jin, Zhenzhen Jiao

Abstract: Cloud-aided mobile edge networks (CAMENs) allow edge servers (ESs) to purchase resources from remote cloud servers (CSs), while overcoming resource shortage when handling computation-intensive tasks of mobile users (MUs). Conventional trading mechanisms (e.g., onsite trading) confront many challenges, including decision-making overhead (e.g., latency) and potential trading failures. This paper investigates a series of cross-layer matching mechanisms to achieve stable and cost-effective resource provisioning across different layers (i.e., MUs, ESs, CSs), seamlessly integrated into a novel hybrid paradigm that incorporates futures and spot trading. In futures trading, we explore an overbooking-driven aforehand cross-layer matching (OA-CLM) mechanism, facilitating two future contract types: contract between MUs and ESs, and contract between ESs and CSs, while assessing potential risks under historical statistical analysis. In spot trading, we design two backup plans that respond to current network/market conditions: determination on contractual MUs that should switch to local processing from edge/cloud services; and an onsite cross-layer matching (OS-CLM) mechanism that engages participants in real-time practical transactions. We next show that our matching mechanisms theoretically satisfy stability, individual rationality, competitive equilibrium, and weak Pareto optimality. Comprehensive simulations in real-world and numerical network settings confirm the corresponding efficacy, while revealing remarkable improvements in time/energy efficiency and social welfare.

Submitted 8 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Journal ref: IEEE Transactions on Mobile Computing, 2024

arXiv:2311.03309 [pdf, other]

Neural Structure Learning with Stochastic Differential Equations

Authors: Benjie Wang, Joel Jennings, Wenbo Gong

Abstract: Discovering the underlying relationships among variables from temporal observations has been a longstanding challenge in numerous scientific disciplines, including biology, finance, and climate science. The dynamics of such systems are often best described using continuous-time stochastic processes. Unfortunately, most existing structure learning approaches assume that the underlying process evolves in discrete-time and/or observations occur at regular time intervals. These mismatched assumptions can often lead to incorrect learned structures and models. In this work, we introduce a novel structure learning method, SCOTCH, which combines neural stochastic differential equations (SDE) with variational inference to infer a posterior distribution over possible structures. This continuous-time approach can naturally handle both learning from and predicting observations at arbitrary time points. Theoretically, we establish sufficient conditions for an SDE and SCOTCH to be structurally identifiable, and prove its consistency under infinite data limits. Empirically, we demonstrate that our approach leads to improved structure learning performance on both synthetic and real-world datasets compared to relevant baselines under regular and irregular sampling intervals.

Submitted 5 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: ICLR 2024

arXiv:2309.14326 [pdf, other]

Efficient Pauli channel estimation with logarithmic quantum memory

Authors: Sitan Chen, Weiyuan Gong

Abstract: Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $ε$. Prior work (Chen et al., 2022) proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward. This left open a natural question: does the lower bound hold even for general protocols, i.e. ones which chain together many queries to the channel, interleaved with arbitrary data-processing channels, before measuring? Surprisingly, in this work we show the opposite: there is a protocol that can estimate the eigenvalues of a Pauli channel to error $ε$ using only $O(\log n/ε^2)$ ancilla qubits and $\tilde{O}(n^2/ε^2)$ measurements. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $Ω(2^n/ε^2)$ measurements, which is tight. Our results imply, to our knowledge, the first quantum learning task where logarithmically many qubits of quantum memory suffice for an exponential statistical advantage.

Submitted 30 November, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 57 pages, 3 figures

arXiv:2308.00531 [pdf, ps, other]

Adaptive Bitrate Video Semantic Communication over Wireless Networks

Authors: Wentao Gong, Haonan Tong, Sihua Wang, Zhaohui Yang, Xinxin He, Changchuan Yin

Abstract: This paper investigates adaptive bitrate (ABR) video semantic communication over wireless networks. In the considered model, video sensing devices must transmit video semantic information to an edge server, to facilitate ubiquitous video sensing services such as road environment monitoring at the edge server in autonomous driving scenarios. However, due to the varying wireless network conditions, it is challenging to guarantee both low transmission delay and high semantic accuracy at the same time if devices continuously transmit video semantic information at a fixed bitrate. To address this challenge, we develop an adaptive bitrate video semantic communication (ABRVSC) system, in which devices adaptively adjust the bitrate of video semantic information according to network conditions. Specifically, we first define the quality of experience (QoE) for video semantic communication. Subsequently, a swin transformer-based semantic codec is proposed to extract semantic information while considering the influence of QoE. Then, we propose an Actor-Critic based ABR algorithm for the semantic codec to enhance the robustness of the proposed ABRVSC scheme against network variations. Simulation results demonstrate that at low bitrates, the mean intersection over union (MIoU) of the proposed ABRVSC scheme is nearly twice that of the traditional scheme. Moreover, the proposed ABRVSC scheme, which increases the QoE in video semantic communication by 36.57%, exhibits more robustness against network variations compared to both the fixed bitrate schemes and traditional ABR schemes.

Submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.13917 [pdf, other]

BayesDAG: Gradient-Based Posterior Inference for Causal Discovery

Authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

Abstract: Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over the combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.

Submitted 8 December, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023

arXiv:2307.13028 [pdf, other]

Improved Digital Quantum Simulation by Non-Unitary Channels

Authors: W. Gong, Yaroslav Kharkov, Minh C. Tran, Przemyslaw Bienias, Alexey V. Gorshkov

Abstract: Simulating quantum systems is one of the most promising avenues to harness the computational power of quantum computers. However, hardware errors in noisy near-term devices remain a major obstacle for applications. Ideas based on the randomization of Suzuki-Trotter product formulas have been shown to be a powerful approach to reducing the errors of quantum simulation and lowering the gate count. In this paper, we study the performance of non-unitary simulation channels and consider the error structure of channels constructed from a weighted average of unitary circuits. We show that averaging over just a few simulation circuits can significantly reduce the Trotterization error for both single-step short-time and multi-step long-time simulations. We focus our analysis on two approaches for constructing circuit ensembles for averaging: (i) permuting the order of the terms in the Hamiltonian and (ii) applying a set of global symmetry transformations. We compare our analytical error bounds to empirical performance and show that empirical error reduction surpasses our analytical estimates in most cases. Finally, we test our method on an IonQ trapped-ion quantum computer accessed via the Amazon Braket cloud platform, and benchmark the performance of the averaging approach.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 24 pages, 9 figures

arXiv:2306.06629 [pdf, other]

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Authors: Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao, Peng Zhang, Jie Tang

Abstract: Currently, the reduction in the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their widespread deployment on various devices. However, the deployment of knowledge distillation systems faces great challenges in real-world industrial-strength applications, which require the use of complex distillation methods on even larger-scale PLMs (over 10B), limited by memory on GPUs and the switching of methods. To overcome these challenges, we propose GKD, a general knowledge distillation framework that supports distillation on larger-scale PLMs using various distillation methods. With GKD, developers can build larger distillation models on memory-limited GPUs and easily switch and combine different distillation methods within a single framework. Experimental results show that GKD can support the distillation of at least 100B-scale PLMs and 25 mainstream methods on 8 NVIDIA A100 (40GB) GPUs.

Submitted 11 June, 2023; originally announced June 2023.

Comments: accepted for ACL 2023 industry track

arXiv:2306.06625 [pdf, other]

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method

Authors: Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shu Zhao, Peng Zhang, Jie Tang

Abstract: The large scale of pre-trained language models poses a challenge for their deployment on various devices, with a growing emphasis on methods to compress these models, particularly knowledge distillation. However, current knowledge distillation methods rely on the model's intermediate layer features and the golden labels (also called hard labels), which usually require aligned model architecture and enough labeled data respectively. Moreover, the parameters of vocabulary are usually neglected in existing methods. To address these problems, we propose a general language model distillation (GLMD) method that performs two-stage word prediction distillation and vocabulary compression, which is simple and surprisingly shows extremely strong performance. Specifically, GLMD supports more general application scenarios by eliminating the constraints of dimension and structure between models and the need for labeled datasets through the absence of intermediate layers and golden labels. Meanwhile, based on the long-tailed distribution of word frequencies in the data, GLMD designs a strategy of vocabulary compression through decreasing vocabulary size instead of dimensionality. Experimental results show that our method outperforms 25 state-of-the-art methods on the SuperGLUE benchmark, achieving an average score that surpasses the best method by 3%.

Submitted 11 June, 2023; originally announced June 2023.

Comments: Accepted to Findings of ACL2023

arXiv:2305.09331 [pdf, other]

Energy-Efficient WiFi Backscatter Communication for Green IoTs

Authors: Yimeng Huang, Lijie Liu, Jihong Yu, Yuguang Fang, Wei Gong

Abstract: The boom of the Internet of Things has revolutionized people's lives, but it has also resulted in massive resource consumption and environmental pollution. Recently, Green IoT (GIoT) has become a worldwide consensus to address this issue. In this paper, we propose EEWScatter, an energy-efficient WiFi backscatter communication system to pursue the goal of GIoT. Unlike previous backscatter systems that solely focus on tags, our approach offers a comprehensive system-wide view on energy conservation. Specifically, we reuse ambient signals as carriers and utilize an ultra-low-power and battery-free design for tag nodes by backscatter. Further, we design a new CRC-based algorithm that enables the demodulation of both ambient and tag data by only a single receiver while using ambient carriers. Such a design eliminates system reliance on redundant transceivers with high power consumption. Results demonstrate that EEWScatter achieves the lowest overall system power consumption and saves at least half of the energy. What's more, the power consumption of our tag is only 1/1000 of that of active radio. We believe that EEWScatter is a critical step towards a sustainable future.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.06563 [pdf, other]

Spatiotemporal Regularized Tucker Decomposition Approach for Traffic Data Imputation

Authors: Wenwu Gong, Zhejun Huang, Lili Yang

Abstract: In intelligent transportation systems, traffic data imputation, which estimates missing values from partially observed data, is an inevitable and challenging task. Previous studies have not fully considered traffic data's multidimensionality and spatiotemporal correlations, but they are vital to traffic data recovery, especially for high-level missing scenarios. To address this problem, we propose a novel spatiotemporal regularized Tucker decomposition method. First, the traffic matrix is converted into a third-order tensor. Then, based on Tucker decomposition, the tensor is approximated by multiplying non-negative factor matrices with a sparse core tensor. Notably, we do not need to set the tensor rank or determine it through matrix nuclear-norm minimization or tensor rank minimization. The low rankness is characterized by the $l_1$-norm of the core tensor, while the manifold regularization and temporal constraint are employed to capture spatiotemporal correlations and further improve imputation performance. We use an alternating proximal gradient method with guaranteed convergence to solve the proposed model. Numerical experiments show that our proposal outperforms matrix-based and tensor-based baselines on real-world spatiotemporal traffic datasets in various missing scenarios.

Submitted 30 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.05524 [pdf, other]

Understanding Causality with Large Language Models: Feasibility and Opportunities

Authors: Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski, James Vaughan

Abstract: We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal questions. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2303.12363 [pdf]

Distribution-restrained Softmax Loss for the Model Robustness

Authors: Hao Wang, Chen Li, Jinzhe Jiang, Xin Zhang, Yaqian Zhao, Weifeng Gong

Abstract: Recently, the robustness of deep learning models has received widespread attention, and various methods for improving model robustness have been proposed, including adversarial training, model architecture modification, design of loss functions, certified defenses, and so on. However, the principle of robustness to attacks is still not fully understood, and the related research is still insufficient. Here, we have identified a significant factor that affects the robustness of models: the distribution characteristics of softmax values for non-real label samples. We found that the results after an attack are highly correlated with the distribution characteristics, and thus we proposed a loss function to suppress the distribution diversity of softmax. A large number of experiments have shown that our method can improve robustness without significant time consumption.

Submitted 22 March, 2023; originally announced March 2023.

MSC Class: 68T45

arXiv:2303.08969 [pdf, other]

Relative coordinates are crucial for Ulam's "trick to the train of thought"

Authors: Weibo Gong, Chirag S. Trasikar, Bradley Zylstra

Abstract: Spatial signal processing algorithms often use pre-given coordinate systems to label pixel positions. These processing algorithms are thus burdened by an external reference grid, making the acquisition of relative, intrinsic features difficult. This is in contrast to animal vision and cognition: animals recognize features without an external coordinate system. We show that a coordinate system-independent algorithm for visual signal processing is not only important for animal vision, but also fundamental for concept formation. In this paper we start with a visual object deformation transfer experiment. We then formulate an algorithm that achieves deformation-invariance with relative coordinates. The paper concludes with implications for general concept formation.

Submitted 15 March, 2023; originally announced March 2023.

Comments: 19 pages, 10 figures, conference

ACM Class: I.2.0

arXiv:2303.05779 [pdf, other]

CRC-based Reliable WiFi Backscatter Communication for Supply Chain Management

Authors: Yun-Hao Liu, Tao Liu, Yimeng Huang, Han Ding, Wei Xi, Wei Gong

Abstract: Supply chain management aims to sustain the long-term performance of the supply chain and minimize costs. Backscatter technology provides a more efficient way to identify items and monitor them in real time. Among the backscatter systems, the ambient backscatter communication (AmBC) system provides a prospect of ultra-low energy consumption and does not require controlled excitation devices. In this paper, we introduce CRCScatter, a CRC reverse algorithm-based AmBC system using a single access point (AP). A CRC reverse decoder is applied to reverse the ambient data from the CRC32 sequence in the backscatter packet and realize single-AP decoding. Based on the nature of DBPSK modulation in WiFi signals, the CRCScatter system obtains the tag data using XOR and a differential decoder. Our simulation results verify the effectiveness of our proposed system in the low SNR regime. The average decoding time of the CRCScatter system is independent of the length of the tag data. Furthermore, our system can append redundant bits to the tag data to improve the decoding accuracy while not increasing the decoding time.

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.05775 [pdf, other]

Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields

Authors: Jiayang Bai, Letian Huang, Wen Gong, Jie Guo, Yanwen Guo

Abstract: Recently, Neural Radiance Fields (NeRF) have emerged as a potent method for synthesizing novel views from a dense set of images. Despite its impressive performance, NeRF is plagued by its necessity for numerous calibrated views, and its accuracy diminishes significantly in a few-shot setting. To address this challenge, we propose Self-NeRF, a self-evolved NeRF that iteratively refines the radiance fields with very few input views, without incorporating additional priors. Basically, we train our model under the supervision of reference and unseen views simultaneously in an iterative procedure. In each iteration, we label unseen views with the predicted colors or warped pixels generated by the model from the preceding iteration. However, these expanded pseudo-views are afflicted by imprecision in color and warping artifacts, which degrades the performance of NeRF. To alleviate this issue, we construct an uncertainty-aware NeRF with specialized embeddings. Some techniques such as cone entropy regularization are further utilized to leverage the pseudo-views in the most efficient manner. Through experiments under various settings, we verified that our Self-NeRF is robust to input with uncertainty and surpasses existing methods when trained on limited training data.

Submitted 10 March, 2023; originally announced March 2023.

Comments: 11 pages, 11 figures

arXiv:2302.10697 [pdf, other]

doi 10.1109/TCSVT.2023.3284076

A Visual Representation-guided Framework with Global Affinity for Weakly Supervised Salient Object Detection

Authors: Binwei Xu, Haoran Liang, Weihua Gong, Ronghua Liang, Peng Chen

Abstract: Fully supervised salient object detection (SOD) methods have made considerable progress in performance, yet these models rely heavily on expensive pixel-wise labels. Recently, to achieve a trade-off between labeling burden and performance, scribble-based SOD methods have attracted increasing attention. Because previous scribble-based models perform the SOD task based only on SOD training data with limited information, it is extremely difficult for them to understand the image and achieve superior SOD performance. In this paper, we propose a simple yet effective framework guided by general visual representations with rich contextual semantic knowledge for scribble-based SOD. These general visual representations are generated by self-supervised learning based on large-scale unlabeled datasets. Our framework consists of a task-related encoder, a general visual module, and an information integration module to efficiently combine the general visual representations with task-related features to perform the SOD task based on understanding the contextual connections of images. Meanwhile, we propose a novel global semantic affinity loss to guide the model to perceive the global structure of the salient objects. Experimental results on five public benchmark datasets demonstrate that our method, which only utilizes scribble annotations without introducing any extra label, outperforms the state-of-the-art weakly supervised SOD methods. Specifically, it outperforms the previous best scribble-based method on all datasets with an average gain of 5.5% for max f-measure, 5.8% for mean f-measure, 24% for MAE, and 3.1% for E-measure. Moreover, our method achieves comparable or even superior performance to the state-of-the-art fully supervised models.

Submitted 8 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

arXiv:2301.07868 [pdf, other]

MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval

Authors: Xiaojie Jin, Bowen Zhang, Weibo Gong, Kai Xu, XueQing Deng, Peng Wang, Zhao Zhang, Xiaohui Shen, Jiashi Feng

Abstract: State-of-the-art video-text retrieval (VTR) methods typically involve fully fine-tuning a pre-trained model (e.g. CLIP) on specific datasets. However, this can result in significant storage costs in practical applications as a separate model per task must be stored. To address this issue, we present our pioneering work that enables parameter-efficient VTR using a pre-trained model, with only a small number of tunable parameters during training. Towards this goal, we propose a new method dubbed Multimodal Video Adapter (MV-Adapter) for efficiently transferring the knowledge in the pre-trained CLIP from image-text to video-text. Specifically, MV-Adapter utilizes bottleneck structures in both video and text branches, along with two novel components. The first is a Temporal Adaptation Module that is incorporated in the video branch to introduce global and local temporal contexts. We also train weights calibrations to adjust to dynamic variations across frames. The second is Cross Modality Tying that generates weights for video/text branches through sharing cross modality factors, for better aligning between modalities. Thanks to above innovations, MV-Adapter can achieve comparable or better performance than standard full fine-tuning with negligible parameters overhead. Notably, MV-Adapter consistently outperforms various competing methods in V2T/T2V tasks with large margins on five widely used VTR benchmarks (MSR-VTT, MSVD, LSMDC, DiDemo, and ActivityNet).

Submitted 11 April, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.02548 [pdf, other]

Robustness of Quantum Algorithms for Nonconvex Optimization

Authors: Weiyuan Gong, Chenyi Zhang, Tongyang Li

Abstract: Recent results suggest that quantum computers possess the potential to speed up nonconvex optimization problems. However, a crucial factor for the implementation of quantum optimization algorithms is their robustness against experimental and statistical noises. In this paper, we systematically study quantum algorithms for finding an $ε$-approximate second-order stationary point ($ε$-SOSP) of a $d$-dimensional nonconvex function, a fundamental problem in nonconvex optimization, with noisy zeroth- or first-order oracles as inputs. We first prove that, up to noise of $O(ε^{10}/d^5)$, accelerated perturbed gradient descent with quantum gradient estimation takes $O(\log d/ε^{1.75})$ quantum queries to find an $ε$-SOSP. We then prove that perturbed gradient descent is robust to the noise of $O(ε^6/d^4)$ and $O(ε/d^{0.5+ζ})$ for $ζ>0$ on the zeroth- and first-order oracles, respectively, which provides a quantum algorithm with poly-logarithmic query complexity. We then propose a stochastic gradient descent algorithm using quantum mean estimation on the Gaussian smoothing of noisy oracles, which is robust to $O(ε^{1.5}/d)$ and $O(ε/\sqrt{d})$ noise on the zeroth- and first-order oracles, respectively. The quantum algorithm takes $O(d^{2.5}/ε^{3.5})$ and $O(d^2/ε^3)$ queries to the two oracles, giving a polynomial speedup over the classical counterparts. Moreover, we characterize the domains where quantum algorithms can find an $ε$-SOSP with poly-logarithmic, polynomial, or exponential number of queries in $d$, or the problem is information-theoretically unsolvable even by an infinite number of queries. In addition, we prove an $Ω(ε^{-12/7})$ lower bound in $ε$ for any randomized classical and quantum algorithm to find an $ε$-SOSP using either noisy zeroth- or first-order oracles.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.02531 [pdf, other]

Enhancing Quantum Adversarial Robustness by Randomized Encodings

Authors: Weiyuan Gong, Dong Yuan, Weikang Li, Dong-Ling Deng

Abstract: The interplay between quantum physics and machine learning gives rise to the emergent frontier of quantum machine learning, where advanced quantum learning models may outperform their classical counterparts in solving certain challenging problems. However, quantum learning systems are vulnerable to adversarial attacks: adding tiny carefully-crafted perturbations on legitimate input samples can cause misclassifications. To address this issue, we propose a general scheme to protect quantum learning systems from adversarial attacks by randomly encoding the legitimate data samples through unitary or quantum error correction encoders. In particular, we rigorously prove that both global and local random unitary encoders lead to exponentially vanishing gradients (i.e. barren plateaus) for any variational quantum circuits that aim to add adversarial perturbations, independent of the input data and the inner structures of adversarial circuits and quantum classifiers. In addition, we prove a rigorous bound on the vulnerability of quantum classifiers under local unitary adversarial attacks. We show that random black-box quantum error correction encoders can protect quantum classifiers against local adversarial noises and their robustness increases as we concatenate error correction codes. To quantify the robustness enhancement, we adapt quantum differential privacy as a measure of the prediction stability for quantum classifiers. Our results establish versatile defense strategies for quantum classifiers against adversarial perturbations, which provide valuable guidance to enhance the reliability and security for both near-term and future quantum learning technologies.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2210.14706 [pdf, other]

Rhino: Deep Causal Temporal Relationship Learning With History-dependent Noise

Authors: Wenbo Gong, Joel Jennings, Cheng Zhang, Nick Pawlowski

Abstract: Discovering causal relationships between different variables from time series data has been a long-standing challenge for many domains such as climate science, finance, and healthcare. Given the complexity of real-world relationships and the nature of observations in discrete time, causal discovery methods need to consider non-linear relations between variables, instantaneous effects and history-dependent noise (the change of noise distribution due to past actions). However, previous works do not offer a solution addressing all these problems together. In this paper, we propose a novel causal relationship learning framework for time-series data, called Rhino, which combines vector auto-regression, deep learning and variational inference to model non-linear relationships with instantaneous effects while allowing the noise distribution to be modulated by historical observations. Theoretically, we prove the structural identifiability of Rhino. Our empirical results from extensive synthetic experiments and two real-world benchmarks demonstrate better discovery performance compared to relevant baselines, with ablation studies revealing its robustness under model misspecification.

Submitted 26 October, 2022; originally announced October 2022.

Comments: 28 pages, 8 figures, 5 tables

arXiv:2210.04410 [pdf, other]

Accelerating the Delivery of Data Services over Uncertain Mobile Crowdsensing Networks

Authors: Minghui Liwang, Zhipeng Cheng, Wei Gong, Li Li, Yuhan Su, Zhenzhen Jiao, Seyyedali Hosseinalipour, Xianbin Wang, Huaiyu Dai

Abstract: The challenge of exchanging and processing of big data over mobile crowdsensing (MCS) networks calls for designing seamless data service provisioning mechanisms to enable utilization of resources of mobile devices/users for crowdsensing tasks. Although conventional onsite spot trading of resources based on real-time network conditions can facilitate data sharing, it often suffers from prohibitively long service provisioning delays and unavoidable trading failures due to requiring timely analysis of dynamic network environment. These limitations motivate us to investigate an integrated forward and spot trading mechanism (iFAST), which entails a novel hybrid data trading protocol with time efficiency, over uncertain MCS ecosystems. In iFAST, the sellers (i.e., mobile devices who can contribute data) can provide long-term or temporary sensing services to the buyers (i.e., sensing tasks). Specifically, it enables signing long-term contracts in advance of future transactions through a forward trading mode, via analyzing historical statistics of the network/market, for which the notion of overbooking is introduced and promoted. iFAST further encourages the buyers with unsatisfying service quality to recruit temporary sellers through a spot trading mode, considering the current network/market conditions. We analyze the fundamental blocks of iFAST and provide a case study to demonstrate its performance. Inspirations for future research directions of next-generation sensing and communication are summarized.

Submitted 8 April, 2024; v1 submitted 9 October, 2022; originally announced October 2022.

arXiv:2209.03007 [pdf, ps, other]

Learning Distributions over Quantum Measurement Outcomes

Authors: Weiyuan Gong, Scott Aaronson

Abstract: Shadow tomography for quantum states provides a sample efficient approach for predicting the properties of quantum systems when the properties are restricted to expectation values of $2$-outcome POVMs. However, these shadow tomography procedures yield poor bounds if there are more than 2 outcomes per measurement. In this paper, we consider a general problem of learning properties from unknown quantum states: given an unknown $d$-dimensional quantum state $ρ$ and $M$ unknown quantum measurements $\mathcal{M}_1,...,\mathcal{M}_M$ with $K\geq 2$ outcomes, estimating the probability distribution for applying $\mathcal{M}_i$ on $ρ$ to within total variation distance $ε$. Compared to the special case when $K=2$, we need to learn unknown distributions instead of values. We develop an online shadow tomography procedure that solves this problem with high success probability requiring $\tilde{O}(K\log^2M\log d/ε^4)$ copies of $ρ$. We further prove an information-theoretic lower bound that at least $Ω(\min\{d^2,K+\log M\}/ε^2)$ copies of $ρ$ are required to solve this problem with high success probability. Our shadow tomography procedure requires sample complexity with only logarithmic dependence on $M$ and $d$ and is sample-optimal for the dependence on $K$.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 25 pages

arXiv:2208.12610 [pdf, ps, other]

NeurIPS Competition Instructions and Guide: Causal Insights for Learning Paths in Education

Authors: Wenbo Gong, Digory Smith, Zichao Wang, Craig Barton, Simon Woodhead, Nick Pawlowski, Joel Jennings, Cheng Zhang

Abstract: In this competition, participants will address two fundamental causal challenges in machine learning in the context of education using time-series data. The first is to identify the causal relationships between different constructs, where a construct is defined as the smallest element of learning. The second challenge is to predict the impact of learning one construct on the ability to answer questions on other constructs. Addressing these challenges will enable optimisation of students' knowledge acquisition, which can be deployed in a real edtech solution impacting millions of students. Participants will run these tasks in an idealised environment with synthetic data and a real-world scenario with evaluation data collected from a series of A/B tests.

Submitted 31 August, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: 19 pages, NeurIPS 2022 Competition Track

arXiv:2205.10034 [pdf, other]

SE-MoE: A Scalable and Efficient Mixture-of-Experts Distributed Training and Inference System

Authors: Liang Shen, Zhihua Wu, WeiBao Gong, Hongxiang Hao, Yangfan Bai, HuaChao Wu, Xinxuan Wu, Jiang Bian, Haoyi Xiong, Dianhai Yu, Yanjun Ma

Abstract: With the increasing diversity of ML infrastructures nowadays, distributed training over heterogeneous computing systems is desired to facilitate the production of big models. Mixture-of-Experts (MoE) models have been proposed to lower the cost of training subject to the overall size of models/data through gating and parallelism in a divide-and-conquer fashion. While DeepSpeed has made efforts in carrying out large-scale MoE training over heterogeneous infrastructures, the efficiency of training and inference could be further improved from several system aspects, including load balancing, communication/computation efficiency, and memory footprint limits. In this work, we present SE-MoE that proposes Elastic MoE training with 2D prefetch and Fusion communication over Hierarchical storage, so as to enjoy efficient parallelisms in various types. For scalable inference in a single node, especially when the model size is larger than GPU memory, SE-MoE forms the CPU-GPU memory jointly into a ring of sections to load the model, and executes the computation tasks across the memory sections in a round-robin manner for efficient inference. We carried out extensive experiments to evaluate SE-MoE, where SE-MoE successfully trains a Unified Feature Optimization (UFO) model with a Sparsely-Gated Mixture-of-Experts model of 12B parameters in 8 days on 48 A100 GPU cards. The comparison against the state-of-the-art shows that SE-MoE outperformed DeepSpeed with 33% higher throughput (tokens per second) in training and 13% higher throughput in inference in general. Particularly, under unbalanced MoE Tasks, e.g., UFO, SE-MoE achieved 64% higher throughput with 18% lower memory footprints. The code of the framework will be released on: https://github.com/PaddlePaddle/Paddle.

Submitted 12 June, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2205.09470 [pdf, other]

Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters

Authors: Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu

Abstract: The ever-growing model size and scale of compute have attracted increasing interests in training deep learning models over multiple nodes. However, when it comes to training on cloud clusters, especially across remote clusters, huge challenges are faced. In this work, we introduce a general framework, Nebula-I, for collaboratively training deep learning models over remote heterogeneous clusters, the connections between which are low-bandwidth wide area networks (WANs). We took natural language processing (NLP) as an example to show how Nebula-I works in different training phases that include: a) pre-training a multilingual language model using two remote clusters; and b) fine-tuning a machine translation model using knowledge distilled from pre-trained models, which run through the most popular paradigm of recent deep learning. To balance the accuracy and communication efficiency, in Nebula-I, parameter-efficient training strategies, hybrid parallel computing methods and adaptive communication acceleration techniques are jointly applied. Meanwhile, security strategies are employed to guarantee the safety, reliability and privacy in intra-cluster computation and inter-cluster communication. Nebula-I is implemented with the PaddlePaddle deep learning framework, which can support collaborative training over heterogeneous hardware, e.g. GPU and NPU. Experiments demonstrate that the proposed framework could substantially maximize the training efficiency while preserving satisfactory NLP performance. By using Nebula-I, users can run large-scale training tasks over cloud clusters with minimum developments, and the utility of existed large pre-trained models could be further promoted. We also introduced new state-of-the-art results on cross-lingual natural language inference tasks, which are generated based upon a novel learning framework and Nebula-I.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 20 pages, 10 figures, technical report

arXiv:2204.02008 [pdf, other]

Learning Video Salient Object Detection Progressively from Unlabeled Videos

Authors: Binwei Xu, Haoran Liang, Wentian Ni, Weihua Gong, Ronghua Liang, Peng Chen

Abstract: Recent deep learning-based video salient object detection (VSOD) has achieved some breakthrough, but these methods rely on expensive annotated videos with pixel-wise annotations, weak annotations, or part of the pixel-wise annotations. In this paper, based on the similarities and the differences between VSOD and image salient object detection (SOD), we propose a novel VSOD method via a progressive framework that locates and segments salient objects in sequence without utilizing any video annotation. To use the knowledge learned in the SOD dataset for VSOD efficiently, we introduce dynamic saliency to compensate for the lack of motion information of SOD during the locating process but retain the same fine segmenting process. Specifically, an algorithm for generating spatiotemporal location labels, which consists of generating high-saliency location labels and tracking salient objects in adjacent frames, is proposed. Based on these location labels, a two-stream locating network that introduces an optical flow branch for video salient object locating is presented. Although our method does not require labeled video at all, the experimental results on five public benchmarks of DAVIS, FBMS, ViSal, VOS, and DAVSOD demonstrate that our proposed method is competitive with fully supervised methods and outperforms the state-of-the-art weakly and unsupervised methods.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2202.02195 [pdf, other]

Deep End-to-end Causal Inference

Authors: Tomas Geffner, Javier Antoran, Adam Foster, Wenbo Gong, Chao Ma, Emre Kiciman, Amit Sharma, Angus Lamb, Martin Kukla, Nick Pawlowski, Miltiadis Allamanis, Cheng Zhang

Abstract: Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment and policy making. However, research on causal discovery has evolved separately from inference methods, preventing straight-forward combination of methods from both fields. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based non-linear additive noise model that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground truth causal graph under standard causal discovery assumptions. Motivated by application impact, we extend this model to heterogeneous, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Our results show the competitive performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and causal machine learning benchmarks across data-types and levels of missingness.

Submitted 20 June, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

arXiv:2112.12731 [pdf, other]

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation

Authors: Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, Weibao Gong, Shikun Feng, Junyuan Shang, Yanbin Zhao, Chao Pang, Jiaxiang Liu, Xuyi Chen, Yuxiang Lu, Weixin Liu, Xi Wang, Yangfan Bai, Qiuliang Chen, Li Zhao, Shiyong Li, Peng Sun, Dianhai Yu, Yanjun Ma, Hao Tian, Hua Wu, Tian Wu, et al. (4 additional authors not shown)

Abstract: Pre-trained language models have achieved state-of-the-art results in various Natural Language Processing (NLP) tasks. GPT-3 has shown that scaling up pre-trained language models can further exploit their enormous potential. A unified framework named ERNIE 3.0 was recently proposed for pre-training large-scale knowledge enhanced models and trained a model with 10 billion parameters. ERNIE 3.0 outperformed the state-of-the-art models on various NLP tasks. In order to explore the performance of scaling up ERNIE 3.0, we train a hundred-billion-parameter model called ERNIE 3.0 Titan with up to 260 billion parameters on the PaddlePaddle platform. Furthermore, we design a self-supervised adversarial loss and a controllable language modeling loss to make ERNIE 3.0 Titan generate credible and controllable texts. To reduce the computation overhead and carbon emission, we propose an online distillation framework for ERNIE 3.0 Titan, where the teacher model will teach students and train itself simultaneously. ERNIE 3.0 Titan is the largest Chinese dense pre-trained model so far. Empirical results show that the ERNIE 3.0 Titan outperforms the state-of-the-art models on 68 NLP datasets.

Submitted 23 December, 2021; originally announced December 2021.

Comments: arXiv admin note: text overlap with arXiv:2107.02137

arXiv:2112.02752 [pdf, other]

End-to-end Adaptive Distributed Training on PaddlePaddle

Authors: Yulong Ao, Zhihua Wu, Dianhai Yu, Weibao Gong, Zhiqing Kui, Minxu Zhang, Zilingfeng Ye, Liang Shen, Yanjun Ma, Tian Wu, Haifeng Wang, Wei Zeng, Chao Yang

Abstract: Distributed training has become a pervasive and effective approach for training a large neural network (NN) model with processing massive data. However, it is very challenging to satisfy requirements from various NN models, diverse computing resources, and their dynamic changes during a training job. In this study, we design our distributed training framework in a systematic end-to-end view to provide the built-in adaptive ability for different scenarios, especially for industrial applications and production environments, by fully considering resource allocation, model partition, task placement, and distributed execution. Based on the unified distributed graph and the unified cluster object, our adaptive framework is equipped with a global cost model and a global planner, which can enable arbitrary parallelism, resource-aware placement, multi-mode execution, fault-tolerant, and elastic distributed training. The experiments demonstrate that our framework can satisfy various requirements from the diversity of applications and the heterogeneity of resources with highly competitive performance. The ERNIE language model with 260 billion parameters is efficiently trained on thousands of AI processors with 91.7% weak scalability. The throughput of the model from the recommender system by employing the heterogeneous pipeline asynchronous execution can be increased up to 2.1 times and 3.3 times that of the GPU-only and CPU-only training respectively. Moreover, the fault-tolerant and elastic distributed training have been successfully applied to the online industrial applications, which give a reduction of 34.49% in the number of failed long-term training jobs and an increase of 33.91% for the global scheduling efficiency in the production environment.

Submitted 5 December, 2021; originally announced December 2021.

Comments: 16 pages, 10 figures, 4 tables

arXiv:2111.02426 [pdf, other]

doi 10.1103/PhysRevResearch.5.013060

Weighted Quantum Channel Compiling through Proximal Policy Optimization

Authors: Weiyuan Gong, Si Jiang, Dong-Ling Deng

Abstract: We propose a general and systematic strategy to compile arbitrary quantum channels without using ancillary qubits, based on proximal policy optimization -- a powerful deep reinforcement learning algorithm. We rigorously prove that, in sharp contrast to the case of compiling unitary gates, it is impossible to compile an arbitrary channel to arbitrary precision with any given finite elementary channel set, regardless of the length of the decomposition sequence. However, for a fixed accuracy $ε$ one can construct a universal set with constant number of $ε$-dependent elementary channels, such that an arbitrary quantum channel can be decomposed into a sequence of these elementary channels followed by a unitary gate, with the sequence length bounded by $O(\frac{1}ε\log\frac{1}ε)$. Through a concrete example concerning topological compiling of Majorana fermions, we show that our proposed algorithm can conveniently and effectively reduce the use of expensive elementary gates through adding the weighted cost into the reward function of the proximal policy optimization.

Submitted 3 November, 2021; originally announced November 2021.

Comments: 14 pages, 4 figures

Journal ref: Phys. Rev. Research 5, 013060 (2023)

arXiv:2110.08223 [pdf, other]

Simultaneous Missing Value Imputation and Structure Learning with Groups

Authors: Pablo Morales-Alvarez, Wenbo Gong, Angus Lamb, Simon Woodhead, Simon Peyton Jones, Nick Pawlowski, Miltiadis Allamanis, Cheng Zhang

Abstract: Learning structures between groups of variables from data with missing values is an important task in the real world, yet difficult to solve. One typical scenario is discovering the structure among topics in the education domain to identify learning pathways. Here, the observations are student performances for questions under each topic which contain missing values. However, most existing methods focus on learning structures between a few individual variables from the complete data. In this work, we propose VISL, a novel scalable structure learning approach that can simultaneously infer structures between groups of variables under missing data and perform missing value imputations with deep learning. Particularly, we propose a generative model with a structured latent space and a graph neural network-based architecture, scaling to a large number of variables. Empirically, we conduct extensive experiments on synthetic, semi-synthetic, and real-world education data sets. We show improved performances on both imputation and structure learning accuracy compared to popular and recent approaches.

Submitted 24 February, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

arXiv:2107.10538 [pdf, other]

Diversified and Compatible Web APIs Recommendation in IoT

Authors: Wenwen Gong, Huiping Wu, Xiaokang Wang, Xuyun Zhang, Yawei Wang, Yifei Chen, Mohammad R. Khosravi

Abstract: With the ever-increasing popularity of Service-oriented Architecture (SoA) and Internet of Things (IoT), a considerable number of enterprises or organizations are attempting to encapsulate their provided complex business services into various lightweight and accessible web APIs (application programming interfaces) with diverse functions. In this situation, a software developer can select a group of preferred web APIs from a massive number of candidates to create a complex mashup economically and quickly based on the keywords typed by the developer. However, traditional keyword-based web API search approaches often suffer from the following difficulties and challenges. First, they often focus more on the functional matching between the candidate web APIs and the mashup to be developed while neglecting the compatibility among different APIs, which can return a group of incompatible web APIs and lead to a mashup development failure. Second, existing approaches often return a web API composition solution to the mashup developer for reference, which narrows the developer's API selection scope considerably and may heavily reduce developer satisfaction. In view of the above challenges and the successful application of game theory in the IoT, we propose a game-theory-based approach for recommending compatible and diverse web APIs for mashup creation, named MCCOMP+DIV, which returns multiple sets of diverse and compatible web APIs with a higher success rate. Finally, we validate the effectiveness and efficiency of MCCOMP+DIV through a set of experiments based on a real-world web API dataset, i.e., the PW dataset crawled from ProgrammableWeb.com.

Submitted 11 August, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

Comments: 15 pages, 11 figures

arXiv:2107.10072 [pdf, other]

Interpreting diffusion score matching using normalizing flow

Authors: Wenbo Gong, Yingzhen Li

Abstract: Score matching (SM) and its related counterpart, Stein discrepancy (SD), have achieved great success in model training and evaluations. However, recent research shows their limitations when dealing with certain types of distributions. One possible fix is incorporating the original score matching (or Stein discrepancy) with a diffusion matrix, which is called diffusion score matching (DSM) (or diffusion Stein discrepancy (DSD)). However, the lack of interpretation of the diffusion limits its usage to simple distributions and manually chosen matrices. In this work, we plan to fill this gap by interpreting the diffusion matrix using normalizing flows. Specifically, we theoretically prove that DSM (or DSD) is equivalent to the original score matching (or Stein discrepancy) evaluated in the transformed space defined by the normalizing flow, where the diffusion matrix is the inverse of the flow's Jacobian matrix. In addition, we also build its connection to Riemannian manifolds and further extend it to continuous flows, where the change of DSM is characterized by an ODE.

Submitted 21 July, 2021; originally announced July 2021.

Comments: 8 pages, International Conference on Machine Learning (ICML) INNF+ 2021 Workshop Spotlight
