-
Hyperbolic fibered slice knots with right-veering monodromy
Authors:
Dongtai He
Abstract:
We construct a hyperbolic fibered slice knot with right-veering monodromy, giving a negative answer to the question posed by Hubbard-Kawamuro-Kose-Martin-Plamenevskaya-Raoux-Truong-Turner.
Submitted 1 July, 2022;
originally announced July 2022.
-
RAW-GNN: RAndom Walk Aggregation based Graph Neural Network
Authors:
Di Jin,
Rui Wang,
Meng Ge,
Dongxiao He,
Xiang Li,
Wei Lin,
Weixiong Zhang
Abstract:
Graph-Convolution-based methods have been successfully applied to representation learning on homophily graphs where nodes with the same label or similar attributes tend to connect with one another. Due to the homophily assumption of Graph Convolutional Networks (GCNs) that these methods use, they are not suitable for heterophily graphs where nodes with different labels or dissimilar attributes tend to be adjacent. Several methods have attempted to address this heterophily problem, but they do not change the fundamental aggregation mechanism of GCNs because they rely on summation operators to aggregate information from neighboring nodes, which is implicitly subject to the homophily assumption. Here, we introduce a novel aggregation mechanism and develop a RAndom Walk Aggregation-based Graph Neural Network (called RAW-GNN) method. The proposed approach integrates the random walk strategy with graph neural networks. The new method utilizes breadth-first random walk search to capture homophily information and depth-first search to collect heterophily information. It replaces the conventional neighborhoods with path-based neighborhoods and introduces a new path-based aggregator based on Recurrent Neural Networks. These designs make RAW-GNN suitable for both homophily and heterophily graphs. Extensive experimental results showed that the new method achieved state-of-the-art performance on a variety of homophily and heterophily graphs.
Submitted 28 June, 2022;
originally announced June 2022.
-
Pair production of neutral Higgs particles in the B-LSSM
Authors:
Dan He,
Tai-Fu Feng,
Jin-Lei Yang,
Guo-Zhu Ning,
Hai-Bin Zhang,
Xing-Xing Dong
Abstract:
Higgs pair production provides a unique handle for measuring the strength of the Higgs self-interaction and constraining the shape of the Higgs potential. Including radiative corrections to the trilinear couplings of the $CP$-even Higgs, we investigate the cross section of the lightest neutral Higgs pair production in gluon fusion at the Large Hadron Collider in supersymmetric extensions of the standard model. Numerical results indicate that the correction to the cross section is about 11% in the B-LSSM, while it is only about 4% in the MSSM. Considering the constraints from the experimental data on the lightest Higgs, we find that the gauge couplings of $U(1)_{B-L}$ and the ratio of the nonzero vacuum expectation values of the two singlets also strongly affect the theoretical evaluation of the production cross section in the B-LSSM.
Submitted 9 June, 2022;
originally announced June 2022.
-
Adversarial Noises Are Linearly Separable for (Nearly) Random Neural Networks
Authors:
Huishuai Zhang,
Da Yu,
Yiping Lu,
Di He
Abstract:
Adversarial examples, which are usually generated for specific inputs with a specific model, are ubiquitous for neural networks. In this paper we unveil a surprising property of adversarial noises when they are put together, i.e., adversarial noises crafted by one-step gradient methods are linearly separable if equipped with the corresponding labels. We theoretically prove this property for a two-layer network with randomly initialized entries and the neural tangent kernel setup where the parameters are not far from initialization. The proof idea is to show the label information can be efficiently backpropagated to the input while keeping the linear separability. Our theory and experimental evidence further show that the linear classifier trained with the adversarial noises of the training data can well classify the adversarial noises of the test data, indicating that adversarial noises actually inject a distributional perturbation to the original data distribution. Furthermore, we empirically demonstrate that the adversarial noises may become less linearly separable when the above conditions are compromised while they are still much easier to classify than original features.
Submitted 9 June, 2022;
originally announced June 2022.
-
Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?
Authors:
Chuwei Wang,
Shanda Li,
Di He,
Liwei Wang
Abstract:
The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de facto standard in training Physics-Informed Neural Networks. In this paper, we challenge this common practice by investigating the relationship between the loss function and the approximation quality of the learned solution. In particular, we leverage the concept of stability from the literature on partial differential equations to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) equation, and prove that for a general $L^p$ Physics-Informed Loss, a wide class of HJB equations is stable only if $p$ is sufficiently large. Therefore, the commonly used $L^2$ loss is not suitable for training PINNs on those equations, while the $L^{\infty}$ loss is a better choice. Based on this theoretical insight, we develop a novel PINN training algorithm to minimize the $L^{\infty}$ loss for HJB equations, which is similar in spirit to adversarial training. The effectiveness of the proposed algorithm is empirically demonstrated through experiments. Our code is released at https://github.com/LithiumDA/L_inf-PINN.
Submitted 30 December, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
-
PO-ELIC: Perception-Oriented Efficient Learned Image Coding
Authors:
Dailan He,
Ziming Yang,
Hongjiu Yu,
Tongda Xu,
Jixiang Luo,
Yuan Chen,
Chenjian Gao,
Xinjie Shi,
Hongwei Qin,
Yan Wang
Abstract:
In recent years, learned image compression (LIC) has achieved remarkable performance. Recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and missing texture. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, we propose PO-ELIC, i.e., Perception-Oriented Efficient Learned Image Coding. To be specific, we adapt ELIC, one of the state-of-the-art LIC models, with adversarial training techniques. We apply a mixture of losses, including hinge-form adversarial loss, Charbonnier loss, and style loss, to finetune the model towards better perceptual quality. Experimental results demonstrate that our method achieves perceptual quality comparable to HiFiC at a much lower bitrate.
Submitted 28 May, 2022;
originally announced May 2022.
-
Your Transformer May Not be as Powerful as You Expect
Authors:
Shengjie Luo,
Shanda Li,
Shuxin Zheng,
Tie-Yan Liu,
Liwei Wang,
Di He
Abstract:
Relative Positional Encoding (RPE), which encodes the relative distance between any pair of tokens, is one of the most successful modifications to the original Transformer. As far as we know, theoretical understanding of the RPE-based Transformers is largely unexplored. In this work, we mathematically analyze the power of RPE-based Transformers regarding whether the model is capable of approximating any continuous sequence-to-sequence functions. One may naturally assume the answer is in the affirmative -- RPE-based Transformers are universal function approximators. However, we present a negative result by showing there exist continuous sequence-to-sequence functions that RPE-based Transformers cannot approximate no matter how deep and wide the neural network is. One key reason lies in that most RPEs are placed in the softmax attention that always generates a right stochastic matrix. This restricts the network from capturing positional information in the RPEs and limits its capacity. To overcome the problem and make the model more powerful, we first present sufficient conditions for RPE-based Transformers to achieve universal function approximation. With the theoretical guidance, we develop a novel attention module, called Universal RPE-based (URPE) Attention, which satisfies the conditions. Therefore, the corresponding URPE-based Transformers become universal function approximators. Extensive experiments covering typical architectures and tasks demonstrate that our model is parameter-efficient and can achieve superior performance to strong baselines in a wide range of applications. The code will be made publicly available at https://github.com/lsj2408/URPE.
Submitted 28 October, 2022; v1 submitted 26 May, 2022;
originally announced May 2022.
-
TrustGNN: Graph Neural Network based Trust Evaluation via Learnable Propagative and Composable Nature
Authors:
Cuiying Huo,
Di Jin,
Chundong Liang,
Dongxiao He,
Tie Qiu,
Lingfei Wu
Abstract:
Trust evaluation is critical for many applications such as cyber security, social communication and recommender systems. Users and the trust relationships among them can be seen as a graph. Graph neural networks (GNNs) have shown a powerful ability to analyze graph-structured data. Very recently, existing work has attempted to introduce the attributes and asymmetry of edges into GNNs for trust evaluation, but has failed to capture some essential properties (e.g., the propagative and composable nature) of trust graphs. In this work, we propose a new GNN-based trust evaluation method named TrustGNN, which integrates the propagative and composable nature of trust graphs into a GNN framework for better trust evaluation. Specifically, TrustGNN designs specific propagative patterns for different propagative processes of trust, and distinguishes the contributions of different propagative processes to creating new trust. Thus, TrustGNN can learn comprehensive node embeddings and predict trust relationships based on these embeddings. Experiments on some widely used real-world datasets indicate that TrustGNN significantly outperforms the state-of-the-art methods. We further perform analytical experiments to demonstrate the effectiveness of the key designs in TrustGNN.
Submitted 25 May, 2022;
originally announced May 2022.
-
HelixADMET: a robust and endpoint extensible ADMET system incorporating self-supervised knowledge transfer
Authors:
Shanzhuo Zhang,
Zhiyuan Yan,
Yueyang Huang,
Lihang Liu,
Donglong He,
Wei Wang,
Xiaomin Fang,
Xiaonan Zhang,
Fan Wang,
Hua Wu,
Haifeng Wang
Abstract:
Accurate ADMET (an abbreviation for "absorption, distribution, metabolism, excretion, and toxicity") predictions can efficiently screen out undesirable drug candidates in the early stage of drug discovery. In recent years, multiple comprehensive ADMET systems that adopt advanced machine learning models have been developed, providing services to estimate multiple endpoints. However, those ADMET systems usually suffer from weak extrapolation ability. First, due to the lack of labelled data for each endpoint, typical machine learning models perform poorly on molecules with unobserved scaffolds. Second, most systems only provide fixed built-in endpoints and cannot be customised to satisfy various research requirements. To this end, we develop a robust and endpoint extensible ADMET system, HelixADMET (H-ADMET). H-ADMET incorporates the concept of self-supervised learning to produce a robust pre-trained model. The model is then fine-tuned with a multi-task and multi-stage framework to transfer knowledge between ADMET endpoints, auxiliary tasks, and self-supervised tasks. Our results demonstrate that H-ADMET achieves an overall improvement of 4% compared with existing ADMET systems on comparable endpoints. Additionally, the pre-trained model provided by H-ADMET can be fine-tuned to generate new and customised ADMET endpoints, meeting the various demands of drug research and development.
Submitted 16 May, 2022;
originally announced May 2022.
-
Heterogeneous Graph Neural Networks using Self-supervised Reciprocally Contrastive Learning
Authors:
Cuiying Huo,
Dongxiao He,
Yawen Li,
Di Jin,
Jianwu Dang,
Weixiong Zhang,
Witold Pedrycz,
Lingfei Wu
Abstract:
The heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs. Most existing HGNN-based approaches are supervised or semi-supervised learning methods requiring graphs to be annotated, which is costly and time-consuming. Self-supervised contrastive learning has been proposed to address the problem of requiring annotated data by mining intrinsic information hidden within the given data. However, the existing contrastive learning methods are inadequate for heterogeneous graphs because they construct contrastive views based only on data perturbation or pre-defined structural properties (e.g., meta-paths) in graph data, while ignoring the noise that may exist in both node attributes and graph topologies. We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views guided respectively by node attributes and graph topologies, and integrates and enhances them through a reciprocally contrastive mechanism to better model heterogeneous graphs. In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately. We further use both attribute similarity and topological correlation to construct high-quality contrastive samples. Extensive experiments on three large real-world heterogeneous graphs demonstrate the superiority and robustness of HGCL over state-of-the-art methods.
Submitted 16 November, 2023; v1 submitted 30 April, 2022;
originally announced May 2022.
-
METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals
Authors:
Payal Bajaj,
Chenyan Xiong,
Guolin Ke,
Xiaodong Liu,
Di He,
Saurabh Tiwary,
Tie-Yan Liu,
Paul Bennett,
Xia Song,
Jianfeng Gao
Abstract:
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originating in ELECTRA, this training strategy has demonstrated sample efficiency in pretraining models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art results on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, METRO-LM models are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.
Submitted 16 April, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
-
The strong coupling $g_{X J/ψφ}$ of $X(4700) \to J/ψφ$ in the light-cone sum rules
Authors:
Yiling Xie,
Dazhuang He,
Xuan Luo,
Hao Sun
Abstract:
We assign the scalar tetraquark and the D-wave tetraquark state for $X(4700)$ and calculate the width of the decay $X(4700)$ $\to J/ψφ$ within the framework of light-cone sum rules. The strong coupling $g_{X J/ψφ}$ is obtained using the soft-meson approximation. We also investigate the mass and the decay constant of $X(4700)$ in the framework of SVZ sum rules. Our prediction for the mass is in agreement with the experimental measurement, and that for the decay width of $X(4700)$ $\to J/ψφ$ supports the possibility that $X(4700)$ could be a scalar tetraquark state if $X(4700)$ $\to J/ψφ$ is the predominant decay channel, or a D-wave tetraquark state if $X(4700)$ $\to J/ψφ$ is not the predominant one and other decays exist.
Submitted 28 September, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain
Authors:
Lina Guo,
Xinjie Shi,
Dailan He,
Yuanyuan Wang,
Rui Ma,
Hongwei Qin,
Yan Wang
Abstract:
JPEG is a popular image compression method widely used by individuals, data centers, cloud storage and network filesystems. However, most recent progress on image compression mainly focuses on uncompressed images while ignoring trillions of already-existing JPEG images. To compress these JPEG images adequately and restore them back to JPEG format losslessly when needed, we propose a deep-learning-based JPEG recompression method that operates in the DCT domain, and propose a Multi-Level Cross-Channel Entropy Model to compress the most informative Y component. Experiments show that our method achieves state-of-the-art performance compared with traditional JPEG recompression methods including Lepton, JPEG XL and CMIX. To the best of our knowledge, this is the first learned compression method that losslessly transcodes JPEG images to more storage-saving bitstreams.
Submitted 30 March, 2022;
originally announced March 2022.
-
ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding
Authors:
Dailan He,
Ziming Yang,
Weikun Peng,
Rui Ma,
Hongwei Qin,
Yan Wang
Abstract:
Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model that improves the coding performance without harming running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.
Submitted 29 March, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
An Empirical Study of Graphormer on Large-Scale Molecular Modeling Datasets
Authors:
Yu Shi,
Shuxin Zheng,
Guolin Ke,
Yifei Shen,
Jiacheng You,
Jiyan He,
Shengjie Luo,
Chang Liu,
Di He,
Tie-Yan Liu
Abstract:
This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaptation to 3D molecular dynamics simulation. "Graphormer-V2" attains better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain is consistently obtained on downstream tasks. In addition, we show that with a global receptive field and an adaptive aggregation strategy, Graphormer is more powerful than classic message-passing-based GNNs. Graphormer-V2 achieves a much lower MAE than the vanilla Graphormer on the PCQM4M quantum chemistry dataset used in KDD Cup 2021, where the latter won first place. Meanwhile, Graphormer-V2 greatly outperforms the competitors in the recent Open Catalyst Challenge, which is a competition track at a NeurIPS 2021 workshop and aims to model the catalyst-adsorbate reaction system with advanced AI models. All models can be found at https://github.com/Microsoft/Graphormer.
Submitted 14 March, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
-
Benchmarking Graphormer on Large-Scale Molecular Modeling Datasets
Authors:
Yu Shi,
Shuxin Zheng,
Guolin Ke,
Yifei Shen,
Jiacheng You,
Jiyan He,
Shengjie Luo,
Chang Liu,
Di He,
Tie-Yan Liu
Abstract:
This technical note describes the recent updates of Graphormer, including architecture design modifications, and the adaptation to 3D molecular dynamics simulation. With these simple modifications, Graphormer attains better results on large-scale molecular modeling datasets than the vanilla one, and the performance gain is consistently obtained on 2D and 3D molecular graph modeling tasks. In addition, we show that with a global receptive field and an adaptive aggregation strategy, Graphormer is more powerful than classic message-passing-based GNNs. Empirically, Graphormer achieves a much lower MAE than the originally reported results on the PCQM4M quantum chemistry dataset used in KDD Cup 2021. Meanwhile, it greatly outperforms the competitors in the recent Open Catalyst Challenge, which is a competition track at a NeurIPS 2021 workshop and aims to model the catalyst-adsorbate reaction system with advanced AI models. All code can be found at https://github.com/Microsoft/Graphormer.
Submitted 7 January, 2023; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Adversarial Dual-Student with Differentiable Spatial Warping for Semi-Supervised Semantic Segmentation
Authors:
Cong Cao,
Tianwei Lin,
Dongliang He,
Fu Li,
Huanjing Yue,
Jingyu Yang,
Errui Ding
Abstract:
A common challenge posed to robust semantic segmentation is the expensive data annotation cost. Existing semi-supervised solutions show great potential for solving this problem. Their key idea is constructing consistency regularization with unsupervised data augmentation from unlabeled data for model training. The perturbations for unlabeled data enable the consistency training loss, which benefits semi-supervised semantic segmentation. However, these perturbations destroy image context and introduce unnatural boundaries, which is harmful for semantic segmentation. Besides, the widely adopted semi-supervised learning framework, i.e., mean-teacher, suffers from limited performance since the student model finally converges to the teacher model. In this paper, first of all, we propose a context-friendly differentiable geometric warping to conduct unsupervised data augmentation; secondly, a novel adversarial dual-student framework is proposed to improve the Mean-Teacher in the following two aspects: (1) dual student models are learned independently except for a stabilization constraint to encourage exploiting model diversities; (2) an adversarial training scheme is applied to both students, and the discriminators are used to distinguish reliable pseudo-labels of unlabeled data for self-training. Effectiveness is validated via extensive experiments on PASCAL VOC2012 and Cityscapes. Our solution significantly improves the performance, and state-of-the-art results are achieved on both datasets. Remarkably, compared with full supervision, our solution achieves a comparable mIoU of 73.4% using only 12.5% annotated data on PASCAL VOC2012. Our code and models are available at https://github.com/cao-cong/ADS-SemiSeg.
Submitted 27 September, 2022; v1 submitted 5 March, 2022;
originally announced March 2022.
-
Query Processing on Tensor Computation Runtimes
Authors:
Dong He,
Supun Nakandala,
Dalitso Banda,
Rathijit Sen,
Karla Saur,
Kwanghyun Park,
Carlo Curino,
Jesús Camacho-Rodríguez,
Konstantinos Karanasos,
Matteo Interlandi
Abstract:
The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity through a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space.
We design, build, and evaluate Tensor Query Processor (TQP): TQP transforms SQL queries into tensor programs and executes them on TCRs. TQP is able to run the full TPC-H benchmark by implementing novel algorithms for relational operators on the tensor routines. At the same time, TQP can support various hardware while only requiring a fraction of the usual development effort. Experiments show that TQP can improve query execution time by up to 10$\times$ over specialized CPU- and GPU-only systems. Finally, TQP can accelerate queries mixing ML predictions and SQL end-to-end, and deliver up to 9$\times$ speedup over CPU baselines.
Submitted 9 February, 2023; v1 submitted 3 March, 2022;
originally announced March 2022.
-
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Authors:
Zhihong Pan,
Baopu Li,
Dongliang He,
Mingde Yao,
Wenhao Wu,
Tianwei Lin,
Xin Li,
Errui Ding
Abstract:
Deep learning based single image super-resolution models have been widely studied, and superb results have been achieved in upscaling low-resolution images with a fixed scale factor and downscaling degradation kernel. To improve the real-world applicability of such models, there is growing interest in developing models optimized for arbitrary upscaling factors. Our proposed method is the first to treat arbitrary rescaling, both upscaling and downscaling, as one unified process. Using joint optimization of both directions, the proposed model is able to learn upscaling and downscaling simultaneously and achieve bidirectional arbitrary image rescaling. It improves the performance of current arbitrary upscaling models by a large margin while at the same time learning to maintain visual perception quality in downscaled images. The proposed model is further shown to be robust in a cycle idempotence test, free of severe degradation in reconstruction accuracy when the downscaling-to-upscaling cycle is applied repeatedly. This robustness is beneficial for image rescaling in the wild, where this cycle could be applied to one image multiple times. It also performs well on tests with arbitrarily large scales and asymmetric scales, even though the model is not trained on such tasks. Extensive experiments are conducted to demonstrate the superior performance of our model.
Submitted 7 March, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Resonance Fluorescence from a two-level artificial atom strongly coupled to a single-mode cavity
Authors:
Z. H. Peng,
D. He,
Y. Zhou,
J. H. Ding,
J. Lu,
L. Zhou,
J. Q. Liao,
L. M. Kuang,
Yu-xi Liu,
Oleg V. Astafiev,
J. S. Tsai
Abstract:
We experimentally demonstrate the resonance fluorescence of a two-level artificial atom strongly coupled to a single-mode cavity field. The effect was theoretically predicted thirty years ago by Savage [Phys. Rev. Lett. 63, 1376 (1989)]. The system consists of a superconducting qubit circuit and a one-dimensional transmission line resonator. In addition, a one-dimensional transmission line strongly coupled to the atom serves as an open space. The effect takes place when a microwave field is applied to the cavity, which in turn is resonantly coupled to the atom. The fluorescence spectrum is measured via the emission into the transmission line. We find that the central peak is determined by the atom's spontaneous emission into the open space, while the widths of the side peaks are largely determined by the coherent interaction between the atom and the cavity; that is, the fluorescence spectrum here is very different from that of the Mollow triplet. We also derive an analytical form for the spectrum. Our experimental results agree well with theoretical calculations.
Submitted 12 April, 2023; v1 submitted 24 February, 2022;
originally announced February 2022.
-
VADOI: Voice-Activity-Detection Overlapping Inference For End-to-end Long-form Speech Recognition
Authors:
Jinhan Wang,
Xiaosu Tong,
Jinxi Guo,
Di He,
Roland Maas
Abstract:
While end-to-end models have shown great success on the Automatic Speech Recognition task, performance degrades severely when target sentences are long-form. The previously proposed methods, overlapping inference and partial overlapping inference, have been shown to be effective for long-form decoding. For both methods, word error rate (WER) decreases monotonically when overlapping percentage decreases. Setting aside computational cost, the setup with 50% overlapping during inference can achieve the best performance. However, a lower overlapping percentage has the advantage of faster inference. In this paper, we first conduct comprehensive experiments comparing overlapping inference and partial overlapping inference with various configurations. We then propose Voice-Activity-Detection Overlapping Inference to provide a trade-off between WER and computation cost. Results show that the proposed method can achieve a 20% relative computation cost reduction on the Librispeech and Microsoft Speech Language Translation long-form corpora while maintaining WER performance compared to the best-performing overlapping inference algorithm. We also propose Soft-Match to compensate for the misalignment of similar words.
Submitted 21 February, 2022;
originally announced February 2022.
-
Learning Physics-Informed Neural Networks without Stacked Back-propagation
Authors:
Di He,
Shanda Li,
Wenlei Shi,
Xiaotian Gao,
Jia Zhang,
Jiang Bian,
Liwei Wang,
Tie-Yan Liu
Abstract:
The Physics-Informed Neural Network (PINN) has become a commonly used machine learning approach for solving partial differential equations (PDEs). However, when facing high-dimensional second-order PDE problems, PINN suffers from severe scalability issues, since its loss includes second-order derivatives whose computational cost grows with the dimension during stacked back-propagation. In this work, we develop a novel approach that can significantly accelerate the training of Physics-Informed Neural Networks. In particular, we parameterize the PDE solution by a Gaussian-smoothed model and show that, using Stein's Identity, the second-order derivatives can be efficiently calculated without back-propagation. We further discuss the model capacity and provide variance reduction methods to address key limitations in the derivative estimation. Experimental results show that our proposed method achieves competitive error compared to standard PINN training while being significantly faster. Our code is released at https://github.com/LithiumDA/PINN-without-Stacked-BP.
Submitted 24 February, 2023; v1 submitted 18 February, 2022;
originally announced February 2022.
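As a rough illustration of the Stein's-identity idea mentioned in the abstract above, the following sketch estimates the second derivative of a Gaussian-smoothed function from function evaluations alone. It is a one-dimensional toy, not the authors' implementation: the test function, `sigma`, the sample count, and the particular variance-reduction choices (subtracting a baseline and using antithetic pairs) are assumptions made here for illustration.

```python
import random

def smoothed_second_derivative(f, x, sigma=0.5, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of d^2/dx^2 E[f(x + delta)], delta ~ N(0, sigma^2).

    Uses Stein's identity for Gaussian smoothing:
        d^2/dx^2 E[f(x + delta)] = E[f(x + delta) * (delta^2 - sigma^2) / sigma^4]
    so no derivative of f is ever taken.
    """
    rng = random.Random(seed)
    # Baseline value; subtracting it leaves the mean unchanged
    # because E[delta^2 - sigma^2] = 0.
    fx = f(x)
    total = 0.0
    for _ in range(n_pairs):
        delta = rng.gauss(0.0, sigma)
        weight = (delta * delta - sigma * sigma) / sigma ** 4
        # The antithetic pair (+delta, -delta) shares the same weight.
        total += weight * 0.5 * ((f(x + delta) - fx) + (f(x - delta) - fx))
    return total / n_pairs

# For f(t) = t^2 the smoothed function is x^2 + sigma^2, whose second derivative is 2.
estimate = smoothed_second_derivative(lambda t: t * t, x=1.0)
print(f"estimated second derivative: {estimate:.3f}")
```

Only forward evaluations of `f` appear in the loop, which is the property that lets such an estimator avoid stacked back-propagation; the price is Monte Carlo noise, which is why variance reduction matters.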
-
HousE: Knowledge Graph Embedding with Householder Parameterization
Authors:
Rui Li,
Jianan Zhao,
Chaozhuo Li,
Di He,
Yiqi Wang,
Yuming Liu,
Hao Sun,
Senzhang Wang,
Weiwei Deng,
Yanming Shen,
Xing Xie,
Qi Zhang
Abstract:
The effectiveness of knowledge graph embedding (KGE) largely depends on the ability to model intrinsic relation patterns and mapping properties. However, existing approaches can only capture some of them with insufficient modeling capacity. In this work, we propose a more powerful KGE framework named HousE, which involves a novel parameterization based on two kinds of Householder transformations: (1) Householder rotations to achieve superior capacity of modeling relation patterns; (2) Householder projections to handle sophisticated relation mapping properties. Theoretically, HousE is capable of modeling crucial relation patterns and mapping properties simultaneously. Besides, HousE is a generalization of existing rotation-based models while extending the rotations to high-dimensional spaces. Empirically, HousE achieves new state-of-the-art performance on five benchmark datasets. Our code is available at https://github.com/anrep/HousE.
Submitted 19 June, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
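For readers unfamiliar with the building block named in the abstract above, here is a minimal sketch of a Householder reflection and of a rotation built by composing two reflections. It only illustrates the linear algebra; the function names and example vectors are hypothetical, and how HousE parameterizes relations with these maps is not shown.

```python
import math

def householder_reflect(x, u):
    """Apply H = I - 2 u u^T / (u^T u) to x without forming the matrix."""
    uu = sum(c * c for c in u)
    if uu == 0.0:
        raise ValueError("u must be non-zero")
    scale = 2.0 * sum(a * b for a, b in zip(x, u)) / uu
    return [a - scale * b for a, b in zip(x, u)]

def householder_rotate(x, u1, u2):
    """Compose two reflections; an even number of reflections is a rotation."""
    return householder_reflect(householder_reflect(x, u1), u2)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

x = [1.0, 2.0, 3.0, 4.0]
u1 = [1.0, 0.0, -1.0, 2.0]
u2 = [0.5, 1.0, 0.0, -1.0]
y = householder_rotate(x, u1, u2)
print(norm(x), norm(y))  # the two norms agree up to rounding
```

Because each reflection is orthogonal, the composed map preserves vector norms in any dimension, which is what makes it a natural high-dimensional generalization of rotation-based embeddings.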
-
Post-Training Quantization for Cross-Platform Learned Image Compression
Authors:
Dailan He,
Ziming Yang,
Yuan Chen,
Qi Zhang,
Hongwei Qin,
Yan Wang
Abstract:
It has been witnessed that learned image compression has outperformed conventional image coding techniques and tends to be practical in industrial applications. One of the most critical issues that need to be considered is the non-deterministic calculation, which makes the probability prediction cross-platform inconsistent and frustrates successful decoding. We propose to solve this problem by introducing well-developed post-training quantization and making the model inference integer-arithmetic-only, which is much simpler than presently existing training and fine-tuning based approaches yet still keeps the superior rate-distortion performance of learned image compression. Based on that, we further improve the discretization of the entropy parameters and extend the deterministic inference to fit Gaussian mixture models. With our proposed methods, the current state-of-the-art image compression models can infer in a cross-platform consistent manner, which makes the further development and practice of learned image compression more promising.
Submitted 30 November, 2022; v1 submitted 15 February, 2022;
originally announced February 2022.
-
Weyl-type nodal chains in X2MnO4 (X= Li, Na)
Authors:
R. R. Kang,
S. D. He,
P. Zhou,
L. Z. Sun
Abstract:
Recently, magnetic topological semimetals have received a lot of attention due to their potential applications in the field of spintronics. By using first-principles calculations, we propose that two ferromagnetic spinel materials of X2MnO4 (X=Li, Na) have Weyl-type nodal chains around the Fermi level. Their stabilities are validated by cohesive energies, phonon dispersions, and elastic constants. The nodal chains are composed of two types of nodal loops, which are protected by the glide operation Mz, the mirror operation M101 and their equivalent. The drumhead surface states are observed on the (001) surface and they exhibit nontrivial topological features. In addition, under different electron correlations and lattice strains, the semimetal states of these two materials are well kept. Our work provides two promising candidates for exploring the combination of magnetic materials and topological semimetal states.
Submitted 12 January, 2022;
originally announced January 2022.
-
De-Noising of Photoacoustic Microscopy Images by Deep Learning
Authors:
Da He,
Jiasheng Zhou,
Xiaoyu Shang,
Jiajia Luo,
Sung-Liang Chen
Abstract:
As a hybrid imaging technology, photoacoustic microscopy (PAM) imaging suffers from noise due to the maximum permissible exposure of laser intensity, attenuation of ultrasound in the tissue, and the inherent noise of the transducer. De-noising is a post-processing method to reduce noise, and PAM image quality can be recovered. However, previous de-noising techniques usually heavily rely on mathematical priors as well as manually selected parameters, resulting in unsatisfactory and slow de-noising performance for different noisy images, which greatly hinders practical and clinical applications. In this work, we propose a deep learning-based method to remove complex noise from PAM images without mathematical priors and manual selection of settings for different input images. An attention enhanced generative adversarial network is used to extract image features and remove various noises. The proposed method is demonstrated on both synthetic and real datasets, including phantom (leaf veins) and in vivo (mouse ear blood vessels and zebrafish pigment) experiments. The results show that compared with previous PAM de-noising methods, our method exhibits good performance in recovering images qualitatively and quantitatively. In addition, the de-noising speed of 0.016 s is achieved for an image with $256\times256$ pixels. Our approach is effective and practical for the de-noising of PAM images.
Submitted 12 January, 2022;
originally announced January 2022.
-
Reconfigurable Intelligent Surface Enabled Spatial Multiplexing with Fully Convolutional Network
Authors:
Bile Peng,
Jan-Aike Termöhlen,
Cong Sun,
Danping He,
Ke Guan,
Tim Fingscheidt,
Eduard A. Jorswieck
Abstract:
Reconfigurable intelligent surface (RIS) is an emerging technology for future wireless communication systems. In this work, we consider downlink spatial multiplexing enabled by the RIS for weighted sum-rate (WSR) maximization. In the literature, most solutions use alternating gradient-based optimization, which has moderate performance, high complexity, and limited scalability. We propose to apply a fully convolutional network (FCN) to solve this problem, which was originally designed for semantic segmentation of images. The rectangular shape of the RIS and the spatial correlation of channels with adjacent RIS antennas due to the short distance between them encourage us to apply it for the RIS configuration. We design a set of channel features that includes both cascaded channels via the RIS and the direct channel. In the base station (BS), the differentiable minimum mean squared error (MMSE) precoder is used for pretraining and the weighted minimum mean squared error (WMMSE) precoder is then applied for fine-tuning, which is nondifferentiable, more complex, but achieves a better performance. Evaluation results show that the proposed solution has higher performance and allows for a faster evaluation than the baselines. Hence it scales better to a large number of antennas, advancing the RIS one step closer to practical deployment.
Submitted 21 September, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
-
The study of lepton EDMs in $U(1)_X$ SSM
Authors:
Lu-Hao Su,
Dan He,
Xing-Xing Dong,
Tai-Fu Feng,
Shu-Min Zhao
Abstract:
The minimal supersymmetric extension of the standard model (MSSM) is extended to the $U(1)_X$SSM, whose local gauge group is $SU(3)_C \times SU(2)_L \times U(1)_Y \times U(1)_X$. To obtain the $U(1)_X$SSM, we add new superfields to the MSSM, namely three Higgs singlets $\hat{\eta},~\hat{\bar{\eta}},~\hat{S}$ and right-handed neutrinos $\hat{\nu}_i$. CP-violating effects are considered in order to study the lepton electric dipole moment (EDM) in the $U(1)_X$SSM. The $U(1)_X$SSM contains more CP-violating phases than the standard model (SM). In this model, some new parameters $(\theta_S, \theta_{BB^{\prime}}, \theta_{BL})$ are considered as CP-violating phases, so there are new contributions to lepton EDMs. This is conducive to exploring the source of CP violation and probing new physics beyond the SM.
Submitted 9 May, 2022; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Distributed Evolution Strategies Using TPUs for Meta-Learning
Authors:
Alex Sheng,
Derek He
Abstract:
Meta-learning traditionally relies on backpropagation through entire tasks to iteratively improve a model's learning dynamics. However, this approach is computationally intractable when scaled to complex tasks. We propose a distributed evolutionary meta-learning strategy using Tensor Processing Units (TPUs) that is highly parallel and scalable to arbitrarily long tasks with no increase in memory cost. Using a Prototypical Network trained with evolution strategies on the Omniglot dataset, we achieved an accuracy of 98.4% on a 5-shot classification problem. Our algorithm used as much as 40 times less memory than automatic differentiation to compute the gradient, with the resulting model achieving accuracy within 1.3% of a backpropagation-trained equivalent (99.6%). We observed better classification accuracy as high as 99.1% with larger population configurations. We further experimentally validate the stability and performance of ES-ProtoNet across a variety of training conditions (varying population size, model size, number of workers, shot, way, ES hyperparameters, etc.). Our contributions are twofold: we provide the first assessment of evolutionary meta-learning in a supervised setting, and create a general framework for distributed evolution strategies on TPUs.
Submitted 31 December, 2021;
originally announced January 2022.
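The abstract above relies on evolution strategies (ES) as a backpropagation-free way to estimate gradients. A minimal single-process sketch of that estimator is given below, assuming an antithetic-sampling variant and a toy quadratic reward; the hyperparameters, the `reward` function, and the absence of any TPU distribution or Prototypical Network are simplifications made here, not details from the paper.

```python
import random

def es_gradient(objective, theta, sigma, population, rng):
    """Antithetic evolution-strategies estimate of the gradient of the objective."""
    grad = [0.0] * len(theta)
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = objective([t + sigma * e for t, e in zip(theta, eps)])
        minus = objective([t - sigma * e for t, e in zip(theta, eps)])
        coeff = (plus - minus) / (2.0 * sigma * population)
        for i, e in enumerate(eps):
            grad[i] += coeff * e
    return grad

def train(objective, theta, steps=200, lr=0.05, sigma=0.1, population=50, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        grad = es_gradient(objective, theta, sigma, population, rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]  # gradient ascent
    return theta

TARGET = [1.0, -2.0, 0.5]

def reward(theta):
    # Toy stand-in for a task score: higher is better, maximum 0 at TARGET.
    return -sum((t - c) ** 2 for t, c in zip(theta, TARGET))

theta = train(reward, [0.0, 0.0, 0.0])
print(theta, reward(theta))
```

Each population member needs only the scalar returned by `objective`, so the perturbations can be evaluated independently on separate workers and no intermediate activations have to be stored for a backward pass.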
-
Powerful Graph Convolutional Networks with Adaptive Propagation Mechanism for Homophily and Heterophily
Authors:
Tao Wang,
Rui Wang,
Di Jin,
Dongxiao He,
Yuxiao Huang
Abstract:
Graph Convolutional Networks (GCNs) have been widely applied in various fields due to their significant power in processing graph-structured data. A typical GCN and its variants work under a homophily assumption (i.e., nodes of the same class are prone to connect to each other), while ignoring the heterophily that exists in many real-world networks (i.e., nodes of different classes tend to form edges). Existing methods deal with heterophily mainly by aggregating higher-order neighborhoods or combining the immediate representations, which leads to noise and irrelevant information in the result. Moreover, these methods do not change the propagation mechanism, a fundamental part of GCNs, which works under the homophily assumption. This makes it difficult to distinguish the representations of nodes from different classes. To address this problem, in this paper we design a novel propagation mechanism, which can automatically change the propagation and aggregation process according to the homophily or heterophily between node pairs. To adaptively learn the propagation process, we introduce two measurements of the homophily degree between node pairs, which are learned from topological and attribute information, respectively. Then we incorporate the learnable homophily degree into the graph convolution framework, which is trained in an end-to-end schema, enabling it to go beyond the assumption of homophily. More importantly, we theoretically prove that our model can constrain the similarity of representations between nodes according to their homophily degree. Experiments on seven real-world datasets demonstrate that this new approach outperforms the state-of-the-art methods under heterophily or low homophily, and achieves competitive performance under homophily.
Submitted 27 December, 2021;
originally announced December 2021.
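To make the homophily/heterophily distinction in the abstract above concrete, the sketch below computes a simple graph-level statistic: the fraction of edges whose endpoints share a label. This edge-homophily ratio is a common descriptive measure and is used here only as an illustration; it is not the learned pairwise homophily degree the paper introduces, and the tiny graphs are made up.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

labels = {0: "a", 1: "a", 2: "b", 3: "b"}
homophilous = [(0, 1), (2, 3), (0, 2)]
heterophilous = [(0, 2), (0, 3), (1, 2), (1, 3)]
print(edge_homophily(homophilous, labels))    # 2 of 3 edges join same-label nodes
print(edge_homophily(heterophilous, labels))  # no edge joins same-label nodes
```

A ratio near 1 matches the setting where plain neighborhood averaging works well, while a ratio near 0 is the heterophilous setting that motivates changing the propagation mechanism.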
-
Block Modeling-Guided Graph Convolutional Neural Networks
Authors:
Dongxiao He,
Chundong Liang,
Huixin Liu,
Mingxiang Wen,
Pengfei Jiao,
Zhiyong Feng
Abstract:
The Graph Convolutional Network (GCN) has shown remarkable potential for exploring graph representations. However, the GCN aggregation mechanism fails to generalize to networks with heterophily, where most nodes have neighbors from different classes, a situation that commonly arises in real-world networks. In order to make the propagation and aggregation mechanism of GCN suitable for both homophily and heterophily (or even their mixture), we introduce block modeling into the framework of GCN so that it can realize "block-guided classified aggregation" and automatically learn the corresponding aggregation rules for neighbors of different classes. By incorporating block modeling into the aggregation process, GCN is able to aggregate information from homophilic and heterophilic neighbors discriminately according to their homophily degree. We compared our algorithm with state-of-the-art methods that deal with the heterophily problem. Empirical results demonstrate the superiority of our new approach over existing methods on heterophilic datasets while maintaining competitive performance on homophilic datasets.
Submitted 27 December, 2021; v1 submitted 26 December, 2021;
originally announced December 2021.
-
TOD-DA: Towards Boosting the Robustness of Task-oriented Dialogue Modeling on Spoken Conversations
Authors:
Xin Tian,
Xinxian Huang,
Dongfeng He,
Yingzhan Lin,
Siqi Bao,
Huang He,
Liankai Huang,
Qiang Ju,
Xiyuan Zhang,
Jian Xie,
Shuqi Sun,
Fan Wang,
Hua Wu,
Haifeng Wang
Abstract:
Task-oriented dialogue systems have been plagued by the difficulties of obtaining large-scale and high-quality annotated conversations. Furthermore, most of the publicly available datasets only include written conversations, which are insufficient to reflect actual human behaviors in practical spoken dialogue systems. In this paper, we propose Task-oriented Dialogue Data Augmentation (TOD-DA), a novel model-agnostic data augmentation paradigm to boost the robustness of task-oriented dialogue modeling on spoken conversations. The TOD-DA consists of two modules: 1) Dialogue Enrichment to expand training data on task-oriented conversations for easing data sparsity and 2) Spoken Conversation Simulator to imitate oral style expressions and speech recognition errors in diverse granularities for bridging the gap between written and spoken conversations. With such designs, our approach ranked first in both tasks of DSTC10 Track2, a benchmark for task-oriented dialogue modeling on spoken conversations, demonstrating the superiority and effectiveness of our proposed TOD-DA.
Submitted 23 December, 2021;
originally announced December 2021.
-
Exploration of Dark Chemical Genomics Space via Portal Learning: Applied to Targeting the Undruggable Genome and COVID-19 Anti-Infective Polypharmacology
Authors:
Tian Cai,
Li Xie,
Muge Chen,
Yang Liu,
Di He,
Shuo Zhang,
Cameron Mura,
Philip E. Bourne,
Lei Xie
Abstract:
Advances in biomedicine are largely fueled by exploring uncharted territories of human biology. Machine learning can both enable and accelerate discovery, but faces a fundamental hurdle when applied to unseen data with distributions that differ from previously observed ones -- a common dilemma in scientific inquiry. We have developed a new deep learning framework, called {\textit{Portal Learning}}, to explore dark chemical and biological space. Three key, novel components of our approach include: (i) end-to-end, step-wise transfer learning, in recognition of biology's sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, and (iii) stress model selection. Portal Learning provides a practical solution to the out-of-distribution (OOD) problem in statistical machine learning. Here, we have implemented Portal Learning to predict chemical-protein interactions on a genome-wide scale. Systematic studies demonstrate that Portal Learning can effectively assign ligands to unexplored gene families (unknown functions), versus existing state-of-the-art methods, thereby allowing us to target previously "undruggable" proteins and design novel polypharmacological agents for disrupting interactions between SARS-CoV-2 and human proteins. Portal Learning is general-purpose and can be further applied to other areas of scientific inquiry.
Submitted 23 November, 2021;
originally announced November 2021.
-
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Authors:
Zipeng Xu,
Tianwei Lin,
Hao Tang,
Fu Li,
Dongliang He,
Nicu Sebe,
Radu Timofte,
Luc Van Gool,
Errui Ding
Abstract:
To achieve disentangled image manipulation, previous works depend heavily on manual annotation. Meanwhile, the available manipulations are limited to a pre-defined set the models were trained for. We propose a novel framework, i.e., Predict, Prevent, and Evaluate (PPE), for disentangled text-driven image manipulation that requires little manual annotation while being applicable to a wide variety of manipulations. Our method approaches the targets by deeply exploiting the power of the large-scale pre-trained vision-language model CLIP. Concretely, we firstly Predict the possibly entangled attributes for a given text command. Then, based on the predicted attributes, we introduce an entanglement loss to Prevent entanglements during training. Finally, we propose a new evaluation metric to Evaluate the disentangled image manipulation. We verify the effectiveness of our method on the challenging face editing task. Extensive experiments show that the proposed PPE framework achieves much better quantitative and qualitative results than the up-to-date StyleCLIP baseline.
Submitted 24 March, 2022; v1 submitted 26 November, 2021;
originally announced November 2021.
-
Transverse mode-encoded quantum gate on a silicon photonic chip
Authors:
Lan-Tian Feng,
Ming Zhang,
Xiao Xiong,
Di Liu,
Yu-Jie Cheng,
Fang-Ming Jing,
Xiao-Zhuo Qi,
Yang Chen,
De-Yong He,
Guo-Ping Guo,
Guang-Can Guo,
Dao-Xin Dai,
Xi-Feng Ren
Abstract:
As an important degree of freedom (DoF) in integrated photonic circuits, the orthogonal transverse mode provides a promising and flexible way to increase communication capability for both classical and quantum information processing. To construct large-scale on-chip multimode multi-DoF quantum systems, a transverse mode-encoded controlled-NOT (CNOT) gate is necessary. Here, by designing and integrating a transverse mode-dependent directional coupler and attenuators on a silicon photonic chip, we demonstrate the first multimode implementation of a two-qubit quantum gate. With the aid of state preparation and analysis parts, we show the ability of the gate to entangle two separated transverse mode qubits with an average fidelity of $0.89\pm0.02$ and a violation of 10 standard deviations in the quantum nonlocality verification. In addition, a fidelity of $0.82\pm0.01$ was obtained from the quantum process tomography used to completely characterize the CNOT gate. Our work paves the way for universal transverse mode-encoded quantum operations and large-scale multimode multi-DoF quantum systems.
Submitted 7 November, 2021;
originally announced November 2021.
-
Can Vision Transformers Perform Convolution?
Authors:
Shanda Li,
Xiangning Chen,
Di He,
Cho-Jui Hsieh
Abstract:
Several recent studies have demonstrated that attention-based networks, such as Vision Transformer (ViT), can outperform Convolutional Neural Networks (CNNs) on several computer vision tasks without using convolutional layers. This naturally leads to the following question: Can a self-attention layer of ViT express any convolution operation? In this work, we prove that a single ViT layer with image patches as the input can perform any convolution operation constructively, where the multi-head attention mechanism and the relative positional encoding play essential roles. We further provide a lower bound on the number of heads for Vision Transformers to express CNNs. Consistent with our analysis, experimental results show that the construction in our proof can help inject convolutional bias into Transformers and significantly improve the performance of ViT in low data regimes.
Submitted 2 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
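The intuition behind the claim above can be illustrated with a toy example. This sketch is not the paper's construction: it assumes a 1-D token sequence instead of image patches, hard one-hot attention (the limit of a softmax sharply peaked by a relative positional bias), and zero padding at the boundaries. Each head attends to one fixed relative offset, and a per-head projection plays the role of one kernel tap.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d_in, d_out = 6, 4, 3          # tokens, input width, output width
offsets = [-1, 0, 1]              # one attention head per kernel tap
X = rng.normal(size=(N, d_in))
W = {k: rng.normal(size=(d_in, d_out)) for k in offsets}  # kernel weights

def conv1d_direct(X, W):
    """Zero-padded 1-D convolution computed tap by tap."""
    Y = np.zeros((X.shape[0], d_out))
    for i in range(X.shape[0]):
        for k, Wk in W.items():
            if 0 <= i + k < X.shape[0]:
                Y[i] += X[i + k] @ Wk
    return Y

def conv1d_by_attention(X, W):
    """Sum over heads of (one-hot attention) @ X @ (per-head projection)."""
    n = X.shape[0]
    Y = np.zeros((n, d_out))
    for k, Wk in W.items():
        A = np.eye(n, k=k)        # A[i, i+k] = 1: token i attends to token i+k
        Y += A @ X @ Wk
    return Y

max_err = np.abs(conv1d_direct(X, W) - conv1d_by_attention(X, W)).max()
print(max_err)
```

Because one head is needed per relative offset in this toy version, a kernel with more taps needs more heads, which is in the spirit of the lower bound on the number of heads mentioned in the abstract.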
-
Boosting the Certified Robustness of L-infinity Distance Nets
Authors:
Bohang Zhang,
Du Jiang,
Di He,
Liwei Wang
Abstract:
Recently, Zhang et al. (2021) developed a new neural network architecture based on $\ell_\infty$-distance functions, which naturally possesses certified $\ell_\infty$ robustness by its construction. Despite the novel design and theoretical foundation, so far the model only achieved comparable performance to conventional networks. In this paper, we make the following two contributions: $\mathrm{(i)}$ We demonstrate that $\ell_\infty$-distance nets enjoy a fundamental advantage in certified robustness over conventional networks (under typical certification approaches); $\mathrm{(ii)}$ With an improved training process we are able to significantly boost the certified accuracy of $\ell_\infty$-distance nets. Our training approach largely alleviates the optimization problem that arose in the previous training scheme, in particular, the unexpected large Lipschitz constant due to the use of a crucial trick called $\ell_p$-relaxation. The core of our training approach is a novel objective function that combines scaled cross-entropy loss and clipped hinge loss with a decaying mixing coefficient. Experiments show that using the proposed training strategy, the certified accuracy of $\ell_\infty$-distance net can be dramatically improved from 33.30% to 40.06% on CIFAR-10 ($ε=8/255$), meanwhile outperforming other approaches in this area by a large margin. Our results clearly demonstrate the effectiveness and potential of $\ell_\infty$-distance net for certified robustness. Codes are available at https://github.com/zbh2047/L_inf-dist-net-v2.
Submitted 15 March, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
HLIC: Harmonizing Optimization Metrics in Learned Image Compression by Reinforcement Learning
Authors:
Baocheng Sun,
Meng Gu,
Dailan He,
Tongda Xu,
Yan Wang,
Hongwei Qin
Abstract:
Learned image compression has made good progress in recent years. Peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) are the two most popular evaluation metrics. As different metrics only reflect certain aspects of human perception, works in this field normally optimize two separate models, one using PSNR and one using MS-SSIM as the loss function, which is suboptimal and makes it difficult to select the model with the best visual quality or overall performance. Towards solving this problem, we propose to Harmonize optimization metrics in Learned Image Compression (HLIC) using online loss function adaptation by reinforcement learning. By doing so, we are able to leverage the advantages of both PSNR and MS-SSIM, achieving better visual quality and a higher VMAF score. To our knowledge, our work is the first to explore automatic loss function adaptation for harmonizing optimization metrics in low-level vision tasks like learned image compression.
Submitted 30 September, 2021;
originally announced September 2021.
-
Quantum key distribution over scattering channel
Authors:
Qi-Hang Lu,
Fang-Xiang Wang,
Kun Huang,
Xin Wu,
Shuang Wang,
De-Yong He,
Zhen-Qiang Yin,
Guang-Can Guo,
Wei Chen,
Zheng-Fu Han
Abstract:
Scattering of light by cloud, haze, and fog decreases the transmission efficiency of communication channels in quantum key distribution (QKD), reduces the system's practical security, and thus constrains the deployment of free-space QKD. Here, we employ wavefront shaping technology to compensate distorted optical signals in high-loss scattering quantum channels and carry out a polarization-encoded BB84 QKD experiment. With this quantum channel compensation technology, we achieve a typical enhancement of about 250 in transmission efficiency and improve the secure key rate from 0 to $1.85\times10^{-6}$ per sifted key. The method and its first validation show great potential to expand the territory of QKD systems from lossless channels to highly scattering ones and therefore to enhance the deployability of a global quantum communication network.
Submitted 27 September, 2021; v1 submitted 25 September, 2021;
originally announced September 2021.
-
Triangle mechanism in the decay process $J/ψ\to K^- K^+ a_1(1260)$
Authors:
Xuan Luo,
Dazhuang He,
Yiling Xie,
Hao Sun
Abstract:
The role of the triangle mechanism in the decay process $J/ψ\to K^- K^+ a_1(1260)$ is probed. In this mechanism, a close-up resonance with mass $1823$ MeV and width $122$ MeV decays into $K^* φ, K^* \to K π$ and then $K^* \bar{K}$ fuses into the $a_1(1260)$ resonance. We find that this mechanism leads to a triangle singularity around $M_{\rm inv}(K^- a_1(1260))\approx 1920$ MeV, where the axial-vector meson $a_1(1260)$ is considered as a dynamically generated resonance. With the help of the triangle mechanism we find sizable branching ratios $\text{Br}(J/ψ\to K^- K^+ a_1(1260),a_1 \to πρ)=1.210 \times 10^{-5}$ and $\text{Br}(J/ψ\to K^- K^+ a_1(1260))=3.501 \times 10^{-5}$. Such an effect from the triangle mechanism of the decay process could be investigated by experiments such as BESIII, LHCb and Belle-II. This potential investigation can help us obtain information on the axial-vector meson $a_1(1260)$.
Submitted 21 September, 2021;
originally announced September 2021.
-
Decomposition approach for Stackelberg P-median problem with user preferences
Authors:
Qingyun Tian,
Yun Hui Lin,
Dongdong He
Abstract:
The P-median facility location problem with user preferences (PUP) studies an operator that locates P facilities to serve customers/users in a cost-efficient manner, upon anticipating customer preferences and choices. The problem can be visualized as a leader-follower game in which the operator is the leader that opens facilities, whereas the customer is the follower who first observes the operator's location decision and then seeks services from the most preferred facility. Such a modeling perspective is of practical importance as we have witnessed its applications to various problems, such as the establishment of power plants in energy markets and the location of healthcare service centers for COVID-19 vaccination. Although a considerable number of solution methodologies have been proposed, many of them are heuristic methods whose solution quality cannot be easily verified. Moreover, due to the hardness of the problems, existing exact approaches have limited performance. Motivated by these observations, we aim to develop an efficient exact algorithm for solving large-scale PUP models. We first propose a branch-and-cut decomposition algorithm and then design acceleration techniques to further enhance the performance. Using a broad testbed, we show that our algorithm outperforms various exact approaches by a large margin, and the advantage can go up to several orders of magnitude in terms of computational time in some datasets. Finally, we conduct sensitivity analysis to draw additional implications and to highlight the importance of considering user preferences when they exist.
Submitted 18 September, 2021;
originally announced September 2021.
-
Detection of Abrupt Change in Channel Covariance Matrix for Multi-Antenna Communication
Authors:
Runnan Liu,
Liang Liu,
Dazhi He,
Wenjun Zhang,
Erik G. Larsson
Abstract:
The knowledge of channel covariance matrices is of paramount importance to the estimation of instantaneous channels and the design of beamforming vectors in multi-antenna systems. In practice, an abrupt change in channel covariance matrices may occur due to the change in the environment and the user location. Although several works have proposed efficient algorithms to estimate the channel covariance matrices after any change occurs, how to detect such a change accurately and quickly is still an open problem in the literature. In this paper, we focus on channel covariance change detection between a multi-antenna base station (BS) and a single-antenna user equipment (UE). To provide a theoretical performance limit, we first propose a genie-aided change detector based on the log-likelihood ratio (LLR) test assuming the channel covariance matrix after the change is known, and characterize the corresponding missed detection and false alarm probabilities. Then, this paper considers the practical case where the channel covariance matrix after the change is unknown. The maximum likelihood (ML) estimation technique is used to predict the covariance matrix based on the received pilot signals over a certain number of coherence blocks, building upon which the LLR-based change detector is employed. Numerical results show that our proposed scheme can detect the change with low error probability even when the number of channel samples is so small that the estimation of the covariance matrix is not very accurate. This result verifies that it is possible to detect the channel covariance change both accurately and quickly in practice.
Submitted 9 September, 2021;
originally announced September 2021.
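The genie-aided LLR test described in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's detector: it assumes that channel samples are independent zero-mean circular complex Gaussian vectors (a model the abstract does not state), that both covariance matrices are known, and that the threshold is chosen by the user. The function names are hypothetical.

```python
import numpy as np

def llr_covariance_change(H, R0, R1):
    """Log-likelihood ratio log p(H | R1) - log p(H | R0).

    H holds one channel sample per row, each assumed to be drawn
    independently from a zero-mean circular complex Gaussian.
    """
    n_samples = H.shape[0]
    D = np.linalg.inv(R0) - np.linalg.inv(R1)
    # Quadratic forms h^H D h, summed over all samples.
    quad = np.einsum("ni,ij,nj->", H.conj(), D, H).real
    _, logdet0 = np.linalg.slogdet(R0)
    _, logdet1 = np.linalg.slogdet(R1)
    return quad - n_samples * (logdet1 - logdet0)

def detect_change(H, R0, R1, threshold=0.0):
    """Declare a change when the LLR exceeds the (assumed) threshold."""
    return llr_covariance_change(H, R0, R1) > threshold

R0 = np.eye(2, dtype=complex)          # covariance before the change
R1 = 4 * np.eye(2, dtype=complex)      # covariance after the change
H = np.array([[2.0 + 0j, 0.0 + 0j]])   # a single channel sample
llr = llr_covariance_change(H, R0, R1)
print(llr, detect_change(H, R0, R1))
```

Raising the threshold trades false alarms for missed detections. In the practical case described in the abstract, `R1` would be replaced by an estimate formed from received pilots.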
-
Measurement-device-independent quantum key distribution for nonstandalone networks
Authors:
Guan-Jie Fan-Yuan,
Feng-Yu Lu,
Shuang Wang,
Zhen-Qiang Yin,
De-Yong He,
Zheng Zhou,
Jun Teng,
Wei Chen,
Guang-Can Guo,
Zheng-Fu Han
Abstract:
Untrusted node networks initially implemented by measurement-device-independent quantum key distribution (MDI-QKD) protocol are a crucial step on the roadmap of the quantum Internet. Considering extensive QKD implementations of trusted node networks, a workable upgrading tactic of existing networks toward MDI networks needs to be explicit. Here, referring to the nonstandalone (NSA) network of 5G, we propose an NSA-MDI scheme as an evolutionary selection for existing phase-encoding BB84 networks. Our solution can upgrade the BB84 networks and terminals that employ various phase-encoding schemes to immediately support MDI without hardware changes. This cost-effective upgrade effectively promotes the deployment of MDI networks as a step of untrusted node networks while taking full advantage of existing networks. In addition, the diversified demands on security and bandwidth are satisfied, and network survivability is improved.
Submitted 6 September, 2021; v1 submitted 2 September, 2021;
originally announced September 2021.
-
Unbalanced-basis-misalignment tolerant measurement-device-independent quantum key distribution
Authors:
Feng-Yu Lu,
Ze-Hao Wang,
Zhen-Qiang Yin,
Shuang Wang,
Rong Wang,
Guan-Jie Fan-Yuan,
Xiao-Juan Huang,
De-Yong He,
Wei Chen,
Zheng Zhou,
Guang-Can Guo,
Zheng-Fu Han
Abstract:
Measurement-device-independent quantum key distribution (MDIQKD) is a revolutionary protocol since it is physically immune to all attacks on the detection side. However, the protocol still keeps the strict assumption on the source side that the four BB84 states must be perfectly prepared to ensure security. Some protocols relax part of the assumptions on the encoding system to preserve practical security, but their performance is dramatically reduced. In this work, we present an MDIQKD protocol that requires less knowledge of the encoding system to combat the troublesome modulation errors and fluctuations. We have also experimentally demonstrated the protocol. The results indicate high performance and good security for practical applications. Moreover, its robustness and flexibility make it valuable for complex scenarios such as QKD networks.
Submitted 3 August, 2022; v1 submitted 26 August, 2021;
originally announced August 2021.
-
Paint Transformer: Feed Forward Neural Painting with Stroke Prediction
Authors:
Songhua Liu,
Tianwei Lin,
Dongliang He,
Fu Li,
Ruifeng Deng,
Xin Li,
Errui Ding,
Hao Wang
Abstract:
Neural painting refers to the procedure of producing a series of strokes for a given image and non-photo-realistically recreating it using neural networks. While reinforcement learning (RL) based agents can generate a stroke sequence step by step for this task, it is not easy to train a stable RL agent. On the other hand, stroke optimization methods search for a set of stroke parameters iteratively in a large search space; such low efficiency significantly limits their prevalence and practicality. Different from previous methods, in this paper, we formulate the task as a set prediction problem and propose a novel Transformer-based framework, dubbed Paint Transformer, to predict the parameters of a stroke set with a feed forward network. This way, our model can generate a set of strokes in parallel and obtain the final painting of size 512 * 512 in near real time. More importantly, since there is no dataset available for training the Paint Transformer, we devise a self-training pipeline such that it can be trained without any off-the-shelf dataset while still achieving excellent generalization capability. Experiments demonstrate that our method achieves better painting performance than previous ones with cheaper training and inference costs. Codes and models are available.
Submitted 11 August, 2021; v1 submitted 9 August, 2021;
originally announced August 2021.
-
AdaAttN: Revisit Attention Mechanism in Arbitrary Neural Style Transfer
Authors:
Songhua Liu,
Tianwei Lin,
Dongliang He,
Fu Li,
Meiling Wang,
Xin Li,
Zhengxing Sun,
Qian Li,
Errui Ding
Abstract:
Fast arbitrary neural style transfer has attracted widespread attention from academic, industrial and art communities due to its flexibility in enabling various applications. Existing solutions either attentively fuse deep style features into deep content features without considering feature distributions, or adaptively normalize deep content features according to the style such that their global statistics are matched. Although effective, these solutions leave shallow features unexplored and do not consider feature statistics locally, so they are prone to unnatural output with unpleasing local distortions. To alleviate this problem, in this paper, we propose a novel attention and normalization module, named Adaptive Attention Normalization (AdaAttN), to adaptively perform attentive normalization on a per-point basis. Specifically, spatial attention scores are learnt from both shallow and deep features of content and style images. Then per-point weighted statistics are calculated by regarding a style feature point as a distribution of attention-weighted output of all style feature points. Finally, the content feature is normalized so that it demonstrates the same local feature statistics as the calculated per-point weighted style feature statistics. In addition, a novel local feature loss is derived based on AdaAttN to enhance local visual quality. We also extend AdaAttN to be ready for video style transfer with slight modifications. Experiments demonstrate that our method achieves state-of-the-art arbitrary image/video style transfer. Codes and models are available.
Submitted 11 August, 2021; v1 submitted 8 August, 2021;
originally announced August 2021.
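The per-point weighted statistics described in the abstract above can be sketched with plain arrays. This is an illustration under stated assumptions, not the released implementation: the attention matrix is supplied directly (how scores are learnt from shallow and deep features is omitted), features are flattened to shape (points, channels), and the function names are hypothetical.

```python
import numpy as np

def attention_weighted_stats(A, V):
    """Per-point mean and standard deviation of style features V under attention A.

    A: (n_content, n_style), rows sum to 1.  V: (n_style, channels).
    """
    mean = A @ V
    var = A @ (V * V) - mean * mean          # E[v^2] - E[v]^2 under each row of A
    return mean, np.sqrt(np.clip(var, 0.0, None))

def adaptive_attention_norm(Fc, A, V, eps=1e-8):
    """Normalize content features per channel, then rescale with per-point style stats."""
    normed = (Fc - Fc.mean(axis=0)) / (Fc.std(axis=0) + eps)
    mean, std = attention_weighted_stats(A, V)
    return std * normed + mean

rng = np.random.default_rng(0)
Fc = rng.normal(size=(5, 3))             # 5 content points, 3 channels
V = rng.normal(size=(7, 3))              # 7 style points
A_uniform = np.full((5, 7), 1.0 / 7.0)   # every content point attends equally
out = adaptive_attention_norm(Fc, A_uniform, V)
print(out.mean(axis=0), V.mean(axis=0))
```

With uniform attention the per-point statistics collapse to the global style mean and standard deviation, so the output matches global statistics only. Non-uniform attention is what makes the statistics local.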
-
DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
Authors:
Min Yang,
Dongliang He,
Miao Fan,
Baorong Shi,
Xuetong Xue,
Fu Li,
Errui Ding,
Jizhou Huang
Abstract:
Image retrieval is the fundamental task of obtaining images similar to a query image from a database. A common image retrieval practice is to first retrieve candidate images via similarity search using global image features and then re-rank the candidates by leveraging their local features. Previous learning-based studies mainly focus on either global or local image representation learning to tackle the retrieval task. In this paper, we abandon the two-stage paradigm and seek to design an effective single-stage solution by integrating local and global information inside images into compact image representations. Specifically, we propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It first attentively extracts representative local information with multi-atrous convolutions and self-attention. Components orthogonal to the global image representation are then extracted from the local information. Finally, the orthogonal components are concatenated with the global representation as a complement, and then aggregation is performed to generate the final representation. The whole framework is end-to-end differentiable and can be trained with image-level labels. Extensive experimental results validate the effectiveness of our solution and show that our model achieves state-of-the-art image retrieval performance on the Revisited Oxford and Paris datasets.
Submitted 11 August, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
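The orthogonal-fusion step described in the abstract above amounts to removing from each local descriptor its projection onto the global descriptor. The sketch below illustrates that step only. The feature extractors are replaced by random arrays, mean pooling is an assumed stand-in for the aggregation, and the function name is hypothetical.

```python
import numpy as np

def orthogonal_components(local_feats, global_feat):
    """Remove from each local descriptor its projection onto the global descriptor.

    local_feats: (n_locations, dim).  global_feat: (dim,), assumed nonzero.
    """
    coeffs = local_feats @ global_feat / (global_feat @ global_feat)
    return local_feats - np.outer(coeffs, global_feat)

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(4, 8))    # 4 spatial locations, 8-dim features
global_feat = rng.normal(size=8)
orth = orthogonal_components(local_feats, global_feat)

# Concatenate the global descriptor with each orthogonal component, then
# average over locations (mean pooling is an assumed aggregation).
fused = np.concatenate(
    [np.broadcast_to(global_feat, orth.shape), orth], axis=1
).mean(axis=0)
print(np.abs(orth @ global_feat).max(), fused.shape)
```

Because the orthogonal components carry no information along the global direction, concatenating them adds only what the global descriptor lacks.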
-
TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding
Authors:
Dailan He,
Yusheng Zhao,
Junyu Luo,
Tianrui Hui,
Shaofei Huang,
Aixi Zhang,
Si Liu
Abstract:
The recently proposed fine-grained 3D visual grounding is an essential and challenging task, whose goal is to identify the 3D object referred to by a natural language sentence from other distractive objects of the same category. Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making it difficult for the model to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents. In this work, we exploit the Transformer for its natural suitability for permutation-invariant 3D point cloud data and propose a TransRefer3D network to extract entity-and-relation aware multimodal context among objects for more discriminative feature learning. Concretely, we devise an Entity-aware Attention (EA) module and a Relation-aware Attention (RA) module to conduct fine-grained cross-modal feature matching. Facilitated by the co-attention operation, our EA module matches visual entity features with linguistic entity features, while the RA module matches pair-wise visual relation features with linguistic relation features. We further integrate EA and RA modules into an Entity-and-Relation aware Contextual Block (ERCB) and stack several ERCBs to form our TransRefer3D for hierarchical multimodal context modeling. Extensive experiments on both Nr3D and Sr3D datasets demonstrate that our proposed model significantly outperforms existing approaches by up to 10.6% and claims the new state-of-the-art. To the best of our knowledge, this is the first work investigating the Transformer architecture for the fine-grained 3D visual grounding task.
Submitted 11 August, 2021; v1 submitted 5 August, 2021;
originally announced August 2021.
-
FAST-LIO2: Fast Direct LiDAR-inertial Odometry
Authors:
Wei Xu,
Yixi Cai,
Dongjiao He,
Jiarong Lin,
Fu Zhang
Abstract:
This paper presents FAST-LIO2: a fast, robust, and versatile LiDAR-inertial odometry framework. Building on a highly efficient tightly-coupled iterated Kalman filter, FAST-LIO2 has two key novelties that allow fast, robust, and accurate LiDAR navigation (and mapping). The first one is directly registering raw points to the map (and subsequently updating the map, i.e., mapping) without extracting features. This enables the exploitation of subtle features in the environment and hence increases the accuracy. The elimination of a hand-engineered feature extraction module also makes it naturally adaptable to emerging LiDARs of different scanning patterns. The second main novelty is maintaining a map by an incremental k-d tree data structure, ikd-Tree, that enables incremental updates (i.e., point insertion and deletion) and dynamic re-balancing. Compared with existing dynamic data structures (octree, R*-tree, nanoflann k-d tree), ikd-Tree achieves superior overall performance while naturally supporting downsampling on the tree. We conduct an exhaustive benchmark comparison in 19 sequences from a variety of open LiDAR datasets. FAST-LIO2 achieves consistently higher accuracy at a much lower computation load than other state-of-the-art LiDAR-inertial navigation systems. Various real-world experiments on solid-state LiDARs with small FoV are also conducted. Overall, FAST-LIO2 is computationally efficient (e.g., up to 100 Hz odometry and mapping in large outdoor environments), robust (e.g., reliable pose estimation in cluttered indoor environments with rotation up to 1000 deg/s), and versatile (i.e., applicable to both multi-line spinning and solid-state LiDARs, UAV and handheld platforms, and Intel and ARM-based processors), while still achieving higher accuracy than existing methods. Our implementation of the system FAST-LIO2 and the data structure ikd-Tree are both open-sourced on GitHub.
Submitted 14 July, 2021;
originally announced July 2021.
-
Recursive Utility with Investment Gains and Losses: Existence, Uniqueness, and Convergence
Authors:
Jing Guo,
Xue Dong He
Abstract:
We consider a generalization of the recursive utility model by adding a new component that represents utility of investment gains and losses. We also study the utility process in this generalized model with constant elasticity of intertemporal substitution and relative risk aversion degree, and with infinite time horizon. In a specific, finite-state Markovian setting, we prove that the utility process uniquely exists when the agent derives nonnegative gain-loss utility, and that it can be non-existent or non-unique otherwise. Moreover, we prove that the utility process, when it uniquely exists, can be computed by starting from any initial guess and applying the recursive equation that defines the utility process repeatedly. We then consider a portfolio selection problem with gain-loss utility and solve it by proving that the corresponding dynamic programming equation has a unique solution. Finally, we extend certain previous results to the case in which the state space is infinite.
Submitted 11 July, 2021;
originally announced July 2021.
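The computational claim in the abstract above, that the utility process can be found by applying the defining recursion repeatedly from any initial guess, is a fixed-point iteration. The sketch below shows the pattern on a toy stand-in: a linear discounted recursion on a two-state Markov chain, which is a contraction. It is not the paper's recursion with gain-loss utility, and all numbers and names are made up for illustration.

```python
import numpy as np

def iterate_to_fixed_point(step, u0, tol=1e-12, max_iter=10_000):
    """Apply `step` repeatedly until successive iterates differ by less than tol."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        u_next = step(u)
        if np.max(np.abs(u_next - u)) < tol:
            return u_next
        u = u_next
    raise RuntimeError("no convergence within max_iter")

# Toy two-state Markov chain; all numbers are made up for illustration.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # transition probabilities
flow = np.array([1.0, 0.5])      # per-period utility in each state
beta = 0.95                      # discount factor

def step(u):
    """Stand-in recursion: u = flow + beta * E[u' | state]."""
    return flow + beta * P @ u

u_a = iterate_to_fixed_point(step, np.zeros(2))
u_b = iterate_to_fixed_point(step, np.array([100.0, -50.0]))
exact = np.linalg.solve(np.eye(2) - beta * P, flow)
print(u_a, u_b, exact)
```

Both starting points reach the same limit, which also agrees with the closed-form solution available in this linear case. For the nonlinear recursion studied in the paper, the abstract states that convergence holds when the utility process uniquely exists.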