Search | arXiv e-print repository

Jacta: A Versatile Planner for Learning Dexterous and Whole-body Manipulation

Authors: Jan Brüdigam, Ali-Adeeb Abbas, Maks Sorokin, Kuan Fang, Brandon Hung, Maya Guru, Stefan Sosnowski, Jiuguang Wang, Sandra Hirche, Simon Le Cleac'h

Abstract: Robotic manipulation is challenging due to discontinuous dynamics, as well as high-dimensional state and action spaces. Data-driven approaches that succeed in manipulation tasks require large amounts of data and expert demonstrations, typically from humans. Existing manipulation planners are restricted to specific systems and often depend on specialized algorithms for using demonstration. Therefore, we introduce a flexible motion planner tailored to dexterous and whole-body manipulation tasks. Our planner creates readily usable demonstrations for reinforcement learning algorithms, eliminating the need for additional training pipeline complexities. With this approach, we can efficiently learn policies for complex manipulation tasks, where traditional reinforcement learning alone only makes little progress. Furthermore, we demonstrate that learned policies are transferable to real robotic systems for solving complex dexterous manipulation tasks.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.10341 [pdf, other]

Affordance-Guided Reinforcement Learning via Visual Prompting

Authors: Olivia Y. Lee, Annie Xie, Kuan Fang, Karl Pertsch, Chelsea Finn

Abstract: Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as demonstrations or examples of success and failure, to learn task-specific reward functions. Recently, there is also a growing adoption of large multi-modal foundation models for robotics. These models can perform visual reasoning in physical contexts and generate coarse robot motions for various manipulation tasks. Motivated by this range of capability, in this work, we propose and study rewards shaped by vision-language models (VLMs). State-of-the-art VLMs have demonstrated an impressive ability to reason about affordances through keypoints in zero-shot, and we leverage this to define dense rewards for robotic learning. On a real-world manipulation task specified by natural language description, we find that these rewards improve the sample efficiency of autonomous RL and enable successful completion of the task in 20K online finetuning steps. Additionally, we demonstrate the robustness of the approach to reductions in the number of in-domain demonstrations used for pretraining, reaching comparable performance in 35K online finetuning steps.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 15 pages, 9 figures. Robotics: Science and Systems (RSS) 2024, Task Specification for General-Purpose Intelligent Robots & Lifelong Robot Learning Workshops

arXiv:2407.06027 [pdf, other]

PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.

Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2406.05654 [pdf, other]

DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, yet current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems. In this paper, we evaluated LLMs by RAG settings in a domain-specific context, college enrollment. We identified six required abilities for RAG models, including the ability in conversational RAG, analyzing structural information, faithfulness to external knowledge, denoising, solving time-sensitive problems, and understanding multi-document interactions. Each ability has an associated dataset with shared corpora to evaluate the RAG models' performance. We evaluated popular LLMs such as Llama, Baichuan, ChatGLM, and GPT models. Experimental results indicate that existing closed-book LLMs struggle with domain-specific questions, highlighting the need for RAG models to solve expert problems. Moreover, there is room for RAG models to improve their abilities in comprehending conversational history, analyzing structural information, denoising, processing multi-document interactions, and faithfulness in expert knowledge. We expect future studies could solve these problems better.

Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

arXiv:2404.00357 [pdf, other]

Revisiting Random Weight Perturbation for Efficiently Improving Generalization

Authors: Tao Li, Qinghua Tao, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Mingzhen He, Xiaolin Huang

Abstract: Improving the generalization ability of modern deep neural networks (DNNs) is a fundamental challenge in machine learning. Two branches of methods have been proposed to seek flat minima and improve generalization: one led by sharpness-aware minimization (SAM) minimizes the worst-case neighborhood loss through adversarial weight perturbation (AWP), and the other minimizes the expected Bayes objective with random weight perturbation (RWP). While RWP offers advantages in computation and is closely linked to AWP on a mathematical basis, its empirical performance has consistently lagged behind that of AWP. In this paper, we revisit the use of RWP for improving generalization and propose improvements from two perspectives: i) the trade-off between generalization and convergence and ii) the random perturbation generation. Through extensive experimental evaluations, we demonstrate that our enhanced RWP methods achieve greater efficiency in enhancing generalization, particularly in large-scale problems, while also offering comparable or even superior performance to SAM. The code is released at https://github.com/nblt/mARWP.

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted to TMLR 2024

arXiv:2403.03174 [pdf, other]

MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting

Authors: Fangchen Liu, Kuan Fang, Pieter Abbeel, Sergey Levine

Abstract: Open-world generalization requires robotic systems to have a profound understanding of the physical world and the user command to solve diverse and complex tasks. While the recent advancement in vision-language models (VLMs) has offered unprecedented opportunities to solve open-world problems, how to leverage their capabilities to control robots remains a grand challenge. In this paper, we present MOKA (Marking Open-vocabulary Keypoint Affordances), an approach that employs VLMs to solve robotic manipulation tasks specified by free-form language instructions. Central to our approach is a compact point-based representation of affordance, which bridges the VLM's predictions on observed images and the robot's actions in the physical world. By prompting the pre-trained VLM, our approach utilizes the VLM's commonsense knowledge and concept understanding acquired from broad data sources to predict affordances and generate motions. To facilitate the VLM's reasoning in zero-shot and few-shot manners, we propose a visual prompting technique that annotates marks on images, converting affordance reasoning into a series of visual question-answering problems that are solvable by the VLM. We further explore methods to enhance performance with robot experiences collected by MOKA through in-context learning and policy distillation. We evaluate and analyze MOKA's performance on various table-top manipulation tasks including tool use, deformable body manipulation, and object rearrangement.

Submitted 19 August, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.12052 [pdf, other]

Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

Authors: Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen

Abstract: The integration of large language models (LLMs) and search engines represents a significant evolution in knowledge acquisition methodologies. However, determining the knowledge that an LLM already possesses and the knowledge that requires the help of a search engine remains an unresolved issue. Most existing methods solve this problem through the results of preliminary answers or reasoning done by the LLM itself, but this incurs excessively high computational costs. This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in LLMs with a slim proxy model, to enhance the LLM's knowledge acquisition process. We employ a proxy model which has far fewer parameters, and take its answers as heuristic answers. Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM. We only conduct retrieval for the missing knowledge in questions that the LLM does not know. Extensive experimental results on five datasets with two LLMs demonstrate a notable improvement in the end-to-end performance of LLMs in question-answering tasks, achieving or surpassing current state-of-the-art models with lower LLM inference costs.

Submitted 30 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted by ACL 2024 main conference. Repo: https://github.com/plageon/SlimPLM

arXiv:2402.02949 [pdf, other]

Kernel PCA for Out-of-Distribution Detection

Authors: Kun Fang, Qinghua Tao, Kexin Lv, Mingzhen He, Xiaolin Huang, Jie Yang

Abstract: Out-of-Distribution (OoD) detection is vital for the reliability of Deep Neural Networks (DNNs). Existing works have shown the insufficiency of Principal Component Analysis (PCA) straightforwardly applied on the features of DNNs in detecting OoD data from In-Distribution (InD) data. The failure of PCA suggests that the network features residing in OoD and InD are not well separated by simply proceeding in a linear subspace, which instead can be resolved through proper nonlinear mappings. In this work, we leverage the framework of Kernel PCA (KPCA) for OoD detection, seeking subspaces where OoD and InD features are allocated with significantly different patterns. We devise two feature mappings that induce non-linear kernels in KPCA to advocate the separability between InD and OoD data in the subspace spanned by the principal components. Given any test sample, the reconstruction error in such subspace is then used to efficiently obtain the detection result with $\mathcal{O}(1)$ time complexity in inference. Extensive empirical results on multiple OoD data sets and network structures verify the superiority of our KPCA-based detector in efficiency and efficacy with state-of-the-art OoD detection performances.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2312.12478 [pdf, other]

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

Authors: Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Abstract: The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text retrieval. However, applying them directly to UCDR may not sufficiently to handle both domain shift (i.e., adapting to unfamiliar domains) and semantic shift (i.e., transferring to unknown categories). To this end, we propose Prompting-to-Simulate (ProS), the first method to apply prompt tuning for UCDR. ProS employs a two-step process to simulate Content-aware Dynamic Prompts (CaDP) which can impact models to produce generalized features for UCDR. Concretely, in Prompt Units Learning stage, we introduce two Prompt Units to individually capture domain and semantic knowledge in a mask-and-align way. Then, in Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under a simulated test scenarios to produce the corresponding CaDP. Extensive experiments conducted on three benchmark datasets show that our method achieves new state-of-the-art performance without bringing excessive parameters. Our method is publicly available at https://github.com/fangkaipeng/ProS.

Submitted 29 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

arXiv:2311.15596 [pdf, other]

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

Authors: Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu

Abstract: Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate eighteen popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, enlarging the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.

Submitted 28 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.18738 [pdf, other]

TLM: Token-Level Masking for Transformers

Authors: Yangjun Wu, Kebin Fang, Dongxiang Zhang, Han Wang, Hao Zhang, Gang Chen

Abstract: Structured dropout approaches, such as attention dropout and DropHead, have been investigated to regularize the multi-head attention mechanism in Transformers. In this paper, we propose a new regularization scheme based on token-level rather than structure-level to reduce overfitting. Specifically, we devise a novel Token-Level Masking (TLM) training strategy for Transformers to regularize the connections of self-attention, which consists of two masking techniques that are effective and easy to implement. The underlying idea is to manipulate the connections between tokens in the multi-head attention via masking, where the networks are forced to exploit partial neighbors' information to produce a meaningful representation. The generality and effectiveness of TLM are thoroughly evaluated via extensive experiments on 4 diversified NLP tasks across 18 datasets, including natural language understanding benchmark GLUE, ChineseGLUE, Chinese Grammatical Error Correction, and data-to-text generation. The results indicate that TLM can consistently outperform attention dropout and DropHead, e.g., it increases by 0.5 points relative to DropHead with BERT-large on GLUE. Moreover, TLM can establish a new record on the data-to-text benchmark Rotowire (18.93 BLEU). Our code will be publicly available at https://github.com/Young1993/tlm.

Submitted 28 October, 2023; originally announced October 2023.

Comments: 13 pages. Accepted by EMNLP2023 main conference

arXiv:2310.18026 [pdf, other]

Symmetry-Based Quantum Circuit Mapping

Authors: Di Yu, Kun Fang

Abstract: Quantum circuit mapping is a crucial process in the quantum circuit compilation pipeline, facilitating the transformation of a logical quantum circuit into a list of instructions directly executable on a target quantum system. Recent research has introduced a post-compilation step known as remapping, which seeks to reconfigure the initial circuit mapping to mitigate quantum circuit errors arising from system variability. As quantum processors continue to scale in size, the efficiency of quantum circuit mapping and the overall compilation process has become of paramount importance. In this work, we introduce a quantum circuit remapping algorithm that leverages the intrinsic symmetries in quantum processors, making it well-suited for large-scale quantum systems. This algorithm identifies all topologically equivalent circuit mappings by constraining the search space using symmetries and accelerates the scoring of each mapping using vector computation. Notably, this symmetry-based circuit remapping algorithm exhibits linear scaling with the number of qubits in the target quantum hardware and is proven to be optimal in terms of its time complexity. Moreover, we conduct a comparative analysis against existing methods in the literature, demonstrating the superior performance of our symmetry-based method on state-of-the-art quantum hardware architectures and highlighting the practical utility of our algorithm, particularly for quantum processors with millions of qubits.

Submitted 27 October, 2023; originally announced October 2023.

Comments: 10 pages, 5 figures; comments are welcome

arXiv:2310.15896 [pdf, other]

BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT

Authors: Yirong Chen, Zhenyu Wang, Xiaofen Xing, huimin zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, Xiangmin Xu

Abstract: Large language models (LLMs) have performed well in providing general and extensive health suggestions in single-turn conversations, exemplified by systems such as ChatGPT, ChatGLM, ChatDoctor, DoctorGLM, and etc. However, the limited information provided by users during single turn results in inadequate personalization and targeting of the generated suggestions, which requires users to independently select the useful part. It is mainly caused by the missing ability to engage in multi-turn questioning. In real-world medical consultations, doctors usually employ a series of iterative inquiries to comprehend the patient's condition thoroughly, enabling them to provide effective and personalized suggestions subsequently, which can be defined as chain of questioning (CoQ) for LLMs. To improve the CoQ of LLMs, we propose BianQue, a ChatGLM-based LLM finetuned with the self-constructed health conversation dataset BianQueCorpus that is consist of multiple turns of questioning and health suggestions polished by ChatGPT. Experimental results demonstrate that the proposed BianQue can simultaneously balance the capabilities of both questioning and health suggestions, which will help promote the research and application of LLMs in the field of proactive health.

Submitted 4 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.14227 [pdf, other]

doi 10.1007/s11263-024-02156-x

Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective

Authors: Kun Fang, Qinghua Tao, Xiaolin Huang, Jie Yang

Abstract: Existing Out-of-Distribution (OoD) detection methods address to detect OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits and gradients in Deep Neural Networks (DNNs). We in this work propose a new perspective upon loss landscape and mode ensemble to investigate OoD detection. In the optimization of DNNs, there exist many local optima in the parameter space, or namely modes. Interestingly, we observe that these independent modes, which all reach low-loss regions with InD data (training and test data), yet yield significantly different loss landscapes with OoD data. Such an observation provides a novel view to investigate the OoD detection from the loss landscape, and further suggests significantly fluctuating OoD detection performance across these modes. For instance, FPR values of the RankFeat method can range from 46.58% to 84.70% among 5 modes, showing uncertain detection performance evaluations across independent modes. Motivated by such diversities on OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, leading to improved performance and benefiting the OoD detector with reduced variances. Extensive experiments covering varied OoD detectors and network structures illustrate high variances across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work could attract attention in the view of independent modes in the loss landscape of OoD data and more reliable evaluations on OoD detectors.

Submitted 15 July, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

Comments: published in International Journal of Computer Vision

arXiv:2310.11021 [pdf, other]

Dynamic quantum circuit compilation

Authors: Kun Fang, Munan Zhang, Ruqi Shi, Yinan Li

Abstract: Quantum computing has shown tremendous promise in addressing complex computational problems, yet its practical realization is hindered by the limited availability of qubits for computation. Recent advancements in quantum hardware have introduced mid-circuit measurements and resets, enabling the reuse of measured qubits and significantly reducing the qubit requirements for executing quantum algorithms. In this work, we present a systematic study of dynamic quantum circuit compilation, a process that transforms static quantum circuits into their dynamic equivalents with a reduced qubit count through qubit-reuse. We establish the first general framework for optimizing the dynamic circuit compilation via graph manipulation. In particular, we completely characterize the optimal quantum circuit compilation using binary integer programming, provide efficient algorithms for determining whether a given quantum circuit can be reduced to a smaller circuit and present heuristic algorithms for devising dynamic compilation schemes in general. Furthermore, we conduct a thorough analysis of quantum circuits with practical relevance, offering optimal compilations for well-known quantum algorithms in quantum computation, ansatz circuits utilized in quantum machine learning, and measurement-based quantum computation crucial for quantum networking. We also perform a comparative analysis against state-of-the-art approaches, demonstrating the superior performance of our methods in both structured and random quantum circuits. Our framework lays a rigorous foundation for comprehending dynamic quantum circuit compilation via qubit-reuse, bridging the gap between theoretical quantum algorithms and their physical implementation on quantum computers with limited resources.

Submitted 21 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 51 pages, 32 figures; comments are welcome; v2 reorganize the writing and strengthen the results

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2309.10305 [pdf, other]

Baichuan 2: Open Large-scale Language Models

Authors: Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, et al. (30 additional authors not shown)

Abstract: Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most powerful LLMs are closed-source or limited in their capability for languages other than English. In this technical report, we present Baichuan 2, a series of large-scale multilingual language models containing 7 billion and 13 billion parameters, trained from scratch, on 2.6 trillion tokens. Baichuan 2 matches or outperforms other open-source models of similar size on public benchmarks like MMLU, CMMLU, GSM8K, and HumanEval. Furthermore, Baichuan 2 excels in vertical domains such as medicine and law. We will release all pre-training model checkpoints to benefit the research community in better understanding the training dynamics of Baichuan 2.

Submitted 20 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Baichuan 2 technical report. Github: https://github.com/baichuan-inc/Baichuan2

arXiv:2308.12952 [pdf, other]

BridgeData V2: A Dataset for Robot Learning at Scale

Authors: Homer Walke, Kevin Black, Abraham Lee, Moo Jin Kim, Max Du, Chongyi Zheng, Tony Zhao, Philippe Hansen-Estruch, Quan Vuong, Andre He, Vivek Myers, Kuan Fang, Chelsea Finn, Sergey Levine

Abstract: We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata

Submitted 17 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 9 pages

arXiv:2308.12915 [pdf, other]

Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI

Authors: Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour

Abstract: In this paper, we present "1001 Nights", an AI-native game that allows players to lead in-game reality through co-created storytelling with a character driven by a large language model. The concept is inspired by Wittgenstein's idea of the limits of one's world being determined by the bounds of their language. Using advanced AI tools like GPT-4 and Stable Diffusion, the second iteration of the game enables the protagonist, Shahrzad, to realize words and stories in her world. The player can steer the conversation with the AI King towards specific keywords, which then become battle equipment in the game. This blend of interactive narrative and text-to-image transformation challenges the conventional border between the game world and reality through a dual perspective. We focus on Shahrzad, who seeks to alter her fate compared to the original folklore, and the player, who collaborates with AI to craft narratives and shape the game world. We explore the technical and design elements of implementing such a game with an objective to enhance the narrative game genre with AI-generated content and to delve into AI-native gameplay possibilities.

Submitted 18 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: The paper was accepted by The 19th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 23)

arXiv:2307.08927 [pdf, other]

Multi-Stage Cable Routing through Hierarchical Imitation Learning

Authors: Jianlan Luo, Charles Xu, Xinyang Geng, Gilbert Feng, Kuan Fang, Liam Tan, Stefan Schaal, Sergey Levine

Abstract: We study the problem of learning to perform multi-stage robotic manipulation tasks, with applications to cable routing, where the robot must route a cable through a series of clips. This setting presents challenges representative of complex multi-stage robotic manipulation scenarios: handling deformable objects, closing the loop on visual perception, and handling extended behaviors consisting of multiple steps that must be executed successfully to complete the entire task. In such settings, learning individual primitives for each stage that succeed with a high enough rate to perform a complete temporally extended task is impractical: if each stage must be completed successfully and has a non-negligible probability of failure, the likelihood of successful completion of the entire task becomes negligible. Therefore, successful controllers for such multi-stage tasks must be able to recover from failure and compensate for imperfections in low-level controllers by smartly choosing which controllers to trigger at any given time, retrying, or taking corrective action as needed. To this end, we describe an imitation learning system that uses vision-based policies trained from demonstrations at both the lower (motor control) and the upper (sequencing) level, present a system for instantiating this method to learn the cable routing task, and perform evaluations showing great performance in generalizing to very challenging clip placement variations. Supplementary videos, datasets, and code can be found at https://sites.google.com/view/cablerouting.

Submitted 13 January, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: T-RO 2024

arXiv:2307.00117 [pdf, other]

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

Authors: Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

Abstract: Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsight with its final state as the goal. In this work, we contribute a method that taps into joint image- and goal- conditioned policies with language using only a small amount of language data. Prior work has made progress on this using vision-language models or by jointly training language-goal-conditioned policies, but so far neither method has scaled effectively to real-world robot tasks without significant human annotation. Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to. We then train a policy on this embedding: the policy benefits from all the unlabeled data, but the aligned embedding provides an interface for language to steer the policy. We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data. Videos and code for our approach can be found on our website: https://rail-berkeley.github.io/grif/

Submitted 17 August, 2023; v1 submitted 30 June, 2023; originally announced July 2023.

Comments: 15 pages, 5 figures

arXiv:2306.03346 [pdf, other]

Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

Authors: Chongyi Zheng, Benjamin Eysenbach, Homer Walke, Patrick Yin, Kuan Fang, Ruslan Salakhutdinov, Sergey Levine

Abstract: Robotic systems that rely primarily on self-supervised learning have the potential to decrease the amount of human annotation and engineering effort required to learn control strategies. In the same way that prior robotic systems have leveraged self-supervised techniques from computer vision (CV) and natural language processing (NLP), our work builds on prior work showing that the reinforcement learning (RL) itself can be cast as a self-supervised problem: learning to reach any goal without human-specified rewards or labels. Despite the seeming appeal, little (if any) prior work has demonstrated how self-supervised RL methods can be practically deployed on robotic systems. By first studying a challenging simulated version of this task, we discover design decisions about architectures and hyperparameters that increase the success rate by $2 \times$. These findings lay the groundwork for our main result: we demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks, with tasks being specified by a single goal image provided after training.

Submitted 25 February, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: ICLR 2024 Spotlight (< 5%). Website (https://chongyi-zheng.github.io/stable_contrastive_rl) and code (https://github.com/chongyi-zheng/stable_contrastive_rl)

arXiv:2305.07067 [pdf, other]

doi 10.1109/TSE.2021.3078342

SigRec: Automatic Recovery of Function Signatures in Smart Contracts

Authors: Ting Chen, Zihao Li, Xiapu Luo, Xiaofeng Wang, Ting Wang, Zheyuan He, Kezhao Fang, Yufei Zhang, Hang Zhu, Hongwei Li, Yan Cheng, Xiaosong Zhang

Abstract: Millions of smart contracts have been deployed onto Ethereum for providing various services, whose functions can be invoked. For this purpose, the caller needs to know the function signature of a callee, which includes its function id and parameter types. Such signatures are critical to many applications focusing on smart contracts, e.g., reverse engineering, fuzzing, attack detection, and profiling. Unfortunately, it is challenging to recover the function signatures from contract bytecode, since neither debug information nor type information is present in the bytecode. To address this issue, prior approaches rely on source code, or a collection of known signatures from incomplete databases or incomplete heuristic rules, which, however, are far from adequate and cannot cope with the rapid growth of new contracts. In this paper, we propose a novel solution that leverages how functions are handled by Ethereum virtual machine (EVM) to automatically recover function signatures. In particular, we exploit how smart contracts determine the functions to be invoked to locate and extract function ids, and propose a new approach named type-aware symbolic execution (TASE) that utilizes the semantics of EVM operations on parameters to identify the number and the types of parameters. Moreover, we develop SigRec, a new tool for recovering function signatures from contract bytecode without the need of source code and function signature databases. The extensive experimental results show that SigRec outperforms all existing tools, achieving an unprecedented 98.7 percent accuracy within 0.074 seconds. We further demonstrate that the recovered function signatures are useful in attack detection, fuzzing and reverse engineering of EVM bytecode.

Submitted 11 May, 2023; originally announced May 2023.

arXiv:2304.05148 [pdf, other]

High-performance and Scalable Software-based NVMe Virtualization Mechanism with I/O Queues Passthrough

Authors: Yiquan Chen, Zhen Jin, Yijing Wang, Yi Chen, Hao Yu, Jiexiong Xu, Jinlong Chen, Wenhai Lin, Kanghua Fang, Chengkun Wei, Qiang Liu, Yuan Xie, Wenzhi Chen

Abstract: NVMe (Non-Volatile Memory Express) is an industry standard for solid-state drives (SSDs) that has been widely adopted in data centers. NVMe virtualization is crucial in cloud computing as it allows for virtualized NVMe devices to be used by virtual machines (VMs), thereby improving the utilization of storage resources. However, traditional software-based solutions have flexibility benefits but often come at the cost of performance degradation or high CPU overhead. On the other hand, hardware-assisted solutions offer high performance and low CPU usage, but their adoption is often limited by the need for special hardware support or the requirement for new hardware development. In this paper, we propose LightIOV, a novel software-based NVMe virtualization mechanism that achieves high performance and scalability without consuming valuable CPU resources and without requiring special hardware support. LightIOV can support thousands of VMs on each server. The key idea behind LightIOV is NVMe hardware I/O queues passthrough, which enables VMs to directly access I/O queues of NVMe devices, thus eliminating virtualization overhead and providing near-native performance. Results from our experiments show that LightIOV can provide comparable performance to VFIO, with an IOPS of 97.6%-100.2% of VFIO. Furthermore, in high-density VMs environments, LightIOV achieves 31.4% lower latency than SPDK-Vhost when running 200 VMs, and an improvement of 27.1% in OPS performance in real-world applications.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2301.04027 [pdf]

doi 10.1038/s43017-023-00450-9

Differentiable modeling to unify machine learning and physical models and advance Geosciences

Authors: Chaopeng Shen, Alison P. Appling, Pierre Gentine, Toshiyuki Bandai, Hoshin Gupta, Alexandre Tartakovsky, Marco Baity-Jesi, Fabrizio Fenicia, Daniel Kifer, Li Li, Xiaofeng Liu, Wei Ren, Yi Zheng, Ciaran J. Harman, Martyn Clark, Matthew Farthing, Dapeng Feng, Praveen Kumar, Doaa Aboelyazeed, Farshid Rahmani, Hylke E. Beck, Tadd Bindas, Dipankar Dwivedi, Kuai Fang, Marvin Höge, et al. (5 additional authors not shown)

Abstract: Process-Based Modeling (PBM) and Machine Learning (ML) are often perceived as distinct paradigms in the geosciences. Here we present differentiable geoscientific modeling as a powerful pathway toward dissolving the perceived barrier between them and ushering in a paradigm shift. For decades, PBM offered benefits in interpretability and physical consistency but struggled to efficiently leverage large datasets. ML methods, especially deep networks, presented strong predictive skills yet lacked the ability to answer specific scientific questions. While various methods have been proposed for ML-physics integration, an important underlying theme -- differentiable modeling -- is not sufficiently recognized. Here we outline the concepts, applicability, and significance of differentiable geoscientific modeling (DG). "Differentiable" refers to accurately and efficiently calculating gradients with respect to model variables, critically enabling the learning of high-dimensional unknown relationships. DG refers to a range of methods connecting varying amounts of prior knowledge to neural networks and training them together, capturing a different scope than physics-guided machine learning and emphasizing first principles. Preliminary evidence suggests DG offers better interpretability and causality than ML, improved generalizability and extrapolation capability, and strong potential for knowledge discovery, while approaching the performance of purely data-driven ML. DG models require less training data while scaling favorably in performance and efficiency with increasing amounts of data. With DG, geoscientists may be better able to frame and investigate questions, test hypotheses, and discover unrecognized linkages.

Submitted 26 December, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

Journal ref: Nat Rev Earth Environ 4, 552-567 (2023)

arXiv:2212.01226 [pdf, other]

doi 10.1007/s11432-023-3773-4

Quantum NETwork: from theory to practice

Authors: Kun Fang, Jingtian Zhao, Xiufan Li, Yifei Li, Runyao Duan

Abstract: The quantum internet is envisioned as the ultimate stage of the quantum revolution, which surpasses its classical counterpart in various aspects, such as the efficiency of data transmission, the security of network services, and the capability of information processing. Given its disruptive impact on the national security and the digital economy, a global race to build scalable quantum networks has already begun. With the joint effort of national governments, industrial participants and research institutes, the development of quantum networks has advanced rapidly in recent years, bringing the first primitive quantum networks within reach. In this work, we aim to provide an up-to-date review of the field of quantum networks from both theoretical and experimental perspectives, contributing to a better understanding of the building blocks required for the establishment of a global quantum internet. We also introduce a newly developed quantum network toolkit to facilitate the exploration and evaluation of innovative ideas. Particularly, it provides dual quantum computing engines, supporting simulations in both the quantum circuit and measurement-based models. It also includes a compilation scheme for mapping quantum network protocols onto quantum circuits, enabling their emulations on real-world quantum hardware devices. We showcase the power of this toolkit with several featured demonstrations, including a simulation of the Micius quantum satellite experiment, a testing of a four-layer quantum network architecture with resource management, and a quantum emulation of the CHSH game. We hope this work can give a better understanding of the state-of-the-art development of quantum networks and provide the necessary tools to make further contributions along the way.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 36 pages, 33 figures; comments are welcome

Journal ref: Sci China Inf Sci, 2023, 66: 180509

arXiv:2211.11489 [pdf, other]

Efficient Generalization Improvement Guided by Random Weight Perturbation

Authors: Tao Li, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Ming Yang, Xiaolin Huang

Abstract: To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability. Recently, sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvements by minimizing the sharpness measure within a small neighborhood and achieves state-of-the-art performance. However, SAM requires two consecutive gradient evaluations for solving the min-max problem and inevitably doubles the training time. In this paper, we resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM. Different from the small adversarial perturbations in SAM, RWP is softer and allows a much larger magnitude of perturbations. Specifically, we jointly optimize the loss function with random perturbations and the original loss function: the former guides the network towards a wider flat region while the latter helps recover the necessary local information. These two loss terms are complementary to each other and mutually independent. Hence, the corresponding gradients can be efficiently computed in parallel, enabling nearly the same training speed as regular training. As a result, we achieve very competitive performance on CIFAR and remarkably better performance on ImageNet (e.g. $\mathbf{ +1.1\%}$) compared with SAM, but always require half of the training time. The code is released at https://github.com/nblt/RWP.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.10882 [pdf, other]

On Multi-head Ensemble of Smoothed Classifiers for Certified Robustness

Authors: Kun Fang, Qinghua Tao, Yingwen Wu, Tao Li, Xiaolin Huang, Jie Yang

Abstract: Randomized Smoothing (RS) is a promising technique for certified robustness, and recently in RS the ensemble of multiple deep neural networks (DNNs) has shown state-of-the-art performances. However, such an ensemble brings heavy computation burdens in both training and certification, and yet under-exploits individual DNNs and their mutual effects, as the communication between these classifiers is commonly ignored in optimization. In this work, starting from a single DNN, we augment the network with multiple heads, each of which pertains a classifier for the ensemble. A novel training strategy, namely Self-PAced Circular-TEaching (SPACTE), is proposed accordingly. SPACTE enables a circular communication flow among those augmented heads, i.e., each head teaches its neighbor with the self-paced learning using smoothed losses, which are specifically designed in relation to certified robustness. The deployed multi-head structure and the circular-teaching scheme of SPACTE jointly contribute to diversify and enhance the classifiers in augmented heads for ensemble, leading to even stronger certified robustness than ensembling multiple DNNs (effectiveness) at the cost of much less computational expenses (efficiency), verified by extensive experiments and discussions.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.06134 [pdf, other]

Active Task Randomization: Learning Robust Skills via Unsupervised Generation of Diverse and Feasible Tasks

Authors: Kuan Fang, Toki Migimatsu, Ajay Mandlekar, Li Fei-Fei, Jeannette Bohg

Abstract: Solving real-world manipulation tasks requires robots to have a repertoire of skills applicable to a wide range of circumstances. When using learning-based methods to acquire such skills, the key challenge is to obtain training data that covers diverse and feasible variations of the task, which often requires non-trivial manual labor and domain knowledge. In this work, we introduce Active Task Randomization (ATR), an approach that learns robust skills through the unsupervised generation of training tasks. ATR selects suitable tasks, which consist of an initial environment state and manipulation goal, for learning robust skills by balancing the diversity and feasibility of the tasks. We propose to predict task diversity and feasibility by jointly learning a compact task representation. The selected tasks are then procedurally generated in simulation using graph-based parameterization. The active selection of these training tasks enables skill policies trained with our framework to robustly handle a diverse range of objects and arrangements at test time. We demonstrate that the learned skills can be composed by a task planner to solve unseen sequential manipulation problems based on visual inputs. Compared to baseline methods, ATR can achieve superior success rates in single-step and sequential manipulation tasks.

Submitted 18 April, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 9 pages, 5 figures

arXiv:2211.04699 [pdf, other]

FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration

Authors: Yangjun Wu, Kebin Fang, Yao Zhao, Hao Zhang, Lifeng Shi, Mengqi Zhang

Abstract: To accomplish punctuation restoration, most existing methods focus on introducing extra information (e.g., part-of-speech) or addressing the class imbalance problem. Recently, large-scale transformer-based pre-trained language models (PLMS) have been utilized widely and obtained remarkable success. However, the PLMS are trained on the large dataset with marks, which may not fit well with the small dataset without marks, causing the convergence to be not ideal. In this study, we propose a Feature Fusion two-stream framework (FF2) to bridge the gap. Specifically, one stream leverages a pre-trained language model to capture the semantic feature, while another auxiliary module captures the feature at hand. We also modify the computation of multi-head attention to encourage communication among heads. Then, two features with different perspectives are aggregated to fuse information and enhance context awareness. Without additional data, the experimental results on the popular benchmark IWSLT demonstrate that FF2 achieves new SOTA performance, which verifies that our approach is effective.

Submitted 9 November, 2022; originally announced November 2022.

Comments: 5 pages. arXiv admin note: substantial text overlap with arXiv:2203.12487

arXiv:2210.10537

Online LiDAR-Camera Extrinsic Parameters Self-checking

Authors: Pengjin Wei, Guohang Yan, Yikang Li, Kun Fang, Jie Yang, Wei Liu

Abstract: With the development of neural networks and the increasing popularity of automatic driving, the calibration of the LiDAR and the camera has attracted more and more attention. This calibration task is multi-modal, where the rich color and texture information captured by the camera and the accurate three-dimensional spatial information from the LiDAR is incredibly significant for downstream tasks. Current research interests mainly focus on obtaining accurate calibration results through information fusion. However, they seldom analyze whether the calibrated results are correct or not, which could be of significant importance in real-world applications. For example, in large-scale production, the LiDARs and the cameras of each smart car have to get well-calibrated as the car leaves the production line, while in the rest of the car life period, the poses of the LiDARs and cameras should also get continually supervised to ensure the security. To this end, this paper proposes a self-checking algorithm to judge whether the extrinsic parameters are well-calibrated by introducing a binary classification network based on the fused information from the camera and the LiDAR. Moreover, since there is no such dataset for the task in this work, we further generate a new dataset branch from the KITTI dataset tailored for the task. Our experiments on the proposed dataset branch demonstrate the performance of our method. To the best of our knowledge, this is the first work to address the significance of continually checking the calibrated extrinsic parameters for autonomous driving. The code is open-sourced on the Github website at https://github.com/OpenCalib/LiDAR2camera_self-check.

Submitted 14 January, 2024; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: There are some errors in the methodology section of the paper, which is currently being revised

arXiv:2210.06601 [pdf, other]

Generalization with Lossy Affordances: Leveraging Broad Offline Data for Learning Visuomotor Tasks

Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Homer Walke, Gengchen Yan, Sergey Levine

Abstract: The utilization of broad datasets has proven to be crucial for generalization for a wide range of fields. However, how to effectively make use of diverse multi-task data for novel downstream tasks still remains a grand challenge in robotics. To tackle this challenge, we introduce a framework that acquires goal-conditioned policies for unseen temporally extended tasks via offline reinforcement learning on broad data, in combination with online fine-tuning guided by subgoals in learned lossy representation space. When faced with a novel task goal, the framework uses an affordance model to plan a sequence of lossy representations as subgoals that decomposes the original task into easier problems. Learned from the broad data, the lossy representation emphasizes task-relevant information about states and goals while abstracting away redundant contexts that hinder generalization. It thus enables subgoal planning for unseen tasks, provides a compact input to the policy, and facilitates reward shaping during fine-tuning. We show that our framework can be pre-trained on large-scale datasets of robot experiences from prior work and efficiently fine-tuned for novel tasks, entirely from visual inputs without any manual reward engineering.

Submitted 18 April, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: CoRL 2022

arXiv:2208.06228 [pdf, other]

Unifying Gradients to Improve Real-world Robustness for Deep Networks

Authors: Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

Abstract: The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation of outputs due to the service purpose for users, who share the same output information with SQAs. In this paper, we propose a real-world defense by Unifying Gradients (UniG) of different data so that SQAs could only probe a much weaker attack direction that is similar for different samples. Since such universal attack perturbations have been validated as less aggressive than the input-specific perturbations, UniG protects real-world DNNs by indicating attackers a twisted and less informative attack direction. We implement UniG efficiently by a Hadamard product module which is plug-and-play. According to extensive experiments on 5 SQAs, 2 adaptive attacks and 7 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains a model of 77.80% accuracy under 2500-query Square attack while the state-of-the-art adversarially-trained model only has 67.34% on CIFAR10. Simultaneously, UniG outperforms all compared baselines in terms of clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.

Submitted 24 August, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

Journal ref: ACM Transactions on Intelligent Systems and Technology (TIST), 2023

arXiv:2205.08129 [pdf, other]

Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space

Authors: Kuan Fang, Patrick Yin, Ashvin Nair, Sergey Levine

Abstract: General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments. To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach configurable goals for a wide range of tasks on command. However, such goal-conditioned policies are notoriously difficult and time-consuming to train from scratch. In this paper, we propose Planning to Practice (PTP), a method that makes it practical to train goal-conditioned policies for long-horizon tasks that require multiple distinct types of interactions to solve. Our approach is based on two key ideas. First, we decompose the goal-reaching problem hierarchically, with a high-level planner that sets intermediate subgoals using conditional subgoal generators in the latent space for a low-level model-free policy. Second, we propose a hybrid approach which first pre-trains both the conditional subgoal generator and the policy on previously collected data through offline reinforcement learning, and then fine-tunes the policy via online exploration. This fine-tuning process is itself facilitated by the planned subgoals, which breaks down the original target task into short-horizon goal-reaching tasks that are significantly easier to learn. We conduct experiments in both the simulation and real world, in which the policy is pre-trained on demonstrations of short primitive behaviors and fine-tuned for temporally extended tasks that are unseen in the offline data. Our experimental results show that PTP can generate feasible sequences of subgoals that enable the policy to efficiently solve the target tasks.

Submitted 18 April, 2023; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2203.12487 [pdf, other]

A Context-Aware Feature Fusion Framework for Punctuation Restoration

Authors: Yangjun Wu, Kebin Fang, Yao Zhao

Abstract: To accomplish the punctuation restoration task, most existing approaches focused on leveraging extra information (e.g., part-of-speech tags) or addressing the class imbalance problem. Recent works have widely applied the transformer-based language models and significantly improved their effectiveness. To the best of our knowledge, an inherent issue has remained neglected: the attention of individual heads in the transformer will be diluted or powerless while feeding the long non-punctuation utterances. Since those previous contexts, not the followings, are comparatively more valuable to the current position, it's hard to achieve a good balance by independent attention. In this paper, we propose a novel Feature Fusion framework based on two-type Attentions (FFA) to alleviate the shortage. It introduces a two-stream architecture. One module involves interaction between attention heads to encourage the communication, and another masked attention module captures the dependent feature representation. Then, it aggregates two feature embeddings to fuse information and enhances context-awareness. The experiments on the popular benchmark dataset IWSLT demonstrate that our approach is effective. Without additional data, it obtains comparable performance to the current state-of-the-art models.

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.03182 [pdf, other]

CROON: Automatic Multi-LiDAR Calibration and Refinement Method in Road Scene

Authors: Pengjin Wei, Guohang Yan, Yikang Li, Kun Fang, Xinyu Cai, Jie Yang, Wei Liu

Abstract: Sensor-based environmental perception is a crucial part of the autonomous driving system. In order to get an excellent perception of the surrounding environment, an intelligent system would configure multiple LiDARs (3D Light Detection and Ranging) to cover the distant and near space of the car. The precision of perception relies on the quality of sensor calibration. This research aims at developing an accurate, automatic, and robust calibration strategy for multiple LiDAR systems in the general road scene. We thus propose CROON (automatiC multi-LiDAR CalibratiOn and Refinement method in rOad sceNe), a two-stage method including rough and refinement calibration. The first stage can calibrate the sensor from an arbitrary initial pose, and the second stage is able to precisely calibrate the sensor iteratively. Specifically, CROON utilizes the natural characteristics of the road scene so that it is independent and easy to apply in large-scale conditions. Experimental results on real-world and simulated data sets demonstrate the reliability and accuracy of our method. All the related data sets and codes are open-sourced on the Github website https://github.com/OpenCalib/LiDAR2LiDAR.

Submitted 13 November, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

Comments: 7 pages, 5 figures

arXiv:2112.08686 [pdf, other]

Ruta: Dis-aggregated routing system over multi-cloud

Authors: Kevin Fang

Abstract: Over the years, the evolution of SDN has created multiple overlay technologies, which are inefficient and make end-to-end traffic engineering services hard to deploy. Ruta is designed as a unified encapsulation with Segment Routing, Crypto and NAT-Traversal capabilities over UDP. Ruta could be deployed as a cloud-native SDN platform globally over multi-cloud and integrated with each application on the transport layer, which provides nearly zero loss and almost less than 200ms latency to access anywhere in the world over the internet.

Submitted 9 January, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

arXiv:2111.12229 [pdf, other]

Subspace Adversarial Training

Authors: Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

Abstract: Single-step adversarial training (AT) has received wide attention as it proved to be both efficient and robust. However, a serious problem of catastrophic overfitting exists, i.e., the robust accuracy against projected gradient descent (PGD) attack suddenly drops to 0% during the training. In this paper, we approach this problem from a novel perspective of optimization and firstly reveal the close link between the fast-growing gradient of each sample and overfitting, which can also be applied to understand robust overfitting in multi-step AT. To control the growth of the gradient, we propose a new AT method, Subspace Adversarial Training (Sub-AT), which constrains AT in a carefully extracted subspace. It successfully resolves both kinds of overfitting and significantly boosts the robustness. In subspace, we also allow single-step AT with larger steps and larger radius, further improving the robustness performance. As a result, we achieve state-of-the-art single-step AT performance. Without any regularization term, our single-step AT can reach over 51% robust accuracy against strong PGD-50 attack of radius 8/255 on CIFAR-10, reaching a competitive performance against standard multi-step PGD-10 AT with huge computational advantages. The code is released at https://github.com/nblt/Sub-AT.

Submitted 21 March, 2022; v1 submitted 23 November, 2021; originally announced November 2021.

Comments: CVPR2022

arXiv:2110.14902 [pdf, other]

NetDAM: Network Direct Attached Memory with Programmable In-Memory Computing ISA

Authors: Kevin Fang, David Peng

Abstract: Data-intensive applications like distributed AI-training may require multi-terabyte memory capacity with multi-terabit bandwidth. We directly attach the memory to the ethernet controller with some programmable logic to design an efficient hardware "template" for Memory pooling and in-memory / in-network computing. We built an FPGA prototype of the NetDAM, and we demonstrate an MPI-Allreduce communication case; the NetDAM can be used as a software- and hardware-friendly programmable architecture and a high-performance alternative to RDMA.

Submitted 28 October, 2021; originally announced October 2021.

arXiv:2110.14842 [pdf, other]

Towards the ultimate limits of quantum channel discrimination

Authors: Kun Fang, Gilad Gour, Xin Wang

Abstract: This note studies the difficulty of discriminating quantum channels under operational regimes. First, we make a conjecture on the exponentially strong converse of quantum channel hypothesis testing under coherent strategies, meaning that any strategy to make the Type II error decay with an exponent larger than the regularized channel relative entropy will unavoidably result in the Type I error converging to one exponentially fast in the asymptotic limit. This conjecture will imply the desirable quantum channel Stein's Lemma and the continuity of the regularized (amortized) Sandwiched Rényi channel divergence at $\alpha=1$. We also remark that there was a gap in the proof of the above conjecture in our previous arXiv version. Such a gap exists since a lemma that basically comes from [Brandao and Plenio, 2010] was found to be false. Second, we develop a framework to show the interplay between the strategies of channel discrimination, the operational regimes, and variants of channel divergences. This framework systematically underlies the operational meaning of quantum channel divergences in quantum channel discrimination. Our work makes an attempt towards understanding the ultimate limit of quantum channel discrimination, as well as its connection to quantum channel divergences in the asymptotic regime.

Submitted 1 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: comments are welcome

arXiv:2106.13935 [pdf, other]

Discovering Generalizable Skills via Automated Generation of Diverse Tasks

Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Abstract: The learning efficiency and generalization ability of an intelligent agent can be greatly improved by utilizing a useful set of skills. However, the design of robot skills can often be intractable in real-world applications due to the prohibitive amount of effort and expertise that it requires. In this work, we introduce Skill Learning In Diversified Environments (SLIDE), a method to discover generalizable skills via automated generation of a diverse set of tasks. As opposed to prior work on unsupervised discovery of skills which incentivizes the skills to produce different outcomes in the same environment, our method pairs each skill with a unique task produced by a trainable task generator. To encourage generalizable skills to emerge, our method trains each skill to specialize in the paired task and maximizes the diversity of the generated tasks. A task discriminator defined on the robot behaviors in the generated tasks is jointly trained to estimate the evidence lower bound of the diversity objective. The learned skills can then be composed in a hierarchical reinforcement learning algorithm to solve unseen target tasks. We demonstrate that the proposed method can effectively learn a variety of robot skills in two tabletop manipulation domains. Our results suggest that the learned skills can effectively improve the robot's performance in various unseen target tasks compared to existing reinforcement learning and skill learning methods.

Submitted 25 June, 2021; originally announced June 2021.

Comments: RSS 2021

arXiv:2104.06204 [pdf, ps, other]

Towards Unbiased Random Features with Lower Variance For Stationary Indefinite Kernels

Authors: Qin Luo, Kun Fang, Jie Yang, Xiaolin Huang

Abstract: Random Fourier Features (RFF) demonstrate well-appreciated performance in kernel approximation for large-scale situations but restrict kernels to be stationary and positive definite. And for non-stationary kernels, the corresponding RFF could be converted to that for stationary indefinite kernels when the inputs are restricted to the unit sphere. Numerous methods provide accessible ways to approximate stationary but indefinite kernels. However, they are either biased or possess large variance. In this article, we propose the generalized orthogonal random features, an unbiased estimation with lower variance. Experimental results on various datasets and kernels verify that our algorithm achieves lower variance and approximation error compared with the existing kernel approximation methods. With better approximation to the originally selected kernels, improved classification accuracy and regression ability is obtained with our approximation algorithm in the framework of support vector machine and regression.

Submitted 13 April, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Comments: Accepted by IJCNN2021

arXiv:2104.01542 [pdf, other]

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations

Authors: Zhenyu Jiang, Yifeng Zhu, Maxwell Svetlik, Kuan Fang, Yuke Zhu

Abstract: Grasp detection in clutter requires the robot to reason about the 3D scene from incomplete and noisy perception. In this work, we draw insight that 3D reconstruction and grasp learning are two intimately connected tasks, both of which require a fine-grained understanding of local geometry details. We thus propose to utilize the synergies between grasp affordance and 3D reconstruction through multi-task learning of a shared representation. Our model takes advantage of deep implicit functions, a continuous and memory-efficient representation, to enable differentiable training of both tasks. We train the model on self-supervised grasp trials data in simulation. Evaluation is conducted on a clutter removal task, where the robot clears cluttered objects by grasping them one at a time. The experimental results in simulation and on the real robot have demonstrated that the use of implicit neural representations and joint learning of grasp affordance and 3D reconstruction have led to state-of-the-art grasping results. Our method outperforms baselines by over 10% in terms of grasp success rate. Additional results and videos can be found at https://sites.google.com/view/rpl-giga2021

Submitted 21 July, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

arXiv:2101.01876 [pdf, other]

doi 10.1029/2021WR029583

The data synergy effects of time-series deep learning models in hydrology

Authors: Kuai Fang, Daniel Kifer, Kathryn Lawson, Dapeng Feng, Chaopeng Shen

Abstract: When fitting statistical models to variables in geoscientific disciplines such as hydrology, it is a customary practice to regionalize - to divide a large spatial domain into multiple regions and study each region separately - instead of fitting a single model on the entire data (also known as unification). Traditional wisdom in these fields suggests that models built for each region separately will have higher performance because of homogeneity within each region. However, by partitioning the training data, each model has access to fewer data points and cannot learn from commonalities between regions. Here, through two hydrologic examples (soil moisture and streamflow), we argue that unification can often significantly outperform regionalization in the era of big data and deep learning (DL). Common DL architectures, even without bespoke customization, can automatically build models that benefit from regional commonality while accurately learning region-specific differences. We highlight an effect we call data synergy, where the results of the DL models improved when data were pooled together from characteristically different regions. In fact, the performance of the DL models benefited from more diverse rather than more homogeneous training data. We hypothesize that DL models automatically adjust their internal representations to identify commonalities while also providing sufficient discriminatory information to the model. The results here advocate for pooling together larger datasets, and suggest the academic community should place greater emphasis on data sharing and compilation.

Submitted 6 January, 2021; originally announced January 2021.

Journal ref: Water Resources Research, 2022

arXiv:2010.12190 [pdf, ps, other]

doi 10.1016/j.patcog.2024.110281

Towards Robust Neural Networks via Orthogonal Diversity

Authors: Kun Fang, Qinghua Tao, Yingwen Wu, Tao Li, Jia Cai, Feipeng Cai, Xiaolin Huang, Jie Yang

Abstract: Deep Neural Networks (DNNs) are vulnerable to invisible perturbations on the images generated by adversarial attacks, which raises researches on the adversarial robustness of DNNs. A series of methods represented by the adversarial training and its variants have proven as one of the most effective techniques in enhancing the DNN robustness. Generally, adversarial training focuses on enriching the training data by involving perturbed data. Such data augmentation effect of the involved perturbed data in adversarial training does not contribute to the robustness of DNN itself and usually suffers from clean accuracy drop. Towards the robustness of DNN itself, we in this paper propose a novel defense that aims at augmenting the model in order to learn features that are adaptive to diverse inputs, including adversarial examples. More specifically, to augment the model, multiple paths are embedded into the network, and an orthogonality constraint is imposed on these paths to guarantee the diversity among them. A margin-maximization loss is then designed to further boost such DIversity via Orthogonality (DIO). In this way, the proposed DIO augments the model and enhances the robustness of DNN itself as the learned features can be corrected by these mutually-orthogonal paths. Extensive empirical results on various data sets, structures and attacks verify the stronger adversarial robustness of the proposed DIO utilizing model augmentation. Besides, DIO can also be flexibly combined with different data augmentation techniques (e.g., TRADES and DDPM), further promoting robustness gains.

Submitted 15 January, 2024; v1 submitted 23 October, 2020; originally announced October 2020.

Comments: accepted by Pattern Recognition

arXiv:2009.04614 [pdf, other]

doi 10.1016/j.patcog.2022.109057

End-to-end Kernel Learning via Generative Random Fourier Features

Authors: Kun Fang, Fanghui Liu, Xiaolin Huang, Jie Yang

Abstract: Random Fourier features (RFFs) provide a promising way for kernel learning in a spectral case. Current RFFs-based kernel learning methods usually work in a two-stage way. In the first-stage process, learning the optimal feature map is often formulated as a target alignment problem, which aims to align the learned kernel with the pre-defined target kernel (usually the ideal kernel). In the second-stage process, a linear learner is conducted with respect to the mapped random features. Nevertheless, the pre-defined kernel in target alignment is not necessarily optimal for the generalization of the linear learner. Instead, in this paper, we consider a one-stage process that incorporates the kernel learning and linear learner into a unifying framework. To be specific, a generative network via RFFs is devised to implicitly learn the kernel, followed by a linear classifier parameterized as a full-connected layer. Then the generative network and the classifier are jointly trained by solving the empirical risk minimization (ERM) problem to reach a one-stage solution. This end-to-end scheme naturally allows deeper features, in correspondence to a multi-layer structure, and shows superior generalization performance over the classical two-stage, RFFs-based methods in real-world classification tasks. Moreover, inspired by the randomized resampling mechanism of the proposed method, its enhanced adversarial robustness is investigated and experimentally verified.

Submitted 15 January, 2024; v1 submitted 9 September, 2020; originally announced September 2020.

Comments: Accepted by Pattern Recognition

arXiv:2008.03917 [pdf, other]

Beyond Lexical: A Semantic Retrieval Framework for Textual SearchEngine

Authors: Kuan Fang, Long Zhao, Zhan Shen, RuiXing Wang, RiKang Zhour, LiWen Fan

Abstract: Search engine has become a fundamental component in various web and mobile applications. Retrieving relevant documents from the massive datasets is challenging for a search engine system, especially when faced with verbose or tail queries. In this paper, we explore a vector space search framework for document retrieval. Specifically, we trained a deep semantic matching model so that each query and document can be encoded as a low dimensional embedding. Our model was trained based on BERT architecture. We deployed a fast k-nearest-neighbor index service for online serving. Both offline and online metrics demonstrate that our method improved retrieval performance and search quality considerably, particularly for tail

Submitted 10 August, 2020; originally announced August 2020.

Comments: 9 pages

arXiv:2007.00350 [pdf, other]

Adaptive Procedural Task Generation for Hard-Exploration Problems

Authors: Kuan Fang, Yuke Zhu, Silvio Savarese, Li Fei-Fei

Abstract: We introduce Adaptive Procedural Task Generation (APT-Gen), an approach to progressively generate a sequence of tasks as curricula to facilitate reinforcement learning in hard-exploration problems. At the heart of our approach, a task generator learns to create tasks from a parameterized task space via a black-box procedural generation module. To enable curriculum learning in the absence of a direct indicator of learning progress, we propose to train the task generator by balancing the agent's performance in the generated tasks and the similarity to the target tasks. Through adversarial training, the task similarity is adaptively estimated by a task discriminator defined on the agent's experiences, allowing the generated tasks to approximate target tasks of unknown parameterization or outside of the predefined task space. Our experiments on the grid world and robotic manipulation task domains show that APT-Gen achieves substantially better performance than various existing baselines by generating suitable tasks of rich variations.

Submitted 18 March, 2021; v1 submitted 1 July, 2020; originally announced July 2020.

Comments: ICLR 2021

arXiv:2006.04084 [pdf, other]

SERank: Optimize Sequencewise Learning to Rank Using Squeeze-and-Excitation Network

Authors: RuiXing Wang, Kuan Fang, RiKang Zhou, Zhan Shen, LiWen Fan

Abstract: Learning-to-rank (LTR) is a set of supervised machine learning algorithms that aim at generating optimal ranking order over a list of items. A lot of ranking models have been studied during the past decades. And most of them treat each query document pair independently during training and inference. Recently, there are a few methods have been proposed which focused on mining information across ranking candidates list for further improvements, such as learning multivariant scoring function or learning contextual embedding. However, these methods usually greatly increase computational cost during online inference, especially when with large candidates size in real-world web search systems. What's more, there are few studies that focus on novel design of model structure for leveraging information across ranking candidates. In this work, we propose an effective and efficient method named as SERank which is a Sequencewise Ranking model by using Squeeze-and-Excitation network to take advantage of cross-document information. Moreover, we examine our proposed methods on several public benchmark datasets, as well as click logs collected from a commercial Question Answering search engine, Zhihu. In addition, we also conduct online A/B testing at Zhihu search engine to further verify the proposed approach. Results on both offline datasets and online A/B testing demonstrate that our method contributes to a significant improvement.

Submitted 7 June, 2020; originally announced June 2020.

Comments: 8 pages

arXiv:2002.12004 [pdf, other]

doi 10.1109/TIT.2021.3064009

Finite Block Length Analysis on Quantum Coherence Distillation and Incoherent Randomness Extraction

Authors: Masahito Hayashi, Kun Fang, Kun Wang

Abstract: We give the first systematic study on the second order asymptotics of the operational task of coherence distillation with and without assistance. In the unassisted setting, we introduce a variant of randomness extraction framework where free incoherent operations are allowed before the incoherent measurement and the randomness extractors. We then show that the maximum number of random bits extractable from a given quantum state is precisely equal to the maximum number of coherent bits that can be distilled from the same state. This relation enables us to derive tight second order expansions of both tasks in the independent and identically distributed setting. Remarkably, the incoherent operation classes that can empower coherence distillation for generic states all admit the same second order expansions, indicating their operational equivalence for coherence distillation in both asymptotic and large block length regime. We then generalize the above line of research to the assisted setting, arising naturally in bipartite quantum systems where Bob distills coherence from the state at hand, aided by the benevolent Alice possessing the other system. More precisely, we introduce a new assisted incoherent randomness extraction task and establish an exact relation between this task and the assisted coherence distillation. This strengthens the one-shot relation in the unassisted setting and confirms that this cryptographic framework indeed offers a new perspective to the study of quantum coherence distillation.
Likewise, this relation yields second order characterizations to the assisted tasks. As by-products, we show the strong converse property of the aforementioned tasks from their second order expansions.

Submitted 10 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: v2 added the second order analysis on the assisted coherence distillation, one author is added; comments are welcome

Journal ref: IEEE Transactions on Information Theory ( Volume: 67, Issue: 6, June 2021)

Showing 1–50 of 65 results for author: Fang, K