-
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Authors:
Gen Li,
Zhihao Shu,
Jie Ji,
Minghai Qin,
Fatemeh Afghah,
Wei Niu,
Xiaolong Ma
Abstract:
Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks…
▽ More
Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. Nowadays, an emerging trend in the current video distribution system is to take advantage of DNN's overfitting properties to perform video resolution upscaling. By splitting videos into chunks and applying a super-resolution (SR) model to overfit each chunk, this scheme of SR models plus video chunks is able to replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous overhead on model switching and memory footprints at the user end. To resolve such problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the model number down to one (Dy-DCA), which helps promote performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we designed a framework that optimizes dynamic features (e.g., dynamic shapes, sizes, and control flow) in Dy-DCA to enable a series of compilation optimizations, including fused code generation, static execution planning, etc. By employing such techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimization, we achieve a 1.7$\times$ speedup while saving up to 1.61$\times$ memory consumption. Code available in https://github.com/coulsonlee/Dy-DCA-ECCV2024.
△ Less
Submitted 3 July, 2024;
originally announced July 2024.
-
MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading
Authors:
Chuqiao Zong,
Chaojie Wang,
Molei Qin,
Lei Feng,
Xinrun Wang,
Bo An
Abstract:
High-frequency trading (HFT) that executes algorithmic trading in short time scales, has recently occupied the majority of cryptocurrency market. Besides traditional quantitative trading methods, reinforcement learning (RL) has become another appealing approach for HFT due to its terrific ability of handling high-dimensional financial data and solving sophisticated sequential decision-making probl…
▽ More
High-frequency trading (HFT) that executes algorithmic trading in short time scales, has recently occupied the majority of cryptocurrency market. Besides traditional quantitative trading methods, reinforcement learning (RL) has become another appealing approach for HFT due to its terrific ability of handling high-dimensional financial data and solving sophisticated sequential decision-making problems, \emph{e.g.,} hierarchical reinforcement learning (HRL) has shown its promising performance on second-level HFT by training a router to select only one sub-agent from the agent pool to execute the current transaction. However, existing RL methods for HFT still have some defects: 1) standard RL-based trading agents suffer from the overfitting issue, preventing them from making effective policy adjustments based on financial context; 2) due to the rapid changes in market conditions, investment decisions made by an individual agent are usually one-sided and highly biased, which might lead to significant loss in extreme markets. To tackle these problems, we propose a novel Memory Augmented Context-aware Reinforcement learning method On HFT, \emph{a.k.a.} MacroHFT, which consists of two training phases: 1) we first train multiple types of sub-agents with the market data decomposed according to various financial indicators, specifically market trend and volatility, where each agent owns a conditional adapter to adjust its trading policy according to market conditions; 2) then we train a hyper-agent to mix the decisions from these sub-agents and output a consistently profitable meta-policy to handle rapid market fluctuations, equipped with a memory mechanism to enhance the capability of decision-making. Extensive experiments on various cryptocurrency markets demonstrate that MacroHFT can achieve state-of-the-art performance on minute-level trading tasks.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models
Authors:
Kehua Feng,
Keyan Ding,
Weijie Wang,
Xiang Zhuang,
Zeyuan Wang,
Ming Qin,
Yu Zhao,
Jianhua Yao,
Qiang Zhang,
Huajun Chen
Abstract:
The burgeoning utilization of Large Language Models (LLMs) in scientific research necessitates advanced benchmarks capable of evaluating their understanding and application of scientific knowledge comprehensively. To address this need, we introduce the SciKnowEval benchmark, a novel framework that systematically evaluates LLMs across five progressive levels of scientific knowledge: studying extens…
▽ More
The burgeoning utilization of Large Language Models (LLMs) in scientific research necessitates advanced benchmarks capable of evaluating their understanding and application of scientific knowledge comprehensively. To address this need, we introduce the SciKnowEval benchmark, a novel framework that systematically evaluates LLMs across five progressive levels of scientific knowledge: studying extensively, inquiring earnestly, thinking profoundly, discerning clearly, and practicing assiduously. These levels aim to assess the breadth and depth of scientific knowledge in LLMs, including knowledge coverage, inquiry and exploration capabilities, reflection and reasoning abilities, ethic and safety considerations, as well as practice proficiency. Specifically, we take biology and chemistry as the two instances of SciKnowEval and construct a dataset encompassing 50K multi-level scientific problems and solutions. By leveraging this dataset, we benchmark 20 leading open-source and proprietary LLMs using zero-shot and few-shot prompting strategies. The results reveal that despite achieving state-of-the-art performance, the proprietary LLMs still have considerable room for improvement, particularly in addressing scientific computations and applications. We anticipate that SciKnowEval will establish a comprehensive standard for benchmarking LLMs in science research and discovery, and promote the development of LLMs that integrate scientific knowledge with strong safety awareness. The dataset and code are publicly available at https://github.com/hicai-zju/sciknoweval .
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Pre-train and Refine: Towards Higher Efficiency in K-Agnostic Community Detection without Quality Degradation
Authors:
Meng Qin,
Chaorui Zhang,
Yu Gao,
Weixi Zhang,
Dit-Yan Yeung
Abstract:
Community detection (CD) is a classic graph inference task that partitions nodes of a graph into densely connected groups. While many CD methods have been proposed with either impressive quality or efficiency, balancing the two aspects remains a challenge. This study explores the potential of deep graph learning to achieve a better trade-off between the quality and efficiency of K-agnostic CD, whe…
▽ More
Community detection (CD) is a classic graph inference task that partitions nodes of a graph into densely connected groups. While many CD methods have been proposed with either impressive quality or efficiency, balancing the two aspects remains a challenge. This study explores the potential of deep graph learning to achieve a better trade-off between the quality and efficiency of K-agnostic CD, where the number of communities K is unknown. We propose PRoCD (Pre-training & Refinement fOr Community Detection), a simple yet effective method that reformulates K-agnostic CD as the binary node pair classification. PRoCD follows a pre-training & refinement paradigm inspired by recent advances in pre-training techniques. We first conduct the offline pre-training of PRoCD on small synthetic graphs covering various topology properties. Based on the inductive inference across graphs, we then generalize the pre-trained model (with frozen parameters) to large real graphs and use the derived CD results as the initialization of an existing efficient CD method (e.g., InfoMap) to further refine the quality of CD results. In addition to benefiting from the transfer ability regarding quality, the online generalization and refinement can also help achieve high inference efficiency, since there is no time-consuming model optimization. Experiments on public datasets with various scales demonstrate that PRoCD can ensure higher efficiency in K-agnostic CD without significant quality degradation.
△ Less
Submitted 7 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting
Authors:
Yuanhao Cai,
Zihao Xiao,
Yixun Liang,
Minghan Qin,
Yulun Zhang,
Xiaokang Yang,
Yaoyao Liu,
Alan Yuille
Abstract:
High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed…
▽ More
High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed. In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time. Specifically, we design a Dual Dynamic Range (DDR) Gaussian point cloud model that uses spherical harmonics to fit HDR color and employs an MLP-based tone-mapper to render LDR color. The HDR and LDR colors are then fed into two Parallel Differentiable Rasterization (PDR) processes to reconstruct HDR and LDR views. To establish the data foundation for the research of 3D Gaussian splatting-based methods in HDR NVS, we recalibrate the camera parameters and compute the initial positions for Gaussian point clouds. Experiments demonstrate that our HDR-GS surpasses the state-of-the-art NeRF-based method by 3.84 and 1.91 dB on LDR and HDR NVS while enjoying 1000x inference speed and only requiring 6.3% training time. Code, models, and recalibrated data will be publicly available at https://github.com/caiyuanhao1998/HDR-GS
△ Less
Submitted 27 May, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
IB-AdCSCNet:Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck
Authors:
He Zou,
Meng'en Qin,
Yu Song,
Xiaohui Yang
Abstract:
In the realm of neural network models, the perpetual challenge remains in retaining task-relevant information while effectively discarding redundant data during propagation. In this paper, we introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory. IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks by dynamically adjust…
▽ More
In the realm of neural network models, the perpetual challenge remains in retaining task-relevant information while effectively discarding redundant data during propagation. In this paper, we introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck theory. IB-AdCSCNet seamlessly integrates the information bottleneck trade-off strategy into deep networks by dynamically adjusting the trade-off hyperparameter $λ$ through gradient descent, updating it within the FISTA(Fast Iterative Shrinkage-Thresholding Algorithm ) framework. By optimizing the compressive excitation loss function induced by the information bottleneck principle, IB-AdCSCNet achieves an optimal balance between compression and fitting at a global level, approximating the globally optimal representation feature. This information bottleneck trade-off strategy driven by downstream tasks not only helps to learn effective features of the data, but also improves the generalization of the model. This study's contribution lies in presenting a model with consistent performance and offering a fresh perspective on merging deep learning with sparse representation theory, grounded in the information bottleneck concept. Experimental results on CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet not only matches the performance of deep residual convolutional networks but also outperforms them when handling corrupted data. Through the inference of the IB trade-off, the model's robustness is notably enhanced.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
NAFRSSR: a Lightweight Recursive Network for Efficient Stereo Image Super-Resolution
Authors:
Yihong Chen,
Zhen Fan,
Shuai Dong,
Zhiwei Chen,
Wenjie Li,
Minghui Qin,
Min Zeng,
Xubing Lu,
Guofu Zhou,
Xingsen Gao,
Jun-Ming Liu
Abstract:
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high co…
▽ More
Stereo image super-resolution (SR) refers to the reconstruction of a high-resolution (HR) image from a pair of low-resolution (LR) images as typically captured by a dual-camera device. To enhance the quality of SR images, most previous studies focused on increasing the number and size of feature maps and introducing complex and computationally intensive structures, resulting in models with high computational complexity. Here, we propose a simple yet efficient stereo image SR model called NAFRSSR, which is modified from the previous state-of-the-art model NAFSSR by introducing recursive connections and lightweighting the constituent modules. Our NAFRSSR model is composed of nonlinear activation free and group convolution-based blocks (NAFGCBlocks) and depth-separated stereo cross attention modules (DSSCAMs). The NAFGCBlock improves feature extraction and reduces number of parameters by removing the simple channel attention mechanism from NAFBlock and using group convolution. The DSSCAM enhances feature fusion and reduces number of parameters by replacing 1x1 pointwise convolution in SCAM with weight-shared 3x3 depthwise convolution. Besides, we propose to incorporate trainable edge detection operator into NAFRSSR to further improve the model performance. Four variants of NAFRSSR with different sizes, namely, NAFRSSR-Mobile (NAFRSSR-M), NAFRSSR-Tiny (NAFRSSR-T), NAFRSSR-Super (NAFRSSR-S) and NAFRSSR-Base (NAFRSSR-B) are designed, and they all exhibit fewer parameters, higher PSNR/SSIM, and faster speed than the previous state-of-the-art models. In particular, to the best of our knowledge, NAFRSSR-M is the lightest (0.28M parameters) and fastest (50 ms inference time) model achieving an average PSNR/SSIM as high as 24.657 dB/0.7622 on the benchmark datasets. Codes and models will be released at https://github.com/JNUChenYiHong/NAFRSSR.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Revisiting a Pain in the Neck: Semantic Phrase Processing Benchmark for Language Models
Authors:
Yang Liu,
Melissa Xiaohui Qin,
Hongming Li,
Chao Huang
Abstract:
We introduce LexBench, a comprehensive evaluation suite enabled to test language models (LMs) on ten semantic phrase processing tasks. Unlike prior studies, it is the first work to propose a framework from the comparative perspective to model the general semantic phrase (i.e., lexical collocation) and three fine-grained semantic phrases, including idiomatic expression, noun compound, and verbal co…
▽ More
We introduce LexBench, a comprehensive evaluation suite enabled to test language models (LMs) on ten semantic phrase processing tasks. Unlike prior studies, it is the first work to propose a framework from the comparative perspective to model the general semantic phrase (i.e., lexical collocation) and three fine-grained semantic phrases, including idiomatic expression, noun compound, and verbal construction. Thanks to \ourbenchmark, we assess the performance of 15 LMs across model architectures and parameter scales in classification, extraction, and interpretation tasks. Through the experiments, we first validate the scaling law and find that, as expected, large models excel better than the smaller ones in most tasks. Second, we investigate further through the scaling semantic relation categorization and find that few-shot LMs still lag behind vanilla fine-tuned models in the task. Third, through human evaluation, we find that the performance of strong models is comparable to the human level regarding semantic phrase processing. Our benchmarking findings can serve future research aiming to improve the generic capability of LMs on semantic phrase comprehension. Our source code and data are available at https://github.com/jacklanda/LexBench
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
Deep Reinforcement Learning Based Toolpath Generation for Thermal Uniformity in Laser Powder Bed Fusion Process
Authors:
Mian Qin,
Junhao Ding,
Shuo Qu,
Xu Song,
Charlie C. L. Wang,
Wei-Hsin Liao
Abstract:
Laser powder bed fusion (LPBF) is a widely used metal additive manufacturing technology. However, the accumulation of internal residual stress during printing can cause significant distortion and potential failure. Although various scan patterns have been studied to reduce possible accumulated stress, such as zigzag scanning vectors with changing directions or a chessboard-based scan pattern with…
▽ More
Laser powder bed fusion (LPBF) is a widely used metal additive manufacturing technology. However, the accumulation of internal residual stress during printing can cause significant distortion and potential failure. Although various scan patterns have been studied to reduce possible accumulated stress, such as zigzag scanning vectors with changing directions or a chessboard-based scan pattern with divided small islands, most conventional scan patterns cannot significantly reduce residual stress. The proposed adaptive toolpath generation (ATG) algorithms, aiming to minimize the thermal gradients, may result in extremely accumulated temperature fields in some cases. To address these issues, we developed a deep reinforcement learning (DRL)-based toolpath generation framework, with the goal of achieving uniformly distributed heat and avoiding extremely thermal accumulation regions during the LPBF process. We first developed an overall pipeline for the DRL-based toolpath generation framework, which includes uniformly sampling, agent moving and environment observation, action selection, moving constraints, rewards calculation, and the training process. To accelerate the training process, we simplified the data-intensive numerical model by considering the turning angles on the toolpath. We designed the action spaces with three options, including the minimum temperature value, the smoothest path, and the second smoothest path. The reward function was designed to minimize energy density to ensure the temperature field remains relatively stable. To verify the effectiveness of the proposed DRL-based toolpath generation framework, we performed numerical simulations of polygon shape printing domains. In addition, four groups of thin plate samples with different scan patterns were compared using the LPBF process.
△ Less
Submitted 16 February, 2024;
originally announced April 2024.
-
Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Authors:
Dongbin Zhang,
Chuming Wang,
Weitao Wang,
Peihao Li,
Minghan Qin,
Haoqian Wang
Abstract:
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique…
▽ More
Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance feature for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to previous methods, with a $1000\times$ increase in rendering speed.
△ Less
Submitted 22 March, 2024;
originally announced March 2024.
-
Cradle: Empowering Foundation Agents Towards General Computer Control
Authors:
Weihao Tan,
Wentao Zhang,
Xinrun Xu,
Haochong Xia,
Ziluo Ding,
Boyu Li,
Bohan Zhou,
Junpeng Yue,
Jiechuan Jiang,
Yewen Li,
Ruyi An,
Molei Qin,
Chuqiao Zong,
Longtao Zheng,
Yujie Wu,
Xiaoqiang Chai,
Yifei Bi,
Tianbao Xie,
Pengjie Gu,
Xiyun Li,
Ceyao Zhang,
Long Tian,
Chaojie Wang,
Xinrun Wang,
Börje F. Karlsson
, et al. (3 additional authors not shown)
Abstract:
Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through t…
▽ More
Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Cradle can understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games, five software applications, and a comprehensive benchmark, OSWorld. Cradle is the first to enable foundation agents to follow the main storyline and complete 40-minute-long real missions in the complex AAA game Red Dead Redemption 2 (RDR2). Cradle can also create a city of a thousand people in Cities: Skylines, farm and harvest parsnips in Stardew Valley, and trade and bargain with a maximal weekly total profit of 87% in Dealer's Life 2. Cradle can not only operate daily software, like Chrome, Outlook, and Feishu, but also edit images and videos using Meitu and CapCut. Cradle greatly extends the reach of foundation agents by enabling the easy conversion of any software, especially complex games, into benchmarks to evaluate agents' various abilities and facilitate further data collection, thus paving the way for generalist agents.
△ Less
Submitted 2 July, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Aligning Knowledge Graph with Visual Perception for Object-goal Navigation
Authors:
Nuo Xu,
Wen Wang,
Rong Yang,
Mengjie Qin,
Zheyuan Lin,
Wei Song,
Chunlong Zhang,
Jason Gu,
Chao Li
Abstract:
Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph…
▽ More
Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph representation of the scenes, which results in misalignment with visual images. To provide more accurate and coherent scene descriptions and address this misalignment issue, we propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation. Technically, our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language description with visual perception. The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability. We extensively evaluate our method using the AI2-THOR simulator and conduct a series of experiments to demonstrate the effectiveness and efficiency of our navigator. Code available: https://github.com/nuoxu/AKGVP.
△ Less
Submitted 25 April, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Authors:
Wentao Zhang,
Lingxuan Zhao,
Haochong Xia,
Shuo Sun,
Jiaze Sun,
Molei Qin,
Xinyi Li,
Yuqing Zhao,
Yilei Zhao,
Xinyu Cai,
Longtao Zheng,
Xinrun Wang,
Bo An
Abstract:
Financial trading is a crucial component of the markets, informed by a multimodal information landscape encompassing news, prices, and Kline charts, and encompasses diverse tasks such as quantitative trading and high-frequency trading with various assets. While advanced AI techniques like deep learning and reinforcement learning are extensively utilized in finance, their application in financial t…
▽ More
Financial trading is a crucial component of the markets, informed by a multimodal information landscape encompassing news, prices, and Kline charts, and encompasses diverse tasks such as quantitative trading and high-frequency trading with various assets. While advanced AI techniques like deep learning and reinforcement learning are extensively utilized in finance, their application in financial trading tasks often faces challenges due to inadequate handling of multimodal data and limited generalizability across various tasks. To address these challenges, we present FinAgent, a multimodal foundational agent with tool augmentation for financial trading. FinAgent's market intelligence module processes a diverse range of data-numerical, textual, and visual-to accurately analyze the financial market. Its unique dual-level reflection module not only enables rapid adaptation to market dynamics but also incorporates a diversified memory retrieval system, enhancing the agent's ability to learn from historical data and improve decision-making processes. The agent's emphasis on reasoning for actions fosters trust in its financial decisions. Moreover, FinAgent integrates established trading strategies and expert insights, ensuring that its trading approaches are both data-driven and rooted in sound financial principles. With comprehensive experiments on 6 financial datasets, including stocks and Crypto, FinAgent significantly outperforms 9 state-of-the-art baselines in terms of 6 financial metrics with over 36% average improvement on profit. Specifically, a 92.27% return (a 84.39% relative improvement) is achieved on one dataset. Notably, FinAgent is the first advanced multimodal foundation agent designed for financial trading tasks.
△ Less
Submitted 28 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Scientific Large Language Models: A Survey on Biological & Chemical Domains
Authors:
Qiang Zhang,
Keyang Ding,
Tianwen Lyv,
Xinda Wang,
Qingyu Yin,
Yiwen Zhang,
Jing Yu,
Yuhao Wang,
Xiaotong Li,
Zhuoyi Xiang,
Xiang Zhuang,
Zeyuan Wang,
Ming Qin,
Mengyao Zhang,
Jinlu Zhang,
Jiyu Cui,
Renjun Xu,
Hongyang Chen,
Xiaohui Fan,
Huabin Xing,
Huajun Chen
Abstract:
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent o…
▽ More
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension, representing a significant stride toward artificial general intelligence. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. This growing interest has led to the advent of scientific LLMs, a novel subclass specifically engineered for facilitating scientific discovery. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration. However, a systematic and up-to-date survey introducing them is currently lacking. In this paper, we endeavor to methodically delineate the concept of "scientific language", whilst providing a thorough review of the latest advancements in scientific LLMs. Given the expansive realm of scientific disciplines, our analysis adopts a focused lens, concentrating on the biological and chemical domains. This includes an in-depth examination of LLMs for textual knowledge, small molecules, macromolecular proteins, genomic sequences, and their combinations, analyzing them in terms of model architectures, capabilities, datasets, and evaluation. Finally, we critically examine the prevailing challenges and point out promising research directions along with the advances of LLMs. By offering a comprehensive overview of technical developments in this field, this survey aspires to be an invaluable resource for researchers navigating the intricate landscape of scientific LLMs.
△ Less
Submitted 26 January, 2024;
originally announced January 2024.
-
How Are Paid and Volunteer Open Source Developers Different? A Study of the Rust Project
Authors:
Yuxia Zhang,
Mian Qin,
Klaas-Jan Stol,
Minghui Zhou,
Hui Liu
Abstract:
It is now commonplace for organizations to pay developers to work on specific open source software (OSS) projects to pursue their business goals. Such paid developers work alongside voluntary contributors, but given the different motivations of these two groups of developers, conflict may arise, which may pose a threat to a project's sustainability. This paper presents an empirical study of paid d…
▽ More
It is now commonplace for organizations to pay developers to work on specific open source software (OSS) projects to pursue their business goals. Such paid developers work alongside voluntary contributors, but given the different motivations of these two groups of developers, conflict may arise, which may pose a threat to a project's sustainability. This paper presents an empirical study of paid developers and volunteers in Rust, a popular open source programming language project. Rust is a particularly interesting case given considerable concerns about corporate participation. We compare volunteers and paid developers through contribution characteristics and long-term participation, and solicit volunteers' perceptions on paid developers. We find that core paid developers tend to contribute more frequently; commits contributed by one-time paid developers have bigger sizes; peripheral paid developers implement more features; and being paid plays a positive role in becoming a long-term contributor. We also find that volunteers do have some prejudices against paid developers. This study suggests that the dichotomous view of paid vs. volunteer developers is too simplistic and that further subgroups can be identified. Companies should become more sensitive to how they engage with OSS communities, in certain ways as suggested by this study.
△ Less
Submitted 24 January, 2024;
originally announced January 2024.
-
Multi-Task DNS Security Analysis via High-Order Heterogeneous Graph Embedding
Authors:
Meng Qin
Abstract:
DNS is an essential Internet infrastructure to support network applications and services, but is also a significant tool exploited by various cyberattacks. Existing DNS security analysis techniques mostly focus on one specific task associated with one single entity (e.g., domain) via conventional feature engineering. They rely heavily on the labor-intensive feature selection and largely ignore the…
▽ More
DNS is an essential Internet infrastructure to support network applications and services, but is also a significant tool exploited by various cyberattacks. Existing DNS security analysis techniques mostly focus on one specific task associated with one single entity (e.g., domain) via conventional feature engineering. They rely heavily on the labor-intensive feature selection and largely ignore the intrinsic correlations among the heterogeneous DNS entities (e.g., domain and IP). In this paper, I explore the potential of heterogeneous graph embedding to automatically learn the behavior features of multiple DNS entities, and to simultaneously support more than one security tasks. Considering the joint optimization of malicious domain detection and IP reputation evaluation as an example, I propose a novel joint DNS embedding (JDE) model to formulate the DNS query behavior via a similarity-enhanced graph with heterogeneous entities. The random walk technique is applied to the heterogeneous graph to comprehensively explore the hidden homogeneous and heterogeneous high-order proximities among domains and IPs. Extensive experiments on real DNS traffic demonstrate that the joint optimization of multiple tasks with the latent high-order proximities can lead to better security analysis performance for all the tasks than respectively optimizing each single task with the observable low-order proximity.
△ Less
Submitted 14 January, 2024;
originally announced January 2024.
-
Towards a Unified Method for Network Dynamic via Adversarial Weighted Link Prediction
Authors:
Meng Qin
Abstract:
Network dynamic (e.g., traffic burst in data center networks and channel fading in cellular WiFi networks) has a great impact on the performance of communication networks (e.g., throughput, capacity, delay, and jitter). This article proposes a unified prediction-based method to handle the dynamic of various network systems. From the view of graph deep learning, I generally formulate the dynamic pr…
▽ More
Network dynamic (e.g., traffic burst in data center networks and channel fading in cellular WiFi networks) has a great impact on the performance of communication networks (e.g., throughput, capacity, delay, and jitter). This article proposes a unified prediction-based method to handle the dynamic of various network systems. From the view of graph deep learning, I generally formulate the dynamic prediction of networks as a temporal link prediction task and analyze the possible challenges of the prediction of weighted networks, where link weights have the wide-value-range and sparsity issues. Inspired by the high-resolution video frame prediction with generative adversarial network (GAN), I try to adopt adversarial learning to generate high-quality predicted snapshots for network dynamic, which is expected to support the precise and fine-grained network control. A novel high-quality temporal link prediction (HQ-TLP) model with GAN is then developed to illustrate the potential of my basic idea. Extensive experiments for various application scenarios further demonstrate the powerful capability of HQ-TLP.
△ Less
Submitted 7 January, 2024;
originally announced January 2024.
-
IRWE: Inductive Random Walk for Joint Inference of Identity and Position Network Embedding
Authors:
Meng Qin,
Dit-Yan Yeung
Abstract:
Network embedding, which maps graphs to distributed representations, is a unified framework for various graph inference tasks. According to the topology properties (e.g., structural roles and community memberships of nodes) to be preserved, it can be categorized into the identity and position embedding. However, existing methods can only capture one type of property. Some approaches can support th…
▽ More
Network embedding, which maps graphs to distributed representations, is a unified framework for various graph inference tasks. According to the topology properties (e.g., structural roles and community memberships of nodes) to be preserved, it can be categorized into the identity and position embedding. However, existing methods can only capture one type of property. Some approaches can support the inductive inference that generalizes the embedding model to new nodes or graphs but relies on the availability of attributes. Due to the complicated correlations between topology and attributes, it is unclear for some inductive methods which type of property they can capture. In this study, we explore a unified framework for the joint inductive inference of identity and position embeddings without attributes. An inductive random walk embedding (IRWE) method is proposed, which combines multiple attention units to handle the random walk on graph topology and simultaneously derives identity and position embeddings that are jointly optimized. In particular, we demonstrate that some random walk statistics can be informative features to characterize node identities and positions while supporting the inductive embedding inference. Experiments validate the superior performance of IRWE beyond various baselines for the transductive and inductive inference of identity and position embeddings.
△ Less
Submitted 12 May, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
LangSplat: 3D Language Gaussian Splatting
Authors:
Minghan Qin,
Wanhua Li,
Jiawei Zhou,
Haoqian Wang,
Hanspeter Pfister
Abstract:
Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP langua…
▽ More
Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experimental results show that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a 199 $\times$ speedup compared to LERF at the resolution of 1440 $\times$ 1080. We strongly recommend readers to check out our video results at https://langsplat.github.io/
△ Less
Submitted 31 March, 2024; v1 submitted 26 December, 2023;
originally announced December 2023.
-
Adaptive Robot Coordination: A Subproblem-based Approach for Hybrid Multi-Robot Motion Planning
Authors:
Irving Solis,
James Motes,
Mike Qin,
Marco Morales,
Nancy M. Amato
Abstract:
This work presents Adaptive Robot Coordination (ARC), a novel hybrid framework for multi-robot motion planning (MRMP) that employs local subproblems to resolve inter-robot conflicts. ARC creates subproblems centered around conflicts, and the solutions represent the robot motions required to resolve these conflicts. The use of subproblems enables an inexpensive hybrid exploration of the multi-robot…
▽ More
This work presents Adaptive Robot Coordination (ARC), a novel hybrid framework for multi-robot motion planning (MRMP) that employs local subproblems to resolve inter-robot conflicts. ARC creates subproblems centered around conflicts, and the solutions represent the robot motions required to resolve these conflicts. The use of subproblems enables an inexpensive hybrid exploration of the multi-robot planning space. ARC leverages the hybrid exploration by dynamically adjusting the coupling and decoupling of the multi-robot planning space. This allows ARC to adapt the levels of coordination efficiently by planning in decoupled spaces, where robots can operate independently, and in coupled spaces where coordination is essential. ARC is probabilistically complete, can be used for any robot, and produces efficient cost solutions in reduced planning times. Through extensive evaluation across representative scenarios with different robots requiring various levels of coordination, ARC demonstrates its ability to provide simultaneous scalability and precise coordination. ARC is the only method capable of solving all the scenarios and is competitive with coupled, decoupled, and hybrid baselines.
△ Less
Submitted 13 December, 2023;
originally announced December 2023.
-
The Computational Advantage of MIP* Vanishes in the Presence of Noise
Authors:
Yangjing Dong,
Honghao Fu,
Anand Natarajan,
Minglong Qin,
Haochen Xu,
Penghui Yao
Abstract:
Quantum multiprover interactive proof systems with entanglement MIP* are much more powerful than their classical counterpart MIP (Babai et al. '91, Ji et al. '20): while MIP = NEXP, the quantum class MIP* is equal to RE, a class including the halting problem. This is because the provers in MIP* can share unbounded quantum entanglement. However, recent works of Qin and Yao '21 and '23 have shown th…
▽ More
Quantum multiprover interactive proof systems with entanglement MIP* are much more powerful than their classical counterpart MIP (Babai et al. '91, Ji et al. '20): while MIP = NEXP, the quantum class MIP* is equal to RE, a class including the halting problem. This is because the provers in MIP* can share unbounded quantum entanglement. However, recent works of Qin and Yao '21 and '23 have shown that this advantage is significantly reduced if the provers' shared state contains noise. This paper attempts to exactly characterize the effect of noise on the computational power of quantum multiprover interactive proof systems. We investigate the quantum two-prover one-round interactive system MIP*[poly, O(1)], where the verifier sends polynomially many bits to the provers and the provers send back constantly many bits. We show noise completely destroys the computational advantage given by shared entanglement in this model. Specifically, we show that if the provers are allowed to share arbitrarily many EPR states, where each EPR state is affected by an arbitrarily small constant amount of noise, the resulting complexity class is contained in NEXP = MIP. This improves significantly on the previous best-known bound of NEEEXP (nondeterministic triply exponential time) by Qin and Yao '21. We also show that this collapse in power is due to the noise, rather than the O(1) answer size, by showing that allowing for noiseless EPR states gives the class the full power of RE = MIP*[poly, poly]. Along the way, we develop two technical tools of independent interest. First, we give a new, deterministic tester for the positivity of an exponentially large matrix, provided it has a low-degree Fourier decomposition in terms of Pauli matrices. Secondly, we develop a new invariance principle for smooth matrix functions having bounded third-order Fréchet derivatives or which are Lipschitz continous.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
RaftGP: Random Fast Graph Partitioning
Authors:
Yu Gao,
Meng Qin,
Yibin Ding,
Li Zeng,
Chaorui Zhang,
Weixi Zhang,
Wei Han,
Rongqian Zhao,
Bo Bai
Abstract:
Graph partitioning (GP), a.k.a. community detection, is a classic problem that divides the node set of a graph into densely-connected blocks. Following prior work on the IEEE HPEC Graph Challenge benchmark and recent advances in graph machine learning, we propose a novel RAndom FasT Graph Partitioning (RaftGP) method based on an efficient graph embedding scheme. It uses the Gaussian random project…
▽ More
Graph partitioning (GP), a.k.a. community detection, is a classic problem that divides the node set of a graph into densely-connected blocks. Following prior work on the IEEE HPEC Graph Challenge benchmark and recent advances in graph machine learning, we propose a novel RAndom FasT Graph Partitioning (RaftGP) method based on an efficient graph embedding scheme. It uses the Gaussian random projection to extract community-preserving features from classic GP objectives. These features are fed into a graph neural network (GNN) to derive low-dimensional node embeddings. Surprisingly, our experiments demonstrate that a randomly initialized GNN even without training is enough for RaftGP to derive informative community-preserving embeddings and support high-quality GP. To enable the derived embeddings to tackle GP, we introduce a hierarchical model selection algorithm that simultaneously determines the number of blocks and the corresponding GP result. We evaluate RaftGP on the Graph Challenge benchmark and compare the performance with five baselines, where our method can achieve a better trade-off between quality and efficiency. In particular, compared to the baseline algorithm of the IEEE HPEC Graph Challenge, our method is 6.68x -- 23.9x faster on graphs with 1E3 -- 5E4 nodes and at least 64.5x faster on larger (1E5 node) graphs on which the baseline takes more than 1E4 seconds. Our method achieves better accuracy on all test cases. We also develop a new graph generator to address some limitations of the original generator in the benchmark.
△ Less
Submitted 3 December, 2023;
originally announced December 2023.
-
Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars
Authors:
Yang Liu,
Xiang Huang,
Minghan Qin,
Qinwei Lin,
Haoqian Wang
Abstract:
Neural radiance fields are capable of reconstructing high-quality drivable human avatars but are expensive to train and render. To reduce consumption, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming 3D Gaus…
▽ More
Neural radiance fields are capable of reconstructing high-quality drivable human avatars but are expensive to train and render. To reduce consumption, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming 3D Gaussians to posed space according to the input poses. We introduce hash-encoded shape and appearance to speed up training and propose time-dependent ambient occlusion to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method outperforms existing methods in terms of training time, rendering speed, and reconstruction quality. Our method can be easily extended to multi-human scenes and achieve comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.
△ Less
Submitted 29 November, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Integration and Implementation Strategies for AI Algorithm Deployment with Smart Routing Rules and Workflow Management
Authors:
Barbaros Selnur Erdal,
Vikash Gupta,
Mutlu Demirer,
Kim H. Fair,
Richard D. White,
Jeff Blair,
Barbara Deichert,
Laurie Lafleur,
Ming Melvin Qin,
David Bericat,
Brad Genereaux
Abstract:
This paper reviews the challenges hindering the widespread adoption of artificial intelligence (AI) solutions in the healthcare industry, focusing on computer vision applications for medical imaging, and how interoperability and enterprise-grade scalability can be used to address these challenges. The complex nature of healthcare workflows, intricacies in managing large and secure medical imaging…
▽ More
This paper reviews the challenges hindering the widespread adoption of artificial intelligence (AI) solutions in the healthcare industry, focusing on computer vision applications for medical imaging, and how interoperability and enterprise-grade scalability can be used to address these challenges. The complex nature of healthcare workflows, intricacies in managing large and secure medical imaging data, and the absence of standardized frameworks for AI development pose significant barriers and require a new paradigm to address them.
The role of interoperability is examined in this paper as a crucial factor in connecting disparate applications within healthcare workflows. Standards such as DICOM, Health Level 7 (HL7), and Integrating the Healthcare Enterprise (IHE) are highlighted as foundational for common imaging workflows. A specific focus is placed on the role of DICOM gateways, with Smart Routing Rules and Workflow Management leading transformational efforts in this area.
To drive enterprise scalability, new tools are needed. Project MONAI, established in 2019, is introduced as an initiative aiming to redefine the development of medical AI applications. The MONAI Deploy App SDK, a component of Project MONAI, is identified as a key tool in simplifying the packaging and deployment process, enabling repeatable, scalable, and standardized deployment patterns for AI applications.
The abstract underscores the potential impact of successful AI adoption in healthcare, offering physicians both life-saving and time-saving insights and driving efficiencies in radiology department workflows. The collaborative efforts between academia and industry, are emphasized as essential for advancing the adoption of healthcare AI solutions.
△ Less
Submitted 21 November, 2023; v1 submitted 17 November, 2023;
originally announced November 2023.
-
Controllable Data Generation Via Iterative Data-Property Mutual Mappings
Authors:
Bo Pan,
Muran Qin,
Shiyu Wang,
Yifei Zhang,
Liang Zhao
Abstract:
Deep generative models have been widely used for their ability to generate realistic data samples in various areas, such as images, molecules, text, and speech. One major goal of data generation is controllability, namely to generate new data with desired properties. Despite growing interest in the area of controllable generation, significant challenges still remain, including 1) disentangling des…
▽ More
Deep generative models have been widely used for their ability to generate realistic data samples in various areas, such as images, molecules, text, and speech. One major goal of data generation is controllability, namely to generate new data with desired properties. Despite growing interest in the area of controllable generation, significant challenges still remain, including 1) disentangling desired properties with unrelated latent variables, 2) out-of-distribution property control, and 3) objective optimization for out-of-distribution property control. To address these challenges, in this paper, we propose a general framework to enhance VAE-based data generators with property controllability and ensure disentanglement. Our proposed objective can be optimized on both data seen and unseen in the training set. We propose a training procedure to train the objective in a semi-supervised manner by iteratively conducting mutual mappings between the data and properties. The proposed framework is implemented on four VAE-based controllable generators to evaluate its performance on property error, disentanglement, generation quality, and training time. The results indicate that our proposed framework enables more precise control over the properties of generated samples in a short training time, ensuring the disentanglement and keeping the validity of the generated samples.
△ Less
Submitted 11 October, 2023;
originally announced October 2023.
-
High-Fidelity 3D Head Avatars Reconstruction through Spatially-Varying Expression Conditioned Neural Radiance Field
Authors:
Minghan Qin,
Yifan Liu,
Yuelang Xu,
Xiaochen Zhao,
Yebin Liu,
Haoqian Wang
Abstract:
One crucial aspect of 3D head avatar reconstruction lies in the details of facial expressions. Although recent NeRF-based photo-realistic 3D head avatar methods achieve high-quality avatar rendering, they still encounter challenges retaining intricate facial expression details because they overlook the potential of specific expression variations at different spatial positions when conditioning the…
▽ More
One crucial aspect of 3D head avatar reconstruction lies in the details of facial expressions. Although recent NeRF-based photo-realistic 3D head avatar methods achieve high-quality avatar rendering, they still encounter challenges retaining intricate facial expression details because they overlook the potential of specific expression variations at different spatial positions when conditioning the radiance field. Motivated by this observation, we introduce a novel Spatially-Varying Expression (SVE) conditioning. The SVE can be obtained by a simple MLP-based generation network, encompassing both spatial positional features and global expression information. Benefiting from rich and diverse information of the SVE at different positions, the proposed SVE-conditioned neural radiance field can deal with intricate facial expressions and achieve realistic rendering and geometry details of high-fidelity 3D head avatars. Additionally, to further elevate the geometric and rendering quality, we introduce a new coarse-to-fine training strategy, including a geometry initialization strategy at the coarse stage and an adaptive importance sampling strategy at the fine stage. Extensive experiments indicate that our method outperforms other state-of-the-art (SOTA) methods in rendering and geometry quality on mobile phone-collected and public datasets.
△ Less
Submitted 9 October, 2023;
originally announced October 2023.
-
CCAE: A Corpus of Chinese-based Asian Englishes
Authors:
Yang Liu,
Melissa Xiaohui Qin,
Long Wang,
Chao Huang
Abstract:
Language models have been foundations in various scenarios of NLP applications, but it has not been well applied in language variety studies, even for the most popular language like English. This paper represents one of the few initial efforts to utilize the NLP technology in the paradigm of World Englishes, specifically in creating a multi-variety corpus for studying Asian Englishes. We present a…
▽ More
Language models have been foundations in various scenarios of NLP applications, but it has not been well applied in language variety studies, even for the most popular language like English. This paper represents one of the few initial efforts to utilize the NLP technology in the paradigm of World Englishes, specifically in creating a multi-variety corpus for studying Asian Englishes. We present an overview of the CCAE -- Corpus of Chinese-based Asian English, a suite of corpora comprising six Chinese-based Asian English varieties. It is based on 340 million tokens in 448 thousand web documents from six regions. The ontology of data would make the corpus a helpful resource with enormous research potential for Asian Englishes (especially for Chinese Englishes for which there has not been a publicly accessible corpus yet so far) and an ideal source for variety-specific language modeling and downstream tasks, thus setting the stage for NLP-based World Englishes studies. And preliminary experiments on this corpus reveal the practical value of CCAE. Finally, we make CCAE available at \href{https://huggingface.co/datasets/CCAE/CCAE-Corpus}{this https URL}.
△ Less
Submitted 8 October, 2023;
originally announced October 2023.
-
InstructProtein: Aligning Human and Protein Language via Knowledge Instruction
Authors:
Zeyuan Wang,
Qiang Zhang,
Keyan Ding,
Ming Qin,
Xiang Zhuang,
Xiaotong Li,
Huajun Chen
Abstract:
Large Language Models (LLMs) have revolutionized the field of natural language processing, but they fall short in comprehending biological sequences such as proteins. To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function…
▽ More
Large Language Models (LLMs) have revolutionized the field of natural language processing, but they fall short in comprehending biological sequences such as proteins. To address this challenge, we propose InstructProtein, an innovative LLM that possesses bidirectional generation capabilities in both human and protein languages: (i) taking a protein sequence as input to predict its textual function description and (ii) using natural language to prompt protein sequence generation. To achieve this, we first pre-train an LLM on both protein and natural language corpora, enabling it to comprehend individual languages. Then supervised instruction tuning is employed to facilitate the alignment of these two distinct languages. Herein, we introduce a knowledge graph-based instruction generation framework to construct a high-quality instruction dataset, addressing annotation imbalance and instruction deficits in existing protein-text corpus. In particular, the instructions inherit the structural relations between proteins and function annotations in knowledge graphs, which empowers our model to engage in the causal modeling of protein functions, akin to the chain-of-thought processes in natural languages. Extensive experiments on bidirectional protein-text generation tasks show that InstructProtein outperforms state-of-the-art LLMs by large margins. Moreover, InstructProtein serves as a pioneering step towards text-based protein function prediction and sequence design, effectively bridging the gap between protein and human language understanding.
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
Quantum Pseudorandom Scramblers
Authors:
Chuhan Lu,
Minglong Qin,
Fang Song,
Penghui Yao,
Mingnan Zhao
Abstract:
Quantum pseudorandom state generators (PRSGs) have stimulated exciting developments in recent years. A PRSG, on a fixed initial (e.g., all-zero) state, produces an output state that is computationally indistinguishable from a Haar random state. However, pseudorandomness of the output state is not guaranteed on other initial states. In fact, known PRSG constructions provably fail on some initial st…
▽ More
Quantum pseudorandom state generators (PRSGs) have stimulated exciting developments in recent years. A PRSG, on a fixed initial (e.g., all-zero) state, produces an output state that is computationally indistinguishable from a Haar random state. However, pseudorandomness of the output state is not guaranteed on other initial states. In fact, known PRSG constructions provably fail on some initial state.
In this work, we propose and construct quantum Pseudorandom State Scramblers (PRSSs), which can produce a pseudorandom state on an arbitrary initial state. In the information-theoretical setting, we obtain a scrambler which maps an arbitrary initial state to a distribution of quantum states that is close to Haar random in total variation distance. As a result, our PRSS exhibits a dispersing property. Loosely, it can span an $ε$-net of the state space. This significantly strengthens what standard PRSGs can induce, as they may only concentrate on a small region of the state space as long as the average output state approximates a Haar random state in total variation distance.
Our PRSS construction develops a parallel extension of the famous Kac's walk, and we show that it mixes exponentially faster than the standard Kac's walk. This constitutes the core of our proof. We also describe a few applications of PRSSs. While our PRSS construction assumes a post-quantum one-way function, PRSSs are potentially a weaker primitive and can be separated from one-way functions in a relativized world similar to standard PRSGs.
△ Less
Submitted 16 September, 2023;
originally announced September 2023.
-
BotanicGarden: A High-Quality Dataset for Robot Navigation in Unstructured Natural Environments
Authors:
Yuanzhi Liu,
Yujia Fu,
Minghui Qin,
Yufeng Xu,
Baoxin Xu,
Fengdong Chen,
Bart Goossens,
Poly Z. H. Sun,
Hongwei Yu,
Chun Liu,
Long Chen,
Wei Tao,
Hui Zhao
Abstract:
The rapid developments of mobile robotics and autonomous navigation over the years are largely empowered by public datasets for testing and upgrading, such as sensor odometry and SLAM tasks. Impressive demos and benchmark scores have arisen, which may suggest the maturity of existing navigation techniques. However, these results are primarily based on moderate structured scenario testing. When tra…
▽ More
The rapid developments of mobile robotics and autonomous navigation over the years are largely empowered by public datasets for testing and upgrading, such as sensor odometry and SLAM tasks. Impressive demos and benchmark scores have arisen, which may suggest the maturity of existing navigation techniques. However, these results are primarily based on moderate structured scenario testing. When transitioning to challenging unstructured environments, especially in GNSS-denied, texture-monotonous, and dense-vegetated natural fields, their performance can hardly sustain at a high level and requires further validation and improvement. To bridge this gap, we build a novel robot navigation dataset in a luxuriant botanic garden of more than 48000m2. Comprehensive sensors are used, including Gray and RGB stereo cameras, spinning and MEMS 3D LiDARs, and low-cost and industrial-grade IMUs, all of which are well calibrated and hardware-synchronized. An all-terrain wheeled robot is employed for data collection, traversing through thick woods, riversides, narrow trails, bridges, and grasslands, which are scarce in previous resources. This yields 33 short and long sequences, forming 17.1km trajectories in total. Excitedly, both highly-accurate ego-motions and 3D map ground truth are provided, along with fine-annotated vision semantics. We firmly believe that our dataset can advance robot navigation and sensor fusion research to a higher level.
△ Less
Submitted 2 March, 2024; v1 submitted 25 June, 2023;
originally announced June 2023.
-
Adaptive Network Embedding with Arbitrary Multiple Information Sources in Attributed Graphs
Authors:
Meng Qin
Abstract:
Graph representation learning (a.k.a. network embedding) is a significant topic of network analysis, due to its effectiveness to support various graph inference tasks. In this paper, we study the representation learning with multiple information sources in attributed graphs. Recent studies usually focus on several specific sources (e.g., high-order proximity and node attributes) but few of them ca…
▽ More
Graph representation learning (a.k.a. network embedding) is a significant topic of network analysis, due to its effectiveness to support various graph inference tasks. In this paper, we study the representation learning with multiple information sources in attributed graphs. Recent studies usually focus on several specific sources (e.g., high-order proximity and node attributes) but few of them can be extended to incorporate other available sources not specified. In addition, most existing methods assume that all the integrated sources share consistent latent features but may ignore the possible inconsistency among them, lacking the required robustness. To address these issues, we propose a novel adaptive hybrid graph representation (AHGR) method from a view of graph reweighting, where each information source is formulated as a corresponding auxiliary graph, enabling AHGR to integrate arbitrary available information sources. Moreover, a new transition relation among the reweighted graphs is then introduced to perceive and resist the possible inconsistency among multiple sources, enhancing the robustness of AHGR. We verify the effectiveness of AHGR on a series of synthetic and real attributed graphs, where it presents superior performance over other baselines.
△ Less
Submitted 15 May, 2023;
originally announced May 2023.
-
Semantic Random Walk for Graph Representation Learning in Attributed Graphs
Authors:
Meng Qin
Abstract:
In this study, we focus on the graph representation learning (a.k.a. network embedding) in attributed graphs. Different from existing embedding methods that treat the incorporation of graph structure and semantic as the simple combination of two optimization objectives, we propose a novel semantic graph representation (SGR) method to formulate the joint optimization of the two heterogeneous source…
▽ More
In this study, we focus on the graph representation learning (a.k.a. network embedding) in attributed graphs. Different from existing embedding methods that treat the incorporation of graph structure and semantic as the simple combination of two optimization objectives, we propose a novel semantic graph representation (SGR) method to formulate the joint optimization of the two heterogeneous sources into a common high-order proximity based framework. Concretely, we first construct an auxiliary weighted graph, where the complex homogeneous and heterogeneous relations among nodes and attributes in the original graph are comprehensively encoded. Conventional embedding methods that consider high-order topology proximities can then be easily applied to the newly constructed graph to learn the representations of both node and attribute while capturing the nonlinear high-order intrinsic correlation inside or among graph structure and semantic. The learned attribute embeddings can also effectively support some semantic-oriented inference tasks (e.g., semantic community detection), helping to reveal the graph's deep semantic. The effectiveness of SGR is further verified on a series of real graphs, where it achieves impressive performance over other baselines.
△ Less
Submitted 10 May, 2023;
originally announced May 2023.
-
Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
Authors:
Gen Li,
Jie Ji,
Minghai Qin,
Wei Niu,
Bin Ren,
Fatemeh Afghah,
Linke Guo,
Xiaolong Ma
Abstract:
As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, t…
▽ More
As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14$\times$ faster and 2.29 dB better in the live video resolution upscaling tasks. Code available in https://github.com/coulsonlee/STDO-CVPR2023.git
△ Less
Submitted 18 June, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
DISCO: Distributed Inference with Sparse Communications
Authors:
Minghai Qin,
Chao Sun,
Jaco Hofmann,
Dejan Vucinic
Abstract:
Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require an extensive amount of computation and memory. It is of great difficulty to deploy a large DNN model to a single resource-limited device with small memory capacity. Distributed computing is a common approach to reduce single-node memory consumption and to accelerate the inference of DNN mod…
▽ More
Deep neural networks (DNNs) have great potential to solve many real-world problems, but they usually require an extensive amount of computation and memory. It is of great difficulty to deploy a large DNN model to a single resource-limited device with small memory capacity. Distributed computing is a common approach to reduce single-node memory consumption and to accelerate the inference of DNN models. In this paper, we explore the "within-layer model parallelism", which distributes the inference of each layer into multiple nodes. In this way, the memory requirement can be distributed to many nodes, making it possible to use several edge devices to infer a large DNN model. Due to the dependency within each layer, data communications between nodes during this parallel inference can be a bottleneck when the communication bandwidth is limited. We propose a framework to train DNN models for Distributed Inference with Sparse Communications (DISCO). We convert the problem of selecting which subset of data to transmit between nodes into a model optimization problem, and derive models with both computation and communication reduction when each layer is inferred on multiple nodes. We show the benefit of the DISCO framework on a variety of CV tasks such as image classification, object detection, semantic segmentation, and image super resolution. The corresponding models include important DNN building blocks such as convolutions and transformers. For example, each layer of a ResNet-50 model can be distributively inferred across two nodes with five times less data communications, almost half overall computations and half memory requirement for a single node, and achieve comparable accuracy to the original ResNet-50 model. This also results in 4.7 times overall inference speedup.
△ Less
Submitted 22 February, 2023;
originally announced February 2023.
-
PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets
Authors:
Shuo Sun,
Molei Qin,
Xinrun Wang,
Bo An
Abstract:
The financial markets, which involve more than $90 trillion market capitals, attract the attention of innumerable investors around the world. Recently, reinforcement learning in financial markets (FinRL) has emerged as a promising direction to train agents for making profitable investment decisions. However, the evaluation of most FinRL methods only focuses on profit-related measures and ignores m…
▽ More
The financial markets, which involve more than $90 trillion market capitals, attract the attention of innumerable investors around the world. Recently, reinforcement learning in financial markets (FinRL) has emerged as a promising direction to train agents for making profitable investment decisions. However, the evaluation of most FinRL methods only focuses on profit-related measures and ignores many critical axes, which are far from satisfactory for financial practitioners to deploy these methods into real-world financial markets. Therefore, we introduce PRUDEX-Compass, which has 6 axes, i.e., Profitability, Risk-control, Universality, Diversity, rEliability, and eXplainability, with a total of 17 measures for a systematic evaluation. Specifically, i) we propose AlphaMix+ as a strong FinRL baseline, which leverages mixture-of-experts (MoE) and risk-sensitive approaches to make diversified risk-aware investment decisions, ii) we evaluate 8 FinRL methods in 4 long-term real-world datasets of influential financial markets to demonstrate the usage of our PRUDEX-Compass, iii) PRUDEX-Compass together with 4 real-world datasets, standard implementation of 8 FinRL methods and a portfolio management environment is released as public resources to facilitate the design and comparison of new FinRL methods. We hope that PRUDEX-Compass can not only shed light on future FinRL research to prevent untrustworthy results from stagnating FinRL into successful industry deployment but also provide a new challenging algorithm evaluation scenario for the reinforcement learning (RL) community.
△ Less
Submitted 2 March, 2023; v1 submitted 14 January, 2023;
originally announced February 2023.
-
Current State of Community-Driven Radiological AI Deployment in Medical Imaging
Authors:
Vikash Gupta,
Barbaros Selnur Erdal,
Carolina Ramirez,
Ralf Floca,
Laurence Jackson,
Brad Genereaux,
Sidney Bryson,
Christopher P Bridge,
Jens Kleesiek,
Felix Nensa,
Rickmer Braren,
Khaled Younis,
Tobias Penzkofer,
Andreas Michael Bucher,
Ming Melvin Qin,
Gigon Bae,
Hyeonhoon Lee,
M. Jorge Cardoso,
Sebastien Ourselin,
Eric Kerfoot,
Rahul Choudhury,
Richard D. White,
Tessa Cook,
David Bericat,
Matthew Lungren
, et al. (2 additional authors not shown)
Abstract:
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introd…
▽ More
Artificial Intelligence (AI) has become commonplace to solve routine everyday tasks. Because of the exponential growth in medical imaging data volume and complexity, the workload on radiologists is steadily increasing. We project that the gap between the number of imaging exams and the number of expert radiologist readers required to cover this increase will continue to expand, consequently introducing a demand for AI-based tools that improve the efficiency with which radiologists can comfortably interpret these exams. AI has been shown to improve efficiency in medical-image generation, processing, and interpretation, and a variety of such AI models have been developed across research labs worldwide. However, very few of these, if any, find their way into routine clinical use, a discrepancy that reflects the divide between AI research and successful AI translation. To address the barrier to clinical deployment, we have formed MONAI Consortium, an open-source community which is building standards for AI deployment in healthcare institutions, and developing tools and infrastructure to facilitate their implementation. This report represents several years of weekly discussions and hands-on problem solving experience by groups of industry experts and clinicians in the MONAI Consortium. We identify barriers between AI-model development in research labs and subsequent clinical deployment and propose solutions. Our report provides guidance on processes which take an imaging AI model from development to clinical implementation in a healthcare institution. We discuss various AI integration points in a clinical Radiology workflow. We also present a taxonomy of Radiology AI use-cases. Through this report, we intend to educate the stakeholders in healthcare and AI (AI researchers, radiologists, imaging informaticists, and regulators) about cross-disciplinary challenges and possible solutions.
△ Less
Submitted 8 May, 2023; v1 submitted 29 December, 2022;
originally announced December 2022.
-
All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management
Authors:
Yifan Gong,
Zheng Zhan,
Pu Zhao,
Yushu Wu,
Chao Wu,
Caiwen Ding,
Weiwen Jiang,
Minghai Qin,
Yanzhi Wang
Abstract:
During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to the limited hardware resource. However, little attention is paid to the influence of dynamic power management. As edge devices typically only have a budget of energy with batteries (rather than almost unlimited energy support on servers or workstations), their dynamic power management often c…
▽ More
During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to the limited hardware resource. However, little attention is paid to the influence of dynamic power management. As edge devices typically only have a budget of energy with batteries (rather than almost unlimited energy support on servers or workstations), their dynamic power management often changes the execution frequency as in the widely-used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed performance, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We firstly identify this problem and then propose All-in-One, a highly representative pruning framework to work with dynamic power management using DVFS. The framework can use only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., keeping the difference in speed performance under various execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces their variance of inference latency for various frequencies, with minimal memory consumption of only one model and one soft mask.
△ Less
Submitted 9 December, 2022;
originally announced December 2022.
-
Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors
Authors:
Sizhe Chen,
Geng Yuan,
Xinwen Cheng,
Yifan Gong,
Minghai Qin,
Yanzhi Wang,
Xiaolin Huang
Abstract:
As data becomes increasingly vital, a company would be very cautious about releasing data, because the competitors could use it to train high-performance models, thereby posing a tremendous threat to the company's commercial competence. To prevent training good models on the data, we could add imperceptible perturbations to it. Since such perturbations aim at hurting the entire training process, t…
▽ More
As data becomes increasingly vital, a company would be very cautious about releasing data, because the competitors could use it to train high-performance models, thereby posing a tremendous threat to the company's commercial competence. To prevent training good models on the data, we could add imperceptible perturbations to it. Since such perturbations aim at hurting the entire training process, they should reflect the vulnerability of DNN training, rather than that of a single model. Based on this new idea, we seek perturbed examples that are always unrecognized (never correctly classified) in training. In this paper, we uncover them by model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs ignoring normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning that they are as diverse as DNNs with different architectures. That is, our amazing performance of ensemble only requires the computation of training one model. By extensive experiments with 9 baselines on 3 datasets and 5 architectures, SEP is verified to be a new state-of-the-art, e.g., our small $\ell_\infty=2/255$ perturbations reduce the accuracy of a CIFAR-10 ResNet18 from 94.56% to 14.68%, compared to 41.35% by the best-known method. Code is available at https://github.com/Sizhe-Chen/SEP.
△ Less
Submitted 12 April, 2023; v1 submitted 21 November, 2022;
originally announced November 2022.
-
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Authors:
Zhenglun Kong,
Haoyu Ma,
Geng Yuan,
Mengshu Sun,
Yanyue Xie,
Peiyan Dong,
Xin Meng,
Xuan Shen,
Hao Tang,
Minghai Qin,
Tianlong Chen,
Xiaolong Ma,
Xiaohui Xie,
Zhangyang Wang,
Yanzhi Wang
Abstract:
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points…
▽ More
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from the pre-trained dense models and only focus on efficient inference, while time-consuming training is still unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address the issue, this paper aims to introduce sparsity into data and proposes an end-to-end efficient training framework from three sparse perspectives, dubbed Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy reduction scheme, by exploring the sparsity under three levels: number of training examples in the dataset, number of patches (tokens) in each example, and number of connections between tokens that lie in attention weights. With extensive experiments, we demonstrate that our proposed technique can noticeably accelerate training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios, we are able to improve the ViT accuracy rather than compromising it. For example, we can achieve 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on Deit-T, and 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on Deit-S. This proves the existence of data redundancy in ViT.
△ Less
Submitted 19 November, 2022;
originally announced November 2022.
-
Data Level Lottery Ticket Hypothesis for Vision Transformers
Authors:
Xuan Shen,
Zhenglun Kong,
Minghai Qin,
Peiyan Dong,
Geng Yuan,
Xin Meng,
Hao Tang,
Xiaolong Ma,
Yanzhi Wang
Abstract:
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method called the winning ticket, such that it can be trained from scratch to almost as good as the dense counterpart. Meanwhile, the research of LTH in vision transformers (ViTs) is scarcely evaluated. In this paper, we first show that the…
▽ More
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network and a proper random initialization method called the winning ticket, such that it can be trained from scratch to almost as good as the dense counterpart. Meanwhile, the research of LTH in vision transformers (ViTs) is scarcely evaluated. In this paper, we first show that the conventional winning ticket is hard to find at the weight level of ViTs by existing methods. Then, we generalize the LTH for ViTs to input data consisting of image patches inspired by the input dependence of ViTs. That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches. We call this subset of input patches the em winning tickets, which represent a significant amount of information in the input data. We use a ticket selector to generate the winning tickets based on the informativeness of patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. The experiments show that there is a clear difference between the performance of models trained with winning tickets and randomly selected subsets, which verifies our proposed theory. We elaborate on the analogical similarity between our proposed Data-LTH-ViTs and the conventional LTH to further verify the integrity of our theory. The Source codes are available at https://github.com/shawnricecake/vit-lottery-ticket-input.
△ Less
Submitted 29 May, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Temporal Link Prediction: A Unified Framework, Taxonomy, and Review
Authors:
Meng Qin,
Dit-Yan Yeung
Abstract:
Dynamic graphs serve as a generic abstraction and description of the evolutionary behaviors of various complex systems (e.g., social networks and communication networks). Temporal link prediction (TLP) is a classic yet challenging inference task on dynamic graphs, which predicts possible future linkage based on historical topology. The predicted future topology can be used to support some advanced…
▽ More
Dynamic graphs serve as a generic abstraction and description of the evolutionary behaviors of various complex systems (e.g., social networks and communication networks). Temporal link prediction (TLP) is a classic yet challenging inference task on dynamic graphs, which predicts possible future linkage based on historical topology. The predicted future topology can be used to support some advanced applications on real-world systems (e.g., resource pre-allocation) for better system performance. This survey provides a comprehensive review of existing TLP methods. Concretely, we first give the formal problem statements and preliminaries regarding data models, task settings, and learning paradigms that are commonly used in related research. A hierarchical fine-grained taxonomy is further introduced to categorize existing methods in terms of their data models, learning paradigms, and techniques. From a generic perspective, we propose a unified encoder-decoder framework to formulate all the methods reviewed, where different approaches only differ in terms of some components of the framework. Moreover, we envision serving the community with an open-source project OpenTLP that refactors or implements some representative TLP methods using the proposed unified framework and summarizes other public resources. As a conclusion, we finally discuss advanced topics in recent research and highlight possible future directions.
△ Less
Submitted 29 June, 2023; v1 submitted 17 October, 2022;
originally announced October 2022.
-
Trading off Quality for Efficiency of Community Detection: An Inductive Method across Graphs
Authors:
Meng Qin,
Chaorui Zhang,
Bo Bai,
Gong Zhang,
Dit-Yan Yeung
Abstract:
Many network applications can be formulated as NP-hard combinatorial optimization problems of community detection (CD). Due to the NP-hardness, to balance the CD quality and efficiency remains a challenge. Most existing CD methods are transductive, which are independently optimized only for the CD on a single graph. Some of these methods use advanced machine learning techniques to obtain high-qual…
▽ More
Many network applications can be formulated as NP-hard combinatorial optimization problems of community detection (CD). Due to the NP-hardness, to balance the CD quality and efficiency remains a challenge. Most existing CD methods are transductive, which are independently optimized only for the CD on a single graph. Some of these methods use advanced machine learning techniques to obtain high-quality CD results but usually have high complexity. Other approaches use fast heuristic approximation to ensure low runtime but may suffer from quality degradation. In contrast to these transductive methods, we propose an alternative inductive community detection (ICD) method across graphs of a system or scenario to alleviate the NP-hard challenge. ICD first conducts the offline training of an adversarial dual GNN on historical graphs to capture key properties of the system. The trained model is then directly generalized to new unseen graphs for online CD without additional optimization, where a better trade-off between quality and efficiency can be achieved. ICD can also capture the permutation invariant community labels in the offline training and tackle the online CD on new graphs with non-fixed number of nodes and communities. Experiments on a set of benchmarks demonstrate that ICD can achieve a significant trade-off between quality and efficiency over various baselines.
△ Less
Submitted 29 September, 2022;
originally announced September 2022.
-
Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution
Authors:
Yushu Wu,
Yifan Gong,
Pu Zhao,
Yanyu Li,
Zheng Zhan,
Wei Niu,
Hao Tang,
Minghai Qin,
Bin Ren,
Yanzhi Wang
Abstract:
Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this,…
▽ More
Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. The inference speed is directly taken into the optimization along with the SR loss to derive SR models with high image quality while satisfying the real-time inference requirement. Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence. With the proposed framework, we achieve real-time SR inference for implementing 720p resolution with competitive SR performance (in terms of PSNR and SSIM) on GPU/DSP of mobile platforms (Samsung Galaxy S21).
△ Less
Submitted 25 July, 2022;
originally announced July 2022.
-
Multi-Agent Reinforcement Learning for Traffic Signal Control through Universal Communication Method
Authors:
Qize Jiang,
Minhao Qin,
Shengmin Shi,
Weiwei Sun,
Baihua Zheng
Abstract:
How to coordinate the communication among intersections effectively in real complex traffic scenarios with multi-intersection is challenging. Existing approaches only enable the communication in a heuristic manner without considering the content/importance of information to be shared. In this paper, we propose a universal communication form UniComm between intersections. UniComm embeds massive obs…
▽ More
How to coordinate the communication among intersections effectively in real complex traffic scenarios with multi-intersection is challenging. Existing approaches only enable the communication in a heuristic manner without considering the content/importance of information to be shared. In this paper, we propose a universal communication form UniComm between intersections. UniComm embeds massive observations collected at one agent into crucial predictions of their impact on its neighbors, which improves the communication efficiency and is universal across existing methods. We also propose a concise network UniLight to make full use of communications enabled by UniComm. Experimental results on real datasets demonstrate that UniComm universally improves the performance of existing state-of-the-art methods, and UniLight significantly outperforms existing methods on a wide range of traffic situations.
△ Less
Submitted 26 April, 2022;
originally announced April 2022.
-
CHEX: CHannel EXploration for CNN Model Compression
Authors:
Zejiang Hou,
Minghai Qin,
Fei Sun,
Xiaolong Ma,
Kun Yuan,
Yi Xu,
Yen-Kuang Chen,
Rong Jin,
Yuan Xie,
Sun-Yuan Kung
Abstract:
Channel pruning has been broadly recognized as an effective technique to reduce the computation and memory cost of deep convolutional neural networks. However, conventional pruning methods have limitations in that: they are restricted to pruning process only, and they require a fully pre-trained large model. Such limitations may lead to sub-optimal model quality as well as excessive memory and tra…
▽ More
Channel pruning has been broadly recognized as an effective technique to reduce the computation and memory cost of deep convolutional neural networks. However, conventional pruning methods have limitations in that: they are restricted to pruning process only, and they require a fully pre-trained large model. Such limitations may lead to sub-optimal model quality as well as excessive memory and training cost. In this paper, we propose a novel Channel Exploration methodology, dubbed as CHEX, to rectify these problems. As opposed to pruning-only strategy, we propose to repeatedly prune and regrow the channels throughout the training process, which reduces the risk of pruning important channels prematurely. More exactly: From intra-layer's aspect, we tackle the channel pruning problem via a well known column subset selection (CSS) formulation. From inter-layer's aspect, our regrowing stages open a path for dynamically re-allocating the number of channels across all the layers under a global channel sparsity constraint. In addition, all the exploration process is done in a single training from scratch without the need of a pre-trained large model. Experimental results demonstrate that CHEX can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks, including image classification, object detection, instance segmentation, and 3D vision. For example, our compressed ResNet-50 model on ImageNet dataset achieves 76% top1 accuracy with only 25% FLOPs of the original ResNet-50 model, outperforming previous state-of-the-art channel pruning methods. The checkpoints and code are available at here .
△ Less
Submitted 29 March, 2022;
originally announced March 2022.
-
Shfl-BW: Accelerating Deep Neural Network Inference with Tensor-Core Aware Weight Pruning
Authors:
Guyue Huang,
Haoran Li,
Minghai Qin,
Fei Sun,
Yufei Ding,
Yuan Xie
Abstract:
Weight pruning in deep neural networks (DNNs) can reduce storage and computation cost, but struggles to bring practical speedup to the model inference time. Tensor-cores can significantly boost the throughput of GPUs on dense computation, but exploiting tensor-cores for sparse DNNs is very challenging. Compared to existing CUDA-cores, tensor-cores require higher data reuse and matrix-shaped instru…
▽ More
Weight pruning in deep neural networks (DNNs) can reduce storage and computation cost, but struggles to bring practical speedup to the model inference time. Tensor-cores can significantly boost the throughput of GPUs on dense computation, but exploiting tensor-cores for sparse DNNs is very challenging. Compared to existing CUDA-cores, tensor-cores require higher data reuse and matrix-shaped instruction granularity, both difficult to yield from sparse DNN kernels. Existing pruning approaches fail to balance the demands of accuracy and efficiency: random sparsity preserves the model quality well but prohibits tensor-core acceleration, while highly-structured block-wise sparsity can exploit tensor-cores but suffers from severe accuracy loss.
In this work, we propose a novel sparse pattern, Shuffled Block-wise sparsity (Shfl-BW), designed to efficiently utilize tensor-cores while minimizing the constraints on the weight structure. Our insight is that row- and column-wise permutation provides abundant flexibility for the weight structure, while introduces negligible overheads using our GPU kernel designs. We optimize the GPU kernels for Shfl-BW in linear and convolution layers. Evaluations show that our techniques can achieve the state-of-the-art speed-accuracy trade-offs on GPUs. For example, with small accuracy loss, we can accelerate the computation-intensive layers of Transformer by 1.81, 4.18 and 1.90 times on NVIDIA V100, T4 and A100 GPUs respectively at 75% sparsity.
△ Less
Submitted 11 March, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Adaptive Read Thresholds for NAND Flash
Authors:
Borja Peleato,
Rajiv Agarwal,
John Cioffi,
Minghai Qin,
Paul H. Siegel
Abstract:
A primary source of increased read time on NAND flash comes from the fact that in the presence of noise, the flash medium must be read several times using different read threshold voltages for the decoder to succeed. This paper proposes an algorithm that uses a limited number of re-reads to characterize the noise distribution and recover the stored information. Both hard and soft decoding are cons…
▽ More
A primary source of increased read time on NAND flash comes from the fact that in the presence of noise, the flash medium must be read several times using different read threshold voltages for the decoder to succeed. This paper proposes an algorithm that uses a limited number of re-reads to characterize the noise distribution and recover the stored information. Both hard and soft decoding are considered. For hard decoding, the paper attempts to find a read threshold minimizing bit-error-rate (BER) and derives an expression for the resulting codeword-error-rate. For soft decoding, it shows that minimizing BER and minimizing codeword-error-rate are competing objectives in the presence of a limited number of allowed re-reads, and proposes a trade-off between the two.
The proposed method does not require any prior knowledge about the noise distribution, but can take advantage of such information when it is available. Each read threshold is chosen based on the results of previous reads, following an optimal policy derived through a dynamic programming backward recursion. The method and results are studied from the perspective of an SLC Flash memory with Gaussian noise for each level but the paper explains how the method could be extended to other scenarios.
△ Less
Submitted 11 February, 2022;
originally announced February 2022.
-
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Authors:
Zhenglun Kong,
Peiyan Dong,
Xiaolong Ma,
Xin Meng,
Mengshu Sun,
Wei Niu,
Xuan Shen,
Geng Yuan,
Bin Ren,
Minghai Qin,
Hao Tang,
Yanzhi Wang
Abstract:
Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Pruning, a traditional model compression paradigm for hardware efficiency, has been widely applied in various DNN structures. Nevertheless, it stays ambiguous on how to perform exclusive pru…
▽ More
Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult. Pruning, a traditional model compression paradigm for hardware efficiency, has been widely applied in various DNN structures. Nevertheless, it stays ambiguous on how to perform exclusive pruning on the ViT structure. Considering three key points: the structural characteristics, the internal data pattern of ViTs, and the related edge device deployment, we leverage the input token sparsity and propose a computation-aware soft pruning framework, which can be set up on vanilla Transformers of both flatten and CNN-type structures, such as Pooling-based ViT (PiT). More concretely, we design a dynamic attention-based multi-head token selector, which is a lightweight module for adaptive instance-wise token selection. We further introduce a soft pruning technique, which integrates the less informative tokens generated by the selector module into a package token that will participate in subsequent calculations rather than being completely discarded. Our framework is bound to the trade-off between accuracy and computation constraints of specific edge devices through our proposed computation-aware training strategy. Experimental results show that our framework significantly reduces the computation cost of ViTs while maintaining comparable performance on image classification. Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms. For example, our method reduces the latency of DeiT-T to 26 ms (26%$\sim $41% superior to existing works) on the mobile device with 0.25%$\sim $4% higher top-1 accuracy on ImageNet.
△ Less
Submitted 20 September, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting
Authors:
Minghai Qin,
Tianyun Zhang,
Fei Sun,
Yen-Kuang Chen,
Makan Fardad,
Yanzhi Wang,
Yuan Xie
Abstract:
Deep neural networks (DNNs) have shown to provide superb performance in many real life applications, but their large computation cost and storage requirement have prevented them from being deployed to many edge and internet-of-things (IoT) devices. Sparse deep neural networks, whose majority weight parameters are zeros, can substantially reduce the computation complexity and memory consumption of…
▽ More
Deep neural networks (DNNs) have shown to provide superb performance in many real life applications, but their large computation cost and storage requirement have prevented them from being deployed to many edge and internet-of-things (IoT) devices. Sparse deep neural networks, whose majority weight parameters are zeros, can substantially reduce the computation complexity and memory consumption of the models. In real-use scenarios, devices may suffer from large fluctuations of the available computation and memory resources under different environment, and the quality of service (QoS) is difficult to maintain due to the long tail inferences with large latency. Facing the real-life challenges, we propose to train a sparse model that supports multiple sparse levels. That is, a hierarchical structure of weights are satisfied such that the locations and the values of the non-zero parameters of the more-sparse sub-model area subset of the less-sparse sub-model. In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model. We have verified our methodologies on a variety of DNN models and tasks, including the ResNet-50, PointNet++, GNMT, and graph attention networks. We obtain sparse sub-models with an average of 13.38% weights and 14.97% FLOPs, while the accuracies are as good as their dense counterparts. More-sparse sub-models with 5.38% weights and 4.47% of FLOPs, which are subsets of the less-sparse ones, can be obtained with only 3.25% relative accuracy loss.
△ Less
Submitted 20 December, 2021;
originally announced December 2021.
-
Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks
Authors:
Fei Sun,
Minghai Qin,
Tianyun Zhang,
Xiaolong Ma,
Haoran Li,
Junwen Luo,
Zihao Zhao,
Yen-Kuang Chen,
Yuan Xie
Abstract:
Deep neural networks (DNNs) have been proven to be effective in solving many real-life problems, but its high computation cost prohibits those models from being deployed to edge devices. Pruning, as a method to introduce zeros to model weights, has shown to be an effective method to provide good trade-offs between model accuracy and computation efficiency, and is a widely-used method to generate c…
▽ More
Deep neural networks (DNNs) have been proven to be effective in solving many real-life problems, but its high computation cost prohibits those models from being deployed to edge devices. Pruning, as a method to introduce zeros to model weights, has shown to be an effective method to provide good trade-offs between model accuracy and computation efficiency, and is a widely-used method to generate compressed models. However, the granularity of pruning makes important trade-offs. At the same sparsity level, a coarse-grained structured sparse pattern is more efficient on conventional hardware but results in worse accuracy, while a fine-grained unstructured sparse pattern can achieve better accuracy but is inefficient on existing hardware.
On the other hand, some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and store operations on such memories. In this work, we propose a set of novel sparse patterns, named gather-scatter (GS) patterns, to utilize the scratchpad memories and gather/scatter engines to speed up neural network inferences. Correspondingly, we present a compact sparse format. The proposed set of sparse patterns, along with a novel pruning methodology, address the load imbalance issue and result in models with quality close to unstructured sparse models and computation efficiency close to structured sparse models. Our experiments show that GS patterns consistently make better trade-offs between accuracy and computation efficiency compared to conventional structured sparse patterns. GS patterns can reduce the runtime of the DNN components by two to three times at the same accuracy levels. This is confirmed on three different deep learning tasks and popular models, namely, GNMT for machine translation, ResNet50 for image recognition, and Japser for acoustic speech recognition.
△ Less
Submitted 20 December, 2021;
originally announced December 2021.