-
T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
Authors:
Fan Duan,
Jiahao Yu,
Li Chen
Abstract:
Point clouds are commonly used in various practical applications such as autonomous driving and the manufacturing industry. However, these point clouds often suffer from incompleteness due to limited perspectives, scanner resolution and occlusion. Therefore, predicting the missing parts is a crucial task. In this paper, we propose a novel method for point cloud completion. We utilize a spherical template to guide the generation of the coarse complete template and generate dynamic query tokens through a correspondence pooling (Corres-Pooling) query generator. Specifically, we first generate the coarse complete template by embedding a Gaussian spherical template into the partial input and transforming the template to best match the input. Then we use the Corres-Pooling query generator to refine the coarse template and generate dynamic query tokens, which can be used to predict the complete point proxies. Finally, we generate the complete point cloud with a FoldingNet following the coarse-to-fine paradigm, according to the fine template and the predicted point proxies. Experimental results demonstrate that our T-CorresNet outperforms the state-of-the-art methods on several benchmarks. Our code is available at https://github.com/df-boy/T-CorresNet.
Submitted 6 July, 2024;
originally announced July 2024.
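As a rough illustration of the template idea, the Python sketch below samples a spherical template by normalizing isotropic Gaussian draws (a standard way to obtain points uniformly distributed on a sphere) and then applies a hypothetical alignment that matches the template's centroid and mean radius to those of the partial input. The function names and the centroid/scale alignment are assumptions for illustration only; the abstract does not specify how T-CorresNet embeds or transforms its template, and the learned Corres-Pooling and FoldingNet stages are not modeled here.

```python
import math
import random


def gaussian_sphere_template(num_points, radius=1.0, seed=None):
    """Sample points uniformly on a sphere by normalizing isotropic Gaussian draws."""
    rng = random.Random(seed)
    points = []
    while len(points) < num_points:
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm == 0.0:
            continue  # practically never happens; resample to avoid dividing by zero
        points.append((radius * x / norm, radius * y / norm, radius * z / norm))
    return points


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def mean_radius(points, center):
    return sum(math.dist(p, center) for p in points) / len(points)


def align_template(template, partial):
    """Hypothetical alignment: translate and uniformly scale the template so that its
    centroid and mean radius match those of the partial point cloud."""
    c_t, c_p = centroid(template), centroid(partial)
    scale = mean_radius(partial, c_p) / mean_radius(template, c_t)
    return [tuple(c_p[i] + scale * (p[i] - c_t[i]) for i in range(3)) for p in template]
```

A learned method would replace the closed-form translate-and-scale step with a predicted transformation; the sketch only shows where such a step sits between template sampling and refinement.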
-
FLOW: Fusing and Shuffling Global and Local Views for Cross-User Human Activity Recognition with IMUs
Authors:
Qi Qiu,
Tao Zhu,
Furong Duan,
Kevin I-Kai Wang,
Liming Chen,
Mingxing Nie
Abstract:
Inertial Measurement Unit (IMU) sensors are widely employed for Human Activity Recognition (HAR) due to their portability, energy efficiency, and growing research interest. However, a significant challenge for IMU-HAR models is achieving robust generalization performance across diverse users. This limitation stems from substantial variations in data distribution among individual users. One primary reason for this distribution disparity lies in the representation of IMU sensor data in the local coordinate system, which is susceptible to subtle variations in how users wear the IMU. To address this issue, we propose a novel approach that extracts a global view representation based on the characteristics of IMU data, effectively alleviating the data distribution discrepancies induced by wearing styles. To validate the efficacy of the global view representation, we fed both global and local view data into the model. The results demonstrate that global view data significantly outperforms local view data in cross-user experiments. Furthermore, we propose a Multi-view Supervised Network (MVFNet) based on Shuffling to effectively fuse local view and global view data. It supervises the feature extraction of each view through view division and view shuffling, so as to minimize the chance that the model ignores important features. Extensive experiments conducted on the OPPORTUNITY and PAMAP2 datasets demonstrate that the proposed algorithm outperforms the current state-of-the-art methods in cross-user HAR.
Submitted 3 June, 2024;
originally announced June 2024.
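One common way to obtain an orientation-independent (global) view of IMU data is to rotate each sensor-frame vector into a world frame using the device orientation. The sketch below assumes the orientation is available as a unit quaternion (w, x, y, z) that maps sensor coordinates to world coordinates; the abstract does not say how FLOW derives its global view, so this is an illustrative assumption, not the paper's method.

```python
import math


def quat_multiply(a, b):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def conjugate(q):
    w, x, y, z = q
    return (w, -x, -y, -z)


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q: v' = q * (0, v) * conj(q)."""
    _, x, y, z = quat_multiply(quat_multiply(q, (0.0, *v)), conjugate(q))
    return (x, y, z)


def axis_angle(axis, angle):
    """Unit quaternion for a rotation of `angle` radians about a unit `axis`."""
    s = math.sin(angle / 2.0)
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)


def to_global(orientation, local_vec):
    """Map a sensor-frame reading into the global frame (orientation: sensor-to-world)."""
    return rotate(orientation, local_vec)
```

For example, the same gravity vector read by two differently mounted sensors gives two different local readings but maps back to one global vector, which is the kind of wearing-style invariance the abstract describes.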
-
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Authors:
Haoran Que,
Jiaheng Liu,
Ge Zhang,
Chenchen Zhang,
Xingwei Qu,
Yinghao Ma,
Feiyu Duan,
Zhiqi Bai,
Jiakai Wang,
Yuanxing Zhang,
Xu Tan,
Jie Fu,
Wenbo Su,
Jiamang Wang,
Lin Qu,
Bo Zheng
Abstract:
Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to expand the model's fundamental understanding of specific downstream domains (e.g., math and code). For CPT on domain-specific LLMs, one important question is how to choose the optimal mixture ratio between the general corpus (e.g., Dolma, Slim-pajama) and the downstream domain corpus. Existing methods usually rely on laborious human effort, grid-searching over a set of mixture ratios, which incurs high GPU training costs. Besides, there is no guarantee that the selected ratio is optimal for the specific domain. To address these limitations, inspired by the Scaling Law for performance prediction, we propose to investigate the Scaling Law of Domain-specific Continual Pre-Training (D-CPT Law) to decide the optimal mixture ratio with acceptable training costs for LLMs of different sizes. Specifically, by fitting the D-CPT Law, we can easily predict the general and downstream performance of arbitrary mixture ratios, model sizes, and dataset sizes using small-scale training costs on limited experiments. Moreover, we extend our standard D-CPT Law to cross-domain settings and propose the Cross-Domain D-CPT Law to predict the D-CPT Law of target domains, where only very small training costs (about 1% of the normal training costs) are needed for the target domains. Comprehensive experimental results on six downstream domains demonstrate the effectiveness and generalizability of our proposed D-CPT Law and Cross-Domain D-CPT Law.
Submitted 3 June, 2024;
originally announced June 2024.
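The D-CPT Law's actual functional form (which also involves the mixture ratio) is not given in the abstract. As a much simpler stand-in that shows the "fit on small runs, then extrapolate" workflow, the sketch below fits a plain power law, loss = A * D**(-alpha), by linear least squares in log-log space. The model form and the numbers in the test are hypothetical.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of loss = A * size ** (-alpha) as a line in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (A, alpha)


def predict(a, alpha, size):
    """Extrapolate the fitted law to an unseen size."""
    return a * size ** (-alpha)
```

Real scaling-law fits usually add an irreducible-loss term and more variables, which requires nonlinear optimization rather than this closed-form line fit.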
-
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation
Authors:
Keheng Wang,
Feiyu Duan,
Peiguang Li,
Sirui Wang,
Xunliang Cai
Abstract:
Retrieval-Augmented Generation (RAG) demonstrates great value in alleviating outdated knowledge and hallucination by supplying LLMs with updated and relevant knowledge. However, RAG still faces several difficulties in understanding complex multi-hop queries and retrieving relevant documents, which require LLMs to reason and retrieve step by step. Inspired by the human reasoning process, in which people gradually search for the required information, it is natural to ask whether LLMs can notice the missing information at each reasoning step. In this work, we first experimentally verify the ability of LLMs to extract information and to recognize what information is missing. Based on this finding, we propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES), where we leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval. Besides, we design a sentence-level re-ranking filtering approach to filter irrelevant content out of documents, and combine it with the information extraction capability of LLMs to extract useful information from the cleaned-up documents, which in turn bolsters the overall efficacy of RAG. Extensive experiments conducted on multiple public datasets reveal the superiority of the proposed MIGRES method, and analytical experiments demonstrate the effectiveness of our proposed modules.
Submitted 22 April, 2024;
originally announced April 2024.
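The sentence-level filtering step can be pictured as: split a retrieved document into sentences, score each against the query, and keep only the sentences above a threshold, best first. In the sketch below, a simple token-overlap score stands in for the re-ranker; MIGRES presumably uses a learned model, and the scoring function and threshold here are assumptions.

```python
import re


def split_sentences(text):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_sentences(query, document, threshold=0.2):
    """Keep sentences whose query-token overlap is >= threshold, highest score first."""
    query_tokens = tokenize(query)
    scored = []
    for sentence in split_sentences(document):
        overlap = len(query_tokens & tokenize(sentence))
        score = overlap / len(query_tokens) if query_tokens else 0.0
        if score >= threshold:
            scored.append((score, sentence))
    # sort is stable, so ties keep their original document order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for _, sentence in scored]
```

Swapping the overlap score for a cross-encoder or embedding similarity keeps the same filter-then-extract structure.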
-
HARMamba: Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM
Authors:
Shuangjian Li,
Tao Zhu,
Furong Duan,
Liming Chen,
Huansheng Ning,
Christopher Nugent,
Yaping Wan
Abstract:
Wearable sensor-based human activity recognition (HAR) is a critical research domain in activity perception. However, achieving high efficiency and long-sequence recognition remains a challenge. Despite the extensive investigation of temporal deep learning models, such as CNNs, RNNs, and transformers, their large numbers of parameters often impose significant computational and memory constraints, rendering them less suitable for resource-constrained mobile health applications. This study introduces HARMamba, an innovative lightweight and versatile HAR architecture that combines a selective bidirectional SSM with hardware-aware design. To optimize real-time resource consumption in practical scenarios, HARMamba employs linear recursive mechanisms and parameter discretization, allowing it to selectively focus on relevant input sequences while efficiently fusing scan and recompute operations. To address potential issues with invalid sensor data, the system processes the data stream through independent channels, dividing each channel into "patches" and appending a classification token to the end of the sequence. Position embeddings are incorporated to represent the sequence order, and the activity categories are output through a classification head. The HARMamba Block serves as the fundamental component of the HARMamba architecture, enabling the effective capture of more discriminative activity sequence features. HARMamba outperforms contemporary state-of-the-art frameworks, delivering comparable or better accuracy while significantly reducing computational and memory demands. Its effectiveness has been extensively validated on public datasets such as PAMAP2, WISDM, UNIMIB SHAR and UCI, showcasing impressive results.
Submitted 2 May, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
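The channel-independent tokenization described in the abstract (split each channel into patches, append a classification token at the end, attach positions) can be sketched as follows. The patch length, the placeholder classification token and the function names are hypothetical; the linear patch embedding and the SSM blocks themselves are omitted.

```python
def patchify_channel(channel, patch_len):
    """Split one channel into non-overlapping patches; a trailing partial patch is dropped."""
    count = len(channel) // patch_len
    return [channel[i * patch_len:(i + 1) * patch_len] for i in range(count)]


def build_token_sequences(signal, patch_len, cls_token):
    """signal: list of channels (equal-length lists of floats).
    Returns one sequence per channel of (position, token) pairs, with the
    classification token appended as the last element."""
    sequences = []
    for channel in signal:
        tokens = patchify_channel(channel, patch_len) + [list(cls_token)]
        sequences.append(list(enumerate(tokens)))
    return sequences
```

Processing each channel as its own sequence means a corrupted channel only affects its own tokens, which matches the motivation given in the abstract.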
-
LFSRDiff: Light Field Image Super-Resolution via Diffusion Models
Authors:
Wentao Chao,
Fuqing Duan,
Xuechun Wang,
Yingqian Wang,
Guanghui Wang
Abstract:
Light field (LF) image super-resolution (SR) is a challenging problem due to its inherent ill-posed nature, where a single low-resolution (LR) input LF image can correspond to multiple potential super-resolved outcomes. Despite this complexity, mainstream LF image SR methods typically adopt a deterministic approach, generating only a single output supervised by pixel-wise loss functions. This often leads to blurry and unrealistic results. Although diffusion models can capture the distribution of potential SR results by iteratively predicting Gaussian noise during the denoising process, they are primarily designed for general images and struggle to effectively handle the unique characteristics and information present in LF images. To address these limitations, we introduce LFSRDiff, the first diffusion-based LF image SR model, which incorporates the LF disentanglement mechanism. Our novel contribution is a disentangled U-Net for diffusion models, enabling more effective extraction and fusion of both spatial and angular information within LF images. Through comprehensive experimental evaluations and comparisons with state-of-the-art LF image SR methods, the proposed approach consistently produces diverse and realistic SR results. It achieves the best perceptual quality as measured by LPIPS. It also demonstrates the ability to effectively control the trade-off between perception and distortion. The code is available at https://github.com/chaowentao/LFSRDiff.
Submitted 27 November, 2023;
originally announced November 2023.
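For readers unfamiliar with the denoising setup the abstract refers to, the sketch below shows the standard DDPM forward (noising) process, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, whose noise eps is the regression target for a noise-predicting network. The linear beta schedule and its endpoints are common defaults assumed for illustration, not values taken from LFSRDiff, and the disentangled U-Net is not modeled.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) and return it with the noise used."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise
```

Because abar_t shrinks toward zero, late timesteps are almost pure noise; sampling then runs the learned reverse process from noise back to an image, which is what lets a diffusion model produce several plausible SR outputs.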
-
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering
Authors:
Keheng Wang,
Feiyu Duan,
Sirui Wang,
Peiguang Li,
Yunsen Xian,
Chuantao Yin,
Wenge Rong,
Zhang Xiong
Abstract:
Equipped with Chain-of-Thought (CoT), large language models (LLMs) have shown impressive reasoning ability in various downstream tasks. Even so, because they suffer from hallucinations and cannot access external knowledge, LLMs often produce incorrect or unfaithful intermediate reasoning steps, especially when answering knowledge-intensive questions such as KBQA. To alleviate this issue, we propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge, and thus overcome hallucinations and error propagation. Concretely, we formulate the CoT rationale process of LLMs into a structured multi-round QA format. In each round, LLMs interact with a QA system that retrieves external knowledge and produce faithful reasoning traces based on the precise answers retrieved. The structured CoT reasoning of LLMs is facilitated by our KBQA CoT collection, which serves as in-context learning demonstrations and can also be utilized as feedback augmentation to train a robust retriever. Extensive experiments on the WebQSP and ComplexWebQuestion datasets demonstrate the effectiveness of the proposed KD-CoT in task-solving reasoning generation, which outperforms vanilla CoT ICL by absolute success rates of 8.0% and 5.1%, respectively. Furthermore, our proposed feedback-augmented retriever outperforms the state-of-the-art baselines for retrieving knowledge, achieving significant improvements in Hit and recall performance. Our code and data are released at https://github.com/AdelWang/KD-CoT/tree/main.
Submitted 28 October, 2023; v1 submitted 25 August, 2023;
originally announced August 2023.
-
OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation
Authors:
Wentao Chao,
Fuqing Duan,
Xuechun Wang,
Yingqian Wang,
Guanghui Wang
Abstract:
Light field (LF) depth estimation is a crucial task with numerous practical applications. However, mainstream methods based on multi-view stereo (MVS) are resource-intensive and time-consuming because they need to construct a finer cost volume. To address this issue and achieve a better trade-off between accuracy and efficiency, we propose an occlusion-aware cascade cost volume for LF depth (disparity) estimation. Our cascaded strategy reduces the number of samples while keeping the sampling interval constant during the construction of a finer cost volume. We also introduce occlusion maps to improve accuracy when constructing the occlusion-aware cost volume. Specifically, we first obtain a coarse disparity map through the coarse disparity estimation network. Then, the sub-aperture images (SAIs) of side views are warped to the center view based on the initial disparity map. Next, we propose photo-consistency constraints between the warped SAIs and the center SAI to generate occlusion maps for each SAI. Finally, we introduce the coarse disparity map and occlusion maps to construct an occlusion-aware refined cost volume, enabling the refined disparity estimation network to yield a more precise disparity map. Extensive experiments demonstrate the effectiveness of our method. Compared with state-of-the-art methods, our method achieves a superior balance between accuracy and efficiency and ranks first in terms of the MSE and Q25 metrics among published methods on the HCI 4D benchmark. The code and model of the proposed method are available at https://github.com/chaowentao/OccCasNet.
Submitted 28 May, 2023;
originally announced May 2023.
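The cascade idea stated in the abstract (fewer samples, same interval) can be made concrete by comparing a dense set of disparity hypotheses over the full range with a small set centered on the coarse estimate. The range of -4 to 4, the 0.01 interval and the 9 refined samples are hypothetical numbers chosen for illustration; in practice the hypotheses are generated per pixel.

```python
def uniform_hypotheses(d_min, d_max, interval):
    """Dense disparity candidates covering the whole range at a fixed interval."""
    count = int(round((d_max - d_min) / interval)) + 1
    return [d_min + k * interval for k in range(count)]


def cascade_hypotheses(coarse_disparity, interval, num_samples):
    """Fewer candidates at the same interval, centered on a coarse per-pixel estimate."""
    half = (num_samples - 1) / 2.0
    return [coarse_disparity + (k - half) * interval for k in range(num_samples)]
```

With these assumed numbers, the refined stage evaluates 9 candidates per pixel instead of 801 while keeping the same 0.01 resolution, at the cost of relying on the coarse estimate being within 0.04 of the true disparity.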
-
A Multi-Task Deep Learning Approach for Sensor-based Human Activity Recognition and Segmentation
Authors:
Furong Duan,
Tao Zhu,
Jinqiang Wang,
Liming Chen,
Huansheng Ning,
Yaping Wan
Abstract:
Sensor-based human activity segmentation and recognition are two important and challenging problems in many real-world applications, and they have drawn increasing attention from the deep learning community in recent years. Most existing deep learning works were designed for pre-segmented sensor streams and have treated activity segmentation and recognition as two separate tasks. In practice, performing data stream segmentation is very challenging. We believe that activity segmentation and recognition each convey unique information that can complement each other to improve the performance of both tasks. In this paper, we propose a new multitask deep neural network to solve the two tasks simultaneously. The proposed network adopts selective convolution and uses multiscale windows to segment activities of long or short duration. First, multiple windows of different scales are generated, centered on each unit of the feature sequence. Then, the model is trained to predict, for each window, the activity class and the offset to the true activity boundaries. Finally, overlapping windows are filtered out by non-maximum suppression, and adjacent windows of the same activity are concatenated to complete the segmentation task. Extensive experiments were conducted on eight popular benchmark datasets, and the results show that our proposed method outperforms the state-of-the-art methods for both activity recognition and segmentation.
Submitted 20 March, 2023;
originally announced March 2023.
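The post-processing in the final step of that abstract (non-maximum suppression over overlapping windows, then concatenating adjacent windows of the same activity) can be sketched with a standard greedy 1D NMS. Windows are (start, end, score, label) tuples; the class-agnostic suppression and the 0.5 IoU threshold are assumptions, since the abstract gives neither.

```python
def temporal_iou(a, b):
    """Intersection over union of two time intervals given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(windows, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring window, drop windows overlapping it too much."""
    kept = []
    for w in sorted(windows, key=lambda w: w[2], reverse=True):
        if all(temporal_iou(w, k) <= iou_threshold for k in kept):
            kept.append(w)
    return kept


def merge_adjacent(windows):
    """Concatenate touching or overlapping windows that share the same label."""
    merged = []
    for start, end, score, label in sorted(windows, key=lambda w: w[0]):
        if merged and merged[-1][3] == label and start <= merged[-1][1]:
            prev_start, prev_end, prev_score, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), max(prev_score, score), label)
        else:
            merged.append((start, end, score, label))
    return merged
```

For example, windows (0, 10), (1, 11) and (10, 20) labeled "walk" plus (20, 30) labeled "sit" reduce to one "walk" segment from 0 to 20 and one "sit" segment from 20 to 30.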
-
Enabling Temporal-Spectral Decoding in Pre-movement Detection
Authors:
Hao Jia,
Feng Duan,
Yu Zhang,
Zhe Sun,
Jordi Sole-Casals
Abstract:
Non-invasive brain-computer interfaces help subjects control external devices through brain intentions. The multi-class classification of upper limb movements can provide external devices with more control commands. The onsets of upper limb movements are located from the external limb trajectory to eliminate the delay and bias among trials. However, the trajectories are not recorded due to experimental limitations, so the delay cannot be avoided in the signal analysis. The delay degrades classification performance, which limits the further application of upper limb movements in brain-computer interfaces. This work focuses on multi-channel brain signal analysis using a temporal-frequency approach. It proposes the two-stage-training temporal-spectral neural network (TTSNet) to decode patterns from brain signals. TTSNet first divides the signals into various filter banks. In each filter bank, task-related component analysis is used to reduce the dimension and reject noise in the brain signals. A convolutional neural network (CNN) is then used to optimize the temporal characteristics of the signals and extract class-related features. Finally, the class-related features from all filter banks are fused by concatenation and classified by the fully connected layer of the CNN. The proposed method is evaluated on two public datasets. The results show that TTSNet achieves an improved accuracy of 0.7456±0.1205, compared with 0.6506±0.1275 for EEGNet (p<0.05) and 0.6787±0.1260 for FBTRCA (p<0.1), in the movement detection task, which classifies the movement state and the resting state. The proposed method is expected to help detect limb movements and assist in the rehabilitation of stroke patients.
Submitted 19 December, 2022;
originally announced December 2022.
-
Learning Sub-Pixel Disparity Distribution for Light Field Depth Estimation
Authors:
Wentao Chao,
Xuechun Wang,
Yingqian Wang,
Guanghui Wang,
Fuqing Duan
Abstract:
Light field (LF) depth estimation plays a crucial role in many LF-based applications. Existing LF depth estimation methods treat depth estimation as a regression problem, where a pixel-wise L1 loss is employed to supervise the training process. However, the disparity map is only a sub-space projection (i.e., an expectation) of the disparity distribution, which is essential for models to learn. In this paper, we propose a simple yet effective method to learn the sub-pixel disparity distribution by fully utilizing the power of deep networks, especially for LF of narrow baselines. We construct the cost volume at the sub-pixel level to produce a finer disparity distribution and design an uncertainty-aware focal loss to supervise the predicted disparity distribution toward the ground truth. Extensive experimental results demonstrate the effectiveness of our method. Our method significantly outperforms recent state-of-the-art LF depth algorithms on the HCI 4D LF Benchmark in terms of all four accuracy metrics (i.e., BadPix 0.01, BadPix 0.03, BadPix 0.07, and MSE ×100). The code and model of the proposed method are available at https://github.com/chaowentao/SubFocal.
Submitted 21 November, 2023; v1 submitted 20 August, 2022;
originally announced August 2022.
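The abstract's point that a disparity map is only the expectation of a per-pixel disparity distribution can be illustrated with a soft-argmax: a softmax over candidate disparities, collapsed to its mean. In the example, a sharply peaked distribution and a bimodal one have the same expectation (0) despite very different shapes, which is why supervising only the expectation leaves the distribution under-constrained. The candidate range and the 0.01 sub-pixel step are hypothetical, and the uncertainty-aware focal loss is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_disparity(logits, candidates):
    """Collapse a per-pixel disparity distribution to its expectation (soft-argmax)."""
    return sum(p * d for p, d in zip(softmax(logits), candidates))


def subpixel_candidates(d_min, d_max, step):
    """Candidate disparities at a sub-pixel step, e.g. 0.01."""
    count = int(round((d_max - d_min) / step)) + 1
    return [d_min + k * step for k in range(count)]
```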
-
Towards Multi-class Pre-movement Classification
Authors:
Hao Jia,
Zhe Sun,
Feng Duan,
Yu Zhang,
Cesar F. Caiafa,
Jordi Solé-Casals
Abstract:
In non-invasive brain-computer interface systems, pre-movement decoding plays an important role in detecting movement before the limbs actually move. Movement-related cortical potential is a kind of brain activity associated with pre-movement decoding. In current studies, patterns decoded from movement are mainly applied to binary classification between the movement state and the resting state, such as elbow flexion versus rest. Classification between two movement states and among multiple movement states remains challenging. This study proposes a new method, star-arrangement spectral filtering (SASF), to solve the multi-class pre-movement classification problem. We first design a referenced task-related component analysis (RTRCA) framework that consists of two modules. The first module classifies between the movement state and the resting state; the second module classifies among multiple movement states. SASF is developed by optimizing the features in RTRCA. In SASF, feature selection on filter banks is used in the first module of RTRCA, and feature selection on time windows is used in the second module. A linear discriminant analysis classifier is used to classify the optimized features. In the binary classification between two motions, SASF achieves a classification accuracy of 0.9670±0.0522, which is significantly higher than that of the deep convolutional neural network (0.6247±0.0680) and the discriminative spatial pattern method (0.4400±0.0700). In the multi-class classification of 7 states, SASF achieves a classification accuracy of 0.9491±0.0372. The proposed SASF greatly improves the classification between two motions and enables classification among multiple motions. The results show that movement can be decoded from EEG signals before the actual limb movement.
Submitted 6 October, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.
-
Improving Pre-movement Pattern Detection with Filter Bank Selection
Authors:
Hao Jia,
Zhe Sun,
Feng Duan,
Yu Zhang,
Cesar F. Caiafa,
Jordi Solé-Casals
Abstract:
Pre-movement decoding plays an important role in movement detection and can detect movement onset from low-frequency electroencephalogram (EEG) signals before the limb moves. In related studies, pre-movement decoding with standard task-related component analysis (STRCA) has been shown to be efficient for classification between the movement state and the resting state. However, the accuracy of STRCA differs among subbands in the frequency domain. Due to individual differences, the best subband differs among subjects and is difficult to determine. This study aims to improve the performance of the STRCA method through feature selection on multiple subbands, which avoids having to select the best subband. This study first compares three frequency range settings (M1: subbands with equally spaced bandwidths; M2: subbands whose high cut-off frequencies are twice the low cut-off frequencies; M3: subbands that start at some specific fixed frequencies and end at frequencies in an arithmetic sequence). Then, we develop a mutual-information-based technique to select the features in these subbands. A binary support vector machine classifier is used to classify the selected essential features. The results show that M3 is a better setting than the other two. With the filter banks in M3, the classification accuracy of the proposed FBTRCA reaches 0.8700±0.1022, a significant improvement over STRCA (0.8287±0.1101) and the cross-validation and testing method (0.8431±0.1078).
Submitted 28 January, 2022;
originally announced January 2022.
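A mutual-information-based feature ranking, as mentioned in the FBTRCA abstract, can be sketched by discretizing each continuous feature into equal-width bins and computing its MI (in bits) with the class labels. The equal-width binning, the bin count and the feature names are assumptions; the paper's actual MI estimator and its TRCA-derived subband features are not specified in the abstract.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information in bits between two discrete sequences of equal length."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature, p_label = Counter(feature), Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():
        p_fl = count / n
        mi += p_fl * math.log2(p_fl / ((p_feature[f] / n) * (p_label[l] / n)))
    return mi


def discretize(values, num_bins):
    """Equal-width binning of continuous values into 0..num_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def select_features(features, labels, k, num_bins=4):
    """features: dict of name -> continuous values. Return the k names with highest MI."""
    scores = {name: mutual_information(discretize(vals, num_bins), labels)
              for name, vals in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The selected features would then be passed to a classifier (a binary SVM in the abstract).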
-
DecisionHoldem: Safe Depth-Limited Solving With Diverse Opponents for Imperfect-Information Games
Authors:
Qibin Zhou,
Dongdong Bai,
Junge Zhang,
Fuqing Duan,
Kaiqi Huang
Abstract:
An imperfect-information game is a type of game with asymmetric information. It is more common in life than the perfect-information game. Artificial intelligence (AI) in imperfect-information games, such as poker, has made considerable progress in recent years. The great success of superhuman poker AIs, such as Libratus and Deepstack, has drawn researchers' attention to poker research. However, the lack of open-source code limits the development of Texas hold'em AI to some extent. This article introduces DecisionHoldem, a high-level AI for heads-up no-limit Texas hold'em with safe depth-limited subgame solving, which considers possible ranges of the opponent's private hands to reduce the exploitability of the strategy. Experimental results show that DecisionHoldem defeats Slumbot, the strongest openly available agent in heads-up no-limit Texas hold'em poker, and Openstack, a high-level reproduction of Deepstack, by more than 730 mbb/h (one-thousandth of a big blind per round) and 700 mbb/h, respectively. Moreover, we release the source code and tools of DecisionHoldem to promote AI development in imperfect-information games.
Submitted 28 May, 2024; v1 submitted 27 January, 2022;
originally announced January 2022.
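The mbb/h unit in the DecisionHoldem abstract is the average number of big blinds won per hand, multiplied by 1000. The sketch below computes it and adds a simple normal-approximation confidence interval from per-hand results. The big blind size, the per-hand data and the plain normal interval are assumptions; the paper's evaluation may use variance-reduction techniques not shown here.

```python
import math
import statistics


def mbb_per_hand(total_chips_won, big_blind, num_hands):
    """Win rate in milli-big-blinds per hand: (chips / big blind / hands) * 1000."""
    return total_chips_won * 1000.0 / (big_blind * num_hands)


def mbb_confidence_interval(per_hand_chips, big_blind, z=1.96):
    """Approximate normal confidence interval for mbb/h from per-hand chip results."""
    per_hand_mbb = [c * 1000.0 / big_blind for c in per_hand_chips]
    mean = statistics.fmean(per_hand_mbb)
    half_width = z * statistics.stdev(per_hand_mbb) / math.sqrt(len(per_hand_mbb))
    return mean - half_width, mean + half_width
```

For example, winning 7300 chips over 1000 hands at a big blind of 10 chips is 0.73 big blinds per hand, i.e. 730 mbb/h. Because poker results have high variance, a win rate is only meaningful together with its interval over many hands.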
-
Evaluating importance of nodes in complex networks with local volume information dimension
Authors:
Hanwen Li,
Qiuyan Shang,
Fangzheng Duan,
Yong Deng
Abstract:
Evaluating the importance of nodes is essential in complex network research. Many methods have been proposed to solve this problem, but they still leave room for improvement. In this paper, a new approach called local volume information dimension is proposed. In this method, the sum of the degrees of nodes within different distances of the central node is calculated. The information within a given distance is described by information entropy. Compared with other methods, the proposed method considers the information of nodes at different distances more comprehensively. To demonstrate the effectiveness of the proposed method, experiments on real-world networks are conducted. Promising results indicate the effectiveness of the proposed method.
Submitted 24 March, 2022; v1 submitted 23 November, 2021;
originally announced November 2021.
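The abstract does not give the exact formulas, so the following is only a sketch under an assumed formulation: for radius r, V(r) is the degree sum of all nodes within r hops of the central node, p_j = k_j / V(r) for each such node j, the information is the Shannon entropy I(r) = -sum_j p_j ln p_j, and the dimension is taken as the least-squares slope of I(r) against ln r. The function names are hypothetical.

```python
import math
from collections import deque
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Hashable]]


def bfs_distances(adj: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Hop distances from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def volume_entropy(adj: Graph, dist: Dict[Hashable, int], r: int) -> Tuple[int, float]:
    """Degree sum V(r) inside radius r and the entropy of p_j = k_j / V(r)."""
    degrees = [len(adj[v]) for v, d in dist.items() if d <= r]
    volume = sum(degrees)
    if volume == 0:
        return 0, 0.0
    entropy = -sum((k / volume) * math.log(k / volume) for k in degrees if k > 0)
    return volume, entropy


def local_volume_information_dimension(adj: Graph, node: Hashable, max_radius: int) -> float:
    """Assumed form: least-squares slope of I(r) versus ln r for r = 1..max_radius."""
    if max_radius < 2:
        raise ValueError("need at least two radii to fit a slope")
    dist = bfs_distances(adj, node)
    xs = [math.log(r) for r in range(1, max_radius + 1)]
    ys = [volume_entropy(adj, dist, r)[1] for r in range(1, max_radius + 1)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

The graph is an adjacency dictionary, so the sketch needs no third-party packages; how the resulting values should be ranked into node importance is not specified here.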
-
L3C-Stereo: Lossless Compression for Stereo Images
Authors:
Zihao Huang,
Zhe Sun,
Feng Duan,
Andrzej Cichocki,
Peiying Ruan,
Chao Li
Abstract:
A large number of autonomous driving tasks need high-definition stereo images, which require a large amount of storage space. Efficient lossless compression has therefore become a practical problem. However, it is generally hard to make accurate probability estimates for each pixel. To tackle this, we propose L3C-Stereo, a multi-scale lossless compression model consisting of two main modules: the warping module and the probability estimation module. The warping module uses the feature maps of the two views, which lie in the same domain, to generate a disparity map; this map is used to reconstruct the right view, which improves the confidence of the probability estimate for the right view. The probability estimation module provides pixel-wise logistic mixture distributions for adaptive arithmetic coding. In the experiments, our method outperforms the hand-crafted compression methods and the learning-based method on all three datasets used. We then show that a better choice of maximum disparity can lead to better compression. Furthermore, thanks to a compression property of our model, it naturally generates a disparity map of acceptable quality for subsequent stereo tasks.
Submitted 20 August, 2021;
originally announced August 2021.
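A pixel-wise logistic mixture has to be discretized before an arithmetic coder can use it. The sketch below uses the common discretized-logistic-mixture formulation popularized by PixelCNN++: each 8-bit value x gets the probability mass of the interval [x - 0.5, x + 0.5], and the end bins absorb the tails. The mixture parameters that the network would predict per pixel are hard-coded here for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _sigmoid(x: float) -> float:
    """Numerically stable logistic CDF."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logistic_mixture_pmf(x: int, weights: Sequence[float], means: Sequence[float],
                         scales: Sequence[float], levels: int = 256) -> float:
    """P(X = x) for integer x in [0, levels - 1]; weights should sum to 1."""
    total = 0.0
    for w, mu, s in zip(weights, means, scales):
        upper = 1.0 if x == levels - 1 else _sigmoid((x + 0.5 - mu) / s)
        lower = 0.0 if x == 0 else _sigmoid((x - 0.5 - mu) / s)
        total += w * (upper - lower)
    return total


def code_length_bits(x: int, weights: Sequence[float], means: Sequence[float],
                     scales: Sequence[float], levels: int = 256) -> float:
    """Ideal arithmetic-coding cost of symbol x: -log2 P(x)."""
    return -math.log2(logistic_mixture_pmf(x, weights, means, scales, levels))
```

Sharper (smaller-scale) predictions centred on the true value give shorter codes, which is why better probability estimates translate directly into better lossless compression.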
-
Serial-EMD: Fast Empirical Mode Decomposition Method for Multi-dimensional Signals Based on Serialization
Authors:
Jin Zhang,
Fan Feng,
Pere Marti-Puig,
Cesar F. Caiafa,
Zhe Sun,
Feng Duan,
Jordi Solé-Casals
Abstract:
Empirical mode decomposition (EMD) has developed into a prominent tool for adaptive, scale-based signal analysis in fields such as robotics, security, and biomedical engineering. Because the dramatic increase in data volume places higher demands on real-time signal analysis, existing EMD and its variants find it difficult to balance growing data dimensionality against the speed of signal analysis. To decompose multi-dimensional signals faster, we present a novel signal-serialization method (serial-EMD), which concatenates multi-variate or multi-dimensional signals into a one-dimensional signal and decomposes it with various one-dimensional EMD algorithms. To verify the effects of the proposed method, we test synthetic multi-variate time series, artificial 2D images with various textures, and real-world facial images. Compared with existing multi-EMD algorithms, the decomposition time is significantly reduced. In addition, facial recognition using Intrinsic Mode Functions (IMFs) extracted by our method achieves higher accuracy than recognition using IMFs from existing multi-EMD algorithms, which demonstrates the superior quality of the IMFs produced by our method. Furthermore, this method offers a new perspective on optimizing existing EMD algorithms: transforming the structure of the input signal rather than being constrained to developing new envelope computation techniques or signal decomposition methods. In summary, the study suggests that serial-EMD is a highly competitive and fast alternative for multi-dimensional signal analysis.
Submitted 21 June, 2021;
originally announced June 2021.
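The serialization idea can be sketched independently of any particular EMD implementation: concatenate channels (or image rows) into one 1-D signal, run a 1-D decomposer on it, and split every resulting component back into the original layout. The `snake` option (reversing every other segment so neighbouring image rows join end to end) is an assumption, not something stated in the abstract, and `toy_decompose` is a placeholder standing in for a real 1-D EMD that would return IMFs plus a residue.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def serialize(segments: Sequence[Sequence[float]], snake: bool = False) -> Tuple[Signal, List[int]]:
    """Concatenate channels or image rows into one 1-D signal, remembering lengths."""
    flat: Signal = []
    lengths: List[int] = []
    for i, seg in enumerate(segments):
        seg = list(seg)
        if snake and i % 2 == 1:
            seg.reverse()  # assumed boustrophedon ordering
        flat.extend(seg)
        lengths.append(len(seg))
    return flat, lengths


def deserialize(flat: Sequence[float], lengths: Sequence[int], snake: bool = False) -> List[Signal]:
    """Inverse of serialize: split the 1-D signal back into its segments."""
    out: List[Signal] = []
    start = 0
    for i, n in enumerate(lengths):
        seg = list(flat[start:start + n])
        if snake and i % 2 == 1:
            seg.reverse()
        out.append(seg)
        start += n
    return out


def serial_decompose(segments: Sequence[Sequence[float]],
                     decompose_1d: Callable[[Signal], List[Signal]],
                     snake: bool = False) -> List[List[Signal]]:
    """Decompose the serialized signal once, then map each component back."""
    flat, lengths = serialize(segments, snake)
    return [deserialize(comp, lengths, snake) for comp in decompose_1d(flat)]


def toy_decompose(x: Signal) -> List[Signal]:
    """Placeholder decomposer (NOT EMD): zero-mean detail plus constant mean."""
    mean = sum(x) / len(x)
    return [[v - mean for v in x], [mean] * len(x)]
```

A single 1-D decomposition replaces a multi-dimensional one, which is where the speed-up comes from; the joins between segments introduce discontinuities that a real decomposer will see as local features.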
-
A novel multimodal approach for hybrid brain-computer interface
Authors:
Zhe Sun,
Zihao Huang,
Feng Duan,
Yu Liu
Abstract:
Brain-computer interface (BCI) technologies have been widely used in many areas. In particular, non-invasive technologies such as electroencephalography (EEG) and near-infrared spectroscopy (NIRS) have been used to detect motor imagery, disease, or mental state. It has already been shown in the literature that combining EEG and NIRS gives better results than either signal alone. The fusion algorithm for EEG and NIRS sources is the key to implementing them in real-life applications. In this research, we propose three fusion methods for a hybrid EEG- and NIRS-based brain-computer interface system: linear fusion, tensor fusion, and $p$th-order polynomial fusion. First, our results confirm that the hybrid BCI system is more accurate, as expected. Second, $p$th-order polynomial fusion gives the best classification results of the three methods and also improves on previous studies. For a motor imagery task and a mental arithmetic task, the best detection accuracies reported in previous papers were 74.20\% and 88.1\%, respectively, whereas our method achieves 77.53\% and 90.19\%. Furthermore, our proposed methods are less computationally demanding than complex artificial neural network methods.
Submitted 25 April, 2020;
originally announced April 2020.
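The three fusion schemes can be illustrated on per-trial feature vectors. In this sketch, linear fusion is plain concatenation, tensor fusion is the outer product of the two vectors each extended with a constant 1 (so unimodal terms survive alongside the cross terms), and pth-order polynomial fusion is assumed to be the p-fold outer product of [1, eeg, nirs]. These definitions are assumptions for illustration; the paper's exact formulations and classifiers are not reproduced here.

```python
from typing import List, Sequence


def linear_fusion(eeg: Sequence[float], nirs: Sequence[float]) -> List[float]:
    """Concatenate the two modality feature vectors."""
    return [*eeg, *nirs]


def tensor_fusion(eeg: Sequence[float], nirs: Sequence[float]) -> List[float]:
    """Flattened outer product of [1, eeg] and [1, nirs]."""
    a = [1.0, *eeg]
    b = [1.0, *nirs]
    return [x * y for x in a for y in b]


def polynomial_fusion(eeg: Sequence[float], nirs: Sequence[float], order: int) -> List[float]:
    """Assumed form: p-fold outer product of z = [1, eeg, nirs], flattened.

    Contains every monomial of the features up to degree `order`
    (with repetitions, since the outer product is symmetric).
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    z = [1.0, *eeg, *nirs]
    features = [1.0]
    for _ in range(order):
        features = [f * v for f in features for v in z]
    return features
```

The feature length grows as (len(eeg) + len(nirs) + 1) ** p, so a practical version would keep only unique monomials or use a factorized weight.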
-
H-OWAN: Multi-distorted Image Restoration with Tensor 1x1 Convolution
Authors:
Zihao Huang,
Chao Li,
Feng Duan,
Qibin Zhao
Abstract:
Restoring images that suffer from combined distortions is a challenging task. In existing work, a promising strategy is to apply parallel "operations" to handle different types of distortion. However, in the feature fusion phase, a small number of operations tend to dominate the restoration result because the features produced by different operations are heterogeneous. To address this, we introduce the tensor 1x1 convolutional layer, which imposes a high-order tensor (outer) product; this not only harmonizes the heterogeneous features but also introduces additional non-linearity. To avoid the prohibitive kernel size resulting from the tensor product, we construct the kernels with tensor network decomposition, which converts the exponential growth of the dimension into linear growth. Armed with the new layer, we propose High-order OWAN (H-OWAN) for multi-distorted image restoration. In numerical experiments, the proposed network outperforms the previous state of the art and shows promising performance even on more difficult tasks.
Submitted 29 January, 2020;
originally announced January 2020.
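Why a factorized kernel turns exponential growth into linear growth can be checked at a single pixel. The sketch below assumes a CP (canonical polyadic) decomposition, the simplest tensor network, which may differ from the decomposition used in H-OWAN: with W[o, i1, ..., iN] = sum_r A[o][r] * prod_n U_n[i_n][r], the output is computed by projecting each operation's features separately and multiplying, and it matches contracting the full kernel with the outer product of the features. All names are hypothetical.

```python
import random
from itertools import product
from math import prod
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cp_tensor_1x1(features: Sequence[Vector], factors: Sequence[Matrix], out_weights: Matrix) -> Vector:
    """y[o] = sum_r A[o][r] * prod_n (U_n^T x_n)[r], never forming the full kernel."""
    rank = len(out_weights[0])
    projected = [[sum(x[i] * U[i][r] for i in range(len(x))) for r in range(rank)]
                 for x, U in zip(features, factors)]
    z = [prod(p[r] for p in projected) for r in range(rank)]
    return [sum(a[r] * z[r] for r in range(rank)) for a in out_weights]


def full_tensor_1x1(features: Sequence[Vector], factors: Sequence[Matrix], out_weights: Matrix) -> Vector:
    """Reference: build each kernel entry and contract it with the outer product of features."""
    rank = len(out_weights[0])
    outputs = []
    for a in out_weights:
        total = 0.0
        for idx in product(*(range(len(x)) for x in features)):
            w = sum(a[r] * prod(U[i][r] for U, i in zip(factors, idx)) for r in range(rank))
            total += w * prod(x[i] for x, i in zip(features, idx))
        outputs.append(total)
    return outputs


def param_counts(dims: Sequence[int], rank: int, c_out: int) -> Tuple[int, int]:
    """(full kernel size, CP parameter count) for operation feature sizes `dims`."""
    return c_out * prod(dims), rank * (c_out + sum(dims))


def random_instance(dims: Sequence[int], rank: int, c_out: int, seed: int = 0):
    """Random features, factor matrices and output weights for testing."""
    rng = random.Random(seed)
    feats = [[rng.uniform(-1, 1) for _ in range(d)] for d in dims]
    facs = [[[rng.uniform(-1, 1) for _ in range(rank)] for _ in range(d)] for d in dims]
    out_w = [[rng.uniform(-1, 1) for _ in range(rank)] for _ in range(c_out)]
    return feats, facs, out_w
```

With four operations of 64 channels each and 64 output channels, the full kernel needs 64 ** 5 entries, while a rank-8 CP kernel needs 8 * (64 + 256) = 2,560 parameters.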
-
Spatiotemporal Attention Networks for Wind Power Forecasting
Authors:
Xingbo Fu,
Feng Gao,
Jiang Wu,
Xinyu Wei,
Fangwei Duan
Abstract:
Wind power is one of the most important renewable energy sources, and accurate wind power forecasting is essential for reliable and economic power system operation and control strategies. This paper proposes a novel framework with spatiotemporal attention networks (STAN) for wind power forecasting. The model captures spatial correlations among wind farms and temporal dependencies of wind power time series. First, we employ a multi-head self-attention mechanism to extract spatial correlations among wind farms. Then, temporal dependencies are captured by a Sequence-to-Sequence (Seq2Seq) model with a global attention mechanism. Experimental results demonstrate that our model achieves better performance than other baseline approaches. Our work provides useful insights into capturing non-Euclidean spatial correlations.
Submitted 13 November, 2019; v1 submitted 13 September, 2019;
originally announced September 2019.
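The spatial step, multi-head self-attention across wind farms, can be sketched with one feature row per farm. For brevity this sketch omits the learned query, key, value and output projections a real model would have (each head attends over its own slice of the raw features), so it shows only the scaled dot-product attention pattern, not STAN itself. Function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product attention over rows of x (one row per farm).

    Queries, keys and values are the rows themselves (identity projections).
    Returns (outputs, attention weights); weights[i][j] is how much farm i attends to farm j.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
        weights.append(w)
        outputs.append([sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)])
    return outputs, weights


def multi_head_self_attention(x: Matrix, num_heads: int) -> Matrix:
    """Split features into num_heads slices, attend within each, and concatenate."""
    d = len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    h = d // num_heads
    outputs: Matrix = [[] for _ in x]
    for head in range(num_heads):
        head_out, _ = self_attention([row[head * h:(head + 1) * h] for row in x])
        for i, row in enumerate(head_out):
            outputs[i].extend(row)
    return outputs
```

Because attention weights are computed between every pair of farms, no grid or distance structure is assumed, which is what makes this approach suitable for non-Euclidean spatial correlations.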
-
Pulsar Candidate Identification with Artificial Intelligence Techniques
Authors:
Ping Guo,
Fuqing Duan,
Pei Wang,
Yao Yao,
Qian Yin,
Xin Xin
Abstract:
Discovering pulsars is a significant research topic in radio astronomy. With the advent of astronomical instruments such as the Five-hundred-meter Aperture Spherical Telescope (FAST) in China, data volumes and data rates are growing exponentially. This necessitates a focus on artificial intelligence (AI) technologies that can perform automatic pulsar candidate identification to mine large astronomical data sets. Automatic pulsar candidate identification can be viewed as the task of selecting potential candidates for further investigation while eliminating radio frequency interference (RFI) and other non-pulsar signals. Improving the performance of deep convolutional neural network (DCNN)-based pulsar identification is very hard for two reasons: the limited number of training samples prevents the network from being made deep enough to learn good features, and the very small number of real pulsar samples causes a severe class-imbalance problem. To address these problems, we propose a framework that combines a deep convolutional generative adversarial network (DCGAN) with a support vector machine (SVM) to handle the class-imbalance problem and improve pulsar identification accuracy. DCGAN serves as the sample-generation and feature-learning model, and SVM is adopted as the classifier that predicts candidate labels in the inference stage. The proposed framework not only addresses the class-imbalance problem but also learns discriminative feature representations of pulsar candidates instead of relying on hand-crafted features computed in preprocessing steps, which makes automatic pulsar candidate selection more accurate. Experiments on two pulsar datasets verify the effectiveness and efficiency of the proposed method.
Submitted 23 October, 2019; v1 submitted 27 November, 2017;
originally announced November 2017.
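The class-balancing role of the generator can be sketched as generative oversampling: every minority class is topped up with synthetic samples until it matches the majority class. Here `generate(label, n)` is a hypothetical stand-in for a trained class-conditional generator such as a DCGAN; the feature-learning role of the DCGAN and the SVM training step are not shown.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balance_with_generator(
    samples: Sequence[T],
    labels: Sequence[Hashable],
    generate: Callable[[Hashable, int], List[T]],
) -> Tuple[List[T], List[Hashable]]:
    """Add synthetic samples to each minority class until all classes
    have as many samples as the largest class."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, n in counts.items():
        deficit = target - n
        if deficit > 0:
            synthetic = generate(label, deficit)
            if len(synthetic) != deficit:
                raise ValueError("generator returned the wrong number of samples")
            out_samples.extend(synthetic)
            out_labels.extend([label] * deficit)
    return out_samples, out_labels
```

With 8 non-pulsar and 2 pulsar candidates, the generator is asked for 6 synthetic pulsar samples, giving a balanced 8/8 training set for the downstream classifier.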
-
Fisher information as a performance metric for locally optimum processing
Authors:
Fabing Duan,
Francois Chapeau-Blondeau,
Derek Abbott
Abstract:
For a known weak signal in additive white noise, the asymptotic performance of a locally optimum processor (LOP) is shown to be given by the Fisher information (FI) of a standardized even probability density function (PDF) of noise in three cases: (i) the maximum signal-to-noise ratio (SNR) gain for a periodic signal; (ii) the optimal asymptotic relative efficiency (ARE) for signal detection; (iii) the best cross-correlation gain (CG) for signal transmission. The minimal FI is unity, corresponding to a Gaussian PDF, whereas the FI is strictly larger than unity for any non-Gaussian PDF. When a realizable LOP is considered, the dichotomous noise PDF is found to possess an infinite FI, with known weak signals perfectly processed by the corresponding LOP. The significance of FI lies in the fact that it provides an upper bound on the performance of locally optimum processing.
Submitted 23 November, 2011;
originally announced November 2011.
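The FI of a noise PDF f is I(f) = ∫ (f'(x)/f(x))^2 f(x) dx, and the LOP nonlinearity for additive noise is g(x) = -f'(x)/f(x), so I(f) is the mean-square value of that nonlinearity under the noise. The sketch below evaluates I(f) numerically for two unit-variance (standardized) PDFs: the Gaussian, which should give 1, and a Laplace PDF with scale b = 1/sqrt(2), which should give 1/b^2 = 2, consistent with FI exceeding unity for non-Gaussian noise. The Laplace example and the function names are chosen here for illustration.

```python
import math
from typing import Callable


def fisher_information(pdf: Callable[[float], float], score: Callable[[float], float],
                       lo: float = -12.0, hi: float = 12.0, n: int = 100_000) -> float:
    """Midpoint-rule estimate of I(f) = integral of score(x)^2 * f(x), score = f'/f."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += score(x) ** 2 * pdf(x)
    return total * h


def gaussian_pdf(x: float) -> float:
    """Standard normal density (zero mean, unit variance)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_score(x: float) -> float:
    """f'/f for the standard normal."""
    return -x


LAPLACE_B = 1.0 / math.sqrt(2.0)  # Laplace variance is 2*b^2, so this gives unit variance


def laplace_pdf(x: float) -> float:
    """Unit-variance Laplace density."""
    return math.exp(-abs(x) / LAPLACE_B) / (2.0 * LAPLACE_B)


def laplace_score(x: float) -> float:
    """f'/f for the Laplace density (x = 0 never hits a midpoint on the default grid)."""
    return -math.copysign(1.0, x) / LAPLACE_B
```

The integration range [-12, 12] truncates tails that are negligible for both densities at this precision.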
-
A Novel X-Axis Tuning Fork Gyroscope with "8 Vertical Springs-Proofmass" Structure on (111)-Silicon
Authors:
Fei Duan,
Jiwei Jiao,
Yucai Wang,
Ying Zhang,
Binwei Mi,
Jinpeng Li,
Jian Zhu,
Yuelin Wang
Abstract:
A novel x-axis tuning fork MEMS gyroscope with an "8 vertical springs-proofmass" structure for Coriolis effect detection is presented. Compared with common single-plane springs, the 8 vertical springs, symmetrically located at the top and bottom sides, suspend the large, thick proofmass more stably, which provides large capacitance variation and low mechanical noise. Bulk micromachining is used to obtain the large proofmass and twin-like dual beams. During fabrication, the dimensions of the 8 vertical springs are precisely defined by the sidewalls of thermal-oxide-protected limit trenches (LTs) and by the extremely slowly etched (111) planes; as a result, a small mismatch of less than 30 Hz is achieved before tuning. Initial tests show a sensitivity of 0.15 mV/(deg/s) and a rate resolution of about 0.1 deg/s at atmospheric pressure.
Submitted 21 February, 2008;
originally announced February 2008.
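The frequency mismatch quoted above is the difference between the drive-mode and sense-mode resonant frequencies, each of which follows f = sqrt(k/m) / (2*pi) for an effective spring-mass mode. The sketch below shows how a small stiffness difference, such as one caused by spring-dimension errors, maps to a mismatch in hertz, and how the reported 0.15 mV/(deg/s) scale factor converts a rate to an output voltage. Treating both modes as sharing one effective mass is a simplifying assumption, and the stiffness and mass values in the test are illustrative, not device data.

```python
import math


def resonant_frequency_hz(stiffness_n_per_m: float, mass_kg: float) -> float:
    """Undamped natural frequency f = sqrt(k/m) / (2*pi) of a spring-mass mode."""
    if stiffness_n_per_m <= 0 or mass_kg <= 0:
        raise ValueError("stiffness and mass must be positive")
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)


def mode_mismatch_hz(k_drive: float, k_sense: float, mass_kg: float) -> float:
    """|f_drive - f_sense|, assuming both modes share the same effective mass."""
    return abs(resonant_frequency_hz(k_drive, mass_kg) - resonant_frequency_hz(k_sense, mass_kg))


def rate_to_output_mv(rate_deg_s: float, sensitivity_mv_per_deg_s: float = 0.15) -> float:
    """Linear scale factor: output voltage (mV) for an angular rate (deg/s)."""
    return sensitivity_mv_per_deg_s * rate_deg_s
```

At the reported 0.1 deg/s resolution, the 0.15 mV/(deg/s) scale factor corresponds to an output change of only about 0.015 mV.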
-
Impact of Thermal Behavior on Offset in a High-Q Gyroscope
Authors:
Fei Duan,
J. Jiao,
Y. Wang
Abstract:
In this paper, a computational fluid dynamics (CFD) approach is used to simulate the thermal behavior of a sensitive high-Q gyroscope. The electromagnetic driving wires, which carry AC current, are modeled as Joule heat sources. We found that the differences in temperature, pressure, and velocity, both along the driving direction and transversely across the proof masses, increase as the gap between the proof mass and the top glass decreases. Local pressure gradients are expected to amplify the effect of any imperfection introduced by the MEMS process or design on the offset of our tuning-fork gyroscope, which has been verified experimentally. A device with a 200 µm gap shows an offset two-thirds lower than that of its counterpart with a 50 µm gap.
Submitted 21 November, 2007;
originally announced November 2007.
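When a driving wire is modeled as a Joule heat source in a CFD model, the source term is usually the time-averaged volumetric heating. For a sinusoidal current of amplitude I0, I_rms = I0 / sqrt(2) and the average volumetric heat is q = rho * J_rms^2, with J_rms = I_rms / A. Integrated over the wire volume A * L, this equals P = I_rms^2 * R with R = rho * L / A. The sketch below encodes these standard relations; the wire material, dimensions, and current used in the test are illustrative assumptions, not parameters of the device described above.

```python
import math


def joule_heat_density(current_amplitude_a: float, cross_section_m2: float,
                       resistivity_ohm_m: float) -> float:
    """Time-averaged volumetric Joule heat q = rho * J_rms^2 (W/m^3) for sinusoidal AC."""
    i_rms = current_amplitude_a / math.sqrt(2.0)
    j_rms = i_rms / cross_section_m2
    return resistivity_ohm_m * j_rms ** 2


def joule_heat_power(current_amplitude_a: float, length_m: float,
                     cross_section_m2: float, resistivity_ohm_m: float) -> float:
    """Total average power P = I_rms^2 * R with R = rho * L / A (W)."""
    resistance = resistivity_ohm_m * length_m / cross_section_m2
    return (current_amplitude_a / math.sqrt(2.0)) ** 2 * resistance
```

Because q scales with the square of current density, thin wires carrying modest currents can still act as strong local heat sources near the proof mass.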