-
On Flange-based 3D Hand-Eye Calibration for Soft Robotic Tactile Welding
Authors:
Xudong Han,
Ning Guo,
Yu Jie,
He Wang,
Fang Wan,
Chaoyang Song
Abstract:
This paper investigates the direct application of standardized designs on the robot for conducting robot hand-eye calibration by employing 3D scanners with collaborative robots. The well-established geometric features of the robot flange are exploited by directly capturing its point cloud data. In particular, an iterative method is proposed to facilitate point cloud processing toward a refined calibration outcome. Several extensive experiments are conducted over a range of collaborative robots, including Universal Robots UR5 & UR10 e-series, Franka Emika, and AUBO i5 using an industrial-grade 3D scanner Photoneo Phoxi S & M and a commercial-grade 3D scanner Microsoft Azure Kinect DK. Experimental results show that translational and rotational errors converge efficiently to less than 0.28 mm and 0.25 degrees, respectively, achieving a hand-eye calibration accuracy as high as the camera's resolution, probing the hardware limit. A welding seam tracking system is presented, combining the flange-based calibration method with soft tactile sensing. The experiment results show that the system enables the robot to adjust its motion in real-time, ensuring consistent weld quality and paving the way for more efficient and adaptable manufacturing processes.
Submitted 27 July, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation
Authors:
Peiyang Wu,
Nan Guo,
Xiao Xiao,
Wenming Li,
Xiaochun Ye,
Dongrui Fan
Abstract:
Recently, large language models (LLMs) have demonstrated excellent performance in understanding human instructions and generating code, which has inspired researchers to explore the feasibility of generating RTL code with LLMs. However, existing approaches to fine-tuning LLMs on RTL code are typically conducted on fixed datasets, which do not fully stimulate the capability of LLMs and require large amounts of reference data. To mitigate these issues, we introduce a simple yet effective iterative training paradigm named ITERTL. During each iteration, samples are drawn from the model trained in the previous cycle. These new samples are then used for training in the current iteration. Through this iterative approach, the distribution mismatch between the model and the training samples is reduced. Additionally, the model is thus enabled to explore a broader generative space and receive more comprehensive feedback. Theoretical analyses are conducted to investigate the mechanism behind this effectiveness. Experimental results show that the model trained with our proposed approach can compete with and even outperform the state-of-the-art (SOTA) open-source model using nearly 37% of the reference samples, achieving pass@1 rates of 42.9% and 62.2% on the two VerilogEval evaluation datasets, respectively. When using the same amount of reference samples, our method achieves relative improvements of 16.9% and 12.5% in pass@1 compared to the non-iterative method. This study facilitates the application of LLMs for generating RTL code in practical scenarios with limited data.
Submitted 23 July, 2024; v1 submitted 27 June, 2024;
originally announced July 2024.
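The ITERTL abstract reports pass@1 rates and relative improvements without saying how they are computed. The sketch below shows the unbiased pass@k estimator commonly used in code-generation benchmarks, together with a relative-improvement helper. That ITERTL uses exactly this estimator is an assumption, and the function names and sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which pass the tests.

    Uses the standard unbiased form 1 - C(n - c, k) / C(n, k).
    Assumption: this is the estimator behind the reported pass@1 numbers.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, e.g. 0.25 means +25%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(pass_at_k(n=10, c=3, k=1))
    print(relative_improvement(new=0.5, baseline=0.4))
```

For k = 1 the estimator reduces to the fraction of passing samples, c / n.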
-
Causal inference approach to appraise long-term effects of maintenance policy on functional performance of asphalt pavements
Authors:
Lingyun You,
Nanning Guo,
Zhengwu Long,
Fusong Wang,
Chundi Si,
Aboelkasim Diab
Abstract:
Asphalt pavements, as the most prevalent transportation infrastructure, are prone to serious traffic safety problems due to functional or structural damage caused by stresses or strains imposed through repeated traffic loads and continuous climatic cycles. The good quality or high serviceability of infrastructure networks is vital to the urbanization and industrial development of nations. In order to maintain good functional pavement performance and extend the service life of asphalt pavements, the long-term performance of pavements under maintenance policies needs to be evaluated and favorable options selected based on the condition of the pavement. A major challenge in evaluating maintenance policies is to produce valid treatments for the outcome assessment while controlling for the uncertainty of vehicle loads and the disturbance of freeze-thaw cycles in the climatic environment. In this study, a novel causal inference approach combining a classical causal structural model and a potential outcome model framework is proposed to appraise the long-term effects of four preventive maintenance treatments for longitudinal cracking over a 5-year period of upkeep. Three fundamental issues are addressed: 1) detection of causal relationships prior to variables under environmental loading (identification of causal structure); 2) obtaining direct causal effects of treatment on outcomes excluding covariates (identification of causal effects); and 3) sensitivity analysis of causal relationships. The results show that the method can accurately evaluate the effect of preventive maintenance treatments and assess the maintenance time to cater well for the functional performance of different preventive maintenance approaches. This framework could help policymakers to develop appropriate maintenance strategies for pavements.
Submitted 2 July, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
DSFNet: Learning Disentangled Scenario Factorization for Multi-Scenario Route Ranking
Authors:
Jiahao Yu,
Yihai Duan,
Longfei Xu,
Chao Chen,
Shuliang Liu,
Li Chen,
Kaikui Liu,
Fan Yang,
Ning Guo
Abstract:
Multi-scenario route ranking (MSRR) is crucial in many industrial mapping systems. However, the industrial community mainly adopts interactive interfaces to encourage users to select pre-defined scenarios, which may hinder the downstream ranking performance. In addition, in the academic community, existing multi-scenario ranking work comes only from other fields, and no work focuses specifically on route data, owing to the lack of a publicly available MSRR dataset. Moreover, all existing multi-scenario works still fail to address the three specific challenges of MSRR simultaneously, i.e., the explosion of scenario number, high entanglement, and high-capacity demand. Unlike prior work, our key idea for addressing MSRR is to factorize the complicated scenario in route ranking into several disentangled factor scenario patterns. Accordingly, we propose a novel method, Disentangled Scenario Factorization Network (DSFNet), which flexibly composes scenario-dependent parameters based on a high-capacity multi-factor-scenario-branch structure. Then, a novel regularization is proposed to induce the disentanglement of factor scenarios. Furthermore, two extra novel techniques, i.e., scenario-aware batch normalization and scenario-aware feature filtering, are developed to improve the network's awareness of scenario representation. Additionally, to facilitate MSRR research in the academic community, we propose MSDR, the first large-scale publicly available annotated industrial Multi-Scenario Driving Route dataset. Comprehensive experimental results demonstrate the superiority of our DSFNet, which has been successfully deployed in AMap to serve the major online traffic.
Submitted 30 March, 2024;
originally announced April 2024.
-
Quantum linear algebra is all you need for Transformer architectures
Authors:
Naixu Guo,
Zhan Yu,
Matthew Choi,
Aman Agrawal,
Kouhei Nakaji,
Alán Aspuru-Guzik,
Patrick Rebentrost
Abstract:
Generative machine learning methods such as large language models are revolutionizing the creation of text and images. While these models are powerful, they also consume a large amount of computational resources. The transformer is a key component in large language models that aims to generate a suitable completion of a given partial sequence. In this work, we investigate transformer architectures under the lens of fault-tolerant quantum computing. The input model is one where trained weight matrices are given as block encodings, and we construct the query, key, and value matrices for the transformer. We show how to prepare a block encoding of the self-attention matrix, with a new subroutine for the row-wise application of the softmax function. In addition, we combine quantum subroutines to construct important building blocks in the transformer: the residual connection and layer normalization, and the feed-forward neural network. Our subroutines prepare an amplitude encoding of the transformer output, which can be measured to obtain a prediction. Based on common open-source large language models, we provide insights into the behavior of important parameters determining the run time of the quantum algorithm. We discuss the potential and challenges for obtaining a quantum advantage.
Submitted 30 May, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
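As a point of reference for the abstract above, the following is a minimal classical sketch of the self-attention computation whose block encoding the paper prepares: a row-wise softmax over scaled query-key scores, followed by a weighted sum of values. It is not the quantum subroutine. The function names, the plain-list matrix representation, and the 1/sqrt(d) scaling (the usual transformer convention) are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(scores: Matrix) -> Matrix:
    """Apply softmax independently to each row of `scores`."""
    result = []
    for row in scores:
        shift = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - shift) for x in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def self_attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Classical self-attention: softmax(Q K^T / sqrt(d)) V, row by row."""
    d = len(Q[0])
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in K]
        for q_row in Q
    ]
    weights = softmax_rows(scores)
    n_cols = len(V[0])
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(n_cols)]
        for w_row in weights
    ]


if __name__ == "__main__":
    # A zero query gives equal scores, so the output averages the value rows.
    print(self_attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 4.0]]))
```

Each row of the softmax output sums to one, which is the per-row normalization the abstract calls the row-wise application of the softmax function.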
-
Integrating Large Language Models with Graphical Session-Based Recommendation
Authors:
Naicheng Guo,
Hongwei Cheng,
Qianqiao Liang,
Linxun Chen,
Bing Han
Abstract:
With the rapid development of Large Language Models (LLMs), various explorations have arisen to utilize LLMs' capability of context understanding in recommender systems. While pioneering strategies have primarily transformed traditional recommendation tasks into challenges of natural language generation, there has been a relative scarcity of exploration in the domain of session-based recommendation (SBR) due to its specificity. SBR has been primarily dominated by Graph Neural Networks (GNNs), which have achieved many successful outcomes due to their ability to capture both the implicit and explicit relationships between adjacent behaviors. The structural nature of graphs contrasts with the essence of natural language, posing a significant adaptation gap for LLMs. In this paper, we introduce large language models with graphical session-based recommendation, named LLMGR, an effective framework that bridges the aforementioned gap by harmoniously integrating LLMs with GNNs for SBR tasks. This integration seeks to leverage the complementary strengths of LLMs in natural language understanding and GNNs in relational data processing, leading to a more powerful session-based recommender system that can understand and recommend items within a session. Moreover, to endow the LLM with the capability to empower SBR tasks, we design a series of prompts for both auxiliary and major instruction tuning tasks. These prompts are crafted to assist the LLM in understanding graph-structured data and aligning textual information with nodes, effectively translating nuanced user interactions into a format that can be understood and utilized by LLM architectures. Extensive experiments on three real-world datasets demonstrate that LLMGR outperforms several competitive baselines, indicating its effectiveness in enhancing SBR tasks and its potential as a research direction for future exploration.
Submitted 26 February, 2024;
originally announced February 2024.
-
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
Authors:
Guojian Wang,
Faguo Wu,
Xiao Zhang,
Ning Guo,
Zhiming Zheng
Abstract:
Deep reinforcement learning (DRL) faces significant challenges in addressing the hard-exploration problems in tasks with sparse or deceptive rewards and large state spaces. These challenges severely limit the practical application of DRL. Most previous exploration methods relied on complex architectures to estimate state novelty or introduced sensitive hyperparameters, resulting in instability. To mitigate these issues, we propose an efficient adaptive trajectory-constrained exploration strategy for DRL. The proposed method guides the policy of the agent away from suboptimal solutions by leveraging incomplete offline demonstrations as references. This approach gradually expands the exploration scope of the agent and strives for optimality in a constrained optimization manner. Additionally, we introduce a novel policy-gradient-based optimization algorithm that utilizes adaptively clipped trajectory-distance rewards for both single- and multi-agent reinforcement learning. We provide a theoretical analysis of our method, including a deduction of the worst-case approximation error bounds, highlighting the validity of our approach for enhancing exploration. To evaluate the effectiveness of the proposed method, we conducted experiments on two large 2D grid world mazes and several MuJoCo tasks. The extensive experimental results demonstrate the significant advantages of our method in achieving temporally extended exploration and avoiding myopic and suboptimal behaviors in both single- and multi-agent settings. Notably, the specific metrics and quantifiable results further support these findings. The code used in the study is available at https://github.com/buaawgj/TACE.
Submitted 27 December, 2023;
originally announced December 2023.
-
Proprioceptive State Estimation for Amphibious Tactile Sensing
Authors:
Ning Guo,
Xudong Han,
Shuqiao Zhong,
Zhiyuan Zhou,
Jian Lin,
Jian S. Dai,
Fang Wan,
Chaoyang Song
Abstract:
This paper presents a novel vision-based proprioception approach for a soft robotic finger that can estimate and reconstruct tactile interactions in both terrestrial and aquatic environments. The key to this system lies in the finger's unique metamaterial structure, which facilitates omni-directional passive adaptation during grasping, protecting delicate objects across diverse scenarios. A compact in-finger camera captures high-framerate images of the finger's deformation during contact, extracting crucial tactile data in real-time. We present a volumetric discretized model of the soft finger and use the geometry constraints captured by the camera to find the optimal estimation of the deformed shape. The approach is benchmarked using a motion capture system with sparse markers and a haptic device with dense measurements. Both results show state-of-the-art accuracies, with a median error of 1.96 mm for overall body deformation, corresponding to 2.1% of the finger's length. More importantly, the state estimation is robust in both on-land and underwater environments as we demonstrate its usage for underwater object shape sensing. This combination of passive adaptation and real-time tactile sensing paves the way for amphibious robotic grasping applications.
Submitted 21 July, 2024; v1 submitted 15 December, 2023;
originally announced December 2023.
-
HiH: A Multi-modal Hierarchy in Hierarchy Network for Unconstrained Gait Recognition
Authors:
Lei Wang,
Bo Liu,
Yinchi Ma,
Fangfang Liang,
Nawei Guo
Abstract:
Gait recognition has achieved promising advances in controlled settings, yet it significantly struggles in unconstrained environments due to challenges such as view changes, occlusions, and varying walking speeds. Additionally, efforts to fuse multiple modalities often face limited improvements because of cross-modality incompatibility, particularly in outdoor scenarios. To address these issues, we present a multi-modal Hierarchy in Hierarchy network (HiH) that integrates silhouette and pose sequences for robust gait recognition. HiH features a main branch that utilizes Hierarchical Gait Decomposer (HGD) modules for depth-wise and intra-module hierarchical examination of general gait patterns from silhouette data. This approach captures motion hierarchies from overall body dynamics to detailed limb movements, facilitating the representation of gait attributes across multiple spatial resolutions. Complementing this, an auxiliary branch, based on 2D joint sequences, enriches the spatial and temporal aspects of gait analysis. It employs a Deformable Spatial Enhancement (DSE) module for pose-guided spatial attention and a Deformable Temporal Alignment (DTA) module for aligning motion dynamics through learned temporal offsets. Extensive evaluations across diverse indoor and outdoor datasets demonstrate HiH's state-of-the-art performance, affirming a well-balanced trade-off between accuracy and efficiency.
Submitted 1 May, 2024; v1 submitted 18 November, 2023;
originally announced November 2023.
-
Provable learning of quantum states with graphical models
Authors:
Liming Zhao,
Naixu Guo,
Ming-Xing Luo,
Patrick Rebentrost
Abstract:
The complete learning of an $n$-qubit quantum state requires a number of samples exponential in $n$. Several works consider subclasses of quantum states that can be learned with polynomial sample complexity, such as stabilizer states or high-temperature Gibbs states. Other works consider a weaker sense of learning, such as PAC learning and shadow tomography. In this work, we consider learning states that are close to neural network quantum states, which can efficiently be represented by a graphical model called restricted Boltzmann machines (RBMs). To this end, we exhibit robustness results for efficient provable two-hop neighborhood learning algorithms for ferromagnetic and locally consistent RBMs. We consider the $L_p$-norm as a measure of closeness, including both total variation distance and max-norm distance in the limit. Our results allow certain quantum states to be learned with a sample complexity exponentially better than naive tomography. We hence provide new classes of efficiently learnable quantum states and apply new strategies to learn them.
Submitted 17 September, 2023;
originally announced September 2023.
-
Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network
Authors:
John Ho,
Zhao-Heng Yin,
Colin Zhang,
Nicole Guo,
Yang Ha
Abstract:
Predicting the solubility of given molecules remains crucial in the pharmaceutical industry. In this study, we revisited this extensively studied topic, leveraging the capabilities of contemporary computing resources. We employed two machine learning models: a linear regression model and a graph convolutional neural network (GCNN) model, using various experimental datasets. Both methods yielded reasonable predictions, with the GCNN model exhibiting the highest level of performance. However, the present GCNN model has limited interpretability, while the linear regression model allows scientists a more in-depth analysis of the underlying factors through feature importance analysis, although more human input and evaluation of the overall dataset are required. From the perspective of chemistry, using the linear regression model, we elucidated the impact of individual atom species and functional groups on overall solubility, highlighting the significance of comprehending how chemical structure influences chemical properties in the drug development process. We found that introducing oxygen atoms can increase the solubility of organic molecules, while almost all other heteroatoms except oxygen and nitrogen tend to decrease solubility.
Submitted 4 January, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
-
Autoencoding a Soft Touch to Learn Grasping from On-land to Underwater
Authors:
Ning Guo,
Xudong Han,
Xiaobo Liu,
Shuqiao Zhong,
Zhiyuan Zhou,
Jian Lin,
Jiansheng Dai,
Fang Wan,
Chaoyang Song
Abstract:
Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerged in a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of grasping knowledge from on-land to underwater via a vision-based soft robotic finger that learns 6D forces and torques (FT) using a Supervised Variational Autoencoder (SVAE). A high-framerate camera captures the whole-body deformations while a soft robotic finger interacts with physical objects on-land and underwater. Results show that the trained SVAE model learned a series of latent representations of the soft mechanics transferable from land to water, presenting a superior adaptation to the changing environments against commercial FT sensors. Soft, delicate, and reactive grasping enabled by tactile intelligence enhances the gripper's underwater interaction with improved reliability and robustness at a much-reduced cost, paving the path for learning-based intelligent grasping to support fundamental scientific discoveries in environmental and ocean research.
Submitted 16 August, 2023;
originally announced August 2023.
-
Distributionally Robust Circuit Design Optimization under Variation Shifts
Authors:
Yifan Pan,
Zichang He,
Nanlin Guo,
Zheng Zhang
Abstract:
Due to the significant process variations, designers have to optimize the statistical performance distribution of nano-scale IC design in most cases. This problem has been investigated for decades under the formulation of stochastic optimization, which minimizes the expected value of a performance metric while assuming that the distribution of process variation is exactly given. This paper rethinks the variation-aware circuit design optimization from a new perspective. First, we discuss the variation shift problem, which means that the actual density function of process variations almost always differs from the given model and is often unknown. Consequently, we propose to formulate the variation-aware circuit design optimization as a distributionally robust optimization problem, which does not require the exact distribution of process variations. By selecting an appropriate uncertainty set for the probability density function of process variations, we solve the shift-aware circuit optimization problem using distributionally robust Bayesian optimization. This method is validated with both a photonic IC and an electronics IC. Our optimized circuits show excellent robustness against variation shifts: the optimized circuit has excellent performance under many possible distributions of process variations that differ from the given statistical model. This work has the potential to enable a new research direction and inspire subsequent research at different levels of the EDA flow under the setting of variation shift.
Submitted 15 August, 2023;
originally announced August 2023.
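The contrast the abstract above draws between stochastic optimization and distributionally robust optimization can be written in generic form as follows. The symbols are assumptions introduced here for illustration and are not taken from the paper: $x$ denotes the design variables, $\xi$ the process variations, $f$ the performance metric to be minimized, $p_0$ the given statistical model, and $\mathcal{U}$ the uncertainty set of candidate densities.

```latex
% Stochastic optimization: the variation density p_0 is assumed to be exactly known.
\min_{x} \; \mathbb{E}_{\xi \sim p_0}\bigl[ f(x, \xi) \bigr]

% Distributionally robust optimization: guard against the worst density in the set U.
\min_{x} \; \sup_{p \in \mathcal{U}} \; \mathbb{E}_{\xi \sim p}\bigl[ f(x, \xi) \bigr]
```

When $\mathcal{U}$ contains only $p_0$, the second problem reduces to the first, so the choice of uncertainty set controls how much variation shift the design is protected against.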
-
Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers
Authors:
Ning Guo,
Tomohiro Nakatani,
Shoko Araki,
Takehiro Moriya
Abstract:
This paper introduces a novel low-latency online beamforming (BF) algorithm, named Modified Parametric Multichannel Wiener Filter (Mod-PMWF), for enhancing speech mixtures with an unknown and varying number of speakers. Although conventional BFs such as the linearly constrained minimum variance BF (LCMV BF) can enhance a speech mixture, they typically require attributes of the speech mixture such as the number of speakers and the acoustic transfer functions (ATFs) from the speakers to the microphones. When the mixture attributes are unavailable, estimating them by low-latency processing is challenging, hindering the application of the BFs to the problem. In this paper, we overcome this problem by modifying a conventional Parametric Multichannel Wiener Filter (PMWF). The proposed Mod-PMWF can adaptively form a directivity pattern that enhances all the speakers in the mixture without explicitly estimating these attributes. Our experiments show the proposed BF's effectiveness in terms of interference reduction ratios and subjective listening tests.
Submitted 29 June, 2023;
originally announced June 2023.
-
Power Allocation for the Base Matrix of Spatially Coupled Sparse Regression Codes
Authors:
Nian Guo,
Shansuo Liang,
Wei Han
Abstract:
We investigate power allocation for the base matrix of a spatially coupled sparse regression code (SC-SPARC) for reliable communications over an additive white Gaussian noise channel. A conventional SC-SPARC allocates power uniformly to the non-zero entries of its base matrix. Yet, to achieve the channel capacity with uniform power allocation, the coupling width and the coupling length of the base matrix must satisfy regularity conditions and tend to infinity as the rate approaches the capacity. For a base matrix with a pair of finite and arbitrarily chosen coupling width and coupling length, we propose a novel power allocation policy, termed V-power allocation. V-power allocation assigns more power to the outer columns of the base matrix to jumpstart the decoding process and less power to the inner columns, resembling the shape of the letter V. We show that V-power allocation outperforms uniform power allocation since it ensures successful decoding for a wider range of signal-to-noise ratios given a code rate in the limit of large blocklength. In the finite blocklength regime, we show by simulations that power allocations imitating the shape of the letter V improve the error performance of an SC-SPARC.
Submitted 13 May, 2023;
originally announced May 2023.
-
GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark
Authors:
Dongyang Li,
Ruixue Ding,
Qiang Zhang,
Zheng Li,
Boli Chen,
Pengjun Xie,
Yao Xu,
Xin Li,
Ning Guo,
Fei Huang,
Xiaofeng He
Abstract:
With the fast-developing pace of geographic applications, automatable and intelligent models must be designed to handle the large volume of information. However, few researchers focus on geographic natural language processing, and there has never been a benchmark to build a unified standard. In this work, we propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE. We collect data from open-released geographic resources and introduce six natural language understanding tasks, including geographic textual similarity on recall, geographic textual similarity on rerank, geographic elements tagging, geographic composition analysis, geographic where what cut, and geographic entity alignment. We also provide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
Submitted 10 May, 2023;
originally announced May 2023.
-
Wireless Powered Short Packet Communications with Multiple WPT Sources
Authors:
Ning Guo,
Xiaopeng Yuan,
Yulin Hu,
Anke Schmeink
Abstract:
We study a multi-source wireless power transfer (WPT) enabled network supporting multi-sensor transmissions. Activated by energy harvesting (EH) from multiple WPT sources, sensors transmit short packets to a destination with finite blocklength (FBL) codes. This work for the first time characterizes the FBL reliability for such a multi-source WPT-enabled network and provides reliability-oriented resource allocation designs, while a practical nonlinear EH model is considered. For a scenario with a fixed frame structure, we maximize the FBL reliability by optimally allocating the transmit power among the multiple sources. In particular, we first investigate the relationship between the FBL reliability and multiple WPT source power, based on which a power allocation problem is formulated. To solve the formulated non-convex problem, we introduce auxiliary variables and apply the successive convex approximation (SCA) technique to the non-convex component. Consequently, a sub-optimal solution can be obtained. Moreover, we extend our design to a dynamic frame structure scenario, i.e., the blocklengths allocated to the WPT phase and the short-packet transmission phase are adjustable, which introduces more flexibility and new challenges to the system design. We provide a joint power and blocklength allocation design to minimize the overall system error probability under the total power and blocklength constraints. To address the high-dimensional optimization problem, we introduce auxiliary variables, apply multiple variable substitutions, and use the SCA technique to reformulate and efficiently solve the problem. Finally, through numerical results, we validate our analytical model and evaluate the system performance, from which a set of guidelines for practical system design is drawn.
Submitted 23 February, 2023;
originally announced February 2023.
-
PMT-IQA: Progressive Multi-task Learning for Blind Image Quality Assessment
Authors:
Qingyi Pan,
Ning Guo,
Letu Qingge,
Jingyi Zhang,
Pei Yang
Abstract:
Blind image quality assessment (BIQA) remains challenging due to the diversity of distortion and image content variation, which complicate the distortion patterns crossing different scales and aggravate the difficulty of the regression problem for BIQA. However, existing BIQA methods often fail to consider multi-scale distortion patterns and image content, and little research has been done on learning strategies to make the regression model produce better performance. In this paper, we propose a simple yet effective Progressive Multi-Task Image Quality Assessment (PMT-IQA) model, which contains a multi-scale feature extraction module (MS) and a progressive multi-task learning module (PMT), to help the model learn complex distortion patterns and better optimize the regression issue to align with the law of human learning process from easy to hard. To verify the effectiveness of the proposed PMT-IQA model, we conduct experiments on four widely used public datasets, and the experimental results indicate that the performance of PMT-IQA is superior to the comparison approaches, and both MS and PMT modules improve the model's performance.
Submitted 3 November, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
Join the High Accuracy Club on ImageNet with A Binary Neural Network Ticket
Authors:
Nianhui Guo,
Joseph Bethge,
Christoph Meinel,
Haojin Yang
Abstract:
Binary neural networks are the extreme case of network quantization, which has long been thought of as a potential edge machine learning solution. However, the significant accuracy gap to the full-precision counterparts restricts their creative potential for mobile applications. In this work, we revisit the potential of binary neural networks and focus on a compelling but unanswered problem: how can a binary neural network achieve the crucial accuracy level (e.g., 80%) on ILSVRC-2012 ImageNet? We achieve this goal by enhancing the optimization process from three complementary perspectives: (1) We design a novel binary architecture BNext based on a comprehensive study of binary architectures and their optimization process. (2) We propose a novel knowledge-distillation technique to alleviate the counter-intuitive overfitting problem observed when attempting to train extremely accurate binary models. (3) We analyze the data augmentation pipeline for binary networks and modernize it with up-to-date techniques from full-precision models. The evaluation results on ImageNet show that BNext, for the first time, pushes the binary model accuracy boundary to 80.57% and significantly outperforms all the existing binary networks. Code and trained models are available at: https://github.com/hpi-xnor/BNext.git.
Submitted 13 December, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
-
A robust estimator of mutual information for deep learning interpretability
Authors:
Davide Piras,
Hiranya V. Peiris,
Andrew Pontzen,
Luisa Lucie-Smith,
Ningyuan Guo,
Brian Nord
Abstract:
We develop the use of mutual information (MI), a well-established metric in information theory, to interpret the inner workings of deep learning models. To accurately estimate MI from a finite number of samples, we present GMM-MI (pronounced "Jimmie"), an algorithm based on Gaussian mixture models that can be applied to both discrete and continuous settings. GMM-MI is computationally efficient, robust to the choice of hyperparameters and provides the uncertainty on the MI estimate due to the finite sample size. We extensively validate GMM-MI on toy data for which the ground truth MI is known, comparing its performance against established mutual information estimators. We then demonstrate the use of our MI estimator in the context of representation learning, working with synthetic data and physical datasets describing highly non-linear processes. We train deep learning models to encode high-dimensional data within a meaningful compressed (latent) representation, and use GMM-MI to quantify both the level of disentanglement between the latent variables, and their association with relevant physical quantities, thus unlocking the interpretability of the latent representation. We make GMM-MI publicly available.
Submitted 23 March, 2023; v1 submitted 31 October, 2022;
originally announced November 2022.
-
Poincaré Heterogeneous Graph Neural Networks for Sequential Recommendation
Authors:
Naicheng Guo,
Xiaolei Liu,
Shaoshuai Li,
Qiongxu Ma,
Kaixin Gao,
Bing Han,
Lin Zheng,
Xiaobo Guo
Abstract:
Sequential recommendation (SR) learns users' preferences by capturing the sequential patterns from the evolution of users' behaviors. As discussed in many works, user-item interactions of SR generally present the intrinsic power-law distribution, which can be ascribed to hierarchy-like structures. Previous methods usually handle such hierarchical information by making user-item sectionalization empirically under Euclidean space, which may cause distortion of user-item representation in real online scenarios. In this paper, we propose a Poincaré-based heterogeneous graph neural network named PHGR to model the sequential pattern information as well as hierarchical information contained in the data of SR scenarios simultaneously. Specifically, for the purpose of explicitly capturing the hierarchical information, we first construct a weighted user-item heterogeneous graph by aligning all the user-item interactions to improve the perception domain of each user from a global view. Then the output of the global representation is used to complement the local directed item-item homogeneous graph convolution. By defining a novel hyperbolic inner product operator, the global and local graph representation learning are directly conducted in the Poincaré ball instead of the commonly used projection operation between the Poincaré ball and Euclidean space, which could alleviate the cumulative error issue of the general bidirectional translation process. Moreover, for the purpose of explicitly capturing the sequential dependency information, we design two types of temporal attention operations under Poincaré ball space. Empirical evaluations on datasets from the public and financial industry show that PHGR outperforms several comparison methods.
Submitted 16 May, 2022;
originally announced May 2022.
-
Reliability function for streaming over a DMC with feedback
Authors:
Nian Guo,
Victoria Kostina
Abstract:
Conventionally, posterior matching is investigated in channel coding and block encoding contexts -- the source symbols are equiprobably distributed and are entirely known by the encoder before the transmission. In this paper, we consider a streaming source, whose symbols arrive at the encoder at a sequence of deterministic times. We derive the joint source-channel coding (JSCC) reliability function for streaming over a discrete memoryless channel (DMC) with feedback. We propose a novel instantaneous encoding phase that operates during the symbol arriving period and achieves the JSCC reliability function for streaming when followed by a block encoding scheme that achieves the JSCC reliability function for a classical source whose symbols are fully accessible before the transmission. During the instantaneous encoding phase, the evolving message alphabet is partitioned into groups, and the encoder determines the index of the group that contains the symbols arrived so far and applies randomization to match the distribution of the transmitted index to the capacity-achieving one. Surprisingly, the JSCC reliability function for streaming is equal to that for a fully accessible source, implying that the knowledge of the entire symbol sequence before the transmission offers no advantage regarding the reliability function. For streaming over a symmetric 2-input DMC, we propose an instantaneous small-enough difference (SED) code that not only achieves the JSCC reliability function but also can be used to stabilize an unstable linear system over a noisy channel. We design low complexity algorithms to implement both the instantaneous encoding phase and the instantaneous SED code. While the reliability function is derived for non-degenerate DMCs, for degenerate DMCs we design a code with instantaneous encoding that achieves zero error for all rates below Shannon's JSCC limit.
Submitted 30 November, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
RunnerDNA: Interpretable indicators and model to characterize human activity pattern and individual difference
Authors:
Yao Yao,
Zhuolun Wang,
Peng Luo,
Hanyu Yin,
Ziqi Liu,
Jiaqi Zhang,
Nengjing Guo,
Qingfeng Guan
Abstract:
Human activity analysis based on sensor data plays a significant role in behavior sensing, human-machine interaction, health care, and so on. Current research has focused on recognizing human activity and posture at the activity pattern level, neglecting the effective fusion of multi-sensor data and the assessment of different movement styles at the individual level, which makes it challenging to distinguish individuals performing the same movement. In this study, the concept of RunnerDNA, consisting of five interpretable indicators (balance, stride, steering, stability, and amplitude), was proposed to describe human activity at the individual level. We collected smartphone multi-sensor data from 33 volunteers who engaged in physical activities such as walking, running, and bicycling and converted the data into the five indicators of RunnerDNA. The indicators were then used to build random forest models and recognize movement activities and the identity of users. The results show that the proposed model has high accuracy in identifying activities (accuracy of 0.679) and is also effective in predicting the identity of running users. Furthermore, the accuracy of the human activity recognition model is significantly improved by combining RunnerDNA with two motion feature indicators, velocity and acceleration. Results demonstrate that RunnerDNA is an effective way to describe an individual's physical activity and helps us understand individual differences in sports style. Significant differences in balance and amplitude between men and women were also found.
Submitted 18 January, 2022;
originally announced January 2022.
-
Simple Contrastive Representation Adversarial Learning for NLP Tasks
Authors:
Deshui Miao,
Jiaqi Zhang,
Wenbo Xie,
Jian Song,
Xin Li,
Lijuan Jia,
Ning Guo
Abstract:
Self-supervised learning approaches such as contrastive learning have attracted great attention in natural language processing. Contrastive learning uses pairs of training data augmentations to build a classification task for an encoder with good representation ability. However, the construction of learning pairs for contrastive learning is much harder in NLP tasks. Previous works generate word-level changes to form pairs, but small transforms may cause notable changes in the meaning of sentences owing to the discrete and sparse nature of natural language. In this paper, adversarial training is performed to generate challenging and harder adversarial examples over the embedding space of NLP as learning pairs. Using contrastive learning improves the generalization ability of adversarial training because the contrastive loss can make the sample distribution more uniform. At the same time, adversarial training also enhances the robustness of contrastive learning. Two novel frameworks, supervised contrastive adversarial learning (SCAL) and unsupervised SCAL (USCAL), are proposed, which yield learning pairs by utilizing adversarial training for contrastive learning. The label-based loss of supervised tasks is exploited to generate adversarial examples while unsupervised tasks bring contrastive loss. To validate the effectiveness of the proposed framework, we apply it to Transformer-based models for natural language understanding, sentence semantic textual similarity and adversarial learning tasks. Experimental results on GLUE benchmark tasks show that our fine-tuned supervised method outperforms BERT$_{base}$ by over 1.75%. We also evaluate our unsupervised method on semantic textual similarity (STS) tasks, and our method gets 77.29% with BERT$_{base}$. Our approach also achieves state-of-the-art robustness results on multiple adversarial datasets for NLI tasks.
Submitted 2 December, 2021; v1 submitted 25 November, 2021;
originally announced November 2021.
-
Nonlinear transformation of complex amplitudes via quantum singular value transformation
Authors:
Naixu Guo,
Kosuke Mitarai,
Keisuke Fujii
Abstract:
Due to the linearity of quantum operations, it is not straightforward to implement nonlinear transformations on a quantum computer, making some practical tasks such as neural networks hard to achieve. In this work, we define a task called nonlinear transformation of complex amplitudes and provide an algorithm to achieve this task. Specifically, we construct a block-encoding of complex amplitudes from a state preparation unitary. This allows us to transform the complex amplitudes by using quantum singular value transformation. We evaluate the required overhead in terms of input dimension and precision, which reveals that the algorithm depends roughly on the square root of the input dimension and achieves an exponential speedup in precision compared with previous work. We also discuss its possible applications to quantum machine learning, where complex amplitudes encoding classical or quantum data are processed by the proposed method. This paper provides a promising way to introduce highly complex nonlinearity of the quantum states, which is essentially missing in quantum mechanics.
Submitted 17 May, 2024; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Dynamic Distribution of Edge Intelligence at the Node Level for Internet of Things
Authors:
Hawzhin Mohammed,
Tolulope A. Odetola,
Nan Guo,
Syed Rafay Hasan
Abstract:
In this paper, dynamic deployment of Convolutional Neural Network (CNN) architecture is proposed utilizing only IoT-level devices. By partitioning and pipelining the CNN, it horizontally distributes the computation load among resource-constrained devices (called horizontal collaboration), which in turn increases the throughput. Through partitioning, we can decrease the computation and energy consumption on individual IoT devices and increase the throughput without sacrificing accuracy. Also, by processing the data at the generation point, data privacy can be achieved. The results show that throughput can be increased by 1.55x to 1.75x for sharing the CNN into two and three resource-constrained devices, respectively.
Submitted 12 July, 2021;
originally announced July 2021.
-
HCGR: Hyperbolic Contrastive Graph Representation Learning for Session-based Recommendation
Authors:
Naicheng Guo,
Xiaolei Liu,
Shaoshuai Li,
Qiongxu Ma,
Yunan Zhao,
Bing Han,
Lin Zheng,
Kaixin Gao,
Xiaobo Guo
Abstract:
Session-based recommendation (SBR) learns users' preferences by capturing the short-term and sequential patterns from the evolution of user behaviors. Among the studies in the SBR field, graph-based approaches are relatively powerful and generally extract item information by message aggregation under Euclidean space. However, such methods can't effectively extract the hierarchical information contained among consecutive items in a session, which is critical to represent users' preferences. In this paper, we present a hyperbolic contrastive graph recommender (HCGR), a principled session-based recommendation framework involving Lorentz hyperbolic space to adequately capture the coherence and hierarchical representations of the items. Within this framework, we design a novel adaptive hyperbolic attention computation to aggregate the graph message of each user's preference in a session-based behavior sequence. In addition, contrastive learning is leveraged to optimize the item representation by considering the geodesic distance between positive and negative samples in hyperbolic space. Extensive experiments on four real-world datasets demonstrate that HCGR consistently outperforms state-of-the-art baselines by 0.43%-28.84% in terms of HitRate, NDCG and MRR.
Submitted 5 July, 2021;
originally announced July 2021.
-
BoolNet: Minimizing The Energy Consumption of Binary Neural Networks
Authors:
Nianhui Guo,
Joseph Bethge,
Haojin Yang,
Kai Zhong,
Xuefei Ning,
Christoph Meinel,
Yu Wang
Abstract:
Recent works on Binary Neural Networks (BNNs) have made promising progress in narrowing the accuracy gap of BNNs to their 32-bit counterparts. However, the accuracy gains are often based on specialized model designs using additional 32-bit components. Furthermore, almost all previous BNNs use 32-bit for feature maps and the shortcuts enclosing the corresponding binary convolution blocks, which helps to effectively maintain the accuracy, but is not friendly to hardware accelerators with limited memory, energy, and computing resources. Thus, we raise the following question: How can accuracy and energy consumption be balanced in a BNN network design? We extensively study this fundamental problem in this work and propose a novel BNN architecture without most commonly used 32-bit components: BoolNet. Experimental results on ImageNet demonstrate that BoolNet can achieve 4.6x energy reduction coupled with 1.2% higher accuracy than the commonly used BNN architecture Bi-RealNet. Code and trained models are available at: https://github.com/hpi-xnor/BoolNet.
Submitted 13 June, 2021;
originally announced June 2021.
-
Instantaneous SED coding over a DMC
Authors:
Nian Guo,
Victoria Kostina
Abstract:
In this paper, we propose a novel code for transmitting a sequence of $n$ message bits in real time over a discrete-memoryless channel (DMC) with noiseless feedback, where the message bits stream into the encoder one by one at random time instants. Similar to existing posterior matching schemes with block encoding, the encoder in our work takes advantage of the channel feedback to form channel inputs that contain the information the decoder does not yet have, and that are distributed close to the capacity-achieving input distribution. Unlike the existing posterior matching schemes, however, the encoder performs instantaneous encoding: it immediately weaves the new message bits into a continuing transmission. A posterior matching scheme by Naghshvar et al. partitions the source messages into groups so that the group posteriors have a small-enough difference (SED) to the capacity-achieving distribution, and transmits the index of the group that contains the actual message. Our code applies the SED rule to the evolving message alphabet that contains all the possible variable-length strings that the source could have emitted up to that time. Our instantaneous SED code achieves better delay-reliability tradeoffs than existing feedback codes over $2$-input DMCs: we establish this dominance both by simulations and via an analysis comparing the performance of the instantaneous SED code to Burnashev's reliability function. We also design a low-complexity code for binary symmetric channels, which we name the instantaneous type-set SED code, with complexity $O(t^4)$. Simulation results show that the gap in performance between the instantaneous SED code and the instantaneous type-set SED code is negligible.
Submitted 6 May, 2021; v1 submitted 14 March, 2021;
originally announced March 2021.
-
Deep Metric Learning-based Image Retrieval System for Chest Radiograph and its Clinical Applications in COVID-19
Authors:
Aoxiao Zhong,
Xiang Li,
Dufan Wu,
Hui Ren,
Kyungsang Kim,
Younggon Kim,
Varun Buch,
Nir Neumark,
Bernardo Bizzo,
Won Young Tak,
Soo Young Park,
Yu Rim Lee,
Min Kyu Kang,
Jung Gil Park,
Byung Seok Kim,
Woo Jin Chung,
Ning Guo,
Ittai Dayan,
Mannudeep K. Kalra,
Quanzheng Li
Abstract:
In recent years, deep learning-based image analysis methods have been widely applied in computer-aided detection, diagnosis and prognosis, and have shown their value during the public health crisis of the novel coronavirus disease 2019 (COVID-19) pandemic. Chest radiograph (CXR) has been playing a crucial role in COVID-19 patient triaging, diagnosing and monitoring, particularly in the United States. Considering the mixed and unspecific signals in CXR, an image retrieval model of CXR that provides both similar images and associated clinical information can be more clinically meaningful than a direct image diagnostic model. In this work we develop a novel CXR image retrieval model based on deep metric learning. Unlike traditional diagnostic models, which aim at learning the direct mapping from images to labels, the proposed model aims at learning the optimized embedding space of images, where images with the same labels and similar contents are pulled together. It utilizes multi-similarity loss with a hard-mining sampling strategy and an attention mechanism to learn the optimized embedding space, and provides similar images to the query image. The model is trained and validated on an international multi-site COVID-19 dataset collected from 3 different sources. Experimental results of COVID-19 image retrieval and diagnosis tasks show that the proposed model can serve as a robust solution for CXR analysis and patient management for COVID-19. The model is also tested on its transferability to a different clinical decision support task, where the pre-trained model is applied to extract image features from a new dataset without any further training. These results demonstrate that our deep metric learning-based image retrieval model is highly efficient in CXR retrieval, diagnosis and prognosis, and thus has great clinical value for the treatment and management of COVID-19 patients.
Submitted 25 November, 2020;
originally announced December 2020.
-
Video Face Recognition System: RetinaFace-mnet-faster and Secondary Search
Authors:
Qian Li,
Nan Guo,
Xiaochun Ye,
Dongrui Fan,
Zhimin Tang
Abstract:
Face recognition is widely used in real-world scenes. However, different visual environments require different methods, and face recognition is difficult in complex environments. Therefore, this paper mainly experiments on complex faces in video. First, we design an image pre-processing module to enhance images of blurry scenes or under-exposed faces. Our experimental results demonstrate that effective image pre-processing improves accuracy by 0.11%, 0.2% and 1.4% on LFW, WIDER FACE and our datasets, respectively. Second, we propose RetinaFace-mnet-faster for detection and a confidence threshold specification for face recognition, reducing the lost rate. Our experimental results show that RetinaFace-mnet-faster at 640*480 resolution improves speed by 16.7% and 70.2% on the Tesla P40 and in single-thread mode, respectively. Finally, we design a secondary search mechanism with HNSW to improve performance. Our method is suitable for large-scale datasets, and experimental results show that it is 82% faster than brute-force retrieval for single-frame detection.
Submitted 28 September, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Clinically Translatable Direct Patlak Reconstruction from Dynamic PET with Motion Correction Using Convolutional Neural Network
Authors:
Nuobei Xie,
Kuang Gong,
Ning Guo,
Zhixing Qin,
Jianan Cui,
Zhifang Wu,
Huafeng Liu,
Quanzheng Li
Abstract:
The Patlak model is widely used in 18F-FDG dynamic positron emission tomography (PET) imaging, where the estimated parametric images reveal important biochemical and physiological information. Because of better noise modeling and more information extracted from the raw sinogram, direct Patlak reconstruction has gained popularity over the indirect approach, which utilizes reconstructed dynamic PET images alone. However, raw data from dynamic PET, the prerequisite of direct Patlak methods, are rarely stored in clinics and are difficult to obtain. In addition, direct reconstruction is time-consuming due to the bottleneck of multiple-frame reconstruction. All of these impede the clinical adoption of direct Patlak reconstruction. In this work, we propose a data-driven framework which maps the dynamic PET images to high-quality motion-corrected direct Patlak images through a convolutional neural network. To handle patient motion during the long dynamic PET scan, we combine the correction with the backward/forward projection in direct reconstruction to better fit the statistical model. Results based on fifteen clinical 18F-FDG dynamic brain PET datasets demonstrate the superiority of the proposed framework over Gaussian, nonlocal mean and BM4D denoising in terms of image bias and contrast-to-noise ratio.
Submitted 12 September, 2020;
originally announced September 2020.
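As background for the indirect approach mentioned in the abstract above: standard Patlak graphical analysis assumes that, after some time t*, the ratio of tissue activity to plasma activity is linear in the normalized integral of the plasma input, C_T(t)/C_p(t) = K_i * (integral of C_p from 0 to t)/C_p(t) + V. The sketch below fits that line by ordinary least squares for a single time-activity curve. It is not the paper's CNN framework or its direct reconstruction; the function names `cumulative_trapezoid` and `patlak_fit` and the assumption that the first time point is t = 0 are illustrative choices only.

```python
def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of values over times, starting at 0."""
    integral = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        integral.append(integral[-1] + 0.5 * dt * (values[i] + values[i - 1]))
    return integral


def patlak_fit(times, plasma, tissue, t_star):
    """Least-squares Patlak fit for one time-activity curve (hypothetical helper).

    Returns (K_i, V): the slope and intercept of tissue/plasma plotted
    against integral(plasma)/plasma, using only frames with t >= t_star.
    Assumes times[0] is the start of the scan (t = 0).
    """
    plasma_integral = cumulative_trapezoid(times, plasma)
    xs, ys = [], []
    for t, cp, ct, icp in zip(times, plasma, tissue, plasma_integral):
        if t >= t_star and cp > 0:
            xs.append(icp / cp)  # Patlak abscissa ("stretched time")
            ys.append(ct / cp)   # Patlak ordinate
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Applied voxel by voxel to reconstructed dynamic frames, a fit of this kind yields the indirect parametric image; the direct methods discussed in the abstract instead estimate the parameters from the raw sinogram data.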
-
Top-Related Meta-Learning Method for Few-Shot Object Detection
Authors:
Qian Li,
Nan Guo,
Xiaochun Ye,
Duo Wang,
Dongrui Fan,
Zhimin Tang
Abstract:
Many meta-learning methods have been proposed for few-shot detection. However, most previous methods have two main problems: poor detection APs, and strong bias caused by imbalanced and insufficient datasets. Previous works mainly alleviate these issues with additional datasets, multi-relation attention mechanisms and sub-modules, but these increase cost. In this work, we find that for meta-learning the main challenges lie in related or irrelevant semantic features between categories. Therefore, based on semantic features, we propose a Top-C classification loss (i.e., TCL-C) for the classification task and a category-based grouping mechanism for the category-based meta-features obtained by the meta-model. The TCL-C exploits the true-label prediction and the most likely C-1 false classification predictions to improve detection performance on few-shot classes. According to similar appearance (i.e., visual appearance, shape, limbs, etc.) and the environment in which objects often appear, the category-based grouping mechanism splits categories into disjoint groups to make similar semantic features more compact between categories within a group and to obtain more significant differences between groups, alleviating the strong bias problem and further improving detection APs. The whole training consists of the base model and the fine-tuning phases. According to the grouping mechanism, we group the meta-feature vectors obtained by the meta-model, so that the distribution difference between groups is obvious and the difference within each group is smaller. Extensive experiments on the Pascal VOC dataset demonstrate that our method, which combines the TCL-C with category-based grouping, significantly outperforms previous state-of-the-art methods for few-shot detection. Compared with the previous competitive baseline, our method improves detection APs by almost 4% for few-shot detection.
Submitted 15 June, 2021; v1 submitted 14 July, 2020;
originally announced July 2020.
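The abstract above does not give the formula for the Top-C classification loss, so the following is only one plausible reading of "the true-label prediction and the most likely C-1 false classification predictions": a softmax cross-entropy computed over the true-class logit together with the C-1 largest logits among the wrong classes. The function name `top_c_loss` and this exact form are assumptions for illustration, not the authors' definition.

```python
import math


def top_c_loss(logits, true_index, c):
    """Cross-entropy restricted to the true class and the top C-1 false classes.

    Hypothetical reading of a Top-C loss: every other class is ignored,
    so only the hardest competitors contribute to the loss.
    """
    # Largest C-1 logits among the wrong classes.
    false_logits = sorted(
        (z for i, z in enumerate(logits) if i != true_index), reverse=True
    )[: c - 1]
    kept = [logits[true_index]] + false_logits
    # Numerically stable log-sum-exp over the kept logits.
    m = max(kept)
    log_norm = m + math.log(sum(math.exp(z - m) for z in kept))
    return log_norm - logits[true_index]
```

With C equal to the number of classes, this reduces to ordinary softmax cross-entropy; with C = 1 it is identically zero, so in this reading useful values of C lie strictly between those extremes.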
-
Collaborative Pipeline Using Opportunistic Mobile Resources via D2D for Computation-Intensive Tasks
Authors:
Terry N. Guo,
Hawzhin Mohammed,
Syed R. Hasan
Abstract:
This paper proposes a mobile pipeline computing concept in a Device-to-Device (D2D) communication setup and studies related issues, where D2D is likely based on millimeter-wave (mmWave) in the 5G mobile communication. The proposed opportunistic system employs a cluster of pipelined resource-limited devices on the move to handle real-time on-site computation-intensive tasks for which current cloud computing technology may not be suitable. The feasibility of such a system can be anticipated as high-speed and low-latency wireless technologies get mature. We present a system model by defining the architecture, basic functions, processes at both system-level and pipeline device level. A pipeline pathfinding algorithm along with a multi-task optimization framework is developed. To minimize the search space since the algorithm may need to be run on resource-limited mobile devices, an adjacency-matrix-power-based graph trimming technique is proposed and validated using simulation. A preliminary feasibility assessment of our proposed techniques is performed using experiments and computer simulation. As part of the feasibility assessment, the impact of mmWave blockage on the pipeline stability is analyzed and examined for both single-pipeline and concurrent-multiple-pipeline scenarios. Our design and analysis results provide certain insight to guide system design and lay a foundation for further work in this line.
Submitted 16 June, 2020;
originally announced June 2020.
-
Optimal Causal Rate-Constrained Sampling for a Class of Continuous Markov Processes
Authors:
Nian Guo,
Victoria Kostina
Abstract:
Consider the following communication scenario. An encoder observes a stochastic process and causally decides when and what to transmit about it, under a constraint on the expected number of bits transmitted per second. A decoder uses the received codewords to causally estimate the process in real time. The encoder and the decoder are synchronized in time. For a class of continuous Markov processes satisfying regularity conditions, we find the optimal encoding and decoding policies that minimize the end-to-end estimation mean-square error under the rate constraint. We show that the optimal encoding policy transmits a $1$-bit codeword once the process innovation passes one of two thresholds. The optimal decoder noiselessly recovers the last sample from the 1-bit codewords and codeword-generating time stamps, and uses it to decide the running estimate of the current process, until the next codeword arrives. In particular, we show the optimal causal code for the Ornstein-Uhlenbeck process and calculate its distortion-rate function. Furthermore, we show that the optimal causal code also minimizes the mean-square cost of a continuous-time control system driven by a continuous Markov process and controlled by an additive control signal.
Submitted 20 September, 2021; v1 submitted 4 February, 2020;
originally announced February 2020.
-
Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder
Authors:
Qian Li,
Nan Guo,
Xiaochun Ye,
Dongrui Fan,
Zhimin Tang
Abstract:
Recently, many methods have been proposed for object detection. However, they cannot adaptively detect objects using semantic features. In this work, based on channel and spatial attention mechanisms, we mainly analyze how different methods detect objects adaptively. Some state-of-the-art detectors combine different feature pyramids with many mechanisms to enhance multi-level semantic information. However, they require more cost. This work addresses that with an anchor-free detector that uses a shared encoder-decoder with an attention mechanism to extract shared features. We consider features of different levels from the backbone (e.g., ResNet-50) as the basis features. Then, we feed the features into a simple module, followed by a detector header to detect objects. Meanwhile, we use the semantic features to revise geometric locations, so the detector performs a pixel-semantic revision of position. More importantly, this work analyzes the impact of different pooling strategies (e.g., mean, maximum or minimum) on multi-scale objects, and finds that minimum pooling better improves detection performance on small objects. Compared with the state-of-the-art MNC based on ResNet-101 on the standard MSCOCO 2014 baseline, our method improves detection AP by 3.8%.
Submitted 28 September, 2020; v1 submitted 4 January, 2020;
originally announced January 2020.
-
Penalized-likelihood PET Image Reconstruction Using 3D Structural Convolutional Sparse Coding
Authors:
Nuobei Xie,
Kuang Gong,
Ning Guo,
Zhixin Qin,
Zhifang Wu,
Huafeng Liu,
Quanzheng Li
Abstract:
Positron emission tomography (PET) is widely used for clinical diagnosis. As PET suffers from low resolution and high noise, numerous efforts try to incorporate anatomical priors into PET image reconstruction, especially with the development of hybrid PET/CT and PET/MRI systems. In this work, we proposed a novel 3D structural convolutional sparse coding (CSC) concept for penalized-likelihood PET image reconstruction, named 3D PET-CSC. The proposed 3D PET-CSC takes advantage of the convolutional operation and manages to incorporate anatomical priors without the need of registration or supervised training. As 3D PET-CSC codes the whole 3D PET image, instead of patches, it alleviates the staircase artifacts commonly presented in traditional patch-based sparse coding methods. Moreover, we developed the residual-image and order-subset mechanisms to further reduce the computational cost and accelerate the convergence for the proposed 3D PET-CSC method. Experiments based on computer simulations and clinical datasets demonstrate the superiority of 3D PET-CSC compared with other reference methods.
Submitted 15 December, 2019;
originally announced December 2019.
-
Multi-label Detection and Classification of Red Blood Cells in Microscopic Images
Authors:
Wei Qiu,
Jiaming Guo,
Xiang Li,
Mengjia Xu,
Mo Zhang,
Ning Guo,
Quanzheng Li
Abstract:
Cell detection and cell type classification from biomedical images play an important role in high-throughput imaging and various clinical applications. While classification of single-cell samples can be performed with standard computer vision and machine learning methods, analysis of multi-label samples (regions containing congregating cells) is more challenging, as separation of individual cells can be difficult (e.g. touching cells) or even impossible (e.g. overlapping cells). As multi-instance images are common in analyzing Red Blood Cells (RBC) for Sickle Cell Disease (SCD) diagnosis, we develop and implement a multi-instance cell detection and classification framework to address this challenge. The framework first trains a region proposal model based on the Region-based Convolutional Network (RCNN) to obtain bounding boxes of regions potentially containing single or multiple cells from input microscopic images, which are extracted as image patches. High-level image features are then calculated from the image patches through a pre-trained Convolutional Neural Network (CNN) with ResNet-50 structure. Using these image features as inputs, six networks are then trained to make multi-label predictions of whether a given patch contains cells belonging to a specific cell type. As the six networks are trained with image patches consisting of both individual cells and touching/overlapping cells, they can effectively recognize cell types that are present in multi-instance image samples. Finally, for the purpose of SCD testing, we train another machine learning classifier to predict whether the given image patch contains abnormal cell types based on outputs from the six networks. Test results of the proposed framework show that it can achieve good performance in automatic cell detection and classification.
Submitted 14 December, 2019; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Predicting Alzheimer's Disease by Hierarchical Graph Convolution from Positron Emission Tomography Imaging
Authors:
Jiaming Guo,
Wei Qiu,
Xiang Li,
Xuandong Zhao,
Ning Guo,
Quanzheng Li
Abstract:
Imaging-based early diagnosis of Alzheimer's Disease (AD) has become an effective approach, especially by using nuclear medicine imaging techniques such as Positron Emission Tomography (PET). Various studies have found that PET images can be better modeled as signals (e.g. uptake of florbetapir) defined on a network (non-Euclidean) structure which is governed by its underlying graph patterns of pathological progression and metabolic connectivity. In order to effectively apply a deep learning framework to PET image analysis and overcome its limitation to the Euclidean grid, we develop a solution for 3D PET image representation and analysis under a generalized, graph-based CNN architecture (PETNet), which analyzes PET signals defined on a group-wise inferred graph structure. Computations in PETNet are defined in the non-Euclidean, graph (network) domain, as it performs feature extraction by convolution operations on spectral-filtered signals on the graph and pooling operations based on hierarchical graph clustering. The effectiveness of PETNet is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which shows improved performance over both deep learning and other machine learning-based methods.
Submitted 30 September, 2019;
originally announced October 2019.
-
Optimal Causal Rate-Constrained Sampling of the Wiener Process
Authors:
Nian Guo,
Victoria Kostina
Abstract:
We consider the following communication scenario. An encoder causally observes the Wiener process and decides when and what to transmit about it. A decoder makes real-time estimation of the process using causally received codewords. We determine the causal encoding and decoding policies that jointly minimize the mean-square estimation error, under the long-term communication rate constraint of $R$ bits per second. We show that an optimal encoding policy can be implemented as a causal sampling policy followed by a causal compressing policy. We prove that the optimal encoding policy samples the Wiener process once the innovation passes either $\sqrt{\frac{1}{R}}$ or $-\sqrt{\frac{1}{R}}$, and compresses the sign of the innovation (SOI) using a 1-bit codeword. The SOI coding scheme achieves the operational distortion-rate function, which is equal to $D^{\mathrm{op}}(R)=\frac{1}{6R}$. Surprisingly, this is significantly better than the distortion-rate tradeoff achieved in the limit of infinite delay by the best non-causal code. This is because the SOI coding scheme leverages the free timing information supplied by the zero-delay channel between the encoder and the decoder. The key to unlock that gain is the event-triggered nature of the SOI sampling policy. In contrast, the distortion-rate tradeoffs achieved with deterministic sampling policies are much worse: we prove that the causal informational distortion-rate function in that scenario is as high as $D_{\mathrm{DET}}(R) = \frac{5}{6R}$. It is achieved by the uniform sampling policy with the sampling interval $\frac{1}{R}$. In either case, the optimal strategy is to sample the process as fast as possible and to transmit 1-bit codewords to the decoder without delay.
Submitted 13 May, 2020; v1 submitted 3 September, 2019;
originally announced September 2019.
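The sign-of-innovation (SOI) policy described in the abstract above can be checked numerically. The sketch below discretizes a standard Wiener process with a small time step, sends one bit whenever the innovation (the process minus the decoder's running estimate) reaches $\pm\sqrt{1/R}$, and reports the time-averaged squared error and the achieved bit rate. The function name `simulate_soi`, the step size, the horizon and the seed are illustrative assumptions. Because of the time discretization the process slightly overshoots the threshold, so the estimates only approximate the continuous-time values $\frac{1}{6R}$ and $R$.

```python
import math
import random


def simulate_soi(rate_bits_per_s, horizon_s=1000.0, dt=2e-3, seed=0):
    """Monte Carlo sketch of sign-of-innovation sampling of a Wiener process.

    Returns (mean_squared_error, achieved_rate_in_bits_per_second).
    """
    rng = random.Random(seed)
    threshold = math.sqrt(1.0 / rate_bits_per_s)
    step_std = math.sqrt(dt)      # std. dev. of one Wiener increment
    steps = round(horizon_s / dt)

    innovation = 0.0              # W_t minus the decoder's running estimate
    squared_error_sum = 0.0
    bits_sent = 0
    for _ in range(steps):
        innovation += rng.gauss(0.0, step_std)
        if abs(innovation) >= threshold:
            # One bit carries the sign; the decoder shifts its estimate
            # by +/- threshold, so the innovation shrinks by that amount.
            innovation -= math.copysign(threshold, innovation)
            bits_sent += 1
        squared_error_sum += innovation * innovation

    mean_squared_error = squared_error_sum / steps
    achieved_rate = bits_sent / (steps * dt)
    return mean_squared_error, achieved_rate
```

For example, `simulate_soi(1.0)` should return a mean-square error near 1/6 and a rate near 1 bit per second, with the agreement improving as `dt` shrinks and `horizon_s` grows.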
-
Utilizing the Instability in Weakly Supervised Object Detection
Authors:
Yan Gao,
Boxiao Liu,
Nan Guo,
Xiaochun Ye,
Fang Wan,
Haihang You,
Dongrui Fan
Abstract:
Weakly supervised object detection (WSOD) focuses on training object detectors with only image-level annotations, and is challenging due to the gap between the supervision and the objective. Most existing approaches model WSOD as a multiple instance learning (MIL) problem. However, we observe that the result of an MIL-based detector is unstable, i.e., the most confident bounding boxes change significantly when using different initializations. We quantitatively demonstrate the instability by introducing a metric to measure it, and empirically analyze the reason for the instability. Although the instability seems harmful for the detection task, we argue that it can be utilized to improve performance by fusing the results of differently initialized detectors. To implement this idea, we propose an end-to-end framework with multiple detection branches, and introduce a simple fusion strategy. We further propose an orthogonal initialization method to increase the difference between detection branches. By utilizing the instability, we achieve 52.6% and 48.0% mAP on the challenging PASCAL VOC 2007 and 2012 datasets, both of which are new state-of-the-art results.
Submitted 14 June, 2019;
originally announced June 2019.
-
Memory effects teleportation of quantum Fisher information under decoherence
Authors:
Y. N. Guo,
K. Zeng,
P. X. Chen
Abstract:
We have investigated how memory effects influence the teleportation of quantum Fisher information (QFI) for a single-qubit system using a class of X-states as resources influenced by decoherence channels with memory, including amplitude damping, phase-damping and depolarizing channels. Resorting to the definition of QFI, we first derive the explicit analytical results of teleportation of QFI with respect to weight parameter $θ$ and phase parameter $φ$ under the decoherence channels. Component percentages, the teleportation of QFI for a two-qubit entanglement system has also been addressed. The remarkable similarities and differences between these two situations are also analyzed in detail and some significant results are presented.
Submitted 14 February, 2019;
originally announced February 2019.
-
Network Modeling and Pathway Inference from Incomplete Data ("PathInf")
Authors:
Xiang Li,
Qitian Chen,
Xing Wang,
Ning Guo,
Nan Wu,
Quanzheng Li
Abstract:
In this work, we developed a network inference method for incomplete data ("PathInf"), as massive and non-uniformly distributed missing values are a common challenge in practical problems. PathInf is a two-stage inference model. In the first stage, it applies a data summarization model based on maximum likelihood to deal with the massive distributed missing values by transforming the observation-wise items in the data into a state matrix. In the second stage, the transition pattern (i.e. pathway) among variables is inferred as a graph inference problem solved by a greedy algorithm with constraints. The proposed method was validated and compared with the state-of-the-art Bayesian network method on simulation data, and showed consistently superior performance. By applying PathInf to lymph vascular metastasis data, we obtained the holistic pathways of lymph node metastasis, with novel discoveries on jumping metastasis among nodes that are physically apart. The discovery indicates the possible presence of sentinel node groups in the lung lymph nodes, which have been previously speculated yet never found. The pathway map can also improve the current dissection examination protocol for better individualized treatment planning, higher diagnostic accuracy and reduced patient trauma.
Submitted 1 October, 2018;
originally announced October 2018.
-
Learning Personalized Representation for Inverse Problems in Medical Imaging Using Deep Neural Network
Authors:
Kuang Gong,
Kyungsang Kim,
Jianan Cui,
Ning Guo,
Ciprian Catana,
Jinyi Qi,
Quanzheng Li
Abstract:
Recently, deep neural networks have been widely and successfully applied in computer vision tasks and have attracted growing interest in medical imaging. One barrier to the application of deep neural networks to medical imaging is the need for large amounts of prior training pairs, which is not always feasible in clinical practice. In this work we propose a personalized representation learning framework where no prior training pairs are needed, but only the patient's own prior images. The representation is expressed using a deep neural network with the patient's prior images as network input. We then apply this novel image representation to inverse problems in medical imaging, in which the original inverse problem is formulated as a constrained optimization problem and solved using the alternating direction method of multipliers (ADMM) algorithm. Anatomically guided brain positron emission tomography (PET) image reconstruction and image denoising are employed as examples to demonstrate the effectiveness of the proposed framework. Quantification results based on simulation and real datasets show that the proposed personalized representation framework outperforms other widely adopted methods.
Submitted 4 July, 2018;
originally announced July 2018.
-
Medical Image Segmentation Based on Multi-Modal Convolutional Neural Network: Study on Image Fusion Schemes
Authors:
Zhe Guo,
Xiang Li,
Heng Huang,
Ning Guo,
Quanzheng Li
Abstract:
Image analysis using more than one modality (i.e. multi-modal) has been increasingly applied in the field of biomedical imaging. One of the challenges in performing the multimodal analysis is that there exist multiple schemes for fusing the information from different modalities, where such schemes are application-dependent and lack a unified framework to guide their designs. In this work we firstly propose a conceptual architecture for the image fusion schemes in supervised biomedical image analysis: fusing at the feature level, fusing at the classifier level, and fusing at the decision-making level. Further, motivated by the recent success in applying deep learning for natural image analysis, we implement the three image fusion schemes above based on the Convolutional Neural Network (CNN) with varied structures, and combined into a single framework. The proposed image segmentation framework is capable of analyzing the multi-modality images using different fusing schemes simultaneously. The framework is applied to detect the presence of soft tissue sarcoma from the combination of Magnetic Resonance Imaging (MRI), Computed Tomography (CT) and Positron Emission Tomography (PET) images. It is found from the results that while all the fusion schemes outperform the single-modality schemes, fusing at the feature level can generally achieve the best performance in terms of both accuracy and computational cost, but also suffers from the decreased robustness in the presence of large errors in any image modalities.
Submitted 2 November, 2017; v1 submitted 31 October, 2017;
originally announced November 2017.
-
Self-paced Convolutional Neural Network for Computer Aided Detection in Medical Imaging Analysis
Authors:
Xiang Li,
Aoxiao Zhong,
Ming Lin,
Ning Guo,
Mu Sun,
Arkadiusz Sitek,
Jieping Ye,
James Thrall,
Quanzheng Li
Abstract:
Tissue characterization has long been an important component of Computer Aided Diagnosis (CAD) systems for automatic lesion detection and further clinical planning. Motivated by the superior performance of deep learning methods on various computer vision problems, there has been increasing work applying deep learning to medical image analysis. However, the development of a robust and reliable deep learning model for computer-aided diagnosis is still highly challenging due to the combination of the high heterogeneity in the medical images and the relative lack of training samples. Specifically, annotation and labeling of the medical images is much more expensive and time-consuming than other applications and often involves manual labor from multiple domain experts. In this work, we propose a multi-stage, self-paced learning framework utilizing a convolutional neural network (CNN) to classify Computed Tomography (CT) image patches. The key contribution of this approach is that we augment the size of training samples by refining the unlabeled instances with a self-paced learning CNN. By implementing the framework on high performance computing servers including the NVIDIA DGX1 machine, we obtained the experimental result, showing that the self-pace boosted network consistently outperformed the original network even with very scarce manual labels. The performance gain indicates that applications with limited training samples such as medical image analysis can benefit from using the proposed framework.
Submitted 19 July, 2017;
originally announced July 2017.
-
A Combinatorial Methodology for Optimizing Non-Binary Graph-Based Codes: Theoretical Analysis and Applications in Data Storage
Authors:
Ahmed Hareedy,
Chinmayi Lanka,
Nian Guo,
Lara Dolecek
Abstract:
Non-binary (NB) low-density parity-check (LDPC) codes are graph-based codes that are increasingly being considered as a powerful error correction tool for modern dense storage devices. The increasing levels of asymmetry incorporated by the channels underlying modern dense storage systems exacerbate the error floor problem. In recent research, the weight consistency matrix (WCM) framework was introduced as an effective NB-LDPC code optimization methodology that is suitable for modern Flash memory and magnetic recording (MR) systems. In this paper, we provide the in-depth theoretical analysis needed to understand and properly apply the WCM framework. We focus on general absorbing sets of type two (GASTs). In particular, we introduce a novel tree representation of a GAST called the unlabeled GAST tree, using which we prove that the WCM framework is optimal. Then, we enumerate the WCMs. We demonstrate the significance of the savings achieved by the WCM framework in the number of matrices processed to remove a GAST. Moreover, we provide a linear-algebraic analysis of the null spaces of WCMs associated with a GAST. We derive the minimum number of edge weight changes needed to remove a GAST via its WCMs, along with how to choose these changes. Additionally, we propose a new set of problematic objects, namely the oscillating sets of type two (OSTs), which contribute to the error floor of NB-LDPC codes with even column weights on asymmetric channels, and we show how to customize the WCM framework to remove OSTs. We also extend the domain of the WCM framework applications by demonstrating its benefits in optimizing column weight 5 codes, codes used over Flash channels with soft information, and spatially-coupled codes. The performance gains achieved via the WCM framework range between 1 and nearly 2.5 orders of magnitude in the error floor region over interesting channels.
Submitted 20 September, 2019; v1 submitted 22 June, 2017;
originally announced June 2017.
-
Demonstration of Spectrum Sensing with Blindly Learned Feature
Authors:
Peng Zhang,
Robert Qiu,
Nan Guo
Abstract:
Spectrum sensing is essential in cognitive radio. By defining the leading eigenvector as the feature, we introduce a blind feature learning algorithm (FLA) and a feature template matching (FTM) algorithm that uses the learned feature for spectrum sensing. We implement both algorithms on a Lyrtech software-defined radio platform. A hardware experiment is performed to verify that the feature can be learned blindly. We compare FTM with a blind detector in hardware, and the results show that the detection performance of FTM is about 3 dB better.
Submitted 24 February, 2011;
originally announced February 2011.