Search | arXiv e-print repository

PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency

Authors: Shaokai Lin, Erling Jellum, Mirco Theile, Tassilo Tanneberger, Binqi Sun, Chadlia Jerad, Ruomu Xu, Guangyu Feng, Christian Menard, Marten Lohstroh, Jeronimo Castrillon, Sanjit Seshia, Edward Lee

Abstract: This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with well-defined worst-case timing bounds. The PretVM provides a clean separation between application logic and coordination logic, yielding more analyzable program executions. Experiments compare the PretVM against the default (more dynamic) LF scheduler and show that it delivers time-accurate deterministic execution.

Submitted 25 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.15187 [pdf, other]

Chance-Constrained Economic Dispatch with Flexible Loads and RES

Authors: Tian Liu, Bo Sun, Xiaoqi Tan, Danny H. K. Tsang

Abstract: With the increasing penetration of intermittent renewable energy sources (RESs), it becomes increasingly challenging to maintain the supply-demand balance of power systems by solely relying on the generation side. To combat the volatility led by the uncertain RESs, demand-side management by leveraging the multi-dimensional flexibility (MDF) has been recognized as an economic and efficient approach. Thus, it is important to integrate MDF into existing power systems. In this paper, we propose an enhanced day-ahead energy market, where the MDFs of aggregate loads are traded to minimize the generation cost and mitigate the volatility of locational marginal prices (LMPs) in the transmission network. We first explicitly capture the negative impact of the uncertainty from RESs on the day-ahead market by a chance-constrained economic dispatch problem (CEDP). Then, we propose a bidding mechanism for the MDF of the aggregate loads and combine this mechanism into the CEDP for the day-ahead market. Through multiple case studies, we show that MDF from load aggregators can reduce the volatility of LMPs. In addition, we identify the values of the different flexibilities in the MDF bids, which provide useful insights into the design of more complex MDF markets.

Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.15093 [pdf, other]

Real-Time and Accurate: Zero-shot High-Fidelity Singing Voice Conversion with Multi-Condition Flow Synthesis

Authors: Hui Li, Hongyu Wang, Zhijin Chen, Bohan Sun, Bo Li

Abstract: Singing voice conversion converts a source singing voice into a target singing voice while preserving the content. Currently, flow-based models can complete the task of voice conversion, but they struggle to effectively extract latent variables in the more rhythmically rich and emotionally expressive task of singing voice conversion, while also facing issues with low efficiency in speech processing. In this paper, we propose a high-fidelity flow-based model based on multi-decoupling feature constraints called RASVC, which enhances the capture of vocal details by integrating multiple latent attribute encoders. We also use multi-stream inverse short-time Fourier transform (MS-iSTFT) to enhance the speed of speech processing by skipping some complicated decoder processing steps. We compare the synthesized singing voice with other models from multiple dimensions, and our proposed model is highly consistent with the current state-of-the-art. A demo is available at https://lazycat1119.github.io/RASVC-demo/.

Submitted 9 September, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures

arXiv:2402.19085 [pdf, other]

Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment

Authors: Yiju Guo, Ganqu Cui, Lifan Yuan, Ning Ding, Jiexin Wang, Huimin Chen, Bowen Sun, Ruobing Xie, Jie Zhou, Yankai Lin, Zhiyuan Liu, Maosong Sun

Abstract: Alignment in artificial intelligence pursues the consistency between model responses and human preferences as well as values. In practice, the multifaceted nature of human preferences inadvertently introduces what is known as the "alignment tax": a compromise where enhancements in alignment within one objective (e.g., harmlessness) can diminish performance in others (e.g., helpfulness). However, existing alignment techniques are mostly unidirectional, leading to suboptimal trade-offs and poor flexibility over various objectives. To navigate this challenge, we argue the prominence of grounding LLMs with evident preferences. We introduce controllable preference optimization (CPO), which explicitly specifies preference scores for different objectives, thereby guiding the model to generate responses that meet the requirements. Our experimental analysis reveals that the aligned models can provide responses that match various preferences among the "3H" (helpfulness, honesty, harmlessness) desiderata. Furthermore, by introducing diverse data and alignment goals, we surpass baseline methods in aligning with single objectives, hence mitigating the impact of the alignment tax and achieving Pareto improvements in multi-objective alignment.

Submitted 29 February, 2024; originally announced February 2024.

arXiv:2209.09776 [pdf, ps, other]

IRS Assisted NOMA Aided Mobile Edge Computing with Queue Stability: Heterogeneous Multi-Agent Reinforcement Learning

Authors: Jiadong Yu, Yang Li, Xiaolan Liu, Bo Sun, Yuan Wu, Danny H. K. Tsang

Abstract: By employing powerful edge servers for data processing, mobile edge computing (MEC) has been recognized as a promising technology to support emerging computation-intensive applications. Besides, a non-orthogonal multiple access (NOMA)-aided MEC system can further enhance the spectral efficiency with massive task offloading. However, with more dynamic devices brought online and the uncontrollable stochastic channel environment, it is desirable to deploy an appealing technique, i.e., intelligent reflecting surfaces (IRS), in the MEC system to flexibly tune the communication environment and improve the system energy efficiency. In this paper, we investigate the joint offloading, communication and computation resource allocation for an IRS-assisted NOMA MEC system. We first formulate a mixed integer energy efficiency maximization problem with a system queue stability constraint. We then propose the Lyapunov-function-based Mixed Integer Deep Deterministic Policy Gradient (LMIDDPG) algorithm, which is based on the centralized reinforcement learning (RL) framework. To be specific, we design the mixed integer action space mapping, which contains both continuous mapping and integer mapping. Moreover, the reward function is defined as the upper bound of the Lyapunov drift-plus-penalty function. To enable end devices (EDs) to choose actions independently at the execution stage, we further propose the Heterogeneous Multi-agent LMIDDPG (HMA-LMIDDPG) algorithm based on a distributed RL framework with homogeneous EDs and a heterogeneous base station (BS) as heterogeneous multi-agent. Numerical results show that our proposed algorithms can achieve superior energy efficiency performance to the benchmark algorithms while maintaining the queue stability. In particular, the distributed structure HMA-LMIDDPG can acquire more energy efficiency gain than the centralized structure LMIDDPG.

Submitted 20 September, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2206.04264 [pdf, other]

Formation Tracking for a Multi-Auv System Based on an Adaptive Sliding Mode Method in the Water Flow Environment

Authors: Xin Li, Daqi Zhu, Bing Sun, Qi Chen, Wenyang Gan, Zhigang Li

Abstract: In this paper, formation tracking for a multi-AUV system (MAS) using an improved adaptive sliding mode control method is studied in the three-dimensional (3-D) underwater environment. Firstly, the kinematic model and the dynamic model of the AUVs are given with six degrees of freedom (6-DOF) considered. Then, a control law based on the mathematical model of the AUVs is proposed using the improved sliding mode method. A second-order sliding mode control method is adopted to eliminate the chattering phenomenon of the controller. Thirdly, considering the water flow in the underwater working environment of the AUVs, an adaptive module is added to the controller. With the adaptive approach, the finite disturbances caused by water flow can be handled by the controller. The proposed method achieves stability by substituting an adaptive continuous term for the switching term in the controller. At last, a robust sliding mode controller with continuous model predictive control strategy for the multi-AUV system is developed to achieve leader-follower formation tracking under the presence of bounded flow disturbances, and simulations are implemented to confirm the effectiveness of the proposed method.

Submitted 17 January, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

arXiv:2203.08921 [pdf, other]

Hybrid Pixel-Unshuffled Network for Lightweight Image Super-Resolution

Authors: Bin Sun, Yulun Zhang, Songyao Jiang, Yun Fu

Abstract: Convolutional neural network (CNN) has achieved great success on image super-resolution (SR). However, most deep CNN-based SR models take massive computations to obtain high performance. Downsampling features for multi-resolution fusion is an efficient and effective way to improve the performance of visual recognition. Still, it is counter-intuitive in the SR task, which needs to project a low-resolution input to high-resolution. In this paper, we propose a novel Hybrid Pixel-Unshuffled Network (HPUN) by introducing an efficient and effective downsampling module into the SR task. The network contains pixel-unshuffled downsampling and Self-Residual Depthwise Separable Convolutions. Specifically, we utilize pixel-unshuffle operation to downsample the input features and use grouped convolution to reduce the channels. Besides, we enhance the depthwise convolution's performance by adding the input feature to its output. Experiments on benchmark datasets show that our HPUN achieves and surpasses the state-of-the-art reconstruction performance with fewer parameters and computation costs.

Submitted 29 November, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

arXiv:2203.08477 [pdf, ps, other]

Emotion Recognition using Machine Learning and ECG signals

Authors: Bo Sun, Zihuai Lin

Abstract: Various emotions can produce variations in electrocardiograph (ECG) signals; distinct emotions can be distinguished by different changes in ECG signals. This study is about emotion recognition using ECG signals. Data for four emotions, namely happy, exciting, calm, and tense, is gathered. The raw data is then de-noised with a finite impulse filter. We use the Discrete Cosine Transform (DCT) to extract characteristics from the obtained data to increase the accuracy of emotion recognition. The classifiers Support Vector Machine (SVM), Random Forest, and K-NN are explored. To find the optimal parameters for the SVM classifier, the Particle Swarm Optimization (PSO) technique is used. The results of the comparison of these classification methods demonstrate that the SVM approach has a greater accuracy in emotion recognition, which may be applied in practice.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2202.13650 [pdf]

Improved Sensing and Positioning via 5G and mmWave radar for Airport Surveillance

Authors: Bo Tan, Elena Simona Lohan, Bo Sun, Wenbo Wang, Taylan Yesilyurt, Christophe Morlaas, Carlos David Morales Pena, Kanaan Abdo, Fathia Ben Slama, Alexandre Simonin, Mohamed Ellejmi

Abstract: This paper explores an integrated approach for improved sensing and positioning with applications in air traffic management (ATM) and in the Advanced Surface Movement Guidance and Control System (A-SMGCS). The integrated approach includes the synergy of 3D Vector Antenna with the novel time-of-arrival and angle-of-arrival estimate methods for accurate positioning, combining the sensing on the sub-6GHz and mmWave spectrum for the enhanced non-cooperative surveillance. For the positioning scope, both uplink and downlink 5G reference signals are investigated and their performance is evaluated. For the non-cooperative sensing scope, a novel 5G-signal-based imaging function is proposed and verified with realistic airport radio-propagation modelling and the AI-based targets tracking-and-motion recognition are investigated. The 5G-based imaging and mmWave radar based detection can be potentially fused to enhance surveillance in the airport. The work is being done within the European-funded project NewSense and it delves into the 5G, Vector Antennas, and mmWave capabilities for future ATM solutions.

Submitted 28 February, 2022; originally announced February 2022.

Comments: 8 pages, 15 figures

arXiv:2202.02487 [pdf, other]

An Olfactory EEG Signal Classification Network Based on Frequency Band Feature Extraction

Authors: Biao Sun, Zhigang Wei, Pei Liang, Huirang Hou

Abstract: Classification of olfactory-induced electroencephalogram (EEG) signals has shown great potential in many fields. Since different frequency bands within the EEG signals contain different information, extracting specific frequency bands for classification performance is important. Moreover, due to the large inter-subject variability of the EEG signals, extracting frequency bands with subject-specific information rather than general information is crucial. Considering these, the focus of this letter is to classify the olfactory EEG signals by exploiting the spectral-domain information of specific frequency bands. In this letter, we present an olfactory EEG signal classification network based on frequency band feature extraction. A frequency band generator is first designed to extract frequency bands via the sliding window technique. Then, a frequency band attention mechanism is proposed to optimize frequency bands for a specific subject adaptively. Last, a convolutional neural network (CNN) is constructed to extract the spatio-spectral information and predict the EEG category. Comparison experiment results reveal that the proposed method outperforms a series of baseline methods in terms of both classification quality and inter-subject robustness. Ablation experiment results demonstrate the effectiveness of each component of the proposed method.

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2201.03005 [pdf]

Using Wi-Fi Signal as Sensing Medium: Passive Radar, Channel State Information and Followups

Authors: Bo Tan, Bo Sun

Abstract: The idea of exploiting the Wi-Fi bursts as the medium for sensing purposes, particularly for the human targets in the indoor environment, was cultivated in both radar and computer science communities and it has become a noticeable research genre with cross-disciplinary impact in security, healthcare, human-machine interaction, etc. This article comparatively introduces passive radar based and channel state information (CSI) based approaches. For each means, the primary design principles, signal processing and representative application scenarios are shown. At last, some opportunities and challenges of Wi-Fi sensing are pointed out for the sake of stepping closer to the practitioners and end-users.

Submitted 9 January, 2022; originally announced January 2022.

Comments: 4 pages, 3 figures

arXiv:2109.14863 [pdf, other]

HLIC: Harmonizing Optimization Metrics in Learned Image Compression by Reinforcement Learning

Authors: Baocheng Sun, Meng Gu, Dailan He, Tongda Xu, Yan Wang, Hongwei Qin

Abstract: Learned image compression is making good progress in recent years. Peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM) are the two most popular evaluation metrics. As different metrics only reflect certain aspects of human perception, works in this field normally optimize two models using PSNR and MS-SSIM as loss function separately, which is suboptimal and makes it difficult to select the model with best visual quality or overall performance. Towards solving this problem, we propose to Harmonize optimization metrics in Learned Image Compression (HLIC) using online loss function adaptation by reinforcement learning. By doing so, we are able to leverage the advantages of both PSNR and MS-SSIM, achieving better visual quality and higher VMAF score. To our knowledge, our work is the first to explore automatic loss function adaptation for harmonizing optimization metrics in low level vision tasks like learned image compression.

Submitted 30 September, 2021; originally announced September 2021.

Comments: working paper

arXiv:2107.07161 [pdf, other]

Deep Learning Based OFDM Channel Estimation Using Frequency-Time Division and Attention Mechanism

Authors: Ang Yang, Peng Sun, Tamrakar Rakesh, Bule Sun, Fei Qin

Abstract: In this paper, we propose a frequency-time division network (FreqTimeNet) to improve the performance of deep learning (DL) based OFDM channel estimation. This FreqTimeNet is designed based on the orthogonality between the frequency domain and the time domain. In FreqTimeNet, the input is processed by parallel frequency blocks and parallel time blocks sequentially. By introducing the attention mechanism using the SNR information, an attention based FreqTimeNet (AttenFreqTimeNet) is proposed. Using 3rd Generation Partnership Project (3GPP) channel models, the mean square error (MSE) performance of FreqTimeNet and AttenFreqTimeNet under different scenarios is evaluated. A method for constructing mixed training data is proposed, which could address the generalization problem in DL. It is observed that AttenFreqTimeNet outperforms FreqTimeNet, and FreqTimeNet outperforms other DL networks with reasonable complexity.

Submitted 30 September, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

Comments: 2021 IEEE Globecom Workshops (GC Wkshps): Workshop on Towards Native-AI Wireless Networks

arXiv:2107.04847 [pdf]

doi 10.1002/mp.15287

Weaving Attention U-net: A Novel Hybrid CNN and Attention-based Method for Organs-at-risk Segmentation in Head and Neck CT Images

Authors: Zhuangzhuang Zhang, Tianyu Zhao, Hiram Gay, Weixiong Zhang, Baozhou Sun

Abstract: In radiotherapy planning, manual contouring is labor-intensive and time-consuming. Accurate and robust automated segmentation models improve the efficiency and treatment outcome. We aim to develop a novel hybrid deep learning approach, combining convolutional neural networks (CNNs) and the self-attention mechanism, for rapid and accurate multi-organ segmentation on head and neck computed tomography (CT) images. Head and neck CT images with manual contours of 115 patients were retrospectively collected and used. We set the training/validation/testing ratio to 81/9/25 and used the 10-fold cross-validation strategy to select the best model parameters. The proposed hybrid model segmented ten organs-at-risk (OARs) altogether for each case. The performance of the model was evaluated by three metrics, i.e., the Dice Similarity Coefficient (DSC), Hausdorff distance 95% (HD95), and mean surface distance (MSD). We also tested the performance of the model on the Head and Neck 2015 challenge dataset and compared it against several state-of-the-art automated segmentation algorithms. The proposed method generated contours that closely resemble the ground truth for ten OARs. Our results of the new Weaving Attention U-net demonstrate superior or similar performance on the segmentation of head and neck CT images.

Submitted 22 September, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

Comments: 12 pages, 5 figures

arXiv:2103.15306 [pdf, other]

Checkerboard Context Model for Efficient Learned Image Compression

Authors: Dailan He, Yaoyan Zheng, Baocheng Sun, Yan Wang, Hongwei Qin

Abstract: For learned image compression, the autoregressive context model has proved effective in improving the rate-distortion (RD) performance because it helps remove spatial redundancies among latent representations. However, the decoding process must be done in a strict scan order, which breaks the parallelization. We propose a parallelizable checkerboard context model (CCM) to solve the problem. Our two-pass checkerboard context calculation eliminates such limitations on spatial locations by re-organizing the decoding order. Speeding up the decoding process more than 40 times in our experiments, it achieves significantly improved computational efficiency with almost the same rate-distortion performance. To the best of our knowledge, this is the first exploration on parallelization-friendly spatial context model for learned image compression.

Submitted 1 April, 2021; v1 submitted 28 March, 2021; originally announced March 2021.

Comments: CVPR 2021

arXiv:2103.14708 [pdf, other]

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

Authors: Bo Sun, Junchi Yan, Xiao Zhou, Yinqiang Zheng

Abstract: To reconstruct spectral signals from multi-channel observations, in particular trichromatic RGBs, has recently emerged as a promising alternative to traditional scanning-based spectral imager. It has been proven that the reconstruction accuracy relies heavily on the spectral response of the RGB camera in use. To improve accuracy, data-driven algorithms have been proposed to retrieve the best response curves of existing RGB cameras, or even to design brand new three-channel response curves. Instead, this paper explores the filter-array based color imaging mechanism of existing RGB cameras, and proposes to design the IR-cut filter properly for improved spectral recovery, which stands out as an in-between solution with better trade-off between reconstruction accuracy and implementation complexity. We further propose a deep learning based spectral reconstruction method, which allows to recover the illumination spectrum as well. Experiment results with both synthetic and real images under daylight illumination have shown the benefits of our IR-cut filter tuning method and our illumination-aware spectral reconstruction method.

Submitted 26 March, 2021; originally announced March 2021.

Comments: CVPR 2021 - Oral

arXiv:2101.10444 [pdf, ps, other]

GnetSeg: Semantic Segmentation Model Optimized on a 224mW CNN Accelerator Chip at the Speed of 318FPS

Authors: Baohua Sun, Weixiong Lin, Hao Sha, Jiapeng Su

Abstract: Semantic segmentation is the task to cluster pixels on an image belonging to the same class. It is widely used in the real-world applications including autonomous driving, medical imaging analysis, industrial inspection, smartphone camera for person segmentation and so on. Accelerating the semantic segmentation models on the mobile and edge devices are practical needs for the industry. Recent years have witnessed the wide availability of CNN (Convolutional Neural Networks) accelerators. They have the advantages on power efficiency, inference speed, which are ideal for accelerating the semantic segmentation models on the edge devices. However, the CNN accelerator chips also have the limitations on flexibility and memory. In addition, the CPU load is very critical because the CNN accelerator chip works as a co-processor with a host CPU. In this paper, we optimize the semantic segmentation model in order to fully utilize the limited memory and the supported operators on the CNN accelerator chips, and at the same time reduce the CPU load of the CNN model to zero. The resulting model is called GnetSeg. Furthermore, we propose the integer encoding for the mask of the GnetSeg model, which minimizes the latency of data transfer between the CNN accelerator and the host CPU. The experimental result shows that the model running on the 224mW chip achieves the speed of 318FPS with excellent accuracy for applications such as person segmentation.

Submitted 9 January, 2021; originally announced January 2021.

Comments: 7 pages, 3 figures, and 2 tables

arXiv:2012.11261 [pdf, other]

doi 10.1109/TSG.2021.3094719

Learning-Based Predictive Control via Real-Time Aggregate Flexibility

Authors: Tongxin Li, Bo Sun, Yue Chen, Zixin Ye, Steven H. Low, Adam Wierman

Abstract: Aggregators have emerged as crucial tools for the coordination of distributed, controllable loads. To be used effectively, an aggregator must be able to communicate the available flexibility of the loads they control, also known as the aggregate flexibility, to a system operator. However, most existing aggregate flexibility measures are slow-timescale estimations, and much less attention has been paid to real-time coordination between an aggregator and an operator. In this paper, we consider solving an online optimization in a closed-loop system and present a design of real-time aggregate flexibility feedback, termed the maximum entropy feedback (MEF). In addition to deriving analytic properties of the MEF, combining learning and control, we show that it can be approximated using reinforcement learning and used as a penalty term in a novel control algorithm, the penalized predictive control (PPC), which modifies vanilla model predictive control (MPC). The benefits of our scheme are: (1) Efficient communication. An operator running PPC does not need to know the exact states and constraints of the loads, but only the MEF. (2) Fast computation. The PPC often has far fewer variables than an MPC formulation. (3) Lower costs. We show that under certain regularity assumptions, the PPC is optimal. We illustrate the efficacy of the PPC using a dataset from an adaptive electric vehicle charging network and show that PPC outperforms classical MPC.

Submitted 31 May, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: 13 pages, 5 figures, extension of arXiv:2006.13814

arXiv:2012.02033 [pdf, ps, other]

SuperOCR: A Conversion from Optical Character Recognition to Image Captioning

Authors: Baohua Sun, Michael Lin, Hao Sha, Lin Yang

Abstract: Optical Character Recognition (OCR) has many real-world applications. Existing methods normally detect where the characters are, and then recognize the character at each detected location. Thus the accuracy of character recognition is impacted by the performance of character detection. In this paper, we propose a method for recognizing characters without detecting the location of each character. This is done by converting the OCR task into an image captioning task. One advantage of the proposed method is that labeled bounding boxes for the characters are not needed during training. The experimental results show the proposed method outperforms the existing methods on both the license plate recognition and the water meter character recognition tasks. The proposed method is also deployed on a low-power (300mW) CNN accelerator chip connected to a Raspberry Pi 3 for on-device applications.

Submitted 21 November, 2020; originally announced December 2020.

Comments: 8 pages, 2 figures, 2 tables

arXiv:2011.06984 [pdf]

Metastatic Cancer Image Classification Based On Deep Learning Method

Authors: Guanwen Qiu, Xiaobing Yu, Baolin Sun, Yunpeng Wang, Lipei Zhang

Abstract: Using histopathological images to automatically classify cancer is a difficult task for accurate cancer detection, especially when identifying metastatic cancer in small image patches obtained from larger digital pathology scans. Computer diagnosis technology has attracted wide attention from researchers. In this paper, we propose a novel method which combines the deep learning algorithm in image classification, the DenseNet169 framework, and the Rectified Adam (RAdam) optimization algorithm. The connectivity pattern of DenseNet is direct connections from any layer to all consecutive layers, which can effectively improve the information flow between different layers. RAdam does not easily fall into a local optimal solution, and it can converge quickly in model training. The experimental results show that our model achieves superior performance over other classical convolutional neural network approaches, such as Vgg19, Resnet34, and Resnet50. In particular, the Auc-Roc score of our DenseNet169 model is 1.77% higher than that of the Vgg19 model, and the Accuracy score is 1.50% higher. Moreover, we also study the relationship between loss value and batches processed during the training stage and validation stage, and obtain some important and interesting findings.

Submitted 13 November, 2020; originally announced November 2020.

Comments: 4 pages, 3 figures, 1 table, accepted by ICCECE

arXiv:2011.05182 [pdf, other]

Quantitative imaging for complex-objects via a single-pixel detector

Authors: Xianye Li, Yafei Sun, Yikang He, Xun Li, Baoqing Sun

Abstract: Quantitative phase imaging (QPI) is important in many applications such as microscopy and crystallography. To quantitatively reveal phase information, one can either employ interference to map the phase distribution into intensity fringes, or analyze intensity-only diffraction patterns through phase retrieval algorithms. Traditionally, both approaches use pixelated detectors. In this work, a novel QPI scheme inspired by the single-pixel camera (SPC) is reported, which adopts the SPC principle of retrieving images through structured illumination and corresponding single-pixel signals. Particularly for complex-valued imaging, the structured illumination is performed in the phase domain, and a point detector with restricted sensor size detects the intensity of the zero-frequency area. Based on the illumination structures and point signals, a complex image is reconstructed by running a phase retrieval algorithm. This approach is universal for various wavelengths, and needs no a priori information about the targets. Both simulation and experiment show that our single-pixel QPI scheme performs well even with objects having an extremely rough phase distribution.

Submitted 10 November, 2020; originally announced November 2020.

arXiv:2008.04488 [pdf]

doi 10.1002/mp.14580

ARPM-net: A novel CNN-based adversarial method with Markov Random Field enhancement for prostate and organs at risk segmentation in pelvic CT images

Authors: Zhuangzhuang Zhang, Tianyu Zhao, Hiram Gay, Weixiong Zhang, Baozhou Sun

Abstract: Purpose: This research aims to develop a novel CNN-based adversarial deep learning method to improve and expedite the multi-organ semantic segmentation of CT images, and to generate accurate contours on pelvic CT images. Methods: Planning CT and structure datasets for 120 patients with intact prostate cancer were retrospectively selected and divided for 10-fold cross-validation. The proposed adversarial multi-residual multi-scale pooling Markov Random Field (MRF) enhanced network (ARPM-net) implements an adversarial training scheme. A segmentation network and a discriminator network were trained jointly, and only the segmentation network was used for prediction. The segmentation network integrates a newly designed MRF block into a variation of multi-residual U-net. The discriminator takes the product of the original CT and the prediction/ground-truth as input and classifies the input as fake/real. The segmentation network and discriminator network can be trained jointly as a whole, or the discriminator can be used for fine-tuning after the segmentation network is coarsely trained. Multi-scale pooling layers were introduced to preserve spatial resolution during pooling using less memory compared to atrous convolution layers. An adaptive loss function was proposed to enhance the training on small or low-contrast organs.
The accuracy of modeled contours was measured with the Dice similarity coefficient (DSC), Average Hausdorff Distance (AHD), Average Surface Hausdorff Distance (ASHD), and relative Volume Difference (VD) using clinical contours as references to the ground-truth. The proposed ARPM-net method was compared to several state-of-the-art deep learning methods.

Submitted 17 September, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

Comments: 21 pages, 8 figures; accepted as a journal article at Medical Physics; abstract presented at AAPM 2020

MSC Class: 68T07(Primary); 68T45(Secondary)

arXiv:2008.03426 [pdf, other]

Recent Advances and New Guidelines on Hyperspectral and Multispectral Image Fusion

Authors: Renwei Dian, Shutao Li, Bin Sun, Anjing Guo

Abstract: A hyperspectral image (HSI) with high spectral resolution often suffers from low spatial resolution owing to the limitations of imaging sensors. Image fusion, which combines an HSI with a higher-spatial-resolution multispectral image (MSI) of the same scene, is an effective and economical way to enhance the spatial resolution of the HSI. In recent years, many HSI and MSI fusion algorithms have been introduced to obtain high-resolution HSI. However, a full-scale review of the newly proposed HSI and MSI fusion approaches is lacking. To tackle this problem, this work gives a comprehensive review and new guidelines for HSI-MSI fusion. According to their characteristics, HSI-MSI fusion methods are grouped into four categories: pan-sharpening based approaches, matrix factorization based approaches, tensor representation based approaches, and deep convolutional neural network based approaches. We give a detailed introduction, discussion, and comparison of the fusion methods in each category. Additionally, the existing challenges and possible future directions for HSI-MSI fusion are presented.

Submitted 7 August, 2020; originally announced August 2020.

arXiv:2006.14508 [pdf, ps, other]

Interference Cancellation Based Channel Estimation for Massive MIMO Systems with Time Shifted Pilots

Authors: Bule Sun, Yiqing Zhou, Jinhong Yuan, Jinglin Shi

Abstract: In massive multiple-input multiple-output (MIMO) systems with time shifted pilot (TSP) schemes, the inter-group interference caused by pilot contamination can be eliminated when the number of base station (BS) antennas M approaches infinity. However, M is finite in practice and the effectiveness of the TSP is limited by channel estimation errors. In this paper, it is analytically shown that the mean square channel estimation error (MSCEE) of the TSP is dominated by the inter-group data interference. To reduce the MSCEE in finite-antenna massive MIMO systems, an interference cancellation based channel estimation for the TSP (IC-TSP) is proposed, where the dominant inter-group data interference is canceled based on BS cooperation. To show the advantage of the IC-TSP, the additional overhead of IC-TSP is evaluated by considering different M and the coherence time of BS-BS channels. Furthermore, the impacts of sectorization and compressed sensing based BS-BS channel estimation are also discussed. We show that when 128 < M < 2048, with the inter-group data interference from the nearest two cell layers being canceled, the IC-TSP achieves a spectral efficiency gain of more than 1.2 bps/Hz over the TSP.

Submitted 26 June, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

Comments: 18 pages, 10 figures, accepted and to appear in IEEE Transactions on Wireless Communications

arXiv:1912.11000 [pdf]

doi 10.1002/mp.14429

Fully Automated Multi-Organ Segmentation in Abdominal Magnetic Resonance Imaging with Deep Neural Networks

Authors: Yuhua Chen, Dan Ruan, Jiayu Xiao, Lixia Wang, Bin Sun, Rola Saouaf, Wensha Yang, Debiao Li, Zhaoyang Fan

Abstract: Segmentation of multiple organs-at-risk (OARs) is essential for radiation therapy treatment planning and other clinical applications. We developed an Automated deep Learning-based Abdominal Multi-Organ segmentation (ALAMO) framework based on 2D U-net and a densely connected network structure with tailored design in data augmentation and training procedures such as deep connection, auxiliary supervision, and multi-view. The model takes in multi-slice MR images and generates the output of segmentation results. Three-Tesla T1 VIBE (Volumetric Interpolated Breath-hold Examination) images of 102 subjects were collected and used in our study. Ten OARs were studied, including the liver, spleen, pancreas, left/right kidneys, stomach, duodenum, small intestine, spinal cord, and vertebral bodies. Two radiologists manually labeled and obtained the consensus contours as the ground-truth. In the complete cohort of 102, 20 samples were held out for independent testing, and the rest were used for training and validation. The performance was measured using volume overlapping and surface distance. The ALAMO framework generated segmentation labels in good agreement with the manual results. Specifically, among the 10 OARs, 9 achieved high Dice Similarity Coefficients (DSCs) in the range of 0.87-0.96, except for the duodenum with a DSC of 0.80. The inference completes within one minute for a 3D volume of 320x288x180. Overall, the ALAMO model matches the state-of-the-art performance.
The proposed ALAMO framework allows for fully automated abdominal MR segmentation with high accuracy and low memory and computation time demands.

Submitted 23 December, 2019; originally announced December 2019.

Comments: 21 pages, 4 figures, submitted to the journal Medical Physics

arXiv:1812.00348 [pdf, other]

doi 10.1016/j.optlaseng.2019.06.007

Increase the frame rate of a camera via temporal ghost imaging

Authors: Wenjie Jiang, Xianye Li, Shan Jiang, Yupeng Wang, Zexin Zhang, Guanbai He, Baoqing Sun

Abstract: Computational temporal ghost imaging (CTGI) allows the reconstruction of a fast signal from a two-dimensional detection with no temporal resolution. High-speed spatial modulation is implemented to encode temporal detail of the signal into the two-dimensional detection. By calculating the correlation between the modulation and the rendered image, the temporal information can be retrieved. CTGI indicates a way to detect high-speed non-reproducible signals with a slow detector. Based on CTGI, we propose an innovative scheme that can increase the frame rate of a camera by resolving the temporal detail of every camera image. To achieve this, CTGI is conducted in parallel on different areas of the scene. High-speed spatially multiplexed modulation is performed, constraining the continuous scene into a series of short-time-scale frames. All the modulated frames are accumulated into one image that is eventually used in the correlation retrieval process. By performing CTGI reconstruction on each area independently, the temporal detail of the whole scene can be obtained. This method could find strong application in ultrafast imaging.

Submitted 2 December, 2018; originally announced December 2018.

arXiv:1811.12179 [pdf, ps, other]

MRAM Co-designed Processing-in-Memory CNN Accelerator for Mobile and IoT Applications

Authors: Baohua Sun, Daniel Liu, Leo Yu, Jay Li, Helen Liu, Wenhan Zhang, Terry Torng

Abstract: We designed a device for Convolution Neural Network applications with non-volatile MRAM memory and computing-in-memory co-designed architecture. It has been successfully fabricated using 22nm technology node CMOS Si process. More than 40MB MRAM density with 9.9TOPS/W are provided. It enables multiple models within one single chip for mobile and IoT device applications.

Submitted 26 November, 2018; originally announced November 2018.

Comments: 4 pages, 4 figures, 1 table. Accepted by NIPS 2018 MLPCD workshop

Showing 1–27 of 27 results for author: Sun, B