
Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering

Authors: Changxin Liu, Yanghao Li, Yuhao Yi, Karl H. Johansson

Abstract: Distributed learning has become the standard approach for training large-scale machine learning models across private data silos. While distributed learning enhances privacy preservation and training efficiency, it faces critical challenges related to Byzantine robustness and communication reduction. Existing Byzantine-robust and communication-efficient methods rely on full gradient information either at every iteration or at certain iterations with a probability, and they only converge to an unnecessarily large neighborhood around the solution. Motivated by these issues, we propose a novel Byzantine-robust and communication-efficient stochastic distributed learning method that imposes no requirements on batch size and converges to a smaller neighborhood around the optimal solution than all existing methods, aligning with the theoretical lower bound. Our key innovation is leveraging Polyak Momentum to mitigate the noise caused by both biased compressors and stochastic gradients, thus defending against Byzantine workers under information compression. We provide proof of tight complexity bounds for our algorithm in the context of non-convex smooth loss functions, demonstrating that these bounds match the lower bounds in Byzantine-free scenarios. Finally, we validate the practical significance of our algorithm through an extensive series of experiments, benchmarking its performance on both binary classification and image classification tasks.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 12 pages, 2 figures

arXiv:2408.13605 [pdf, other]

Mobile Edge Computing Networks: Online Low-Latency and Fresh Service Provisioning

Authors: Yuhan Yi, Guanglin Zhang, Hai Jiang

Abstract: Edge service caching can significantly mitigate latency and reduce communication and computing overhead by fetching and initializing services (applications) from clouds. The freshness of cached service data is critical when providing satisfactory services to users, but has been overlooked in existing research efforts. In this paper, we study the online low-latency and fresh service provisioning in mobile edge computing (MEC) networks. Specifically, we jointly optimize the service caching, task offloading, and resource allocation without knowledge of future system information, which is formulated as a joint online long-term optimization problem. This problem is NP-hard. To solve the problem, we design a Lyapunov-based online framework that decouples the problem at temporal level into a series of per-time-slot subproblems. For each subproblem, we propose an online integrated optimization-deep reinforcement learning (OIODRL) method, which contains an optimization stage including a quadratically constrained quadratic program (QCQP) transformation and a semidefinite relaxation (SDR) method, and a learning stage including a deep reinforcement learning (DRL) algorithm. Extensive simulations show that the proposed OIODRL method achieves a near-optimal solution and outperforms other benchmark methods.

Submitted 24 August, 2024; originally announced August 2024.

arXiv:2408.04963 [pdf, other]

LiD-FL: Towards List-Decodable Federated Learning

Authors: Hong Liu, Liren Shan, Han Bao, Ronghui You, Yuhao Yi, Jiancheng Lv

Abstract: Federated learning is often used in environments with many unverified participants. Therefore, federated learning under adversarial attacks receives significant attention. This paper proposes an algorithmic framework for list-decodable federated learning, where a central server maintains a list of models, with at least one guaranteed to perform well. The framework has no strict restriction on the fraction of honest workers, extending the applicability of Byzantine federated learning to the scenario with more than half adversaries. Under proper assumptions on the loss function, we prove a convergence theorem for our method. Experimental results, including image classification tasks with both convex and non-convex losses, demonstrate that the proposed algorithm can withstand the malicious majority under various attacks.

Submitted 15 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

Comments: 26 pages, 5 figures

arXiv:2408.01167 [pdf, other]

Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification

Authors: Bryan Wong, Mun Yong Yi

Abstract: Multiple instance learning (MIL) has become a preferred method for classifying gigapixel whole slide images (WSIs), without requiring patch label annotation. The focus of the current MIL research stream is on the embedding-based MIL approach, which involves extracting feature vectors from patches using a pre-trained feature extractor. These feature vectors are then fed into an MIL aggregator for slide-level prediction. Despite prior research suggestions on enhancing the most commonly used ResNet50 supervised model pre-trained on ImageNet-1K, there remains a lack of clear guidance on selecting the optimal feature extractor to maximize WSI performance. This study aims at addressing this gap by examining MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method. Extensive experiments were carried out on the two public WSI datasets (TCGA-NSCLC and Camelyon16) using four SOTA MIL models. The main findings indicate the following: 1) Performance significantly improves with larger and more varied pre-training datasets in both CNN and Transformer backbones. 2) 'Modern and deeper' backbones greatly outperform 'standard' backbones (ResNet and ViT), with performance improvements more guaranteed in Transformer-based backbones. 3) The choice of self-supervised learning (SSL) method is crucial, with the most significant benefits observed when applied to the Transformer (ViT) backbone. The study findings have practical implications, including designing more effective pathological foundation models. Our code is available at: https://anonymous.4open.science/r/MIL-Feature-Extractor-Selection

Submitted 2 August, 2024; originally announced August 2024.

Comments: 12 pages

arXiv:2408.01162 [pdf, other]

PreMix: Boosting Multiple Instance Learning in Digital Histopathology through Pre-training with Intra-Batch Slide Mixing

Authors: Bryan Wong, Mun Yong Yi

Abstract: The classification of gigapixel-sized whole slide images (WSIs), digital representations of histological slides obtained via a high-resolution scanner, faces significant challenges associated with the meticulous and time-consuming nature of fine-grained labeling. While weakly-supervised multiple instance learning (MIL) has emerged as a promising approach, current MIL methods are constrained by their limited ability to leverage the wealth of information embedded within unlabeled WSIs. This limitation often necessitates training MIL feature aggregators from scratch after the feature extraction process, hindering efficiency and accuracy. PreMix extends the general MIL framework by pre-training the MIL aggregator with an intra-batch slide mixing approach. Specifically, PreMix incorporates Barlow Twins Slide Mixing during pre-training, enhancing its ability to handle diverse WSI sizes and maximizing the utility of unlabeled WSIs. Combined with Mixup and Manifold Mixup during fine-tuning, PreMix achieves a mean of 4.7% performance improvement over the baseline MIL framework, the hierarchical image pyramid transformer (HIPT), on the Camelyon16 dataset. The observed improvement across a range of active learning acquisition functions and WSI-labeled training budgets highlights the framework's adaptability to diverse datasets and varying resource constraints. Ultimately, PreMix paves the way for more efficient and accurate WSI classification under limited WSI-labeled datasets, encouraging the broader adoption of unlabeled WSI data in histopathological research. The code is available at https://anonymous.4open.science/r/PreMix

Submitted 2 August, 2024; originally announced August 2024.

Comments: 15 pages

arXiv:2406.02430 [pdf, other]

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and subjective evaluations. With fine-tuning, we achieve even higher subjective scores across these metrics. Seed-TTS offers superior controllability over various speech attributes such as emotion and is capable of generating highly expressive and diverse speech for speakers in the wild. Furthermore, we propose a self-distillation method for speech factorization, as well as a reinforcement learning approach to enhance model robustness, speaker similarity, and controllability. We additionally present a non-autoregressive (NAR) variant of the Seed-TTS model, named $\text{Seed-TTS}_\text{DiT}$, which utilizes a fully diffusion-based architecture. Unlike previous NAR-based TTS systems, $\text{Seed-TTS}_\text{DiT}$ does not depend on pre-estimated phoneme durations and performs speech generation through end-to-end processing. We demonstrate that this variant achieves comparable performance to the language model-based variant and showcase its effectiveness in speech editing. We encourage readers to listen to demos at https://bytedancespeech.github.io/seedtts_tech_report.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.19769 [pdf, other]

All-In-One Medical Image Restoration via Task-Adaptive Routing

Authors: Zhiwen Yang, Haowei Chen, Ziniu Qian, Yang Yi, Hui Zhang, Dan Zhao, Bingzheng Wei, Yan Xu

Abstract: Although single-task medical image restoration (MedIR) has witnessed remarkable success, the limited generalizability of these methods poses a substantial obstacle to wider application. In this paper, we focus on the task of all-in-one medical image restoration, aiming to address multiple distinct MedIR tasks with a single universal model. Nonetheless, due to significant differences between different MedIR tasks, training a universal model often encounters task interference issues, where different tasks with shared parameters may conflict with each other in the gradient update direction. This task interference leads to deviation of the model update direction from the optimal path, thereby affecting the model's performance. To tackle this issue, we propose a task-adaptive routing strategy, allowing conflicting tasks to select different network paths in spatial and channel dimensions, thereby mitigating task interference. Experimental results demonstrate that our proposed All-in-one Medical Image Restoration (AMIR) network achieves state-of-the-art performance in three MedIR tasks: MRI super-resolution, CT denoising, and PET synthesis, both in single-task and all-in-one settings. The code and data will be available at https://github.com/Yaziwel/AMIR (link target: https://github.com/Yaziwel/All-In-One-Medical-Image-Restoration-via-Task-Adaptive-Routing.git).

Submitted 28 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: This article has been early accepted by MICCAI 2024

arXiv:2405.18251 [pdf, other]

Sensor-Based Distributionally Robust Control for Safe Robot Navigation in Dynamic Environments

Authors: Kehan Long, Yinzhuang Yi, Zhirui Dai, Sylvia Herbert, Jorge Cortés, Nikolay Atanasov

Abstract: We introduce a novel method for safe mobile robot navigation in dynamic, unknown environments, utilizing onboard sensing to impose safety constraints without the need for accurate map reconstruction. Traditional methods typically rely on detailed map information to synthesize safe stabilizing controls for mobile robots, which can be computationally demanding and less effective, particularly in dynamic operational conditions. By leveraging recent advances in distributionally robust optimization, we develop a distributionally robust control barrier function (DR-CBF) constraint that directly processes range sensor data to impose safety constraints. Coupling this with a control Lyapunov function (CLF) for path tracking, we demonstrate that our CLF-DR-CBF control synthesis method achieves safe, efficient, and robust navigation in uncertain dynamic environments. We demonstrate the effectiveness of our approach in simulated and real autonomous robot navigation experiments, marking a substantial advancement in real-time safety guarantees for mobile robots.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted to the International Journal of Robotics Research (IJRR). Project page: https://existentialrobotics.org/DR_Safe_Navigation_Webpage

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2404.00921 [pdf, other]

Towards Label-Efficient Human Matting: A Simple Baseline for Weakly Semi-Supervised Trimap-Free Human Matting

Authors: Beomyoung Kim, Myeong Yeon Yi, Joonsang Yu, Young Joon Yoo, Sung Ju Hwang

Abstract: This paper presents a new practical training method for human matting, which demands delicate pixel-level human region identification and significantly laborious annotations. To reduce the annotation cost, most existing matting approaches often rely on image synthesis to augment the dataset. However, the unnaturalness of synthesized training images brings in a new domain generalization challenge for natural images. To address this challenge, we introduce a new learning paradigm, weakly semi-supervised human matting (WSSHM), which leverages a small amount of expensive matte labels and a large amount of budget-friendly segmentation labels, to save the annotation cost and resolve the domain generalization problem. To achieve the goal of WSSHM, we propose a simple and effective training method, named Matte Label Blending (MLB), that selectively guides only the beneficial knowledge of the segmentation and matte data to the matting model. Extensive experiments with our detailed analysis demonstrate our method can substantially improve the robustness of the matting model using a few matte data and numerous segmentation data. Our training method is also easily applicable to real-time models, achieving competitive accuracy with breakneck inference speed (328 FPS on NVIDIA V100 GPU). The implementation code is available at https://github.com/clovaai/WSSHM.

Submitted 1 April, 2024; originally announced April 2024.

Comments: Preprint, 15 pages, 13 figures

arXiv:2403.07868 [pdf, other]

Online Digital Twin-Empowered Content Resale Mechanism in Age of Information-Aware Edge Caching Networks

Authors: Yuhan Yi, Guanglin Zhang, Hai Jiang

Abstract: For users requesting popular contents from content providers, edge caching can alleviate backhaul pressure and enhance the quality of experience of users. Recently there has also been growing concern about content freshness, which is quantified by age of information (AoI). Therefore, AoI-aware online caching algorithms are required, which is challenging because the content popularity is usually unknown in advance and may vary over time. In this paper, we propose an online digital twin (DT) empowered content resale mechanism in AoI-aware edge caching networks. We aim to design an optimal two-timescale caching strategy to maximize the utility of an edge network service provider (ENSP). The formulated optimization problem is non-convex and NP-hard. To tackle this intractable problem, we propose a DT-assisted Online Caching Algorithm (DT-OCA). Specifically, we first decompose our formulated problem into a series of subproblems, each handling a cache period. For each cache period, we use a DT-based prediction method to effectively capture future content popularity, and develop an online caching strategy. Competitive ratio analysis and extensive experimental results demonstrate that our algorithm has promising performance, and outperforms other benchmark algorithms. Insightful observations are also found and discussed.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05280 [pdf, other]

ContrastDiagnosis: Enhancing Interpretability in Lung Nodule Diagnosis Using Contrastive Learning

Authors: Chenglong Wang, Yinqiao Yi, Yida Wang, Chengxiu Zhang, Yun Liu, Kensaku Mori, Mei Yuan, Guang Yang

Abstract: With the ongoing development of deep learning, an increasing number of AI models have surpassed the performance levels of human clinical practitioners. However, the prevalence of AI diagnostic products in actual clinical practice remains significantly lower than desired. One crucial reason for this gap is the so-called 'black box' nature of AI models. Clinicians' distrust of black box models has directly hindered the clinical deployment of AI products. To address this challenge, we propose ContrastDiagnosis, a straightforward yet effective interpretable diagnosis framework. This framework is designed to introduce inherent transparency and provide extensive post-hoc explainability for deep learning models, making them more suitable for clinical medical diagnosis. ContrastDiagnosis incorporates a contrastive learning mechanism to provide a case-based reasoning diagnostic rationale, enhancing the model's transparency, and also offers post-hoc interpretability by highlighting similar areas. High diagnostic accuracy was achieved, with an AUC of 0.977, while maintaining high transparency and explainability.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2312.12835 [pdf, ps, other]

doi 10.1609/aaai.v38i15.29584

Near-Optimal Resilient Aggregation Rules for Distributed Learning Using 1-Center and 1-Mean Clustering with Outliers

Authors: Yuhao Yi, Ronghui You, Hong Liu, Changxin Liu, Yuan Wang, Jiancheng Lv

Abstract: Byzantine machine learning has garnered considerable attention in light of the unpredictable faults that can occur in large-scale distributed learning systems. The key to secure resilience against Byzantine machines in distributed learning is resilient aggregation mechanisms. Although abundant resilient aggregation rules have been proposed, they are designed in ad-hoc manners, imposing extra barriers on comparing, analyzing, and improving the rules across performance criteria. This paper studies near-optimal aggregation rules using clustering in the presence of outliers. Our outlier-robust clustering approach utilizes geometric properties of the update vectors provided by workers. Our analysis shows that constant approximations to the 1-center and 1-mean clustering problems with outliers provide near-optimal resilient aggregators for metric-based criteria, which have been proven to be crucial in the homogeneous and heterogeneous cases respectively. In addition, we discuss two contradicting types of attacks under which no single aggregation rule is guaranteed to improve upon the naive average. Based on the discussion, we propose a two-phase resilient aggregation framework. We run experiments for image classification using a non-convex loss function. The proposed algorithms outperform previously known aggregation rules by a large margin with both homogeneous and heterogeneous data distributions among non-faulty workers. Code and appendix are available at https://github.com/jerry907/AAAI24-RASHB.

Submitted 31 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 17 pages, 4 figures. Accepted by the 38th Annual AAAI Conference on Artificial Intelligence (AAAI'24)

Journal ref: AAAI 2024, 38, 16469-16477

arXiv:2311.02400 [pdf, other]

From Plate to Production: Artificial Intelligence in Modern Consumer-Driven Food Systems

Authors: Weiqing Min, Pengfei Zhou, Leyi Xu, Tao Liu, Tianhao Li, Mingyu Huang, Ying Jin, Yifan Yi, Min Wen, Shuqiang Jiang, Ramesh Jain

Abstract: Global food systems confront the urgent challenge of supplying sustainable, nutritious diets in the face of escalating demands. The advent of Artificial Intelligence (AI) is bringing in a personal choice revolution, wherein AI-driven individual decisions transform food systems from dinner tables, to the farms, and back to our plates. In this context, AI algorithms refine personal dietary choices, subsequently shaping agricultural outputs, and promoting an optimized feedback loop from consumption to cultivation. Initially, we delve into AI tools and techniques spanning the food supply chain, and subsequently assess how AI subfields–encompassing machine learning, computer vision, and speech recognition–are harnessed within the AI-enabled Food System (AIFS) framework, which increasingly leverages Internet of Things, multimodal sensors and real-time data exchange. We spotlight the AIFS framework, emphasizing its fusion of AI with technologies such as digitalization, big data analytics, biotechnology, and IoT extensively used in modern food systems in every component. This paradigm shifts the conventional "farm to fork" narrative to a cyclical "consumer-driven farm to fork" model for better achieving sustainable, nutritious diets. This paper explores AI's promise and the intrinsic challenges it poses within the food domain. By championing stringent AI governance, uniform data architectures, and cross-disciplinary partnerships, we argue that AI, when synergized with consumer-centric strategies, holds the potential to steer food systems toward a sustainable trajectory. We furnish a comprehensive survey for the state-of-the-art in diverse facets of food systems, subsequently pinpointing gaps and advocating for the judicious and efficacious deployment of emergent AI methodologies.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.08061 [pdf, other]

ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking

Authors: Yiqiang Yi, Xu Wan, Yatao Bian, Le Ou-Yang, Peilin Zhao

Abstract: Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.02565 [pdf, other]

Improving Drumming Robot Via Attention Transformer Network

Authors: Yang Yi, Zonghan Li

Abstract: Robotic technology is widely used in today's society and has made great progress in various fields such as agriculture, manufacturing and entertainment. In this paper, we focus on the topic of drumming robots in entertainment. To this end, we introduce an improved drumming robot that can automatically complete music transcription using the popular vision transformer network, which is based on the attention mechanism. Equipped with the attention transformer network, our method can efficiently handle the sequential audio embedding input and model its global long-range dependencies. Extensive experimental results demonstrate that the improved algorithm can help the drumming robot improve drum classification performance, which can also help the robot support a variety of smart applications and services.

Submitted 3 October, 2023; originally announced October 2023.

arXiv:2310.01363 [pdf, other]

EAST: Environment Aware Safe Tracking using Planning and Control Co-Design

Authors: Zhichao Li, Yinzhuang Yi, Zhuolin Niu, Nikolay Atanasov

Abstract: This paper considers the problem of autonomous robot navigation in unknown environments with moving obstacles. We propose a new method that systematically puts planning, motion prediction and safety metric design together to achieve environmental adaptive and safe navigation. This algorithm balances optimality in travel distance and safety with respect to passing clearance. Robot adapts progress speed adaptively according to the sensed environment, being fast in wide open areas and slow down in narrow passages and taking necessary maneuvers to avoid dangerous incoming obstacles. In our method, directional distance measure, conic-shape motion prediction and custom costmap are integrated properly to evaluate system risk accurately with respect to local geometry of surrounding environments. Using such risk estimation, reference governor technique and control barrier function are worked together to enable adaptive and safe path tracking in dynamical environments. We validate our algorithm extensively both in simulation and challenging real-world environments.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.01072 [pdf, other]

Channel Attention Separable Convolution Network for Skin Lesion Segmentation

Authors: Changlu Guo, Jiangyan Dai, Marton Szemenyei, Yugen Yi

Abstract: Skin cancer is a frequently occurring cancer in the human population, and it is very important to be able to diagnose malignant tumors in the body early. Lesion segmentation is crucial for monitoring the morphological changes of skin lesions, extracting features to localize and identify diseases to assist doctors in early diagnosis. Manual de-segmentation of dermoscopic images is error-prone and time-consuming, thus there is a pressing demand for precise and automated segmentation algorithms. Inspired by advanced mechanisms such as U-Net, DenseNet, Separable Convolution, Channel Attention, and Atrous Spatial Pyramid Pooling (ASPP), we propose a novel network called Channel Attention Separable Convolution Network (CASCN) for skin lesions segmentation. The proposed CASCN is evaluated on the PH2 dataset with limited images. Without excessive pre-/post-processing of images, CASCN achieves state-of-the-art performance on the PH2 dataset with Dice similarity coefficient of 0.9461 and accuracy of 0.9645.

Submitted 3 September, 2023; originally announced September 2023.

Comments: Accepted by ICONIP 2023

arXiv:2307.14575 [pdf, other]

A Memory-Augmented Multi-Task Collaborative Framework for Unsupervised Traffic Accident Detection in Driving Videos

Authors: Rongqin Liang, Yuanman Li, Yingxin Yi, Jiantao Zhou, Xia Li

Abstract: Identifying traffic accidents in driving videos is crucial to ensuring the safety of autonomous driving and driver assistance systems. To address the potential danger caused by the long-tailed distribution of driving events, existing traffic accident detection (TAD) methods mainly rely on unsupervised learning. However, TAD is still challenging due to the rapid movement of cameras and dynamic scenes in driving scenarios. Existing unsupervised TAD methods mainly rely on a single pretext task, i.e., an appearance-based or future object localization task, to detect accidents. However, appearance-based approaches are easily disturbed by the rapid movement of the camera and changes in illumination, which significantly reduce the performance of traffic accident detection. Methods based on future object localization may fail to capture appearance changes in video frames, making it difficult to detect ego-involved accidents (e.g., out of control of the ego-vehicle). In this paper, we propose a novel memory-augmented multi-task collaborative framework (MAMTCF) for unsupervised traffic accident detection in driving videos. Different from previous approaches, our method can more accurately detect both ego-involved and non-ego accidents by simultaneously modeling appearance changes and object motions in video frames through the collaboration of optical flow reconstruction and future object localization tasks. Further, we introduce a memory-augmented motion representation mechanism to fully explore the interrelation between different types of motion representations and exploit the high-level features of normal traffic patterns stored in memory to augment motion representations, thus enlarging the difference from anomalies. Experimental results on recently published large-scale dataset demonstrate that our method achieves better performance compared to previous state-of-the-art approaches.

Submitted 26 July, 2023; originally announced July 2023.

Comments: 12 pages, 5 figures

arXiv:2306.10792 [pdf, other]

NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning

Authors: Yun Yi, Haokui Zhang, Rong Xiao, Nannan Wang, Xiaoyu Wang

Abstract: As more deep learning models are being applied in real-world applications, there is a growing need for modeling and learning the representations of neural networks themselves. An efficient representation can be used to predict target attributes of networks without the need for actual training and deployment procedures, facilitating efficient network deployment and design. Recently, inspired by the success of Transformer, some Transformer-based representation learning frameworks have been proposed and achieved promising performance in handling cell-structured models. However, graph neural network (GNN) based approaches still dominate the field of learning representation for the entire network. In this paper, we revisit Transformer and compare it with GNN to analyse their different architecture characteristics. We then propose a modified Transformer-based universal neural network representation learning model NAR-Former V2. It can learn efficient representations from both cell-structured networks and entire networks. Specifically, we first take the network as a graph and design a straightforward tokenizer to encode the network into a sequence. Then, we incorporate the inductive representation learning capability of GNN into Transformer, enabling Transformer to generalize better when encountering unseen architecture. Additionally, we introduce a series of simple yet effective modifications to enhance the ability of the Transformer in learning representation from graph structures. Our proposed method surpasses the GNN-based method NNLP by a significant margin in latency estimation on the NNLQP dataset. Furthermore, regarding accuracy prediction on the NASBench101 and NASBench201 datasets, our method achieves highly comparable performance to other state-of-the-art methods.

Submitted 16 October, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

Comments: 9 pages, 2 figures, 6 tables. Code is available at https://github.com/yuny220/NAR-Former-V2

arXiv:2304.07296 [pdf]

MLOps Spanning Whole Machine Learning Life Cycle: A Survey

Authors: Fang Zhengxin, Yuan Yi, Zhang Jingyu, Liu Yue, Mu Yuechen, Lu Qinghua, Xu Xiwei, Wang Jeff, Wang Chen, Zhang Shuai, Chen Shiping

Abstract: Google AlphaGo's win has significantly motivated and sped up machine learning (ML) research and development, which led to tremendous ML technical advances and wider adoptions in various domains (e.g., Finance, Health, Defense, and Education). These advances have resulted in numerous new concepts and technologies, which are too many for people to catch up to and even make them confused, especially for newcomers to the ML area. This paper is aimed to present a clear picture of the state-of-the-art of the existing ML technologies with a comprehensive survey. We lay out this survey by viewing ML as a MLOps (ML Operations) process, where the key concepts and activities are collected and elaborated with representative works and surveys. We hope that this paper can serve as a quick reference manual (a survey of surveys) for newcomers (e.g., researchers, practitioners) of ML to get an overview of the MLOps process, as well as a good understanding of the key technologies used in each step of the ML process, and know where to find more details.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2303.12802 [pdf, other]

Distributed Learning Meets 6G: A Communication and Computing Perspective

Authors: Shashank Jere, Yifei Song, Yang Yi, Lingjia Liu

Abstract: With the ever-improving computing capabilities and storage capacities of mobile devices in line with evolving telecommunication network paradigms, there has been an explosion of research interest towards exploring Distributed Learning (DL) frameworks to realize stringent key performance indicators (KPIs) that are expected in next-generation/6G cellular networks. In conjunction with Edge Computing, Federated Learning (FL) has emerged as the DL architecture of choice in prominent wireless applications. This article lays an outline of how DL in general and FL-based strategies specifically can contribute towards realizing a part of the 6G vision and strike a balance between communication and computing constraints. As a practical use case, we apply Multi-Agent Reinforcement Learning (MARL) within the FL framework to the Dynamic Spectrum Access (DSA) problem and present preliminary evaluation results. Top contemporary challenges in applying DL approaches to 6G networks are also highlighted.

Submitted 2 March, 2023; originally announced March 2023.

Comments: This article has been accepted to IEEE Wireless Communications Magazine (WCM) under the Special Issue "AI-Powered Telco Network Automation: 5G Evolution and 6G"

arXiv:2212.01554 [pdf, other]

Distributionally Robust Lyapunov Function Search Under Uncertainty

Authors: Kehan Long, Yinzhuang Yi, Jorge Cortes, Nikolay Atanasov

Abstract: This paper develops methods for proving Lyapunov stability of dynamical systems subject to disturbances with an unknown distribution. We assume only a finite set of disturbance samples is available and that the true online disturbance realization may be drawn from a different distribution than the given samples. We formulate an optimization problem to search for a sum-of-squares (SOS) Lyapunov function and introduce a distributionally robust version of the Lyapunov function derivative constraint. We show that this constraint may be reformulated as several SOS constraints, ensuring that the search for a Lyapunov function remains in the class of SOS polynomial optimization problems. For general systems, we provide a distributionally robust chance-constrained formulation for neural network Lyapunov function search. Simulations demonstrate the validity and efficiency of either formulation on non-linear uncertain dynamical systems.

Submitted 11 July, 2024; v1 submitted 3 December, 2022; originally announced December 2022.

Comments: 5th Annual Learning for Dynamics & Control Conference. Code: https://github.com/KehanLong/DR-Lyapunov-Function

arXiv:2211.08024 [pdf, other]

NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction

Authors: Yun Yi, Haokui Zhang, Wenze Hu, Nannan Wang, Xiaoyu Wang

Abstract: With the wide and deep adoption of deep learning models in real applications, there is an increasing need to model and learn the representations of the neural networks themselves. These models can be used to estimate attributes of different neural network architectures such as the accuracy and latency, without running the actual training or inference tasks. In this paper, we propose a neural architecture representation model that can be used to estimate these attributes holistically. Specifically, we first propose a simple and effective tokenizer to encode both the operation and topology information of a neural network into a single sequence. Then, we design a multi-stage fusion transformer to build a compact vector representation from the converted sequence. For efficient model training, we further propose an information flow consistency augmentation and correspondingly design an architecture consistency loss, which brings more benefits with less augmentation samples compared with previous random augmentation strategies. Experiment results on NAS-Bench-101, NAS-Bench-201, DARTS search space and NNLQP show that our proposed framework can be used to predict the aforementioned latency and accuracy attributes of both cell architectures and whole deep neural networks, and achieves promising performance. Code is available at https://github.com/yuny220/NAR-Former.

Submitted 22 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 8 pages, 4 figures, 7 tables. Accepted by IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023

arXiv:2211.06028 [pdf, ps, other]

Dynamic Curing and Network Design in SIS Epidemic Processes

Authors: Yuhao Yi, Liren Shan, Shijie Wang, Philip E. Paré, Karl H. Johansson

Abstract: This paper studies efficient algorithms for dynamic curing policies and the corresponding network design problems to guarantee the fast extinction of epidemic spread in a susceptible-infected-susceptible (SIS) model. We consider a Markov process-based SIS epidemic model. We provide a computationally efficient curing algorithm based on the curing policy proposed by Drakopoulos, Ozdaglar, and Tsitsiklis (2014). Since the corresponding optimization problem is NP-hard, finding optimal policies is intractable for large graphs. We provide approximation guarantees on the curing budget of the proposed dynamic curing algorithm. We also present a curing algorithm fair to demographic groups. When the total infection rate is high, the original curing policy includes a waiting period in which no measure is taken to mitigate the spread until the rate slows down. To avoid the waiting period, we study network design problems to reduce the total infection rate by deleting edges or reducing the weight of edges. Then the curing processes become continuous since the total infection rate is restricted by network design. We provide algorithms with provable guarantees for the considered network design problems. In summary, the proposed curing and network design algorithms together provide an effective and computationally efficient approach that mitigates SIS epidemic spread in networks.

Submitted 14 August, 2024; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 24 pages, 3 figures

arXiv:2210.16468 [pdf, other]

Curiosity-Driven Multi-Agent Exploration with Mixed Objectives

Authors: Roben Delos Reyes, Kyunghwan Son, Jinhwan Jung, Wan Ju Kang, Yung Yi

Abstract: Intrinsic rewards have been increasingly used to mitigate the sparse reward problem in single-agent reinforcement learning. These intrinsic rewards encourage the agent to look for novel experiences, guiding the agent to explore the environment sufficiently despite the lack of extrinsic rewards. Curiosity-driven exploration is a simple yet efficient approach that quantifies this novelty as the prediction error of the agent's curiosity module, an internal neural network that is trained to predict the agent's next state given its current state and action. We show here, however, that naively using this curiosity-driven approach to guide exploration in sparse reward cooperative multi-agent environments does not consistently lead to improved results. Straightforward multi-agent extensions of curiosity-driven exploration take into consideration either individual or collective novelty only and thus, they do not provide a distinct but collaborative intrinsic reward signal that is essential for learning in cooperative multi-agent tasks. In this work, we propose a curiosity-driven multi-agent exploration method that has the mixed objective of motivating the agents to explore the environment in ways that are individually and collectively novel. First, we develop a two-headed curiosity module that is trained to predict the corresponding agent's next observation in the first head and the next joint observation in the second head. Second, we design the intrinsic reward formula to be the sum of the individual and joint prediction errors of this curiosity module. We empirically show that the combination of our curiosity module architecture and intrinsic reward formulation guides multi-agent exploration more efficiently than baseline approaches, thereby providing the best performance boost to MARL algorithms in cooperative navigation environments with sparse rewards.

Submitted 28 October, 2022; originally announced October 2022.

arXiv:2210.16098 [pdf, other]

Predicting Protein-Ligand Binding Affinity with Equivariant Line Graph Network

Authors: Yiqiang Yi, Xu Wan, Kangfei Zhao, Le Ou-Yang, Peilin Zhao

Abstract: Binding affinity prediction of three-dimensional (3D) protein ligand complexes is critical for drug repositioning and virtual drug screening. Existing approaches transform a 3D protein-ligand complex to a two-dimensional (2D) graph, and then use graph neural networks (GNNs) to predict its binding affinity. However, the node and edge features of the 2D graph are extracted based on invariant local coordinate systems of the 3D complex. As a result, the method can not fully learn the global information of the complex, such as, the physical symmetry and the topological information of bonds. To address these issues, we propose a novel Equivariant Line Graph Network (ELGN) for affinity prediction of 3D protein ligand complexes. The proposed ELGN firstly adds a super node to the 3D complex, and then builds a line graph based on the 3D complex. After that, ELGN uses a new E(3)-equivariant network layer to pass the messages between nodes and edges based on the global coordinate system of the 3D complex. Experimental results on two real datasets demonstrate the effectiveness of ELGN over several state-of-the-art baselines.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.13010 [pdf, other]

Active Reconfigurable Intelligent Surface Aided Surveillance Scheme

Authors: Xinyue Hu, Yibo Yi, Kun Li, Hongwei Zhang, Caihong Kai

Abstract: This letter attempts to design a surveillance scheme by adopting an active reconfigurable intelligent surface (RIS). Different from the conventional passive RIS, the active RIS could not only adjust the phase shift but also amplify the amplitude of the reflected signal. With such reflecting, the reflected signal of active RIS could jointly adjust the signal to interference plus noise ratio (SINR) of the suspicious receiver and the legitimate monitor, hence the proactive eavesdropping at the physical layer could be effectively realized. We formulate the optimization problem with the target of maximizing the eavesdropping rate to obtain the optimal reflecting coefficient matrix of the active RIS. The formulated optimization problem is nonconvex fractional programming and challenging to deal with. We then solve the problem by approximating it as a series of convex constraints. Simulation results validate the effectiveness of our designed surveillance scheme and show that the proposed active RIS aided surveillance scheme has good performance in terms of eavesdropping rate compared with the scheme with passive RIS.

Submitted 24 October, 2022; originally announced October 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2210.02060 [pdf, other]

Graph Classification via Discriminative Edge Feature Learning

Authors: Yang Yi, Xuequan Lu, Shang Gao, Antonio Robles-Kelly, Yuejie Zhang

Abstract: Spectral graph convolutional neural networks (GCNNs) have been producing encouraging results in graph classification tasks. However, most spectral GCNNs utilize fixed graphs when aggregating node features, while omitting edge feature learning and failing to get an optimal graph structure. Moreover, many existing graph datasets do not provide initialized edge features, further restraining the ability of learning edge features via spectral GCNNs. In this paper, we try to address this issue by designing an edge feature scheme and an add-on layer between every two stacked graph convolution layers in GCNN. Both are lightweight while effective in filling the gap between edge feature learning and performance enhancement of graph classification. The edge feature scheme makes edge features adapt to node representations at different graph convolution layers. The add-on layers help adjust the edge features to an optimal graph structure. To test the effectiveness of our method, we take Euclidean positions as initial node features and extract graphs with semantic information from point cloud objects. The node features of our extracted graphs are more scalable for edge feature learning than most existing graph datasets (in one-hot encoded label format). Three new graph datasets are constructed based on ModelNet40, ModelNet10 and ShapeNet Part datasets. Experimental results show that our method outperforms state-of-the-art graph classification methods on the new datasets by reaching 96.56% overall accuracy on Graph-ModelNet40, 98.79% on Graph-ModelNet10 and 97.91% on Graph-ShapeNet Part. The constructed graph datasets will be released to the community.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2210.01341 [pdf, ps, other]

Safe and Stable Control Synthesis for Uncertain System Models via Distributionally Robust Optimization

Authors: Kehan Long, Yinzhuang Yi, Jorge Cortes, Nikolay Atanasov

Abstract: This paper considers enforcing safety and stability of dynamical systems in the presence of model uncertainty. Safety and stability constraints may be specified using a control barrier function (CBF) and a control Lyapunov function (CLF), respectively. To take model uncertainty into account, robust and chance formulations of the constraints are commonly considered. However, this requires known error bounds or a known distribution for the model uncertainty, and the resulting formulations may suffer from over-conservatism or over-confidence. In this paper, we assume that only a finite set of model parametric uncertainty samples is available and formulate a distributionally robust chance-constrained program (DRCCP) for control synthesis with CBF safety and CLF stability guarantees. To facilitate efficient computation of control inputs during online execution, we present a reformulation of the DRCCP as a second-order cone program (SOCP). Our formulation is evaluated in an adaptive cruise control example in comparison to 1) a baseline CLF-CBF quadratic programming approach, 2) a robust approach that assumes known error bounds of the system uncertainty, and 3) a chance-constrained approach that assumes a known Gaussian Process distribution of the uncertainty.

Submitted 16 March, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2208.06977 [pdf, other]

Secure Transmission Design for Virtual Antenna Array-aided Device-to-device Multicast Communications

Authors: Xinyue Hu, Yibo Yi, Kun Li, Hongwei Zhang, Caihong Kai

Abstract: This paper investigates the physical-layer security in a Virtual Antenna Array (VAA)-aided Device-to-Device Multicast (D2DM) communication system, where the User Equipments (UEs) who cache the common content will form a VAA system and cooperatively multicast the content to UEs who desire it. Specifically, with the target of securing the VAA-aided D2DM communication under the threat of multiple eavesdroppers, we propose a secure beamforming scheme by jointly considering the formed VAA and the Base Station (BS). For obtaining the optimal beamforming vectors, a nonsmooth and nonconvex Weight Sum Rate Maximization Problem (WRMP) is formulated and solved using Successive Convex Approximation (SCA) approach. Furthermore, we consider the worst case that eavesdroppers cooperatively form an eavesdrop VAA to enhance the overhearing capacity. In this case, we modify the securing beamforming scheme, formulate the corresponding WRMP and solve it using a two-level optimization. Simulation results validate the improvements of the VAA-aided D2DM scheme in terms of communication security compared with conventional D2DM schemes.

Submitted 14 August, 2022; originally announced August 2022.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2207.12110 [pdf, ps, other]

A Sample-Based Algorithm for Approximately Testing $r$-Robustness of a Digraph

Authors: Yuhao Yi, Yuan Wang, Xingkang He, Stacy Patterson, Karl H. Johansson

Abstract: One of the intensely studied concepts of network robustness is $r$-robustness, which is a network topology property quantified by an integer $r$. It is required by mean subsequence reduced (MSR) algorithms and their variants to achieve resilient consensus. However, determining $r$-robustness is intractable for large networks. In this paper, we propose a sample-based algorithm to approximately test $r$-robustness of a digraph with $n$ vertices and $m$ edges. For a digraph with a moderate assumption on the minimum in-degree, and an error parameter $0<ε\leq 1$, the proposed algorithm distinguishes $(r+εn)$-robust graphs from graphs which are not $r$-robust with probability $(1-δ)$. Our algorithm runs in $\exp(O((\ln{\frac{1}{εδ}})/ε^2))\cdot m$ time. The running time is linear in the number of edges if $ε$ is a constant.

Submitted 25 July, 2022; originally announced July 2022.

Comments: 8 pages, 3 figures

arXiv:2207.09818 [pdf, other]

Operating Envelopes under Probabilistic Electricity Demand and Solar Generation Forecasts

Authors: Yu Yi, Gregor Verbic

Abstract: The increasing penetration of distributed energy resources in low-voltage networks is turning end-users from consumers to prosumers. However, the incomplete smart meter rollout and paucity of smart meter data due to the regulatory separation between retail and network service provision make active distribution network management difficult. Furthermore, distribution network operators oftentimes do not have access to real-time smart meter data, which creates an additional challenge. For the lack of better solutions, they use blanket rooftop solar export limits, leading to suboptimal outcomes. To address this, we designed a conditional generative adversarial network (CGAN)-based model to forecast household solar generation and electricity demand, which serves as an input to chance-constrained optimal power flow used to compute fair operating envelopes under uncertainty.

Submitted 20 July, 2022; originally announced July 2022.

Comments: In proceedings of the 11th Bulk Power Systems Dynamics and Control Symposium (IREP 2022), July 25-30, 2022, Banff, Canada

Report number: IREP2022-79

arXiv:2205.04421 [pdf, other]

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

Authors: Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, Yuanhao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

Abstract: Text to speech (TTS) has made rapid progress in both academia and industry in recent years. Some questions naturally arise: whether a TTS system can achieve human-level quality, how to define/judge that quality, and how to achieve it. In this paper, we answer these questions by first defining the human-level quality based on the statistical significance of subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset. Specifically, we leverage a variational autoencoder (VAE) for end-to-end text to waveform generation, with several key modules to enhance the capacity of the prior from text and reduce the complexity of the posterior from speech, including phoneme pre-training, differentiable duration modeling, bidirectional prior/posterior modeling, and a memory mechanism in VAE. Experiment evaluations on the popular LJSpeech dataset show that our proposed NaturalSpeech achieves -0.01 CMOS (comparative mean opinion score) to human recordings at the sentence level, with Wilcoxon signed rank test at p-level p >> 0.05, which demonstrates no statistically significant difference from human recordings for the first time on this dataset.

Submitted 10 May, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

Comments: 19 pages, 3 figures, 8 tables

arXiv:2204.12105 [pdf, other]

Learning Dual-Pixel Alignment for Defocus Deblurring

Authors: Yu Li, Yaling Yi, Dongwei Ren, Qince Li, Wangmeng Zuo

Abstract: It is a challenging task to recover a sharp image from a single defocus blurry image in real-world applications. On many modern cameras, dual-pixel (DP) sensors create two-image views, based on which stereo information can be exploited to benefit defocus deblurring. Despite the impressive results achieved by existing DP defocus deblurring methods, the misalignment between DP image views is still not studied, leaving room for improving DP defocus deblurring. In this work, we propose a Dual-Pixel Alignment Network (DPANet) for defocus deblurring. Generally, DPANet is an encoder-decoder with skip-connections, where two branches with shared parameters in the encoder are employed to extract and align deep features from left and right views, and one decoder is adopted to fuse aligned features for predicting the sharp image. Because DP views suffer from different blur amounts, it is not trivial to align left and right views. To this end, we propose a novel encoder alignment module (EAM) and decoder alignment module (DAM). In particular, a correlation layer is suggested in EAM to measure the disparity between DP views, whose deep features can then be accordingly aligned using deformable convolutions. DAM can further enhance the alignment of skip-connected features from encoder and deep features in decoder. By introducing several EAMs and DAMs, DP views in DPANet can be well aligned for better predicting the latent sharp image. Experimental results on real-world datasets show that our DPANet is notably superior to state-of-the-art deblurring methods in reducing defocus blur while recovering visually plausible sharp structures and textures.

Submitted 19 February, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

Comments: Project page: https://github.com/liyucs/DPANet

arXiv:2203.02286 [pdf, other]

Semi-parametric Makeup Transfer via Semantic-aware Correspondence

Authors: Mingrui Zhu, Yun Yi, Nannan Wang, Xiaoyu Wang, Xinbo Gao

Abstract: The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer. Conventional approaches for makeup transfer either learn disentangled representation or perform pixel-wise correspondence in a parametric way between two images. We argue that non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies. To this end, this paper proposes a Semi-parametric Makeup Transfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms. The non-parametric component is a novel Semantic-aware Correspondence (SaC) module that explicitly reconstructs content representation with makeup representation under the strong constraint of component semantics. The reconstructed representation is desired to preserve the spatial and identity information of the source image while "wearing" the makeup of the reference image. The output image is synthesized via a parametric decoder that draws on the reconstructed representation. Extensive experiments demonstrate the superiority of our method in terms of visual quality, robustness, and flexibility. Code and pre-trained model are available at https://github.com/AnonymScholar/SpMT.

Submitted 4 March, 2022; originally announced March 2022.

Comments: 20 pages, 2 tables, 17 figures

arXiv:2201.03549 [pdf, other]

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

Authors: Tianhan Zhang, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu

Abstract: Machine learning has long been considered as a black box for predicting combustion chemical kinetics due to the extremely large number of parameters and the lack of evaluation standards and reproducibility. The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be. Sampling and preprocessing determine the DNN training dataset and thus affect the DNN's prediction ability. The current work proposes using Box-Cox transformation (BCT) to preprocess the combustion data. In addition, this work compares different sampling methods with or without preprocessing, including the Monte Carlo method, manifold sampling, generative neural network method (cycle-GAN), and newly-proposed multi-scale sampling. Our results reveal that the DNN trained by the manifold data can capture the chemical kinetics in limited configurations but cannot remain robust toward perturbation, which is inevitable for the DNN coupled with the flow field. The Monte Carlo and cycle-GAN samplings can cover a wider phase space but fail to capture small-scale intermediate species, producing poor prediction results. A three-hidden-layer DNN, based on the multi-scale method without specific flame simulation data, allows predicting chemical kinetics in various scenarios and being stable during the temporal evolutions. This single DNN is readily implemented with several CFD codes and validated in various combustors, including (1) zero-dimensional autoignition, (2) one-dimensional freely propagating flame, (3) two-dimensional jet flame with triple-flame structure, and (4) three-dimensional turbulent lifted flames. The results demonstrate the satisfying accuracy and generalization ability of the pre-trained DNN. The Fortran and Python versions of DNN and example code are attached in the supplementary for reproducibility.

Submitted 12 August, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

arXiv:2112.08702 [pdf, other]

Learning to Share in Multi-Agent Reinforcement Learning

Authors: Yuxuan Yi, Ge Li, Yaowei Wang, Zongqing Lu

Abstract: In this paper, we study the problem of networked multi-agent reinforcement learning (MARL), where a number of agents are deployed as a partially connected network and each interacts only with nearby agents. Networked MARL requires all agents to make decisions in a decentralized manner to optimize a global objective with restricted communication between neighbors over the network. Inspired by the fact that sharing plays a key role in human learning of cooperation, we propose LToS, a hierarchically decentralized MARL framework that enables agents to learn to dynamically share reward with neighbors so as to encourage agents to cooperate on the global objective through collectives. For each agent, the high-level policy learns how to share reward with neighbors to decompose the global objective, while the low-level policy learns to optimize the local objective induced by the high-level policies in the neighborhood. The two policies form a bi-level optimization and learn alternately. We empirically demonstrate that LToS outperforms existing methods in both social dilemma and networked MARL scenarios across scales.

Submitted 21 June, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: ICLR 2022 Workshop on Gamification and Multiagent Solutions, Best Cooperative AI Paper Award

arXiv:2107.09404 [pdf, ps, other]

Maximizing the Set Cardinality of Users Scheduled for Ultra-dense uRLLC Networks

Authors: Shiwen He, Jun Yuan, Zhenyu An, Yunshan Yi, Yongming Huang

Abstract: Ultra-reliability and low latency communication has long been an important but challenging task in the fifth and sixth generation wireless communication systems. Scheduling as many users as possible to serve on the limited time-frequency resource is a crucial topic, subject to the maximum allowable transmission power and the minimum rate requirement of each user. We address it by proposing a mixed integer programming model, with the goal of maximizing the set cardinality of users instead of maximizing the system sum rate or energy efficiency. Mathematical transformations and successive convex approximation are combined to solve the complex optimization problem. Numerical results show that the proposed method achieves a considerable performance compared with the exhaustive search method, but with lower computational complexity.

Submitted 9 September, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: 4 pages, 3 figures

arXiv:2106.15354 [pdf, other]

doi 10.1371/journal.pone.0277878

Text mining and sentiment analysis of COVID-19 tweets

Authors: Qihuang Zhang, Grace Y. Yi, Li-Pang Chen, Wenqing He

Abstract: The human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causing the COVID-19 disease, has continued to spread all over the world. It menacingly affects not only public health and global economics but also mental health and mood. While the impact of the COVID-19 pandemic has been widely studied, relatively fewer discussions about the sentimental reaction of the population have been available. In this article, we scrape COVID-19 related tweets on the microblogging platform, Twitter, and examine the tweets from Feb 24, 2020 to Oct 14, 2020 in four Canadian cities (Toronto, Montreal, Vancouver, and Calgary) and four U.S. cities (New York, Los Angeles, Chicago, and Seattle). Applying the Vader and NRC approaches, we evaluate the sentiment intensity scores and visualize the information over different periods of the pandemic. Sentiment scores for the tweets concerning three anti-epidemic measures, masks, vaccine, and lockdown, are computed for comparisons. The results of four Canadian cities are compared with four cities in the United States. We study the causal relationships between the infected cases, the tweet activities, and the sentiment scores of COVID-19 related tweets, by integrating the echo state network method with convergent cross-mapping. Our analysis shows that public sentiments regarding COVID-19 vary in different time periods and locations. In general, people have a positive mood about COVID-19 and masks, but negative in the topics of vaccine and lockdown. The causal inference shows that the sentiment influences people's activities on Twitter, which is also correlated to the daily number of infections.

Submitted 26 June, 2021; originally announced June 2021.

Comments: 20 pages, 10 figures, 1 table

arXiv:2102.03688 [pdf, other]

Making Intelligent Reflecting Surfaces More Intelligent: A Roadmap Through Reservoir Computing

Authors: Zhou Zhou, Kangjun Bai, Nima Mohammadi, Yang Yi, Lingjia Liu

Abstract: This article introduces a neural network-based signal processing framework for intelligent reflecting surface (IRS) aided wireless communications systems. By modeling radio-frequency (RF) impairments inside the "meta-atoms" of IRS (including nonlinearity and memory effects), we present an approach that generalizes the entire IRS-aided system as a reservoir computing (RC) system, an efficient recurrent neural network (RNN) operating in a state near the "edge of chaos". This framework enables us to take advantage of the nonlinearity of this "fabricated" wireless environment to overcome link degradation due to model mismatch. Accordingly, the randomness of the wireless channel and RF imperfections are naturally embedded into the RC framework, enabling the internal RC dynamics to lie on the edge of chaos. Furthermore, several practical issues, such as channel state information acquisition, passive beamforming design, and physical layer reference signal design, are discussed.

Submitted 6 February, 2021; originally announced February 2021.

arXiv:2101.12240 [pdf, other]

Differential Privacy Meets Federated Learning under Communication Constraints

Authors: Nima Mohammadi, Jianan Bai, Qiang Fan, Yifei Song, Yang Yi, Lingjia Liu

Abstract: The performance of federated learning systems is bottlenecked by communication costs and training variance. The communication overhead problem is usually addressed by three communication-reduction techniques, namely, model compression, partial device participation, and periodic aggregation, at the cost of increased training variance. Different from traditional distributed learning systems, federated learning suffers from data heterogeneity (since the devices sample their data from possibly different distributions), which induces additional variance among devices during training. Various variance-reduced training algorithms have been introduced to combat the effects of data heterogeneity, while they usually cost additional communication resources to deliver necessary control information. Additionally, data privacy remains a critical issue in FL, and thus there have been attempts at bringing Differential Privacy to this framework as a mediator between utility and privacy requirements. This paper investigates the trade-offs between communication costs and training variance under a resource-constrained federated system theoretically and experimentally, and how communication reduction techniques interplay in a differentially private setting. The results provide important insights into designing practical privacy-aware federated learning systems.

Submitted 28 January, 2021; originally announced January 2021.

Comments: 11 pages, 4 figures

arXiv:2101.09244 [pdf, other]

Extracting Lifestyle Factors for Alzheimer's Disease from Clinical Notes Using Deep Learning with Weak Supervision

Authors: Zitao Shen, Yoonkwon Yi, Anusha Bompelli, Fang Yu, Yanshan Wang, Rui Zhang

Abstract: Since no effective therapies exist for Alzheimer's disease (AD), prevention has become more critical through lifestyle factor changes and interventions. Analyzing electronic health records (EHR) of patients with AD can help us better understand lifestyle's effect on AD. However, lifestyle information is typically stored in clinical narratives. Thus, the objective of the study was to demonstrate the feasibility of natural language processing (NLP) models to classify lifestyle factors (e.g., physical activity and excessive diet) from clinical texts. We automatically generated labels for the training data by using a rule-based NLP algorithm. We conducted weak supervision for pre-trained Bidirectional Encoder Representations from Transformers (BERT) models on the weakly labeled training corpus. These models include the BERT base model, PubMedBERT(abstracts + full text), PubMedBERT(only abstracts), Unified Medical Language System (UMLS) BERT, Bio BERT, and Bio-clinical BERT. We performed two case studies: physical activity and excessive diet, in order to validate the effectiveness of BERT models in classifying lifestyle factors for AD. These models were compared on the developed Gold Standard Corpus (GSC) on the two case studies. The PubmedBERT(Abs) model achieved the best performance for physical activity, with its precision, recall, and F-1 scores of 0.96, 0.96, and 0.96, respectively. Regarding classifying excessive diet, the Bio BERT model showed the highest performance with perfect precision, recall, and F-1 scores. The proposed approach leveraging weak supervision could significantly increase the sample size, which is required for training the deep learning models. The study also demonstrates the effectiveness of BERT models for extracting lifestyle factors for Alzheimer's disease from clinical notes.

Submitted 24 January, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

arXiv:2012.00945 [pdf, other]

Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance

Authors: Yu Li, Ming Liu, Yaling Yi, Qince Li, Dongwei Ren, Wangmeng Zuo

Abstract: Removing undesired reflection from an image captured through a glass surface is a very challenging problem with many practical application scenarios. For improving reflection removal, cascaded deep models have been usually adopted to estimate the transmission in a progressive manner. However, most existing methods are still limited in exploiting the result of the prior stage for guiding transmission estimation. In this paper, we present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR). To be specific, the reflection layer is estimated first because it is generally much simpler and relatively easier to estimate. A reflection-aware guidance (RAG) module is then elaborated for better exploiting the estimated reflection in predicting the transmission layer. By incorporating feature maps from the estimated reflection and observation, RAG can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate mask in partial convolution for mitigating the effect of deviating from linear combination hypothesis. A dedicated mask loss is further presented for reconciling the contributions of encoder and decoder features. Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods. The source code and pre-trained model are available at https://github.com/liyucs/RAGNet.

Submitted 21 February, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

arXiv:2011.11087 [pdf, ps, other]

Edge Deletion Algorithms for Minimizing Spread in SIR Epidemic Models

Authors: Yuhao Yi, Liren Shan, Philip E. Paré, Karl H. Johansson

Abstract: This paper studies algorithmic strategies to effectively reduce the number of infections in susceptible-infected-recovered (SIR) epidemic models. We consider a Markov chain SIR model and its two instantiations in the deterministic SIR (D-SIR) model and the independent cascade SIR (IC-SIR) model. We investigate the problem of minimizing the number of infections by restricting contacts under realistic constraints. Under moderate assumptions on the reproduction number, we prove that the infection numbers are bounded by supermodular functions in the D-SIR model and the IC-SIR model for large classes of random networks. We propose efficient algorithms with approximation guarantees to minimize infections. The theoretical results are illustrated by numerical simulations.

Submitted 22 November, 2020; originally announced November 2020.

Comments: 29 pages, 4 figures

arXiv:2010.05449 [pdf, other]

Deep Echo State Q-Network (DEQN) and Its Application in Dynamic Spectrum Sharing for 5G and Beyond

Authors: Hao-Hsuan Chang, Lingjia Liu, Yang Yi

Abstract: Deep reinforcement learning (DRL) has been shown to be successful in many application domains. Combining recurrent neural networks (RNNs) and DRL further enables DRL to be applicable in non-Markovian environments by capturing temporal information. However, training of both DRL and RNNs is known to be challenging requiring a large amount of training data to achieve convergence. In many targeted applications, such as those used in the fifth generation (5G) cellular communication, the environment is highly dynamic while the available training data is very limited. Therefore, it is extremely important to develop DRL strategies that are capable of capturing the temporal correlation of the dynamic environment requiring limited training overhead. In this paper, we introduce the deep echo state Q-network (DEQN) that can adapt to the highly dynamic environment in a short period of time with limited training data. We evaluate the performance of the introduced DEQN method under the dynamic spectrum sharing (DSS) scenario, which is a promising technology in 5G and future 6G networks to increase the spectrum utilization. Compared to conventional spectrum management policy that grants a fixed spectrum band to a single system for exclusive access, DSS allows the secondary system to share the spectrum with the primary system. Our work sheds light on the application of an efficient DRL framework in highly dynamic environments with limited available training data.

Submitted 12 October, 2020; originally announced October 2020.

Comments: This work is accepted in IEEE Transactions on Neural Networks and Learning Systems

arXiv:2009.12969 [pdf, other]

Simultaneous Relevance and Diversity: A New Recommendation Inference Approach

Authors: Yifang Liu, Zhentao Xu, Qiyuan An, Yang Yi, Yanzhi Wang, Trevor Hastie

Abstract: Relevance and diversity are both important to the success of recommender systems, as they help users to discover from a large pool of items a compact set of candidates that are not only interesting but exploratory as well. The challenge is that relevance and diversity usually act as two competing objectives in conventional recommender systems, which necessitates the classic trade-off between exploitation and exploration. Traditionally, higher diversity often means sacrifice on relevance and vice versa. We propose a new approach, heterogeneous inference, which extends the general collaborative filtering (CF) by introducing a new way of CF inference, negative-to-positive. Heterogeneous inference achieves divergent relevance, where relevance and diversity support each other as two collaborating objectives in one recommendation model, and where recommendation diversity is an inherent outcome of the relevance inference process. Benefiting from its succinctness and flexibility, our approach is applicable to a wide range of recommendation scenarios/use-cases at various sophistication levels. Our analysis and experiments on public datasets and real-world production data show that our approach outperforms existing methods on relevance and diversity simultaneously.

Submitted 27 September, 2020; originally announced September 2020.

Comments: 9 pages

arXiv:2009.08829 [pdf, other]

Residual Spatial Attention Network for Retinal Vessel Segmentation

Authors: Changlu Guo, Márton Szemenyei, Yugen Yi, Wei Zhou, Haodong Bian

Abstract: Reliable segmentation of retinal vessels can be employed as a way of monitoring and diagnosing certain diseases, such as diabetes and hypertension, as they affect the retinal vascular structure. In this work, we propose the Residual Spatial Attention Network (RSAN) for retinal vessel segmentation. RSAN employs a modified residual block structure that integrates DropBlock, which can not only be utilized to construct deep networks to extract more complex vascular features, but can also effectively alleviate the overfitting. Moreover, in order to further improve the representation capability of the network, based on this modified residual block, we introduce the spatial attention (SA) and propose the Residual Spatial Attention Block (RSAB) to build RSAN. We adopt the public DRIVE and CHASE DB1 color fundus image datasets to evaluate the proposed RSAN. Experiments show that the modified residual structure and the spatial attention are effective in this work, and our proposed RSAN achieves the state-of-the-art performance.

Submitted 18 September, 2020; originally announced September 2020.

Comments: ICONIP 2020

arXiv:2009.00795 [pdf, ps, other]

Information Source Finding in Networks: Querying with Budgets

Authors: Jaeyoung Choi, Sangwoo Moon, Jiin Woo, Kyunghwan Son, Jinwoo Shin, Yung Yi

Abstract: In this paper, we study the problem of detecting the source of diffused information by querying individuals, given a sample snapshot of the information diffusion graph, where two queries are asked: (i) whether the respondent is the source or not, and (ii) if not, which neighbor spread the information to the respondent. We consider the case when respondents may not always be truthful and each query incurs some cost. Our goal is to quantify the necessary and sufficient budgets to achieve the detection probability $1-δ$ for any given $0<δ<1$. To this end, we study two types of algorithms, adaptive and non-adaptive, which differ in whether the next respondents are selected adaptively based on the answers of the previous respondents. We first provide information-theoretic lower bounds for the necessary budgets in both algorithm types. In terms of the sufficient budgets, we propose two practical estimation algorithms, one non-adaptive and one adaptive, and for each algorithm, we quantitatively analyze the budget which ensures $1-δ$ detection accuracy. This theoretical analysis not only quantifies the budgets needed by practical estimation algorithms achieving a given target detection accuracy in finding the diffusion source, but also enables us to quantitatively characterize the amount of extra budget required in the non-adaptive type of estimation, referred to as the adaptivity gap. We validate our theoretical findings over synthetic and real-world social network topologies.

Submitted 22 October, 2020; v1 submitted 1 September, 2020; originally announced September 2020.

Comments: Part of this work was presented at the IEEE INFOCOM 2017 (arXiv:1805.03532) and IEEE ISIT 2018 (arXiv:1711.05496)

arXiv:2006.12010 [pdf, other]

QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning

Authors: Kyunghwan Son, Sungsoo Ahn, Roben Delos Reyes, Jinwoo Shin, Yung Yi

Abstract: QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of learning the largest class of joint-action value functions to date. However, despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottleneck of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct, and QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantee. In particular, QTRAN++ newly achieves state-of-the-art performance in the SMAC environment. The code will be released.

Submitted 5 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.
