Search | arXiv e-print repository

Interdisciplinary Expertise to Advance Equitable Explainable AI

Authors: Chloe R. Bennett, Heather Cole-Lewis, Stephanie Farquhar, Naama Haamel, Boris Babenko, Oran Lang, Mat Fleck, Ilana Traynis, Charles Lau, Ivor Horn, Courtney Lyles

Abstract: The field of artificial intelligence (AI) is rapidly influencing health and healthcare, but bias and poor performance persists for populations who face widespread structural oppression. Previous work has clearly outlined the need for more rigorous attention to data representativeness and model performance to advance equity and reduce bias. However, there is an opportunity to also improve the explainability of AI by leveraging best practices of social epidemiology and health equity to help us develop hypotheses for associations found. In this paper, we focus on explainable AI (XAI) and describe a framework for interdisciplinary expert panel review to discuss and critically assess AI model explanations from multiple perspectives and identify areas of bias and directions for future research. We emphasize the importance of the interdisciplinary expert panel to produce more accurate, equitable interpretations which are historically and contextually informed. Interdisciplinary panel discussions can help reduce bias, identify potential confounders, and identify opportunities for additional research where there are gaps in the literature. In turn, these insights can suggest opportunities for AI model improvement.

Submitted 29 May, 2024; originally announced June 2024.

arXiv:2405.06697 [pdf, other]

Automated Conversion of Static to Dynamic Scheduler via Natural Language

Authors: Paul Mingzheng Tang, Kenji Kah Hoe Leong, Nowshad Shaik, Hoong Chuin Lau

Abstract: In this paper, we explore the potential application of Large Language Models (LLMs) that will automatically model constraints and generate code for dynamic scheduling problems given an existing static model. Static scheduling problems are modelled and coded by optimization experts. These models may be easily obsoleted as the underlying constraints may need to be fine-tuned in order to reflect changes in the scheduling rules. Furthermore, it may be necessary to turn a static model into a dynamic one in order to cope with disturbances in the environment. In this paper, we propose a Retrieval-Augmented Generation (RAG) based LLM model to automate the process of implementing constraints for Dynamic Scheduling (RAGDyS), without seeking help from an optimization modeling expert. Our framework aims to minimize technical complexities related to mathematical modelling and computational workload for end-users, thereby allowing end-users to quickly obtain a new schedule close to the original schedule with changes reflected by natural language constraint descriptions.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 7 pages (excluding appendix), 10 figures, 3 tables

arXiv:2405.03162 [pdf, other]

Advancing Multimodal Medical Capabilities of Gemini

Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance.
Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.18416 [pdf, other]

Capabilities of Gemini Models in Medicine

Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-Gemini, a family of highly capable multimodal models that are specialized in medicine with the ability to seamlessly use web search, and that can be efficiently tailored to novel modalities using custom encoders. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them, and surpass the GPT-4 model family on every benchmark where a direct comparison is viable, often by a wide margin. On the popular MedQA (USMLE) benchmark, our best-performing Med-Gemini model achieves SoTA performance of 91.1% accuracy, using a novel uncertainty-guided search strategy. On 7 multimodal benchmarks including NEJM Image Challenges and MMMU (health & medicine), Med-Gemini improves over GPT-4V by an average relative margin of 44.5%. We demonstrate the effectiveness of Med-Gemini's long-context capabilities through SoTA performance on a needle-in-a-haystack retrieval task from long de-identified health records and medical video question answering, surpassing prior bespoke methods using only in-context learning.
Finally, Med-Gemini's performance suggests real-world utility by surpassing human experts on tasks such as medical text summarization, alongside demonstrations of promising potential for multimodal medical dialogue, medical research and education. Taken together, our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment in this safety-critical domain.

Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.15435 [pdf, other]

Introduction to Eye Tracking: A Hands-On Tutorial for Students and Practitioners

Authors: Enkelejda Kasneci, Hong Gao, Suleyman Ozdel, Virmarie Maquiling, Enkeleda Thaqi, Carrie Lau, Yao Rong, Gjergji Kasneci, Efe Bozkir

Abstract: Eye-tracking technology is widely used in various application areas such as psychology, neuroscience, marketing, and human-computer interaction, as it is a valuable tool for understanding how people process information and interact with their environment. This tutorial provides a comprehensive introduction to eye tracking, from the basics of eye anatomy and physiology to the principles and applications of different eye-tracking systems. The guide is designed to provide a hands-on learning experience for everyone interested in working with eye-tracking technology. Therefore, we include practical case studies to teach students and professionals how to effectively set up and operate an eye-tracking system. The tutorial covers a variety of eye-tracking systems, calibration techniques, data collection, and analysis methods, including fixations, saccades, pupil diameter, and visual scan path analysis. In addition, we emphasize the importance of considering ethical aspects when conducting eye-tracking research and experiments, especially informed consent and participant privacy. We aim to give the reader a solid understanding of basic eye-tracking principles and the practical skills needed to conduct their experiments. Python-based code snippets and illustrative examples are included in the tutorials and can be downloaded at: https://gitlab.lrz.de/hctl/Eye-Tracking-Tutorial.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2403.18545 [pdf, other]

Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters

Authors: Zizhao Mo, Huanle Xu, Wing Cheong Lau

Abstract: Ensuring the highest training throughput to maximize resource efficiency, while maintaining fairness among users, is critical for deep learning (DL) training in heterogeneous GPU clusters. However, current DL schedulers provide only limited fairness properties and suboptimal training throughput, impeding tenants from effectively leveraging heterogeneous resources. The underlying design challenge stems from inherent conflicts between efficiency and fairness properties. In this paper, we introduce OEF, a new resource allocation framework specifically developed for achieving optimal resource efficiency and ensuring diverse fairness properties in heterogeneous GPU clusters. By integrating resource efficiency and fairness within a global optimization framework, OEF is capable of providing users with maximized overall efficiency, as well as various guarantees of fairness, in both cooperative and non-cooperative environments. We have implemented OEF in a cluster resource manager and conducted large-scale experiments, showing that OEF can improve the overall training throughput by up to 32% while improving fairness compared to state-of-the-art heterogeneity-aware schedulers.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2402.03907 [pdf, other]

doi 10.1145/3640794.3665563

Embedding Large Language Models into Extended Reality: Opportunities and Challenges for Inclusion, Engagement, and Privacy

Authors: Efe Bozkir, Süleyman Özdel, Ka Hei Carrie Lau, Mengdi Wang, Hong Gao, Enkelejda Kasneci

Abstract: Advances in artificial intelligence and human-computer interaction will likely lead to extended reality (XR) becoming pervasive. While XR can provide users with interactive, engaging, and immersive experiences, non-player characters are often utilized in pre-scripted and conventional ways. This paper argues for using large language models (LLMs) in XR by embedding them in avatars or as narratives to facilitate inclusion through prompt engineering and fine-tuning the LLMs. We argue that this inclusion will promote diversity for XR use. Furthermore, the versatile conversational capabilities of LLMs will likely increase engagement in XR, helping XR become ubiquitous. Lastly, we speculate that combining the information provided to LLM-powered spaces by users and the biometric data obtained might lead to novel privacy invasions. While exploring potential privacy breaches, examining user privacy concerns and preferences is also essential. Therefore, despite challenges, LLM-powered XR is a promising area with several opportunities.

Submitted 20 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: ACM Conversational User Interfaces 2024

arXiv:2312.11017 [pdf, ps, other]

Information Inequalities via Ideas from Additive Combinatorics

Authors: Chin Wa Lau, Chandra Nair

Abstract: Ruzsa's equivalence theorem provided a framework for converting certain families of inequalities in additive combinatorics to entropic inequalities (which sometimes did not possess stand-alone entropic proofs). In this work, we first establish formal equivalences between some families (different from Ruzsa) of inequalities in additive combinatorics and entropic ones. As a first step to further these equivalences, we establish an information-theoretic characterization of the magnification ratio that could also be of independent interest.

Submitted 20 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 15 pages. The authors were made aware that some of the results had been obtained earlier. The revised version acknowledges and references this work. A conference version of this was published in the proceedings of IEEE ISIT 2023.

arXiv:2312.01895 [pdf, other]

doi 10.1109/OJCOMS.2024.3414622

An In-Depth Survey on Virtualization Technologies in 6G Integrated Terrestrial and Non-Terrestrial Networks

Authors: Sahar Ammar, Chun Pong Lau, Basem Shihada

Abstract: 6G networks are envisioned to deliver a large diversity of applications and meet stringent quality of service (QoS) requirements. Hence, integrated terrestrial and non-terrestrial networks (TN-NTNs) are anticipated to be key enabling technologies. However, the TN-NTNs integration faces a number of challenges that could be addressed through network virtualization technologies such as Software-Defined Networking (SDN), Network Function Virtualization (NFV) and network slicing. In this survey, we provide a comprehensive review on the adaptation of these networking paradigms in 6G networks. We begin with a brief overview on NTNs and virtualization techniques. Then, we highlight the integral role of Artificial Intelligence in improving network virtualization by summarizing major research areas where AI models are applied. Building on this foundation, the survey identifies the main issues arising from the adaptation of SDN, NFV, and network slicing in integrated TN-NTNs, and proposes a taxonomy of integrated TN-NTNs virtualization offering a thorough review of relevant contributions. The taxonomy is built on a four-level classification indicating for each study the level of TN-NTNs integration, the used virtualization technology, the addressed problem, the type of the study and the proposed solution, which can be based on conventional or AI-enabled methods. Moreover, we present a summary on the simulation tools commonly used in the testing and validation of such networks.
Finally, we discuss open issues and give insights on future research directions for the advancement of integrated TN-NTNs virtualization in the 6G era.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.17074 [pdf, other]

Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

Authors: Siyuan Huang, Yifan Zhou, Ram Prabhakar, Xijun Liu, Yuxiang Guo, Hongrui Yi, Cheng Peng, Rama Chellappa, Chun Pong Lau

Abstract: Person Re-Identification (ReID) is a challenging problem, focusing on identifying individuals across diverse settings. However, previous ReID methods primarily concentrated on a single domain or modality, such as Clothes-Changing ReID (CC-ReID) and video ReID. Real-world ReID is not constrained by factors like clothes or input types. Recent approaches emphasize on learning semantics through pre-training to enhance ReID performance but are hindered by coarse granularity, on-clothes focus and pre-defined areas. To address these limitations, we propose a Local Semantic Extraction (LSE) module inspired by Interactive Segmentation Models. The LSE module captures fine-grained, biometric, and flexible local semantics, enhancing ReID accuracy. Additionally, we introduce Semantic ReID (SemReID), a pre-training method that leverages LSE to learn effective semantics for seamless transfer across various ReID domains and modalities. Extensive evaluations across nine ReID datasets demonstrates SemReID's robust performance across multiple domains, including clothes-changing ReID, video ReID, unconstrained ReID, and short-term ReID. Our findings highlight the importance of effective semantics in ReID, as SemReID can achieve great performances without domain-specific designs.

Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15551 [pdf, other]

Instruct2Attack: Language-Guided Semantic Adversarial Attacks

Authors: Jiang Liu, Chen Wei, Yuxiang Guo, Heng Yu, Alan Yuille, Soheil Feizi, Chun Pong Lau, Rama Chellappa

Abstract: We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions. We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction. Compared to existing noise-based and semantic attacks, I2A generates more natural and diverse adversarial examples while providing better controllability and interpretability. We further automate the attack process with GPT-4 to generate diverse image-specific text instructions. We show that I2A can successfully break state-of-the-art deep neural networks even under strong adversarial defenses, and demonstrate great transferability among a variety of network architectures.

Submitted 27 November, 2023; originally announced November 2023.

Comments: under submission, code coming soon

arXiv:2311.14311 [pdf, other]

doi 10.1016/j.ins.2023.120022

RelJoin: Relative-cost-based Selection of Distributed Join Methods for Query Plan Optimization

Authors: F. Liang, F. C. M. Lau, H. Cui, Y. Li, B. Lin, C. Li, X. Hu

Abstract: Selecting appropriate distributed join methods for logical join operations in a query plan is crucial for the performance of data-intensive scalable computing (DISC). Different network communication patterns in the data exchange phase generate varying network communication workloads and significantly affect the distributed join performance. However, most cost-based query optimizers focus on the local computing cost and do not precisely model the network communication cost. We propose a cost model for various distributed join methods to optimize join queries in DISC platforms. Our method precisely measures the network and local computing workloads in different execution phases, using information on the size and cardinality statistics of datasets and cluster join parallelism. Our cost model reveals the importance of the relative size of the joining datasets. We implement an efficient distributed join selection strategy, known as RelJoin in SparkSQL, which is an industry-prevalent distributed data processing framework. RelJoin uses runtime adaptive statistics for accurate cost estimation and selects optimal distributed join methods for logical joins to optimize the physical query plan. The evaluation results on the TPC-DS benchmark show that RelJoin performs best in 62 of the 97 queries and can reduce the average query time by 21% compared with other strategies.

Submitted 24 November, 2023; originally announced November 2023.

Journal ref: Information Sciences 658 (2024) 120022

arXiv:2311.05725 [pdf, other]

Whole-body Detection, Recognition and Identification at Altitude and Range

Authors: Siyuan Huang, Ram Prabhakar Kathirvel, Chun Pong Lau, Rama Chellappa

Abstract: In this paper, we address the challenging task of whole-body biometric detection, recognition, and identification at distances of up to 500m and large pitch angles of up to 50 degree. We propose an end-to-end system evaluated on diverse datasets, including the challenging Biometric Recognition and Identification at Range (BRIAR) dataset. Our approach involves pre-training the detector on common image datasets and fine-tuning it on BRIAR's complex videos and images. After detection, we extract body images and employ a feature extractor for recognition. We conduct thorough evaluations under various conditions, such as different ranges and angles in indoor, outdoor, and aerial scenarios. Our method achieves an average F1 score of 98.29% at IoU = 0.7 and demonstrates strong performance in recognition accuracy and true acceptance rate at low false acceptance rates compared to existing models. On a test set of 100 subjects with 444 distractors, our model achieves a rank-20 recognition accuracy of 75.13% and a TAR@1%FAR of 54.09%.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.17097 [pdf, other]

Navigating Data Heterogeneity in Federated Learning A Semi-Supervised Federated Object Detection

Authors: Taehyeon Kim, Eric Lin, Junu Lee, Christian Lau, Vaikkunth Mugunthan

Abstract: Federated Learning (FL) has emerged as a potent framework for training models across distributed data sources while maintaining data privacy. Nevertheless, it faces challenges with limited high-quality labels and non-IID client data, particularly in applications like autonomous driving. To address these hurdles, we navigate the uncharted waters of Semi-Supervised Federated Object Detection (SSFOD). We present a pioneering SSFOD framework, designed for scenarios where labeled data reside only at the server while clients possess unlabeled data. Notably, our method represents the inaugural implementation of SSFOD for clients with 0% labeled non-IID data, a stark contrast to previous studies that maintain some subset of labels at each client. We propose FedSTO, a two-stage strategy encompassing Selective Training followed by Orthogonally enhanced full-parameter training, to effectively address data shift (e.g. weather conditions) between server and clients. Our contributions include selectively refining the backbone of the detector to avert overfitting, orthogonality regularization to boost representation divergence, and local EMA-driven pseudo label assignment to yield high-quality pseudo labels. Extensive validation on prominent autonomous driving datasets (BDD100K, Cityscapes, and SODA10M) attests to the efficacy of our approach, demonstrating state-of-the-art results. Remarkably, FedSTO, using just 20-30% of labels, performs nearly as well as fully-supervised centralized training methods.

Submitted 2 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2309.14653 [pdf, other]

doi 10.1109/LCOMM.2023.3320105

Joint Design of Source-Channel Codes with Linear Source Encoding Complexity and Good Channel Thresholds Based on Double-Protograph LDPC Codes

Authors: Jia Zhan, Francis C. M. Lau

Abstract: We propose the use of a lower or upper triangular sub-base matrix to replace the identity matrix in the source-check-channel-variable linking protomatrix of a double-protograph low-density parity-check joint-source-channel code (DP-LDPC JSCC). The elements along the diagonal of the proposed lower or upper triangular sub-base matrix are assigned as "1" and the other non-zero elements can take any non-negative integral values. Compared with the traditional DP-LDPC JSCC designs, the new designs show a theoretical channel threshold improvement of up to 0.41 dB and a simulated source symbol error rate improvement of up to 0.5 dB at an error rate of 1e-6.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures, 3 tables, to appear in IEEE Communications Letters

arXiv:2308.16501 [pdf, other]

Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges

Authors: Paul Mingzheng Tang, Ba Phong Tran, Hoong Chuin Lau

Abstract: In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties of ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.

Submitted 31 August, 2023; originally announced August 2023.

Comments: 7 pages, 4 figures. This paper was presented at the IJCAI 2023 First International Workshop on Search and Planning with Complex Objectives (WoSePCO), http://idm-lab.org/wiki/complex-objective

arXiv:2308.08785 [pdf, other]

A Feasibility-Preserved Quantum Approximate Solver for the Capacitated Vehicle Routing Problem

Authors: Ningyi Xie, Xinwei Lee, Dongsheng Cai, Yoshiyuki Saito, Nobuyoshi Asai, Hoong Chuin Lau

Abstract: The Capacitated Vehicle Routing Problem (CVRP) is an NP-optimization problem (NPO) that arises in various fields including transportation and logistics. The CVRP extends from the Vehicle Routing Problem (VRP), aiming to determine the most efficient plan for a fleet of vehicles to deliver goods to a set of customers, subject to the limited carrying capacity of each vehicle. As the number of possible solutions skyrockets when the number of customers increases, finding the optimal solution remains a significant challenge. Recently, the Quantum Approximate Optimization Algorithm (QAOA), a quantum-classical hybrid algorithm, has exhibited enhanced performance in certain combinatorial optimization problems compared to classical heuristics. However, its ability diminishes notably in solving constrained optimization problems including the CVRP. This limitation primarily arises from the typical approach of encoding the given problems as penalty-inclusive binary optimization problems. In this case, the QAOA faces challenges in sampling solutions satisfying all constraints. Addressing this, our work presents a new binary encoding for the CVRP, with an alternative objective function of minimizing the shortest path that bypasses the vehicle capacity constraint of the CVRP. The search space is further restricted by the constraint-preserving mixing operation. We examine and discuss the effectiveness of the proposed encoding under the framework of a variant of the QAOA, the Quantum Alternating Operator Ansatz (AOA), through its application to several illustrative examples. Compared to the typical QAOA approach, the proposed method not only preserves feasibility but also achieves a significant enhancement in the probability of measuring optimal solutions.

Submitted 21 April, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: 10 pages, 10 figures, 1 table

arXiv:2308.01317 [pdf]

ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Atilla Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, et al. (3 additional authors not shown)

Abstract: In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.

Submitted 7 September, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

arXiv:2307.16382 [pdf, other]

Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?

Authors: Albert Yu Sun, Eliott Zemour, Arushi Saxena, Udith Vaidyanathan, Eric Lin, Christian Lau, Vaikkunth Mugunthan

Abstract: Machine learning practitioners often fine-tune generative pre-trained models like GPT-3 to improve model performance at specific tasks. Previous works, however, suggest that fine-tuned machine learning models memorize and emit sensitive information from the original fine-tuning dataset. Companies such as OpenAI offer fine-tuning services for their models, but no prior work has conducted a memorization attack on any closed-source models. In this work, we simulate a privacy attack on GPT-3 using OpenAI's fine-tuning API. Our objective is to determine if personally identifiable information (PII) can be extracted from this model. We (1) explore the use of naive prompting methods on a GPT-3 fine-tuned classification model, and (2) design a practical word generation task called Autocomplete to investigate the extent of PII memorization in fine-tuned GPT-3 within a real-world context. Our findings reveal that fine-tuning GPT-3 for both tasks led to the model memorizing and disclosing critical PII obtained from the underlying fine-tuning dataset. To encourage further research, we have made our code and datasets publicly available on GitHub at: https://github.com/albertsun1/gpt3-pii-attacks

Submitted 15 April, 2024; v1 submitted 30 July, 2023; originally announced July 2023.

arXiv:2307.14578 [pdf, other]

GADER: GAit DEtection and Recognition in the Wild

Authors: Yuxiang Guo, Cheng Peng, Ram Prabhakar, Chun Pong Lau, Rama Chellappa

Abstract: Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, their applicability is significantly impeded in unconstrained situations, e.g., outdoor, long-distance scenes. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human authentication in challenging outdoor scenarios. Specifically, GADER leverages a Double Helical Signature to detect the fragment of human movement and incorporates a novel gait recognition method, which learns representations by distilling from an auxiliary RGB recognition model. At inference time, GADER only uses the silhouette modality but benefits from a more robust representation. Extensive experiments on indoor and outdoor datasets demonstrate that the proposed method outperforms the state of the art for gait recognition and verification, with a significant 20.6% improvement on unconstrained, long-distance scenes.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.14334 [pdf, other]

Towards Generalist Biomedical AI

Authors: Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Chuck Lau, Ryutaro Tanno, Ira Ktena, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S Sara Mahdavi, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Joelle Barral, et al. (7 additional authors not shown)

Abstract: Medicine is inherently multimodal, with rich data modalities spanning text, imaging, genomics, and more. Generalist biomedical artificial intelligence (AI) systems that flexibly encode, integrate, and interpret this data at scale can potentially enable impactful applications ranging from scientific discovery to care delivery. To enable the development of these models, we first curate MultiMedBench, a new multimodal biomedical benchmark. MultiMedBench encompasses 14 diverse tasks such as medical question answering, mammography and dermatology image interpretation, radiology report generation and summarization, and genomic variant calling. We then introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data including clinical language, imaging, and genomics with the same set of model weights. Med-PaLM M reaches performance competitive with or exceeding the state of the art on all MultiMedBench tasks, often surpassing specialist models by a wide margin. We also report examples of zero-shot generalization to novel medical concepts and tasks, positive transfer learning across tasks, and emergent zero-shot medical reasoning. To further probe the capabilities and limitations of Med-PaLM M, we conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales. In a side-by-side ranking on 246 retrospective chest X-rays, clinicians express a pairwise preference for Med-PaLM M reports over those produced by radiologists in up to 40.50% of cases, suggesting potential clinical utility. While considerable work is needed to validate these models in real-world use cases, our results represent a milestone towards the development of generalist biomedical AI systems.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2306.09128 [pdf, ps, other]

Fast Algorithms for Directed Graph Partitioning Using Flows and Reweighted Eigenvalues

Authors: Lap Chi Lau, Kam Chuen Tung, Robert Wang

Abstract: We consider a new semidefinite programming relaxation for directed edge expansion, which is obtained by adding triangle inequalities to the reweighted eigenvalue formulation. Applying the matrix multiplicative weight update method to this relaxation, we derive almost linear-time algorithms to achieve $O(\sqrt{\log{n}})$-approximation and Cheeger-type guarantee for directed edge expansion, as well as an improved cut-matching game for directed graphs. This provides a primal-dual flow-based framework to obtain the best known algorithms for directed graph partitioning. The same approach also works for vertex expansion and for hypergraphs, providing a simple and unified approach to achieve the best known results for different expansion problems and different algorithmic techniques.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.08309 [pdf, other]

doi 10.1109/TVCG.2023.3278691

Taming Reversible Halftoning via Predictive Luminance

Authors: Cheuk-Kit Lau, Menghan Xia, Tien-Tsin Wong

Abstract: Traditional halftoning usually drops colors when dithering images with binary dots, which makes it difficult to recover the original color information. We proposed a novel halftoning technique that converts a color image into a binary halftone with full restorability to its original version. Our novel base halftoning technique consists of two convolutional neural networks (CNNs) to produce the reversible halftone patterns, and a noise incentive block (NIB) to mitigate the flatness degradation issue of CNNs. Furthermore, to tackle the conflicts between the blue-noise quality and restoration accuracy in our novel base method, we proposed a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information resembling from the halftone pattern. Such an approach allows the network to gain more flexibility to produce halftones with better blue-noise quality without compromising the restoration quality. Detailed studies on the multiple-stage training method and loss weightings have been conducted. We have compared our predictor-embedded method and our novel method regarding spectrum analysis on halftone, halftone accuracy, restoration accuracy, and the data embedding studies. Our entropy evaluation shows that our halftone contains less encoding information than that of our novel base method. The experiments show our predictor-embedded method gains more flexibility to improve the blue-noise quality of halftones and maintains a comparable restoration quality with a higher tolerance for disturbances.

Submitted 7 February, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: published in IEEE Transactions on Visualization and Computer Graphics

arXiv:2306.00985 [pdf]

Using generative AI to investigate medical imagery models and datasets

Authors: Oran Lang, Doron Yaya-Stupp, Ilana Traynis, Heather Cole-Lewis, Chloe R. Bennett, Courtney Lyles, Charles Lau, Christopher Semturs, Dale R. Webster, Greg S. Corrado, Avinatan Hassidim, Yossi Matias, Yun Liu, Naama Hammel, Boris Babenko

Abstract: AI models have shown promise in many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust in AI-based models, and could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this paper, we present a method for automatic visual explanations leveraging team-based expertise by generating hypotheses of what visual signals in the images are correlated with the task. We propose the following 4 steps: (i) Train a classifier to perform a given task; (ii) Train a classifier-guided StyleGAN-based image generator (StylEx); (iii) Automatically detect and visualize the top visual attributes that the classifier is sensitive towards; (iv) Formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, we present the discovered attributes to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health. We demonstrate results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples of attributes that capture clinically known features, confounders that arise from factors beyond physiological mechanisms, and reveal a number of physiologically plausible novel attributes. Our approach has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models. Importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real world nature of healthcare delivery and socio-cultural factors. Finally, we intend to release code to enable researchers to train their own StylEx models and analyze their predictive tasks.

Submitted 1 June, 2023; originally announced June 2023.

Comments: 34 pages, 1 figure

arXiv:2305.13625 [pdf, other]

DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection

Authors: Jiang Liu, Chun Pong Lau, Rama Chellappa

Abstract: The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems utilizing adversarial attacks to generate encrypted face images. However, existing methods suffer from poor visual quality or low attack success rates, which limit their utility. Recently, diffusion models have achieved tremendous success in image generation. In this work, we ask: can diffusion models be used to generate adversarial examples to improve both visual quality and attack performance? We propose DiffProtect, which utilizes a diffusion autoencoder to generate semantically meaningful perturbations on FR systems. Extensive experiments demonstrate that DiffProtect produces more natural-looking encrypted images than state-of-the-art methods while achieving significantly higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the CelebA-HQ and FFHQ datasets.

Submitted 28 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

Comments: Code will be available at https://github.com/joellliu/DiffProtect/

arXiv:2305.13548 [pdf, ps, other]

Attribute-Guided Encryption with Facial Texture Masking

Authors: Chun Pong Lau, Jiang Liu, Rama Chellappa

Abstract: The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems by utilizing adversarial attacks to generate encrypted face images. However, existing methods suffer from poor visual quality or low attack success rates, which limit their usability in practice. In this paper, we propose Attribute Guided Encryption with Facial Texture Masking (AGE-FTM) that performs a dual manifold adversarial attack on FR systems to achieve both good visual quality and high black box attack success rates. In particular, AGE-FTM utilizes a high fidelity generative adversarial network (GAN) to generate natural on-manifold adversarial samples by modifying facial attributes, and performs the facial texture masking attack to generate imperceptible off-manifold adversarial samples. Extensive experiments on the CelebA-HQ dataset demonstrate that our proposed method produces more natural-looking encrypted images than state-of-the-art methods while achieving competitive attack performance. We further evaluate the effectiveness of AGE-FTM in the real world using a commercial FR API and validate its usefulness in practice through a user study.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2305.02799 [pdf, ps, other]

A Heterogeneous 6G Networked Sensing Architecture with Active and Passive Anchors

Authors: Qipeng Wang, Liang Liu, Shuowen Zhang, Boya Di, Francis C. M. Lau

Abstract: In the future 6G integrated sensing and communication (ISAC) cellular systems, networked sensing is a promising technique that can leverage the cooperation among the base stations (BSs) to perform high-resolution localization. However, a dense deployment of BSs to fully reap the networked sensing gain is not a cost-efficient solution in practice. Motivated by the advance in the intelligent reflecting surface (IRS) technology for 6G communication, this paper examines the feasibility of deploying the low-cost IRSs to enhance the anchor density for networked sensing. Specifically, we propose a novel heterogeneous networked sensing architecture, which consists of both the active anchors, i.e., the BSs, and the passive anchors, i.e., the IRSs. Under this framework, the BSs emit the orthogonal frequency division multiplexing (OFDM) communication signals in the downlink for localizing the targets based on their echoes reflected via/not via the IRSs. However, there are two challenges for using passive anchors in localization. First, it is impossible to utilize the round-trip signal between a passive IRS and a passive target for estimating their distance. Second, before localizing a target, we do not know which IRS is closest to it and serves as its anchor. In this paper, we show that the distance between a target and its associated IRS can be indirectly estimated based on the length of the BS-target-BS path and the BS-target-IRS-BS path. Moreover, we propose an efficient data association method to match each target to its associated IRS. Numerical results are given to validate the feasibility and effectiveness of our proposed heterogeneous networked sensing architecture with both active and passive anchors.

Submitted 4 May, 2023; originally announced May 2023.

Comments: submitted to IEEE journal

arXiv:2305.01942 [pdf, ps, other]

Experimental Design for Any $p$-Norm

Authors: Lap Chi Lau, Robert Wang, Hong Zhou

Abstract: We consider a general $p$-norm objective for experimental design problems that captures some well-studied objectives (D/A/E-design) as special cases. We prove that a randomized local search approach provides a unified algorithm to solve this problem for all $p$. This provides the first approximation algorithm for the general $p$-norm objective, and a nice interpolation of the best known bounds of the special cases.

Submitted 3 May, 2023; originally announced May 2023.

Comments: 29 pages

arXiv:2303.14646 [pdf, other]

A Survey of Machine Learning-Based Ride-Hailing Planning

Authors: Dacheng Wen, Yupeng Li, Francis C. M. Lau

Abstract: Ride-hailing is a sustainable transportation paradigm where riders access door-to-door traveling services through a mobile phone application, which has attracted a colossal amount of usage. There are two major planning tasks in a ride-hailing system: (1) matching, i.e., assigning available vehicles to pick up the riders, and (2) repositioning, i.e., proactively relocating vehicles to certain locations to balance the supply and demand of ride-hailing services. Recently, many studies of ride-hailing planning that leverage machine learning techniques have emerged. In this article, we present a comprehensive overview of the latest developments in machine learning-based ride-hailing planning. To offer a clear and structured review, we introduce a taxonomy into which we carefully fit the different categories of related works according to the types of their planning tasks and solution schemes, which include collective matching, distributed matching, collective repositioning, distributed repositioning, and joint matching and repositioning. We further shed light on many real-world datasets and simulators that are indispensable for empirical studies on machine learning-based ride-hailing planning strategies. Finally, we propose several promising research directions for this rapidly growing research and practical field.

Submitted 26 March, 2023; originally announced March 2023.

arXiv:2301.00407 [pdf, other]

MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs

Authors: Huaizheng Zhang, Yuanming Li, Wencong Xiao, Yizheng Huang, Xing Di, Jianxiong Yin, Simon See, Yong Luo, Chiew Tong Lau, Yang You

Abstract: New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.

Submitted 1 January, 2023; originally announced January 2023.

Comments: 10 pages, 11 figures

arXiv:2212.03659 [pdf, other]

The BeMi Stardust: a Structured Ensemble of Binarized Neural Networks

Authors: Ambrogio Maria Bernardelli, Stefano Gualandi, Hoong Chuin Lau, Simone Milanesi

Abstract: Binarized Neural Networks (BNNs) are receiving increasing attention due to their lightweight architecture and ability to run on low-power devices. The state-of-the-art for training classification BNNs restricted to few-shot learning is based on a Mixed Integer Programming (MIP) approach. This paper proposes the BeMi ensemble, a structured architecture of BNNs based on training a single BNN for each possible pair of classes and applying a majority voting scheme to predict the final output. The training of a single BNN discriminating between two classes is achieved by a MIP model that optimizes a lexicographic multi-objective function according to robustness and simplicity principles. This approach results in training networks whose output is not affected by small perturbations on the input and whose number of active weights is as small as possible, while good accuracy is preserved. We computationally validate our model using the MNIST and Fashion-MNIST datasets using up to 40 training images per class. Our structured ensemble outperforms both BNNs trained by stochastic gradient descent and state-of-the-art MIP-based approaches. While the previous approaches achieve an average accuracy of 51.1% on the MNIST dataset, the BeMi ensemble achieves an average accuracy of 61.7% when trained with 10 images per class and 76.4% when trained with 40 images per class.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 17 pages, 5 figures, 2 tables

arXiv:2211.09776 [pdf, other]

Cheeger Inequalities for Directed Graphs and Hypergraphs Using Reweighted Eigenvalues

Authors: Lap Chi Lau, Kam Chuen Tung, Robert Wang

Abstract: We derive Cheeger inequalities for directed graphs and hypergraphs using the reweighted eigenvalue approach that was recently developed for vertex expansion in undirected graphs [OZ22,KLT22,JPV22]. The goal is to develop a new spectral theory for directed graphs and an alternative spectral theory for hypergraphs. The first main result is a Cheeger inequality relating the vertex expansion $\vec{\psi}(G)$ of a directed graph $G$ to the vertex-capacitated maximum reweighted second eigenvalue $\vec{\lambda}_2^{v*}$: \[ \vec{\lambda}_2^{v*} \lesssim \vec{\psi}(G) \lesssim \sqrt{\vec{\lambda}_2^{v*} \cdot \log (\Delta/\vec{\lambda}_2^{v*})}. \] This provides a combinatorial characterization of the fastest mixing time of a directed graph by vertex expansion, and builds a new connection between reweighted eigenvalues, vertex expansion, and fastest mixing time for directed graphs. The second main result is a stronger Cheeger inequality relating the edge conductance $\vec{\phi}(G)$ of a directed graph $G$ to the edge-capacitated maximum reweighted second eigenvalue $\vec{\lambda}_2^{e*}$: \[ \vec{\lambda}_2^{e*} \lesssim \vec{\phi}(G) \lesssim \sqrt{\vec{\lambda}_2^{e*} \cdot \log (1/\vec{\lambda}_2^{e*})}. \] This provides a certificate for a directed graph to be an expander and a spectral algorithm to find a sparse cut in a directed graph, playing a similar role as Cheeger's inequality in certifying graph expansion and in the spectral partitioning algorithm for undirected graphs. We also use this reweighted eigenvalue approach to derive the improved Cheeger inequality for directed graphs, and furthermore to derive several Cheeger inequalities for hypergraphs that match and improve the existing results in [Lou15,CLTZ18]. These results support the view that this approach provides a unifying way to lift the spectral theory for undirected graphs to more general settings.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 51 pages, 3 figures

arXiv:2211.03061 [pdf, other]

Improved Target-specific Stance Detection on Social Media Platforms by Delving into Conversation Threads

Authors: Yupeng Li, Haorui He, Shaonan Wang, Francis C. M. Lau, Yunya Song

Abstract: Target-specific stance detection on social media, which aims at classifying a textual data instance such as a post or a comment into a stance class of a target issue, has become an emerging opinion mining paradigm of importance. An example application would be to overcome vaccine hesitancy in combating the coronavirus pandemic. However, existing stance detection strategies rely solely on individual instances, which cannot always capture the expressed stance of a given target. In response, we address a new task called conversational stance detection, which is to infer the stance towards a given target (e.g., COVID-19 vaccination) when given a data instance and its corresponding conversation thread. To tackle the task, we first propose a benchmarking conversational stance detection (CSD) dataset with annotations of stances and the structures of conversation threads among the instances based on six major social media platforms in Hong Kong. To infer the desired stances from both data instances and conversation threads, we propose a model called Branch-BERT that incorporates contextual information in conversation threads. Extensive experiments on our CSD dataset show that our proposed model outperforms all the baseline models that do not make use of contextual information. Specifically, it improves the F1 score by 10.3% compared with the state-of-the-art method in the SemEval-2016 Task 6 competition.
This shows the potential of incorporating rich contextual information on detecting target-specific stances on social media platforms and implies a more practical way to construct future stance detection tasks.

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2211.00759 [pdf, other]

Online Control of Adaptive Large Neighborhood Search using Deep Reinforcement Learning

Authors: Robbert Reijnen, Yingqian Zhang, Hoong Chuin Lau, Zaharah Bukhsh

Abstract: The Adaptive Large Neighborhood Search (ALNS) algorithm has shown considerable success in solving combinatorial optimization problems (COPs). Nonetheless, the performance of ALNS relies on the proper configuration of its selection and acceptance parameters, which is known to be a complex and resource-intensive task. To address this, we introduce a Deep Reinforcement Learning (DRL) based approach called DR-ALNS that selects operators, adjusts parameters, and controls the acceptance criterion throughout the search. The proposed method aims to learn, based on the state of the search, to configure ALNS for the next iteration to yield more effective solutions for the given optimization problem. We evaluate the proposed method on an orienteering problem with stochastic weights and time windows, as presented in an IJCAI competition. The results show that our approach outperforms vanilla ALNS, ALNS tuned with Bayesian optimization, and two state-of-the-art DRL approaches that were the winning methods of the competition, achieving this with significantly fewer training observations. Furthermore, we demonstrate several good properties of the proposed DR-ALNS method: it is easily adapted to solve different routing problems, its learned policies perform consistently well across various instance sizes, and these policies can be directly applied to different problem variants.

Submitted 3 April, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.04050 [pdf, other]

Multi-Modal Human Authentication Using Silhouettes, Gait and RGB

Authors: Yuxiang Guo, Cheng Peng, Chun Pong Lau, Rama Chellappa

Abstract: Whole-body-based human authentication is a promising approach for remote biometrics scenarios. Current literature focuses on either body recognition based on RGB images or gait recognition based on body shapes and walking patterns; both have their advantages and drawbacks. In this work, we propose Dual-Modal Ensemble (DME), which combines both RGB and silhouette data to achieve more robust performance for indoor and outdoor whole-body-based recognition. Within DME, we propose GaitPattern, which is inspired by the double helical gait pattern used in traditional gait analysis. The GaitPattern contributes to robust identification performance over a large range of viewing angles. Extensive experimental results on the CASIA-B dataset demonstrate that the proposed method outperforms state-of-the-art recognition systems. We also provide experimental results using the newly collected BRIAR dataset.

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2209.05245 [pdf, other]

Continual learning benefits from multiple sleep mechanisms: NREM, REM, and Synaptic Downscaling

Authors: Brian S. Robinson, Clare W. Lau, Alexander New, Shane M. Nichols, Erik C. Johnson, Michael Wolmetz, William G. Coon

Abstract: Learning new tasks and skills in succession without losing prior learning (i.e., catastrophic forgetting) is a computational challenge for both artificial and biological neural networks, yet artificial systems struggle to achieve parity with their biological analogues. Mammalian brains employ numerous neural operations in support of continual learning during sleep. These are ripe for artificial adaptation. Here, we investigate how modeling three distinct components of mammalian sleep together affects continual learning in artificial neural networks: (1) a veridical memory replay process observed during non-rapid eye movement (NREM) sleep; (2) a generative memory replay process linked to REM sleep; and (3) a synaptic downscaling process which has been proposed to tune signal-to-noise ratios and support neural upkeep. We find benefits from the inclusion of all three sleep components when evaluating performance on a continual learning CIFAR-100 image classification benchmark. Maximum accuracy improved during training and catastrophic forgetting was reduced during later tasks. While some catastrophic forgetting persisted over the course of network training, higher levels of synaptic downscaling led to better retention of early tasks and further facilitated the recovery of early task accuracy during subsequent training. One key takeaway is that there is a trade-off when choosing the level of synaptic downscaling: more aggressive downscaling better protects early tasks, but less downscaling enhances the ability to learn new tasks.
Intermediate levels can strike a balance with the highest overall accuracies during training. Overall, our results both provide insight into how to adapt sleep components to enhance artificial continual learning systems and highlight areas for future neuroscientific sleep research to further such systems.

Submitted 9 September, 2022; originally announced September 2022.

Comments: 9 pages, 12 figures, code available upon reasonable request. Corresponding author: William G. Coon (will.coon@jhuapl.edu)

arXiv:2208.05572 [pdf, other]

doi 10.1109/TVCG.2022.3197560

CreatureShop: Interactive 3D Character Modeling and Texturing from a Single Color Drawing

Authors: Congyi Zhang, Lei Yang, Nenglun Chen, Nicholas Vining, Alla Sheffer, Francis C. M. Lau, Guoping Wang, Wenping Wang

Abstract: Creating 3D shapes from 2D drawings is an important problem with applications in content creation for computer animation and virtual reality. We introduce a new sketch-based system, CreatureShop, that enables amateurs to create high-quality textured 3D character models from 2D drawings with ease and efficiency. CreatureShop takes an input bitmap drawing of a character (such as an animal or other creature), depicted from an arbitrary descriptive pose and viewpoint, and creates a 3D shape with plausible geometric details and textures from a small number of user annotations on the 2D drawing. Our key contributions are a novel oblique view modeling method, a set of systematic approaches for producing plausible textures on the invisible or occluded parts of the 3D character (as viewed from the direction of the input drawing), and a user-friendly interactive system. We validate our system and methods by creating numerous 3D characters from various drawings, and compare our results with related works to show the advantages of our method. We perform a user study to evaluate the usability of our system, which demonstrates that our system is a practical and efficient approach to create fully-textured 3D character models for novice users.

Submitted 10 August, 2022; originally announced August 2022.

Comments: This is the author's version of the article published in IEEE Transactions on Visualization and Computer Graphics, 2022

arXiv:2207.09109 [pdf, other]

Active-Learning-as-a-Service: An Automatic and Efficient MLOps System for Data-Centric AI

Authors: Yizheng Huang, Huaizheng Zhang, Yuanming Li, Chiew Tong Lau, Yang You

Abstract: The success of today's AI applications requires not only model training (Model-centric) but also data engineering (Data-centric). In data-centric AI, active learning (AL) plays a vital role, but current AL tools 1) require users to manually select AL strategies, and 2) cannot perform AL tasks efficiently. To this end, this paper presents an automatic and efficient MLOps system for AL, named ALaaS (Active-Learning-as-a-Service). Specifically, 1) ALaaS implements an AL agent, including a performance predictor and a workflow controller, to decide the most suitable AL strategies given users' datasets and budgets. We call this a predictive-based successive halving early-stop (PSHEA) procedure. 2) ALaaS adopts a server-client architecture to support an AL pipeline and implements stage-level parallelism for high efficiency. Meanwhile, caching and batching techniques are employed to further accelerate the AL process. In addition to efficiency, ALaaS ensures accessibility with the help of the design philosophy of configuration-as-a-service. Extensive experiments show that ALaaS outperforms all other baselines in terms of latency and throughput. Also, guided by the AL agent, ALaaS can automatically select and run AL strategies for non-expert users under different datasets and budgets. Our code is available at https://github.com/MLSysOps/Active-Learning-as-a-Service.

Submitted 5 November, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: 13 pages, 5 figures

arXiv:2205.12667 [pdf, ps, other]

Trilateration-Based Device-Free Sensing: Two Base Stations and One Passive IRS Are Sufficient

Authors: Qipeng Wang, Liang Liu, Shuowen Zhang, Francis C. M. Lau

Abstract: The classic trilateration technique can localize each target based on its distances to three anchors with known coordinates. Usually, this technique requires all the anchors and targets, e.g., the satellites and the mobile phones in a Global Navigation Satellite System (GNSS), to actively transmit/receive radio signals such that the delay of the one-way radio signal propagated between each anchor and each target can be measured. This paper shows that the trilateration technique can be generalized to the scenario where one of the three anchors and all the targets merely reflect the radio signals passively as in radar networks, even though the propagation delay between the passive IRS and the passive targets is difficult to measure directly and the data association issue for multi-sensor multi-target tracking arises. Specifically, we consider device-free sensing in a cellular network consisting of two base stations (BSs), one passive intelligent reflecting surface (IRS), and multiple passive targets, to realize integrated sensing and communication (ISAC). The two BSs transmit orthogonal frequency division multiplexing (OFDM) signals in the downlink and estimate the locations of the targets based on their reflected signals received via or not via the IRS. We propose an efficient trilateration-based strategy that first estimates the distances of each target to the two BSs and the IRS and then localizes the targets.
Numerical results show that the considered networked sensing architecture with heterogeneous anchors can outperform its counterpart with three BSs.

Submitted 27 May, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: submitted for possible publication
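
As a point of reference for the entry above (arXiv:2205.12667), the classic trilateration step that its abstract starts from can be sketched in a few lines. This is a minimal illustration of the textbook active-ranging case in 2D, assuming exact distances to three non-collinear anchors; it is not the paper's passive-IRS method, and the function name `trilaterate`, the anchor coordinates, and the target are hypothetical values chosen only for the example.

```python
import math

def trilaterate(anchors, distances):
    """Locate a 2D point from its distances to three anchors with known coordinates."""
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = distances
    # Subtracting the first range equation from the other two cancels the
    # quadratic terms and leaves a 2x2 linear system in the unknown (x, y).
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 + y1**2 - x0**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 + y2**2 - x0**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear; position is not unique")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# Hypothetical anchors and target, chosen only for illustration.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = (3.0, 4.0)
distances = [math.dist(target, a) for a in anchors]
estimate = trilaterate(anchors, distances)
```

With noise-free distances the estimate coincides with the target up to floating-point error. With noisy ranges or more than three anchors, a least-squares solve of the same linearized system would be the usual substitute.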

arXiv:2205.08121 [pdf, other]

Design of Joint Source-Channel Codes Based on a Generic Protograph

Authors: Jia Zhan, Francis C. M. Lau

Abstract: In this paper, we propose using a generic protograph to design joint source-channel codes (JSCCs). We present a generalized algorithm, called protograph extrinsic information transfer for JSCC algorithm (PEXIT-JSCC algorithm), for analyzing the channel threshold of the proposed JSCC. We also propose a source generic protograph EXIT (SGP-EXIT) algorithm, which is more appropriate than the generalized source protograph extrinsic information transfer (GSP-EXIT) algorithm, for evaluating the source threshold of a generic protograph. Moreover, a collaborative optimization method based on the SGP-EXIT and PEXIT-JSCC algorithms is proposed to construct generic-protograph JSCCs with good source and channel thresholds. Finally, we construct generic-protograph JSCCs, analyze their decoding thresholds, and compare their theoretical and error performance with JSCC systems based on optimized double-protographs. Results show that our proposed codes can attain channel thresholds within 1 dB of the Shannon limit and outperform double-protograph-based JSCCs.

Submitted 18 October, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

Comments: 26 pages, 15 figures, 5 tables

arXiv:2203.10139 [pdf]

AI system for fetal ultrasound in low-resource settings

Authors: Ryan G. Gomes, Bellington Vwalika, Chace Lee, Angelica Willis, Marcin Sieniek, Joan T. Price, Christina Chen, Margaret P. Kasaro, James A. Taylor, Elizabeth M. Stringer, Scott Mayer McKinney, Ntazana Sindano, George E. Dahl, William Goodnight III, Justin Gilmer, Benjamin H. Chi, Charles Lau, Terry Spitz, T Saensuksopa, Kris Liu, Jonny Wong, Rory Pilgrim, Akib Uddin, Greg Corrado, Lily Peng, et al. (4 additional authors not shown)

Abstract: Despite considerable progress in maternal healthcare, maternal and perinatal deaths remain high in low-to-middle income countries. Fetal ultrasound is an important component of antenatal care, but shortage of adequately trained healthcare workers has limited its adoption. We developed and validated an artificial intelligence (AI) system that uses novice-acquired "blind sweep" ultrasound videos to estimate gestational age (GA) and fetal malpresentation. We further addressed obstacles that may be encountered in low-resourced settings. Using a simplified sweep protocol with real-time AI feedback on sweep quality, we have demonstrated the generalization of model performance to minimally trained novice ultrasound operators using low cost ultrasound devices with on-device AI integration. The GA model was non-inferior to standard fetal biometry estimates with as few as two sweeps, and the fetal malpresentation model had high AUC-ROCs across operators and devices. Our AI models have the potential to assist in upleveling the capabilities of lightly trained ultrasound operators in low resource settings.

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2203.06168 [pdf, other]

Cheeger Inequalities for Vertex Expansion and Reweighted Eigenvalues

Authors: Tsz Chiu Kwok, Lap Chi Lau, Kam Chuen Tung

Abstract: The classical Cheeger's inequality relates the edge conductance $φ$ of a graph and the second smallest eigenvalue $λ_2$ of the Laplacian matrix. Recently, Olesker-Taylor and Zanetti discovered a Cheeger-type inequality $ψ^2 / \log |V| \lesssim λ_2^* \lesssim ψ$ connecting the vertex expansion $ψ$ of a graph $G=(V,E)$ and the maximum reweighted second smallest eigenvalue $λ_2^*$ of the Laplacian matrix. In this work, we first improve their result to $ψ^2 / \log d \lesssim λ_2^* \lesssim ψ$ where $d$ is the maximum degree in $G$, which is optimal assuming the small-set expansion conjecture. Also, the improved result holds for weighted vertex expansion, answering an open question by Olesker-Taylor and Zanetti. Building on this connection, we then develop a new spectral theory for vertex expansion. We discover that several interesting generalizations of Cheeger inequalities relating edge conductances and eigenvalues have a close analog in relating vertex expansions and reweighted eigenvalues. These include an analog of Trevisan's result on bipartiteness, an analog of higher order Cheeger's inequality, and an analog of improved Cheeger's inequality. Finally, inspired by this connection, we present negative evidence to the $0/1$-polytope edge expansion conjecture by Mihail and Vazirani. We construct $0/1$-polytopes whose graphs have very poor vertex expansion. This implies that the fastest mixing time to the uniform distribution on the vertices of these $0/1$-polytopes is almost linear in the graph size.
This does not provide a counterexample to the conjecture, but this is in contrast with known positive results which proved poly-logarithmic mixing time to the uniform distribution on the vertices of subclasses of $0/1$-polytopes.

Submitted 19 September, 2022; v1 submitted 11 March, 2022; originally announced March 2022.

Comments: 65 pages, 1 figure. Minor changes
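
For orientation on the entry above (arXiv:2203.06168), the classical Cheeger inequality that its abstract takes as a starting point is commonly stated as follows. This is the standard textbook form for the normalized Laplacian, which is an assumption here because the abstract says only "the Laplacian matrix"; the constants are not taken from the paper. \[ \frac{λ_2}{2} \le φ(G) \le \sqrt{2 λ_2}. \] Rearranged, the upper bound reads $φ^2 / 2 \le λ_2$, so edge conductance sandwiches $λ_2$ up to a square, which is the same shape as the vertex-expansion bound $ψ^2 / \log d \lesssim λ_2^* \lesssim ψ$ quoted in the abstract, apart from the extra $\log d$ factor.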

arXiv:2201.01485 [pdf, ps, other]

Exploiting Temporal Side Information in Massive IoT Connectivity

Authors: Qipeng Wang, Liang Liu, Shuowen Zhang, Francis C. M. Lau

Abstract: This paper considers the joint device activity detection and channel estimation problem in a massive Internet of Things (IoT) connectivity system, where a large number of IoT devices exist but merely a random subset of them become active for short-packet transmission in each coherence block. In particular, we propose to leverage the temporal correlation in device activity, e.g., a device active in the previous coherence block is more likely to be still active in the current coherence block, to improve the detection and estimation performance. However, it is challenging to utilize this temporal correlation as side information (SI), which relies on the knowledge about the exact statistical relation between the estimated activity pattern for the previous coherence block (which may be imperfect with unknown error) and the true activity pattern in the current coherence block. To tackle this challenge, we establish a novel SI-aided multiple measurement vector approximate message passing (MMV-AMP) framework. Specifically, thanks to the state evolution of the MMV-AMP algorithm, the correlation between the activity pattern estimated by the MMV-AMP algorithm in the previous coherence block and the real activity pattern in the current coherence block is quantified explicitly. Based on the well-defined temporal correlation, we further manage to embed this useful SI into the denoiser design under the MMV-AMP framework.
Specifically, the SI-based soft-thresholding denoisers with binary thresholds and the SI-based minimum mean-squared error (MMSE) denoisers are characterized for the cases without and with the knowledge of the channel distribution, respectively. Numerical results are given to show the significant gain in device activity detection and channel estimation performance brought by our proposed SI-aided MMV-AMP framework.

Submitted 5 January, 2022; originally announced January 2022.

Comments: submitted for possible IEEE journal publication

arXiv:2112.08557 [pdf, ps, other]

Protograph Bit-Interleaved Coded Modulation: A Bandwidth-Efficient Design Paradigm for 6G Wireless Communications

Authors: Yi Fang, Pingping Chen, Yong Liang Guan, Francis C. M. Lau, Yonghui Li, Guanrong Chen

Abstract: Bit-interleaved coded modulation (BICM) has attracted considerable attention from the research community in the past three decades, because it can achieve desirable error performance with relatively low implementation complexity for a large number of communication and storage systems. By exploiting the iterative demapping and decoding (ID), the BICM is able to approach capacity limits of coded modulation over various channels. In recent years, protograph low-density parity-check (PLDPC) codes and their spatially-coupled (SC) variants have emerged to be a pragmatic forward-error-correction (FEC) solution for BICM systems due to their tremendous error-correction capability and simple structures, and found widespread applications such as deep-space communication, satellite communication, wireless communication, optical communication, and data storage. This article offers a comprehensive survey on the state-of-the-art development of PLDPC-BICM and its innovative SC variants over a variety of channel models, e.g., additive white Gaussian noise (AWGN) channels, fading channels, Poisson pulse position modulation (PPM) channels, and flash-memory channels. Of particular interest is code construction, constellation shaping, as well as bit-mapper design, where the receiver is formulated as a serially-concatenated decoding framework consisting of a soft-decision demapper and a belief-propagation decoder. Finally, several promising research directions are discussed, which have not been adequately addressed in the current literature.

Submitted 27 October, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

arXiv:2112.06323 [pdf, other]

doi 10.1109/TPAMI.2023.3286772

Interpolated Joint Space Adversarial Training for Robust and Generalizable Defenses

Authors: Chun Pong Lau, Jiang Liu, Hossein Souri, Wei-An Lin, Soheil Feizi, Rama Chellappa

Abstract: Adversarial training (AT) is considered to be one of the most reliable defenses against adversarial attacks. However, models trained with AT sacrifice standard accuracy and do not generalize well to novel attacks. Recent works show generalization improvement with adversarial samples under novel threat models such as on-manifold threat model or neural perceptual threat model. However, the former requires exact manifold information while the latter requires algorithm relaxation. Motivated by these considerations, we exploit the underlying manifold information with Normalizing Flow, ensuring that the exact manifold assumption holds. Moreover, we propose a novel threat model called Joint Space Threat Model (JSTM), which can serve as a special case of the neural perceptual threat model that does not require additional relaxation to craft the corresponding adversarial attacks. Under JSTM, we develop novel adversarial attacks and defenses. The mixup strategy improves the standard accuracy of neural networks but sacrifices robustness when combined with AT. To tackle this issue, we propose the Robust Mixup strategy in which we maximize the adversity of the interpolated images and gain robustness and prevent overfitting. Our experiments show that Interpolated Joint Space Adversarial Training (IJSAT) achieves good performance in standard accuracy, robustness, and generalization on the CIFAR-10/100, OM-ImageNet, and CIFAR-10-C datasets.
IJSAT is also flexible and can be used as a data augmentation method to improve standard accuracy and combine with many existing AT approaches to improve robustness.

Submitted 12 December, 2021; originally announced December 2021.

Comments: Under submission

arXiv:2112.05005 [pdf, other]

doi 10.1109/TIFS.2022.3184262

Mutual Adversarial Training: Learning together is better than going alone

Authors: Jiang Liu, Chun Pong Lau, Hossein Souri, Soheil Feizi, Rama Chellappa

Abstract: Recent studies have shown that robustness to adversarial attacks can be transferred across networks. In other words, we can make a weak model more robust with the help of a strong teacher model. We ask if instead of learning from a static teacher, can models "learn together" and "teach each other" to achieve better robustness? In this paper, we study how interactions among models affect robustness via knowledge distillation. We propose mutual adversarial training (MAT), in which multiple models are trained together and share the knowledge of adversarial examples to achieve improved robustness. MAT allows robust models to explore a larger space of adversarial samples, and find more robust feature spaces and decision boundaries. Through extensive experiments on CIFAR-10 and CIFAR-100, we demonstrate that MAT can effectively improve model robustness and outperform state-of-the-art methods under white-box attacks, bringing $\sim$8% accuracy gain to vanilla adversarial training (AT) under PGD-100 attacks. In addition, we show that MAT can also mitigate the robustness trade-off among different perturbation types, bringing as much as 13.1% accuracy gain to AT baselines against the union of $l_\infty$, $l_2$ and $l_1$ attacks. These results show the superiority of the proposed method and demonstrate that collaborative learning is an effective strategy for designing robust models.

Submitted 9 December, 2021; originally announced December 2021.

Comments: Under submission

arXiv:2112.04532 [pdf, other]

Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection

Authors: Jiang Liu, Alexander Levine, Chun Pong Lau, Rama Chellappa, Soheil Feizi

Abstract: Object detection plays a key role in many security-critical systems. Adversarial patch attacks, which are easy to implement in the physical world, pose a serious threat to state-of-the-art object detectors. Developing reliable defenses for object detectors against patch attacks is critical but severely understudied. In this paper, we propose Segment and Complete defense (SAC), a general framework for defending object detectors against patch attacks through detection and removal of adversarial patches. We first train a patch segmenter that outputs patch masks which provide pixel-level localization of adversarial patches. We then propose a self adversarial training algorithm to robustify the patch segmenter. In addition, we design a robust shape completion algorithm, which is guaranteed to remove the entire patch from the images if the outputs of the patch segmenter are within a certain Hamming distance of the ground-truth patch masks. Our experiments on COCO and xView datasets demonstrate that SAC achieves superior robustness even under strong adaptive attacks with no reduction in performance on clean images, and generalizes well to unseen patch shapes, attack budgets, and unseen attack methods. Furthermore, we present the APRICOT-Mask dataset, which augments the APRICOT dataset with pixel-level annotations of adversarial patches. We show SAC can significantly reduce the targeted attack success rate of physical patch attacks. Our code is available at https://github.com/joellliu/SegmentAndComplete.

Submitted 2 May, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: CVPR 2022 camera ready

arXiv:2111.06726 [pdf, other]

doi 10.1016/j.knosys.2021.107683

One model Packs Thousands of Items with Recurrent Conditional Query Learning

Authors: Dongda Li, Zhaoquan Gu, Yuexuan Wang, Changwei Ren, Francis C. M. Lau

Abstract: Recent studies have revealed that neural combinatorial optimization (NCO) has advantages over conventional algorithms in many combinatorial optimization problems such as routing, but it is less efficient for more complicated optimization tasks such as packing which involves mutually conditioned action spaces. In this paper, we propose a Recurrent Conditional Query Learning (RCQL) method to solve both 2D and 3D packing problems. We first embed states by a recurrent encoder, and then adopt attention with conditional queries from previous actions. The conditional query mechanism fills the information gap between learning steps, which shapes the problem as a Markov decision process. Benefiting from the recurrence, a single RCQL model is capable of handling different sizes of packing problems. Experiment results show that RCQL can effectively learn strong heuristics for offline and online strip packing problems (SPPs), outperforming a wide range of baselines in space utilization ratio. RCQL reduces the average bin gap ratio by 1.83% in offline 2D 40-box cases and 7.84% in 3D cases compared with state-of-the-art methods. Meanwhile, our method also achieves 5.64% higher space utilization ratio for SPPs with 1000 items than the state of the art.

Submitted 12 November, 2021; originally announced November 2021.

Comments: 16 pages, 5 figures, 3 tables. Accepted to Knowledge-Based Systems, 2022

ACM Class: I.2.6; I.2.8

Journal ref: Knowledge-Based Systems, Volume 235, 2022, 107683, ISSN 0950-7051

arXiv:2110.07906 [pdf, ps, other]

Hardware Architecture of Layered Decoders for PLDPC-Hadamard Codes

Authors: Peng W. Zhang, Francis C. M. Lau, Chiu-W. Sham

Abstract: Protograph-based low-density parity-check Hadamard codes (PLDPC-HCs) are a new type of ultimate-Shannon-limit-approaching codes. In this paper, we propose a hardware architecture for the PLDPC-HC layered decoders. The decoders consist mainly of random address memories, Hadamard sub-decoders and control logics. Two types of pipelined structures are presented and the latency and throughput of these two structures are derived. Implementation of the decoder design on an FPGA board shows that a throughput of $1.48$ Gbps is achieved with a bit error rate (BER) of $10^{-5}$ at around $E_b/N_0 = -0.40$ dB. The decoder can also achieve the same BER at $E_b/N_0 = -1.14$ dB with a reduced throughput of $0.20$ Gbps.

Submitted 19 August, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: The paper has been accepted to IEEE Trans. on Circuits and Systems I

arXiv:2110.06802 [pdf, other]

Identification of Attack-Specific Signatures in Adversarial Examples

Authors: Hossein Souri, Pirazh Khorramshahi, Chun Pong Lau, Micah Goldblum, Rama Chellappa

Abstract: The adversarial attack literature contains a myriad of algorithms for crafting perturbations which yield pathological behavior in neural networks. In many cases, multiple algorithms target the same tasks and even enforce the same constraints. In this work, we show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims. We begin by demonstrating that one can determine the attack algorithm that crafted an adversarial example. Then, we leverage recent advances in parameter-space saliency maps to show, both visually and quantitatively, that adversarial attack algorithms differ in which parts of the network and image they target. Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.

Submitted 13 October, 2021; originally announced October 2021.

Showing 1–50 of 118 results for author: Lau, C