Search | arXiv e-print repository

Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established a most comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 39 specific tasks. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, a Generalizable Pathology Foundation Model (GPFM) is pretrained on a large-scale dataset consisting of 190 million images from around 86,000 public H&E whole slides across 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.36, with 29 tasks ranked 1st, while the second-best model, UNI, attains an average rank of 2.96, with only 4 tasks ranked 1st. The superior generalization of GPFM demonstrates its exceptional modeling capabilities across a wide range of clinical tasks, positioning it as a new cornerstone for feature representation in CPath.

Submitted 25 July, 2024; originally announced July 2024.

Report number: I.2.10

arXiv:2407.15362 [pdf, other]

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles which respectively offer distinct knowledge for versatile clinical applications. Second, the current progress in pathology FMs predominantly concentrates on the patch level, where the restricted context of patch-level pretraining fails to capture whole-slide patterns. Here we curated the largest multimodal dataset consisting of H&E diagnostic whole slide images and their associated pathology reports and RNA-Seq data, resulting in 26,169 slide-level modality pairs from 10,275 patients across 32 cancer types. To leverage these data for CPath, we propose a novel whole-slide pretraining paradigm which injects multimodal knowledge at the whole-slide context into the pathology FM, called Multimodal Self-TAught PRetraining (mSTAR). The proposed paradigm revolutionizes the workflow of pretraining for CPath, which enables the pathology FM to acquire the whole-slide context. To our knowledge, this is the first attempt to incorporate multimodal knowledge at the slide level for enhancing pathology FMs, expanding the modelling context from unimodal to multimodal knowledge and from patch-level to slide-level. To systematically evaluate the capabilities of mSTAR, extensive experiments, including slide-level unimodal and multimodal applications, are conducted across 7 diverse types of tasks on 43 subtasks, resulting in the largest spectrum of downstream tasks. The average performance in various slide-level applications consistently demonstrates significant performance enhancements for mSTAR compared to SOTA FMs.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 44 pages, 9 figures

arXiv:2407.14114 [pdf]

A3Rank: Augmentation Alignment Analysis for Prioritizing Overconfident Failing Samples for Deep Learning Models

Authors: Zhengyuan Wei, Haipeng Wang, Qilin Zhou, W. K. Chan

Abstract: Sharpening deep learning models by training them with examples close to the decision boundary is a well-known best practice. Nonetheless, these models are still error-prone in producing predictions. In practice, the inference of the deep learning models in many application systems is guarded by a rejector, such as a confidence-based rejector, to filter out samples with insufficient prediction confidence. Such confidence-based rejectors cannot effectively guard against failing samples with high confidence. Existing test case prioritization techniques effectively distinguish confusing samples from confident samples to identify failing samples among the confusing ones, yet prioritizing the failing ones high among many confident ones is challenging. In this paper, we propose $A^3$Rank, a novel test case prioritization technique with augmentation alignment analysis, to address this problem. $A^3$Rank generates augmented versions of each test case and assesses the extent to which the prediction result for the test case is misaligned with those of the augmented versions and vice versa. Our experiment shows that $A^3$Rank can effectively rank failing samples escaping from the checking of confidence-based rejectors, which significantly outperforms the peer techniques by 163.63% in the detection ratio of top-ranked samples. We also provide a framework to construct a detector devoted to augmenting these rejectors to defend these failing samples, and our detector can achieve a significantly higher defense success rate.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than $10^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 50 pages, 10 figures, 4 tables

arXiv:2407.11235 [pdf, other]

Quantum chemistry, classical heuristics, and quantum advantage

Authors: Garnet Kin-Lic Chan

Abstract: We describe the problems of quantum chemistry, the intuition behind classical heuristic methods used to solve them, a conjectured form of the classical complexity of quantum chemistry problems, and the subsequent opportunities for quantum advantage. This article is written for both quantum chemists and quantum information theorists. In particular, we attempt to summarize the domain of quantum chemistry problems as well as the chemical intuition that is applied to solve them within concrete statements (such as a classical heuristic cost conjecture and a classification of different avenues for quantum advantage) in the hope that this may stimulate future analysis.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10777 [pdf]

Exploring the Factors of "AI Guilt" Among Students -- Are You Guilty of Using AI in Your Homework?

Authors: Cecilia Ka Yuk Chan

Abstract: This study explores the phenomenon of "AI guilt" among secondary school students, a form of moral discomfort arising from the use of AI tools in academic tasks traditionally performed by humans. Through qualitative methodologies, the research examines the factors contributing to AI guilt, its social and psychological impacts, and its implications for educational practices. The findings revealed three main dimensions of AI guilt: perceived laziness and authenticity, fear of judgment, and identity and self-efficacy concerns. The findings suggest a need to redefine academic integrity and shift our mindset to reconsider what we should value in education. The study also emphasizes the importance of ethical guidelines and educational support and provides implications to help students navigate the complexities of AI in education, reducing feelings of guilt while enhancing learning outcomes.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.09700 [pdf, other]

Introducing GPU-acceleration into the Python-based Simulations of Chemistry Framework

Authors: Rui Li, Qiming Sun, Xing Zhang, Garnet Kin-Lic Chan

Abstract: We introduce the first version of GPU4PySCF, a module that provides GPU acceleration of methods in PySCF. As a core functionality, this provides a GPU implementation of two-electron repulsion integrals (ERIs) for contracted basis sets comprising up to g functions using Rys quadrature. As an illustration of how this can accelerate a quantum chemistry workflow, we describe how to use the ERIs efficiently in the integral-direct Hartree-Fock Fock build and nuclear gradient construction. Benchmark calculations show a significant speedup of two orders of magnitude with respect to the multi-threaded CPU Hartree-Fock code of PySCF, and performance comparable to other GPU-accelerated quantum chemical packages including GAMESS and QUICK on a single NVIDIA A100 GPU.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07457 [pdf, other]

GLBench: A Comprehensive Benchmark for Graph with Large Language Models

Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehensive benchmark for evaluating GraphLLM methods in both supervised and zero-shot scenarios. GLBench provides a fair and thorough evaluation of different categories of GraphLLM methods, along with traditional baselines such as graph neural networks. Through extensive experiments on a collection of real-world datasets with consistent data processing and splitting strategies, we have uncovered several key findings. Firstly, GraphLLM methods outperform traditional baselines in supervised settings, with LLM-as-enhancers showing the most robust performance. However, using LLMs as predictors is less effective and often leads to uncontrollable output issues. We also notice that no clear scaling laws exist for current GraphLLM methods. In addition, both structures and semantics are crucial for effective zero-shot transfer, and our proposed simple baseline can even outperform several models tailored for zero-shot scenarios. The data and code of the benchmark can be found at https://github.com/NineAbyss/GLBench.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

arXiv:2407.06614 [pdf, other]

Implicit Regression in Subspace for High-Sensitivity CEST Imaging

Authors: Chu Chen, Yang Liu, Se Weon Park, Jizhou Li, Kannie W. Y. Chan, Raymond H. F. Chan

Abstract: Chemical Exchange Saturation Transfer (CEST) MRI demonstrates its capability in significantly enhancing the detection of proteins and metabolites with low concentrations through exchangeable protons. The clinical application of CEST, however, is constrained by its low contrast and low signal-to-noise ratio (SNR) in the acquired data. Denoising, as one of the post-processing stages for CEST data, can effectively improve the accuracy of CEST quantification. In this work, by modeling spatially variant z-spectra in a low-dimensional subspace, we introduce Implicit Regression in Subspace (IRIS), which is an unsupervised denoising algorithm utilizing the excellent property of implicit neural representation for continuous mapping. Experiments conducted on both synthetic and in-vivo data demonstrate that our proposed method surpasses other CEST denoising methods regarding both qualitative and quantitative performance.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.18052 [pdf, other]

Flexible Conformal Highest Predictive Conditional Density Sets

Authors: Max Sampson, Kung-Sik Chan

Abstract: We introduce our method, conformal highest conditional density sets (CHCDS), that forms conformal prediction sets using existing estimated conditional highest density predictive regions. We prove the validity of the method and that conformal adjustment is negligible under some regularity conditions. In particular, if we correctly specify the underlying conditional density estimator, the conformal adjustment will be negligible. When the underlying model is incorrect, the conformal adjustment provides guaranteed nominal unconditional coverage. We compare the proposed method via simulation and a real data analysis to other existing methods. Our numerical results show that the flexibility of being able to use any existing conditional density estimation method is a large advantage for CHCDS compared to existing methods.

Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.15737 [pdf, other]

Correlation Functions From Tensor Network Influence Functionals: The Case of the Spin-Boson Model

Authors: Haimi Nguyen, Nathan Ng, Lachlan P. Lindoy, Gunhee Park, Andrew J. Millis, Garnet Kin-Lic Chan, David R. Reichman

Abstract: We investigate the application of matrix product state (MPS) representations of the influence functionals (IF) for the calculation of real-time equilibrium correlation functions in open quantum systems. Focusing specifically on the unbiased spin-boson model, we explore the use of IF-MPSs for complex time propagation, as well as IF-MPSs for constructing correlation functions in the steady state. We examine three different IF approaches: one based on the Kadanoff-Baym contour targeting correlation functions at all times, one based on a complex contour targeting the correlation function at a single time, and a steady state formulation which avoids imaginary or complex times, while providing access to correlation functions at all times. We show that within the IF language, the steady state formulation provides a powerful approach to evaluate equilibrium correlation functions.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.14377 [pdf, other]

Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection

Authors: Rushuang Zhou, Zijun Liu, Lei Clifton, David A. Clifton, Kannie W. Y. Chan, Yuan-Ting Zhang, Yining Dong

Abstract: The label scarcity problem is the main challenge that hinders the wide application of deep learning systems in automatic cardiovascular diseases (CVDs) detection using electrocardiography (ECG). Tuning pre-trained models alleviates this problem by transferring knowledge learned from large datasets to downstream small datasets. However, bottlenecks in computational efficiency and CVDs detection performance limit its clinical applications. It is difficult to improve the detection performance without significantly sacrificing model computational efficiency. Here, we propose a computation-efficient semi-supervised learning paradigm (FastECG) for robust and computation-efficient CVDs detection using ECG. It enables a robust adaptation of pre-trained models on downstream datasets with limited supervision and high computational efficiency. First, a random-deactivation technique is developed to achieve robust and fast low-rank adaptation of pre-trained weights. Subsequently, we propose a one-shot rank allocation module to determine the optimal ranks for the update matrices of the pre-trained weights. Finally, a lightweight semi-supervised learning pipeline is introduced to enhance model performance by leveraging labeled and unlabeled data with high computational efficiency. Extensive experiments on four downstream ECG datasets demonstrate that FastECG not only outperforms the state-of-the-art methods in multi-label CVDs detection but also requires a smaller GPU footprint, less training time, and less parameter storage space. As such, this paradigm provides an effective solution for achieving high computational efficiency and robust detection performance in the clinical applications of pre-trained models under limited supervision.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13546 [pdf, ps, other]

Dual of the Geometric Lemma and the Second Adjointness Theorem for $p$-adic reductive groups

Authors: Kei Yuen Chan

Abstract: Let $P,Q$ be standard parabolic subgroups of a $p$-adic reductive group $G$. We study the smooth dual of the filtration on a parabolically induced module arising from the geometric lemma associated to the cosets $P\setminus G/Q$. We prove that the dual filtration coincides with the filtration associated to the cosets $P\setminus G/Q^-$ via the Bernstein-Casselman canonical pairing from the second adjointness of parabolic induction. This result generalizes a result of Bezrukavnikov-Kazhdan on the explicit description in the second adjointness. Along the way, we also study some group theoretic results.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 20 pages, comments welcome

arXiv:2406.12133 [pdf, other]

Mott physics and universal Planckian relaxation in the high-Tc cuprates

Authors: A. Shekhter, B. J. Ramshaw, M. K. Chan, N. Harrison

Abstract: Shortly after the discovery of high-temperature superconducting cuprates, Anderson proposed that Mott physics is instrumental in understanding their phase diagrams. Specifically, he suggested that, similar to the 'almost-localized' Fermi liquid in 3He, the effective mass renormalization in the cuprates is characteristic of a doped Mott insulator, scaling inversely with doping p away from half-filling. However, Mott physics has struggled to account for the 'strange metal' behavior, characterized by a linear-in-temperature (T) 'Planckian' resistivity that extends to very high temperatures, casting doubt on the relevance of Mott physics in the cuprates. Here, we report a comprehensive survey of calorimetry and resistivity data spanning broad doping and temperature ranges. We find that the entropy at high temperatures closely adheres to that of an almost-localized Fermi liquid, implying that Mott physics remains relevant at high energies. We find that the strong doping dependence of the coefficient of the T-linear resistivity at high temperatures also scales inversely with p, suggesting a true universality of the Planckian relaxation rate across the entire phase diagram. Thus, the physics of the cuprates over their entire phase diagram is determined by the joint action of Mott physics and Planckian relaxation physics, with each operating at very different energy scales.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.08366 [pdf, other]

Highest Probability Density Conformal Regions

Authors: Max Sampson, Kung-Sik Chan

Abstract: We propose a new method for finding the highest predictive density set or region using signed conformal inference. The proposed method is computationally efficient, while also carrying conformal coverage guarantees. We prove that, under mild regularity conditions, the conformal prediction set is asymptotically close to its oracle counterpart. The efficacy of the method is illustrated through simulations and real applications.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.04498 [pdf, other]

Conformal Multi-Target Hyperrectangles

Authors: Max Sampson, Kung-Sik Chan

Abstract: We propose conformal hyperrectangular prediction regions for multi-target regression. We propose split conformal prediction algorithms for both point and quantile regression to form hyperrectangular prediction regions, which allow for easy marginal interpretation and do not require covariance estimation. In practice, it is preferable that a prediction region is balanced, that is, having identical marginal prediction coverage, since prediction accuracy is generally equally important across components of the response vector. The proposed algorithms possess two desirable properties, namely, tight asymptotic overall nominal coverage as well as asymptotic balance, that is, identical asymptotic marginal coverage, under mild conditions. We then compare our methods to some existing methods on both simulated and real data sets. Our simulation results and real data analysis show that our methods outperform existing methods while achieving the desired nominal coverage and good balance between dimensions.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04331 [pdf, other]

PaCE: Parsimonious Concept Engineering for Large Language Models

Authors: Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan Chan, Darshan Thaker, Aditya Chattopadhyay, Chris Callison-Burch, René Vidal

Abstract: Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output including potentially harmful information, racist or sexist language, and hallucinations. Alignment methods are designed to reduce such undesirable output, via techniques such as fine-tuning, prompt engineering, and representation engineering. However, existing methods face several challenges: some require costly fine-tuning for every alignment task; some do not adequately remove undesirable concepts, failing alignment; some remove benign concepts, lowering the linguistic capabilities of LLMs. To address these issues, we propose Parsimonious Concept Engineering (PaCE), a novel activation engineering framework for alignment. First, to sufficiently model the concepts, we construct a large-scale concept dictionary in the activation space, in which each atom corresponds to a semantic concept. Then, given any alignment task, we instruct a concept partitioner to efficiently annotate the concepts as benign or undesirable. Finally, at inference time, we decompose the LLM activations along the concept dictionary via sparse coding, to accurately represent the activation as a linear combination of the benign and undesirable components. By removing the latter ones from the activation, we reorient the behavior of LLMs towards alignment goals. We conduct experiments on tasks such as response detoxification, faithfulness enhancement, and sentiment revising, and show that PaCE achieves state-of-the-art alignment performance while maintaining linguistic capabilities.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 26 pages, 17 figures, 5 tables, dataset and code at https://github.com/peterljq/Parsimonious-Concept-Engineering

arXiv:2406.00337 [pdf, other]

The Odyssey Journey: Hemifacial Spasm Patients' Top-Tier Medical Resource Seeking in China from an Actor-Network Perspective

Authors: Ka I Chan, Yuntao Wang, Siying Hu, Bo Hei, Zhicong Lu, Pei-Luen Patrick Rau, Yuanchun Shi

Abstract: Health information-seeking behaviors are critical for individuals managing illnesses, especially in cases like hemifacial spasm (HFS), a condition familiar to specialists but not to general practitioners and the broader public. The limited awareness of HFS often leads to scarce online resources for self-diagnosis and a heightened risk of misdiagnosis. In China, the imbalance in the doctor-to-patient ratio and HFS's low incidence exacerbate information and power asymmetries within doctor-patient relationship. While HCI and CSCW research predominantly focuses on more common chronic conditions, our study delves into HFS, aiming to deepen the understanding of HFS patients' health information-seeking journeys in China, as well as exploring how these patients utilize various stakeholders and online resources to overcome asymmetries in the doctor-patient relationship and access top-tier medical resources. Through interviews with three neurosurgeons and 12 HFS patients from both rural and urban areas, and applying Actor-Network Theory, we offer empirical insights into the interactions and workflows within the health information-seeking network. Our analysis identified five strategies HFS patients adopted to access top-tier medical resources. We also propose design opportunities for technology to aid patients in overcoming the challenges encountered during their health information-seeking journey.

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.18771 [pdf, other]

Benchmarking the Exponential Ansatz for the Holstein model

Authors: Junjie Yang, Zhi-Hao Cui, Ankit Mahajan, Huanchen Zhai, David R. Reichman, Garnet Kin-Lic Chan

Abstract: Polarons are quasiparticles formed as a result of lattice distortions induced by charge carriers. The single-electron Holstein model captures the fundamentals of single polaron physics. We examine the power of the exponential ansatz for the polaron ground-state wavefunction in its coupled cluster, canonical transformation, and (canonically transformed) perturbative variants across the parameter space of the Holstein model. Our benchmark serves to guide future developments of polaron wavefunctions beyond the single-electron Holstein model.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 10 pages, 6 figures

arXiv:2405.18709 [pdf, other]

Towards an exact electronic quantum many-body treatment of Kondo correlation in magnetic impurities

Authors: Tianyu Zhu, Linqing Peng, Huanchen Zhai, Zhi-Hao Cui, Garnet Kin-Lic Chan

Abstract: The Kondo effect is a prototypical quantum phenomenon arising from the interaction between localized electrons in a magnetic impurity and itinerant electrons in a metallic host. Although it has served as the testing ground for quantum many-body methods for decades, the precise description of Kondo physics with material specificity remains challenging. Here, we present a systematic ab initio approach to converge towards an exact zero-temperature electronic treatment of Kondo correlations. Across a series of 3d transition metals, we extract Kondo temperatures matching the subtle experimental trends, with an accuracy far exceeding that of standard models. We further obtain microscopic insight into the origin of these trends. More broadly, we demonstrate the possibility to start from fully ab initio many-body simulations and push towards the realm of converged predictions.

Submitted 17 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 11 pages, 3 figures, with supplementary materials

arXiv:2405.12244 [pdf]

Real-Time Go-Around Prediction: A case study of JFK airport

Authors: Ke Liu, Kaijing Ding, Lu Dai, Mark Hansen, Kennis Chan, John Schade

Abstract: In this paper, we employ the long-short-term memory model (LSTM) to predict the real-time go-around probability as an arrival flight is approaching JFK airport and within 10 nm of the landing runway threshold. We further develop methods to examine the causes to go-around occurrences both from a global view and an individual flight perspective. According to our results, in-trail spacing, and simultaneous runway operation appear to be the top factors that contribute to overall go-around occurrences. We then integrate these pre-trained models and analyses with real-time data streaming, and finally develop a demo web-based user interface that integrates the different components designed previously into a real-time tool that can eventually be used by flight crews and other line personnel to identify situations in which there is a high risk of a go-around.

Submitted 18 May, 2024; originally announced May 2024.

Comments: https://www.icrat.org/

Journal ref: International Conference on Research in Air Transportation (ICRAT2024)

arXiv:2405.11089 [pdf, ps, other]

Optimal Update Policy for the Monitoring of Distributed Sources

Authors: Eric Graves, Jake B. Perazzone, Kevin Chan

Abstract: When making decisions in a network, it is important to have up-to-date knowledge of the current state of the system. Obtaining this information, however, comes at a cost. In this paper, we determine the optimal finite-time update policy for monitoring the binary states of remote sources with a reporting rate constraint. We first prove an upper and lower bound of the minimal probability of error before solving the problem analytically. The error probability is defined as the probability that the system performs differently than it would with full system knowledge. More specifically, an error occurs when the destination node incorrectly determines which top-K priority sources are in the "free" state. We find that the optimal policy follows a specific ordered 3-stage update pattern. We then provide the optimal transition points for each stage for each source.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted at ISIT 2024

arXiv:2405.09510 [pdf, other]

The Instrumental Variable Model with Categorical Instrument, Treatment and Outcome

Authors: Yilin Song, K. C. Gary Chan, Thomas S. Richardson

Abstract: Instrumental variable models are central to the inference of causal effects in many settings. We consider the instrumental variable model with discrete variables where the instrument (Z), exposure (X) and outcome (Y) take Q, K, and M levels respectively. We assume that the instrument is randomized and that there is no direct effect of Z on Y so that Y(x,z) = Y(x). We first provide a simple characterization of the set of joint distributions of the potential outcomes P(Y(x=1), ..., Y(x=K)) compatible with a given observed distribution P(X, Y | Z). We then discuss the variation (in)dependence property of the marginal probability distribution of the potential outcomes P(Y(x=1)), ..., P(Y(x=K)) which has direct implications for partial identification of average causal effect contrasts such as E[Y(x=i) - Y(x=j)]. We also include simulation results on the volume of the observed distributions not compatible with the IV model as K and Q change.

Submitted 27 June, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.07675 [pdf]

Super-concentrated alkali hydroxide electrolytes for rechargeable Zn batteries

Authors: Yilin Ma, Jiajia Huang, Shengyong Gao, iangyu Li, Zhibin Yi, Diwen Xiao, Cheuk Kai Kevin Chan, Ding Pan, Qing Chen

Abstract: Rechargeable Zn batteries offer safe, inexpensive energy storage, but when deeply discharged to compete with lithium-ion batteries, they are plagued by parasitic reactions at the Zn anodes. We apply super-concentrated alkaline electrolytes to suppress two key parasitic reactions, hydrogen evolution and ZnO passivation. An electrolyte with 15 M KOH displays a broad electrochemical window (>2.5 V on Au), a high ZnO solubility (>1.5 M), and an exceptionally high ionic conductivity (>0.27 S/cm at 25 C). Spectroscopies and ab-initio molecular dynamics simulation suggest K+-OH- pairs and a tightened water network to underpin the stability. The simulation further reveals unique triggered proton hopping that offsets the lack of water wires to sustain the conductivity. Low hydrogen evolution, confirmed via online mass spectroscopy, and slow passivation enable a NiOOH||Zn battery to deliver a cumulative capacity of 8.4 Ah cm-2 and a Zn-air battery to last for over 110 hours.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.07668 [pdf, other]

CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

Authors: Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Bo Jiang, W. K. Chan

Abstract: Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. They aim to label malicious samples with provable guarantees correctly and issue warnings for malicious samples predicted to non-benign labels with provable guarantees, respectively. However, existing certified detection defenders suffer from protecting labels subject to manipulation, and existing certified recovery defenders cannot systematically warn samples about their labels. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert formulates a novel approach by cross-checking two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures that a certified sample, when subjected to a patched perturbation, will always be returned with a benign label without triggering any warnings with a provable guarantee. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with a slightly lower performance than ViP and comparable performance with PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 23 pages, 2 figures, accepted by FSE 2024 (The ACM International Conference on the Foundations of Software Engineering)

arXiv:2405.05711 [pdf, ps, other]

Zeta-functions of Curves over Finite Fields

Authors: Kin Wai Chan

Abstract: Curves over finite fields are of great importance in cryptography and coding theory. Through studying their zeta-functions, we would be able to find out vital arithmetic and geometric information about them and their Jacobians, including the number of rational points on this kind of curves. In this paper, I investigate if it is possible to construct a curve over finite fields of a given genus $g$ whose zeta-function is given as a product of zeta-functions of $g$ elliptic curves, and find out alternative methods if it is not possible. Basically, I look for conditions which those $g$ elliptic curves should satisfy such that their product (of their Jacobians) is isogenous to the Jacobian of a curve of a given genus $g$. Then from this isogenous relationship I can determine the characteristic polynomial of the Frobenius endomorphism of the Jacobian of the new curve and by this characteristic polynomial I can thus determine the zeta-function of this new curve. By using the zeta-functions of curves in the form as generating functions, the number of rational points on curves can even be found out, which may lead to further researches relating to some applications in cryptography, coding theory and even information theory.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 13 pages
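
Illustrative sketch (not from the paper): the abstract above relies on the standard fact that the zeta-function of a curve acts as a generating function for its point counts. For an elliptic curve over $\mathbb{F}_q$ with Frobenius trace $a$, the counts over $\mathbb{F}_{q^n}$ are $N_n = q^n + 1 - s_n$, where $s_0 = 2$, $s_1 = a$, and $s_n = a\,s_{n-1} - q\,s_{n-2}$. If a genus-$g$ curve has Jacobian isogenous to a product of $g$ elliptic curves, its counts would be $N_n = q^n + 1 - \sum_i s_n^{(i)}$. The code below is a minimal sketch of that arithmetic; the function names and the example curve $y^2 = x^3 + x + 1$ over $\mathbb{F}_5$ are assumptions chosen for illustration, and the code does not check whether a curve with the given traces actually exists.

```python
def count_points(p, a, b):
    """Brute-force point count of y^2 = x^3 + a*x + b over F_p (p > 3), including infinity."""
    # Number of y in F_p with y^2 equal to a given residue.
    square_roots = {}
    for y in range(p):
        r = y * y % p
        square_roots[r] = square_roots.get(r, 0) + 1
    total = 1  # the point at infinity
    for x in range(p):
        rhs = (x**3 + a * x + b) % p
        total += square_roots.get(rhs, 0)
    return total


def frobenius_power_sums(q, trace, n_max):
    """s_n = alpha^n + beta^n for the roots alpha, beta of T^2 - trace*T + q, for n = 0..n_max."""
    sums = [2, trace]
    for _ in range(2, n_max + 1):
        sums.append(trace * sums[-1] - q * sums[-2])
    return sums[: n_max + 1]


def point_counts_from_traces(q, traces, n_max):
    """N_n = q^n + 1 - sum_i s_n^(i) for n = 1..n_max, one Frobenius trace per elliptic factor."""
    all_sums = [frobenius_power_sums(q, t, n_max) for t in traces]
    return [q**n + 1 - sum(s[n] for s in all_sums) for n in range(1, n_max + 1)]


# Example: recover the trace from a brute-force count, then extend to F_25.
n1 = count_points(5, 1, 1)
trace = 5 + 1 - n1
print(n1, point_counts_from_traces(5, [trace], 2))
```

Passing several traces, for example `point_counts_from_traces(5, [trace, trace], 2)`, gives the counts a genus-2 curve would have if its Jacobian were isogenous to the product of two copies of the example curve.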

arXiv:2405.03797 [pdf, other]

Tensor Network Computations That Capture Strict Variationality, Volume Law Behavior, and the Efficient Representation of Neural Network States

Authors: Wen-Yuan Liu, Si-Jing Du, Ruojing Peng, Johnnie Gray, Garnet Kin-Lic Chan

Abstract: We introduce a change of perspective on tensor network states that is defined by the computational graph of the contraction of an amplitude. The resulting class of states, which we refer to as tensor network functions, inherit the conceptual advantages of tensor network states while removing computational restrictions arising from the need to converge approximate contractions. We use tensor network functions to compute strict variational estimates of the energy on loopy graphs, analyze their expressive power for ground-states, show that we can capture aspects of volume law time evolution, and provide a mapping of general feed-forward neural nets onto efficient tensor network functions. Our work expands the realm of computable tensor networks to ones where accurate contraction methods are not available, and opens up new avenues to use tensor networks.

Submitted 21 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 4+9 pages, 13 figures

arXiv:2405.03636 [pdf, other]

Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey

Authors: Joshua C. Zhao, Saurabh Bagchi, Salman Avestimehr, Kevin S. Chan, Somali Chaterji, Dimitris Dimitriadis, Jiacheng Li, Ninghui Li, Arash Nourian, Holger R. Roth

Abstract: Deep learning has shown incredible potential across a vast array of tasks and accompanying this growth has been an insatiable appetite for data. However, a large amount of data needed for enabling deep learning is stored on personal devices and recent concerns on privacy have further highlighted challenges for accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology enabling collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be "reverse engineered" to infer information about the private training data. It has been shown under a wide variety of settings that this premise for privacy does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of the data from its participants.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Submitted to ACM Computing Surveys

ACM Class: I.2; H.4; I.5

arXiv:2405.02560 [pdf, other]

A Pilot Study on the Comparison of Prefrontal Cortex Activities of Robotic Therapies on Elderly with Mild Cognitive Impairment

Authors: King Tai Henry Au-Yeung, William Wai Lam Chan, Kwan Yin Brian Chan, Hongjie Jiang, Junpei Zhong

Abstract: Demographic shifts have led to an increase in mild cognitive impairment (MCI), and this study investigates the effects of cognitive training (CT) and reminiscence therapy (RT) conducted by humans or socially assistive robots (SARs) on prefrontal cortex activation in elderly individuals with MCI, aiming to determine the most effective therapy-modality combination for promoting cognitive function. This pilot study employs a randomized control trial (RCT) design. Additionally, the study explores the efficacy of Reminiscence Therapy (RT) in comparison to Cognitive Training (CT). Eight MCI subjects, with a mean age of 70.125 years, were randomly assigned to "human-led" or "SAR-led" groups. Utilizing Functional Near-infrared Spectroscopy (fNIRS) to measure oxy-hemoglobin concentration changes in the dorsolateral prefrontal cortex (DLPFC), the study found no significant differences in the effects of human-led and SAR-led cognitive training on DLPFC activation. However, distinct patterns emerged in memory encoding and retrieval phases between RT and CT, shedding light on the impacts of these interventions on brain activation in the context of MCI.

Submitted 4 May, 2024; originally announced May 2024.

Comments: submitted to IEEE on affective computing

arXiv:2405.01824 [pdf, other]

Creation of Novel Soft Robot Designs using Generative AI

Authors: Wee Kiat Chan, PengWei Wang, Raye Chen-Hua Yeow

Abstract: Soft robotics has emerged as a promising field with the potential to revolutionize industries such as healthcare and manufacturing. However, designing effective soft robots presents challenges, particularly in managing the complex interplay of material properties, structural design, and control strategies. Traditional design methods are often time-consuming and may not yield optimal designs. In this paper, we explore the use of generative AI to create 3D models of soft actuators. We create a dataset of over 70 text-shape pairings of soft pneumatic robot actuator designs, and adapt a latent diffusion model (SDFusion) to learn the data distribution and generate novel designs from it. By employing transfer learning and data augmentation techniques, we significantly improve the performance of the diffusion model. These findings highlight the potential of generative AI in designing complex soft robotic systems, paving the way for future advancements in the field.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.01356 [pdf, other]

Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

Authors: Kelvin C. K. Chan, Yang Zhao, Xuhui Jia, Ming-Hsuan Yang, Huisheng Wang

Abstract: In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective solution to remedy the problem. We show that through constructing a subject-agnostic condition and applying our proposed dual classifier-free guidance, one could obtain outputs consistent with both the given subject and input text prompts. We validate the efficacy of our approach through both optimization-based and encoder-based methods. Additionally, we demonstrate its applicability in second-order customization methods, where an encoder-based model is fine-tuned with DreamBooth. Our approach is conceptually simple and requires only minimal code modifications, but leads to substantial quality improvements, as evidenced by our evaluations and user studies.

Submitted 2 May, 2024; originally announced May 2024.

Comments: Accepted to CVPR 2024

arXiv:2405.00761 [pdf, ps, other]

Survey on the Canonical Metrics on the Teichmüller Spaces and the Moduli Spaces of Riemann Surfaces

Authors: Kin Wai Chan

Abstract: This thesis results from an intensive study on the canonical metrics on the Teichmüller spaces and the moduli spaces of Riemann surfaces. There are several renowned classical metrics on $T_g$ and $\mathcal{M}_g$, including the Weil-Petersson metric, the Teichmüller metric, the Kobayashi metric, the Bergman metric, the Carathéodory metric and the Kähler-Einstein metric. The Teichmüller metric, the Kobayashi metric and the Carathéodory metric are only (complete) Finsler metrics, but they are effective tools in the study of hyperbolic property of $\mathcal{M}_g$. The Weil-Petersson metric is an incomplete Kähler metric, while the Bergman metric and the Kähler-Einstein metric are complete Kähler metrics. However, McMullen introduced a new complete Kähler metric, called the McMullen metric, by perturbing the Weil-Petersson metric. This metric is indeed equivalent to the Teichmüller metric. Recently, Liu-Sun-Yau proved the equivalence of the Kähler-Einstein metric to the Teichmüller metric, and hence gave a positive answer to a conjecture proposed by Yau. Their approach in the proof is to introduce two new complete Kähler metrics, namely, the Ricci metric and the perturbed Ricci metric, and then establish the equivalence of the Ricci metric to the Kähler-Einstein metric and the equivalence of the Ricci metric to the McMullen metric. The main purpose of this thesis is to survey the properties of these various metrics and the geometry of $T_g$ and $\mathcal{M}_g$ induced by these metrics.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 108 pages. arXiv admin note: substantial text overlap with arXiv:math/0403068, arXiv:math/0411247, arXiv:0912.5471 by other authors

arXiv:2404.12508 [pdf, ps, other]

Convergence Analysis of the Stochastic Resolution of Identity: Comparing Hutchinson to Hutch++ for the Second-Order Green's Function

Authors: Leopoldo Mejía, Sandeep Sharma, Roi Baer, Garnet Kin-Lic Chan, Eran Rabani

Abstract: Stochastic orbital techniques offer reduced computational scaling and memory requirements to describe ground and excited states at the cost of introducing controlled statistical errors. Such techniques often rely on two basic operations, stochastic trace estimation and stochastic resolution of identity, both of which lead to statistical errors that scale with the number of stochastic realizations ($N_ξ$) as $\sqrt{N_ξ^{-1}}$. Reducing the statistical errors without significantly increasing $N_ξ$ has been challenging and is central to the development of efficient and accurate stochastic algorithms. In this work, we build upon recent progress made to improve stochastic trace estimation based on the ubiquitous Hutchinson's algorithm and propose a two-step approach for the stochastic resolution of identity, in the spirit of the Hutch++ method. Our approach is based on employing a randomized low-rank approximation followed by a residual calculation, resulting in statistical errors that scale much better than $\sqrt{N_ξ^{-1}}$. We implement the approach within the second-order Born approximation for the self-energy in the computation of neutral excitations and discuss three different low-rank approximations for the two-body Coulomb integrals. Tests on a series of hydrogen dimer chains with varying lengths demonstrate that the Hutch++-like approximations are computationally more efficient than both deterministic and purely stochastic (Hutchinson) approaches for low error thresholds and intermediate system sizes. Notably, for arbitrarily large systems, the Hutchinson-like approximation outperforms both deterministic and Hutch++-like methods.

Submitted 18 April, 2024; originally announced April 2024.
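
Illustrative sketch (not from the paper): the abstract above contrasts Hutchinson's estimator with a two-step scheme in the spirit of Hutch++, namely a randomized low-rank approximation followed by a stochastic residual calculation. The snippet below sketches both ideas for the simpler problem of trace estimation, which is where Hutch++ was originally formulated; it is not the paper's stochastic resolution of identity. The function names and the `matvec` callback (assumed to map an `n x k` block of vectors to `A` times that block) are hypothetical.

```python
import numpy as np


def hutchinson_trace(matvec, n, num_samples, rng):
    """Hutchinson estimator: average of z^T A z over random sign vectors z."""
    z = rng.choice([-1.0, 1.0], size=(n, num_samples))
    return float(np.sum(z * matvec(z)) / num_samples)


def hutchpp_trace(matvec, n, num_samples, rng):
    """Hutch++-style estimator: exact trace on a sketched subspace plus a Hutchinson residual."""
    k = max(num_samples // 3, 1)  # split the matvec budget into three equal parts
    s = rng.choice([-1.0, 1.0], size=(n, k))
    g = rng.choice([-1.0, 1.0], size=(n, k))
    # Step 1: orthonormal basis for the sketched range of A.
    q, _ = np.linalg.qr(matvec(s))
    low_rank_part = np.trace(q.T @ matvec(q))
    # Step 2: Hutchinson on the part of A outside span(q).
    g_perp = g - q @ (q.T @ g)
    residual_part = np.sum(g_perp * matvec(g_perp)) / k
    return float(low_rank_part + residual_part)
```

When `A` is close to low rank, the first step captures most of the trace deterministically and only the small remainder is estimated stochastically, which is the intuition behind the improved error scaling mentioned in the abstract.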

arXiv:2404.12489 [pdf, other]

Grammatical Error Correction for Code-Switched Sentences by Learners of English

Authors: Kelvin Wey Han Chan, Christopher Bryant, Li Nguyen, Andrew Caines, Zheng Yuan

Abstract: Code-switching (CSW) is a common phenomenon among multilingual speakers where multiple languages are used in a single discourse or utterance. Mixed language utterances may still contain grammatical errors however, yet most existing Grammar Error Correction (GEC) systems have been trained on monolingual data and not developed with CSW in mind. In this work, we conduct the first exploration into the use of GEC systems on CSW text. Through this exploration, we propose a novel method of generating synthetic CSW GEC datasets by translating different spans of text within existing GEC corpora. We then investigate different methods of selecting these spans based on CSW ratio, switch-point factor and linguistic constraints, and identify how they affect the performance of GEC systems on CSW text. Our best model achieves an average increase of 1.57 $F_{0.5}$ across 3 CSW test sets (English-Chinese, English-Korean and English-Japanese) without affecting the model's performance on a monolingual dataset. We furthermore discovered that models trained on one CSW language generalise relatively well to other typologically similar CSW languages.

Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11475 [pdf, other]

AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters

Authors: Hao-Wei Chen, Yu-Syuan Xu, Kelvin C. K. Chan, Hsien-Kai Kuo, Chun-Yi Lee, Ming-Hsuan Yang

Abstract: Existing image restoration approaches typically employ extensive networks specifically trained for designated degradations. Despite being effective, such methods inevitably entail considerable storage costs and computational overheads due to the reliance on task-specific networks. In this work, we go beyond this well-established framework and exploit the inherent commonalities among image restoration tasks. The primary objective is to identify components that are shareable across restoration tasks and augment the shared components with modules specifically trained for individual tasks. Towards this goal, we propose AdaIR, a novel framework that enables low storage cost and efficient training without sacrificing performance. Specifically, a generic restoration network is first constructed through self-supervised pre-training using synthetic degradations. Subsequent to the pre-training phase, adapters are trained to adapt the pre-trained network to specific degradations. AdaIR requires solely the training of lightweight, task-specific modules, ensuring a more efficient storage and training regimen. We have conducted extensive experiments to validate the effectiveness of AdaIR and analyze the influence of the pre-training strategy on discovering shareable components. Extensive experimental results show that AdaIR achieves outstanding results on multi-task restoration while utilizing significantly fewer parameters (1.9 MB) and less training time (7 hours) for each restoration task. The source codes and trained models will be released.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.03129 [pdf, other]

doi 10.1063/5.0212274

Performant Automatic Differentiation of Local Coupled Cluster Theories: Response Properties and Ab Initio Molecular Dynamics

Authors: Xing Zhang, Chenghan Li, Hong-Zhou Ye, Timothy C. Berkelbach, Garnet Kin-Lic Chan

Abstract: In this work, we introduce a differentiable implementation of the local natural orbital coupled cluster (LNOCC) method within the automatic differentiation framework of the PySCFAD package. The implementation is comprehensively tuned for enhanced performance, which enables the calculation of first-order static response properties on medium-sized molecular systems using coupled cluster theory with single, double, and perturbative triple excitations [CCSD(T)]. We evaluate the accuracy of our method by benchmarking it against the canonical CCSD(T) reference for nuclear gradients, dipole moments, and geometry optimizations. In addition, we demonstrate the possibility of property calculations for chemically interesting systems through the computation of bond orders and Mössbauer spectroscopy parameters for a [NiFe]-hydrogenase active site model, along with the simulation of infrared (IR) spectra via ab initio LNO-CC molecular dynamics for a protonated water hexamer.

Submitted 2 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.12999 [pdf]

Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control

Authors: On Tai Wu, Frodo Kin Sun Chan, Zunhao Zhang, Yan Nei Law, Benny Drescher, Edmond Shiao Bun Lai

Abstract: Few-shot prompting and step-by-step reasoning have enhanced the capabilities of Large Language Models (LLMs) in tackling complex tasks including code generation. In this paper, we introduce a prompt selection and augmentation algorithm aimed at improving mathematical reasoning and robot arm operations. Our approach incorporates a multi-stage example augmentation scheme combined with an example selection scheme. This algorithm improves LLM performance by selecting a set of examples that increase diversity, minimize redundancy, and increase relevance to the question. When combined with the Program-of-Thought prompting, our algorithm demonstrates an improvement in performance on the GSM8K and SVAMP benchmarks, with increases of 0.3% and 1.1% respectively. Furthermore, in simulated tabletop environments, our algorithm surpasses the Code-as-Policies approach by achieving a 3.4% increase in successful task completions and a decrease of over 70% in the number of examples used. Its ability to discard examples that contribute little to solving the problem reduces the inferencing time of an LLM-powered robotics system. This algorithm also offers important benefits for industrial process automation by streamlining the development and deployment process, reducing manual programming effort, and enhancing code reusability.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 17 pages, 4 figures

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2402.19197 [pdf, other]

doi 10.1609/aaai.v38i2.27856

Fine Structure-Aware Sampling: A New Sampling Training Scheme for Pixel-Aligned Implicit Models in Single-View Human Reconstruction

Authors: Kennard Yanting Chan, Fayao Liu, Guosheng Lin, Chuan Sheng Foo, Weisi Lin

Abstract: Pixel-aligned implicit models, such as PIFu, PIFuHD, and ICON, are used for single-view clothed human reconstruction. These models need to be trained using a sampling training scheme. Existing sampling training schemes either fail to capture thin surfaces (e.g. ears, fingers) or cause noisy artefacts in reconstructed meshes. To address these problems, we introduce Fine Structure-Aware Sampling (FSS), a new sampling training scheme to train pixel-aligned implicit models for single-view human reconstruction. FSS resolves the aforementioned problems by proactively adapting to the thickness and complexity of surfaces. In addition, unlike existing sampling training schemes, FSS shows how normals of sample points can be capitalized on in the training process to improve results. Lastly, to further improve the training process, FSS proposes a mesh thickness loss signal for pixel-aligned implicit models. It becomes computationally feasible to introduce this loss once a slight reworking of the pixel-aligned implicit function framework is carried out. Our results show that our methods significantly outperform SOTA methods qualitatively and quantitatively. Our code is publicly available at https://github.com/kcyt/FSS.

Submitted 29 February, 2024; originally announced February 2024.

Comments: Accepted in Proceedings of the AAAI Conference on Artificial Intelligence, 2024 (AAAI 2024)

Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 2024, pp. 964-971

arXiv:2402.18827 [pdf, other]

Measurement of the photometric Baryon Acoustic Oscillations with self-calibrated redshift distribution

Authors: Ruiyu Song, Kwan Chuen Chan, Haojie Xu, Weilun Zheng

Abstract: We use a galaxy sample derived from the DECaLS DR9 to measure the Baryonic Acoustic Oscillations (BAO). The magnitude-limited sample consists of 10.6 million galaxies in an area of 4974 deg$^2$ over the redshift range of [0.6, 1]. A key novelty of this work is that the true redshift distribution of the photo-$z$ sample is derived from the self-calibration method, which determines the true redshift distribution using the clustering information of the photometric data alone. Through the angular correlation function in four tomographic bins, we constrain the BAO scale dilation parameter $\alpha$ to be $1.025 \pm 0.033$, consistent with the fiducial Planck cosmology. Alternatively, the ratio between the comoving angular diameter distance and the sound horizon, $D_{\rm M} / r_{\rm s}$, is constrained to be $18.94 \pm 0.61$ at the effective redshift of 0.749. We corroborate our results with the true redshift distribution obtained from a weighted spectroscopic sample, finding very good agreement. We have conducted a series of tests to demonstrate the robustness of the measurement. Our work demonstrates that the self-calibration method can effectively constrain the true redshift distribution in cosmological applications, especially in the context of photometric BAO measurement.

Submitted 12 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 13 pages, 10 figures, matched to the published version

arXiv:2402.14476 [pdf, ps, other]

Quantifying neural network uncertainty under volatility clustering

Authors: Steven Y. K. Wong, Jennifer S. K. Chan, Lamiae Azizi

Abstract: Time-series with time-varying variance pose a unique challenge to uncertainty quantification (UQ) methods. Time-varying variance, such as volatility clustering as seen in financial time-series, can lead to a large mismatch between predicted uncertainty and forecast error. Building on recent advances in neural network UQ literature, we extend and simplify Deep Evidential Regression and Deep Ensembles into a unified framework to deal with UQ in the presence of volatility clustering. We show that a Scale Mixture Distribution is a simpler alternative to the Normal-Inverse-Gamma prior that provides a favorable complexity-accuracy trade-off. To illustrate the performance of our proposed approach, we apply it to two sets of financial time-series exhibiting volatility clustering: cryptocurrencies and U.S. equities.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 38 pages

arXiv:2402.12505 [pdf, other]

doi 10.1145/3613904.3642452

Situating Data Sets: Making Public Data Actionable for Housing Justice

Authors: Anh-Ton Tran, Grace Guo, Jordan Taylor, Katsuki Chan, Elora Raymond, Carl DiSalvo

Abstract: Activists, governments, and academics regularly advocate for more open data. But how is data made open, and for whom is it made useful and usable? In this paper, we investigate and describe the work of making eviction data open to tenant organizers. We do this through an ethnographic description of ongoing work with a local housing activist organization. This work combines observation, direct participation in data work, and creating media artifacts, specifically digital maps. Our interpretation is grounded in D'Ignazio and Klein's Data Feminism, emphasizing standpoint theory. Through our analysis and discussion, we highlight how shifting positionalities from data intermediaries to data accomplices affects the design of data sets and maps. We provide HCI scholars with three design implications when situating data for grassroots organizers: becoming a domain beginner, striving for data actionability, and evaluating our design artifacts by the social relations they sustain rather than just their technical efficacy.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 16 pages including references, 4 figures, 1 table, ACM CHI 2024

arXiv:2402.12335 [pdf, other]

Image Super-resolution Inspired Electron Density Prediction

Authors: Chenghan Li, Or Sharir, Shunyue Yuan, Garnet K. Chan

Abstract: Drawing inspiration from the domain of image super-resolution, we view the electron density as a 3D grayscale image and use a convolutional residual network to transform a crude and trivially generated guess of the molecular density into an accurate ground-state quantum mechanical density. We find that this model outperforms all prior density prediction approaches. Because the input is itself a real-space density, the predictions are equivariant to molecular symmetry transformations even though the model is not constructed to be. Due to its simplicity, the model is directly applicable to unseen molecular conformations and chemical elements. We show that fine-tuning on limited new data provides high accuracy even in challenging cases of exotic elements and charge states. Our work suggests new routes to learning real-space physical quantities drawing from the established ideas of image processing.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.11133 [pdf, other]

Two-Sample Hypothesis Testing for Large Random Graphs of Unequal Size

Authors: Xin Jin, Kit Chan, Ian Barnett, Riddhi Pratim Ghosh

Abstract: Two-sample hypothesis testing for large graphs is popular in cognitive science, probabilistic machine learning and artificial intelligence. While numerous methods have been proposed in the literature to address this problem, less attention has been devoted to scenarios involving graphs of unequal size or situations where there are only one or a few samples of graphs. In this article, we propose a Frobenius test statistic tailored for small sample sizes and unequal-sized random graphs to test whether they are generated from the same model or not. Our approach involves an algorithm for generating bootstrapped adjacency matrices from estimated community-wise edge probability matrices, forming the basis of the Frobenius test statistic. We derive the asymptotic distribution of the proposed test statistic and validate its stability and efficiency in detecting minor differences in underlying models through simulations. Furthermore, we explore its application to fMRI data where we are able to distinguish brain activity patterns when subjects are exposed to sentences and pictures for two different stimuli and the control group.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10697 [pdf, other]

Dark Energy Survey: Galaxy Sample for the Baryonic Acoustic Oscillation Measurement from the Final Dataset

Authors: J. Mena-Fernández, M. Rodríguez-Monroy, S. Avila, A. Porredon, K. C. Chan, H. Camacho, N. Weaverdyck, I. Sevilla-Noarbe, E. Sanchez, L. Toribio San Cipriano, J. De Vicente, I. Ferrero, R. Cawthon, A. Carnero Rosell, J. Elvin-Poole, G. Giannini, M. Adamow, K. Bechtol, A. Drlica-Wagner, R. A. Gruendl, W. G. Hartley, A. Pieres, A. J. Ross, E. S. Rykoff, E. Sheldon, et al. (63 additional authors not shown)

Abstract: In this paper we present and validate the galaxy sample used for the analysis of the baryon acoustic oscillation (BAO) signal in the Dark Energy Survey (DES) Y6 data. The definition is based on a color and redshift-dependent magnitude cut optimized to select galaxies at redshifts higher than 0.6, while ensuring a high-quality photo-$z$ determination. The optimization is performed using a Fisher forecast algorithm, finding the optimal $i$-magnitude cut to be given by $i < 19.64 + 2.894\,z_{\rm ph}$. For the optimal sample, we forecast an increase in precision in the BAO measurement of $\sim$25% with respect to the Y3 analysis. Our BAO sample has a total of 15,937,556 galaxies in the redshift range $0.6 < z_{\rm ph} < 1.2$, and its angular mask covers 4,273.42 deg$^2$ to a depth of $i = 22.5$. We validate its redshift distributions with three different methods: directional neighborhood fitting algorithm (DNF), which is our primary photo-$z$ estimation; direct calibration with spectroscopic redshifts from VIPERS; and clustering redshift using SDSS galaxies. The fiducial redshift distribution is a combination of these three techniques performed by modifying the mean and width of the DNF distributions to match those of VIPERS and clustering redshift. In this paper we also describe the methodology used to mitigate the effect of observational systematics, which is analogous to the one used in the Y3 analysis. This paper is one of the two dedicated to the analysis of the BAO signal in DES Y6. In its companion paper, we present the angular diameter distance constraints obtained through the fitting to the BAO scale.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 23 pages, 10 figures. Submitted to PRD

Report number: FERMILAB-PUB-24-0072-PPD
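
As an illustration of the selection quoted in this entry's abstract, the redshift-dependent cut $i < 19.64 + 2.894\,z_{\rm ph}$ can be evaluated directly. The sketch below is not the DES pipeline: the function names are hypothetical, and it applies only the two conditions stated in the abstract (the magnitude cut and the range $0.6 < z_{\rm ph} < 1.2$), ignoring any further quality or masking criteria the paper may use.

```python
def i_mag_limit(z_ph: float) -> float:
    """Faint i-band limit from the abstract: i < 19.64 + 2.894 * z_ph."""
    return 19.64 + 2.894 * z_ph


def passes_bao_cut(i_mag: float, z_ph: float) -> bool:
    """Hypothetical helper: True if a galaxy meets both stated conditions."""
    in_redshift_range = 0.6 < z_ph < 1.2
    return in_redshift_range and i_mag < i_mag_limit(z_ph)


if __name__ == "__main__":
    # The limit gets fainter with redshift across the sample's range.
    for z in (0.6, 0.9, 1.2):
        print(f"z_ph = {z:.1f}: i < {i_mag_limit(z):.3f}")
```

Because the limit grows linearly with $z_{\rm ph}$, a galaxy of fixed magnitude can fail the cut at low redshift and pass it at higher redshift.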

arXiv:2402.10696 [pdf, other]

Dark Energy Survey: A 2.1% measurement of the angular Baryonic Acoustic Oscillation scale at redshift $z_{\rm eff}$=0.85 from the final dataset

Authors: DES Collaboration, T. M. C. Abbott, M. Adamow, M. Aguena, S. Allam, O. Alves, A. Amon, F. Andrade-Oliveira, J. Asorey, S. Avila, D. Bacon, K. Bechtol, G. M. Bernstein, E. Bertin, J. Blazek, S. Bocquet, D. Brooks, D. L. Burke, H. Camacho, A. Carnero Rosell, D. Carollo, J. Carretero, F. J. Castander, R. Cawthon, K. C. Chan, et al. (83 additional authors not shown)

Abstract: We present the angular diameter distance measurement obtained with the Baryonic Acoustic Oscillation feature from galaxy clustering in the completed Dark Energy Survey, consisting of six years (Y6) of observations. We use the Y6 BAO galaxy sample, optimized for BAO science in the redshift range $0.6 < z < 1.2$, with an effective redshift at $z_{\rm eff} = 0.85$ and split into six tomographic bins. The sample has nearly 16 million galaxies over 4,273 square degrees. Our consensus measurement constrains the ratio of the angular distance to sound horizon scale to $D_M(z_{\rm eff})/r_d = 19.51 \pm 0.41$ (at 68.3% confidence interval), resulting from comparing the BAO position in our data to that predicted by Planck $\Lambda$CDM via the BAO shift parameter $\alpha = (D_M/r_d)/(D_M/r_d)_{\rm Planck}$. To achieve this, the BAO shift is measured with three different methods, Angular Correlation Function (ACF), Angular Power Spectrum (APS), and Projected Correlation Function (PCF), obtaining $\alpha = 0.952 \pm 0.023$, $0.962 \pm 0.022$, and $0.955 \pm 0.020$, respectively, which we combine to $\alpha = 0.957 \pm 0.020$, including systematic errors. When compared with the $\Lambda$CDM model that best fits Planck data, this measurement is found to be 4.3% and $2.1\sigma$ below the angular BAO scale predicted. To date, it represents the most precise angular BAO measurement at $z > 0.75$ from any survey and the most precise measurement at any redshift from photometric surveys. The analysis was performed blinded to the BAO position and it is shown to be robust against analysis choices, data removal, redshift calibrations and observational systematics.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Submitted to PRD, 39 pages, 12 figures

Report number: FERMILAB-PUB-24-0027-PPD
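
The headline numbers in this entry's abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded values quoted there ($\alpha = 0.957 \pm 0.020$ and $D_M/r_d = 19.51 \pm 0.41$); the function name and dictionary keys are hypothetical, and the implied Planck-predicted ratio is derived here from those rounded values rather than quoted from the paper.

```python
def bao_summary(alpha: float, sigma_alpha: float,
                dm_over_rd: float, sigma_dm: float) -> dict:
    """Derive summary quantities from alpha = (D_M/r_d) / (D_M/r_d)_Planck."""
    return {
        # How far alpha lies below 1, in percent and in units of its error.
        "shift_percent": (1.0 - alpha) * 100.0,
        "n_sigma": (1.0 - alpha) / sigma_alpha,
        # Fractional precision, computed two ways; they should agree.
        "rel_err_alpha": sigma_alpha / alpha,
        "rel_err_dm": sigma_dm / dm_over_rd,
        # Planck-predicted ratio implied by the definition of alpha.
        "implied_planck_dm_over_rd": dm_over_rd / alpha,
    }


if __name__ == "__main__":
    for key, value in bao_summary(0.957, 0.020, 19.51, 0.41).items():
        print(f"{key}: {value:.4f}")
```

With the rounded inputs, the shift is about 4.3%, the significance about 2.15 (the abstract quotes $2.1\sigma$, presumably from unrounded values), and both fractional errors are about 2.1%, matching the precision stated in the title.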

arXiv:2402.08873 [pdf, ps, other]

Balancing Method for Non-monotone Missing Data

Authors: Jianing Dong, Raymond K. W. Wong, Kwun Chuen Gary Chan

Abstract: Covariate balancing methods have been widely applied to single or monotone missing patterns and have certain advantages over likelihood-based methods and inverse probability weighting approaches based on standard logistic regression. In this paper, we consider non-monotone missing data under the complete-case missing variable condition (CCMV), which is a case of missing not at random (MNAR). Using relationships between each missing pattern and the complete-case subsample, a weighted estimator can be used for estimation, where the weight is a sum of ratios of conditional probability of observing a particular missing pattern versus that of observing the complete-case pattern, given the variables observed in the corresponding missing pattern. Plug-in estimators of the propensity ratios, however, can be unbounded and lead to unstable estimation. Using further relations between propensity ratios and balancing of moments across missing patterns, we employ tailored loss functions each encouraging empirical balance across patterns to estimate propensity ratios flexibly using functional basis expansion. We propose two penalizations to separately control propensity ratio model complexity and covariate imbalance. We study the asymptotic properties of the proposed estimators and show that they are consistent under mild smoothness assumptions. Asymptotic normality and efficiency are also developed. Numerical simulation results show that the proposed method achieves smaller mean squared errors than other methods.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.08351 [pdf, other]

Wireless Channel Prediction via Gaussian Mixture Models

Authors: Nurettin Turan, Benedikt Böck, Kai Jie Chan, Benedikt Fesl, Friedrich Burmeister, Michael Joham, Gerhard Fettweis, Wolfgang Utschick

Abstract: In this work, we utilize a Gaussian mixture model (GMM) to capture the underlying probability density function (PDF) of the channel trajectories of moving mobile terminals (MTs) within the coverage area of a base station (BS) in an offline phase. We propose to leverage the same GMM for channel prediction in the online phase. Our proposed approach does not require signal-to-noise ratio (SNR)-specific training and allows for parallelization. Numerical simulations for both synthetic and measured channel data demonstrate the effectiveness of our proposed GMM-based channel predictor compared to state-of-the-art channel prediction methods.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.05492 [pdf, other]

Cosmological Forecast of the Void Size Function Measurement from the CSST Spectroscopic Survey

Authors: Yingxiao Song, Qi Xiong, Yan Gong, Furen Deng, Kwan Chuen Chan, Xuelei Chen, Qi Guo, Jiaxin Han, Guoliang Li, Ming Li, Yun Liu, Yu Luo, Wenxiang Pei, Chengliang Wei

Abstract: The void size function (VSF) contains information about the cosmic large-scale structure (LSS), and can be used to derive the properties of dark energy and dark matter. We predict the VSFs measured from the spectroscopic galaxy survey operated by the China Space Station Telescope (CSST), and study the strength of the cosmological constraints. We employ a high-resolution Jiutian simulation to get CSST galaxy mock samples based on an improved semi-analytical model. We identify voids from this galaxy catalog using the watershed algorithm without assuming a spherical shape, and estimate the VSFs at different redshift bins from $z=0.5$ to 1.1. We propose a void selection method based on the ellipticity, and assume the void linear underdensity threshold $\delta_{\rm v}$ in the theoretical model is redshift-dependent and set it as a free parameter in each redshift bin. The Markov Chain Monte Carlo (MCMC) method is adopted to implement the constraints on the cosmological and void parameters. We find that the CSST VSF measurement can constrain the cosmological parameters to a few percent level. The best-fit values of $\delta_{\rm v}$ range from $\sim-0.4$ to $-0.1$ as the redshift increases from 0.5 to 1.1, which differs distinctly from the theoretical calculation with $\delta_{\rm v} \simeq -2.7$ assuming the spherical evolution and using particles as tracer. Our method can provide a good reference for void identification and selection in the VSF analysis of the spectroscopic galaxy surveys.

Submitted 24 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 10 pages, 7 figures, 3 tables. Accepted for publication in MNRAS

Journal ref: MNRAS, 532, 1049-1058 (2024)

arXiv:2402.05412 [pdf]

Multi-Network Constrained Operational Optimization in Community Integrated Energy Systems: A Safe Reinforcement Learning Approach

Authors: Ze Hu, Ka Wing Chan, Ziqing Zhu, Xiang Wei, Siqi Bu

Abstract: The integrated community energy system (ICES) has emerged as a promising solution for enhancing the efficiency of the distribution system by effectively coordinating multiple energy sources. However, the operational optimization of ICES is hindered by the physical constraints of heterogeneous networks including electricity, natural gas, and heat. These challenges are difficult to address due to the non-linearity of network constraints and the high complexity of multi-network coordination. This paper, therefore, proposes a novel Safe Reinforcement Learning (SRL) algorithm to optimize the multi-network constrained operation problem of ICES. Firstly, a comprehensive ICES model is established considering integrated demand response (IDR), multiple energy devices, and network constraints. The multi-network operational optimization problem of ICES is then presented and reformulated as a constrained Markov Decision Process (C-MDP) accounting for violating physical network constraints. The proposed novel SRL algorithm, named Primal-Dual Twin Delayed Deep Deterministic Policy Gradient (PD-TD3), solves the C-MDP by employing a Lagrangian multiplier to penalize the multi-network constraint violation, ensuring that violations stay within a tolerated range while avoiding an over-conservative strategy with a low reward. The proposed algorithm accurately estimates the cumulative reward and cost of the training process, thus achieving a fair balance between improving profits and reducing constraint violations in a privacy-protected environment with only partial information. A case study comparing the proposed algorithm with benchmark RL algorithms demonstrates the computational performance in increasing total profits and alleviating the network constraint violations.

Submitted 8 February, 2024; originally announced February 2024.

Showing 1–50 of 980 results for author: Chan, K