Search | arXiv e-print repository

Supercharging Federated Learning with Flower and NVIDIA FLARE

Authors: Holger R. Roth, Daniel J. Beutel, Yan Cheng, Javier Fernandez Marques, Heng Pan, Chester Chen, Zhihong Zhang, Yuhong Wen, Sean Yang, Isaac Yang, Yuan-Ting Hsieh, Ziyue Xu, Daguang Xu, Nicholas D. Lane, Andrew Feng

Abstract: Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can effortlessly operate within the FLARE runtime environment without necessitating any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.

Submitted 21 May, 2024; originally announced July 2024.

arXiv:2405.19087 [pdf, other]

Impacts of ALP on the Constraints of Dark Photon

Authors: Chuan-Ren Chen, Yuan-Feng Hsieh, Chrisna Setyo Nugroho

Abstract: Dark sector may exist and interact with Standard Model (SM) through the $U(1)$ kinetic mixing. Through this portal-type interaction, dark photon from dark sector couples to SM fermions, and may explain the discrepancy between experimental data and SM calculations on muon anomalous magnetic moment, muon $g-2$. However, current searches for dark photon impose stringent constraints on the mixing parameter $\varepsilon$ for various dark photon masses, excluding the favorite parameter space for muon $g-2$. In this paper, we study the case where a global $U(1)$ in dark sector is spontaneously broken, resulting in a light pseudo-Goldstone, axion-like particle (ALP) $a$, which couples to dark photon and SM photon, $g_{aγγ'}$. Through this interaction, dark photon may decay into photon and ALP when this channel is kinematically allowed. As a result, the experimental constraints on dark photon change significantly, and dark photon is able to explain the muon $g-2$ anomaly when its mass is heavier than $10$ GeV.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 9 pages, 3 figures

arXiv:2405.16557 [pdf, other]

Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning

Authors: Chun-Kai Huang, Yi-Hsien Hsieh, Ta-Jung Chien, Li-Cheng Chien, Shao-Hua Sun, Tung-Hung Su, Jia-Horng Kao, Che Lin

Abstract: Multivariate time series (MTS) data, when sampled irregularly and asynchronously, often present extensive missing values. Conventional methodologies for MTS analysis tend to rely on temporal embeddings based on timestamps that necessitate subsequent imputations, yet these imputed values frequently deviate substantially from their actual counterparts, thereby compromising prediction accuracy. Furthermore, these methods typically fail to provide robust initial embeddings for values infrequently observed or even absent within the training set, posing significant challenges to model generalizability. In response to these challenges, we propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token, effectively bypassing the need for imputation. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. Coupling SCANE with the Transformer Encoder architecture, we develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries. Our experimental validation, conducted across three disparate electronic health record (EHR) datasets marked by elevated missing value frequencies, confirms the superior performance of SUMMIT over contemporary state-of-the-art approaches addressing similar challenges.
These results substantiate the efficacy of SCANE and SUMMIT, underscoring their potential applicability across a broad spectrum of MTS data analytical tasks.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2403.02363 [pdf, other]

Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity

Authors: Ying-Hsuan Wu, Jun-Wei Hsieh, Li Xin, Shin-You Teng, Yi-Kuan Hsieh, Ming-Ching Chang

Abstract: Real-world datasets commonly exhibit noisy labels and class imbalance, such as long-tailed distributions. While previous research addresses this issue by differentiating noisy and clean samples, reliance on information from predictions based on noisy long-tailed data introduces potential errors. To overcome the limitations of prior works, we introduce an effective two-stage approach by combining soft-label refurbishing with multi-expert ensemble learning. In the first stage of robust soft label refurbishing, we acquire unbiased features through contrastive learning, making preliminary predictions using a classifier trained with a carefully designed BAlanced Noise-tolerant Cross-entropy (BANC) loss. In the second stage, our label refurbishment method is applied to obtain soft labels for multi-expert ensemble learning, providing a principled solution to the long-tail noisy label problem. Experiments conducted across multiple benchmarks validate the superiority of our approach, Label Refurbishment considering Label Rarity (LR^2), achieving remarkable accuracies of 94.19% and 77.05% on simulated noisy CIFAR-10 and CIFAR-100 long-tail datasets, as well as 77.74% and 81.40% on real-noise long-tail datasets, Food-101N and Animal-10N, surpassing existing state-of-the-art methods.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.07792 [pdf, other]

Empowering Federated Learning for Massive Models with NVIDIA FLARE

Authors: Holger R. Roth, Ziyue Xu, Yuan-Ting Hsieh, Adithya Renduchintala, Isaac Yang, Zhihong Zhang, Yuhong Wen, Sean Yang, Kevin Lu, Kristopher Kersten, Camir Ricketts, Daguang Xu, Chester Chen, Yan Cheng, Andrew Feng

Abstract: In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copyright issues, and the sheer effort required to move vast datasets. In this paper, we explore how federated learning enabled by NVIDIA FLARE can address these challenges with easy and scalable integration capabilities, enabling parameter-efficient and full supervised fine-tuning of LLMs for natural language processing and biopharmaceutical applications to enhance their accuracy and robustness.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.02998 [pdf, other]

Careful with that Scalpel: Improving Gradient Surgery with an EMA

Authors: Yu-Guan Hsieh, James Thornton, Eugene Ndiaye, Michal Klein, Marco Cuturi, Pierre Ablin

Abstract: Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, recent works have shown that one can improve performance by blending the gradients beyond a simple sum; this is known as gradient surgery. We cast the problem as a constrained minimization problem where the auxiliary objective is minimized among the set of minimizers of the training loss. To solve this bilevel problem, we follow a parameter update direction that combines the training loss gradient and the orthogonal projection of the auxiliary gradient to the training gradient. In a setting where gradients come from mini-batches, we explain how, using a moving average of the training loss gradients, we can carefully maintain this critical orthogonality property. We demonstrate that our method, Bloop, can lead to much better performances on NLP and vision experiments than other gradient surgery methods without EMA.

Submitted 5 February, 2024; originally announced February 2024.
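
Illustrative sketch (not part of the arXiv listing): the update direction described in the abstract above, a training gradient plus the part of the auxiliary gradient orthogonal to a moving average of training gradients, might look roughly like this. It is a minimal illustration under stated assumptions, not the authors' Bloop implementation: the function names, the mixing weight `lam`, and the EMA decay `beta` are hypothetical, and plain Python lists stand in for gradient tensors.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(aux_grad, ref_grad, eps=1e-12):
    """Remove from aux_grad its component along ref_grad."""
    scale = dot(aux_grad, ref_grad) / (dot(ref_grad, ref_grad) + eps)
    return [a - scale * r for a, r in zip(aux_grad, ref_grad)]


def ema_update(ema, grad, beta=0.9):
    """Exponential moving average of mini-batch training gradients."""
    return [beta * e + (1.0 - beta) * g for e, g in zip(ema, grad)]


def surgery_direction(train_grad, aux_grad, ema_train_grad, lam=0.1):
    """Training gradient plus the auxiliary gradient made orthogonal to the EMA.

    `lam` is a hypothetical weight on the auxiliary term.
    """
    aux_orth = project_out(aux_grad, ema_train_grad)
    return [g + lam * a for g, a in zip(train_grad, aux_orth)]


# Toy two-parameter example with made-up gradient values.
ema = [1.0, 0.0]
train_grad = [0.8, 0.3]
aux_grad = [2.0, 1.0]

ema = ema_update(ema, train_grad)
aux_orth = project_out(aux_grad, ema)
direction = surgery_direction(train_grad, aux_grad, ema)
```

Projecting against the moving average rather than the current mini-batch gradient is the design choice the abstract highlights; the sketch only shows where that average would enter.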

arXiv:2401.01300 [pdf, other]

doi 10.1021/acs.nanolett.3c01208

Engineering the strain and interlayer excitons of 2D materials via lithographically engraved hexagonal boron nitride

Authors: Yu-Chiang Hsieh, Zhen-You Lin, Shin-Ji Fung, Wen-Shin Lu, Sheng-Chin Ho, Siang-Ping Hong, Sheng-Zhu Ho, Chiu-Hua Huang, Kenji Watanabe, Takashi Taniguchi, Yang-Hao Chan, Yi-Chun Chen, Chung-Lin Wu, Tse-Ming Chen

Abstract: Strain engineering has quickly emerged as a viable option to modify the electronic, optical and magnetic properties of 2D materials. However, it remains challenging to arbitrarily control the strain. Here we show that by creating atomically-flat surface nanostructures in hexagonal boron nitride, we achieve an arbitrary on-chip control of both the strain distribution and magnitude on high-quality molybdenum disulfide. The phonon and exciton emissions are shown to vary in accordance with our strain field designs, enabling us to write and draw any photoluminescence color image in a single chip. Moreover, our strain engineering offers a powerful means to significantly and controllably alter the strengths and energies of interlayer excitons at room temperature. This method can be easily extended to other material systems and offers a promise for functional excitonic devices.

Submitted 2 January, 2024; originally announced January 2024.

Comments: 8 pages, 5 figures

Journal ref: Nano Lett. 23, 7244-7251 (2023)

arXiv:2312.16771 [pdf, other]

Scale-Aware Crowd Count Network with Annotation Error Correction

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Li Xin

Abstract: Traditional crowd counting networks suffer from information loss when feature maps are downsized through pooling layers, leading to inaccuracies in counting crowds at a distance. Existing methods often assume correct annotations during training, disregarding the impact of noisy annotations, especially in crowded scenes. Furthermore, the use of a fixed Gaussian kernel fails to account for the varying pixel distribution with respect to the camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a "scale-aware" architecture with error-correcting capabilities of noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) by spatially-varying Gaussian distributions to produce fine-grained heat maps for crowd counting. Furthermore, the proposed adaptive Gaussian kernel variance enables the model to learn dynamically with a low-rank approximation, leading to improved convergence efficiency with comparable accuracy. The performance of SACC-Net is extensively evaluated on four public datasets: UCF-QNRF, UCF CC 50, NWPU, and ShanghaiTech A-B. Experimental results demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd counting accuracy.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 7 pages, 6 figures. arXiv admin note: text overlap with arXiv:2211.06835

arXiv:2312.03314 [pdf, other]

Implications of Gamma Ray Burst GRB221009A for Extra Dimensions

Authors: Janus Capellan Aban, Chuan-Ren Chen, Yuan-Feng Hsieh, Chrisna Setyo Nugroho

Abstract: Anomalous high energy photons, known as GRB221009A, with 18 TeV and 251 TeV were observed by LHAASO and Carpet-2 recently. Such observation of high energy gamma-ray bursts from distant source causes a mystery since high energy photons suffer severe attenuation before reaching the earth. One possibility is the existence of axion-like particles (ALP), and high energy photons at the source can convert to these ALPs which travel intergalactically. In this paper, we study the effects of extra dimensions on the conversion probability between photon and ALPs. The conversion probability saturates and may reach almost $100\%$ for high energy photons. We show that the size of extra dimension affects the energy at which saturation occurs. The observations of high-energy photons may support the possibility of smaller extra dimension.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.02213 [pdf, other]

JarviX: A LLM No code Platform for Tabular Data Analysis and Optimization

Authors: Shang-Ching Liu, ShengKun Wang, Wenqi Lin, Chung-Wei Hsiung, Yi-Chen Hsieh, Yu-Ping Cheng, Sian-Hong Luo, Tsungyao Chang, Jianwei Zhang

Abstract: In this study, we introduce JarviX, a sophisticated data analytics framework. JarviX is designed to employ Large Language Models (LLMs) to facilitate an automated guide and execute high-precision data analyses on tabular datasets. This framework emphasizes the significance of varying column types, capitalizing on state-of-the-art LLMs to generate concise data insight summaries, propose relevant analysis inquiries, visualize data effectively, and provide comprehensive explanations for results drawn from an extensive data analysis pipeline. Moreover, JarviX incorporates an automated machine learning (AutoML) pipeline for predictive modeling. This integration forms a comprehensive and automated optimization cycle, which proves particularly advantageous for optimizing machine configuration. The efficacy and adaptability of JarviX are substantiated through a series of practical use case studies.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.16706 [pdf, ps, other]

Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause

Abstract: Many problems in machine learning can be formulated as solving entropy-regularized optimal transport on the space of probability measures. The canonical approach involves the Sinkhorn iterates, renowned for their rich mathematical properties. Recently, the Sinkhorn algorithm has been recast within the mirror descent framework, thus benefiting from classical optimization theory insights. Here, we build upon this result by introducing a continuous-time analogue of the Sinkhorn algorithm. This perspective allows us to derive novel variants of Sinkhorn schemes that are robust to noise and bias. Moreover, our continuous-time dynamics not only generalize but also offer a unified perspective on several recently discovered dynamics in machine learning and mathematics, such as the "Wasserstein mirror flow" of (Deb et al. 2023) or the "mean-field Schrödinger equation" of (Claisse et al. 2023).

Submitted 28 November, 2023; originally announced November 2023.
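
Illustrative sketch (not part of the arXiv listing): the classical, discrete-time Sinkhorn iterates that the abstract above takes as its starting point can be written in a few lines. This is a minimal sketch of the textbook algorithm for entropy-regularized optimal transport between two small histograms, not the continuous-time flow the paper introduces; the regularization strength `reg`, the iteration count, and the toy marginals and cost matrix are assumptions chosen for illustration.

```python
import math


def sinkhorn(a, b, cost, reg=0.5, n_iters=500):
    """Classical Sinkhorn iterations for entropy-regularized optimal transport.

    a, b : source and target histograms (each sums to 1)
    cost : cost[i][j] is the cost of moving mass from bin i to bin j
    reg  : entropic regularization strength (hypothetical default)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / reg).
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy problem with made-up marginals and costs.
a = [0.5, 0.5]
b = [0.25, 0.75]
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(a, b, cost)

row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(a))) for j in range(len(b))]
```

At convergence the row sums of `plan` match `a` and the column sums match `b`, which is the fixed-point property the alternating rescaling enforces.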

arXiv:2311.02374 [pdf, other]

Riemannian stochastic optimization methods avoid strict saddle points

Authors: Ya-Ping Hsieh, Mohammad Reza Karimi, Andreas Krause, Panayotis Mertikopoulos

Abstract: Many modern machine learning applications - from online principal component analysis to covariance matrix identification and dictionary learning - can be formulated as minimization problems on Riemannian manifolds, and are typically solved with a Riemannian stochastic gradient method (or some variant thereof). However, in many cases of interest, the resulting minimization problem is not geodesically convex, so the convergence of the chosen solver to a desirable solution - i.e., a local minimizer - is by no means guaranteed. In this paper, we study precisely this question, that is, whether stochastic Riemannian optimization algorithms are guaranteed to avoid saddle points with probability 1. For generality, we study a family of retraction-based methods which, in addition to having a potentially much lower per-iteration cost relative to Riemannian gradient descent, include other widely used algorithms, such as natural policy gradient methods and mirror descent in ordinary convex spaces. In this general setting, we show that, under mild assumptions for the ambient manifold and the oracle providing gradient information, the policies under study avoid strict saddle points / submanifolds with probability 1, from any initial condition. This result provides an important sanity check for the use of gradient methods on manifolds as it shows that, almost always, the limit state of a stochastic Riemannian algorithm can only be a local minimizer.

Submitted 4 November, 2023; originally announced November 2023.

Comments: 27 pages, 3 figures

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C48
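
Illustrative sketch (not part of the arXiv listing): a retraction-based Riemannian stochastic gradient method of the kind the abstract above studies can be shown on the unit sphere. This is a toy illustration under stated assumptions, not the paper's setting or proof: the objective $f(x) = -\tfrac{1}{2} x^\top A x$ with $A = \mathrm{diag}(3, 2, 1)$, the Gaussian gradient noise, the step-size schedule, and the function name are all hypothetical. On the sphere this objective has a strict saddle at the second coordinate axis and minimizers at plus or minus the first coordinate axis, and the run is started exactly at the saddle.

```python
import math
import random


def riemannian_sgd_sphere(x0, n_steps=500, noise_std=0.1, seed=0):
    """Noisy Riemannian gradient descent on the unit sphere with a normalization retraction."""
    rng = random.Random(seed)
    diag = [3.0, 2.0, 1.0]  # A = diag(3, 2, 1), a hypothetical test matrix
    x = list(x0)
    for t in range(n_steps):
        step = 0.5 / (1.0 + 0.05 * t)  # decaying step size (assumed schedule)
        # Noisy Euclidean gradient of f(x) = -0.5 * x^T A x, i.e. -A x plus noise.
        g = [-d * xi + rng.gauss(0.0, noise_std) for d, xi in zip(diag, x)]
        # Project onto the tangent space of the sphere at x.
        gx = sum(gi * xi for gi, xi in zip(g, x))
        g_tan = [gi - gx * xi for gi, xi in zip(g, x)]
        # Take a step in the ambient space, then retract by renormalizing.
        y = [xi - step * gi for xi, gi in zip(x, g_tan)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


# Start exactly at the strict saddle point (0, 1, 0).
x_final = riemannian_sgd_sphere([0.0, 1.0, 0.0])
```

Normalization is used here as the retraction because it is cheaper than following geodesics, which matches the abstract's remark about lower per-iteration cost; the gradient noise is what moves the iterate off the saddle in this toy run.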

arXiv:2310.01733 [pdf, other]

doi 10.1109/ICDH60066.2023.00019

Health Guardian: Using Multi-modal Data to Understand Individual Health

Authors: Vince S. Siu, Kuan Yu Hsieh, Italo Buleje, Takashi Itoh, Tian Hao, Ben Civjan, Nigel Hinds, Bing Dang, Jeffrey L. Rogers, Bo Wen

Abstract: Artificial intelligence (AI) has shown great promise in revolutionizing the field of digital health by improving disease diagnosis, treatment, and prevention. This paper describes the Health Guardian platform, a non-commercial, scientific research-based platform developed by the IBM Digital Health team to rapidly translate AI research into cloud-based microservices. The platform can collect health-related data from various digital devices, including wearables and mobile applications. Its flexible architecture supports microservices that accept diverse data types such as text, audio, and video, expanding the range of digital health assessments and enabling holistic health evaluations by capturing voice, facial, and motion bio-signals. These microservices can be deployed to a clinical cohort specified through the Clinical Task Manager (CTM). The CTM then collects multi-modal, clinical data that can iteratively improve the accuracy of AI predictive models, discover new disease mechanisms, or identify novel biomarkers. This paper highlights three microservices with different input data types, including a text-based microservice for depression assessment, a video-based microservice for sit-to-stand mobility assessment, and a wearable-based microservice for functional mobility assessment. The CTM is also discussed as a tool to help design and set up clinical studies to unlock the full potential of the platform.
Today, the Health Guardian platform is being leveraged in collaboration with research partners to optimize the development of AI models by utilizing a multitude of input sources. This approach streamlines research efforts, enhances efficiency, and facilitates the development and validation of digital health applications.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 10 pages, 6 figures

Journal ref: IEEE International Conference on Digital Health (ICDH), 2023, pp. 65-74

arXiv:2310.01673 [pdf, other]

doi 10.1109/ICDH60066.2023.00021

A Versatile Data Fabric for Advanced IoT-Based Remote Health Monitoring

Authors: Italo Buleje, Vince S. Siu, Kuan Yu Hsieh, Nigel Hinds, Bing Dang, Erhan Bilal, Thanhnha Nguyen, Ellen E. Lee, Colin A. Depp, Jeffrey L. Rogers

Abstract: This paper presents a data-centric and security-focused data fabric designed for digital health applications. With the increasing interest in digital health research, there has been a surge in the volume of Internet of Things (IoT) data derived from smartphones, wearables, and ambient sensors. Managing this vast amount of data, encompassing diverse data types and varying time scales, is crucial. Moreover, compliance with regulatory and contractual obligations is essential. The proposed data fabric comprises an architecture and a toolkit that facilitate the integration of heterogeneous data sources, across different environments, to provide a unified view of the data in dashboards. Furthermore, the data fabric supports the development of reusable and configurable data integration components, which can be shared as open-source or inner-source software. These components are used to generate data pipelines that can be deployed and scheduled to run either in the cloud or on-premises. Additionally, we present the implementation of our data fabric in a home-based telemonitoring research project involving older adults, conducted in collaboration with the University of California, San Diego (UCSD). The study showcases the streamlined integration of data collected from various IoT sensors and mobile applications to create a unified view of older adults' health for further analysis and research.

Submitted 2 October, 2023; originally announced October 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH), Chicago, IL, USA, 2023, pp. 88-90

arXiv:2309.14859 [pdf, other]

Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation

Authors: Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao, Bernard B W Yang, Giyeong Oh, Yanmin Gong

Abstract: Text-to-image generative models have garnered immense attention for their ability to produce high-fidelity images from text prompts. Among these, Stable Diffusion distinguishes itself as a leading open-source model in this fast-growing field. However, the intricacies of fine-tuning these models pose multiple challenges from new methodology integration to systematic evaluation. Addressing these issues, this paper introduces LyCORIS (Lora beYond Conventional methods, Other Rank adaptation Implementations for Stable diffusion) [https://github.com/KohakuBlueleaf/LyCORIS], an open-source library that offers a wide selection of fine-tuning methodologies for Stable Diffusion. Furthermore, we present a thorough framework for the systematic assessment of varied fine-tuning techniques. This framework employs a diverse suite of metrics and delves into multiple facets of fine-tuning, including hyperparameter adjustments and the evaluation with different prompt types across various concept categories. Through this comprehensive approach, our work provides essential insights into the nuanced effects of fine-tuning parameters, bridging the gap between state-of-the-art research and practical application.

Submitted 11 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

Comments: In International Conference on Learning Representations 12 (ICLR 2024) [79 pages, 54 figures, 7 tables]

arXiv:2309.09514 [pdf, other]

PanoMixSwap Panorama Mixing via Structural Swapping for Indoor Scene Understanding

Authors: Yu-Cheng Hsieh, Cheng Sun, Suraj Dengale, Min Sun

Abstract: The volume and diversity of training data are critical for modern deep learning-based methods. Compared to the massive amount of labeled perspective images, 360 panoramic images fall short in both volume and diversity. In this paper, we propose PanoMixSwap, a novel data augmentation technique specifically designed for indoor panoramic images. PanoMixSwap explicitly mixes various background styles, foreground furniture, and room layouts from the existing indoor panorama datasets and generates a diverse set of new panoramic images to enrich the datasets. We first decompose each panoramic image into its constituent parts: background style, foreground furniture, and room layout. Then, we generate an augmented image by mixing these three parts from three different images, such as the foreground furniture from one image, the background style from another image, and the room structure from the third image. Our method yields high diversity since there is a cubical increase in image combinations. We also evaluate the effectiveness of PanoMixSwap on two indoor scene understanding tasks: semantic segmentation and layout estimation. Our experiments demonstrate that state-of-the-art methods trained with PanoMixSwap outperform their original setting on both tasks consistently.

Submitted 27 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: BMVC'23; project page:https://yuchenghsieh.github.io/PanoMixSwap

arXiv:2308.14750 [pdf]

Ultrahigh Photoresponsivity of Gold Nanodisk Array/CVD MoS$_2$-based Hybrid Phototransistor

Authors: Shyam Narayan Singh Yadav, Po-Liang Chen, Yu-Chi Yao, Yen-Yu Wang, Der-Hsien Lien, Yu-Jung Lu, Ya-Ping Hsieh, Chang-Hua Liu, Ta-Jen Yen

Abstract: Owing to its atomically thin thickness, layer-dependent tunable band gap, flexibility, and CMOS compatibility, MoS$_2$ is a promising candidate for photodetection. However, mono-layer MoS$_2$-based photodetectors typically show poor optoelectronic performances, mainly limited by their low optical absorption. In this work, we hybridized CVD-grown monolayer MoS$_2$ with a gold nanodisk (AuND) array to demonstrate a superior visible photodetector through a synergetic effect. It is evident from our experimental results that there is a strong light-matter interaction between AuNDs and monolayer MoS$_2$, which results in better photodetection due to a surface trap state passivation with a longer charge carrier lifetime compared to pristine MoS$_2$. In particular, the AuND/MoS$_2$ system demonstrated a photoresponsivity of $8.7 \times 10^{4}$ A/W, specific detectivity of $6.9 \times 10^{13}$ Jones, and gain $1.7 \times 10^{5}$ at $31.84 μW/cm^{2}$ illumination power density of 632 nm wavelength with an applied voltage of 4.0 V for an AuND/MoS$_2$-based photodetector. To our knowledge, these optoelectronic responses are one order higher than reported results for CVD MoS$_2$-based photodetector in the literature.

Submitted 28 August, 2023; originally announced August 2023.

Comments: 30 pages, 14 figures

arXiv:2306.09099 [pdf, other]

Unbalanced Diffusion Schrödinger Bridge

Authors: Matteo Pariset, Ya-Ping Hsieh, Charlotte Bunne, Andreas Krause, Valentin De Bortoli

Abstract: Schrödinger bridges (SBs) provide an elegant framework for modeling the temporal evolution of populations in physical, chemical, or biological systems. Such natural processes are commonly subject to changes in population size over time due to the emergence of new species or birth and death events. However, existing neural parameterizations of SBs such as diffusion Schrödinger bridges (DSBs) are restricted to settings in which the endpoints of the stochastic process are both probability measures and assume conservation of mass constraints. To address this limitation, we introduce unbalanced DSBs which model the temporal evolution of marginals with arbitrary finite mass. This is achieved by deriving the time reversal of stochastic differential equations with killing and birth terms. We present two novel algorithmic schemes that comprise a scalable objective function for training unbalanced DSBs and provide a theoretical analysis alongside challenging applications on predicting heterogeneous molecular single-cell responses to various cancer drugs and simulating the emergence and spread of new viral variants.

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2305.12444 [pdf, other]

On the Impossibility of General Parallel Fast-forwarding of Hamiltonian Simulation

Authors: Nai-Hui Chia, Kai-Min Chung, Yao-Ching Hsieh, Han-Hsuan Lin, Yao-Ting Lin, Yu-Ching Shen

Abstract: Hamiltonian simulation is one of the most important problems in the field of quantum computing. There have been extended efforts on designing algorithms for faster simulation, and the evolution time $T$ for the simulation turns out to largely affect algorithm runtime. While there are some specific types of Hamiltonians that can be fast-forwarded, i.e., simulated within time $o(T)$, for large enough classes of Hamiltonians (e.g., all local/sparse Hamiltonians), existing simulation algorithms require running time at least linear in the evolution time $T$. On the other hand, while there exist lower bounds of $\Omega(T)$ circuit size for some large classes of Hamiltonians, these lower bounds do not rule out the possibility of Hamiltonian simulation with large but "low-depth" circuits by running things in parallel. Therefore, it is intriguing whether we can achieve fast Hamiltonian simulation with the power of parallelism. In this work, we give a negative result for the above open problem, showing that sparse Hamiltonians and (geometrically) local Hamiltonians cannot be parallelly fast-forwarded. In the oracle model, we prove that there are time-independent sparse Hamiltonians that cannot be simulated via an oracle circuit of depth $o(T)$. In the plain model, relying on the random oracle heuristic, we show that there exist time-independent local Hamiltonians and time-dependent geometrically local Hamiltonians that cannot be simulated via an oracle circuit of depth $o(T/n^c)$, where the Hamiltonians act on $n$ qubits, and $c$ is a constant.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 44 pages, 7 figures

arXiv:2304.12171 [pdf, ps, other]

Monotone comparative statics for submodular functions, with an application to aggregated deferred acceptance

Authors: Alfred Galichon, Yu-Wei Hsieh, Maxime Sylvestre

Abstract: We propose monotone comparative statics results for maximizers of submodular functions, as opposed to maximizers of supermodular functions as in the classical theory put forth by Veinott, Topkis, Milgrom, and Shannon among others. We introduce matrons, a natural structure that is dual to sublattices that generalizes existing structures such as matroids and polymatroids in combinatorial optimization and M-sets in discrete convex analysis. Our monotone comparative statics result is based on a natural order on matrons, which is dual in some sense to Veinott's strong set order on sublattices. As an application, we propose a deferred acceptance algorithm that operates in the case of divisible goods, and we study its convergence properties.

Submitted 24 April, 2023; originally announced April 2023.

arXiv:2302.11419 [pdf, other]

Aligned Diffusion Schrödinger Bridges

Authors: Vignesh Ram Somnath, Matteo Pariset, Ya-Ping Hsieh, Maria Rodriguez Martinez, Andreas Krause, Charlotte Bunne

Abstract: Diffusion Schrödinger bridges (DSB) have recently emerged as a powerful framework for recovering stochastic dynamics via their marginal observations at different time points. Despite numerous successful applications, existing algorithms for solving DSBs have so far failed to utilize the structure of aligned data, which naturally arises in many biological phenomena. In this paper, we propose a novel algorithmic framework that, for the first time, solves DSBs while respecting the data alignment. Our approach hinges on a combination of two decades-old ideas: The classical Schrödinger bridge theory and Doob's $h$-transform. Compared to prior methods, our approach leads to a simpler training procedure with lower variance, which we further augment with principled regularization schemes. This ultimately leads to sizeable improvements across experiments on synthetic and real data, including the tasks of predicting conformational changes in proteins and temporal evolution of cellular differentiation processes.

Submitted 28 April, 2024; v1 submitted 22 February, 2023; originally announced February 2023.

arXiv:2302.05831 [pdf, ps, other]

On the Difficulty of Characterizing Network Formation with Endogenous Behavior

Authors: Benjamin Golub, Yu-Chi Hsieh, Evan Sadler

Abstract: Bolletta (2021, Math. Soc. Sci. 114:1-10) studies a model in which a network is strategically formed and then agents play a linear best-response investment game in it. The model is motivated by an application in which people choose both their study partners and their levels of educational effort. Agents have different one-dimensional types (private returns to effort). A main result claims that pairwise Nash stable networks have a locally complete structure consisting of possibly overlapping cliques: if two agents are linked, they are part of a clique composed of all agents with types between theirs. We offer a counterexample showing that the claimed characterization is incorrect, highlight where the analysis errs, and discuss implications for network formation models.

Submitted 22 February, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

arXiv:2301.05182 [pdf, other]

Thompson Sampling with Diffusion Generative Prior

Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Abstract: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of the same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with the learned prior to deal with new tasks at test time. Our posterior sampling algorithm is designed to carefully balance between the learned prior and the noisy observations that come from the learner's interaction with the environment. To capture realistic bandit scenarios, we also propose a novel diffusion model training procedure that trains even from incomplete and/or noisy data, which could be of independent interest. Finally, our extensive experimental evaluations clearly demonstrate the potential of the proposed approach.

Submitted 30 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

arXiv:2212.01287 [pdf, other]

SARAS-Net: Scale and Relation Aware Siamese Network for Change Detection

Authors: Chao-Peng Chen, Jun-Wei Hsieh, Ping-Yang Chen, Yi-Kuan Hsieh, Bor-Shiun Wang

Abstract: Change detection (CD) aims to find the difference between two images taken at different times and outputs a change map that represents whether each region has changed or not. To generate a better change map, many state-of-the-art (SoTA) methods design a deep learning model with a powerful discriminative ability. However, these methods still underperform because they ignore spatial information and scaling changes between objects, giving rise to blurry or wrong boundaries. In addition, they neglect the interactive information between the two images. To alleviate these problems, we propose the Scale and Relation-Aware Siamese Network (SARAS-Net). In this paper, three modules are proposed, namely relation-aware, scale-aware, and cross-transformer, to tackle the problem of scene change detection more effectively. To verify our model, we tested it on three public datasets, including LEVIR-CD, WHU-CD, and DSFIN, and obtained SoTA accuracy. Our code is available at https://github.com/f64051041/SARAS-Net.

Submitted 2 December, 2022; originally announced December 2022.

arXiv:2211.12839 [pdf]

Newly Developed Flexible Grid Trading Model Combined ANN and SSO algorithm

Authors: Wei-Chang Yeh, Yu-Hsin Hsieh, Chia-Ling Huang

Abstract: In modern society, the trading methods and strategies used in financial markets have gradually shifted from traditional on-site trading to electronic remote trading, and even to online automatic trading performed by pre-programmed computer programs, owing to the continuous development of network and computing technology. Quantitative trading, whose main purpose is to formulate people's investment decisions into a fixed and quantifiable operating logic that eliminates emotional interference and the influence of subjective thoughts, and to apply this logic to financial market activities in order to obtain excess profits above average returns, has attracted a lot of attention in financial markets. The development of self-adjusting programming algorithms for automatic trading in financial markets has become a top priority for academic research and financial practice. Thus, this work develops and studies a new flexible grid trading model that combines the Simplified Swarm Optimization (SSO) algorithm, which optimizes parameters for various market situations as input values, with a fully connected neural network (FNN) and a Long Short-Term Memory (LSTM) model, which train a quantitative trading model to automatically calculate and adjust the optimal trading parameters once the existing market situation is input. The proposed model provides a self-adjusting model that reduces investors' effort in the trading market, achieves a superior investment return rate and model robustness, and properly controls the balance between risk and return.

Submitted 5 September, 2022; originally announced November 2022.

arXiv:2211.06835 [pdf, other]

Scale-Aware Crowd Counting Using a Joint Likelihood Density Map and Synthetic Fusion Pyramid Network

Authors: Yi-Kuan Hsieh, Jun-Wei Hsieh, Yu-Chee Tseng, Ming-Ching Chang, Bor-Shiun Wang

Abstract: We develop a Synthetic Fusion Pyramid Network (SPF-Net) with a scale-aware loss function design for accurate crowd counting. Existing crowd-counting methods assume that the training annotation points were accurate and thus ignore the fact that noisy annotations can lead to large model-learning bias and counting error, especially for counting highly dense crowds that appear far away. To the best of our knowledge, this work is the first to properly handle such noise at multiple scales in end-to-end loss design and thus push the crowd counting state-of-the-art. We model the noise of crowd annotation points as a Gaussian and derive the crowd probability density map from the input image. We then approximate the joint distribution of crowd density maps with the full covariance of multiple scales and derive a low-rank approximation for tractability and efficient implementation. The derived scale-aware loss function is used to train the SPF-Net. We show that it outperforms various loss functions on four public datasets: UCF-QNRF, UCF CC 50, NWPU and ShanghaiTech A-B datasets. The proposed SPF-Net can accurately predict the locations of people in the crowd, despite training on noisy training annotations.

Submitted 2 January, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

Comments: 8 pages, 8 figures, 4 tables

arXiv:2211.06330 [pdf, other]

doi 10.1109/ICDH55609.2022.00015

Health Guardian Platform: A technology stack to accelerate discovery in Digital Health research

Authors: Bo Wen, Vince S. Siu, Italo Buleje, Kuan Yu Hsieh, Takashi Itoh, Lukas Zimmerli, Nigel Hinds, Elif Eyigoz, Bing Dang, Stefan von Cavallar, Jeffrey L. Rogers

Abstract: This paper highlights the design philosophy and architecture of the Health Guardian, a platform developed by the IBM Digital Health team to accelerate discoveries of new digital biomarkers and development of digital health technologies. The Health Guardian allows for rapid translation of artificial intelligence (AI) research into cloud-based microservices that can be tested with data from clinical cohorts to understand disease and enable early prevention. The platform can be connected to mobile applications, wearables, or Internet of Things (IoT) devices to collect health-related data into a secure database. Once the analytics are created, the researchers can containerize and deploy their code on the cloud using pre-defined templates, and validate the models using the data collected from one or more sensing devices. The Health Guardian platform currently supports time-series, text, audio, and video inputs with 70+ analytic capabilities and is used for non-commercial scientific research. We provide an example of the Alzheimer's disease (AD) assessment microservice, which uses AI methods to extract linguistic features from audio recordings to evaluate an individual's mini-mental state and the likelihood of having AD, and to predict the onset of AD before the age of 85. Today, IBM research teams across the globe use the Health Guardian internally as a test bed for early-stage research ideas, and externally with collaborators to support and enhance AI model development and clinical study efforts.

Submitted 10 November, 2022; originally announced November 2022.

Comments: 6 pages, 3 figures, https://ieeexplore.ieee.org/document/9861047

Journal ref: IEEE International Conference on Digital Health (ICDH), 2022, pp. 40-46

arXiv:2211.04475 [pdf]

doi 10.3390/photonics9120948

Photonic Hook with Modulated Bending Angle Formed by Using Triangular Mesoscale Janus Prisms

Authors: Wei-Yu Chen, Cheng-Yang Liu, Yu-Kai Hsieh, Oleg V. Minin, Igor V. Minin

Abstract: In this study, we propose a novel design of triangular mesoscale Janus prisms for the generation of a long photonic hook. Numerical simulations based on the finite-difference time-domain method are used to examine the formation mechanism of the photonic hook. The electric intensity distributions near the micro-prisms are calculated for different refractive indices and spacings of the two triangular micro-prisms. The asymmetric vortices of the intensity distributions result in a long photonic hook with a large bending angle. The length and the bending angle of the photonic hook are efficiently modulated by changing the space between the two triangular micro-prisms. Moreover, a narrow photonic hook width beyond the diffraction limit is achieved. The triangular Janus micro-prisms have high potential for practical applications in optical tweezers, nanoparticle sorting and manipulation, and photonic circuits.

Submitted 8 November, 2022; originally announced November 2022.

Comments: 8 Figures

MSC Class: 65Z05; ACM Class: J.2

arXiv:2210.13867 [pdf, ps, other]

A Dynamical System View of Langevin-Based Non-Convex Sampling

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Andreas Krause

Abstract: Non-convex sampling is a key challenge in machine learning, central to non-convex optimization in deep learning as well as to approximate probabilistic inference. Despite its significance, theoretically there remain many important challenges: Existing guarantees (1) typically only hold for the averaged iterates rather than the more desirable last iterates, (2) lack convergence metrics that capture the scales of the variables such as Wasserstein distances, and (3) mainly apply to elementary schemes such as stochastic gradient Langevin dynamics. In this paper, we develop a new framework that lifts the above issues by harnessing several tools from the theory of dynamical systems. Our key result is that, for a large class of state-of-the-art sampling schemes, their last-iterate convergence in Wasserstein distances can be reduced to the study of their continuous-time counterparts, which are much better understood. Coupled with standard assumptions of MCMC sampling, our theory immediately yields the last-iterate Wasserstein convergence of many advanced sampling schemes such as proximal, randomized mid-point, and Runge-Kutta integrators. Beyond existing methods, our framework also motivates more efficient schemes that enjoy the same rigorous guarantees.

Submitted 13 March, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: typos corrected, references added

MSC Class: 62D05

arXiv:2210.13291 [pdf, other]

doi 10.48550/arXiv.2210.13291

NVIDIA FLARE: Federated Learning from Simulation to Real-World

Authors: Holger R. Roth, Yan Cheng, Yuhong Wen, Isaac Yang, Ziyue Xu, Yuan-Ting Hsieh, Kristopher Kersten, Ahmed Harouni, Can Zhao, Kevin Lu, Zhihong Zhang, Wenqi Li, Andriy Myronenko, Dong Yang, Sean Yang, Nicola Rieke, Abood Quraini, Chester Chen, Daguang Xu, Nic Ma, Prerna Dogra, Mona Flores, Andrew Feng

Abstract: Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package. It allows researchers to apply their data science workflows with any training library (PyTorch, TensorFlow, XGBoost, or even NumPy) in real-world FL settings. This paper introduces the key design principles of NVFlare and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.

Submitted 28 April, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted at the International Workshop on Federated Learning, NeurIPS 2022, New Orleans, USA (https://federated-learning.org/fl-neurips-2022); Revised version v2: added Key Components list, system metrics for homomorphic encryption experiment; Extended v3 for journal submission

Journal ref: IEEE Data Eng. Bull., Vol. 46, No. 1, 2023

arXiv:2210.11382 [pdf, other]

doi 10.1063/5.0011935

Collective Thomson scattering in non-equilibrium laser produced two-stream plasmas

Authors: K. Sakai, S. Isayama, N. Bolouki, M. S. Habibi, Y. L. Liu, Y. H. Hsieh, H. H. Chu, J. Wang, S. H. Chen, T. Morita, K. Tomita, R. Yamazaki, Y. Sakawa, S. Matsukiyo, Y. Kuramitsu

Abstract: We investigate collective Thomson scattering (CTS) in two-stream non-equilibrium plasmas analytically, numerically and experimentally. In laboratory astrophysics, CTS is a unique tool to obtain local plasma diagnostics. While the standard CTS theory assumes plasmas to be linear, stationary, isotropic and in equilibrium, they are often nonlinear, non-stationary, anisotropic, and non-equilibrium in high energy phenomena relevant to laboratory astrophysics. We theoretically calculate and numerically simulate the CTS spectra in two-stream plasmas as a typical example of a non-equilibrium system in space and astrophysical plasmas. The simulation results show the feasibility of diagnosing two-stream instability directly via CTS measurements. In order to confirm the non-equilibrium CTS analysis, we have been developing an experimental system with a high-repetition-rate tabletop laser for laboratory astrophysics.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 14 pages, 9 figures, 1 table

Journal ref: Phys. Plasmas 27, 103104 (2020)

arXiv:2207.07105 [pdf, ps, other]

Continuous-time Analysis for Variational Inequalities: An Overview and Desiderata

Authors: Tatjana Chavdarova, Ya-Ping Hsieh, Michael I. Jordan

Abstract: Algorithms that solve zero-sum games, multi-objective agent objectives, or, more generally, variational inequality (VI) problems are notoriously unstable on general problems. Owing to the increasing need for solving such problems in machine learning, this instability has been highlighted in recent years as a significant research challenge. In this paper, we provide an overview of recent progress in the use of continuous-time perspectives in the analysis and design of methods targeting the broad VI problem class. Our presentation draws parallels between single-objective problems and multi-objective problems, highlighting the challenges of the latter. We also formulate various desiderata for algorithms that apply to general VIs and we argue that achieving these desiderata may profit from an understanding of the associated continuous-time dynamics.

Submitted 14 July, 2022; originally announced July 2022.

arXiv:2206.06795 [pdf, other]

Riemannian stochastic approximation algorithms

Authors: Mohammad Reza Karimi, Ya-Ping Hsieh, Panayotis Mertikopoulos, Andreas Krause

Abstract: We examine a wide class of stochastic approximation algorithms for solving (stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise naturally in the study of Riemannian optimization, game theory and optimal transport, but their behavior is much less understood compared to the Euclidean case because of the lack of a global linear structure on the manifold. We overcome this difficulty by introducing a suitable Fermi coordinate frame which allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM) algorithms under study to that of an associated deterministic dynamical system. In so doing, we provide a general template of almost sure convergence results that mirrors and extends the existing theory for Euclidean Robbins-Monro schemes, despite the significant complications that arise due to the curvature and topology of the underlying manifold. We showcase the flexibility of the proposed framework by applying it to a range of retraction-based variants of the popular optimistic / extra-gradient methods for solving minimization problems and games, and we provide a unified treatment for their convergence.

Submitted 27 December, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: 33 pages, 2 figures; a one-page abstract of this paper was presented in COLT 2022

MSC Class: Primary 62L20; 37N40; secondary 90C15; 90C47; 90C48

arXiv:2206.06015 [pdf, other]

No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

Authors: Yu-Guan Hsieh, Kimon Antonakopoulos, Volkan Cevher, Panayotis Mertikopoulos

Abstract: We examine the problem of regret minimization when the learner is involved in a continuous game with other optimizing agents: in this case, if all players follow a no-regret algorithm, it is possible to achieve significantly lower regret relative to fully adversarial environments. We study this problem in the context of variationally stable games (a class of continuous games which includes all convex-concave and monotone games), and when the players only have access to noisy estimates of their individual payoff gradients. If the noise is additive, the game-theoretic and purely adversarial settings enjoy similar regret guarantees; however, if the noise is multiplicative, we show that the learners can, in fact, achieve constant regret. We achieve this faster rate via an optimistic gradient scheme with learning rate separation -- that is, the method's extrapolation and update steps are tuned to different schedules, depending on the noise profile. Subsequently, to eliminate the need for delicate hyperparameter tuning, we propose a fully adaptive method that attains nearly the same guarantees as its non-adapted counterpart, while operating without knowledge of either the game or of the noise profile.

Submitted 17 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: In Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2206.04113 [pdf, other]

Push--Pull with Device Sampling

Authors: Yu-Guan Hsieh, Yassine Laguel, Franck Iutzeler, Jérôme Malick

Abstract: We consider decentralized optimization problems in which a number of agents collaborate to minimize the average of their local functions by exchanging over an underlying communication graph. Specifically, we place ourselves in an asynchronous model where only a random portion of nodes perform computation at each iteration, while the information exchange can be conducted between all the nodes and in an asymmetric fashion. For this setting, we propose an algorithm that combines gradient tracking with a network-level variance reduction (in contrast to variance reduction within each node). This enables each node to track the average of the gradients of the objective functions. Our theoretical analysis shows that the algorithm converges linearly, when the local objective functions are strongly convex, under mild connectivity conditions on the expected mixing matrices. In particular, our result does not require the mixing matrices to be doubly stochastic. In the experiments, we investigate a broadcast mechanism that transmits information from computing nodes to their neighbors, and confirm the linear convergence of our method on both synthetic and real-world datasets.

Submitted 17 March, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: In IEEE Transactions on Automatic Control

arXiv:2206.04091 [pdf, other]

Uplifting Bandits

Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

Abstract: We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose UCB-style algorithms that estimate the uplifts of the actions over a baseline. We study multiple variants of the problem, including when the baseline and affected variables are unknown, and prove sublinear regret bounds for all of these. We also provide lower bounds that justify the necessity of our modeling assumptions. Experiments on synthetic and real-world datasets show the benefit of methods that estimate the uplifts over policies that do not use this structure.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2206.03922 [pdf, other]

A unified stochastic approximation framework for learning in games

Authors: Panayotis Mertikopoulos, Ya-Ping Hsieh, Volkan Cevher

Abstract: We develop a flexible stochastic approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite). The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, the exponential/multiplicative weights algorithm for learning in finite games, optimistic and bandit variants of the above, etc. In addition to providing an integrated view of these algorithms, our framework further allows us to obtain several new convergence results, both asymptotic and in finite time, in both continuous and finite games. Specifically, we provide a range of criteria for identifying classes of Nash equilibria and sets of action profiles that are attracting with high probability, and we also introduce the notion of coherence, a game-theoretic property that includes strict and sharp equilibria, and which leads to convergence in finite time. Importantly, our analysis applies to both oracle-based and bandit, payoff-based methods - that is, when players only observe their realized payoffs.

Submitted 3 July, 2023; v1 submitted 8 June, 2022; originally announced June 2022.

Comments: 40 pages, 5 figures, 2 tables

MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

arXiv:2202.05722 [pdf, other]

The Schrödinger Bridge between Gaussian Measures has a Closed Form

Authors: Charlotte Bunne, Ya-Ping Hsieh, Marco Cuturi, Andreas Krause

Abstract: The static optimal transport $(\mathrm{OT})$ problem between Gaussians seeks to recover an optimal map, or more generally a coupling, to morph a Gaussian into another. It has been well studied and applied to a wide variety of tasks. Here we focus on the dynamic formulation of OT, also known as the Schrödinger bridge (SB) problem, which has recently seen a surge of interest in machine learning due to its connections with diffusion-based generative models. In contrast to the static setting, much less is known about the dynamic setting, even for Gaussian distributions. In this paper, we provide closed-form expressions for SBs between Gaussian measures. In contrast to the static Gaussian OT problem, which can be simply reduced to studying convex programs, our framework for solving SBs requires significantly more involved tools such as Riemannian geometry and generator theory. Notably, we establish that the solutions of SBs between Gaussian measures are themselves Gaussian processes with explicit mean and covariance kernels, and thus are readily amenable for many downstream applications such as generative modeling or interpolation. To demonstrate the utility, we devise a new method for modeling the evolution of single-cell genomics data and report significantly improved numerical stability compared to existing SB-based approaches.

Submitted 31 March, 2023; v1 submitted 11 February, 2022; originally announced February 2022.

arXiv:2112.02538 [pdf, ps, other]

Toward Real-World Voice Disorder Classification

Authors: Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Te Wang, Shih-Hau Fang, Yu Tsao

Abstract: Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems at home are desirable for people who are inaccessible to clinical disease assessments. However, the performance of such systems may be weakened due to the constrained resources and domain mismatch between the clinical data and noisy real-world data. Methods: This study develops a compact and domain-robust voice disorder classification system to identify the utterances of health, neoplasm, and benign structural diseases. Our proposed system utilizes a feature extractor model composed of factorized convolutional neural networks and subsequently deploys domain adversarial training to reconcile the domain mismatch by extracting domain invariant features. Results: The results show that the unweighted average recall in the noisy real-world domain improved by 13% and remained at 80% in the clinic domain with only slight degradation. The domain mismatch was effectively eliminated. Moreover, the proposed system reduced the usage of both memory and computation by over 73.9%. Conclusion: By deploying factorized convolutional neural networks and domain adversarial training, domain-invariant features can be derived for voice disorder classification with limited resources. The promising results confirm that the proposed system can significantly reduce resource consumption and improve classification accuracy by considering the domain mismatch. Significance: To the best of our knowledge, this is the first study that jointly considers real-world model compression and noise-robustness issues in voice disorder classification. The proposed system is intended for application to embedded systems with limited resources.

Submitted 26 April, 2023; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: Accepted by IEEE TBME (under an IEEE Open Access publishing Agreement)

arXiv:2111.13744 [pdf, ps, other]

Yogurts Choose Consumers? Estimation of Random-Utility Models via Two-Sided Matching

Authors: Odran Bonnet, Alfred Galichon, Yu-Wei Hsieh, Keith O'Hara, Matt Shum

Abstract: The problem of demand inversion - a crucial step in the estimation of random utility discrete-choice models - is equivalent to the determination of stable outcomes in two-sided matching models. This equivalence applies to random utility models that are not necessarily additive, smooth, nor even invertible. Based on this equivalence, algorithms for the determination of stable matchings provide effective computational methods for estimating these models. For non-invertible models, the identified set of utility vectors is a lattice, and the matching algorithms recover sharp upper and lower bounds on the utilities. Our matching approach facilitates estimation of models that were previously difficult to estimate, such as the pure characteristics model. An empirical application to voting data from the 1999 European Parliament elections illustrates the good performance of our matching-based demand inversion algorithms in practice.

Submitted 26 November, 2021; originally announced November 2021.

Comments: Forthcoming, Review of Economic Studies

arXiv:2110.04795 [pdf, ps, other]

Isogeny-based Group Signatures and Accountable Ring Signatures in QROM

Authors: Kai-Min Chung, Yao-Ching Hsieh, Mi-Ying Huang, Yu-Hsuan Huang, Tanja Lange, Bo-Yin Yang

Abstract: We provide the first isogeny-based group signature (GS) and accountable ring signature (ARS) that are provably secure in the quantum random oracle model (QROM). We do so by building an intermediate primitive called openable sigma protocol and show that every such protocol gives rise to a secure ARS and GS. Additionally, the QROM security is guaranteed if the perfect unique-response property is satisfied. Our design, with the underlying protocol satisfying this essential unique-response property, is sophisticatedly crafted for QROM security. From there, with clever twists to available proving techniques, we obtain the first isogeny-based ARS and GS that are proven QROM-secure. Concurrently, an efficient construction was proposed by Beullens et al. (Eurocrypt 2022), but is only proven secure in the classical random oracle model (ROM). Our proposal seeks stronger QROM security, although it is less efficient due to the signature size quadratically scaling with the ring/group size.

Submitted 2 November, 2022; v1 submitted 10 October, 2021; originally announced October 2021.

arXiv:2109.00711 [pdf, other]

Heterogeneous relational message passing networks for molecular dynamics simulations

Authors: Zun Wang, Chong Wang, Sibo Zhao, Yong Xu, Shaogang Hao, Chang Yu Hsieh, Bing-Lin Gu, Wenhui Duan

Abstract: With many frameworks based on message passing neural networks proposed to predict molecular and bulk properties, machine learning methods have tremendously shifted the paradigms of computational sciences underpinning physics, material science, chemistry, and biology. While existing machine learning models have yielded superior performances in many occasions, most of them model and process molecular systems in terms of homogeneous graph, which severely limits the expressive power for representing diverse interactions. In practice, graph data with multiple node and edge types is ubiquitous and more appropriate for molecular systems. Thus, we propose the heterogeneous relational message passing network (HermNet), an end-to-end heterogeneous graph neural networks, to efficiently express multiple interactions in a single model with {\it ab initio} accuracy. HermNet performs impressively against many top-performing models on both molecular and extended systems. Specifically, HermNet outperforms other tested models in nearly 75\%, 83\% and 94\% of tasks on MD17, QM9 and extended systems datasets, respectively. Finally, we elucidate how the design of HermNet is compatible with quantum mechanics from the perspective of the density functional theory. Besides, HermNet is a universal framework, whose sub-networks could be replaced by other advanced models.

Submitted 2 September, 2021; originally announced September 2021.

arXiv:2108.05797 [pdf]

doi 10.1038/s41467-022-34158-z

Generating extreme electric fields in 2D materials by dual ionic gating

Authors: Benjamin I. Weintrub, Yu-Ling Hsieh, Jan N. Kirchhof, Kirill I. Bolotin

Abstract: We demonstrate a new type of dual gate transistor to induce record electric fields through two-dimensional materials (2DMs). At the heart of this device is a 2DM suspended between two volumes of ionic liquid (IL) with independently controlled potentials. The potential difference between the ILs falls across an ultrathin layer consisting of the 2DM and the electrical double layers above and below it, thereby producing an intense electric field across the 2DM. We determine the field strength via i) electrical transport measurements and ii) direct measurements of electrochemical potentials of the ILs using semiconducting 2DM, WSe2. The field strength across the material reaches more than 3.5 V/nm, the largest static electric field through any electronic device to date. We demonstrate that this field is strong enough to close the bandgap of trilayer WSe2 driving a semiconductor-to-metal transition. Our approach grants access to previously-inaccessible phenomena occurring in ultrastrong electric fields.

Submitted 5 July, 2022; v1 submitted 11 August, 2021; originally announced August 2021.

Journal ref: Nature Communications 13, 6601 (2022)

arXiv:2107.00127 [pdf, other]

SQRP: Sensing Quality-aware Robot Programming System for Non-expert Programmers

Authors: Yi-Hsuan Hsieh, Pei-Chi Huang, Aloysius K Mok

Abstract: Robot programming typically makes use of a set of mechanical skills that is acquired by machine learning. Because there is in general no guarantee that machine learning produces robot programs that are free of surprising behavior, the safe execution of a robot program must utilize monitoring modules that take sensor data as inputs in real time to ensure the correctness of the skill execution. Owing to the fact that sensors and monitoring algorithms are usually subject to physical restrictions and that effective robot programming is sensitive to the selection of skill parameters, these considerations may lead to different sensor input qualities such as the view coverage of a vision system that determines whether a skill can be successfully deployed in performing a task. Choosing improper skill parameters may cause the monitoring modules to delay or miss the detection of important events such as a mechanical failure. These failures may reduce the throughput in robotic manufacturing and could even cause a destructive system crash. To address above issues, we propose a sensing quality-aware robot programming system that automatically computes the sensing qualities as a function of the robot's environment and uses the information to guide non-expert users to select proper skill parameters in the programming phase. We demonstrate our system framework on a 6DOF robot arm for an object pick-up task.

Submitted 30 June, 2021; originally announced July 2021.

Comments: 7 pages, 9 figures, 1 table; accepted for presentation in IEEE ICRA 2021(IEEE International Conference on Robotics and Automation)

arXiv:2105.13348 [pdf, other]

Optimization in Open Networks via Dual Averaging

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In networks of autonomous agents (e.g., fleets of vehicles, scattered sensors), the problem of minimizing the sum of the agents' local functions has received a lot of interest. We tackle here this distributed optimization problem in the case of open networks when agents can join and leave the network at any time. Leveraging recent online optimization techniques, we propose and analyze the convergence of a decentralized asynchronous optimization method for open networks.

Submitted 16 October, 2021; v1 submitted 27 May, 2021; originally announced May 2021.

Comments: In 60th IEEE Conference on Decision and Control (CDC 2021); 7 pages, 1 figure

arXiv:2105.07622 [pdf, other]

Ensemble-based Transfer Learning for Low-resource Machine Translation Quality Estimation

Authors: Ting-Wei Wu, Yung-An Hsieh, Yi-Chieh Liu

Abstract: Quality Estimation (QE) of Machine Translation (MT) is a task to estimate the quality scores for given translation outputs from an unknown MT system. However, QE scores for low-resource languages are usually intractable and hard to collect. In this paper, we focus on the Sentence-Level QE Shared Task of the Fifth Conference on Machine Translation (WMT20), but in a more challenging setting. We aim to predict QE scores of given translation outputs when barely none of QE scores of that paired languages are given during training. We propose an ensemble-based predictor-estimator QE model with transfer learning to overcome such QE data scarcity challenge by leveraging QE scores from other miscellaneous languages and translation results of targeted languages. Based on the evaluation results, we provide a detailed analysis of how each of our extension affects QE models on the reliability and the generalization ability to perform transfer learning under multilingual tasks. Finally, we achieve the best performance on the ensemble model combining the models pretrained by individual languages as well as different levels of parallel trained corpus with a Pearson's correlation of 0.298, which is 2.54 times higher than baselines.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2104.12761 [pdf, other]

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

Authors: Yu-Guan Hsieh, Kimon Antonakopoulos, Panayotis Mertikopoulos

Abstract: In game-theoretic learning, several agents are simultaneously following their individual interests, so the environment is non-stationary from each player's perspective. In this context, the performance of a learning algorithm is often measured by its regret. However, no-regret algorithms are not created equal in terms of game-theoretic guarantees: depending on how they are tuned, some of them may drive the system to an equilibrium, while others could produce cyclic, chaotic, or otherwise divergent trajectories. To account for this, we propose a range of no-regret policies based on optimistic mirror descent, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve O(\sqrt{T}) regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents. Also, if employed by all players, then iv) they guarantee O(1) social regret; while v) the induced sequence of play converges to Nash equilibrium with O(1) individual regret in all variationally stable games (a class of games that includes all monotone and convex-concave zero-sum games).

Submitted 16 October, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: In the 34th Annual Conference on Learning Theory (COLT 2021); 35 pages, 2 figures

arXiv:2103.13495 [pdf, other]

Machine Learning-based Automatic Graphene Detection with Color Correction for Optical Microscope Images

Authors: Hui-Ying Siao, Siyu Qi, Zhi Ding, Chia-Yu Lin, Yu-Chiang Hsieh, Tse-Ming Chen

Abstract: Graphene serves critical application and research purposes in various fields. However, fabricating high-quality and large quantities of graphene is time-consuming and it requires heavy human resource labor costs. In this paper, we propose a Machine Learning-based Automatic Graphene Detection Method with Color Correction (MLA-GDCC), a reliable and autonomous graphene detection from microscopic images. The MLA-GDCC includes a white balance (WB) to correct the color imbalance on the images, a modified U-Net and a support vector machine (SVM) to segment the graphene flakes. Considering the color shifts of the images caused by different cameras, we apply WB correction to correct the imbalance of the color pixels. A modified U-Net model, a convolutional neural network (CNN) architecture for fast and precise image segmentation, is introduced to segment the graphene flakes from the background. In order to improve the pixel-level accuracy, we implement a SVM after the modified U-Net model to separate the monolayer and bilayer graphene flakes. The MLA-GDCC achieves flake-level detection rates of 87.09% for monolayer and 90.41% for bilayer graphene, and the pixel-level accuracy of 99.27% for monolayer and 98.92% for bilayer graphene. MLA-GDCC not only achieves high detection rates of the graphene flakes but also speeds up the latency for the graphene detection process from hours to seconds.

Submitted 24 March, 2021; originally announced March 2021.

Comments: 14 pages, 8 figures

arXiv:2012.11579 [pdf, ps, other]

Multi-Agent Online Optimization with Delays: Asynchronicity, Adaptivity, and Optimism

Authors: Yu-Guan Hsieh, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos

Abstract: In this paper, we provide a general framework for studying multi-agent online learning problems in the presence of delays and asynchronicities. Specifically, we propose and analyze a class of adaptive dual averaging schemes in which agents only need to accumulate gradient feedback received from the whole system, without requiring any between-agent coordination. In the single-agent case, the adaptivity of the proposed method allows us to extend a range of existing results to problems with potentially unbounded delays between playing an action and receiving the corresponding feedback. In the multi-agent case, the situation is significantly more complicated because agents may not have access to a global clock to use as a reference point; to overcome this, we focus on the information that is available for producing each prediction rather than the actual delay associated with each feedback. This allows us to derive adaptive learning strategies with optimal regret bounds, even in a fully decentralized, asynchronous environment. Finally, we also analyze an "optimistic" variant of the proposed algorithm which is capable of exploiting the predictability of problems with a slower variation and leads to improved regret bounds.

Submitted 16 April, 2022; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: Accepted by Journal of Machine Learning Research (JMLR)

arXiv:2007.15039 [pdf, other]

Extreme-K categorical samples problem

Authors: Elizabeth Chou, Catie McVey, Yin-Chen Hsieh, Sabrina Enriquez, Fushing Hsieh

Abstract: With histograms as its foundation, we develop Categorical Exploratory Data Analysis (CEDA) under the extreme-$K$ sample problem, and illustrate its universal applicability through four 1D categorical datasets. Given a sizable $K$, CEDA's ultimate goal amounts to discover by data's information content via carrying out two data-driven computational tasks: 1) establish a tree geometry upon $K$ populations as a platform for discovering a wide spectrum of patterns among populations; 2) evaluate each geometric pattern's reliability. In CEDA developments, each population gives rise to a row vector of categories proportions. Upon the data matrix's row-axis, we discuss the pros and cons of Euclidean distance against its weighted version for building a binary clustering tree geometry. The criterion of choice rests on degrees of uniformness in column-blocks framed by this binary clustering tree. Each tree-leaf (population) is then encoded with a binary code sequence, so is tree-based pattern. For evaluating reliability, we adopt row-wise multinomial randomness to generate an ensemble of matrix mimicries, so an ensemble of mimicked binary trees. Reliability of any observed pattern is its recurrence rate within the tree ensemble. A high reliability value means a deterministic pattern. Our four applications of CEDA illuminate four significant aspects of extreme-$K$ sample problems.

Submitted 29 July, 2020; originally announced July 2020.

Comments: 20 pages, 12 figures
