Search | arXiv e-print repository

arXiv:2407.20223 [pdf, other]

Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning

Authors: Ray Zhang, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Ryan Eustice, Maani Ghaffari, Arnie Sen

Abstract: This paper introduces a robust unsupervised SE(3) point cloud registration method that operates without requiring point correspondences. The method frames point clouds as functions in a reproducing kernel Hilbert space (RKHS), leveraging SE(3)-equivariant features for direct feature space registration. A novel RKHS distance metric is proposed, offering reliable performance amidst noise, outliers, and asymmetrical data. An unsupervised training approach is introduced to effectively handle limited ground truth data, facilitating adaptation to real datasets. The proposed method outperforms classical and supervised methods in terms of registration accuracy on both synthetic (ModelNet40) and real-world (ETH3D) noisy, outlier-rich datasets. To the best of our knowledge, this marks the first instance of successful real RGB-D odometry data registration using an equivariant method. The code is available at https://sites.google.com/view/eccv24-equivalign

Submitted 29 July, 2024; originally announced July 2024.

Comments: 10 pages, to be published in ECCV 2024

arXiv:2407.17457 [pdf, other]

CSCPR: Cross-Source-Context Indoor RGB-D Place Recognition

Authors: Jing Liang, Zhuo Deng, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Arnie Sen, Dinesh Manocha

Abstract: We present a new algorithm, Cross-Source-Context Place Recognition (CSCPR), for RGB-D indoor place recognition that integrates global retrieval and reranking into a single end-to-end model. Unlike prior approaches that primarily focus on the RGB domain, CSCPR is designed to handle RGB-D data. We extend the Context-of-Clusters (CoCs) for handling noisy colorized point clouds and introduce two novel modules for reranking: the Self-Context Cluster (SCC) and Cross Source Context Cluster (CSCC), which enhance feature representation and match query-database pairs based on local features, respectively. We also present two new datasets, ScanNetIPR and ARKitIPR. Our experiments demonstrate that CSCPR significantly outperforms state-of-the-art models, by at least 36.5% in Recall@1 on the ScanNet-PR dataset and 44% on the new datasets. Code and datasets will be released.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.12939 [pdf, other]

GenRC: Generative 3D Room Completion from Sparse Image Collections

Authors: Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen, Chi Liu, Yu-Lun Liu, Albert Y. C. Chen, Cheng-Hao Kuo, Min Sun

Abstract: Sparse RGBD scene completion is a challenging task, especially when considering consistent textures and geometries throughout the entire scene. Different from existing solutions that rely on human-designed text prompts or predefined camera trajectories, we propose GenRC, an automated training-free pipeline to complete a room-scale 3D mesh with high-fidelity textures. To achieve this, we first project the sparse RGBD images to a highly incomplete 3D mesh. Instead of iteratively generating novel views to fill in the void, we use our proposed E-Diffusion to generate a view-consistent panoramic RGBD image, which ensures global geometry and appearance consistency. Furthermore, we maintain the input-output scene stylistic consistency through textual inversion to replace human-designed text prompts. To bridge the domain gap among datasets, E-Diffusion leverages models trained on large-scale datasets to generate diverse appearances. GenRC outperforms state-of-the-art methods under most appearance and geometric metrics on the ScanNet and ARKitScenes datasets, even though GenRC is neither trained on these datasets nor given predefined camera trajectories. Project page: https://minfenli.github.io/GenRC

Submitted 18 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.12342 [pdf, other]

Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

Authors: Jintang Xue, Yun-Cheng Wang, Chengwei Wei, C. -C. Jay Kuo

Abstract: As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors are resource-demanding, especially for mobile edge-device applications. This paper explores word embedding dimension reduction. To balance computational costs and performance, we propose an efficient and effective weakly-supervised feature selection method, named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments conducted on various tasks (e.g., word and sentence similarity and binary and multi-class classification) indicate that the proposed WordFS model outperforms other dimension reduction methods at lower computational costs.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.07666 [pdf]

A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability

Authors: Ting Fang Tan, Kabilan Elangovan, Jasmine Ong, Nigam Shah, Joseph Sung, Tien Yin Wong, Lan Xue, Nan Liu, Haibo Wang, Chang Fu Kuo, Simon Chesterman, Zee Kin Yeong, Daniel SW Ting

Abstract: A comprehensive qualitative evaluation framework for large language models (LLMs) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.05590 [pdf, other]

GSBIQA: Green Saliency-guided Blind Image Quality Assessment Method

Authors: Zhanxuan Mei, Yun-Cheng Wang, C. -C. Jay Kuo

Abstract: Blind Image Quality Assessment (BIQA) is an essential task that estimates the perceptual quality of images without reference. While many BIQA methods employ deep neural networks (DNNs) and incorporate saliency detectors to enhance performance, their large model sizes limit deployment on resource-constrained devices. To address this challenge, we introduce a novel and non-deep-learning BIQA method with a lightweight saliency detection module, called Green Saliency-guided Blind Image Quality Assessment (GSBIQA). It is characterized by its minimal model size, reduced computational demands, and robust performance. Experimental results show that the performance of GSBIQA is comparable with state-of-the-art DL-based methods with significantly lower resource requirements.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.19263 [pdf, other]

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid accessible screen reading tools, in great need of new models driven by advancements in Multimodal Large Language Models (MLLMs). In this paper, we propose a Tree-of-Lens (ToL) agent, utilizing a novel ToL grounding mechanism, to address the SPR task. Based on the input point coordinate and the corresponding GUI screenshot, our ToL agent constructs a Hierarchical Layout Tree. Based on the tree, our ToL agent not only comprehends the content of the indicated area but also articulates the layout and spatial relationships between elements. Such layout information is crucial for accurately interpreting information on the screen, distinguishing our ToL agent from other screen reading tools. We also thoroughly evaluate the ToL agent against other baselines on a newly proposed SPR benchmark, which includes GUIs from mobile, web, and operating systems. Last but not least, we test the ToL agent on mobile GUI navigation tasks, demonstrating its utility in identifying incorrect actions along the path of agent execution trajectories. Code and data: screen-point-and-read.github.io

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.16222 [pdf, other]

The microlocal Riemann-Hilbert correspondence for complex contact manifolds

Authors: Laurent Côté, Christopher Kuo, David Nadler, Vivek Shende

Abstract: Kashiwara showed in 1996 that the categories of microlocalized D-modules can be canonically glued to give a sheaf of categories over a complex contact manifold. Much more recently, and by rather different considerations, we constructed a canonical notion of perverse microsheaves on the same class of spaces. Here we provide a Riemann-Hilbert correspondence.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 47 pages

arXiv:2406.12585 [pdf, other]

Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling

Authors: Yao-Ching Yu, Chun-Chih Kuo, Ziqi Ye, Yu-Cheng Chang, Yueh-Se Li

Abstract: Ensembling multiple models has always been an effective approach to push the limits of existing performance and is widely used in classification tasks by simply averaging the classification probability vectors from multiple classifiers to achieve better accuracy. However, in the thriving open-source Large Language Model (LLM) community, ensembling methods are rare and typically limited to ensembling the full-text outputs of LLMs, such as selecting the best output using a ranker, which leads to underutilization of token-level probability information. In this paper, we treat the Generation of each token by LLMs as a Classification (GaC) for ensembling. This approach fully exploits the probability information at each generation step and better prevents LLMs from producing early incorrect tokens that lead to snowballing errors. In experiments, we ensemble state-of-the-art LLMs on several benchmarks, including exams, mathematics and reasoning, and observe that our method breaks the existing community performance ceiling. Furthermore, we observed that most of the tokens in the answer are simple and do not affect the correctness of the final answer. Therefore, we also experimented with ensembling only key tokens, and the results showed better performance with lower latency across benchmarks.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.11937 [pdf, other]

Using graph neural networks to reconstruct charged pion showers in the CMS High Granularity Calorimeter

Authors: M. Aamir, B. Acar, G. Adamov, T. Adams, C. Adloff, S. Afanasiev, C. Agrawal, C. Agrawal, A. Ahmad, H. A. Ahmed, S. Akbar, N. Akchurin, B. Akgul, B. Akgun, R. O. Akpinar, E. Aktas, A. AlKadhim, V. Alexakhin, J. Alimena, J. Alison, A. Alpana, W. Alshehri, P. Alvarez Dominguez, M. Alyari, C. Amendola, et al. (550 additional authors not shown)

Abstract: A novel method to reconstruct the energy of hadronic showers in the CMS High Granularity Calorimeter (HGCAL) is presented. The HGCAL is a sampling calorimeter with very fine transverse and longitudinal granularity. The active media are silicon sensors and scintillator tiles read out by SiPMs, and the absorbers are a combination of lead and Cu/CuW in the electromagnetic section, and steel in the hadronic section. The shower reconstruction method is based on graph neural networks and makes use of a dynamic reduction network architecture. It is shown that the algorithm is able to capture and mitigate the main effects that normally hinder the reconstruction of hadronic showers using classical reconstruction methods, by compensating for fluctuations in the multiplicity, energy, and spatial distributions of the shower's constituents. The performance of the algorithm is evaluated using test beam data collected in 2018 with a prototype of the CMS HGCAL accompanied by a section of the CALICE AHCAL prototype. The capability of the method to mitigate the impact of energy leakage from the calorimeter is also demonstrated.

Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Prepared for submission to JINST

arXiv:2406.11309 [pdf, other]

BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models

Authors: Xuefeng Hu, Ke Zhang, Min Sun, Albert Chen, Cheng-Hao Kuo, Ram Nevatia

Abstract: Large-scale pretrained vision-language models like CLIP have demonstrated remarkable zero-shot image classification capabilities across diverse domains. To enhance CLIP's performance while preserving the zero-shot paradigm, various test-time prompt tuning methods have been introduced to refine class embeddings through unsupervised learning objectives during inference. However, these methods often encounter challenges in selecting appropriate learning rates to prevent collapsed training in the absence of validation data during test-time adaptation. In this study, we propose a novel backpropagation-free algorithm BaFTA for test-time adaptation of vision-language models. Instead of fine-tuning text prompts to refine class embeddings, our approach directly estimates class centroids using online clustering within a projected embedding space that aligns text and visual embeddings. We dynamically aggregate predictions from both estimated and original class embeddings, as well as from distinct augmented views, by assessing the reliability of each prediction using Rényi Entropy. Through extensive experiments, we demonstrate that BaFTA consistently outperforms state-of-the-art test-time adaptation methods in both effectiveness and efficiency.

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Preprint updated from our earlier manuscript submitted to ICLR 2024 (https://openreview.net/forum?id=KNtcoAM5Gy)

arXiv:2406.10484 [pdf, other]

Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model

Authors: Lu Xu, Sijie Zhu, Chunyuan Li, Chia-Wen Kuo, Fan Chen, Xinyao Wang, Guang Chen, Dawei Du, Ye Yuan, Longyin Wen

Abstract: The emerging video LMMs (Large Multimodal Models) have achieved significant improvements on generic video understanding in the form of VQA (Visual Question Answering), where the raw videos are captured by cameras. However, a large portion of videos in real-world applications are edited videos, e.g., users usually cut and add effects/modifications to the raw video before publishing it on social media platforms. The edited videos usually have high view counts but they are not covered in existing benchmarks of video LMMs, i.e., ActivityNet-QA, or VideoChatGPT benchmark. In this paper, we leverage the edited videos on a popular short video platform, i.e., TikTok, and build a video VQA benchmark (named EditVid-QA) covering four typical editing categories, i.e., effect, funny, meme, and game. Funny and meme videos benchmark nuanced understanding and high-level reasoning, while effect and game evaluate the understanding capability of artificial design. Most of the open-source video LMMs perform poorly on the EditVid-QA benchmark, indicating a huge domain gap between edited short videos on social media and regular raw videos. To improve the generalization ability of LMMs, we collect a training set for the proposed benchmark based on both Panda-70M/WebVid raw videos and small-scale TikTok/CapCut edited videos, which boosts the performance on the proposed EditVid-QA benchmark, indicating the effectiveness of high-quality training data.
We also identified a serious issue in the existing evaluation protocol using the GPT-3.5 judge, namely a "sorry" attack, where a sorry-style naive answer can achieve an extremely high rating from the GPT judge, e.g., over 4.3 for correctness score on VideoChatGPT evaluation protocol. To avoid the "sorry" attacks, we evaluate results with GPT-4 judge and keyword filtering. The datasets will be released for academic purposes only.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.00857 [pdf, other]

Modeling the refractive index profile n(z) of polar ice for ultra-high energy neutrino experiments

Authors: S. Ali, P. Allison, S. Archambault, J. J. Beatty, D. Z. Besson, A. Bishop, P. Chen, Y. C. Chen, B. A. Clark, W. Clay, A. Connolly, K. Couberly, L. Cremonesi, A. Cummings, P. Dasgupta, R. Debolt, S. de Kockere, K. D. de Vries, C. Deaconu, M. A. DuVernois, J. Flaherty, E. Friedman, R. Gaior, P. Giri, J. Hanson, et al. (45 additional authors not shown)

Abstract: We develop an in-situ index of refraction profile using the transit time of radio signals broadcast from an englacial transmitter to 2-5 km distant radio-frequency receivers, deployed at depths up to 200 m. Maxwell's equations generally admit two ray propagation solutions from a given transmitter, corresponding to a direct path (D) and a refracted path (R); the measured D vs. R (dt(D,R)) timing differences provide constraints on the index of refraction profile near South Pole, where the Askaryan Radio Array (ARA) neutrino observatory is located. We constrain the refractive index profile by simulating D and R ray paths via ray tracing and comparing those to measured dt(D,R) signals. Using previous ice density data as a proxy for n(z), we demonstrate that our data strongly favors a glaciologically-motivated three-phase densification model rather than a single exponential scale height model. Simulations show that the single exponential model overestimates ARA neutrino sensitivity compared to the three-phase model.

Submitted 11 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20292 [pdf, other]

All-Loop Geometry for Four-Point Correlation Function

Authors: Song He, Yu-tin Huang, Chia-Kai Kuo

Abstract: In this letter, we consider a positive geometry conjectured to encode the loop integrand of four-point stress-energy correlators in planar $\mathcal{N}=4$ super Yang-Mills. Beginning with four lines in twistor space, we characterize a positive subspace to which an $\ell$-loop geometry is attached. The loop geometry then consists of $\ell$ lines in twistor space satisfying positivity conditions among themselves and with respect to the base. Consequently, the loop geometry can be viewed as a fibration over a tree geometry. The fibration naturally dissects the base into chambers, in which the degree-$4 \ell$ loop form is unique and distinct for each chamber. Interestingly, up to three loops, the chambers are simply organized by the six orderings of $x^2_{12}x^2_{34}$, $x^2_{14}x^2_{23}$ and $x^2_{13}x^2_{24}$. We explicitly verify our conjecture by computing the loop-forms in terms of a basis of planar conformal integrals up to $\ell=3$, which indeed yield correct loop integrands for the four-point correlator.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 7 pages + 2 figures

arXiv:2405.19595 [pdf]

The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset

Authors: Jeffrey D. Rudie, Hui-Ming Lin, Robyn L. Ball, Sabeena Jalal, Luciano M. Prevedello, Savvas Nicolaou, Brett S. Marinelli, Adam E. Flanders, Kirti Magudia, George Shih, Melissa A. Davis, John Mongan, Peter D. Chang, Ferco H. Berger, Sebastiaan Hermans, Meng Law, Tyler Richards, Jan-Peter Grunz, Andreas Steven Kunz, Shobhit Mathur, Sandro Galea-Soler, Andrew D. Chung, Saif Afat, Chin-Chi Kuo, Layal Aweidah, et al. (15 additional authors not shown)

Abstract: The RSNA Abdominal Traumatic Injury CT (RATIC) dataset is the largest publicly available collection of adult abdominal CT studies annotated for traumatic injuries. This dataset includes 4,274 studies from 23 institutions across 14 countries. The dataset is freely available for non-commercial use via Kaggle at https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection. Created for the RSNA 2023 Abdominal Trauma Detection competition, the dataset encourages the development of advanced machine learning models for detecting abdominal injuries on CT scans. The dataset encompasses detection and classification of traumatic injuries across multiple organs, including the liver, spleen, kidneys, bowel, and mesentery. Annotations were created by expert radiologists from the American Society of Emergency Radiology (ASER) and Society of Abdominal Radiology (SAR). The dataset is annotated at multiple levels, including the presence of injuries in three solid organs with injury grading, image-level annotations for active extravasations and bowel injury, and voxelwise segmentations of each of the potentially injured organs. With the release of this dataset, we hope to facilitate research and development in machine learning and abdominal trauma that can lead to improved patient care and outcomes.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 40 pages, 2 figures, 3 tables

arXiv:2405.19469 [pdf, other]

Constraining Inflation with the BICEP/Keck CMB Polarization Experiments

Authors: The BICEP/Keck Collaboration: P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, H. Boenish, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, B. Elwood, S. Fatigoni, J. P. Filippini, M. Gao, et al. (63 additional authors not shown)

Abstract: The BICEP/$\textit{Keck}$ (BK) series of cosmic microwave background (CMB) polarization experiments has, over the past decade and a half, produced a series of field-leading constraints on cosmic inflation via measurements of the "B-mode" polarization of the CMB. Primordial B modes are directly tied to the amplitude of primordial gravitational waves (PGW), their strength parameterized by the tensor-to-scalar ratio, $r$, and thus the energy scale of inflation. Having set the most sensitive constraints to date on $r$, $σ(r)=0.009$ ($r_{0.05}<0.036, 95\%$ C.L.) using data through the 2018 observing season ("BK18"), the BICEP/$\textit{Keck}$ program has continued to improve its dataset in the years since. We give a brief overview of the BK program and the "BK18" result before discussing the program's ongoing efforts, including the deployment and performance of the $\textit{Keck Array}$'s successor instrument, BICEP Array, improvements to data processing and internal consistency testing, new techniques such as delensing, and how those will ultimately serve to allow BK to reach $σ(r) \lesssim 0.003$ using data through the 2027 observing season.

Submitted 11 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures. Contribution to the 2024 Cosmology session of the 58th Rencontres de Moriond

arXiv:2405.16144 [pdf, other]

GreenCOD: A Green Camouflaged Object Detection Method

Authors: Hong-Shuo Chen, Yao Zhu, Suya You, Azad M. Madni, C. -C. Jay Kuo

Abstract: We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through backpropagation-based fine-tuning. However, such methods are typically computationally demanding and exhibit only marginal performance variations across different models. This raises the question of whether effective training can be achieved without backpropagation. Addressing this, our work proposes a new paradigm that utilizes gradient boosting for COD. This approach significantly simplifies the model design, resulting in a system that requires fewer parameters and operations and maintains high performance compared to state-of-the-art deep learning models. Remarkably, our models are trained without backpropagation and achieve the best performance with fewer than 20G Multiply-Accumulate Operations (MACs). This new, more efficient paradigm opens avenues for further exploration in green, backpropagation-free model training.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.15211 [pdf, other]

Duality, Künneth formulae, and integral transforms in microlocal geometry

Authors: Christopher Kuo, Wenyuan Li

Abstract: We study the dualizability of sheaves on manifolds with isotropic singular supports $\operatorname{Sh}_Λ(M)$ and microsheaves with isotropic supports $\operatorname{μsh}_Λ(Λ)$ and obtain a classification result of colimit-preserving functors by convolutions of sheaf kernels. Moreover, for sheaves with isotropic singular supports and compact supports $\operatorname{Sh}_Λ^b(M)_0$, the standard categorical duality and Verdier duality are related by the wrap-once functor, which is the inverse Serre functor in proper objects, and we thus show that the Verdier duality extends naturally to all compact objects $\operatorname{Sh}_Λ^c(M)_0$ when the wrap-once functor is an equivalence, for instance, when $Λ$ is a full Legendrian stop or a swappable Legendrian stop.

Submitted 27 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 29 pages, 3 figures. This article extends certain results from arXiv:2210.06643, which no longer includes them since version 4. Notably, the Künneth formula, duality, and the Fourier-Mukai property are now established for microsheaves, as summarized in Theorem 1.4. A mistake regarding what is known about Theorem 1.8 from the Floer literature is also fixed

MSC Class: 53Dxx (Primary) 55N30 (Secondary)

arXiv:2405.07197 [pdf, other]

Qsyn: A Developer-Friendly Quantum Circuit Synthesis Framework for NISQ Era and Beyond

Authors: Mu-Te Lau, Chin-Yi Cheng, Cheng-Hua Lu, Chia-Hsu Chuang, Yi-Hsiang Kuo, Hsiang-Chun Yang, Chien-Tung Kuo, Hsin-Yu Chen, Chen-Ying Tung, Cheng-En Tsai, Guan-Hao Chen, Leng-Kai Lin, Ching-Huan Wang, Tzu-Hsu Wang, Chung-Yang Ric Huang

Abstract: In this paper, we introduce a new quantum circuit synthesis (QCS) framework, Qsyn, for developers to research, develop, test, experiment, and then contribute their QCS algorithms and tools to the framework. Our framework is more developer-friendly than other modern QCS frameworks in three aspects: (1) We design a rich command-line interface so that developers can easily design various testing scenarios and flexibly conduct experiments on their algorithms. (2) We offer detailed access to many data representations on different abstract levels of quantum circuits so that developers can optimize their algorithms to the extreme. (3) We define a rigid developing flow and environment so that developers can ensure their development qualities with the best modern software engineering practices. We illustrate the friendliness of our framework with a showcase of developing a T-Count Optimization algorithm and demonstrate our performance superiority with fair comparisons to other modern QCS frameworks.

Submitted 12 May, 2024; originally announced May 2024.

arXiv:2405.05949 [pdf, other]

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Authors: Jiachen Li, Xinyao Wang, Sijie Zhu, Chia-Wen Kuo, Lu Xu, Fan Chen, Jitesh Jain, Humphrey Shi, Longyin Wen

Abstract: Recent advancements in Multimodal Large Language Models (LLMs) have focused primarily on scaling by increasing text-image pair data and enhancing LLMs to improve performance on multimodal tasks. However, these scaling approaches are computationally expensive and overlook the significance of improving model capabilities from the vision side. Inspired by the successful applications of Mixture-of-Experts (MoE) in LLMs, which improves model scalability during training while keeping inference costs similar to those of smaller models, we propose CuMo. CuMo incorporates Co-upcycled Top-K sparsely-gated Mixture-of-experts blocks into both the vision encoder and the MLP connector, thereby enhancing the multimodal LLMs with minimal additional activated parameters during inference. CuMo first pre-trains the MLP blocks and then initializes each expert in the MoE block from the pre-trained MLP block during the visual instruction tuning stage. Auxiliary losses are used to ensure a balanced loading of experts. CuMo outperforms state-of-the-art multimodal LLMs across various VQA and visual-instruction-following benchmarks using models within each model size group, all while training exclusively on open-sourced datasets. The code and model weights for CuMo are open-sourced at https://github.com/SHI-Labs/CuMo.

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03499 [pdf, ps, other]

Physical properties and electronic structure of the two-gap superconductor V$_{2}$Ga$_{5}$

Authors: P. -Y. Cheng, Mohamed Oudah, T. -L. Hung, C. -E. Hsu, C. -C. Chang, J. -Y. Haung, T. -C. Liu, C. -M. Cheng, M. -N. Ou, W. -T. Chen, L. Z. Deng, C. -C. Lee, Y. -Y. Chen, C. -N. Kuo, C. -S. Lue, Janna Machts, Kenji M. Kojima, Alannah M. Hallas, C. -L. Huang

Abstract: We present a thorough investigation of the physical properties and superconductivity of the binary intermetallic V2Ga5. Electrical resistivity and specific heat measurements show that V2Ga5 enters its superconducting state below Tsc = 3.5 K, with a critical field of Hc2,perp c(Hc2,para c) = 6.5(4.1) kOe. With H perp c, the peak effect was observed in resistivity measurements, indicating the ultrahigh quality of the single crystal studied. The resistivity measurements under high pressure reveal that the Tsc is suppressed linearly with pressure and reaches absolute zero around 20 GPa. Specific heat and muon spin relaxation measurements both indicate that the two-gap s-wave model best describes the superconductivity of V2Ga5. The spectra obtained from angle-resolved photoemission spectroscopy measurements suggest that two superconducting gaps open at the Fermi surface around the Z and Γ points. These results are verified by first-principles band structure calculations. We therefore conclude that V2Ga5 is a phonon-mediated two-gap s-wave superconductor.

Submitted 6 May, 2024; originally announced May 2024.

Comments: Some images experience distortion during the conversion process to EPS format

arXiv:2404.09993 [pdf, other]

No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

Authors: Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang

Abstract: Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360° room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is designed to capture specific contextual information for each layout type. With our novel feature guidance module, the image feature retrieves relevant context from these embeddings, generating layout-aware features for precise bi-layout predictions. A unique property of our Bi-Layout model is its ability to inherently detect ambiguous regions by comparing the two predictions. To circumvent the need for manual correction of ambiguous annotations during testing, we also introduce a new metric for disambiguating ground truth layouts. Our method demonstrates superior performance on benchmark datasets, notably outperforming leading approaches. Specifically, on the MatterportLayout dataset, it improves 3DIoU from 81.70% to 82.57% across the full test set and notably from 54.80% to 59.97% in subsets with significant ambiguity. Project page: https://liagm.github.io/Bi_Layout/

Submitted 15 April, 2024; originally announced April 2024.

Comments: CVPR 2024, Project page: https://liagm.github.io/Bi_Layout/

arXiv:2404.06627 [pdf, other]

A Beehive Haloscope for High-mass Axion Dark Matter

Authors: Matthew O. Withers, Chao-Lin Kuo

Abstract: We propose a new haloscope geometry that can arbitrarily increase the resonator volume for a given target axion mass. This geometry consists of closely packed, overlapping coaxial cavities operating as a single resonator. While the resonant frequency is still determined by the dimensions of the individual "cells," the strong interactions between the cells encourage the entire "beehive" to oscillate in phase, a phenomenon expected of tightly coupled harmonic oscillators. This synchronization behavior allows the construction of a singly connected large-volume resonator at high frequency by simply increasing the number of the cells. Using direct numerical simulations, we verify the existence of a global eigenmode that has a high (40%) form factor in a 169-element beehive resonator. The resonant frequency of the eigenmode is tunable by moving the center rods laterally in unison. The form factor is very tolerant to dimensional deviations and misalignment, as a result of mode hybridization due to strong coupling. The beehive haloscope inherits many appealing properties from the conventional coaxial cavity: a high quality factor, compatibility with a solenoid magnet, ease of fabrication, tuning, and coupling. We argue that this geometry is an excellent candidate for high-mass axion searches covering the post-inflationary parameter space (>5 GHz).

Submitted 9 April, 2024; originally announced April 2024.

Comments: 21 pages, 26 figures

arXiv:2404.02885 [pdf, other]

PoCo: Point Context Cluster for RGBD Indoor Place Recognition

Authors: Jing Liang, Zhuo Deng, Zheming Zhou, Omid Ghasemalizadeh, Dinesh Manocha, Min Sun, Cheng-Hao Kuo, Arnie Sen

Abstract: We present a novel end-to-end algorithm (PoCo) for the indoor RGB-D place recognition task, aimed at identifying the most likely match for a given query frame within a reference database. The task presents inherent challenges attributed to the constrained field of view and limited range of perception sensors. We propose a new network architecture, which generalizes the recent Context of Clusters (CoCs) to extract global descriptors directly from the noisy point clouds through end-to-end learning. Moreover, we develop the architecture by integrating both color and geometric modalities into the point features to enhance the global descriptor representation. We conducted evaluations on public datasets ScanNet-PR and ARKit with 807 and 5047 scenarios, respectively. PoCo achieves SOTA performance: on ScanNet-PR, we achieve R@1 of 64.63%, a 5.7% improvement from the best-published result CGis (61.12%); on Arkit, we achieve R@1 of 45.12%, a 13.3% improvement from the best-published result CGis (39.82%). In addition, PoCo shows higher efficiency than CGis in inference time (1.75X-faster), and we demonstrate the effectiveness of PoCo in recognizing places within a real-world laboratory environment.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.02153 [pdf, other]

Mass calibration of DES Year-3 clusters via SPT-3G CMB cluster lensing

Authors: B. Ansarinejad, S. Raghunathan, T. M. C. Abbott, P. A. R. Ade, M. Aguena, O. Alves, A. J. Anderson, F. Andrade-Oliveira, M. Archipley, L. Balkenhol, K. Benabed, A. N. Bender, B. A. Benson, E. Bertin, F. Bianchini, L. E. Bleem, S. Bocquet, F. R. Bouchet, D. Brooks, L. Bryant, D. L. Burke, E. Camphuis, J. E. Carlstrom, A. Carnero Rosell, J. Carretero , et al. (120 additional authors not shown)

Abstract: We measure the stacked lensing signal in the direction of galaxy clusters in the Dark Energy Survey Year 3 (DES Y3) redMaPPer sample, using cosmic microwave background (CMB) temperature data from SPT-3G, the third-generation CMB camera on the South Pole Telescope (SPT). We estimate the lensing signal using temperature maps constructed from the initial 2 years of data from the SPT-3G 'Main' survey, covering 1500 deg$^2$ of the Southern sky. We then use this signal as a proxy for the mean cluster mass of the DES sample. In this work, we employ three versions of the redMaPPer catalogue: a Flux-Limited sample containing 8865 clusters, a Volume-Limited sample with 5391 clusters, and a Volume&Redshift-Limited sample with 4450 clusters. For the three samples, we find the mean cluster masses to be ${M}_{200{\rm{m}}}=1.66\pm0.13$ [stat.]$\pm0.03$ [sys.], $1.97\pm0.18$ [stat.]$\pm0.05$ [sys.], and $2.11\pm0.20$ [stat.]$\pm0.05$ [sys.]$\times{10}^{14}\ {\rm{M}}_{\odot }$, respectively. This is a factor of $\sim2$ improvement relative to the precision of measurements with previous generations of SPT surveys and the most constraining cluster mass measurements using CMB cluster lensing to date. Overall, we find no significant tensions between our results and masses given by redMaPPer mass-richness scaling relations of previous works, which were calibrated using CMB cluster lensing, optical weak lensing, and velocity dispersion measurements from various combinations of DES, SDSS and Planck data.
We then divide our sample into 3 redshift and 3 richness bins, finding no significant tensions with optical weak-lensing calibrated masses in these bins. We forecast a $5.7\%$ constraint on the mean cluster mass of the DES Y3 sample with the complete SPT-3G surveys when using both temperature and polarization data and including an additional $\sim1400$ deg$^2$ of observations from the 'Extended' SPT-3G survey.

Submitted 12 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 23 pages, 9 figures, accepted for publication in JCAP. Minor changes and corrections have been made relative to v1

arXiv:2404.00253 [pdf, other]

GreenSaliency: A Lightweight and Efficient Image Saliency Detection Method

Authors: Zhanxuan Mei, Yun-Cheng Wang, C. -C. Jay Kuo

Abstract: Image saliency detection is crucial in understanding human gaze patterns from visual stimuli. The escalating demand for research in image saliency detection is driven by the growing necessity to incorporate such techniques into various computer vision tasks and to understand human visual systems. Many existing image saliency detection methods rely on deep neural networks (DNNs) to achieve good performance. However, the high computational complexity associated with these approaches impedes their integration with other modules or deployment on resource-constrained platforms, such as mobile devices. To address this need, we propose a novel image saliency detection method named GreenSaliency, which has a small model size, minimal carbon footprint, and low computational complexity. GreenSaliency can be a competitive alternative to the existing deep-learning-based (DL-based) image saliency detection methods with limited computation resources. GreenSaliency comprises two primary steps: 1) multi-layer hybrid feature extraction and 2) multi-path saliency prediction. Experimental results demonstrate that GreenSaliency achieves comparable performance to the state-of-the-art DL-based methods while possessing a considerably smaller model size and significantly reduced computational complexity.

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2404.00095 [pdf, other]

GDA: Generalized Diffusion for Robust Test-time Adaptation

Authors: Yun-Yun Tsai, Fu-Chen Chen, Albert Y. C. Chen, Junfeng Yang, Che-Chun Su, Min Sun, Cheng-Hao Kuo

Abstract: Machine learning models struggle with generalization when encountering out-of-distribution (OOD) samples with unexpected distribution shifts. For vision tasks, recent studies have shown that test-time adaptation employing diffusion models can achieve state-of-the-art accuracy improvements on OOD samples by generating new samples that align with the model's domain without the need to modify the model's weights. Unfortunately, those studies have primarily focused on pixel-level corruptions, thereby lacking the generalization to adapt to a broader range of OOD types. We introduce Generalized Diffusion Adaptation (GDA), a novel diffusion-based test-time adaptation method robust against diverse OOD types. Specifically, GDA iteratively guides the diffusion by applying a marginal entropy loss derived from the model, in conjunction with style and content preservation losses during the reverse sampling process. In other words, GDA considers the model's output behavior with the semantic information of the samples as a whole, which can reduce ambiguity in downstream tasks during the generation process. Evaluation across various popular model architectures and OOD benchmarks shows that GDA consistently outperforms prior work on diffusion-driven adaptation. Notably, it achieves the highest classification accuracy improvements, ranging from 4.4\% to 5.02\% on ImageNet-C and 2.5\% to 7.4\% on Rendition, Sketch, and Stylized benchmarks. This performance highlights GDA's generalization to a broader range of OOD benchmarks.

Submitted 2 April, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.17925 [pdf, other]

Testing the $\mathbfΛ$CDM Cosmological Model with Forthcoming Measurements of the Cosmic Microwave Background with SPT-3G

Authors: K. Prabhu, S. Raghunathan, M. Millea, G. Lynch, P. A. R. Ade, E. Anderes, A. J. Anderson, B. Ansarinejad, M. Archipley, L. Balkenhol, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, F. R. Bouchet, L. Bryant, E. Camphuis, J. E. Carlstrom, T. W. Cecil, C. L. Chang, P. Chaubal, P. M. Chichura, T. -L. Chou, A. Coerver , et al. (76 additional authors not shown)

Abstract: We forecast constraints on cosmological parameters enabled by three surveys conducted with SPT-3G, the third-generation camera on the South Pole Telescope. The surveys cover separate regions of 1500, 2650, and 6000 ${\rm deg}^{2}$ to different depths, in total observing 25% of the sky. These regions will be measured to white noise levels of roughly 2.5, 9, and 12 $μ{\rm K-arcmin}$, respectively, in CMB temperature units at 150 GHz by the end of 2024. The survey also includes measurements at 95 and 220 GHz, which have noise levels a factor of ~1.2 and 3.5 times higher than 150 GHz, respectively, with each band having a polarization noise level ~$\sqrt{\text{2}}$ times higher than the temperature noise. We use a novel approach to obtain the covariance matrices for jointly and optimally estimated gravitational lensing potential bandpowers and unlensed CMB temperature and polarization bandpowers. We demonstrate the ability to test the $Λ{\rm CDM}$ model via the consistency of cosmological parameters constrained independently from SPT-3G and Planck data, and consider the improvement in constraints on $Λ{\rm CDM}$ extension parameters from a joint analysis of SPT-3G and Planck data. The $Λ{\rm CDM}$ cosmological parameters are typically constrained with uncertainties up to ~2 times smaller with SPT-3G data, compared to Planck, with the two data sets measuring significantly different angular scales and polarization levels, providing additional tests of the standard cosmological model.

Submitted 5 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 26 pages; 13 figures; Accepted for publication in ApJ; Minor edits have been made

arXiv:2403.15971 [pdf, other]

PSHop: A Lightweight Feed-Forward Method for 3D Prostate Gland Segmentation

Authors: Yijing Yang, Vasileios Magoulianitis, Jiaxin Yang, Jintang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

Abstract: Automatic prostate segmentation is an important step in computer-aided diagnosis of prostate cancer and treatment planning. Existing methods of prostate segmentation are based on deep learning models which have a large size and lack of transparency which is essential for physicians. In this paper, a new data-driven 3D prostate segmentation method on MRI is proposed, named PSHop. Different from deep learning based methods, the core methodology of PSHop is a feed-forward encoder-decoder system based on successive subspace learning (SSL). It consists of two modules: 1) encoder: fine to coarse unsupervised representation learning with cascaded VoxelHop units, 2) decoder: coarse to fine segmentation prediction with voxel-wise classification and local refinement. Experiments are conducted on the publicly available ISBI-2013 dataset, as well as on a larger private one. Experimental analysis shows that our proposed PSHop is effective, robust and lightweight in the tasks of prostate gland and zonal segmentation, achieving a Dice Similarity Coefficient (DSC) of 0.873 for the gland segmentation task. PSHop achieves a competitive performance comparatively to other deep learning methods, while keeping the model size and inference complexity an order of magnitude smaller.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 11 pages, 5 figures, 5 tables

arXiv:2403.15969 [pdf, other]

PCa-RadHop: A Transparent and Lightweight Feed-forward Method for Clinically Significant Prostate Cancer Segmentation

Authors: Vasileios Magoulianitis, Jiaxin Yang, Yijing Yang, Jintang Xue, Masatomo Kaneko, Giovanni Cacciamani, Andre Abreu, Vinay Duddalwar, C. -C. Jay Kuo, Inderbir S. Gill, Chrysostomos Nikias

Abstract: Prostate Cancer is one of the most frequently occurring cancers in men, with a low survival rate if not early diagnosed. PI-RADS reading has a high false positive rate, thus increasing the diagnostic incurred costs and patient discomfort. Deep learning (DL) models achieve a high segmentation performance, although require a large model size and complexity. Also, DL models lack of feature interpretability and are perceived as "black-boxes" in the medical field. PCa-RadHop pipeline is proposed in this work, aiming to provide a more transparent feature extraction process using a linear model. It adopts the recently introduced Green Learning (GL) paradigm, which offers a small model size and low complexity. PCa-RadHop consists of two stages: Stage-1 extracts data-driven radiomics features from the bi-parametric Magnetic Resonance Imaging (bp-MRI) input and predicts an initial heatmap. To reduce the false positive rate, a subsequent stage-2 is introduced to refine the predictions by including more contextual information and radiomics features from each already detected Region of Interest (ROI). Experiments on the largest publicly available dataset, PI-CAI, show a competitive performance standing of the proposed method among other deep DL models, achieving an area under the curve (AUC) of 0.807 among a cohort of 1,000 patients. Moreover, PCa-RadHop maintains orders of magnitude smaller model size and complexity.

Submitted 23 March, 2024; originally announced March 2024.

Comments: 13 pages, 7 figures, 5 tables

arXiv:2403.04994 [pdf]

Enhanced polarization switching characteristics of HfO2 ultrathin films via acceptor-donor co-doping

Authors: Chao Zhou, Liyang Ma, Yanpeng Feng, Chang-Yang Kuo, Yu-Chieh Ku, Cheng-En Liu, Xianlong Cheng, Jingxuan Li, Yangyang Si, Haoliang Huang, Yan Huang, Hongjian Zhao, Chun-Fu Chang, Sujit Das, Shi Liu, Zuhuang Chen

Abstract: In the realm of ferroelectric memories, HfO2-based ferroelectrics stand out because of their exceptional CMOS compatibility and scalability. Nevertheless, their switchable polarization and switching speed are not on par with those of perovskite ferroelectrics. It is widely acknowledged that defects play a crucial role in stabilizing the metastable polar phase of HfO2. Simultaneously, defects also pin the domain walls and impede the switching process, ultimately rendering the sluggish switching of HfO2. Herein, we present an effective strategy involving acceptor-donor co-doping to effectively tackle this dilemma. Remarkably enhanced ferroelectricity and the fastest switching process ever reported among HfO2 polar devices are observed in La3+-Ta5+ co-doped HfO2 ultrathin films. Moreover, robust macro-electrical characteristics of co-doped films persist even at a thickness as low as 3 nm, expanding potential applications of HfO2 in ultrathin devices. Our systematic investigations further demonstrate that synergistic effects of uniform microstructure and smaller switching barrier introduced by co-doping ensure the enhanced ferroelectricity and shortened switching time. The co-doping strategy offers an effective avenue to control the defect state and improve the ferroelectric properties of HfO2 films.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.02337 [pdf, other]

First Constraints on the Epoch of Reionization Using the non-Gaussianity of the Kinematic Sunyaev-Zel{'}dovich Effect from the South Pole Telescope and {\it Herschel}-SPIRE Observations

Authors: S. Raghunathan, P. A. R. Ade, A. J. Anderson, B. Ansarinejad, M. Archipley, J. E. Austermann, L. Balkenhol, J. A. Beall, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, J. Bock, F. R. Bouchet, L. Bryant, E. Camphuis, J. E. Carlstrom, T. W. Cecil, C. L. Chang, P. Chaubal, H. C. Chiang, P. M. Chichura, T. -L. Chou, R. Citron , et al. (97 additional authors not shown)

Abstract: We report results from an analysis aimed at detecting the trispectrum of the kinematic Sunyaev-Zel{'}dovich (kSZ) effect by combining data from the South Pole Telescope (SPT) and {\it Herschel}-SPIRE experiments over a 100 ${\rm deg}^{2}$ field. The SPT observations combine data from the previous and current surveys, namely SPTpol and SPT-3G, to achieve depths of 4.5, 3, and 16 $μ{\rm K-arcmin}$ in bands centered at 95, 150, and 220 GHz. For SPIRE, we include data from the 600 and 857 GHz bands. We reconstruct the velocity-induced large-scale correlation of the small-scale kSZ signal with a quadratic estimator that uses two cosmic microwave background (CMB) temperature maps, constructed by optimally combining data from all the frequency bands. We reject the null hypothesis of a zero trispectrum at $10.3σ$ level. However, the measured trispectrum contains contributions from both the kSZ and other undesired components, such as CMB lensing and astrophysical foregrounds, with kSZ being sub-dominant. We use the \textsc{Agora} simulations to estimate the expected signal from CMB lensing and astrophysical foregrounds. After accounting for the contributions from CMB lensing and foreground signals, we do not detect an excess kSZ-only trispectrum and use this non-detection to set constraints on reionization. By applying a prior based on observations of the Gunn-Peterson trough, we obtain an upper limit on the duration of reionization of $Δz_{\rm re, 50} < 4.5$ (95\% C.L). We find these constraints are fairly robust to foregrounds assumptions.
This trispectrum measurement is independent of, but consistent with, {\it Planck}'s optical depth measurement. This result is the first constraint on the epoch of reionization using the non-Gaussian nature of the kSZ signal.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 15 pages, 5 figures (3 in main text and 2 in Appendix); To be submitted to PRL; Comments welcome; Data products and plotting scripts can be downloaded from https://github.com/sriniraghunathan/kSZ_4pt_SPT_SPIRE

arXiv:2402.06982 [pdf, other]

Treatment-wise Glioblastoma Survival Inference with Multi-parametric Preoperative MRI

Authors: Xiaofeng Liu, Nadya Shusharina, Helen A Shih, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. Personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of treatment are causes of ST. While previous related MR-based glioblastoma ST studies have focused only on the direct mapping of MR scans to ST, they have not included the underlying causal relationship between treatments and ST. To address this limitation, we propose a treatment-conditioned regression model for glioblastoma ST that incorporates treatment information in addition to MR scans. Our approach allows us to effectively utilize the data from all of the treatments in a unified manner, rather than having to train separate models for each of the treatments. Furthermore, treatment can be effectively injected into each convolutional layer through the adaptive instance normalization we employ. We evaluate our framework on the BraTS20 ST prediction task. Three treatment options are considered: Gross Total Resection (GTR), Subtotal Resection (STR), and no resection. The evaluation results demonstrate the effectiveness of injecting the treatment for estimating GBM survival.
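
Illustrative note (not from the paper): the abstract above says treatment is injected into each convolutional layer through adaptive instance normalization. The sketch below shows only the generic idea, assuming a 2-D feature map for brevity (MR volumes are 3-D) and a hypothetical lookup table holding one per-channel scale and shift per treatment. The names `adaptive_instance_norm`, `inject_treatment`, `scale_table`, and `shift_table` are invented for this example, and the random table values stand in for parameters a real model would learn.

```python
import numpy as np

def adaptive_instance_norm(features, gamma, beta, eps=1e-5):
    """Normalize each channel over its spatial axes, then rescale and shift.

    features: array of shape (channels, height, width)
    gamma, beta: arrays of shape (channels,)
    """
    mean = features.mean(axis=(1, 2), keepdims=True)
    std = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / (std + eps)
    return gamma[:, None, None] * normalized + beta[:, None, None]

# Hypothetical conditioning tables: one (scale, shift) pair per treatment.
# In a trained model these would be learned; here they are random placeholders.
TREATMENTS = ("GTR", "STR", "none")
CHANNELS = 4
_rng = np.random.default_rng(0)
scale_table = {t: _rng.normal(1.0, 0.1, CHANNELS) for t in TREATMENTS}
shift_table = {t: _rng.normal(0.0, 0.1, CHANNELS) for t in TREATMENTS}

def inject_treatment(features, treatment):
    """Condition a feature map on the chosen treatment label."""
    return adaptive_instance_norm(
        features, scale_table[treatment], shift_table[treatment]
    )
```

Calling `inject_treatment(x, "GTR")` and `inject_treatment(x, "STR")` on the same feature map `x` yields outputs with different per-channel statistics, which is how a single shared network could, under these assumptions, produce treatment-specific predictions.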

Submitted 10 February, 2024; originally announced February 2024.

Comments: SPIE Medical Imaging 2024: Computer-Aided Diagnosis

arXiv:2402.01233 [pdf, other]

Forecast of foreground cleaning strategies for AliCPT-1

Authors: Junzhou Zhang, Shamik Ghosh, Jiazheng Dou, Yang Liu, Siyu Li, Jiming Chen, Jiaxin Wang, Zhaoxuan Zhang, Jacques Delabrouille, Mathieu Remazeilles, Chang Feng, Bin Hu, Hao Liu, Larissa Santos, Pengjie Zhang, Wen Zhao, Le Zhang, Zhi-Qi Huang, Hong Li, Chao-Lin Kuo, Xinmin Zhang

Abstract: We report the test results of several independent foreground-cleaning pipelines used in the Ali CMB Polarization Telescope experiment (AliCPT-1), a high-altitude CMB imager in the Northern hemisphere with thousands of detectors dedicated to the search for a primordial CMB polarization $B$-mode signature. Based on simulated data from 4 detector modules and a single season of observation, which we refer to as Data Challenge 1 (DC1), we employ different and independent pipelines to examine the robustness and effectiveness of the estimates on foreground parameters and the primordial $B$-mode detection. The foreground-cleaning strategies used in the pipelines include the parametric method of template fitting (TF) and the non-parametric methods of the constrained internal linear combination (cILC), the analytical blind separation (ABS), and the generalized least squares (GLS). We examine the impact of possible foreground residuals on the estimate of the CMB tensor-to-scalar ratio ($r$) for each pipeline by changing the contamination components in the simulated maps and varying the foreground models and sky patches for various tests. According to the DC1 data with the simulation input value $r_{\rm true}=0.023$, the foreground residual contamination levels in the TF/ABS/cILC/GLS pipelines are well within the corresponding statistical errors at the $2σ$ level. Furthermore, by utilizing the tension estimator, which helps identify significant residual foreground contamination in the detection of the primordial $B$-mode signal by quantifying the discrepancy between various $r$ measurements, we conclude that the presence of small foreground residuals does not lead to any significant inconsistency in the estimation of $r$.

Submitted 26 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: 38 pages, 22 figures; accepted for publication in ApJS

arXiv:2402.01060 [pdf, other]

doi 10.1103/PhysRevApplied.21.L041002

High-volume tunable resonator for axion searches above 7 GHz

Authors: Taj A. Dyson, Chelsea L. Bartram, Ashley Davidson, Jonah B. Ezekiel, Laura M. Futamura, Tongtian Liu, Chao-Lin Kuo

Abstract: We present results from the first experimental demonstration of a tunable thin-shell axion haloscope. This novel geometry decouples the overall volume of the cavity-based resonator from its resonant frequency, thereby evading the steep sensitivity degradation at high frequencies. An aluminum $2.6$ L ($41$ $λ^3$) prototype which tunes from $7.1$ to $8.0$ GHz was fabricated and characterized at room temperature. An axion-sensitive, straightforwardly tunable $\mathrm{TM}$$_{010}$ mode is clearly identified with a room temperature quality factor, $Q$, of $\sim$$5,000$. The on-resonance $E$-field distribution is mapped and found to agree with numerical calculations. Anticipating future cryogenic operation, we develop an alignment protocol relying only on rf measurements of the cavity, maintaining a form factor of $0.57$ across the full tuning range. These measurements demonstrate the feasibility of cavity-based haloscopes with operating volume $V\ggλ^3$. We discuss plans for future development and the parameters required for a thin-shell haloscope exploring the post-inflationary axion parameter space ($\sim$$4$ to $\sim$$30$ GHz) at DFSZ sensitivity.

Submitted 23 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 6 pages, 7 figures; references added, Table 2 updated, acknowledgments made more descriptive, grammar copy edits, and title updated to published version

Journal ref: PhysRevApplied 21 (2024) L041002

arXiv:2402.00927 [pdf, other]

doi 10.1051/0004-6361/202348308

Ordered magnetic fields around the 3C 84 central black hole

Authors: G. F. Paraschos, J. -Y. Kim, M. Wielgus, J. Röder, T. P. Krichbaum, E. Ros, I. Agudo, I. Myserlis, M. Moscibrodzka, E. Traianou, J. A. Zensus, L. Blackburn, C. -K. Chan, S. Issaoun, M. Janssen, M. D. Johnson, V. L. Fish, K. Akiyama, A. Alberdi, W. Alef, J. C. Algaba, R. Anantua, K. Asada, R. Azulay, U. Bach , et al. (258 additional authors not shown)

Abstract: 3C84 is a nearby radio source with a complex total intensity structure, showing linear polarisation and spectral patterns. A detailed investigation of the central engine region necessitates the use of VLBI above the hitherto available maximum frequency of 86GHz. Using ultrahigh resolution VLBI observations at the highest available frequency of 228GHz, we aim to directly detect compact structures and understand the physical conditions in the compact region of 3C84. We used EHT 228GHz observations and, given the limited (u,v)-coverage, applied geometric model fitting to the data. We also employed quasi-simultaneously observed, multi-frequency VLBI data for the source in order to carry out a comprehensive analysis of the core structure. We report the detection of a highly ordered, strong magnetic field around the central SMBH of 3C84. The brightness temperature analysis suggests that the system is in equipartition. We determined a turnover frequency of $ν_m=(113\pm4)$GHz, a corresponding synchrotron self-absorbed magnetic field of $B_{SSA}=(2.9\pm1.6)$G, and an equipartition magnetic field of $B_{eq}=(5.2\pm0.6)$G. Three components are resolved with the highest fractional polarisation detected for this object ($m_\textrm{net}=(17.0\pm3.9)$%). The positions of the components are compatible with those seen in low-frequency VLBI observations since 2017-2018. We report a steeply negative slope of the spectrum at 228GHz. We used these findings to test models of jet formation, propagation, and Faraday rotation in 3C84. The findings of our investigation into different flow geometries and black hole spins support an advection-dominated accretion flow in a magnetically arrested state around a rapidly rotating supermassive black hole as a model of the jet-launching system in the core of 3C84. However, systematic uncertainties due to the limited (u,v)-coverage cannot be ignored.

Submitted 1 February, 2024; originally announced February 2024.

Comments: 15 pages, 6 figures, published in A&A

Journal ref: Issue: A&A Volume 682, February 2024; Article number: L3; Number of pages: 15

arXiv:2401.15847 [pdf, other]

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

Authors: Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

Abstract: Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanced multimodal AI applications, such as agents that understand complex scenes and navigate through webpages, the skill of multipanel visual reasoning is essential, and a comprehensive evaluation of models in this regard is important. Therefore, we introduce Multipanel Visual Question Answering (MultipanelVQA), a novel benchmark comprising 6,600 triplets of questions, answers, and multipanel images that specifically challenge models in comprehending multipanel images. Our evaluation shows that questions in the MultipanelVQA benchmark pose significant challenges to the state-of-the-art Multimodal Large Language Models (MLLMs) tested, even though humans can attain approximately 99% accuracy on these questions. Distinctively, the MultipanelVQA benchmark features synthetically generated multipanel images specifically crafted to isolate and assess the impact of various factors, such as the layout, on MLLMs' multipanel image comprehension abilities. As a result, in addition to benchmarking the capabilities of MLLMs in understanding multipanel images, we analyze various factors of the multipanel image that affect MLLMs' performance with synthetic data and offer insights for enhancement. Code and data are released at https://sites.google.com/view/multipanelvqa/home.

Submitted 27 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: ACL 2024

arXiv:2401.13525 [pdf, other]

Flaring Stars in a Non-targeted mm-wave Survey with SPT-3G

Authors: C. Tandoi, S. Guns, A. Foster, P. A. R. Ade, A. J. Anderson, B. Ansarinejad, M. Archipley, L. Balkenhol, K. Benabed, A. N. Bender, B. A. Benson, F. Bianchini, L. E. Bleem, F. R. Bouchet, L. Bryant, E. Camphuis, J. E. Carlstrom, T. W. Cecil, C. L. Chang, P. Chaubal, P. M. Chichura, T. -L. Chou, A. Coerver, T. M. Crawford, A. Cukierman , et al. (74 additional authors not shown)

Abstract: We present a flare star catalog from four years of non-targeted millimeter-wave survey data from the South Pole Telescope (SPT). The data were taken with the SPT-3G camera and cover a 1500-square-degree region of the sky from $20^{h}40^{m}0^{s}$ to $3^{h}20^{m}0^{s}$ in right ascension and $-42^{\circ}$ to $-70^{\circ}$ in declination. This region was observed on a nearly daily cadence from 2019-2022 and chosen to avoid the plane of the galaxy. A short-duration transient search of this survey yields 111 flaring events from 66 stars, increasing the number of both flaring events and detected flare stars by an order of magnitude from the previous SPT-3G data release. We provide cross-matching to Gaia DR3, as well as matches to X-ray point sources found in the second ROSAT all-sky survey. We have detected flaring stars across the main sequence, from early-type A stars to M dwarfs, as well as a large population of evolved stars. These stars are mostly nearby, spanning 10 to 1000 parsecs in distance. Most of the flare spectral indices are constant or gently rising as a function of frequency at 95/150/220 GHz. The timescale of these events can range from minutes to hours, and the peak $νL_ν$ luminosities range from $10^{27}$ to $10^{31}$ erg s$^{-1}$ in the SPT-3G frequency bands.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.07475 [pdf, other]

GWPT: A Green Word-Embedding-based POS Tagger

Authors: Chengwei Wei, Runqi Pang, C. -C. Jay Kuo

Abstract: As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns the POS label to each word in a sentence. A novel lightweight POS tagger based on word embeddings is proposed and named GWPT (green word-embedding-based POS tagger) in this work. Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning modules. The main novelty of GWPT lies in representation learning. It uses non-contextual or contextual word embeddings, partitions embedding dimension indices into low-, medium-, and high-frequency sets, and represents them with different N-grams. It is shown by experimental results that GWPT offers state-of-the-art accuracies with fewer model parameters and significantly lower computational complexity in both training and inference as compared with deep-learning-based methods.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.16382 [pdf, ps, other]

What Determines the Boundaries of H2O Maser Emission in an X-ray Illuminated Gas Disk?

Authors: C. Y. Kuo, F. Gao, J. A. Braatz, D. W. Pesce, E. M. L. Humphreys, M. J. Reid, C. M. V. Impellizzeri, C. Henkel, J. Wagner, C. E. Wu

Abstract: High precision mapping of H2O megamaser emission from active galaxies has revealed more than a dozen Keplerian H2O maser disks, which enable a ~4% uncertainty estimate of the Hubble constant as well as providing accurate masses for the central black holes. These disks often have well-defined inner and outer boundaries of maser emission on sub-parsec scales. In order to better understand the physical conditions that determine the inner and outer radii of a maser disk, we examine the distributions of gas density and X-ray heating rate in a warped molecular disk described by a power-law surface density profile. For a suitable choice of the disk mass, we find that the outer radius R_out of the maser disk predicted from our model can match the observed value, with R_out mainly determined by the maximum heating rate or the minimum density for efficient maser action, depending on the combination of the Eddington ratio, black hole mass, and disk mass. Our analysis also indicates that the inner radius for maser action is comparable to the dust sublimation radius, suggesting that dust may play a role in determining the inner radius of a maser disk. Finally, our model predicts that H2O gigamaser disks could exist at the centers of high-z quasars, with disk sizes of >~ 10-30 pc.

Submitted 10 July, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted by MNRAS, 17 pages, 8 figures, 2 tables

arXiv:2312.14968 [pdf, other]

Enhancing Edge Intelligence with Highly Discriminant LNT Features

Authors: Xinyu Wang, Vinod K. Mishra, C. -C. Jay Kuo

Abstract: AI algorithms at the edge demand smaller model sizes and lower computational complexity. To achieve these objectives, we adopt a green learning (GL) paradigm rather than the deep learning paradigm. GL has three modules: 1) unsupervised representation learning, 2) supervised feature learning, and 3) supervised decision learning. We focus on the second module in this work. In particular, we derive new discriminant features from proper linear combinations of input features, denoted by x, obtained in the first module. They are called complementary and raw features, respectively. Along this line, we present a novel supervised learning method to generate highly discriminant complementary features based on the least-squares normal transform (LNT). LNT consists of two steps. First, we convert a C-class classification problem to a binary classification problem. The two classes are assigned with 0 and 1, respectively. Next, we formulate a least-squares regression problem from the N-dimensional (N-D) feature space to the 1-D output space, and solve the least-squares normal equation to obtain one N-D normal vector, denoted by a1. Since one normal vector is yielded by one binary split, we can obtain M normal vectors with M splits. Then, Ax is called an LNT of x, where the transform matrix A in R^{M by N} is formed by stacking aj^T, j=1, ..., M, and the LNT, Ax, can generate M new features. The newly generated complementary features are shown to be more discriminant than the raw features. Experiments show that the classification performance can be improved by these new features.
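
Illustrative note (not from the paper): the two LNT steps described in the abstract above can be sketched as follows. The abstract does not say how the binary splits are chosen or whether a bias term is used, so this sketch assumes random splits of the class set and no bias term. The function names `lnt_fit` and `lnt_transform` are invented for the example, and `numpy.linalg.lstsq` is used in place of forming the normal equation explicitly (both give a least-squares solution).

```python
import numpy as np

def lnt_fit(X, y, num_splits, seed=0):
    """Build an (M x N) transform matrix A from M binary class splits.

    X: (num_samples, N) raw features; y: (num_samples,) integer class labels.
    Assumption: each split is a random partition of the classes into two
    non-empty groups, which receive regression targets 1 and 0.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    rows = []
    for _ in range(num_splits):
        # Draw a random subset of classes until both groups are non-empty.
        while True:
            in_group_one = rng.integers(0, 2, size=classes.size).astype(bool)
            if in_group_one.any() and not in_group_one.all():
                break
        targets = np.isin(y, classes[in_group_one]).astype(float)
        # Least-squares regression from N-D features to the 1-D 0/1 target.
        normal_vector, *_ = np.linalg.lstsq(X, targets, rcond=None)
        rows.append(normal_vector)
    return np.vstack(rows)  # A, with rows a_j^T

def lnt_transform(A, X):
    """Apply Ax to every sample (row) of X, giving M new features each."""
    return X @ A.T
```

With `A = lnt_fit(X, y, num_splits=M)`, the call `lnt_transform(A, X)` returns one row of M complementary features per sample, which could then be concatenated with or used in place of the raw features.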

Submitted 19 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Conference on Big Data, AI and Adaptive Computing for Edge Sensing and Processing Workshop

arXiv:2312.11447 [pdf, ps, other]

On the Hochschild cohomology of Tamarkin categories

Authors: Christopher Kuo, Vivek Shende, Bingyu Zhang

Abstract: To any open subset of a cotangent bundle, Tamarkin has associated a certain quotient of a category of sheaves. Here we show that the Hochschild cohomology of this category agrees with filtered symplectic cohomology.

Submitted 5 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 37 pages. V2: Several new results were added in Sections 5.5, 6.3, and 6.6, and some minor changes and corrections

arXiv:2312.04936 [pdf, other]

SKT-Hang: Hanging Everyday Objects via Object-Agnostic Semantic Keypoint Trajectory Generation

Authors: Chia-Liang Kuo, Yu-Wei Chao, Yi-Ting Chen

Abstract: We study the problem of hanging a wide range of grasped objects on diverse supporting items. Hanging objects is a ubiquitous task that is encountered in numerous aspects of our everyday lives. However, both the objects and supporting items can exhibit substantial variations in their shapes and structures, bringing two challenging issues: (1) determining the task-relevant geometric structures across different objects and supporting items, and (2) identifying a robust action sequence to accommodate the shape variations of supporting items. To this end, we propose Semantic Keypoint Trajectory (SKT), an object-agnostic representation that is highly versatile and applicable to various everyday objects. We also propose Shape-conditioned Trajectory Deformation Network (SCTDN), a model that learns to generate SKT by deforming a template trajectory based on the task-relevant geometric structure features of the supporting items. We conduct extensive experiments and demonstrate substantial improvements in our framework over existing robot hanging methods in the success rate and inference time. Finally, our simulation-trained framework shows promising hanging results in the real world. For videos and supplementary materials, please visit our project webpage: https://hcis-lab.github.io/SKT-Hang/.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.02759 [pdf, other]

Absolute Flux Density Calibration of the Greenland Telescope Data for Event Horizon Telescope Observations

Authors: J. Y. Koay, K. Asada, S. Matsushita, C. -Y. Kuo, C. -W. L. Huang, C. Romero-Cañizales, S. Koyama, J. Park, W. -P. Lo, G. Bower, M. -T. Chen, S. -H. Chang, C. -C. Chen, R. Chilson, C. C. Han, P. T. P. Ho, Y. -D. Huang, M. Inoue, B. Jeter, H. Jiang, P. M. Koch, D. Kubo, C. -T. Li, C. -T. Liu, K. -Y. Liu , et al. (13 additional authors not shown)

Abstract: Starting from the observing campaign in April 2018, the Greenland Telescope (GLT) has been added as a new station of the Event Horizon Telescope (EHT) array. Visibilities on baselines to the GLT, particularly in the North-South direction, potentially provide valuable new constraints for the modeling and imaging of sources such as M87*. The GLT's location at high Northern latitudes adds unique challenges to its calibration strategies. Additionally, the performance of the GLT was not optimal during the 2018 observations due to it being only partially commissioned at the time. This document describes the steps taken to estimate the various parameters (and their uncertainties) required for the absolute flux calibration of the GLT data as part of the EHT. In particular, we consider the non-optimized status of the GLT in 2018, as well as its improved performance during the 2021 EHT campaign.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 17 pages, 4 figures, EHT Memo Series 2023-L1-02

arXiv:2311.15219 [pdf, ps, other]

$L_1$ approach to the compressible viscous fluid flows in the half-space

Authors: Jou Chun Kuo, Yoshihiro Shibata

Abstract: In this paper, we prove the local well-posedness for the Navier-Stokes equations describing the motion of isotropic barotropic compressible viscous fluid flow with non-slip boundary conditions, where the fluid domain is the half-space in the $N$-dimensional Euclidean space. The density part of solutions and their time derivative belong to $L_1$ in time with some Besov spaces in space, and the velocity parts and their time derivative also belong to $L_1$ in time with some Besov spaces in space. We use Lagrange transformation to eliminate the convection term and we use an analytic semigroup approach. Our Stokes semigroup is not only a continuous analytic semigroup but also has $L_1$-in-time maximal regularity with some Besov spaces in space.

Submitted 26 November, 2023; originally announced November 2023.

Comments: $L_1$ maximal regularity theory in the study of the compressible Navier-Stokes equations was first obtained by Danchin and Tolksdorf. Their argument is an extension of the Da Prato and Grisvard theory, and they assumed that the fluid domain is bounded. We use some real interpolation arguments and we do not need any boundedness assumption on the domain. arXiv admin note: substantial text overlap with arXiv:2311.12331

arXiv:2311.12331 [pdf, ps, other]

$L_1$ approach to the compressible viscous fluid flows in the half-space

Authors: Jou Chun Kuo, Yoshihiro Shibata

Abstract: In this paper, we prove the local well-posedness for the Navier-Stokes equations describing the motion of isotropic barotropic compressible viscous fluid flow with non-slip boundary conditions, where the fluid domain is the $N$-dimensional half-space. We solve the equations in the $L_1$ in time and Besov spaces $B^s_{q,1}$ in space maximal regularity framework. Here, we assume that $-1+N/q \leq s < 1/q$ and $N-1 < q < 2N$. We use Lagrange transformation to eliminate the convection term and we use an analytic semigroup approach. We only assume the strict positivity of the initial mass density.

Submitted 20 November, 2023; originally announced November 2023.

Comments: This paper treats the $L_1$ maximal regularity for the compressible Stokes equations in the half space with Dirichlet zero condition and the local well-posedness of the compressible Navier-Stokes equations in the half space in the $L_1$ in time framework

arXiv:2311.04814 [pdf, other]

doi 10.1103/PhysRevB.108.195108

Linear dichroic x-ray absorption response of Ti-Ti dimers along the $c$ axis in Ti$_2$O$_3$ upon Mg substitution

Authors: M. Okawa, D. Takegami, D. S. Christovam, M. Ferreira-Carvalho, C. -Y. Kuo, C. T. Chen, T. Miyoshino, K. Takasu, T. Okuda, C. F. Chang, L. H. Tjeng, T. Mizokawa

Abstract: Corundum oxide Ti$_2$O$_3$ shows the metal-insulator transition around 400-600 K accompanying the nearest Ti$^{3+}$-Ti$^{3+}$ bond ($a_{1g}a_{1g}$ singlet state) formation along the $c$ axis. In order to clarify the hole-doping effect for the $a_{1g}a_{1g}$ singlet bond in Ti$_2$O$_3$, we investigated Ti $3d$ orbital anisotropy between corundum-type Ti$_2$O$_3$ and ilmenite-type MgTiO$_3$ using linear dichroism of soft x-ray absorption spectroscopy of the Ti $L_{2,3}$ edge. From the linear dichroic spectral weight in Mg$_y$Ti$_{2-y}$O$_3$, we confirmed that the $a_{1g}a_{1g}$ state is dominant not only in $y=0.01$ (almost Ti$_2$O$_3$), but also in $y = 0.29$, indicating that the Ti-Ti bond survives against a certain level of hole doping. In $y=0.63$ corresponding to 46% hole doping per Ti, the $3d$ orbital symmetry changes from $a_{1g}$ to $e_g^π$.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 5 pages, 5 figures

Journal ref: Phys. Rev. B 108, 195108 (2023)

arXiv:2310.11839 [pdf]

Neel tensor torque at the ferromagnet/antiferromagnet interface

Authors: Chao-Yao Yang, Sheng-Huai Chen, Chih-Hsiang Tseng, Chang-Yang Kuo, Hsiu-Hau Lin, Chih-Huang Lai

Abstract: Antiferromagnets (AFMs) exhibit spin arrangements with no net magnetization, positioning them as promising candidates for spintronics applications. While electrical manipulation of single-crystal AFMs, composed of periodic spin configurations, has been achieved recently, it remains a daunting challenge to characterize and to manipulate polycrystalline AFMs. Utilizing statistical analysis in data science, we demonstrate that polycrystalline AFMs can be described using a real, symmetric, positive semi-definite, rank-two tensor, which we term the Neel tensor. This tensor introduces a unique spin torque, diverging from the conventional field-like and Slonczewski torques in spintronics devices. Remarkably, Neel tensors can be trained to retain a specific orientation, functioning as a form of working memory. This attribute enables zero-field spin-orbit-torque switching in trilayer devices featuring a heavy-metal/ferromagnet/AFM structure and is also consistent with the X-ray magnetic linear dichroism measurements. Our findings uncover hidden statistical patterns in polycrystalline AFMs and establish the presence of Neel tensor torque, highlighting its potential to drive future spintronics innovations.

Submitted 18 October, 2023; originally announced October 2023.

Comments: main text 18 pages, supplementary information 10 pages

arXiv:2310.10849 [pdf, other]

doi 10.1007/s10909-024-03100-6

Results and Limits of Time Division Multiplexing for the BICEP Array High Frequency Receivers

Authors: S. Fatigoni, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, V. Buza, J. Cheshire, J. Connors, J. Cornelison, M. Crumrine, A. J. Cukierman, E. V. Denison, M. I. Dierickx, L. Duband, M. Eiben, J. P. Filippini, A. Fortes, M. Gao, C. Giannakopoulos, N. Goeckner-Wald, D. C. Goldfinger, et al. (62 additional authors not shown)

Abstract: Time-Division Multiplexing is the readout architecture of choice for many ground and space experiments, as it is a very mature technology with proven outstanding low-frequency noise stability, which represents a central challenge in multiplexing. Once fully populated, each of the two BICEP Array high frequency receivers, observing at 150GHz and 220/270GHz, will have 7776 TES detectors tiled on the focal plane. The constraints set by these two receivers required a redesign of the warm readout electronics. The new version of the standard Multi Channel Electronics, developed and built at the University of British Columbia, is presented here for the first time. BICEP Array operates Time Division Multiplexing readout technology to the limits of its capabilities in terms of multiplexing rate, noise and crosstalk, and applies them in rigorously demanding scientific application requiring extreme noise performance and systematic error control. Future experiments like CMB-S4 plan to use TES bolometers with Time Division/SQUID-based readout for an even larger number of detectors.

Submitted 24 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: 10 pages, 7 figures, Submitted to Journal of Low Temperature Physics

Journal ref: Journal of Low Temperature Physics (2024)

arXiv:2310.09956 [pdf, ps, other]

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior

Authors: Xiaotong Chen, Zheming Zhou, Zhuo Deng, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnie Sen

Abstract: Reconstructing transparent objects using affordable RGB-D cameras is a persistent challenge in robotic perception due to inconsistent appearances across views in the RGB domain and inaccurate depth readings in each single view. We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms. In the first stage, off-the-shelf monocular object segmentation and depth completion networks are leveraged to predict the depth of transparent objects, furnishing a single-view shape prior. Subsequently, we propose Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors from the first stage into a cross-view consistent 3D reconstruction, given camera poses estimated from the opaque part of the scene. Our key innovation lies in EOF, which incorporates boundary-sensitive sampling and epipolar-line constraints into optical flow to accurately establish 2D correspondences across multiple views on transparent objects. Quantitative evaluations demonstrate that our pipeline significantly outperforms baseline methods in 3D reconstruction quality, paving the way for more adept robotic perception and interaction with transparent objects.

Submitted 15 October, 2023; originally announced October 2023.

Comments: IEEE-RAS Humanoids 2023 paper, 8 pages, 6 figures

Showing 1–50 of 618 results for author: Kuo, C