Search | arXiv e-print repository

Advancing Video Quality Assessment for AIGC

Authors: Xinli Yue, Jianhui Sun, Han Kong, Liangchao Yao, Tianyi Wang, Lei Li, Fengyun Rao, Jing Lv, Fan Xia, Yuetang Deng, Qian Wang, Lingchen Zhao

Abstract: In recent years, AI generative models have made remarkable progress across various domains, including text generation, image generation, and video generation. However, assessing the quality of text-to-video generation is still in its infancy, and existing evaluation frameworks fall short when compared to those for natural videos. Current video quality assessment (VQA) methods primarily focus on evaluating the overall quality of natural videos and fail to adequately account for the substantial quality discrepancies between frames in generated videos. To address this issue, we propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies. Additionally, we introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities. Experimental results demonstrate that our method outperforms existing VQA techniques on the AIGC Video dataset, surpassing the previous state-of-the-art by 3.1% in terms of PLCC.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 5 pages, 1 figure
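
The abstract above mentions a loss that combines mean absolute error with cross-entropy, and reports results in terms of PLCC (the Pearson linear correlation coefficient). The following is only a minimal sketch of those two ideas, not the authors' implementation: the abstract does not say how the two terms are combined, so the weighting parameter `alpha`, the discrete quality bins behind `logits` and `target_bin`, and the function names are all assumptions made for illustration.

```python
import numpy as np


def plcc(pred, target):
    """Pearson linear correlation coefficient between two score vectors."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pc = pred - pred.mean()
    tc = target - target.mean()
    return float((pc * tc).sum() / np.sqrt((pc ** 2).sum() * (tc ** 2).sum()))


def combined_loss(pred_score, logits, target_score, target_bin, alpha=0.5):
    """Weighted sum of MAE and cross-entropy (weighting scheme is assumed).

    pred_score, target_score: shape (N,) continuous quality scores.
    logits: shape (N, K) unnormalized scores over K hypothetical quality bins.
    target_bin: shape (N,) integer bin labels in [0, K).
    """
    pred_score = np.asarray(pred_score, dtype=float)
    target_score = np.asarray(target_score, dtype=float)
    logits = np.asarray(logits, dtype=float)
    target_bin = np.asarray(target_bin, dtype=int)

    # Mean absolute error on the continuous score.
    mae = np.mean(np.abs(pred_score - target_score))

    # Numerically stable log-softmax, then mean negative log-likelihood.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(target_bin)), target_bin])

    return float(alpha * mae + (1.0 - alpha) * ce)
```

A perfectly linear relation between predictions and targets gives a PLCC of 1 (or -1 if the relation is decreasing), which is why the metric is commonly used to compare predicted quality scores against human ratings.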

arXiv:2409.08824 [pdf, other]

Pathfinder for Low-altitude Aircraft with Binary Neural Network

Authors: Kaijie Yin, Tian Gao, Hui Kong

Abstract: A prior global topological map (e.g., the OpenStreetMap, OSM) can boost the performance of autonomous mapping by a ground mobile robot. However, the prior map is usually incomplete due to lacking labeling in partial paths. To solve this problem, this paper proposes an OSM maker using airborne sensors carried by low-altitude aircraft, where the core of the OSM maker is a novel efficient pathfinder approach based on LiDAR and camera data, i.e., a binary dual-stream road segmentation model. Specifically, a multi-scale feature extraction based on the UNet architecture is implemented for images and point clouds. To reduce the effect caused by the sparsity of point cloud, an attention-guided gated block is designed to integrate image and point-cloud features. For enhancing the efficiency of the model, we propose a binarization streamline to each model component, including a variant of vision transformer (ViT) architecture as the encoder of the image branch, and new focal and perception losses to optimize the model training. The experimental results on two datasets demonstrate that our pathfinder method achieves SOTA accuracy with high efficiency in finding paths from the low-level airborne sensors, and we can create complete OSM prior maps based on the segmented road skeletons. Code and data are available at: https://github.com/IMRL/Pathfinder.

Submitted 22 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

arXiv:2408.12527 [pdf, other]

UMAD: University of Macau Anomaly Detection Benchmark Dataset

Authors: Dong Li, Lineng Chen, Cheng-Zhong Xu, Hui Kong

Abstract: Anomaly detection is critical in surveillance systems and patrol robots by identifying anomalous regions in images for early warning. Depending on whether reference data are utilized, anomaly detection can be categorized into anomaly detection with reference (ADr) and anomaly detection without reference. Currently, anomaly detection without reference, which is closely related to out-of-distribution (OoD) object detection, struggles with learning anomalous patterns due to the difficulty of collecting sufficiently large and diverse anomaly datasets with the inherent rarity and novelty of anomalies. Alternatively, anomaly detection with reference employs the scheme of change detection to identify anomalies by comparing semantic changes between a reference image and a query one. However, there are very few ADr works due to the scarcity of public datasets in this domain. In this paper, we aim to address this gap by introducing the UMAD Benchmark Dataset. To our best knowledge, this is the first benchmark dataset designed specifically for anomaly detection with reference in robotic patrolling scenarios, e.g., where an autonomous robot is employed to detect anomalous objects by comparing a reference and a query video sequence. The reference sequences can be taken by the robot along a specified route when there are no anomalous objects in the scene. The query sequences are captured online by the robot when it is patrolling in the same scene following the same route. Our benchmark dataset is elaborated such that each query image can find a corresponding reference based on accurate robot localization along the same route in the prebuilt 3D map, with which the reference and query images can be geometrically aligned using adaptive warping. Besides the proposed benchmark dataset, we evaluate the baseline models of ADr on this dataset.

Submitted 22 August, 2024; originally announced August 2024.

Comments: Accepted by the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024, project code at https://github.com/IMRL/UMAD

arXiv:2408.09504 [pdf]

Design and Experimental Study of Vacuum Suction Grabbing Technology to Grasp Fabric Piece

Authors: Ray Wai Man Kong, Mingyi Liu, Theodore Ho Tin Kong

Abstract: The primary objective of this study was to design the grabbing technique used to determine the vacuum suction gripper and its design parameters for the pocket welting operation in apparel manufacturing. It presents the application of vacuum suction in grabbing technology, a technique that has revolutionized the handling and manipulation to grasp the various fabric materials in a range of garment industries. Vacuum suction, being non-intrusive and non-invasive, offers several advantages compared to traditional grabbing methods. It is particularly useful in scenarios where soft woven fabric and air-impermeable fabric items need to be handled with utmost care. The paper delves into the working principles of vacuum suction, its various components, and the underlying physics involved. Furthermore, it explores the various applications of vacuum suction in the garment industry into the automation exploration. The paper also highlights the challenges and limitations of vacuum suction technology and suggests potential areas for further research and development.

Submitted 18 August, 2024; originally announced August 2024.

Comments: 9 Pages, 3 figures, 6 diagrams, 1 table

arXiv:2407.17078 [pdf, other]

Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments

Authors: Wei Gao, Zezhou Sun, Mingle Zhao, Cheng-Zhong Xu, Hui Kong

Abstract: The autonomous mapping of large-scale urban scenes presents significant challenges for autonomous robots. To mitigate the challenges, global planning, such as utilizing prior GPS trajectories from OpenStreetMap (OSM), is often used to guide the autonomous navigation of robots for mapping. However, due to factors like complex terrain, unexpected body movement, and sensor noise, the uncertainty of the robot's pose estimates inevitably increases over time, ultimately leading to the failure of robotic mapping. To address this issue, we propose a novel active loop closure procedure, enabling the robot to actively re-plan the previously planned GPS trajectory. The method can guide the robot to re-visit the previous places where the loop-closure detection can be performed to trigger the back-end optimization, effectively reducing errors and uncertainties in pose estimation. The proposed active loop closure mechanism is implemented and embedded into a real-time OSM-guided robot mapping framework. Empirical results on several large-scale outdoor scenarios demonstrate its effectiveness and promising performance.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalogs (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than $10^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 50 pages, 10 figures, 4 tables

arXiv:2407.04519 [pdf, other]

Success or Failure? Analyzing Segmentation Refinement with Few-Shot Segmentation

Authors: Seonghyeon Moon, Haein Kong, Muhammad Haris Khan

Abstract: The purpose of segmentation refinement is to enhance the initial coarse masks generated by segmentation algorithms. The refined masks are expected to capture the details and contours of the target objects. Research on segmentation refinement has developed as a response to the need for high-quality initial masks. However, to our knowledge, no method has been developed that can determine the success of segmentation refinement. Such a method could ensure the reliability of segmentation in applications where the outcome of the segmentation is important, and foster innovation in image processing technologies. To address this research gap, we propose JFS (Judging From Support-set), a method to identify the success of segmentation refinement leveraging a few-shot segmentation (FSS) model. The traditional goal of the problem in FSS is to find a target object in a query image utilizing target information given by a support set. However, in our proposed method, we use the FSS network in a novel way to assess the segmentation refinement. When there are two masks, a coarse mask and a refined mask from segmentation refinement, these two masks become support masks. The existing support mask works as a ground truth mask to judge whether the quality of the refined segmentation is more accurate than the coarse mask. We first obtained a coarse mask and refined it using SEPL (SAM Enhanced Pseudo-Labels) to get the two masks. Then, these become input to the FSS model to judge whether the post-processing was successful. JFS is evaluated on the best and worst cases from SEPL to validate its effectiveness. The results showed that JFS can determine whether the SEPL is a success or not.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 4 pages

arXiv:2405.19813 [pdf, other]

SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

Authors: Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

Abstract: Robot audition systems with multiple microphone arrays have many applications in practice. However, accurate calibration of multiple microphone arrays remains challenging because there are many unknown parameters to be identified, including the relative transforms (i.e., orientation, translation) and asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, in this paper, we adopt batch simultaneous localization and mapping (SLAM) for joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct the observability analysis (i.e., parameter identifiability) of the above-mentioned calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies the identifiability of the unknown parameters. We also discover several scenarios where the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework to initialize the unknown parameters, which is used as the initial guess in batch SLAM for multiple microphone arrays calibration, aiming to further enhance optimization accuracy and convergence. Extensive numerical simulations and real experiments have been conducted to verify the performance of the proposed method. The experiment results show that the proposed pipeline achieves higher accuracy with fast convergence in comparison to methods that use the noise-corrupted ground truth of the unknown parameters as the initial guess in the optimization and other existing frameworks.

Submitted 30 May, 2024; originally announced May 2024.

Comments: This paper has been accepted to and will appear in the IEEE Transactions on Robotics

arXiv:2405.16593 [pdf, other]

The Construction of Large-scale Structure Catalogs for the Dark Energy Spectroscopic Instrument

Authors: A. J. Ross, J. Aguilar, S. Ahlen, S. Alam, A. Anand, S. Bailey, D. Bianchi, S. Brieden, D. Brooks, E. Burtin, A. Carnero Rosell, E. Chaussidon, T. Claybaugh, S. Cole, K. Dawson, A. de la Macorra, A. de Mattia, Arjun Dey, Biprateep Dey, P. Doel, K. Fanning, S. Ferraro, J. Ereza, A. Font-Ribera, J. E. Forero-Romero, et al. (61 additional authors not shown)

Abstract: We present the technical details on how large-scale structure (LSS) catalogs are constructed from redshifts measured from spectra observed by the Dark Energy Spectroscopic Instrument (DESI). The LSS catalogs provide the information needed to determine the relative number density of DESI tracers as a function of redshift and celestial coordinates and, e.g., determine clustering statistics. We produce catalogs that are weighted subsamples of the observed data, each matched to a weighted 'random' catalog that forms an unclustered sampling of the probability density that DESI could have observed those data at each location. Precise knowledge of the DESI observing history and associated hardware performance allows for a determination of the DESI footprint and the number of times DESI has covered it at sub-arcsecond level precision. This enables the completeness of any DESI sample to be modeled at this same resolution. The pipeline developed to create LSS catalogs has been designed to easily allow robustness tests and enable future improvements. We describe how it allows ongoing work improving the match between galaxy and random catalogs, such as including further information when assigning redshifts to randoms, accounting for fluctuations in target density, accounting for variation in the redshift success rate, and accommodating blinding schemes.

Submitted 18 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted (by JCAP) version of supporting publication of DESI 2024II: Sample definitions, characteristics, and two-point clustering statistics

arXiv:2405.16299 [pdf, other]

Forward modeling fluctuations in the DESI LRGs target sample using image simulations

Authors: Hui Kong, Ashley J. Ross, Klaus Honscheid, Dustin Lang, Anna Porredon, Arnaud de Mattia, Mehdi Rezaie, Rongpu Zhou, Edward Schlafly, John Moustakas, Alberto Rosado-Marin, Jessica Nicole Aguilar, Steven Ahlen, David Brooks, Edmond Chaussidon, Todd Claybaugh, Shaun Cole, Axel de la Macorra, Arjun Dey, Biprateep Dey, Peter Doel, Kevin Fanning, Jaime E. Forero-Romero, Enrique Gaztanaga, Satya Gontcho A Gontcho, et al. (28 additional authors not shown)

Abstract: We use the forward modeling pipeline, Obiwan, to study the imaging systematics of the Luminous Red Galaxies (LRGs) targeted by the Dark Energy Spectroscopic Instrument (DESI). We update the Obiwan pipeline, which had previously been developed to simulate the optical images used to target DESI data, to further simulate WISE images in the infrared. This addition makes it possible to simulate the DESI LRGs sample, which utilizes WISE data in the target selection. Deep DESI imaging data combined with a method to account for biases in their shapes is used to define a truth sample of potential LRG targets. We simulate a total of 15 million galaxies to obtain a simulated LRG sample (Obiwan LRGs) that predicts the variations in target density due to imaging properties. We find that the simulations predict the trends with depth observed in the data, including how they depend on the intrinsic brightness of the galaxies. We observe that faint LRGs are the main contributing power of the imaging systematics trend induced by depth. We also find significant trends in the data against Galactic extinction that are not predicted by Obiwan. These trends depend strongly on the particular map of Galactic extinction chosen to test against, implying Large-Scale Structure systematic contamination (e.g. Cosmic-Infrared Background) in the Galactic extinction maps is a likely root cause. We additionally observe that the DESI LRGs sample exhibits a complex dependency on a combination of seeing, depth, and intrinsic galaxy brightness, which is not replicated by Obiwan, suggesting discrepancies between the current simulation settings and the actual observations. The detailed findings we present should be used to guide any observational systematics mitigation treatment for the clustering of the DESI LRG sample.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 46 pages, 26 figures

arXiv:2405.02817 [pdf, other]

Labeling supervised fine-tuning data with the scaling law

Authors: Huanjun Kong

Abstract: This paper introduces a multi-stage manual annotation calibrated by the scaling law, offering a high-quality Supervised Fine-Tuning data acquisition method for environments with constrained resources like GPU poor, limited GPT access, and funding restrictions. We have preprocessed 58k authentic chat data and manually annotated 2.3k questions. After this, we conducted fine-tuning on Qwen models, ranging from 0.5B to 32B parameters. The optimal version improved 29.07 in F1 score. This confirms the viability of fine-tuning Large Language Model (LLM) for downstream Natural Language Processing (NLP) tasks. Our contributions are: 1) Created Supervised Fine-Tuning (SFT) training data in alpaca format, along with a set of Low-Rank Adaptation (LoRA) weights, and 2) Developed a method for acquiring high-quality data leveraging scaling law principle. The script, raw data with alpaca format and experiments track are open-sourced on Github (https://github.com/InternLM/HuixiangDou/tree/main/web/tools), HuggingFace (https://huggingface.co/tpoisonooo) and WandB (https://wandb.ai/tpoisonooo/huixiangdou-cr/table?nw=nwusertpoisonooo). The privacy of the data involved has been authorized by users. SFT data and license comes from ncnn contributors group.

Submitted 16 August, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 tables, 3 figures

arXiv:2405.02145 [pdf, other]

Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving

Authors: Haicheng Liao, Xuelin Li, Yongkang Li, Hanlin Kong, Chengyue Wang, Bonan Wang, Yanchen Guan, KaHou Tam, Zhenning Li, Chengzhong Xu

Abstract: Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module, an innovative module designed to simulate traffic scenarios with inherent uncertainty. This module enriches the predictive process by infusing it with detailed semantic information, thereby enhancing trajectory prediction accuracy. Complementing this, our Spatio-Temporal (ST) Interaction Module captures the nuanced effects of traffic scenarios on vehicle dynamics across both spatial and temporal dimensions with remarkable effectiveness. Demonstrated through exhaustive evaluations, our model sets a new standard in trajectory prediction, achieving state-of-the-art (SOTA) results on the Next Generation Simulation (NGSIM), Highway Drone (HighD), and Macao Connected Autonomous Driving (MoCAD) datasets across both short and extended temporal spans. This performance underscores the model's unparalleled adaptability and efficacy in navigating complex traffic scenarios, including highways, urban streets, and intersections.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.19385 [pdf]

High-performance solid-state electrochemical thermal switches with earth-abundant cerium oxide

Authors: Ahrong Jeong, Mitsuki Yoshimura, Hyeonjun Kong, Zhiping Bian, Jason Tam, Bin Feng, Yuichi Ikuhara, Takashi Endo, Yasutaka Matsuo, Hiromichi Ohta

Abstract: Thermal switches, which electrically turn heat flow on and off, have attracted attention as thermal management devices. Electrochemical reduction/oxidation switches the thermal conductivity (κ) of active metal oxide films. The performance of the previously proposed electrochemical thermal switches is low; on/off κ-ratio is mostly less than 5 and κ-switching width is less than 5 W/mK. We used CeO2 thin film as the active layer deposited on a solid electrolyte YSZ substrate. When the CeO2 thin film was reduced once (off-state) and then oxidized (on-state), κ was about 2.2 W/mK in the most reduced state, and κ increased with oxidation to 12.5 W/mK (on-state). This reduction (off-state)/oxidation (on-state) cycle was repeated 100 times and the average value of κ was 2.2 W/mK after reduction (off-state) and 12.5 W/mK after oxidation (on-state). The on/off κ-ratio was 5.8 and κ-switching width was 10.3 W/mK. The CeO2-based solid-state electrochemical thermal switches would be potential devices for thermal shutters and thermal displays.

Submitted 22 August, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures with supporting information (12 pages, 11 figures, 1 table)

arXiv:2404.17520 [pdf, other]

A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

Abstract: As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traffic scenarios. It represents a significant leap forward, achieving marked performance improvements on several key datasets. Specifically, it surpasses existing benchmarks with gains of 16.2% on the Next Generation Simulation (NGSIM), 27.4% on the Highway Drone (HighD), and 19.8% on the Macao Connected Autonomous Driving (MoCAD) dataset. Our proposed model shows exceptional proficiency in handling corner cases, essential for real-world applications. Moreover, its robustness is evident in scenarios with missing or limited data, outperforming most of the state-of-the-art baselines. This adaptability and resilience position our model as a viable tool for real-world autonomous driving systems, heralding a new standard in vehicle trajectory prediction for enhanced safety and efficiency.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2404.05364 [pdf, other]

Autoregressive Search of Gravitational Waves: Denoising

Authors: Sangin Kim, C. Y. Hui, Jianqi Yan, Alex P. Leung, Kwangmin Oh, A. K. H. Kong, L. C. -C. Lin, Kwan-Lok Li

Abstract: Because of the small strain amplitudes of gravitational-wave (GW) signals, unveiling them in the presence of detector/environmental noise is challenging. For visualizing the signals and extracting its waveform for a comparison with theoretical prediction, a frequency-domain whitening process is commonly adopted for filtering the data. In this work, we propose an alternative template-free framework based on autoregressive modeling for denoising the GW data and extracting the waveform. We have tested our framework on extracting the injected signals from the simulated data as well as a series of known compact binary coalescence (CBC) events from the LIGO data. Comparing with the conventional whitening procedure, our methodology generally yields improved cross-correlation and reduced root mean square errors with respect to the signal model.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Phys. Rev. D in press, 16 pages, 11 figures, 1 table

arXiv:2404.04248 [pdf, other]

doi 10.3847/2041-8213/ad5beb

Observation of Gravitational Waves from the Coalescence of a $2.5\text{-}4.5~M_\odot$ Compact Object and a Neutron Star

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, S. Akçay, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, et al. (1771 additional authors not shown)

Abstract: We report the observation of a coalescing compact binary with component masses $2.5\text{-}4.5~M_\odot$ and $1.2\text{-}2.0~M_\odot$ (all measurements quoted at the 90% credible level). The gravitational-wave signal GW230529_181500 was observed during the fourth observing run of the LIGO-Virgo-KAGRA detector network on 2023 May 29 by the LIGO Livingston Observatory. The primary component of the source has a mass less than $5~M_\odot$ at 99% credibility. We cannot definitively determine from gravitational-wave data alone whether either component of the source is a neutron star or a black hole. However, given existing estimates of the maximum neutron star mass, we find the most probable interpretation of the source to be the coalescence of a neutron star with a black hole that has a mass between the most massive neutron stars and the least massive black holes observed in the Galaxy. We provisionally estimate a merger rate density of $55^{+127}_{-47}~\text{Gpc}^{-3}\,\text{yr}^{-1}$ for compact binary coalescences with properties similar to the source of GW230529_181500; assuming that the source is a neutron star-black hole merger, GW230529_181500-like sources constitute about 60% of the total merger rate inferred for neutron star-black hole coalescences. The discovery of this system implies an increase in the expected rate of neutron star-black hole mergers with electromagnetic counterparts and provides further evidence for compact objects existing within the purported lower mass gap.

Submitted 26 July, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: 45 pages (10 pages author list, 13 pages main text, 1 page acknowledgements, 13 pages appendices, 8 pages bibliography), 17 figures, 16 tables. Update to match version published in The Astrophysical Journal Letters. Data products available from https://zenodo.org/records/10845779

Report number: LIGO-P2300352

Journal ref: ApJL 970, L34 (2024)

arXiv:2404.02405 [pdf, other]

TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression

Authors: Ho-Joong Kim, Jung-Ho Hong, Heejo Kong, Seong-Whan Lee

Abstract: In this paper, we investigate that the normalized coordinate expression is a key factor as reliance on hand-crafted components in query-based detectors for temporal action detection (TAD). Despite significant advancements towards an end-to-end framework in object detection, query-based detectors have been limited in achieving full end-to-end modeling in TAD. To address this issue, we propose TE-TAD, a full end-to-end temporal action detection transformer that integrates time-aligned coordinate expression. We reformulate coordinate expression utilizing actual timeline values, ensuring length-invariant representations from the extremely diverse video duration environment. Furthermore, our proposed adaptive query selection dynamically adjusts the number of queries based on video length, providing a suitable solution for varying video durations compared to a fixed query set. Our approach not only simplifies the TAD process by eliminating the need for hand-crafted components but also significantly improves the performance of query-based detectors. Our TE-TAD outperforms the previous query-based detectors and achieves competitive performance compared to state-of-the-art methods on popular benchmark datasets. Code is available at: https://github.com/Dotori-HJ/TE-TAD

Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.15026 [pdf, other]

VRSO: Visual-Centric Reconstruction for Static Object Annotation

Authors: Chenyao Yu, Yingfeng Cai, Jiaxin Zhang, Hui Kong, Wei Sui, Cong Yang

Abstract: As a part of the perception results of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for driving environment understanding. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples soars. The traditional, also reliable, way is manual labelling over the dense LiDAR point clouds and reference images. Though most public driving datasets adopt this strategy to provide SOD ground truth (GT), it is still expensive and time-consuming in practice. This paper introduces VRSO, a visual-centric approach for static object annotation. Experiments on the Waymo Open Dataset show that the mean reprojection error from VRSO annotation is only 2.6 pixels, around four times lower than the Waymo Open Dataset labels (10.6 pixels). VRSO is distinguished in low cost, high efficiency, and high quality: (1) It recovers static objects in 3D space with only camera images as input, and (2) manual annotation is barely involved since GT for SOD tasks is generated based on an automatic reconstruction and annotation pipeline.

Submitted 29 August, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at 2024 IEEE International Conference on Intelligent Robots and Systems (IROS)

arXiv:2403.05791 [pdf, other]

Asynchronous Microphone Array Calibration using Hybrid TDOA Information

Authors: Chengjie Zhang, Jiang Wang, He Kong

Abstract: Asynchronous microphone array calibration is a prerequisite for most audition robot applications. A popular solution to the above calibration problem is the batch form of Simultaneous Localisation and Mapping (SLAM), using the time difference of arrival measurements between two microphones (TDOA-M), and the robot (which serves as a moving sound source during calibration) odometry information. In this paper, we introduce a new form of measurement for microphone array calibration, i.e. the time difference of arrival between adjacent sound events (TDOA-S) with respect to the microphone channels. We propose to combine TDOA-S and TDOA-M, called hybrid TDOA, together with odometry measurements for batch SLAM-based calibration of asynchronous microphone arrays. Simulation and real-world experiment results consistently show that our method is more independent of microphone number, less sensitive to initial values (when using off-the-shelf algorithms such as Gauss-Newton iterations), and has better calibration accuracy and robustness under various TDOA noises. In addition, the simulation result demonstrates that our method has a lower Cramér-Rao lower bound (CRLB) for microphone parameters. To benefit the community, we open-source our code and data at https://github.com/zcj808/Hybrid-TDOA-Calib.

Submitted 19 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04350 [pdf, other]

Extract non-Gaussian Features in Gravitational Wave Observation Data Using Self-Supervised Learning

Authors: Yu-Chiung Lin, Albert K. H. Kong

Abstract: We propose a self-supervised learning model to denoise gravitational wave (GW) signals in the time series strain data without relying on waveform information. Denoising GW data is a crucial intermediate process for machine-learning-based data analysis techniques, as it can simplify the model for downstream tasks such as detections and parameter estimations. We use the blind-spot neural network and train it with whitened strain data with GW signals injected as both input data and target. Under the assumption of a Gaussian noise model, our model successfully denoises 38% of GW signals from binary black hole mergers in H1 data and 49% of signals in L1 data detected in the O1, O2, and O3 observation runs with an overlap greater than 0.5. We also test the model's potential to extract glitch features and loud inspiral compact binary coalescence signals a few seconds before the merger.

Submitted 7 March, 2024; originally announced March 2024.

Comments: 39 pages, 15 figures in the main article, and 43 figures in the appendix

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2402.07945 [pdf, other]

ScreenAgent: A Vision Language Model-driven Computer Control Agent

Authors: Runliang Niu, Jindong Li, Shiqi Wang, Yali Fu, Xiyu Hu, Xueyuan Leng, He Kong, Yi Chang, Qi Wang

Abstract: Existing Large Language Models (LLM) can invoke a variety of tools and APIs to complete complex tasks. The computer, as the most powerful and universal tool, could potentially be controlled directly by a trained LLM agent. Powered by the computer, we can hopefully build a more generalized agent to assist humans in various daily digital works. In this paper, we construct an environment for a Vision Language Model (VLM) agent to interact with a real computer screen. Within this environment, the agent can observe screenshots and manipulate the Graphics User Interface (GUI) by outputting mouse and keyboard actions. We also design an automated control pipeline that includes planning, acting, and reflecting phases, guiding the agent to continuously interact with the environment and complete multi-step tasks. Additionally, we construct the ScreenAgent Dataset, which collects screenshots and action sequences when completing a variety of daily computer tasks. Finally, we trained a model, ScreenAgent, which achieved computer control capabilities comparable to GPT-4V and demonstrated more precise UI positioning capabilities. Our attempts could inspire further research on building a generalist LLM agent. The code is available at https://github.com/niuzaisheng/ScreenAgent.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03649 [pdf, ps, other]

Group completions and the homotopical monadicity theorem

Authors: Hana Jia Kong, J. Peter May, Foling Zou

Abstract: We abstract and generalize homotopical monadicity statements, placing in a single conceptual framework a range of old and recent recognition and characterization principles in iterated loop space theory in classical, equivariant, and multiplicative frameworks. Some of the examples are new and some are old, but all are illuminated by the coherent framework, which we feel certain will encompass examples not yet thought of. The work is currently divided into three independently readable papers. This first paper is itself divided into three parts. In the first, we give the general abstract theory and treat the classical examples with structured spaces or $G$-spaces as input. In the second, we develop a general context of composite adjunctions that feeds into the first. It specializes, quite differently, to give infinite loop space machines that take either orbital presheaves or categories of operators as input. In the brief third part, we show how the multiplicative theory fits directly into the frameworks of the first and second parts. The second paper will focus on new constructions and applications when the starting category is that of orbital presheaves. The third will feed in new multiplicative constructions of the second author. Both papers fit into the general context established here, but the new constructions are of considerable independent interest.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 88 pages

MSC Class: 55P48; 55P91; 18M60; 18C15

arXiv:2402.00330 [pdf, other]

Night-Rider: Nocturnal Vision-aided Localization in Streetlight Maps Using Invariant Extended Kalman Filtering

Authors: Tianxiao Gao, Mingle Zhao, Chengzhong Xu, Hui Kong

Abstract: Vision-aided localization for low-cost mobile robots in diverse environments has attracted widespread attention recently. Although many current systems are applicable in daytime environments, nocturnal visual localization is still an open problem owing to the lack of stable visual information. An insight from most nocturnal scenes is that the static and bright streetlights are reliable visual information for localization. Hence we propose a nocturnal vision-aided localization system in streetlight maps with a novel data association and matching scheme using object detection methods. We leverage the Invariant Extended Kalman Filter (InEKF) to fuse IMU, odometer, and camera measurements for consistent state estimation at night. Furthermore, a tracking recovery module is also designed for tracking failures. Experimental results indicate that our proposed system achieves accurate and robust localization with less than $0.2\%$ relative error of trajectory length in four nocturnal environments.

Submitted 3 March, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.13877 [pdf]

doi 10.5194/nhess-24-3075-2024

AscDAMs: Advanced SLAM-based channel detection and mapping system

Authors: Tengfei Wang, Fucheng Lu, Jintao Qin, Taosheng Huang, Hui Kong, Ping Shen

Abstract: Obtaining high-resolution, accurate channel topography and deposit conditions is the prior challenge for the study of channelized debris flow. Currently, widely used mapping technologies including satellite imaging and drone photogrammetry struggle to precisely observe channel interior conditions of mountainous long-deep gullies, particularly those in the Wenchuan Earthquake region. SLAM is an emerging technology for 3D mapping; however, the extremely rugged environment in long-deep gullies poses two major challenges even for state-of-the-art SLAM: (1) Atypical features; (2) Violent swaying and oscillation of sensors. These issues result in large deviation and lots of noise for SLAM results. To improve SLAM mapping in such environments, we propose an advanced SLAM-based channel detection and mapping system, namely AscDAMs. It features three main enhancements to post-process SLAM results: (1) The digital orthophoto map aided deviation correction algorithm greatly eliminates the systematic error; (2) The point cloud smoothing algorithm substantially diminishes noises; (3) The cross section extraction algorithm enables the quantitative assessment of channel deposits and their changes. Two field experiments were conducted in Chutou Gully, Wenchuan County in China in February and November 2023, representing observations before and after the rainy season. We demonstrate the capability of AscDAMs to greatly improve SLAM results, promoting SLAM for mapping the specially challenging environment. The proposed method compensates for the insufficiencies of existing technologies in detecting debris flow channel interiors including detailed channel morphology, erosion patterns, deposit distinction, volume estimation and change detection. It serves to enhance the study of full-scale debris flow mechanisms, long-term post-seismic evolution, and hazard assessment.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.08772 [pdf, other]

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Authors: Huanjun Kong, Songyang Zhang, Jiaying Li, Min Xiao, Jun Xu, Kai Chen

Abstract: In this work, we present HuixiangDou, a technical assistant powered by Large Language Models (LLM). This system is designed to assist algorithm developers by providing insightful responses to questions related to open-source algorithm projects, such as computer vision and deep learning projects from OpenMMLab. We further explore the integration of this assistant into the group chats of instant messaging (IM) tools such as WeChat and Lark. Through several iterative improvements and trials, we have developed a sophisticated technical chat assistant capable of effectively answering users' technical questions without causing message flooding. This paper's contributions include: 1) Designing an algorithm pipeline specifically for group chat scenarios; 2) Verifying the reliable performance of text2vec in task rejection; 3) Identifying three critical requirements for LLMs in technical-assistant-like products, namely scoring ability, In-Context Learning (ICL), and Long Context. We have made the source code, android app and web service available at Github (https://github.com/internlm/huixiangdou), OpenXLab (https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web) and YouTube (https://youtu.be/ylXrT-Tei-Y) to aid in future research and application. HuixiangDou is applicable to any group chat within IM tools.

Submitted 12 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 13 pages, 4 figures

arXiv:2401.04872 [pdf, other]

Knowledge-aware Graph Transformer for Pedestrian Trajectory Prediction

Authors: Yu Liu, Yuexin Zhang, Kunming Li, Yongliang Qiao, Stewart Worrall, You-Fu Li, He Kong

Abstract: Predicting pedestrian motion trajectories is crucial for path planning and motion control of autonomous vehicles. Accurately forecasting crowd trajectories is challenging due to the uncertain nature of human motions in different environments. For training, recent deep learning-based prediction approaches mainly utilize information like trajectory history and interactions between pedestrians, among others. This can limit the prediction performance across various scenarios since the discrepancies between training datasets have not been properly incorporated. To overcome this limitation, this paper proposes a graph transformer structure to improve prediction performance, capturing the differences between the various sites and scenarios contained in the datasets. In particular, a self-attention mechanism and a domain adaption module have been designed to improve the generalization ability of the model. Moreover, an additional metric considering cross-dataset sequences is introduced for training and performance evaluation purposes. The proposed framework is validated and compared against existing methods using popular public datasets, i.e., ETH and UCY. Experimental results demonstrate the improved performance of our proposed scheme.

Submitted 9 January, 2024; originally announced January 2024.

Comments: This paper was accepted to and presented at the 26th IEEE International Conference on Intelligent Transportation Systems (ITSC), September 2023

arXiv:2401.00496 [pdf, other]

SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation. The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

arXiv:2312.08746 [pdf, other]

DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators

Authors: Hanyang Kong, Dongze Lian, Michael Bi Mi, Xinchao Wang

Abstract: We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating unbounded flythrough scenes from textual prompts. Different from other methods that focus on warping images frame by frame, we advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and generalization ability. To further enhance the fidelity of the generated images, we also propose a feature-correspondence-guidance diffusion process and a high-pass filtering strategy to promote geometric consistency and high-frequency detail consistency, respectively. Extensive experiments reveal that DreamDrone significantly surpasses existing methods, delivering highly authentic scene generation with exceptional visual quality, without training or fine-tuning on datasets or reconstructing 3D point clouds in advance.

Submitted 24 September, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 16 pages, 12 figures, project page: https://hyokong.github.io/dreamdrone-page/

arXiv:2310.17750 [pdf, other]

A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications

Authors: Ahmed Magooda, Alec Helyar, Kyle Jackson, David Sullivan, Chad Atalla, Emily Sheng, Dan Vann, Richard Edgar, Hamid Palangi, Roman Lutz, Hongliang Kong, Vincent Yun, Eslam Kamal, Federico Zarfati, Hanna Wallach, Sarah Bird, Mei Chen

Abstract: We present a framework for the automated measurement of responsible AI (RAI) metrics for large language models (LLMs) and associated products and services. Our framework for automatically measuring harms from LLMs builds on existing technical and sociotechnical expertise and leverages the capabilities of state-of-the-art LLMs, such as GPT-4. We use this framework to run through several case studies investigating how different LLMs may violate a range of RAI-related principles. The framework may be employed alongside domain-specific sociotechnical expertise to create measurements for new harm areas in the future. By implementing this framework, we aim to enable more advanced harm measurement efforts and further the responsible use of LLMs.

Submitted 26 October, 2023; originally announced October 2023.

Comments: This is a living document

arXiv:2310.17450 [pdf, other]

Rapid Generation of Kilonova Light Curves Using Conditional Variational Autoencoder

Authors: Surojit Saha, Michael J. Williams, Laurence Datrier, Fergus Hayes, Matt Nicholl, Albert K. H. Kong, Martin Hendry, IK Siong Heng, Gavin P. Lamb, En-Tzu Lin, Daniel Williams

Abstract: The discovery of the optical counterpart, along with the gravitational waves from GW170817, of the first binary neutron star merger, opened up a new era for multi-messenger astrophysics. Combining the GW data with the optical counterpart, also known as AT2017gfo, classified as a kilonova, has revealed the nature of compact binary merging systems by extracting enriched information about the total binary mass, the mass ratio, the system geometry, and the equation of state. Even though the detection of the kilonova brought about a revolution in the domain of multi-messenger astronomy, only one kilonova from a gravitational-wave-detected binary neutron star merger event has been observed so far, which limits the exact understanding of the origin and propagation of the kilonova. Here, we use a conditional variational autoencoder trained on light curve data from two kilonova models having different temporal lengths, and consequently, generate kilonova light curves rapidly based on physical parameters of our choice with good accuracy. Once trained, the time scale for light curve generation is of the order of a few milliseconds, thus speeding up generating light curves by $1000$ times compared to the simulation. The mean squared error between the generated and original light curves is typically $0.015$ with a maximum of $0.08$ for each set of considered physical parameters, while having a maximum of $\approx0.6$ error across the whole parameter space. Hence, implementing this technique provides fast and reliably accurate results.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 19 pages, 7 figures (3 additional figures in appendix), accepted to ApJ

arXiv:2310.10511 [pdf]

doi 10.1088/1741-4326/acff0a

A linear parameters study of ion cyclotron emission using drift ring beam distribution

Authors: Haozhe Kong, Huasheng Xie, Jizhong Sun

Abstract: Ion cyclotron emission (ICE) holds great potential as a diagnostic tool for fast ions in fusion devices. The theory of magnetoacoustic cyclotron instability (MCI), as an emission mechanism for ICE, states that MCI is driven by a velocity distribution of fast ions that approximates a drift ring beam. The influence of key parameters on the linear MCI is systematically investigated using the linear kinetic dispersion relation solver BO (Xie H. 2019 Comput. Phys. Comm. 244 343). The computational spectra region considered extends up to 40 times the ion cyclotron frequency. By examining the influence of these key parameters on MCI, several novel results have been obtained. In the case of MCI excited by super-Alfvénic fast ions, the parallel velocity spread significantly affects the bandwidth of harmonics and the continuous spectrum, while the perpendicular velocity spread has a decisive effect on the MCI growth rate. As the velocity spread increases, the linear relationship between the MCI growth rate and the square root of the number density ratio transitions to a linear relationship between the MCI growth rate and the number density ratio. This finding provides a linear perspective explanation for the observed linear relation between fast ion number density and ICE intensity in JET. Furthermore, high harmonics are more sensitive to changes in propagation angle than low harmonics because a decrease in the propagation angle alters the dispersion relation of the fast Alfvén wave. In the case of MCI excited by sub-Alfvénic fast ions, a significant growth rate increase occurs at high harmonics due to the transition of sub-Alfvénic fast ions to super-Alfvénic fast ions.

Submitted 16 October, 2023; originally announced October 2023.

Comments: 14 pages, 21 figures

Journal ref: Nucl. Fusion 63 (2023) 126034

arXiv:2308.14480 [pdf, other]

Priority-Centric Human Motion Generation in Discrete Latent Space

Authors: Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang

Abstract: Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is essential to recognize that not all motions hold the same relevance to a particular textual description. Some motions, being more salient and informative, should be given precedence during generation. In response, we introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM), which utilizes a Transformer-based VQ-VAE to derive a concise, discrete motion representation, incorporating a global self-attention mechanism and a regularization term to counteract code collapse. We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token within the entire motion sequence. This approach retains the most salient motions during the reverse diffusion process, leading to more semantically rich and varied motions. Additionally, we formulate two strategies to gauge the importance of motion tokens, drawing from both textual and visual indicators. Comprehensive experiments on the HumanML3D and KIT-ML datasets confirm that our model surpasses existing techniques in fidelity and diversity, particularly for intricate textual descriptions.

Submitted 30 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.13666 [pdf, other]

A Joint Fermi-GBM and Swift-BAT Analysis of Gravitational-Wave Candidates from the Third Gravitational-wave Observing Run

Authors: C. Fletcher, J. Wood, R. Hamburg, P. Veres, C. M. Hui, E. Bissaldi, M. S. Briggs, E. Burns, W. H. Cleveland, M. M. Giles, A. Goldstein, B. A. Hristov, D. Kocevski, S. Lesage, B. Mailyan, C. Malacaria, S. Poolakkil, A. von Kienlin, C. A. Wilson-Hodge, The Fermi Gamma-ray Burst Monitor Team, M. Crnogorčević, J. DeLaunay, A. Tohuvavohu, R. Caputo, S. B. Cenko, et al. (1674 additional authors not shown)

Abstract: We present Fermi Gamma-ray Burst Monitor (Fermi-GBM) and Swift Burst Alert Telescope (Swift-BAT) searches for gamma-ray/X-ray counterparts to gravitational wave (GW) candidate events identified during the third observing run of the Advanced LIGO and Advanced Virgo detectors. Using Fermi-GBM on-board triggers and sub-threshold gamma-ray burst (GRB) candidates found in the Fermi-GBM ground analyses, the Targeted Search and the Untargeted Search, we investigate whether there are any coincident GRBs associated with the GWs. We also search the Swift-BAT rate data around the GW times to determine whether a GRB counterpart is present. No counterparts are found. Using both the Fermi-GBM Targeted Search and the Swift-BAT search, we calculate flux upper limits and present joint upper limits on the gamma-ray luminosity of each GW. Given these limits, we constrain theoretical models for the emission of gamma-rays from binary black hole mergers.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.06287 [pdf, other]

Chandra Observation of NGC 1559: Eight Ultraluminous X-ray Sources Including a Compact Binary Candidate

Authors: Chen-Hsun Ma, Kwan-Lok Li, You-Hua Chu, Albert K. H. Kong

Abstract: Despite the 30-year history of ultra-luminous X-ray source (ULX) studies, issues like the physical nature of the majority of these sources (i.e., neutron stars, stellar-mass black holes, or intermediate black holes) as well as the accretion mechanisms are still under debate. Expanding the ULX sample size in the literature is clearly a way to help. To this end, we investigated the X-ray source population, ULXs in particular, in the barred spiral galaxy NGC 1559 using a Chandra observation made in 2016. In this 45-ks exposure, 33 X-ray point sources were detected within the 2.'7 isophotal radius of the galaxy. Among them, 8 ULXs were identified with the criterion of the X-ray luminosity $L_x>10^{39}$ erg s$^{-1}$ (0.3-7~keV). Both X-ray light curves and spectra of all the sources were examined. Except for some low-count spectra that only provide ambiguous spectral fitting results, all the X-ray sources were basically spectrally hard and therefore likely have non-thermal origins. While no strong X-ray variability was present in most of the sources owing to the relatively short exposure of the observation, we found an intriguing ULX, named X-24, exhibiting a periodicity of $\sim$7500s with a detection significance of 2.7$σ$. We speculate that it is the orbital period of the system. Roche-lobe overflow and the Roche limit are consistent with the speculation. Thus, we suggest that X-24 may be one of the rare compact binary ULXs, and hence, a good candidate as a stellar-mass black hole.

Submitted 10 August, 2023; originally announced August 2023.

Comments: Accepted for publication in ApJ

arXiv:2308.04920 [pdf, ps, other]

doi 10.1093/mnras/stad2383

Influences of dynamical disruptions on the evolution of pulsars in globular clusters

Authors: Kwangmin Oh, C. Y. Hui, Jongsuk Hong, J. Takata, A. K. H. Kong, Pak-Hin Thomas Tam, Kwan-Lok Li, K. S. Cheng

Abstract: By comparing the physical properties of pulsars hosted by core-collapsed (CCed) and non-core-collapsed (Non-CCed) globular clusters (GCs), we find that pulsars in CCed GCs rotate significantly slower than their counterparts in Non-CCed GCs. Additionally, radio luminosities at 1.4 GHz in CCed GCs are higher. These findings are consistent with the scenario that dynamical interactions in GCs can interrupt angular momentum transfer processes and surface magnetic field decay during the recycling phase. Our results suggest that such effects in CCed GCs are stronger due to more frequent disruptions of compact binaries. This is further supported by the observation that both estimated disruption rates and the fraction of isolated pulsars are predominantly higher in CCed GCs.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 9 pages, 8 figures, 3 tables, Accepted in MNRAS

arXiv:2308.04793 [pdf, other]

Cosmic ray calorimetry in star-forming galaxy populations and implications for their contribution to the extra-galactic $γ$-ray background

Authors: Ellis R. Owen, Albert K. H. Kong, Kuo-Chuan Pan

Abstract: Star-forming galaxies (SFGs) have been established as an important source population in the extra-galactic $γ$-ray background (EGB). Their intensive star-formation creates an abundance of environments able to accelerate particles, and these build up a rich sea of cosmic rays (CRs). Above GeV energies, CR protons can undergo hadronic interactions with their environment to produce $γ$-rays. SFGs can operate as CR proton "calorimeters", where a large fraction of the CR energy is converted to $γ$-rays. However, CRs also deposit energy and momentum to modify the thermal and hydrodynamic conditions of the gas in SFGs, and can become a powerful driver of outflows. Such outflows are ubiquitous among some types of SFGs, and have the potential to severely degrade their CR proton calorimetry. This diminishes their contribution to the EGB. In this work, we adopt a self-consistent treatment of particle transport in outflows from SFGs to assess their calorimetry. We use 1D numerical treatments of galactic outflows driven by CRs and thermal gas pressure, accounting for the dynamical effects and interactions of CRs. We show the impact CR-driven flows have on the relative contribution of SFG populations to the EGB, and investigate the properties of SFGs that contribute most strongly.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 8 pages, 4 figures, 1 table. Presented at the 38th International Cosmic Ray Conference (ICRC2023)

Journal ref: PoS (ICRC2023), 554

arXiv:2308.03822 [pdf, other]

Search for Eccentric Black Hole Coalescences during the Third Observing Run of LIGO and Virgo

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, et al. (1750 additional authors not shown)

Abstract: Despite the growing number of confident binary black hole coalescences observed through gravitational waves so far, the astrophysical origin of these binaries remains uncertain. Orbital eccentricity is one of the clearest tracers of binary formation channels. Identifying binary eccentricity, however, remains challenging due to the limited availability of gravitational waveforms that include effects of eccentricity. Here, we present observational results for a waveform-independent search sensitive to eccentric black hole coalescences, covering the third observing run (O3) of the LIGO and Virgo detectors. We identified no new high-significance candidates beyond those that were already identified with searches focusing on quasi-circular binaries. We determine the sensitivity of our search to high-mass (total mass $M>70$ $M_\odot$) binaries covering eccentricities up to 0.3 at 15 Hz orbital frequency, and use this to compare model predictions to search results. Assuming all detections are indeed quasi-circular, for our fiducial population model, we place an upper limit for the merger rate density of high-mass binaries with eccentricities $0 < e \leq 0.3$ at $0.33$ Gpc$^{-3}$ yr$^{-1}$ at 90\% confidence level.

Submitted 7 August, 2023; originally announced August 2023.

Comments: 24 pages, 5 figures

Report number: LIGO-P2300080

arXiv:2308.01873 [pdf, other]

A deformation of Borel equivariant homotopy

Authors: Gabriel Angelini-Knoll, Mark Behrens, Eva Belmont, Hana Jia Kong

Abstract: We describe a deformation of the $\infty$-category of Borel $G$-spectra for a finite group $G$. This provides a new presentation of the $a$-complete real Artin--Tate motivic stable homotopy category when $G=C_2$ and gives a new interpretation of the $a$-completed $C_2$-effective slice spectral sequence. As a new computational tool, we present a modified Adams--Novikov spectral sequence which computes the $RO(G)$-graded Mackey functor valued homotopy of Borel $G$-spectra.

Submitted 27 September, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: updated introduction and references; 49 pages, comments welcome!

MSC Class: 55P91; 14F42; 55P42; 55T15

arXiv:2308.00867 [pdf, ps, other]

Evidence of stellar oscillations in the post-common envelop binary candidate ASASSN-V J205543.90+240033.5

Authors: J. Takata, A. K. H. Kong, X. F. Wang, F. F. Song, J. Mao, X. Hou, C. -P. Hu, L. C. -C. Lin, K. L. Li, C. Y. Hui

Abstract: ASASSN-V J205543.90+240033.5 (ASJ2055) is a possible post-common envelope binary system. Its optical photometric data shows an orbital variation of about $0.52$~days and a fast period modulation of $P_0\sim 9.77$~minute, whose origin is unknown. In this Letter, we report evidence of the stellar oscillation of the companion star as the origin of the fast period modulation. We analyze the photometric data taken by TESS, the Liverpool Telescope, and the Lulin One-meter Telescope. It is found that the period of the 9.77-minute signal measured in 2022 August is significantly shorter than that in 2021 July/August, and the magnitude of the change is of the order of $|\triangle P_0|/P_0\sim 0.0008(4)$. Such a large variation will be incompatible with the scenario of the white dwarf spin as the origin of the 9.77-minute periodic modulation. We suggest that the fast periodic signal is related to the emission from the irradiated companion star rather than that of the white dwarf. Using existing photometric data covering a wide wavelength range, we estimate that the hot white dwarf in ASJ2055 has a temperature of $T_{eff}\sim 80000$~K and is heating the oscillating M-type main-sequence star with $T_{eff}\sim 3500$~K on its un-irradiated surface. The stellar oscillation of M-type main-sequence stars has been predicted in theoretical studies, but no observational confirmation has been made. ASJ2055, therefore, has the potential to be a unique laboratory to investigate the stellar oscillation of an M-type main-sequence star and the heating effect on the stellar oscillation.
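As a rough worked estimate using only the numbers quoted in the abstract above (taking $P_0 \approx 9.77$ minutes and the central value $|\triangle P_0|/P_0 \approx 0.0008$, and ignoring the quoted uncertainty), the implied absolute change in the period between the two epochs is about half a second:

$$|\triangle P_0| \approx 0.0008 \times 9.77~\mathrm{min} \times 60~\mathrm{s\,min^{-1}} \approx 0.47~\mathrm{s}.$$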

Submitted 3 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: 9 pages, 5 figures, 2 tables. Accepted for publication in ApJ Letter

arXiv:2307.01753 [pdf, other]

doi 10.1093/mnras/stae886

Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, et al. (24 additional authors not shown)

Abstract: We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $f_{\rm NL}$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $f_{\rm NL}$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $f_{\rm NL} = 34^{+24(+50)}_{-44(-73)}$ at 68\%(95\%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power-spectrum and degrades the $f_{\rm NL}$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $f_{\rm NL}$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $f_{\rm NL}$ uncertainty.

Submitted 25 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 21 pages, 17 figures, 7 tables (Appendix excluded). Published in MNRAS

arXiv:2306.14222 [pdf, other]

Unveiling the Potential of Sentiment: Can Large Language Models Predict Chinese Stock Price Movements?

Authors: Haohan Zhang, Fengrui Hua, Chengjin Xu, Hao Kong, Ruiting Zuo, Jian Guo

Abstract: The rapid advancement of Large Language Models (LLMs) has spurred discussions about their potential to enhance quantitative trading strategies. LLMs excel in analyzing sentiments about listed companies from financial news, providing critical insights for trading decisions. However, the performance of LLMs in this task varies substantially due to their inherent characteristics. This paper introduces a standardized experimental procedure for comprehensive evaluations. We detail the methodology using three distinct LLMs, each embodying a unique approach to performance enhancement, applied specifically to the task of sentiment factor extraction from large volumes of Chinese news summaries. Subsequently, we develop quantitative trading strategies using these sentiment factors and conduct back-tests in realistic scenarios. Our results will offer perspectives about the performances of Large Language Models applied to extracting sentiments from Chinese news texts.

Submitted 4 May, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

arXiv:2305.14038 [pdf, other]

Why semantics matters: A deep study on semantic particle-filtering localization in a LiDAR semantic pole-map

Authors: Yuming Huang, Yi Gu, Chengzhong Xu, Hui Kong

Abstract: In most urban and suburban areas, pole-like structures such as tree trunks or utility poles are ubiquitous. These structural landmarks are very useful for the localization of autonomous vehicles given their geometrical locations in maps and measurements from sensors. In this work, we aim at creating an accurate map for autonomous vehicles or robots with pole-like structures as the dominant localization landmarks, hence called pole-map. In contrast to the previous pole-based mapping or localization methods, we exploit the semantics of pole-like structures. Specifically, semantic segmentation is achieved by a new mask-range transformer network in a mask-classification paradigm. With the semantics extracted for the pole-like structures in each frame, a multi-layer semantic pole-map is created by aggregating the detected pole-like structures from all frames. Given the semantic pole-map, we propose a semantic particle-filtering localization scheme for vehicle localization. Theoretically, we have analyzed why the semantic information can benefit the particle-filter localization, and empirically it is validated on the public SemanticKITTI dataset that the particle-filtering localization with semantics achieves much better performance than the counterpart without semantics when each particle's odometry prediction and/or the online observation is subject to uncertainties at significant levels.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.07931 [pdf, other]

GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples

Authors: Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong

Abstract: Vision Transformer (ViT) has performed remarkably in various computer vision tasks. Nonetheless, affected by the massive amount of parameters, ViT usually suffers from serious overfitting problems with a relatively limited number of training samples. In addition, ViT generally demands heavy computing resources, which limit its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice to solve the above problems. Compared with the full-precision one, the model with the binarization method replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision model parameters and activations with only 1-bit ones, which potentially solves the problem of model size and computational complexity, respectively. In this paper, we investigate a binarized ViT model. Empirically, we observe that the existing binarization technology designed for Convolutional Neural Networks (CNN) cannot migrate well to a ViT's binarization task. We also find that the decline of the accuracy of the binary ViT model is mainly due to the information loss of the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, in order to further improve the performance of the binarization model, we have investigated the gradient calculation procedure in the binarization process and derived more proper gradient calculation equations for GSB to reduce the influence of gradient mismatch. Then, the knowledge distillation technique is introduced to alleviate the performance degradation caused by model binarization. Analytically, model binarization can limit the parameters search space during parameter updates while training a model....
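The abstract's remark that binarization replaces multiplications with bit-wise operations can be illustrated with a minimal sketch. This is the generic XOR-and-popcount identity for vectors of +1/-1 values, not the GSB method itself; the function names `pack_signs` and `binary_dot` are hypothetical and chosen only for this illustration.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into an int, one bit per value (bit set means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v > 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n +1/-1 vectors, computed from their packed bits.

    Each position where the bits differ contributes -1 and each matching
    position contributes +1, so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
packed = binary_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
print(packed, reference)
```

The packed result agrees with the ordinary multiply-and-sum dot product, while using only an XOR and a bit count per pair of packed words.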

Submitted 18 January, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

Comments: Accepted by Neural Networks

arXiv:2305.06086 [pdf]

doi 10.1088/1361-6587/ad1008

Enhancement of Fusion Reactivity under Non-Maxwellian Distributions: Effects of Drift-Ring-Beam, Slowing-Down, and Kappa Super-Thermal Distributions

Authors: Haozhe Kong, Huasheng Xie, Bing Liu, Muzhi Tan, Di Luo, Zhi Li, Jizhong Sun

Abstract: Non-Maxwellian distributions of particles are commonly observed in fusion studies, especially for magnetic confinement fusion plasmas. The particle distribution has a direct effect on fusion reactivity, which is the focus of this study. We investigate the effects of three types of non-Maxwellian distributions, namely drift-ring-beam, slowing-down, and kappa super-thermal distributions, on the fusion reactivities of D-T (Deuterium-Tritium) and p-B11 (proton-Boron) using a newly developed program, where the enhancement of fusion reactivity relative to the Maxwellian distribution is computed while keeping the total kinetic energy constant. The calculation results show that for the temperature ranges of interest to us, namely 5-50 keV for D-T and 100-500 keV for p-B11, these non-Maxwellian distributions can enhance the fusion reactivities. In the case of the drift-ring-beam distribution, the enhancement factors for both reactions are affected by the perpendicular ring beam velocity, leading to decreased enhancement in low temperature range and increased enhancement in high temperature range. However, this effect is favorable for p-B11 fusion reaction and unfavorable for D-T fusion reaction. In the slowing-down distribution, the birth speed plays a crucial role in both reactions, and increasing birth speed leads to a shift in the enhancement ranges towards lower temperatures, which is beneficial for both reactions. Finally, the kappa super-thermal distribution results in a relatively large enhancement in the low temperature range with a small high energy power-law index κ. Overall, this study provides insight into the effects of non-Maxwellian distributions on fusion reactivity and highlights potential opportunities for enhancing fusion efficiency.

Submitted 10 May, 2023; originally announced May 2023.

Comments: 12 pages, 18 figures

Journal ref: Plasma Phys. Control. Fusion 66 (2024) 015009

arXiv:2304.08393 [pdf, other]

Search for gravitational-lensing signatures in the full third observing run of the LIGO-Virgo network

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, R. Abbott, H. Abe, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, C. Alléné, A. Allocca, P. A. Altin, et al. (1670 additional authors not shown)

Abstract: Gravitational lensing by massive objects along the line of sight to the source causes distortions of gravitational-wave signals; such distortions may reveal information about fundamental physics, cosmology and astrophysics. In this work, we have extended the search for lensing signatures to all binary black hole events from the third observing run of the LIGO-Virgo network. We search for repeated signals from strong lensing by 1) performing targeted searches for subthreshold signals, 2) calculating the degree of overlap amongst the intrinsic parameters and sky location of pairs of signals, 3) comparing the similarities of the spectrograms amongst pairs of signals, and 4) performing dual-signal Bayesian analysis that takes into account selection effects and astrophysical knowledge. We also search for distortions to the gravitational waveform caused by 1) frequency-independent phase shifts in strongly lensed images, and 2) frequency-dependent modulation of the amplitude and phase due to point masses. None of these searches yields significant evidence for lensing. Finally, we use the non-detection of gravitational-wave lensing to constrain the lensing rate based on the latest merger-rate estimates and the fraction of dark matter composed of compact objects.

Submitted 17 April, 2023; originally announced April 2023.

Comments: 28 pages, 11 figures

Report number: LIGO-P2200031

arXiv:2304.06077 [pdf, other]

doi 10.1093/mnras/stad969

A journey from the hard to the soft state: How do QPOs evolve in the 2021 outburst of GX 339-4?

Authors: H. Stiele, A. K. H. Kong

Abstract: We investigated the snapshots of five NICER observations of the black hole transient GX 339-4 when the source transited from the hard state into the soft state during its outburst in 2021. In this paper, we focused our study on the evolution of quasi-periodic oscillations (QPOs) and noise components using power-density spectra. In addition, we derived hardness ratios comparing count rates above and below 2 keV. The evolution from the hard to the soft state was a somewhat erratic process showing several transitions between states that are dominated by top-flat noise and can show type-C QPOs, and those that are dominated by red noise and can show type-B QPOs. From the parameters that we studied, we only found a strong correlation between the hardness ratio and the type of QPO observed. This implies that the appearance of type-B QPOs is related to a change in the accretion geometry of the system that is also reflected in altered spectral properties. We also observed that the type-B QPO forms from or disintegrates into a broad peaked feature when the source comes out of or goes to the hard-intermediate state, respectively. This implies some strong decoherence in the process that creates this feature.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 6 pages, 5 figures, supplementary online material as appendices (13 pages), accepted for publication in MNRAS

arXiv:2303.15937 [pdf, other]

PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout

Authors: HsiaoYuan Hsu, Xiangteng He, Yuxin Peng, Hao Kong, Qing Zhang

Abstract: Content-aware visual-textual presentation layout aims at arranging space on the given canvas for pre-defined elements, including text, logo, and underlay, which is a key to automatic template-free creative graphic design. In practical applications, e.g., poster designs, the canvas is originally non-empty, and both inter-element relationships and inter-layer relationships should be considered when generating a proper layout. A few recent works deal with them simultaneously, but they still suffer from poor graphic performance, such as a lack of layout variety or spatial non-alignment. Since content-aware visual-textual presentation layout is a novel task, we first construct a new dataset named PosterLayout, which consists of 9,974 poster-layout pairs and 905 images, i.e., non-empty canvases. It is more challenging and useful for greater layout variety, domain diversity, and content diversity. Then, we propose design sequence formation (DSF) that reorganizes elements in layouts to imitate the design processes of human designers, and a novel CNN-LSTM-based conditional generative adversarial network (GAN) is presented to generate proper layouts. Specifically, the discriminator is design-sequence-aware and will supervise the "design" process of the generator. Experimental results verify the usefulness of the new benchmark and the effectiveness of the proposed approach, which achieves the best performance by generating suitable layouts for diverse canvases.

Submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023. Dataset and code are available at https://github.com/PKU-ICST-MIPL/PosterLayout-CVPR2023

arXiv:2302.09123 [pdf, other]

The $\mathbb C$-motivic Adams-Novikov spectral sequence for topological modular forms

Authors: Daniel C. Isaksen, Hana Jia Kong, Guchuan Li, Yangyang Ruan, Heyi Zhu

Abstract: We analyze the $\mathbb{C}$-motivic (and classical) Adams-Novikov spectral sequence for the $\mathbb{C}$-motivic modular forms spectrum $\mathit{mmf}$ (and for the classical topological modular forms spectrum $\mathit{tmf}$). We primarily use purely algebraic techniques, with a few exceptions. Along the way, we settle a previously unresolved detail about the multiplicative structure of the homotopy groups of $\mathit{tmf}$.

Submitted 17 February, 2023; originally announced February 2023.

Comments: 38 pages, 5 figures. Comments welcome!

Report number: HIM-Spectral-2022

MSC Class: 14F42; 55T15; 55Q10

Showing 1–50 of 384 results for author: Kong, H