-
VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction
Authors:
Haoyu Wu,
Ziqiao Peng,
Xukun Zhou,
Yunfei Cheng,
Jun He,
Hongyan Liu,
Zhaoxin Fan
Abstract:
3D face reconstruction from monocular images has promoted the development of various applications such as augmented reality. Though existing methods have made remarkable progress, most of them emphasize geometric reconstruction, while overlooking the importance of texture prediction. To address this issue, we propose VGG-Tex, a novel Vivid Geometry-Guided Facial Texture Estimation model designed for High Fidelity Monocular 3D Face Reconstruction. The core of this approach is leveraging 3D parametric priors to enhance the outcomes of 2D UV texture estimation. Specifically, VGG-Tex includes a Facial Attributes Encoding Module, a Geometry-Guided Texture Generator, and a Visibility-Enhanced Texture Completion Module. These components are responsible for extracting parametric priors, generating initial textures, and refining texture details, respectively. Based on the geometry-texture complementarity principle, VGG-Tex also introduces a Texture-guided Geometry Refinement Module to further balance the overall fidelity of the reconstructed 3D faces, along with corresponding losses. Comprehensive experiments demonstrate that our method significantly improves texture reconstruction performance compared to existing state-of-the-art methods.
Submitted 15 September, 2024;
originally announced September 2024.
-
A Robust Probability-based Joint Registration Method of Multiple Point Clouds Considering Local Consistency
Authors:
Lingjie Su,
Wei Xu,
Shuyang Zhao,
Yuqi Cheng,
Wenlong Li
Abstract:
In robotic inspection, joint registration of multiple point clouds is an essential technique for estimating the transformation relationships between measured parts, such as multiple blades in a propeller. However, the presence of noise and outliers in the data can significantly impair the registration performance by affecting the correctness of correspondences. To address this issue, we incorporate the local consistency property into the probability-based joint registration method. Specifically, each measured point set is treated as a sample from an unknown Gaussian Mixture Model (GMM), and the registration problem is framed as estimating the probability model. By incorporating local consistency into the optimization process, we enhance the robustness and accuracy of the posterior distributions, which represent the one-to-all correspondences that directly determine the registration results. Effective closed-form solutions for the transformation and probability parameters are derived with the Expectation-Maximization (EM) algorithm. Extensive experiments demonstrate that our method outperforms existing methods, achieving high accuracy and robustness in the presence of noise and outliers. The code will be available at https://github.com/sulingjie/JPRLC_registration.
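The posterior distributions mentioned above are computed in the E-step of a GMM-based registration. As a minimal sketch, assuming the widely used Coherent Point Drift formulation (isotropic Gaussians with a shared variance plus a uniform outlier component of weight `w`), the generic E-step looks like the code below. It omits the local-consistency term that is the paper's contribution, and the function name and parameters are hypothetical.

```python
import math


def gmm_posteriors(points, centroids, sigma2, w=0.1):
    """E-step of a CPD-style GMM registration (illustrative sketch).

    points:    observed points x_n (list of coordinate tuples)
    centroids: GMM centroids y_m, already moved by the current transform
    sigma2:    shared isotropic variance of all Gaussian components
    w:         assumed weight of a uniform outlier component (0 <= w < 1)

    Returns P with P[n][m] = posterior probability that x_n belongs to y_m.
    Each row sums to less than 1 when w > 0; the remainder is the
    posterior probability that x_n is an outlier.
    """
    d = len(points[0])
    m_count, n_count = len(centroids), len(points)
    # Constant term contributed by the uniform outlier component.
    c = (2 * math.pi * sigma2) ** (d / 2) * (w / (1 - w)) * (m_count / n_count)
    posteriors = []
    for x in points:
        # Unnormalized Gaussian affinity of x to every centroid.
        k = [math.exp(-math.dist(x, y) ** 2 / (2 * sigma2)) for y in centroids]
        denom = sum(k) + c
        posteriors.append([kj / denom for kj in k])
    return posteriors
```

In a full EM loop, the M-step would then re-estimate the transformation and `sigma2` from these soft one-to-all correspondences.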
Submitted 15 September, 2024;
originally announced September 2024.
-
Kraus is King: High-order Completely Positive and Trace Preserving (CPTP) Low Rank Method for the Lindblad Master Equation
Authors:
Daniel Appelo,
Yingda Cheng
Abstract:
We design high order accurate methods that exploit low rank structure in the density matrix while respecting the essential structure of the Lindblad equation. Our methods preserve complete positivity and are trace preserving.
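Kraus form is what makes this structure preservation automatic: any map $\rho \mapsto \sum_k K_k \rho K_k^\dagger$ is completely positive, and it is trace preserving exactly when $\sum_k K_k^\dagger K_k = I$. The sketch below checks both properties numerically for the textbook single-qubit amplitude-damping channel. It is a generic illustration of a CPTP map in Kraus form, not the authors' high-order low-rank scheme, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def apply_kraus(rho, kraus_ops):
    """Kraus map rho -> sum_k K_k rho K_k^dagger (completely positive by construction)."""
    n = len(rho)
    out = [[0j] * n for _ in range(n)]
    for k_op in kraus_ops:
        term = matmul(matmul(k_op, rho), dagger(k_op))
        for i in range(n):
            for j in range(n):
                out[i][j] += term[i][j]
    return out


def completeness(kraus_ops):
    """sum_k K_k^dagger K_k; equals the identity iff the map is trace preserving."""
    n = len(kraus_ops[0][0])
    total = [[0j] * n for _ in range(n)]
    for k_op in kraus_ops:
        prod = matmul(dagger(k_op), k_op)
        for i in range(n):
            for j in range(n):
                total[i][j] += prod[i][j]
    return total


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def amplitude_damping(gamma):
    """Single-qubit amplitude-damping channel with decay probability gamma."""
    return [
        [[1.0, 0.0], [0.0, math.sqrt(1 - gamma)]],
        [[0.0, math.sqrt(gamma)], [0.0, 0.0]],
    ]
```

Applying `amplitude_damping(0.3)` to the pure state $|+\rangle\langle+|$ moves population toward the ground state while the trace stays exactly 1 up to rounding.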
Submitted 13 September, 2024;
originally announced September 2024.
-
Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
Authors:
Yao-Fei Cheng,
Li-Wei Chen,
Hung-Shin Lee,
Hsin-Min Wang
Abstract:
This study investigates the efficacy of data augmentation techniques for low-resource automatic speech recognition (ASR), focusing on two endangered Austronesian languages, Amis and Seediq. Recognizing the potential of self-supervised learning (SSL) in low-resource settings, we explore the impact of data volume on the continued pre-training of SSL models. We propose a novel data-selection scheme leveraging a multilingual corpus to augment the limited target language data. This scheme utilizes a language classifier to extract utterance embeddings and employs one-class classifiers to identify utterances phonetically and phonologically proximate to the target languages. Utterances are ranked and selected based on their decision scores, ensuring the inclusion of highly relevant data in the SSL-ASR pipeline. Our experimental results demonstrate the effectiveness of this approach, yielding substantial improvements in ASR performance for both Amis and Seediq. These findings underscore the feasibility and promise of data augmentation through cross-lingual transfer learning for low-resource language ASR.
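The selection step above (score each multilingual utterance, rank by decision score, keep the most target-like ones) can be sketched as follows. This is a minimal sketch with a swapped-in scorer: the decision score here is simply the negative Euclidean distance to the centroid of the target-language embeddings, not the one-class classifiers used in the study, and `select_utterances` and the `(utt_id, embedding)` pool format are hypothetical.

```python
import math


def centroid(vectors):
    """Mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def select_utterances(target_embs, pool, k):
    """Return the ids of the k pool utterances with the highest decision score.

    target_embs: embeddings of the (small) target-language data
    pool:        list of (utt_id, embedding) from the multilingual corpus
    """
    c = centroid(target_embs)

    def score(emb):
        # Stand-in one-class decision score: closer to the target centroid is better.
        return -math.dist(emb, c)

    ranked = sorted(pool, key=lambda item: score(item[1]), reverse=True)
    return [utt_id for utt_id, _ in ranked[:k]]
```

Swapping `score` for a trained one-class model's decision function would keep the ranking and top-k logic unchanged.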
Submitted 13 September, 2024;
originally announced September 2024.
-
Group delay controlled by the decoherence of a single artificial atom
Authors:
Y. -T. Cheng,
K. -M. Hsieh,
B. -Y. Wu,
Z. Q. Niu,
F. Aziz,
Y. -H. Huang,
P. Y. Wen,
K. -T. Lin,
Y. -H. Lin,
J. C. Chen,
A. F. Kockum,
G. -D. Lin,
Z. -R. Lin,
Y. Lu,
I. -C. Hoi
Abstract:
The ability to slow down light at the single-photon level has applications in quantum information processing and other quantum technologies. We demonstrate two methods, both using just a single artificial atom, enabling dynamic control over microwave light velocities in waveguide quantum electrodynamics (waveguide QED). Our methods are based on two distinct mechanisms harnessing the balance between radiative and non-radiative decay rates of a superconducting artificial atom in front of a mirror. In the first method, we tune the radiative decay of the atom using interference effects due to the mirror; in the second method, we pump the atom to control its non-radiative decay through the Autler--Townes effect. When half the radiative decay rate exceeds the non-radiative decay rate, we observe positive group delay; conversely, dominance of the non-radiative decay rate results in negative group delay. Our results advance signal-processing capabilities in waveguide QED.
Submitted 11 September, 2024;
originally announced September 2024.
-
Broad-Line AGN at $3.5<z<6$: The Black Hole Mass Function and a Connection with Little Red Dots
Authors:
Anthony J. Taylor,
Steven L. Finkelstein,
Dale D. Kocevski,
Junehyoung Jeon,
Volker Bromm,
Ricardo O. Amorin,
Pablo Arrabal Haro,
Bren E. Backhaus,
Micaela B. Bagley,
Eduardo Bañados,
Rachana Bhatawdekar,
Madisyn Brooks,
Antonello Calabro,
Oscar A. Chavez Ortiz,
Yingjie Cheng,
Nikko J. Cleri,
Justin W. Cole,
Kelcey Davis,
Mark Dickinson,
Callum Donnan,
James S. Dunlop,
Richard S. Ellis,
Vital Fernandez,
Adriano Fontana,
Seiji Fujimoto
, et al. (26 additional authors not shown)
Abstract:
We present a sample of 50 H-alpha detected broad-line active galactic nuclei (BLAGN) at redshifts 3.5<z<6.8 using data from the CEERS and RUBIES surveys. We select these sources directly from JWST/NIRSpec G395M/F290LP spectra. We use a multi-step pre-selection and a Bayesian fitting procedure to ensure a high-quality sample of sources with broad Balmer lines and narrow forbidden lines. We compute rest-frame ultraviolet and optical spectral slopes for these objects, and determine that 10 BLAGN in our sample are also little red dots (LRDs). These LRD BLAGN, when examined in aggregate, show broader H-alpha line profiles and a higher fraction of broad-to-narrow component H-alpha emission than non-LRD BLAGN. Moreover, we find that ~66% of these objects are intrinsically reddened (beta (optical)>0), independent of the contributions of emission lines to the broadband photometry. We construct the black hole (BH) mass function at 3.5<z<6 after computing robust observational and line detection completeness corrections. This BH mass function shows broad agreement with both recent JWST/NIRSpec and JWST/NIRCam WFSS based BH mass functions, though we extend these earlier results to log(M(BH)/M(sun)) < 7. The derived BH mass function is consistent with a variety of theoretical models, indicating that the observed abundance of black holes in the early universe is not discrepant with physically-motivated predictions. The BH mass function shape resembles a largely featureless power-law, suggesting that any signature from black-hole seeding has been lost by redshift z~5-6. Finally, we compute the BLAGN UV luminosity function and find good agreement with JWST-detected BLAGN samples from recent works, finding that BLAGN hosts constitute <10% of the total observed UV luminosity at all but the brightest luminosities.
Submitted 10 September, 2024;
originally announced September 2024.
-
SubRegWeigh: Effective and Efficient Annotation Weighing with Subword Regularization
Authors:
Kohei Tsuji,
Tatsuya Hiraoka,
Yuchang Cheng,
Tomoya Iwakura
Abstract:
Many natural language processing (NLP) datasets include annotation errors. Researchers have attempted to develop methods that automatically reduce the adverse effect of such errors. However, an existing method is time-consuming because it requires many trained models to detect errors. We propose a novel method to reduce the time needed for error detection. Specifically, we use a tokenization technique called subword regularization to create pseudo-multiple models, which are used to detect errors. Our proposed method, SubRegWeigh, can perform annotation weighting four to five times faster than the existing method. Additionally, SubRegWeigh improved performance in both document classification and named entity recognition tasks. In experiments with pseudo-incorrect labels, SubRegWeigh adequately detected the pseudo-incorrect labels.
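The weighting idea can be sketched as follows: one trained model is run on K differently sampled subword segmentations of each input (the "pseudo-multiple models"), and an example whose annotated label rarely agrees with those predictions is treated as a likely annotation error. This is a minimal sketch assuming that the weight is the agreement fraction and that 0.5 is a reasonable flagging threshold; both are illustrative choices, the sampling step itself is not shown, and the function names are hypothetical.

```python
def annotation_weights(labels, predictions):
    """Weight each example by how often the K pseudo-model predictions agree with its label.

    labels[i]:      annotated label of example i
    predictions[i]: labels predicted for example i, one per sampled tokenization
                    (assumed to come from one model run on K subword-sampled
                    segmentations of the same input)
    """
    return [sum(p == y for p in preds) / len(preds)
            for y, preds in zip(labels, predictions)]


def flag_suspect(weights, threshold=0.5):
    """Indices whose annotation agrees with fewer than `threshold` of the predictions."""
    return [i for i, w in enumerate(weights) if w < threshold]
```

The resulting weights can scale each example's loss during retraining, so likely mislabeled examples contribute less.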
Submitted 10 September, 2024;
originally announced September 2024.
-
The Abundance and Properties of Barred Galaxies out to $z \sim$ 4 Using $\textit{JWST}$ CEERS Data
Authors:
Yuchen Guo,
Shardha Jogee,
Eden Wise,
Keith Pritchett Jr.,
Elizabeth J. McGrath,
Steven L. Finkelstein,
Kartheik G. Iyer,
Pablo Arrabal Haro,
Micaela B. Bagley,
Mark Dickinson,
Jeyhan S. Kartaltepe,
Anton M. Koekemoer,
Casey Papovich,
Nor Pirzkal,
L. Y. Aaron Yung,
Bren E. Backhaus,
Eric F. Bell,
Rachana Bhatawdekar,
Yingjie Cheng,
Luca Costantin,
Alexander de la Vega,
Mauro Giavalisco,
Nimish P. Hathi,
Benne W. Holwerda,
Peter Kurczynski
, et al. (4 additional authors not shown)
Abstract:
We analyze $\textit{JWST}$ CEERS NIRCam images to present the first estimate of the observed fraction and properties of bars out to $z \sim 4$. We analyze a sample of 1770 galaxies with stellar mass $M_\star > 10^{10} M_\odot$ at $0.5 \leq z \leq 4$ and identify barred galaxies via ellipse fits and visual classification of both F200W and F444W images. Our results apply mainly to bars with projected semi-major axis $a_{\rm bar}$ $> 1.5 $ kpc ($\sim$ 2 $\times$ PSF in F200W images) that can be robustly traced by ellipse fits. For such bars, the observed bar fraction at $z\sim$ 2-4 is low ($\lesssim 10\%$), and they appear to be emerging at least as early as $z\sim 4$ when the Universe was $\sim$ 13\% of its present age. At $z\sim$ 2-4, compared to our results, TNG50 simulations predict a significantly larger bar fraction due to a large population of small bars with $a_{\rm bar}$ $< 1.5$ kpc that we cannot robustly detect. If such a population exists, the true bar fraction may be significantly higher than our results. At $z \ge 1.5$, many barred galaxies show nearby neighbors, suggesting bars may be tidally triggered. From $z \sim 4$ to $z \sim 0.5$, the observed bar fraction, average projected bar length, and projected bar strength rise. Our results highlight the early emergence and evolution of barred galaxies and the rising importance of bar-driven secular evolution from $z \sim$4 to today.
Submitted 9 September, 2024;
originally announced September 2024.
-
JavaVFC: Java Vulnerability Fixing Commits from Open-source Software
Authors:
Tan Bui,
Yan Naing Tun,
Yiran Cheng,
Ivana Clairine Irsan,
Ting Zhang,
Hong Jin Kang
Abstract:
We present a comprehensive dataset of Java vulnerability-fixing commits (VFCs) to advance research in Java vulnerability analysis. Our dataset, derived from thousands of open-source Java projects on GitHub, comprises two variants: JavaVFC and JavaVFC-extended. The dataset was constructed through a rigorous process involving heuristic rules and multiple rounds of manual labeling. We initially used keywords to filter candidate VFCs based on commit messages, then refined this keyword set through iterative manual labeling. The final labeling round achieved a precision score of 0.7 among three annotators. We applied the refined keyword set to 34,321 open-source Java repositories with over 50 GitHub stars, resulting in JavaVFC with 784 manually verified VFCs and JavaVFC-extended with 16,837 automatically identified VFCs. Both variants are presented in a standardized JSONL format for easy access and analysis. This dataset supports various research endeavors, including VFC identification, fine-grained vulnerability detection, and automated vulnerability repair. The JavaVFC and JavaVFC-extended are publicly available at https://zenodo.org/records/13731781.
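The first construction step (filtering candidate VFCs by keywords in commit messages, then storing records as JSONL) can be sketched as below. This is a minimal sketch: the keyword list is purely hypothetical (the refined keyword set is not given in this abstract), and the `sha`/`message` record schema is an assumption, not the dataset's actual JSONL fields.

```python
import json
import re

# Hypothetical keyword set, for illustration only.
KEYWORDS = ["cve-", "vulnerab", "security fix", "xss", "sql injection", "overflow"]
PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def candidate_vfcs(commits):
    """Keep commits whose message matches any keyword.

    commits: iterable of dicts with 'sha' and 'message' keys (assumed schema).
    """
    return [c for c in commits if PATTERN.search(c["message"])]


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)
```

Keyword matching only yields candidates; in the described pipeline, manual labeling rounds then refine the keyword set and verify the commits.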
Submitted 9 September, 2024;
originally announced September 2024.
-
Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models
Authors:
Junfeng Tian,
Da Zheng,
Yang Cheng,
Rui Wang,
Colin Zhang,
Debing Zhang
Abstract:
Large language models (LLMs) have prioritized expanding the context window so that models can incorporate more information. However, training models to handle long contexts presents significant challenges. These include the scarcity of high-quality natural long-context data, the potential for performance degradation on short-context tasks, and the reduced training efficiency associated with attention mechanisms. In this paper, we introduce Untie the Knots (\textbf{UtK}), a novel data augmentation strategy employed during the continued pre-training phase, designed to efficiently enable LLMs to gain long-context capabilities without the need to modify the existing data mixture. In particular, we chunk the documents, shuffle the chunks, and create a complex and knotted structure of long texts; LLMs are then trained to untie these knots and identify relevant segments within seemingly chaotic token sequences. This approach greatly improves the model's performance by teaching it to accurately attend to relevant information in long contexts, and it also substantially increases training efficiency. We conduct extensive experiments on models with 7B and 72B parameters, trained on 20 billion tokens, demonstrating that UtK achieves 75\% and 84.5\% accuracy on RULER at 128K context length, significantly outperforming other long-context strategies. The trained models will be open-sourced for further research.
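The chunk-and-shuffle step can be sketched as follows. This is a minimal sketch under stated assumptions: chunks are fixed-size character spans rather than tokens, every chunk carries a `(doc_id, position)` tag so the original order is recoverable, and the names `knot` and `untie` are hypothetical. It only illustrates the data transformation, not the paper's actual tagging format or training objective.

```python
import random


def chunk(text, size):
    """Split text into consecutive spans of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def knot(docs, size, seed=0):
    """Chunk every document, then shuffle all chunks into one sequence.

    Each chunk keeps a (doc_id, position) tag (a hypothetical marker scheme).
    """
    pieces = [(d, i, c) for d, text in enumerate(docs)
              for i, c in enumerate(chunk(text, size))]
    random.Random(seed).shuffle(pieces)
    return pieces


def untie(pieces):
    """Reference reconstruction, i.e. what the model must learn to recover."""
    docs = {}
    for d, _i, c in sorted(pieces):  # sort by (doc_id, position)
        docs.setdefault(d, []).append(c)
    return ["".join(docs[d]) for d in sorted(docs)]
```

For example, `untie(knot(["alpha beta gamma", "delta epsilon"], 4))` recovers both input strings, while the knotted sequence interleaves their chunks.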
Submitted 7 September, 2024;
originally announced September 2024.
-
Transformer Based Tissue Classification in Robotic Needle Biopsy
Authors:
Fanxin Wang,
Yikun Cheng,
Sudipta S Mukherjee,
Rohit Bhargava,
Thenkurussi Kesavadas
Abstract:
Image-guided minimally invasive robotic surgery is commonly employed for tasks such as needle biopsies or localized therapies. However, the nonlinear deformation of various tissue types presents difficulties for surgeons in achieving precise needle tip placement, particularly when relying on low-fidelity biopsy imaging systems. In this paper, we introduce a method to classify needle biopsy interventions and identify tissue types based on a comprehensive needle-tissue contact model that incorporates both position and force parameters. We trained a transformer model using a comprehensive dataset collected from a previously developed robotic platform, which consists of synthetic and porcine tissue from various locations (liver, kidney, heart, belly, hock) labeled with interaction phases (pre-puncture, puncture, post-puncture, neutral). This model achieves a classification accuracy of 0.93. Our method can assist surgeons in identifying transitions between different tissues, giving them tissue awareness during the procedure.
Submitted 7 September, 2024;
originally announced September 2024.
-
CogniDual Framework: Self-Training Large Language Models within a Dual-System Theoretical Framework for Improving Cognitive Tasks
Authors:
Yongxin Deng,
Xihe Qiu,
Xiaoyu Tan,
Chao Qu,
Jing Pan,
Yuan Cheng,
Yinghui Xu,
Wei Chu
Abstract:
Cognitive psychology investigates perception, attention, memory, language, problem-solving, decision-making, and reasoning. Kahneman's dual-system theory elucidates the human decision-making process, distinguishing between the rapid, intuitive System 1 and the deliberative, rational System 2. Recent advancements have positioned large language models (LLMs) as formidable tools nearing human-level proficiency in various cognitive tasks. Nonetheless, the presence of a dual-system framework analogous to human cognition in LLMs remains unexplored. This study introduces the \textbf{CogniDual Framework for LLMs} (CFLLMs), designed to assess whether LLMs can, through self-training, evolve from deliberate deduction to intuitive responses, thereby emulating the human process of acquiring and mastering new information. Our findings reveal the cognitive mechanisms behind LLMs' response generation, enhancing our understanding of their capabilities in cognitive psychology. Practically, self-trained models can provide faster responses to certain queries, reducing computational demands during inference.
Submitted 6 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
-
Directional sources realised by toroidal dipoles
Authors:
Junho Jung,
Yuqiong Cheng,
Wanyue Xiao,
Shubo Wang
Abstract:
Directional optical sources can give rise to the directional excitation and propagation of light. The directionality of conventional directional dipole (CDD) sources is attributed to the interference of electric and/or magnetic dipoles, while the effect of the toroidal dipole on optical directionality remains unexplored. Here, we numerically and analytically investigate the directional properties of the toroidal dipole. We show that the toroidal dipole can replace the electric dipole in the CDD sources to form pseudo directional dipoles (PDDs), which can be applied to achieve analogous near-field directional coupling with a silicon waveguide. Moreover, the directionality of the PDDs can be flexibly controlled by changing the geometric parameters of the toroidal dipole, leading to tunable asymmetric coupling between the sources and the waveguide. These new types of directional sources provide more degrees of freedom for tailoring the optical directionality compared to the conventional sources. The results open new possibilities for directional light manipulation and can find applications in on-chip optical routing, waveguiding, and nanophotonic communications.
Submitted 2 September, 2024;
originally announced September 2024.
-
ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers
Authors:
Luoyu Mei,
Shuai Wang,
Yun Cheng,
Ruofeng Liu,
Zhimeng Yin,
Wenchao Jiang,
Shuai Wang,
Wei Gong
Abstract:
Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. Notably, ESP-PCT achieves an accuracy of 93.2% while simultaneously reducing computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model. These results underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at \url{https://github.com/lymei-SEU/ESP-PCT}.
Submitted 2 September, 2024;
originally announced September 2024.
-
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
Authors:
Xiaoye Qu,
Jiashuo Sun,
Wei Wei,
Yu Cheng
Abstract:
Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination, i.e., generating outputs that are inconsistent with the image content. To mitigate hallucinations, previous studies mainly focus on retraining LVLMs with custom datasets. Although effective, such approaches inherently incur additional computational costs. In this paper, we propose a training-free framework, \textbf{MVP}, that aims to reduce hallucinations by making the most of the innate capabilities of the LVLMs via \textbf{M}ulti-\textbf{V}iew Multi-\textbf{P}ath Reasoning. Specifically, we first devise a multi-view information-seeking strategy to thoroughly perceive the comprehensive information in the image, which enriches the general global information captured by the original vision encoder in LVLMs. Furthermore, during answer decoding, we observe that the occurrence of hallucinations is strongly correlated with the certainty of the answer tokens. Thus, we propose multi-path reasoning for each information view to quantify and aggregate the certainty scores for each potential answer among multiple decoding paths and finally decide the output answer. By fully grasping the information in the image and carefully considering the certainty of the potential answers when decoding, our MVP can effectively reduce hallucinations in LVLMs. Extensive experiments verify that our proposed MVP significantly mitigates the hallucination problem across four well-known LVLMs. The source code is available at: \url{https://github.com/GasolSun36/MVP}.
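The final decision step (aggregate certainty scores per candidate answer across views and decoding paths, then pick the best) can be sketched as follows. This is a minimal sketch assuming that certainty is a single number per path (for instance, the probability of the answer tokens) and that aggregation is a plain sum; the paper's exact certainty measure and aggregation rule may differ, and `decide` is a hypothetical name.

```python
from collections import defaultdict


def decide(paths):
    """Pick the answer with the highest total certainty.

    paths: list of (view, answer, certainty) triples, one per decoding path,
           collected over several information views of the same image.
    Returns (best_answer, per-answer aggregated scores).
    """
    scores = defaultdict(float)
    for _view, answer, certainty in paths:
        scores[answer] += certainty
    best = max(scores, key=scores.get)
    return best, dict(scores)
```

Summing rather than voting lets a few high-certainty paths outweigh several low-certainty ones, which matches the intuition that hallucinated answers tend to be decoded with low certainty.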
Submitted 30 August, 2024;
originally announced August 2024.
-
CogVLM2: Visual Language Models for Image and Video Understanding
Authors:
Wenyi Hong,
Weihan Wang,
Ming Ding,
Wenmeng Yu,
Qingsong Lv,
Yan Wang,
Yean Cheng,
Shiyu Huang,
Junhui Ji,
Zhao Xue,
Lei Zhao,
Zhuoyi Yang,
Xiaotao Gu,
Xiaohan Zhang,
Guanyu Feng,
Da Yin,
Zihan Wang,
Ji Qi,
Xixuan Song,
Peng Zhang,
Debing Liu,
Bin Xu,
Juanzi Li,
Yuxiao Dong,
Jie Tang
Abstract:
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2 inherits the visual expert architecture with improved training recipes in both pre-training and post-training stages, supporting input resolution up to $1344 \times 1344$ pixels. As a video understanding model, CogVLM2-Video integrates multi-frame input with timestamps and proposes automated temporal grounding data construction. Notably, the CogVLM2 family has achieved state-of-the-art results on benchmarks like MMBench, MM-Vet, TextVQA, MVBench and VCGBench. All models are open-sourced at https://github.com/THUDM/CogVLM2 and https://github.com/THUDM/GLM-4, contributing to the advancement of the field.
Submitted 29 August, 2024;
originally announced August 2024.
-
A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds
Authors:
Ying-Chieh Hsu,
Stanley Yung-Chuan Liu,
Chao-Jung Huang,
Chi-Wei Wu,
Ren-Kai Cheng,
Jane Yung-Jen Hsu,
Shang-Ran Huang,
Yuan-Ren Cheng,
Fu-Shun Hsu
Abstract:
This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The dataset, comprising 5,173 one-second segments, was used to train and test models, including Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), and ResNet-50. The ResNet-50, a convolutional neural network (CNN), showed the best overall performance in classifying snoring acoustics, particularly in identifying multi-level obstructions. The study emphasizes the potential of integrating snoring acoustics with deep learning to improve the diagnosis and treatment of OSA. However, challenges such as the limited sample size, data imbalance, and differences between pharmacologically induced and natural snoring sounds were noted, suggesting the need for further research to enhance model accuracy and generalizability.
Submitted 28 August, 2024;
originally announced August 2024.
-
Structural Optimization of Lightweight Bipedal Robot via SERL
Authors:
Yi Cheng,
Chenxi Han,
Yuheng Min,
Linqi Ye,
Houde Liu,
Hang Liu
Abstract:
Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming and labor-intensive, lack theoretical guidance, and struggle to obtain optimal designs within vast design spaces, thus failing to fully exploit the inherent performance potential of robots. In this context, this paper introduces the SERL (Structure Evolution Reinforcement Learning) algorithm, which combines reinforcement learning for locomotion tasks with evolutionary algorithms. The aim is to identify the optimal parameter combinations within a given multidimensional design space. Using the SERL algorithm, we successfully designed a bipedal robot named Wow Orin, whose optimal leg lengths were obtained through optimization based on body structure and motor torque. We have experimentally validated the effectiveness of the SERL algorithm, which is capable of finding the optimal structure within a specified design space and task conditions. Additionally, to assess the performance gap between our designed robot and current state-of-the-art robots, we compared Wow Orin with the mainstream bipedal robots Cassie and Unitree H1. A series of experimental results demonstrates the outstanding energy efficiency and performance of Wow Orin, further validating the feasibility of applying the SERL algorithm to practical design.
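The abstract describes SERL only at a high level: an evolutionary search over structural parameters in which each candidate is scored by reinforcement-learning locomotion. The following is a minimal sketch of that outer loop under loud assumptions: it uses a simple truncation-selection evolution strategy over a single parameter (leg length), and a hypothetical analytic `toy_fitness` stands in for training and evaluating an RL policy. None of these names, bounds, or settings come from the paper.

```python
import random
from typing import Callable, List, Tuple


def evolve_structure(
    evaluate: Callable[[float], float],
    bounds: Tuple[float, float],
    population_size: int = 16,
    generations: int = 40,
    elite_fraction: float = 0.25,
    mutation_std: float = 0.02,
    seed: int = 0,
) -> float:
    """Truncation-selection evolution over one structural parameter (hypothetical)."""
    rng = random.Random(seed)
    low, high = bounds
    population: List[float] = [rng.uniform(low, high) for _ in range(population_size)]
    n_elite = max(1, int(population_size * elite_fraction))
    for _ in range(generations):
        # Inner loop: in SERL this score would come from RL locomotion training
        ranked = sorted(population, key=evaluate, reverse=True)
        elites = ranked[:n_elite]
        # Outer loop: refill the population by mutating elites, clipped to bounds
        children: List[float] = []
        while len(children) < population_size - n_elite:
            parent = rng.choice(elites)
            child = min(high, max(low, parent + rng.gauss(0.0, mutation_std)))
            children.append(child)
        population = elites + children
    return max(population, key=evaluate)


def toy_fitness(leg_length: float) -> float:
    """Hypothetical stand-in for the return of an RL-trained walking policy."""
    return -(leg_length - 0.45) ** 2
```

Calling `evolve_structure(toy_fitness, (0.2, 0.8))` converges toward the toy optimum at 0.45. Because elites are carried over unchanged, the best score never decreases between generations, which matters when each evaluation (an RL run) is expensive.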
Submitted 28 August, 2024;
originally announced August 2024.
-
Learning effective pruning at initialization from iterative pruning
Authors:
Shengkai Liu,
Yaofeng Cheng,
Fusheng Zha,
Wei Guo,
Lining Sun,
Zhenshan Bing,
Chenguang Yang
Abstract:
Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly important as network sizes grow. However, current PaI methods still show a large accuracy gap relative to iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we draw inspiration from iterative pruning to improve PaI performance? In the lottery ticket hypothesis, iterative rewind pruning (IRP) finds subnetworks retroactively by rewinding the parameters to their original initialization in every pruning iteration, which means all the subnetworks are based on the initial state. Here, we hypothesize that the surviving subnetworks are more important, and we use the relationship between the parameters' initial features and their surviving scores as the PaI criterion. We employ an end-to-end neural network, AutoSparse (AutoS), to learn this correlation: it takes the model's initial features as input and outputs their scores, and the lowest-scoring parameters are then pruned before training. To validate the accuracy and generalization of our method, we performed PaI across various models. Results show that our approach outperforms existing methods in high-sparsity settings. Notably, as the underlying logic of model pruning is consistent across different models, only a single IRP run on one model is needed (e.g., after one IRP run on ResNet-18/CIFAR-10, AutoS generalizes to VGG-16/CIFAR-10, ResNet-18/TinyImageNet, etc.). As this is the first neural network-based PaI method, we conduct extensive experiments to examine the factors influencing this approach. These results reveal the learning tendencies of neural networks and provide new practical insights into our understanding and research of PaI. Our code is available at: https://github.com/ChengYaofeng/AutoSparse.git.
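The final step described above, pruning the lowest-scoring parameters before training, can be sketched independently of how the scores are produced. In this sketch the per-weight scores are assumed to be given (in AutoS they would come from the learned scoring network); `prune_mask` and `apply_mask` are hypothetical helper names, and pruning is done globally over a flat list of weights.

```python
from typing import List


def prune_mask(scores: List[float], sparsity: float) -> List[int]:
    """Return a 0/1 mask that removes the `sparsity` fraction of lowest-scoring weights."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(round(sparsity * len(scores)))
    # Indices ordered from lowest to highest score; ties keep their original order
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def apply_mask(weights: List[float], mask: List[int]) -> List[float]:
    """Zero out pruned weights; in PaI this happens once, before training starts."""
    return [w * m for w, m in zip(weights, mask)]
```

For example, `prune_mask([0.9, 0.1, 0.5, 0.3, 0.7], sparsity=0.4)` removes the two lowest-scoring entries and returns `[1, 0, 1, 0, 1]`. A layer-wise variant would call the same function once per layer instead of over all weights at once.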
Submitted 26 August, 2024;
originally announced August 2024.
-
GarmentAligner: Text-to-Garment Generation via Retrieval-augmented Multi-level Corrections
Authors:
Shiyue Zhang,
Zheng Chong,
Xujie Zhang,
Hanhui Li,
Yuhao Cheng,
Yiqiang Yan,
Xiaodan Liang
Abstract:
General text-to-image models have brought revolutionary innovation to the fields of art, design, and media. However, when applied to garment generation, even state-of-the-art text-to-image models suffer from fine-grained semantic misalignment, particularly concerning the quantity, position, and interrelations of garment components. To address this, we propose GarmentAligner, a text-to-garment diffusion model trained with retrieval-augmented multi-level corrections. To achieve semantic alignment at the component level, we introduce an automatic component extraction pipeline that obtains spatial and quantitative information about garment components from the corresponding images and captions. Subsequently, to exploit component relationships within the garment images, we construct a retrieval subset for each garment through retrieval augmentation based on component-level similarity ranking, and we apply contrastive learning to enhance the model's perception of components from positive and negative samples. To further enhance the alignment of components across semantic, spatial, and quantitative granularities, we propose multi-level correction losses that leverage detailed component information. Experimental results demonstrate that GarmentAligner achieves superior fidelity and fine-grained semantic alignment compared with existing competitors.
Submitted 23 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
Authors:
Zhaochen Su,
Jun Zhang,
Xiaoye Qu,
Tong Zhu,
Yanshu Li,
Jiashuo Sun,
Juntao Li,
Min Zhang,
Yu Cheng
Abstract:
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and retrieved contextual knowledge, and a thorough assessment of knowledge conflicts in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation covers four model families and twelve LLM instances, analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope ConflictBank will help the community better understand model behavior under conflicts and develop more reliable LLMs.
Submitted 21 August, 2024;
originally announced August 2024.
-
Generative Organizational Behavior Simulation using Large Language Model based Autonomous Agents: A Holacracy Perspective
Authors:
Chen Zhu,
Yihang Cheng,
Jingshuai Zhang,
Yusheng Qiu,
Sitao Xia,
Hengshu Zhu
Abstract:
In this paper, we present the technical details and interim findings of our project, CareerAgent, which aims to build a generative simulation framework for a Holacracy organization using Large Language Model-based Autonomous Agents. Specifically, the simulation framework includes three phases (construction, execution, and evaluation) and incorporates basic characteristics of individuals, organizations, tasks, and meetings. Through our simulation, we obtained several interesting findings. At the organizational level, increasing the average management competence and functional competence can reduce members' overall stress levels, but it negatively impacts deeper organizational performance measures such as average task completion. At the individual level, both competences can improve members' work performance. From the analysis of social networks, we found that highly competent members selectively participate in certain tasks and take on more responsibilities. Over time, small sub-communities form around these highly competent members within the holacracy. These findings contribute theoretically to the study of organizational science and provide practical insights for managers seeking to understand organizational dynamics.
Submitted 5 August, 2024;
originally announced August 2024.
-
DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling
Authors:
Yen-Tung Yeh,
Yu-Hua Chen,
Yuan-Chiao Cheng,
Jui-Te Wu,
Jun-Jie Fu,
Yi-Fan Yeh,
Yi-Hsuan Yang
Abstract:
Neural network models for guitar amplifier emulation, while effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called "DDSP guitar amp," that models the four components of a guitar amp (i.e., preamp, tone stack, power amp, and output transformer) using specific DSP-inspired designs. With a set of time- and frequency-domain metrics, we demonstrate that DDSP guitar amp achieves performance comparable with that of black-box baselines while requiring less than 10% of the computational operations per audio sample, making it more promising for real-time applications.
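To make the four-stage decomposition concrete, here is a minimal, non-differentiable sketch of a preamp, tone stack, power amp, and output transformer chained in that order. The stage designs (tanh waveshapers, a one-pole band split, a one-pole high-pass) and every parameter value are assumptions chosen for illustration, not the paper's DSP-inspired modules; in a DDSP model the per-stage parameters would instead be learned by gradient descent.

```python
import math
from typing import List


def one_pole_lowpass(x: List[float], cutoff_hz: float, sr: int) -> List[float]:
    """y[n] = y[n-1] + a * (x[n] - y[n-1]) with a = 1 - exp(-2*pi*fc/sr)."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sr)
    y, out = 0.0, []
    for s in x:
        y += a * (s - y)
        out.append(y)
    return out


def preamp(x: List[float], gain: float) -> List[float]:
    # Symmetric soft clipping (tanh waveshaper)
    return [math.tanh(gain * s) for s in x]


def tone_stack(x: List[float], bass: float, treble: float, sr: int,
               crossover_hz: float = 800.0) -> List[float]:
    # Split into complementary low and high bands, then re-weight them
    low = one_pole_lowpass(x, crossover_hz, sr)
    return [bass * lo + treble * (s - lo) for s, lo in zip(x, low)]


def power_amp(x: List[float], drive: float, bias: float = 0.1) -> List[float]:
    # Asymmetric soft clipping; subtracting the biased rest point keeps silence silent
    offset = math.tanh(drive * bias)
    return [math.tanh(drive * (s + bias)) - offset for s in x]


def output_transformer(x: List[float], sr: int, cutoff_hz: float = 20.0) -> List[float]:
    # Modeled here only as a one-pole high-pass that removes DC and sub-bass
    low = one_pole_lowpass(x, cutoff_hz, sr)
    return [s - lo for s, lo in zip(x, low)]


def amp(x: List[float], sr: int = 48000, gain: float = 5.0, bass: float = 1.0,
        treble: float = 0.8, drive: float = 2.0) -> List[float]:
    """Hypothetical fixed-parameter chain: preamp -> tone stack -> power amp -> transformer."""
    y = preamp(x, gain)
    y = tone_stack(y, bass, treble, sr)
    y = power_amp(y, drive)
    return output_transformer(y, sr)
```

Each stage costs only a handful of operations per sample, which illustrates why a structured, DSP-inspired chain can be far cheaper than a black-box network, although the paper's exact operation counts are not reproduced here.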
Submitted 21 August, 2024;
originally announced August 2024.
-
Inference Plans for Hybrid Particle Filtering
Authors:
Ellie Y. Cheng,
Eric Atkinson,
Guillaume Baudart,
Louis Mandel,
Michael Carbin
Abstract:
Advanced probabilistic programming languages (PPLs) use hybrid inference systems that combine symbolic exact inference and Monte Carlo methods to improve inference performance. These systems use heuristics to partition the random variables within a program into variables that are encoded symbolically and variables that are encoded with sampled values, and these heuristics are not necessarily aligned with the performance evaluation metrics used by the developer. In this work, we present inference plans, a programming interface that enables developers to control the partitioning of random variables during hybrid particle filtering. We further present Siren, a new PPL that enables developers to use annotations to specify the inference plans that the inference system must implement. To assist developers in statically reasoning about whether an inference plan can be implemented, we present an abstract-interpretation-based static analysis for Siren that determines inference plan satisfiability. We prove that the analysis is sound with respect to Siren's semantics. Our evaluation applies inference plans to three different hybrid particle filtering algorithms on a suite of benchmarks and shows that the control provided by inference plans enables speedups of 1.76x on average and up to 206x to reach target accuracy, compared to the inference plans implemented by default heuristics; the results also show that inference plans improve accuracy by 1.83x on average and up to 595x with equal or lower runtime, compared to the default inference plans. We further show that the static analysis is precise in practice, identifying all satisfiable inference plans in 27 out of the 33 benchmark-algorithm combinations.
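The difference between encoding a variable symbolically and encoding it with samples can be illustrated on a one-variable Gaussian random walk. This Python sketch is not Siren and does not use its annotation syntax; the model (prior x₀ ~ N(0, 1), xₜ ~ N(xₜ₋₁, q), yₜ ~ N(xₜ, r)) and the function names are assumptions. The "symbolic" plan keeps x as a Gaussian and conditions on each observation exactly, which for this fully conjugate model reduces to a Kalman filter; the "sampled" plan keeps concrete values of x and reweights them, which is a bootstrap particle filter. In a real hybrid program only some variables would be symbolic and the rest sampled.

```python
import math
import random
from typing import List


def gaussian_logpdf(y: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def symbolic_plan(ys: List[float], q: float, r: float) -> float:
    """Keep x as a Gaussian (mean, var) and update it exactly; returns E[x_T | y]."""
    mean, var = 0.0, 1.0                 # prior x_0 ~ N(0, 1)
    for y in ys:
        var += q                         # predict: x_t = x_{t-1} + N(0, q)
        gain = var / (var + r)           # condition on y_t ~ N(x_t, r)
        mean += gain * (y - mean)
        var *= 1.0 - gain
    return mean


def sampled_plan(ys: List[float], q: float, r: float,
                 n_particles: int = 5000, seed: int = 0) -> float:
    """Keep x as concrete samples and reweight them; returns a Monte Carlo E[x_T | y]."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for y in ys:
        xs = [x + rng.gauss(0.0, math.sqrt(q)) for x in xs]
        logw = [gaussian_logpdf(y, x, r) for x in xs]
        top = max(logw)                  # subtract the max for numerical stability
        weights = [math.exp(lw - top) for lw in logw]
        xs = rng.choices(xs, weights=weights, k=n_particles)  # multinomial resampling
    return sum(xs) / n_particles
```

With `ys = [1.0, 1.2, 0.8]`, `q = 0.1`, and `r = 0.5`, the symbolic plan returns the exact posterior mean (about 0.877), while the sampled plan only approximates it and its error shrinks as `n_particles` grows. This accuracy-versus-cost trade-off is the kind of choice an inference plan is meant to expose to the developer.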
Submitted 20 August, 2024;
originally announced August 2024.
-
Relativistic and Electron Correlation Effects in Static Dipole Polarizabilities for Main-Group Elements
Authors:
YingXing Cheng
Abstract:
In this study, I compute the static dipole polarizability of main-group elements using the finite-field method combined with relativistic coupled-cluster and configuration interaction simulations. The computational results closely align with the values recommended in the 2018 table of static dipole polarizabilities of neutral elements [Mol. Phys. 117, 1200 (2019)]. Additionally, I investigate the influence of relativistic effects and electron correlation on atomic dipole polarizabilities. Specifically, three types of relativistic effects impacting dipole polarizabilities are studied: scalar-relativistic, spin-orbit coupling, and fully relativistic Dirac-Coulomb effects. The results indicate that scalar-relativistic effects are predominant for atoms in Groups 1–2, with minimal influence from spin-orbit coupling effects. Conversely, for elements in Groups 13–18, scalar-relativistic effects are less significant, while spin-orbit coupling significantly affects elements starting from the fourth row in Groups 13–14 and from the fifth row in Groups 15–18. In each category of relativistic effects, the impact of electron correlation is evaluated. The results show that electron correlation significantly influences dipole polarizability calculations, particularly for Groups 1–2 and 13–14 atoms, but is less significant for Groups 15–18 atoms. This study provides a comprehensive and consistent dataset of dipole polarizabilities and contributes to a systematic understanding of the roles of relativistic and electron correlation effects in atomic dipole polarizabilities, serving as a valuable reference for future research.
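In the finite-field method, the static dipole polarizability is the negative second derivative of the total energy with respect to a uniform electric field, α = −∂²E/∂F² at F = 0, estimated from energies computed at a few small field strengths. The sketch below uses the textbook central-difference stencil α ≈ −[E(F) − 2E(0) + E(−F)]/F²; the study's actual stencil and field strengths are not stated in the abstract. In practice each E(F) would come from a relativistic coupled-cluster or CI calculation with the field added to the Hamiltonian; here `model_energy` is a hypothetical closed-form stand-in with made-up coefficients (atomic units).

```python
from typing import Callable


def finite_field_polarizability(energy: Callable[[float], float], field: float = 1e-3) -> float:
    """Central-difference estimate of alpha = -d^2E/dF^2 at F = 0."""
    e_plus, e_zero, e_minus = energy(field), energy(0.0), energy(-field)
    return -(e_plus - 2.0 * e_zero + e_minus) / field ** 2


def model_energy(f: float, e0: float = -100.0, alpha: float = 10.0,
                 gamma: float = 1000.0) -> float:
    """Hypothetical field-dependent energy: E(F) = E0 - alpha F^2 / 2 - gamma F^4 / 24."""
    return e0 - 0.5 * alpha * f ** 2 - gamma * f ** 4 / 24.0
```

For this model, the stencil returns α + γF²/12, so the truncation error grows with the field strength: F = 10⁻³ recovers α = 10 to within about 10⁻⁴, while F = 10⁻² is off by about 0.008. Very small fields, on the other hand, amplify numerical noise in the computed energies, which is why the field strength must be chosen with care.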
Submitted 14 September, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
Magnetic Fields in Massive Star-forming Regions (MagMaR) IV: Tracing the Magnetic Fields in the O-type protostellar system IRAS 16547-4247
Authors:
Luis A. Zapata,
Manuel Fernández-López,
Patricio Sanhueza,
Josep M. Girart,
Luis F. Rodríguez,
Paulo Cortes,
Koch Patrick,
María T. Beltrán,
Kate Pattle,
Henrik Beuther,
Piyali Saha,
Wenyu Jiao,
Fengwei Xu,
Xing Walker Lu,
Fernando Olguin,
Shanghuo Li,
Ian W. Stephens,
Ji-hyun Kang,
Yu Cheng,
Spandan Choudhury,
Kaho Morii,
Eun Jung Chung,
Jia-Wei Wang,
Jihye Hwang,
A-Ran Lyo
, et al. (2 additional authors not shown)
Abstract:
The formation of massive stars, and in particular the role that magnetic fields play in their early evolutionary phase, is still far from being completely understood. Here, we present Atacama Large Millimeter/Submillimeter Array (ALMA) 1.2 mm full-polarization continuum observations and H¹³CO⁺(3–2), CS(5–4), and HN¹³C(3–2) line observations with a high angular resolution (~0.4″, or 1100 au). In the 1.2 mm continuum emission, we reveal a dusty envelope with dimensions of ~10,000 au surrounding the massive protostars IRAS16547-E and IRAS16547-W. This envelope has a bi-conical structure, likely carved by the powerful thermal radio jet present in the region. The magnetic field vectors closely follow the bi-conical envelope. The polarization fraction is ~2.0% in this region. Some of these vectors seem to converge on IRAS 16547-E and IRAS 16547-W, the most massive protostars. Moreover, the velocity fields revealed by the H¹³CO⁺(3–2) and HN¹³C(3–2) spectral lines show velocity gradients that correspond well with the magnetic fields and may trace the cavities of molecular outflows or, in some parts, infall. We derived magnetic field strengths ranging from 2 to 6.1 mG in some filamentary regions. We also find that the CS(5–4) molecular line emission reveals multiple outflow cavities or bow shocks with different orientations, some of which seem to follow the NW-SE thermal radio jet.
Submitted 19 August, 2024;
originally announced August 2024.
-
Efficient simulation of inhomogeneously correlated systems using block interaction product states
Authors:
Yifan Cheng,
Zhaoxuan Xie,
Xiaoyu Xie,
Haibo Ma
Abstract:
The strength of the density matrix renormalization group (DMRG) lies in its treatment of identical sites that are energetically degenerate and spatially similar. However, this becomes a drawback when DMRG is applied to quantum chemistry calculations for large systems, as entangled orbitals often span broad ranges in energy and space, with notably inhomogeneous interactions. In this study, we propose addressing strong intra-fragment and weak inter-fragment correlations separately using a multi-configurational block interaction product state (BIPS) framework. The strong correlation is captured in electronic states on fragments, accounting for entanglement between fragments and their environments. This method has been tested on various chemical systems and shows high accuracy and efficiency in addressing inhomogeneous effects in quantum chemistry.
Submitted 15 August, 2024;
originally announced August 2024.
-
Attention-Guided Perturbation for Unsupervised Image Anomaly Detection
Authors:
Tingfeng Huang,
Yuxuan Cheng,
Jingbo Xia,
Rui Yu,
Yuxuan Cai,
Jinhai Xiang,
Xinwei He,
Xiang Bai
Abstract:
Reconstruction-based methods have significantly advanced modern unsupervised anomaly detection. However, the strong capacity of neural networks often violates the underlying assumption of these methods, because abnormal samples are also reconstructed well. To alleviate this issue, we present a simple yet effective reconstruction framework named Attention-Guided Perturbation Network (AGPNet), which learns to add perturbation noise with an attention mask for accurate unsupervised anomaly detection. Specifically, it consists of two branches, i.e., a plain reconstruction branch and an auxiliary attention-based perturbation branch. The reconstruction branch is simply a plain reconstruction network that learns to reconstruct normal samples, while the auxiliary branch aims to produce attention masks that guide the noise perturbation process for normal samples from easy to hard. By doing so, we expect to synthesize hard yet more informative anomalies for training, which enable the reconstruction branch to learn important inherent normal patterns both comprehensively and efficiently. Extensive experiments are conducted on three popular benchmarks, namely MVTec-AD, VisA, and MVTec-3D, and show that our framework obtains leading anomaly detection performance under various setups, including few-shot, one-class, and multi-class setups.
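A minimal sketch of mask-guided perturbation, assuming the common form x′ = x + m ⊙ ε with Gaussian noise ε, where the reconstruction target remains the clean normal sample x. The abstract does not specify the noise distribution, how the attention mask is computed, or how the "easy to hard" progression is scheduled, so `perturb`, the linear `noise_schedule`, and the fixed example mask are all hypothetical; in AGPNet the mask would come from the auxiliary attention branch.

```python
import random
from typing import List

Image = List[List[float]]


def perturb(image: Image, mask: Image, noise_std: float, rng: random.Random) -> Image:
    """Return x' = x + m * eps with eps ~ N(0, noise_std^2), applied elementwise."""
    return [[x + m * rng.gauss(0.0, noise_std) for x, m in zip(row_x, row_m)]
            for row_x, row_m in zip(image, mask)]


def noise_schedule(step: int, total_steps: int, max_std: float = 0.5) -> float:
    """Hypothetical easy-to-hard schedule: noise grows linearly, then saturates."""
    return max_std * min(1.0, step / total_steps)


def mse(a: Image, b: Image) -> float:
    """Reconstruction loss, computed against the clean (unperturbed) normal sample."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)
```

During training, one would compute `mse(model(perturb(x, mask, noise_schedule(step, total), rng)), x)` for some reconstruction network `model` (not shown). Pixels where the mask is 0 pass through unchanged, so the mask controls where synthetic "anomalies" appear.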
Submitted 14 August, 2024;
originally announced August 2024.
-
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions
Authors:
Yiran Cheng,
Lwin Khin Shar,
Ting Zhang,
Shouguo Yang,
Chaopeng Dong,
David Lo,
Shichao Lv,
Zhiqiang Shi,
Limin Sun
Abstract:
Open-source software (OSS) has experienced a surge in popularity, attributed to its collaborative development model and cost-effective nature. However, adopting specific software versions in development projects may introduce security risks when these versions contain vulnerabilities. Current methods of identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with predefined rules. They then use syntactic-level code clone detection to identify the vulnerable versions. These methods suffer from imprecision due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of syntactic-level code clone detection. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++. Vercation combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtraces historical commits to gather previous modifications of the identified vulnerability-relevant code. We propose semantic-level code clone detection to compare the differences between pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling identification of the vulnerable versions between the patch commit and the vic. We curate a dataset linking 74 OSS vulnerabilities and 1013 versions to evaluate Vercation. On this dataset, our approach achieves an F1 score of 92.4%, outperforming current state-of-the-art methods. More importantly, Vercation detected 134 incorrect vulnerable OSS versions in NVD reports.
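Once the vulnerability-introducing commit (vic) and the patch commit are known, the last step described above, identifying the vulnerable versions between them, reduces to a range query over the commit history. This sketch assumes a linear history and a simple mapping from version tags to the commits they were cut from; real repositories have branches and backported fixes, which this does not handle. The function name and data layout are hypothetical.

```python
from typing import Dict, List


def vulnerable_versions(history: List[str], releases: Dict[str, str],
                        vic: str, patch: str) -> List[str]:
    """Return releases whose commit lies in [vic, patch) on a linear history.

    history  -- commit ids ordered from oldest to newest
    releases -- version tag -> commit id the release was cut from
    """
    position = {commit: i for i, commit in enumerate(history)}
    start, end = position[vic], position[patch]
    if start >= end:
        raise ValueError("the vulnerability-introducing commit must precede the patch")
    affected = [v for v, c in releases.items() if start <= position[c] < end]
    return sorted(affected, key=lambda v: position[releases[v]])
```

The interval is half-open: a release cut at the vic already contains the vulnerability, while a release cut at the patch commit already contains the fix. Locating the vic precisely is therefore what determines the accuracy of the reported version range.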
Submitted 14 August, 2024;
originally announced August 2024.
-
Ensemble architecture in polyp segmentation
Authors:
Hao-Yun Hsu,
Yi-Ching Cheng,
Guan-Hua Huang
Abstract:
In this research, we revisit the architecture of semantic segmentation and evaluate the models that excel at polyp segmentation. We introduce an integrated framework that harnesses the advantages of different models to attain an optimal outcome. More specifically, we fuse the learned features from convolutional and transformer models for prediction, and we view this approach as an ensemble technique to enhance model performance. Our experiments on polyp segmentation reveal that the proposed architecture surpasses other top models, exhibiting improved learning capacity and resilience. The code is available at https://github.com/HuangDLab/EnFormer.
Submitted 13 August, 2024;
originally announced August 2024.
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Authors:
Zhuoyi Yang,
Jiayan Teng,
Wendi Zheng,
Ming Ding,
Shiyu Huang,
Jiazheng Xu,
Yuanming Yang,
Wenyi Hong,
Xiaohan Zhang,
Guanyu Feng,
Da Yin,
Xiaotao Gu,
Yuxuan Zhang,
Weihan Wang,
Yean Cheng,
Ting Liu,
Bin Xu,
Yuxiao Dong,
Jie Tang
Abstract:
We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos from text prompts. To model video data efficiently, we propose leveraging a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve text-video alignment, we propose an expert transformer with expert adaptive LayerNorm to facilitate deep fusion between the two modalities. By employing a progressive training technique, CogVideoX is adept at producing coherent, long-duration videos characterized by significant motion. In addition, we develop an effective text-video data processing pipeline that includes various data preprocessing strategies and a video captioning method. This pipeline significantly enhances the performance of CogVideoX, improving both generation quality and semantic alignment. Results show that CogVideoX achieves state-of-the-art performance on multiple machine metrics and in human evaluations. The model weights of both the 3D Causal VAE and CogVideoX are publicly available at https://github.com/THUDM/CogVideo.
Submitted 12 August, 2024;
originally announced August 2024.
-
Cryogenic nonlinear conversion processes in periodically-poled thin-film lithium niobate waveguides
Authors:
Yujie Cheng,
Xiaoting Li,
Lantian Feng,
Haochuan Li,
Wenzhao Sun,
Xinyu Song,
Yuyang Ding,
Guangcan Guo,
Cheng Wang,
Xifeng Ren
Abstract:
Periodically poled thin-film lithium niobate (TFLN) waveguides, which enable efficient quadratic nonlinear processes, serve as a crucial foundation for classical and quantum signal processing with photonic integrated circuits. To expand their application scope, we provide, to the best of our knowledge, the first investigation of nonlinear conversion processes in periodically poled TFLN waveguides under cryogenic conditions. Through systematic experimental characterization, we find that the periodically poled TFLN waveguide maintains consistent conversion efficiencies at both cryogenic and room temperatures for both classical second-harmonic generation and quantum photon-pair generation, demonstrating the significant potential of TFLN wavelength conversion devices for cryogenic applications. This breakthrough will foster future scalable quantum photonic systems and optical interfacing among different cryogenic platforms.
Submitted 11 August, 2024;
originally announced August 2024.
-
Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor
Authors:
Yiqi Liu,
Yuqi Xue,
Yu Cheng,
Lingxiao Ma,
Ziming Miao,
Jilong Xue,
Jian Huang
Abstract:
As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication has recently been enabled by high-bandwidth, low-latency interconnect links on the chip (e.g., Graphcore IPU). This allows each core to directly access the fast scratchpad memory of other cores, which enables new parallel computing paradigms. However, without proper support for scalable inter-core connections in current DL compilers, it is hard for developers to exploit the benefits of this new architecture.
We present T10, the first DL compiler to exploit the inter-core communication bandwidth and distributed on-chip memory on AI chips. To formulate the computation and communication patterns of tensor operators in this new architecture, T10 introduces a distributed tensor abstraction, rTensor. T10 maps a DNN model to execution plans with a generalized compute-shift pattern by partitioning DNN computation into sub-operators and mapping them to cores, so that the cores can exchange data following predictable patterns. T10 makes globally optimized trade-offs between on-chip memory consumption and inter-core communication overhead, selects the best execution plan from a vast optimization space, and reduces unnecessary inter-core communication. Our evaluation with a real inter-core connected AI chip, the Graphcore IPU, shows up to 3.3× performance improvement and scalability support for larger models, compared to state-of-the-art DL compilers and vendor libraries.
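The compute-shift pattern can be illustrated with a ring-based blocked matrix multiplication, simulated sequentially in plain Python. This is not T10's planner or its rTensor abstraction; it is one fixed, hand-chosen plan under stated assumptions: C = A·B is split so that each core permanently holds one row block of A and produces one row block of C, while the column blocks of B rotate one hop around a ring of cores after every compute step. The `held` list stands in for actual data movement between scratchpads.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive reference matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def split_rows(m: Matrix, parts: int) -> List[Matrix]:
    size = len(m) // parts
    return [m[i * size:(i + 1) * size] for i in range(parts)]


def split_cols(m: Matrix, parts: int) -> List[Matrix]:
    size = len(m[0]) // parts
    return [[row[j * size:(j + 1) * size] for row in m] for j in range(parts)]


def compute_shift_matmul(a: Matrix, b: Matrix, cores: int) -> Matrix:
    """Simulate C = A @ B on `cores` ring-connected cores with compute/shift steps."""
    if len(a) % cores or len(b[0]) % cores:
        raise ValueError("matrix dimensions must be divisible by the core count")
    a_blocks = split_rows(a, cores)      # core i keeps A_i for the whole run
    b_blocks = split_cols(b, cores)      # core i initially holds B_i
    held = list(range(cores))            # held[i] = index of the B block on core i
    c_blocks = [[None] * cores for _ in range(cores)]
    for _ in range(cores):
        # Compute phase: each core multiplies its local A block by the B block it holds
        for core in range(cores):
            j = held[core]
            c_blocks[core][j] = matmul(a_blocks[core], b_blocks[j])
        # Shift phase: core i receives the B block from its ring neighbour i + 1
        held = [held[(core + 1) % cores] for core in range(cores)]
    # Stitch each core's output tiles back into full rows of C
    result: Matrix = []
    for core in range(cores):
        for r in range(len(a_blocks[core])):
            row: List[int] = []
            for j in range(cores):
                row.extend(c_blocks[core][j][r])
            result.append(row)
    return result
```

Each core only ever stores its own A block, one B block, and its own output tiles, and every step moves data in the same predictable direction. This regularity is what makes a compute-shift plan attractive on chips with distributed scratchpad memory. The trade-off T10 optimizes (for example, how large each block is versus how many shift steps are needed) is fixed by hand here.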
Submitted 8 August, 2024;
originally announced August 2024.
-
GAIA – A Large Language Model for Advanced Power Dispatch
Authors:
Yuheng Cheng,
Huan Zhao,
Xiyuan Zhou,
Junhua Zhao,
Yuji Cao,
Chao Yang
Abstract:
Power dispatch is essential for providing stable, cost-effective, and eco-friendly electricity to society. However, traditional methods falter as power systems grow in scale and complexity, struggling with multitasking, swift problem-solving, and human-machine collaboration. This paper introduces GAIA, the pioneering Large Language Model (LLM) tailored for power dispatch tasks. We have developed a novel dataset construction technique that harnesses a range of data sources to fine-tune GAIA for optimal performance in this domain. This approach streamlines LLM training, allowing for the seamless integration of multidimensional data in power system management. Additionally, we have crafted specialized prompt strategies to boost GAIA's input-output efficiency in dispatch scenarios. When evaluated on the ElecBench benchmark, GAIA surpasses the baseline model LLaMA2 on multiple metrics. In practical applications, GAIA has demonstrated its ability to enhance decision-making processes, improve operational efficiency, and facilitate better human-machine interactions in power dispatch operations. This paper expands the application of LLMs to power dispatch and validates their practical utility, paving the way for future innovations in this field.
Submitted 7 August, 2024;
originally announced August 2024.
-
Masked Random Noise for Communication Efficient Federated Learning
Authors:
Shiwei Li,
Yingyi Cheng,
Haozhao Wang,
Xing Tang,
Shijie Xu,
Weihong Luo,
Yuhua Li,
Dugang Liu,
Xiuqiang He,
and Ruixuan Li
Abstract:
Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinder training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we require the distributed clients to find optimal model updates relative to global model parameters within predefined random noise. For this purpose, we propose Federated Masked Random Noise (FedMRN), a novel framework that enables clients to learn a 1-bit mask for each model parameter and apply masked random noise (i.e., the Hadamard product of random noise and masks) to represent model updates. To make FedMRN feasible, we propose an advanced mask training strategy, called progressive stochastic masking (PSM). After local training, each client only needs to transmit local masks and a random seed to the server. Additionally, we provide theoretical guarantees for the convergence of FedMRN under both strongly convex and non-convex assumptions. Extensive experiments are conducted on four popular datasets. The results show that FedMRN exhibits superior convergence speed and test accuracy compared to relevant baselines, while attaining a similar level of accuracy as FedAvg.
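The payload described above (a 1-bit mask per parameter plus a random seed) can be sketched in Python as follows. The sign-agreement rule used to pick the mask is a simplified, hypothetical stand-in for the paper's progressive stochastic masking, and the noise distribution and `scale` value are assumptions; the point is only that the server can rebuild the update from the seed and mask alone.

```python
import random


def noise_from_seed(seed, size, scale=0.01):
    """Regenerate the same pseudo-random noise vector from a shared seed."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(size)]


def client_encode(desired_update, seed, scale=0.01):
    """Return a 1-bit mask per parameter plus the seed (the only payload sent).

    Simplified stand-in for mask training: keep a noise entry only when it
    points in the same direction as the update the client would like to apply.
    """
    noise = noise_from_seed(seed, len(desired_update), scale)
    mask = [1 if u * z > 0 else 0 for u, z in zip(desired_update, noise)]
    return mask, seed


def server_decode(mask, seed, scale=0.01):
    """Rebuild the update as the Hadamard product of noise and mask."""
    noise = noise_from_seed(seed, len(mask), scale)
    return [m * z for m, z in zip(mask, noise)]
```

Because `random.Random(seed)` is deterministic, the client and server derive identical noise, so only the mask bits and one integer cross the network.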
Submitted 6 August, 2024;
originally announced August 2024.
-
Mitigating Multilingual Hallucination in Large Vision-Language Models
Authors:
Xiaoye Qu,
Mingyang Song,
Wei Wei,
Jianfeng Dong,
Yu Cheng
Abstract:
While Large Vision-Language Models (LVLMs) have exhibited remarkable capabilities across a wide range of tasks, they suffer from hallucination problems, where models generate plausible yet incorrect answers given the input image-query pair. This hallucination phenomenon is even more severe when querying the image in non-English languages, while existing methods for mitigating hallucinations in LVLMs only consider English scenarios. In this paper, we make the first attempt to mitigate this important multilingual hallucination in LVLMs. Through thorough experimental analysis, we find that multilingual hallucination in LVLMs is a systemic problem that could arise from deficiencies in multilingual capabilities or inadequate multimodal abilities. To this end, we propose a two-stage Multilingual Hallucination Removal (MHR) framework for LVLMs, aiming to improve resistance to hallucination for both high-resource and low-resource languages. Instead of relying on intricate manual annotations of multilingual resources, we fully leverage the inherent capabilities of the LVLM and propose a novel cross-lingual alignment method, which generates multiple responses for each image-query input and then identifies the hallucination-aware pairs for each language. These data pairs are finally used for direct preference optimization to prompt the LVLMs to favor non-hallucinating responses. Experimental results show that our MHR achieves a substantial reduction in hallucination generation for LVLMs. Notably, on our extended multilingual POPE benchmark, our framework delivers an average increase of 19.0% in accuracy across 13 different languages. Our code and model weights are available at https://github.com/ssmisya/MHR
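The final step above applies direct preference optimization (DPO) to hallucination-aware pairs. The standard DPO objective for a single (non-hallucinating, hallucinating) pair can be sketched as follows; the sequence log-probabilities would come from the LVLM being trained and a frozen reference copy, and `beta = 0.1` is an illustrative default rather than a value reported by the authors.

```python
import math


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    All arguments are summed log-probabilities log pi(y | image, query) of the
    chosen (non-hallucinating) or rejected (hallucinating) response.
    """
    # How much more the policy prefers the chosen response than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin), computed stably.
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

With a zero margin the loss equals log 2; it falls below that as the policy shifts probability toward the non-hallucinating response relative to the reference model.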
Submitted 1 August, 2024;
originally announced August 2024.
-
Study of Wide-Field-of-View X-ray Observations of the Virgo Cluster Using the Lobster Eye Imager for Astronomy
Authors:
Wen-Cheng Feng,
Shu-Mei Jia,
Hai-Hui Zhao,
Heng Yu,
Hai-Wu Pan,
Cheng-Kui Li,
Yu-Lin Cheng,
Shan-Shan Weng,
Yong Chen,
Yuan Liu,
Zhi-Xing Ling,
Chen Zhang
Abstract:
The Lobster Eye Imager for Astronomy (LEIA) is the pathfinder of the wide-field X-ray telescope used in the Einstein Probe mission. In this study, we present an image of the Virgo Cluster taken by LEIA in the 0.5-4.5 keV band with an exposure time of $\sim$17.3 ks in the central region. The extended emission in this image is generally consistent with the results obtained by ROSAT. However, the field is affected by bright point sources due to the instrument's Point Spread Function (PSF) effect. Through fitting of the LEIA spectrum of the Virgo Cluster, we obtained a temperature of $2.1^{+0.3}_{-0.1}$ keV, which is consistent with the XMM-Newton results ($\sim$2.3 keV). Above 1.6 keV, the spectrum is dominated by the X-ray background. In summary, this study validates LEIA's extended source imaging and spectral resolution capabilities for the first time.
Submitted 31 July, 2024;
originally announced August 2024.
-
Localized stem structures in quasi-resonant two-soliton solutions for the asymmetric Nizhnik-Novikov-Veselov system
Authors:
Feng Yuan,
Jiguang Rao,
Jingsong He,
Yi Cheng
Abstract:
Elastic collisions of solitons generally have a finite phase shift. When the phase shift has a finitely large value, the two vertices of the (2+1)-dimensional 2-soliton are significantly separated due to the phase shift, accompanied by the formation of a local structure connecting the two V-shaped solitons. We define this local structure as the stem structure. This study systematically investigates the localized stem structures between two solitons in the (2+1)-dimensional asymmetric Nizhnik-Novikov-Veselov system. These stem structures, arising from quasi-resonant collisions between the solitons, exhibit distinct features of spatial locality and temporal invariance. We explore two scenarios: one characterized by weakly quasi-resonant collisions (i.e. $a_{12}\approx 0$), and the other by strongly quasi-resonant collisions (i.e. $a_{12}\approx +\infty$). Through mathematical analysis, we extract comprehensive insights into the trajectories, amplitudes, and velocities of the soliton arms. Furthermore, we discuss the characteristics of the stem structures, including their length and extreme points. Our findings shed new light on the interaction between solitons in the (2+1)-dimensional asymmetric Nizhnik-Novikov-Veselov system.
Submitted 30 July, 2024;
originally announced July 2024.
-
Boosting Graph Foundation Model from Structural Perspective
Authors:
Yao Cheng,
Yige Zhao,
Jianxiang Yu,
Xiang Li
Abstract:
Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address this problem, we boost graph foundation models from a structural perspective and propose BooG. The model constructs virtual super nodes to unify the structural characteristics of graph data from different domains. Specifically, the super nodes fuse the information of anchor nodes and class labels, where each anchor node captures the information of a node or a graph instance to be classified. Instead of using the raw graph structure, we connect super nodes to all nodes within their neighborhood by virtual edges. This new structure allows for effective information aggregation while unifying cross-domain structural characteristics. Additionally, we propose a novel pre-training objective based on contrastive learning, which learns more expressive representations for graph data and generalizes effectively to different domains and downstream tasks. Experimental results on various datasets and tasks demonstrate the superior performance of BooG. Our code and data are available at: https://anonymous.4open.science/r/BooG-EE42/.
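The virtual-edge construction can be pictured with a small Python sketch: for each anchor node, a super node is linked to every node within `k` hops of that anchor. The hop radius `k`, the adjacency-list format, and the `("super", anchor)` naming are assumptions made for illustration; the fusion of anchor information with class labels and the contrastive objective are not shown.

```python
from collections import deque


def k_hop_neighborhood(adj, anchor, k):
    """Return all nodes within k hops of `anchor` (including the anchor)."""
    seen = {anchor: 0}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # do not expand beyond the hop radius
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return set(seen)


def add_super_node_edges(adj, anchors, k):
    """Return virtual edges (super_node_id, node), one super node per anchor."""
    edges = []
    for anchor in anchors:
        super_id = ("super", anchor)
        for node in sorted(k_hop_neighborhood(adj, anchor, k)):
            edges.append((super_id, node))
    return edges
```

Aggregating over these virtual edges instead of the raw edges gives every anchor the same star-shaped structure regardless of the source domain, which is the unifying effect the abstract describes.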
Submitted 29 July, 2024;
originally announced July 2024.
-
Enhancing Taobao Display Advertising with Multimodal Representations: Challenges, Approaches and Insights
Authors:
Xiang-Rong Sheng,
Feifan Yang,
Litong Gong,
Biao Wang,
Zhangming Chan,
Yujing Zhang,
Yueyao Cheng,
Yong-Nan Zhu,
Tiezheng Ge,
Han Zhu,
Yuning Jiang,
Jian Xu,
Bo Zheng
Abstract:
Despite the recognized potential of multimodal data to improve model accuracy, many large-scale industrial recommendation systems, including the Taobao display advertising system, predominantly depend on sparse ID features in their models. In this work, we explore approaches to leverage multimodal data to enhance recommendation accuracy. We start by identifying the key challenges in adopting multimodal data in a manner that is both effective and cost-efficient for industrial systems. To address these challenges, we introduce a two-phase framework, including: 1) the pre-training of multimodal representations to capture semantic similarity, and 2) the integration of these representations with existing ID-based models. Furthermore, we detail the architecture of our production system, which is designed to facilitate the deployment of multimodal representations. Since integrating multimodal representations in mid-2023, we have observed significant performance improvements in the Taobao display advertising system. We believe that the insights we have gathered will serve as a valuable resource for practitioners seeking to leverage multimodal data in their systems.
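One common way to feed pre-trained, similarity-oriented embeddings into an ID-based model is to turn them into dense features, for example the cosine similarity between a candidate ad and the items in a user's behavior sequence. The Python sketch below shows that idea under this assumption; the feature names are hypothetical, and the abstract does not state that this is the exact integration used in production.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_features(candidate_emb, behavior_embs):
    """Dense features relating a candidate ad to a user's past items (hypothetical names)."""
    sims = [cosine(candidate_emb, e) for e in behavior_embs]
    if not sims:
        return {"max_sim": 0.0, "mean_sim": 0.0}
    return {"max_sim": max(sims), "mean_sim": sum(sims) / len(sims)}
```

Features like these can be concatenated with the sparse ID embeddings, which keeps the existing ID-based model intact while letting it exploit semantic similarity.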
Submitted 28 July, 2024;
originally announced July 2024.
-
Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets
Authors:
Liyang Wang,
Yu Cheng,
Xingxin Gu,
Zhizhong Wu
Abstract:
With the increasing complexity of financial markets and rapid growth in data volume, traditional risk monitoring methods no longer suffice for modern financial institutions. This paper designs and optimizes a risk monitoring system based on big data and machine learning. By constructing a four-layer architecture, it effectively integrates large-scale financial data and advanced machine learning algorithms. Key technologies employed in the system include Long Short-Term Memory (LSTM) networks, Random Forest, Gradient Boosting Trees, and the real-time data processing platform Apache Flink, which together keep risk monitoring timely and accurate. Research findings demonstrate that the system significantly enhances efficiency and accuracy in risk management, particularly excelling in identifying and warning against market crash risks.
Submitted 27 July, 2024;
originally announced July 2024.
-
Needle Segmentation Using GAN: Restoring Thin Instrument Visibility in Robotic Ultrasound
Authors:
Zhongliang Jiang,
Xuesong Li,
Xiangyu Chu,
Angelos Karlas,
Yuan Bi,
Yingsheng Cheng,
K. W. Samuel Au,
Nassir Navab
Abstract:
Ultrasound-guided percutaneous needle insertion is a standard procedure employed in both biopsy and ablation in clinical practice. However, due to the complex interaction between tissue and instrument, the needle may deviate from the in-plane view, so that the percutaneous needle can no longer be closely monitored. To address this challenge, we introduce a robot-assisted ultrasound (US) imaging system designed to seamlessly monitor the insertion process and autonomously restore the visibility of the inserted instrument when misalignment occurs. To this end, an adversarial structure is introduced to encourage the generation of segmentation masks that align consistently with the ground truth in high-order space. This study also systematically investigates how various training loss functions and their combinations affect segmentation performance. When misalignment between the probe and the percutaneous needle is detected, the robot is triggered to perform transverse searching to optimize the positional and rotational adjustment to restore needle visibility. The experimental results on ex-vivo porcine samples demonstrate that the proposed method can precisely segment the percutaneous needle (with a tip error of $0.37\pm0.29$ mm and an angle error of $1.19\pm0.29^{\circ}$). Furthermore, the needle appearance was successfully restored under the repositioned probe pose in all 45 trials, with repositioning errors of $1.51\pm0.95$ mm and $1.25\pm0.79^{\circ}$.
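The tip and angle errors quoted above can be computed in a straightforward way if each needle is summarized by a tip point and an axis direction. The Python sketch below assumes that representation (coordinates in millimetres, direction sign ignored); the abstract does not spell out the exact metric definitions used in the evaluation.

```python
import math


def tip_error_mm(pred_tip, gt_tip):
    """Euclidean distance between predicted and ground-truth tips (mm)."""
    return math.dist(pred_tip, gt_tip)


def angle_error_deg(pred_dir, gt_dir):
    """Unsigned angle between two needle axes in degrees (direction sign ignored)."""
    dot = abs(sum(p * g for p, g in zip(pred_dir, gt_dir)))
    norm = math.hypot(*pred_dir) * math.hypot(*gt_dir)
    cos_angle = min(1.0, dot / norm)  # clamp against floating-point overshoot
    return math.degrees(math.acos(cos_angle))
```

Taking the absolute value of the dot product treats a needle axis as a line rather than an arrow, so a segmentation that flips the direction is not penalized as a 180° error.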
Submitted 25 July, 2024;
originally announced July 2024.
-
An Iterative Approach to Topic Modelling
Authors:
Albert Wong,
Florence Wing Yau Cheng,
Ashley Keung,
Yamileth Hercules,
Mary Alexandra Garcia,
Yew-Wei Lim,
Lien Pham
Abstract:
Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot, and assessing the quality of the resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propose an iterative topic modelling process that gives a sense of completeness to the resulting topics once the process ends. Using the BERTopic package, a popular topic modelling tool, we demonstrate how the modelling process can be applied iteratively to arrive at a set of topics that cannot be further improved, using one of three selected clustering comparison measures as the decision criterion. This demonstration is conducted using a subset of the COVIDSenti-A dataset. The early success leads us to believe that further research using this approach in conjunction with other topic modelling algorithms could be viable.
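The abstract does not name the three clustering comparison measures, so the Python sketch below assumes the adjusted Rand index (ARI), a widely used one, as the stopping criterion: successive topic assignments are compared, and iteration stops once they agree above a threshold. The 0.95 threshold and the `has_converged` helper are hypothetical, and how each iteration regenerates topic labels (e.g. by re-running BERTopic) is not shown.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same documents."""
    n = len(labels_a)
    # Pairs of documents grouped together in both clusterings.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (index - expected) / (max_index - expected)


def has_converged(prev_labels, new_labels, threshold=0.95):
    """Hypothetical stopping rule: stop when successive topic assignments agree."""
    return adjusted_rand_index(prev_labels, new_labels) >= threshold
```

ARI ignores how topics are numbered, so two runs that produce the same grouping under different topic IDs still score 1.0.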
Submitted 25 July, 2024;
originally announced July 2024.
-
Magnetic Fields in Massive Star-forming Regions (MagMaR): Unveiling an Hourglass Magnetic Field in G333.46-0.16 using ALMA
Authors:
Piyali Saha,
Patricio Sanhueza,
Marco Padovani,
Josep M. Girart,
Paulo Cortes,
Kaho Morii,
Junhao Liu,
A. Sanchez-Monge,
Daniele Galli,
Shantanu Basu,
Patrick M. Koch,
Maria T. Beltran,
Shanghuo Li,
Henrik Beuther,
Ian W. Stephens,
Fumitaka Nakamura,
Qizhou Zhang,
Wenyu Jiao,
M. Fernandez-Lopez,
Jihye Hwang,
Eun Jung Chung,
Kate Pattle,
Luis A. Zapata,
Fengwei Xu,
Fernando A. Olguin
, et al. (11 additional authors not shown)
Abstract:
The contribution of the magnetic field to the formation of high-mass stars is poorly understood. We report a high-angular-resolution ($\sim0.3^{\prime\prime}$, 870 au) map of the magnetic field projected on the plane of the sky (B$_\mathrm{POS}$) towards the high-mass star-forming region G333.46$-$0.16 (G333), obtained with the Atacama Large Millimeter/submillimeter Array (ALMA) at 1.2 mm as part of the Magnetic Fields in Massive Star-forming Regions (MagMaR) survey. The B$_\mathrm{POS}$ morphology found in this region is consistent with a canonical "hourglass", which suggests a dynamically important field. This region is fragmented into two protostars separated by $\sim1740$ au. Interestingly, by analysing H$^{13}$CO$^{+}$ ($J=3-2$) line emission, we find no velocity gradient over the extent of the continuum, which is consistent with a strong field. We model the B$_\mathrm{POS}$, obtaining a marginally supercritical mass-to-flux ratio of 1.43, suggesting an initially strongly magnetized environment. Based on the Davis-Chandrasekhar-Fermi method, the magnetic field strength towards G333 is estimated to be 5.7 mG. The absence of strong rotation and outflows towards the central region of G333 suggests strong magnetic braking, consistent with a highly magnetized environment. Our study shows that despite being a strong regulator, the magnetic energy fails to prevent the process of fragmentation, as revealed by the formation of the two protostars in the central region.
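For reference, the classic Davis-Chandrasekhar-Fermi estimate in Gaussian units is $B_\mathrm{POS} = Q\sqrt{4\pi\rho}\,\sigma_v/\sigma_\theta$. The Python sketch below evaluates that textbook form; the correction factor `q = 0.5`, the mean molecular weight `mu = 2.8`, and any input values are common conventions or placeholders assumed here, not the parameters adopted in the study.

```python
import math

M_H = 1.6735575e-24  # mass of a hydrogen atom, g


def dcf_field_gauss(n_h2_cm3, sigma_v_km_s, sigma_theta_deg, q=0.5, mu=2.8):
    """Classic DCF plane-of-sky field strength in gauss.

    n_h2_cm3: H2 number density (cm^-3); sigma_v_km_s: non-thermal velocity
    dispersion (km/s); sigma_theta_deg: dispersion of polarization angles (deg).
    """
    rho = mu * M_H * n_h2_cm3                   # mass density, g cm^-3
    sigma_v = sigma_v_km_s * 1e5                # km/s -> cm/s
    sigma_theta = math.radians(sigma_theta_deg)  # small-angle dispersion in radians
    return q * math.sqrt(4 * math.pi * rho) * sigma_v / sigma_theta
```

The estimate scales linearly with the velocity dispersion, with the square root of density, and inversely with the angle dispersion, so an ordered hourglass (small angle dispersion) points to a strong field.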
Submitted 23 July, 2024;
originally announced July 2024.
-
Frequency stabilization based on H13C14N absorption in lithium niobate micro-disk laser
Authors:
Zhen Yi,
Zhihao Zhang,
Jianglin Guan,
Guanghui Zhao,
Renhong Gao,
Botao Fu,
Jintian Lin,
Jinming Chen,
Jian Liu,
Yijie Pan,
Ya Cheng
Abstract:
We demonstrate an on-chip lithium niobate micro-disk laser that uses hydrogen cyanide (H13C14N) gas saturated absorption for frequency stabilization. The laser chip consists of two main components: a micro-disk laser and a combined racetrack ring cavity. By operating on the H13C14N P12 absorption line at 1551.3 nm, the laser frequency can be precisely stabilized. The laser demonstrates remarkable stability, achieving a best stability value of $9\times10^{-9}$. Furthermore, the short-term stability, evaluated over continuous time intervals of 35 seconds, showcases exceptional performance. Additionally, the residual drift remains well below 30 MHz.
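Fractional frequency stability figures such as the one above are commonly reported as an Allan deviation; assuming that is the metric here, the Python sketch below computes the non-overlapping Allan deviation from fractional-frequency samples $y_i = (f_i - f_0)/f_0$. The sampling interval and averaging factor `m` are placeholders, and the authors' actual analysis procedure is not described in the abstract.

```python
import math


def allan_deviation(y, m=1):
    """Non-overlapping Allan deviation of fractional-frequency samples y.

    y[i] = (f_i - f0) / f0, sampled every tau0; the result corresponds to
    an averaging time tau = m * tau0.
    """
    # Average consecutive, non-overlapping groups of m samples (drop a partial tail).
    blocks = [sum(y[i:i + m]) / m for i in range(0, len(y) - m + 1, m)]
    if len(blocks) < 2:
        raise ValueError("need at least two averaging blocks")
    diffs = [(b - a) ** 2 for a, b in zip(blocks, blocks[1:])]
    return math.sqrt(sum(diffs) / (2 * len(diffs)))
```

Evaluating this for increasing `m` gives the familiar Allan-deviation curve; its minimum is what is usually quoted as the best stability.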
Submitted 22 July, 2024;
originally announced July 2024.
-
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Authors:
Yunkang Cao,
Jiangning Zhang,
Luca Frittoli,
Yuqi Cheng,
Weiming Shen,
Giacomo Boracchi
Abstract:
Zero-shot anomaly detection (ZSAD) targets the identification of anomalies within images from arbitrary novel categories. This study introduces AdaCLIP for the ZSAD task, leveraging a pre-trained vision-language model (VLM), CLIP. AdaCLIP incorporates learnable prompts into CLIP and optimizes them through training on auxiliary annotated anomaly detection data. Two types of learnable prompts are proposed: static and dynamic. Static prompts are shared across all images, serving to preliminarily adapt CLIP for ZSAD. In contrast, dynamic prompts are generated for each test image, providing CLIP with dynamic adaptation capabilities. The combination of static and dynamic prompts is referred to as hybrid prompts, and yields enhanced ZSAD performance. Extensive experiments conducted across 14 real-world anomaly detection datasets from industrial and medical domains indicate that AdaCLIP outperforms other ZSAD methods and can generalize better to different categories and even domains. Finally, our analysis highlights the importance of diverse auxiliary data and optimized prompts for enhanced generalization capacity. Code is available at https://github.com/caoyunkang/AdaCLIP.
Submitted 22 July, 2024;
originally announced July 2024.
-
Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
Authors:
Yihao Ai,
Yifei Qi,
Bo Wang,
Yu Cheng,
Xinchao Wang,
Robby T. Tan
Abstract:
Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require paired well-lit and low-light images with ground truths for training, which is impractical because annotating low-light images is inherently difficult. To this end, we introduce a novel approach that eliminates the need for low-light ground truths. Our primary novelty lies in leveraging two complementary teacher networks to generate more reliable pseudo labels, enabling our model to achieve competitive performance on extremely low-light images without training on low-light ground truths. Our framework consists of two stages. In the first stage, our model is trained on well-lit data with low-light augmentations. In the second stage, we propose a dual-teacher framework to utilize the unlabeled low-light data, where a center-based main teacher produces pseudo labels for relatively visible cases, while a keypoints-based complementary teacher focuses on producing pseudo labels for the persons missed by the main teacher. With the pseudo labels from both teachers, we propose a person-specific low-light augmentation to challenge a student model in training to outperform the teachers. Experimental results on a real low-light dataset (ExLPose-OCN) show that our method achieves a 6.8% (2.4 AP) improvement over the state-of-the-art (SOTA) method, even though, unlike the SOTA method, our approach uses no low-light ground-truth data. Our code will be available at: https://github.com/ayh015-dev/DA-LLPose.
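The division of labour between the two teachers can be illustrated with a simple merging rule in Python: confident main-teacher labels are kept, and a complementary-teacher label is added only where no kept label lies nearby, i.e. for a person the main teacher missed. Representing each person by a center and a score, and the `score_thr` and `dist_thr` values, are hypothetical simplifications; the actual framework works with keypoint pseudo labels.

```python
import math


def merge_pseudo_labels(main, complementary, score_thr=0.5, dist_thr=20.0):
    """Combine person-level pseudo labels from two teachers (simplified sketch).

    Each label is a dict {"center": (x, y), "score": float}. Confident
    main-teacher labels are always kept; a complementary-teacher label is
    added only if it lies farther than `dist_thr` pixels from every label
    kept so far, which treats it as a person the main teacher missed.
    """
    kept = [p for p in main if p["score"] >= score_thr]
    for cand in complementary:
        if cand["score"] < score_thr:
            continue
        if all(math.dist(cand["center"], p["center"]) > dist_thr for p in kept):
            kept.append(cand)
    return kept
```

Giving the main teacher priority keeps its labels for clearly visible people, while the complementary teacher only fills gaps instead of competing with it.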
Submitted 23 July, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
Variational Bayesian Inference for Multiple Extended Targets or Unresolved Group Targets Tracking
Authors:
Yuanhao Cheng,
Yunhe Cao,
Tat-Soon Yeo,
Yulin Zhang,
Fu Jie
Abstract:
In this work, we propose a tracking method for multiple extended targets or unresolvable group targets based on Variational Bayesian Inference (VBI). First, based on the most commonly used Random Matrix Model (RMM), the joint states of a single target are modeled as a Gamma Gaussian Inverse Wishart (GGIW) distribution, and the multi-target joint association variables are estimated together with the states as unknowns with a prior distribution. A shape evolution model and VBI are employed to address the shortcomings of the RMM. Through VBI, we derive an approximate variational posterior for the exact multi-target posterior. Furthermore, to demonstrate the applicability of the method in real-world tracking scenarios, we present two potential lightweight schemes. The first is based on clustering, which effectively prunes the joint association events. The second simplifies the variational posterior through marginal association probabilities. We demonstrate the effectiveness of the proposed method using simulation experiments, in which it outperforms current state-of-the-art methods in terms of accuracy and adaptability. This manuscript is only a preprint version; a more complete and official version will be uploaded as soon as possible.
Submitted 6 August, 2024; v1 submitted 21 July, 2024;
originally announced July 2024.
-
An electro-optically tunable arrayed waveguide grating fabricated on thin film lithium niobate
Authors:
Zhe Wang,
Zhiwei Fang,
Yiran Zhu,
Jian Liu,
Lang Gao,
Jianping Yu,
Haisu Zhang,
Min Wang,
Ya Cheng
Abstract:
We design and fabricate an 8-channel thin-film lithium niobate (TFLN) arrayed-waveguide grating (AWG) and demonstrate the electro-optical tunability of the device. The monolithically integrated microelectrodes are designed for waveguide phase modulation and wavelength tuning. Experiments show that the fabricated electro-optically controlled TFLN AWG has a channel spacing of 200 GHz and a wavelength tuning efficiency of 10 pm/V.
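To put the two reported figures side by side, a frequency spacing converts to a wavelength spacing via $\Delta\lambda = \lambda^2\Delta\nu/c$. The Python sketch below assumes a 1550 nm center wavelength (the abstract does not state it) and linear tuning; under those assumptions 200 GHz is about 1.60 nm, so shifting by a full channel at 10 pm/V would take roughly 160 V.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def channel_spacing_nm(center_nm, spacing_ghz):
    """Convert a frequency spacing into a wavelength spacing: d_lambda = lambda^2 * d_nu / c."""
    lam_m = center_nm * 1e-9
    return lam_m ** 2 * spacing_ghz * 1e9 / C * 1e9


def tuning_voltage_v(shift_pm, efficiency_pm_per_v=10.0):
    """Voltage needed for a given wavelength shift, assuming linear tuning."""
    return shift_pm / efficiency_pm_per_v


spacing = channel_spacing_nm(1550.0, 200.0)   # about 1.60 nm (assumed 1550 nm center)
full_channel_v = tuning_voltage_v(spacing * 1000.0)  # nm -> pm, about 160 V
```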
Submitted 21 July, 2024;
originally announced July 2024.
-
ANDES, the high resolution spectrograph for the ELT: science goals, project overview and future developments
Authors:
A. Marconi,
M. Abreu,
V. Adibekyan,
V. Alberti,
S. Albrecht,
J. Alcaniz,
M. Aliverti,
C. Allende Prieto,
J. D. Alvarado Gómez,
C. S. Alves,
P. J. Amado,
M. Amate,
M. I. Andersen,
S. Antoniucci,
E. Artigau,
C. Bailet,
C. Baker,
V. Baldini,
A. Balestra,
S. A. Barnes,
F. Baron,
S. C. C. Barros,
S. M. Bauer,
M. Beaulieu,
O. Bellido-Tirado
, et al. (264 additional authors not shown)
Abstract:
The first generation of ELT instruments includes an optical-infrared high-resolution spectrograph, indicated as ELT-HIRES and recently christened ANDES (ArmazoNes high Dispersion Echelle Spectrograph). ANDES consists of three fibre-fed spectrographs ([U]BV, RIZ, YJH) providing a spectral resolution of $\sim$100,000 with a minimum simultaneous wavelength coverage of 0.4-1.8 $\mu$m, with the goal of extending it to 0.35-2.4 $\mu$m with the addition of a U arm to the BV spectrograph and a separate K-band spectrograph. It operates in both seeing- and diffraction-limited conditions, and the fibre feeding allows several interchangeable observing modes, including a single conjugated adaptive optics module and a small diffraction-limited integral field unit in the NIR. Modularity and fibre feeding allow ANDES to be placed partly on the ELT Nasmyth platform and partly in the Coudé room. ANDES has a wide range of groundbreaking science cases spanning nearly all areas of research in astrophysics and even fundamental physics. Among the top science cases are the detection of biosignatures from exoplanet atmospheres, finding the fingerprints of the first generation of stars, tests of the stability of Nature's fundamental couplings, and the direct detection of the cosmic acceleration. The ANDES project is carried forward by a large international consortium, composed of 35 institutes from 13 countries, forming a team of almost 300 scientists and engineers that includes the majority of the scientific and technical expertise in the field that can be found in ESO member states.
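The quoted resolving power of $R \sim 100{,}000$ translates directly into a resolution element $\Delta\lambda = \lambda/R$ and a velocity resolution $c/R$. The small Python helper below only illustrates that arithmetic and is not part of any ANDES software: at 0.4 $\mu$m the resolution element is 4 pm, at 1.8 $\mu$m it is 18 pm, and $c/R$ is about 3 km/s.

```python
C_KM_S = 299_792.458  # speed of light, km/s


def resolution_element_pm(wavelength_um, resolving_power=100_000):
    """Smallest resolved wavelength interval, d_lambda = lambda / R, in picometres."""
    return wavelength_um * 1e6 / resolving_power  # 1 um = 1e6 pm


def velocity_resolution_km_s(resolving_power=100_000):
    """Equivalent velocity resolution, c / R, in km/s."""
    return C_KM_S / resolving_power
```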
Submitted 19 July, 2024;
originally announced July 2024.