-
Equivalent Characterizations of the Aubin Property for Nonlinear Semidefinite Programming
Authors:
Liang Chen,
Ruoning Chen,
Defeng Sun,
Liping Zhang
Abstract:
In this paper, we study the Aubin property of the Karush-Kuhn-Tucker solution mapping for the nonlinear semidefinite programming (NLSDP) problem at a locally optimal solution. In the literature, it is known that the Aubin property implies the constraint nondegeneracy by Fusek [SIAM J. Optim. 23 (2013), pp. 1041-1061] and the second-order sufficient condition by Ding et al. [SIAM J. Optim. 27 (2017), pp. 67-90]. Based on the Mordukhovich criterion, here we further prove that the strong second-order sufficient condition is also necessary for the Aubin property to hold. Consequently, several equivalent conditions including the strong regularity are established for NLSDP's Aubin property. Together with the recent progress made by Chen et al. on the equivalence between the Aubin property and the strong regularity for nonlinear second-order cone programming [arXiv:2406.13798v1 (2024)], this paper constitutes a significant step forward in characterizing the Aubin property for general non-polyhedral $C^2$-cone reducible constrained optimization problems.
Submitted 15 August, 2024;
originally announced August 2024.
-
Multi-Scale Cell Decomposition for Path Planning using Restrictive Routing Potential Fields
Authors:
Josue N. Rivera,
Dengfeng Sun
Abstract:
In burgeoning domains, like urban goods distribution, the advent of aerial cargo transportation necessitates the development of routing solutions that prioritize safety. This paper introduces Larp, a novel path planning framework that leverages the concept of restrictive potential fields to forge routes demonstrably safer than those derived from existing methods. The algorithm achieves this by segmenting a potential field into a hierarchy of cells, each with a designated restriction zone determined by obstacle proximity. While the primary impetus behind Larp is to enhance the safety of aerial pathways for cargo-carrying Unmanned Aerial Vehicles (UAVs), its utility extends to a wide array of path planning scenarios. Comparative analyses with both established and contemporary potential field-based methods reveal Larp's proficiency in maintaining a safe distance from restrictions and its adeptness in circumventing local minima.
Submitted 5 August, 2024;
originally announced August 2024.
-
PromptSAM+: Malware Detection based on Prompt Segment Anything Model
Authors:
Xingyuan Wei,
Yichen Liu,
Ce Li,
Ning Li,
Degang Sun,
Yan Wang
Abstract:
Machine learning and deep learning (ML/DL) have been extensively applied in malware detection, and some existing methods demonstrate robust performance. However, several issues persist in the field of malware detection: (1) Existing work often overemphasizes accuracy at the expense of practicality, rarely considering false positive and false negative rates as important metrics. (2) As malware evolves, the performance of classifiers declines significantly over time, greatly reducing the practicality of malware detectors. (3) Prior ML/DL-based efforts rely heavily on ample labeled data for model training and depend largely on feature engineering or domain knowledge to build feature databases, making them vulnerable if correct labels are scarce. With the development of computer vision, vision-based malware detection technology has also rapidly evolved. In this paper, we propose 'PromptSAM+', a general enhancement framework for visual malware classification based on a large visual segmentation model, the Prompt Segment Anything Model. Our experimental results indicate that 'PromptSAM+' is effective and efficient in malware detection and classification, achieving high accuracy and low rates of false positives and negatives. The proposed method outperforms the most advanced image-based malware detection technologies on several datasets. 'PromptSAM+' can mitigate aging in existing image-based malware classifiers, reducing the considerable manpower needed for labeling new malware samples through active learning. We conducted experiments on datasets for both Windows and Android platforms, achieving favorable outcomes. Additionally, our ablation experiments on several datasets demonstrate that our model identifies effective modules within the large visual network.
Submitted 4 August, 2024;
originally announced August 2024.
-
Mitigating the Impact of Malware Evolution on API Sequence-based Windows Malware Detector
Authors:
Xingyuan Wei,
Ce Li,
Qiujian Lv,
Ning Li,
Degang Sun,
Yan Wang
Abstract:
In dynamic Windows malware detection, deep learning models are extensively deployed to analyze API sequences. Methods based on API sequences play a crucial role in malware prevention. However, because APIs are continuously updated and changes in API call sequences drive the constant evolution of malware variants, the detection capability of API sequence-based malware detection models diminishes significantly over time. We observe that the API sequences of malware samples before and after evolution usually have similar malicious semantics. Specifically, compared to the original samples, evolved malware samples often use the API sequences of the pre-evolution samples to achieve similar malicious behaviors. For instance, they access similar sensitive system resources and extend new malicious functions based on the original functionalities. In this paper, we propose MME, a framework that can enhance existing API sequence-based malware detectors and mitigate the adverse effects of malware evolution. To help detection models capture the similar semantics of these post-evolution API sequences, our framework represents API sequences using API knowledge graphs and system resource encodings and applies contrastive learning to enhance the model's encoder. Results indicate that, compared to Regular Text-CNN, our framework can significantly reduce the false positive rate by 13.10% and improve the F1-Score by 8.47% on five years of data, achieving the best experimental results. Additionally, evaluations show that our framework can save on the human costs required for model maintenance. We only need 1% of the budget per month to reduce the false positive rate by 11.16% and improve the F1-Score by 6.44%.
Submitted 3 August, 2024;
originally announced August 2024.
-
HOT: An Efficient Halpern Accelerating Algorithm for Optimal Transport Problems
Authors:
Guojun Zhang,
Zhexuan Gu,
Yancheng Yuan,
Defeng Sun
Abstract:
This paper proposes an efficient HOT algorithm for solving the optimal transport (OT) problems with finite supports. We particularly focus on an efficient implementation of the HOT algorithm for the case where the supports are in $\mathbb{R}^2$ with ground distances calculated by $L_2^2$-norm. Specifically, we design a Halpern accelerating algorithm to solve the equivalent reduced model of the discrete OT problem. Moreover, we derive a novel procedure to solve the involved linear systems in the HOT algorithm in linear time complexity. Consequently, we can obtain an $\varepsilon$-approximate solution to the optimal transport problem with $M$ supports in $O(M^{1.5}/\varepsilon)$ flops, which significantly improves the best-known computational complexity. We further propose an efficient procedure to recover an optimal transport plan for the original OT problem based on a solution to the reduced model, thereby overcoming the limitations of the reduced OT model in applications that require the transport map. We implement the HOT algorithm in PyTorch and extensive numerical results show the superior performance of the HOT algorithm compared to existing state-of-the-art algorithms for solving the OT problems.
Submitted 1 August, 2024;
originally announced August 2024.
-
Domain Adaptive Lung Nodule Detection in X-ray Image
Authors:
Haifeng Zhao,
Lixiang Jiang,
Leilei Ma,
Dengdi Sun,
Yanping Fu
Abstract:
Medical images from different healthcare centers exhibit varied data distributions, posing significant challenges for adapting lung nodule detection due to the domain shift between training and application phases. Traditional unsupervised domain adaptive detection methods often struggle with this shift, leading to suboptimal outcomes. To overcome these challenges, we introduce a novel domain adaptive approach for lung nodule detection that leverages mean teacher self-training and contrastive learning. First, we propose a hierarchical contrastive learning strategy to refine nodule representations and enhance the distinction between nodules and background. Second, we introduce a nodule-level domain-invariant feature learning (NDL) module to capture domain-invariant features through adversarial learning across different domains. Additionally, we propose a new annotated dataset of X-ray images to aid in advancing lung nodule detection research. Extensive experiments conducted on multiple X-ray datasets demonstrate the efficacy of our approach in mitigating domain shift impacts.
Submitted 2 August, 2024; v1 submitted 28 July, 2024;
originally announced July 2024.
-
Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network
Authors:
Gang Pan,
Chen Wang,
Zhijie Sui,
Shuai Guo,
Yaozhi Lv,
Honglie Li,
Di Sun
Abstract:
The Quick-view (QV) technique serves as a primary method for detecting defects within sewerage systems. However, the effectiveness of QV is impeded by the limited visual range of its hardware, resulting in suboptimal image quality for distant portions of the sewer network. Image super-resolution is an effective way to improve image quality and has been applied in a variety of scenes. However, super-resolution for sewer images remains largely unexplored. In response, this study leverages the inherent depth relationships present within QV images and introduces a novel Depth-guided, Reference-based Super-Resolution framework denoted as DSRNet. It comprises two core components: a depth extraction module and a depth information matching module (DMM). DSRNet utilizes the adjacent frames of the low-resolution image as reference images and helps them recover texture information based on the correlation. By combining these modules, the integration of depth priors significantly enhances both visual quality and performance benchmarks. In addition, in pursuit of computational efficiency and compactness, our paper introduces a super-resolution knowledge distillation model based on an attention mechanism. This mechanism facilitates the acquisition of feature similarity between a more complex teacher model and a streamlined student model, the latter being a lightweight version of DSRNet. Experimental results demonstrate that DSRNet significantly improves PSNR and SSIM compared with other methods. This study also conducts experiments on sewer defect semantic segmentation, object detection, and classification on the Pipe dataset and Sewer-ML dataset. Experiments show that the method can improve the performance of low-resolution sewer images in these tasks.
Submitted 27 July, 2024;
originally announced July 2024.
-
Text-Region Matching for Multi-Label Image Recognition with Missing Labels
Authors:
Leilei Ma,
Hongxing Xie,
Lei Wang,
Yanping Fu,
Dengdi Sun,
Haifeng Zhao
Abstract:
Recently, large-scale visual language pre-trained (VLP) models have demonstrated impressive performance across various downstream tasks. Motivated by these advancements, pioneering efforts have emerged in multi-label image recognition with missing labels, leveraging VLP prompt-tuning technology. However, they usually cannot match text and vision features well, due to complicated semantic gaps and missing labels in a multi-label image. To tackle this challenge, we propose Text-Region Matching for optimizing Multi-Label prompt tuning, namely TRM-ML, a novel method for enhancing meaningful cross-modal matching. Compared to existing methods, we advocate exploring the information of category-aware regions rather than the entire image or pixels, which contributes to bridging the semantic gap between textual and visual representations in a one-to-one matching manner. Concurrently, we further introduce multimodal contrastive learning to narrow the semantic gap between textual and visual modalities and establish intra-class and inter-class relationships. Additionally, to deal with missing labels, we propose a multimodal category prototype that leverages intra- and inter-category semantic relationships to estimate unknown labels, facilitating pseudo-label generation. Extensive experiments on the MS-COCO, PASCAL VOC, Visual Genome, NUS-WIDE, and CUB-200-2011 benchmark datasets demonstrate that our proposed framework outperforms the state-of-the-art methods by a significant margin. Our code is available at https://github.com/yu-gi-oh-leilei/TRM-ML.
Submitted 7 August, 2024; v1 submitted 26 July, 2024;
originally announced July 2024.
-
LAMBDA: A Large Model Based Data Agent
Authors:
Maojun Sun,
Ruijian Han,
Binyan Jiang,
Houduo Qi,
Defeng Sun,
Yancheng Yuan,
Jian Huang
Abstract:
We introduce "LAMBDA," a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles: the programmer and the inspector, which are engineered to work together seamlessly. Specifically, the programmer generates code based on the user's instructions and domain-specific knowledge, enhanced by advanced models. Meanwhile, the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention in the operational loop. Additionally, LAMBDA can flexibly integrate external models and algorithms through our knowledge integration mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various machine learning datasets. It has the potential to enhance data science practice and the analysis paradigm by seamlessly integrating human and artificial intelligence, making it more accessible, effective, and efficient for individuals from diverse backgrounds. The strong performance of LAMBDA in solving data science problems is demonstrated in several case studies, which are presented at https://www.polyu.edu.hk/ama/cmfai/lambda.html.
Submitted 24 July, 2024;
originally announced July 2024.
-
Implementable Semismooth* Newton Methods for Generalized Equations are G-Semismooth Newton Methods
Authors:
Liang Chen,
Defeng Sun,
Wangyongquan Zhang
Abstract:
Semismooth* Newton methods have been proposed in recent years targeting multi-valued inclusion problems and have been successfully implemented to deal with several concrete generalized equations. In this paper, we show that these executable implementations are exactly the applications of G-semismooth Newton methods for solving nonsmooth equations localized from these generalized equations. This new understanding expands the breadth of G-semismooth Newton methods in theory, and more importantly, facilitates the design and implementation of practical Newton-type algorithms for solving generalized equations.
Submitted 19 July, 2024;
originally announced July 2024.
-
SMooDi: Stylized Motion Diffusion Model
Authors:
Lei Zhong,
Yiming Xie,
Varun Jampani,
Deqing Sun,
Huaizu Jiang
Abstract:
We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-motion model for stylization. Specifically, we propose style guidance to ensure that the generated motion closely matches the reference style, alongside a lightweight style adaptor that directs the motion towards the desired style while ensuring realism. Experiments across various applications demonstrate that our proposed framework outperforms existing methods in stylized motion generation.
Submitted 17 July, 2024;
originally announced July 2024.
-
SlideGCD: Slide-based Graph Collaborative Training with Knowledge Distillation for Whole Slide Image Classification
Authors:
Tong Shu,
Jun Shi,
Dongdong Sun,
Zhiguo Jiang,
Yushan Zheng
Abstract:
Existing WSI analysis methods rest on the consensus that histopathological characteristics of tumors provide significant guidance for cancer diagnostics. In particular, as the evolution of cancers is a continuous process, the correlations and differences across various stages, anatomical locations and patients should be taken into account. However, recent research mainly focuses on the inner-contextual information in a single WSI, ignoring the correlations between slides. To verify whether introducing the slide inter-correlations can bring improvements to WSI representation learning, we propose a generic WSI analysis pipeline, SlideGCD, that takes existing multi-instance learning (MIL) methods as the backbone and recasts the WSI classification task as a node classification problem. More specifically, SlideGCD declares a node buffer that stores previous slide embeddings for subsequent extensive slide-based graph construction and conducts graph learning to explore the inter-correlations implied in the slide-based graph. Moreover, we frame the MIL classifier and graph learning into two parallel workflows and deploy knowledge distillation to transfer the differentiable information to the graph neural network. SlideGCD consistently boosts the performance of four previous state-of-the-art MIL methods on two TCGA benchmark datasets. The code is available at https://github.com/HFUT-miaLab/SlideGCD.
Submitted 19 July, 2024; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Phase-field modeling of dendritic growth with gas bubbles in the solidification of binary alloys
Authors:
Chengjie Zhan,
Zhenhua Chai,
Dongke Sun,
Baochang Shi,
Shaoning Geng,
Ping Jiang
Abstract:
In this work, a phase-field model is developed for dendritic growth with gas bubbles in the solidification of binary alloys. In this model, a total free energy for the complex gas-liquid-dendrite system is proposed by considering the interactions of gas bubbles, liquid melt and solid dendrites; it reduces to the energy for gas-liquid flows in the region far from the solid phase, and degenerates to the energy for thermosolutal dendritic growth when the gas bubble disappears. The governing equations are usually obtained by minimizing the total free energy, but here some modifications are made to improve the capacity of the conservative phase-field equation for gas bubbles and the convection-diffusion equation for solute transfer. Additionally, through the asymptotic analysis of the thin-interface limit, the present general phase-field model for alloy solidification can match the corresponding free boundary problem, and it is identical to the commonly used models under a specific choice of model parameters. Furthermore, to describe the fluid flow, the incompressible Navier-Stokes equations are adopted in the entire domain including gas, liquid, and solid regions, where the fluid-structure interaction is considered by a simple diffuse-interface method. To test the present phase-field model, the lattice Boltzmann method is used to study several problems of gas-liquid flows, dendritic growth, and solidification in the presence of gas bubbles, and a good performance of the present model for such complex problems is observed.
Submitted 1 July, 2024;
originally announced July 2024.
-
FAGhead: Fully Animate Gaussian Head from Monocular Videos
Authors:
Yixin Xuan,
Xinyang Li,
Gongxin Yao,
Shiwei Zhou,
Donghui Sun,
Xiaoxin Chen,
Yu Pan
Abstract:
High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicitly build on traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance. Meanwhile, to effectively manage the edges of avatars, we introduce alpha rendering to supervise the alpha value of each pixel. Extensive experimental results on open-source datasets and our own captured datasets demonstrate that our approach is able to generate high-fidelity 3D head avatars and fully control the expression and pose of the virtual avatars, outperforming existing works.
Submitted 28 June, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique
Authors:
Qishi Zhan,
Dan Sun,
Erdi Gao,
Yuhan Ma,
Yaxin Liang,
Haowei Yang
Abstract:
This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. A weight-based objective function is proposed to achieve fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A technique for threshold optimization utilizing a simplex algorithm is presented. To address the nonlinear characteristics of hyperspectral images, a generalized discriminant analysis algorithm based on a kernel function is proposed. In this project, a hyperspectral remote sensing image is taken as the object, and we investigate its mathematical modeling, solution methods, and feature extraction techniques. It is found that different types of objects are independent of each other and compact in image processing. Compared with the traditional linear discrimination method, the result of image segmentation is better. This method can not only overcome the traditional method's susceptibility to lighting, but also extract the features of the object quickly and accurately. It has important reference significance for clinical diagnosis.
Submitted 23 May, 2024;
originally announced June 2024.
-
Aubin Property and Strong Regularity Are Equivalent for Nonlinear Second-Order Cone Programming
Authors:
Liang Chen,
Ruoning Chen,
Defeng Sun,
Junyuan Zhu
Abstract:
This paper solves a fundamental open problem in variational analysis on the equivalence between the Aubin property and the strong regularity for nonlinear second-order cone programming (SOCP) at a locally optimal solution. We achieve this by introducing a reduction approach to the Aubin property characterized by the Mordukhovich criterion and a lemma of alternative choices on cones to replace the S-lemma used in Outrata and Ramírez [SIAM J. Optim. 21 (2011) 789-823] and Opazo, Outrata, and Ramírez [SIAM J. Optim. 27 (2017) 2141-2151], where the same SOCP was considered under the strict complementarity condition except for possibly only one block of constraints. As a byproduct, we also offer a new approach to the well-known result of Dontchev and Rockafellar [SIAM J. Optim. 6 (1996) 1087-1105] on the equivalence of the two concepts in conventional nonlinear programming.
Submitted 19 June, 2024;
originally announced June 2024.
-
Low-Energy Electronic Structure in the Unconventional Charge-Ordered State of ScV$_6$Sn$_6$
Authors:
Asish K. Kundu,
Xiong Huang,
Eric Seewald,
Ethan Ritz,
Santanu Pakhira,
Shuai Zhang,
Dihao Sun,
Simon Turkel,
Sara Shabani,
Turgut Yilmaz,
Elio Vescovo,
Cory R. Dean,
David C. Johnston,
Tonica Valla,
Turan Birol,
Dmitri N. Basov,
Rafael M. Fernandes,
Abhay N. Pasupathy
Abstract:
Kagome vanadates {\it A}V$_3$Sb$_5$ display unusual low-temperature electronic properties including charge density waves (CDW), whose microscopic origin remains unsettled. Recently, CDW order has been discovered in a new material ScV$_6$Sn$_6$, providing an opportunity to explore whether the onset of CDW leads to unusual electronic properties. Here, we study this question using angle-resolved photoemission spectroscopy (ARPES) and scanning tunneling microscopy (STM). The ARPES measurements show minimal changes to the electronic structure after the onset of CDW. However, STM quasiparticle interference (QPI) measurements show strong dispersing features related to the CDW ordering vectors. A plausible explanation is the presence of a strong momentum-dependent scattering potential peaked at the CDW wavevector, associated with the existence of competing CDW instabilities. Our STM results further indicate that the bands most affected by the CDW are near vHS, analogous to the case of {\it A}V$_3$Sb$_5$ despite very different CDW wavevectors.
Submitted 17 June, 2024;
originally announced June 2024.
-
Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning
Authors:
Dan Sun,
Yaxin Liang,
Yining Yang,
Yuhan Ma,
Qishi Zhan,
Erdi Gao
Abstract:
This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word vector is quantified by the Word2Vec method and then evaluated by a word embedding convolutional neural network. The published experimental results of the two groups were tested. The experimental results show that this method can convert discrete features into continuous characters, thus reducing the complexity of feature preprocessing. Word2Vec and natural language processing technology are integrated to achieve the goal of direct evaluation of missing image features. The robustness of the image feature evaluation model is improved by using the excellent feature analysis characteristics of a convolutional neural network. This project intends to improve the existing image feature identification methods and eliminate the subjective influence in the evaluation process. The findings from the simulation indicate that the novel approach that has been developed is viable, effectively augmenting the features within the produced representations.
Submitted 13 June, 2024;
originally announced June 2024.
-
Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network
Authors:
Houze Liu,
Iris Li,
Yaxin Liang,
Dan Sun,
Yining Yang,
Haowei Yang
Abstract:
Neural networks with relatively shallow layers and simple structures may have limited ability in accurately identifying pneumonia. In addition, deep neural networks also have a large demand for computing resources, which may cause convolutional neural networks to be unable to be implemented on terminals. Therefore, this paper will carry out the optimal classification of convolutional neural networks. Firstly, according to the characteristics of pneumonia images, AlexNet and InceptionV3 were selected to obtain better image recognition results. Combining the features of medical images, the forward neural network with deeper and more complex structure is learned. Finally, knowledge extraction technology is used to extract the obtained data into the AlexNet model to achieve the purpose of improving computing efficiency and reducing computing costs. The results showed that the prediction accuracy, specificity, and sensitivity of the trained AlexNet model increased by 4.25 percentage points, 7.85 percentage points, and 2.32 percentage points, respectively. The graphics processing usage has decreased by 51% compared to the InceptionV3 model.
Submitted 13 June, 2024;
originally announced June 2024.
-
Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation
Authors:
Jinyuan Li,
Ziyan Li,
Han Li,
Jianfei Yu,
Rui Xia,
Di Sun,
Gang Pan
Abstract:
Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases used in similar tasks (e.g., phrase localization) and fine-grained named entities. In this paper, we propose RiVEG, a unified framework that reformulates GMNER into a joint MNER-VE-VG task by leveraging large language models (LLMs) as connecting bridges. This reformulation brings two benefits: 1) It enables us to optimize the MNER module for optimal MNER performance and eliminates the need to pre-extract region features using object detection methods, thus naturally addressing the two major limitations of existing GMNER methods. 2) The introduction of Entity Expansion Expression module and Visual Entailment (VE) module unifies Visual Grounding (VG) and Entity Grounding (EG). This endows the proposed framework with unlimited data and model scalability. Furthermore, to address the potential ambiguity stemming from the coarse-grained bounding box output in GMNER, we further construct the new Segmented Multimodal Named Entity Recognition (SMNER) task and corresponding Twitter-SMNER dataset aimed at generating fine-grained segmentation masks, and experimentally demonstrate the feasibility and effectiveness of using box prompt-based Segment Anything Model (SAM) to empower any GMNER model with the ability to accomplish the SMNER task. Extensive experiments demonstrate that RiVEG significantly outperforms SoTA methods on four datasets across the MNER, GMNER, and SMNER tasks.
Submitted 11 June, 2024;
originally announced June 2024.
-
NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise
Authors:
Zhonghao Wang,
Danyu Sun,
Sheng Zhou,
Haobo Wang,
Jiapei Fan,
Longtao Huang,
Jiajun Bu
Abstract:
Graph Neural Networks (GNNs) exhibit strong potential in node classification task through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating incorrect information during training. To address this issue, the study of Graph Neural Networks under Label Noise (GLN) has recently gained traction. However, due to variations in dataset selection, data splitting, and preprocessing techniques, the community currently lacks a comprehensive benchmark, which impedes deeper understanding and further development of GLN. To fill this gap, we introduce NoisyGL in this paper, the first comprehensive benchmark for graph neural networks under label noise. NoisyGL enables fair comparisons and detailed analyses of GLN methods on noisy labeled graph data across various datasets, with unified experimental settings and interface. Our benchmark has uncovered several important insights that were missed in previous research, and we believe these findings will be highly beneficial for future studies. We hope our open-source benchmark library will foster further advancements in this field. The code of the benchmark can be found in https://github.com/eaglelab-zju/NoisyGL.
Submitted 6 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model
Authors:
Hongen Liu,
Di Sun,
Jiahao Wang,
Yi Liu,
Gang Pan
Abstract:
Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos. To address the limited recognition capability of end-to-end methods, recent methods track the zero-shot results of state-of-the-art image text spotters directly, and achieve impressive performance. However, owing to the domain gap between different datasets, these methods usually obtain limited tracking trajectories on extreme datasets. Fine-tuning transformer-based text spotters on specific datasets could yield performance enhancements, albeit at the expense of considerable training resources. In this paper, we propose a Language Collaboration and Glyph Perception Model, termed LOGO, an innovative framework designed to enhance the performance of conventional text spotters. To achieve this goal, we design a language synergy classifier (LSC) to explicitly discern text instances from background noise in the recognition stage. Specifically, the language synergy classifier can output text content or background code based on the legibility of text regions, thus computing language scores. Subsequently, fusion scores are computed by taking the average of detection scores and language scores, and are utilized to re-score the detection results before tracking. By the re-scoring mechanism, the proposed LSC facilitates the detection of low-resolution text instances while filtering out text-like regions. Moreover, the glyph supervision is introduced to enhance the recognition accuracy of noisy text regions. In addition, we propose the visual position mixture module, which can merge the position information and visual features efficiently, and acquire more discriminative tracking features. Extensive experiments on public benchmarks validate the effectiveness of the proposed method.
Submitted 10 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees
Authors:
Cangqing Wang,
Mingxiu Sui,
Dan Sun,
Zecheng Zhang,
Yan Zhou
Abstract:
This research delves deeply into Meta Reinforcement Learning (Meta RL) through an exploration focusing on defining generalization limits and ensuring convergence. By employing an approach, this article introduces an innovative theoretical framework to meticulously assess the effectiveness and performance of Meta RL algorithms. We present an explanation of generalization limits measuring how well these algorithms can adapt to learning tasks while maintaining consistent results. Our analysis delves into the factors that impact the adaptability of Meta RL, revealing the relationship between algorithm design and task complexity. Additionally, we establish convergence assurances by proving conditions under which Meta RL strategies are guaranteed to converge towards solutions. We examine the convergence behaviors of Meta RL algorithms across scenarios, providing a comprehensive understanding of the driving forces behind their long-term performance. This exploration covers both convergence and real-time efficiency, offering a perspective on the capabilities of these algorithms.
Submitted 21 May, 2024;
originally announced May 2024.
-
Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy
Authors:
Jjahao Zhang,
Yin Gu,
Deyu Sun,
Yuhua Gao,
Ming Gao,
Ming Cui,
Teng Zhang,
He Ma
Abstract:
Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging (MRI) modalities may be useful in achieving a precise outline of the extent of paracervical tissue invasion. Registration is the initial step in information fusion. However, when aligning multimodal images with varying depths, manual alignment is prone to large errors and is time-consuming. Furthermore, the variations in the size of the Region of Interest (ROI) and the shape of multimodal images pose a significant challenge for achieving accurate registration. In this paper, we propose a preliminary spatial alignment algorithm and a weakly supervised multimodal registration network. The spatial position alignment algorithm efficiently utilizes the limited annotation information in the two modal images provided by the doctor to automatically align multimodal images with varying depths. By utilizing aligned multimodal images for weakly supervised registration and incorporating pyramidal features and cost volume to estimate the optical flow, the results indicate that the proposed method outperforms traditional volume rendering alignment methods and registration networks in various evaluation metrics. This demonstrates the effectiveness of our model in multimodal image registration.
Submitted 21 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
High Discrimination Ratio, Broadband Circularly Polarized Light Photodetector Using Dielectric Achiral Nanostructures
Authors:
Guanyu Zhang,
Xiaying Lyu,
Yulu Qin,
Yaolong Li,
Zipu Fan,
Xianghan Meng,
Yuqing Cheng,
Zini Cao,
Yixuan Xu,
Dong Sun,
Yunan Gao,
Qihuang Gong,
Guowei Lu
Abstract:
The on-chip measurement of polarization states plays an increasingly crucial role in modern sensing and imaging applications. While high-performance monolithic linearly polarized photodetectors have been extensively studied, integrated circularly polarized light (CPL) photodetectors are still hindered by inadequate discrimination capability. In this study, we employ achiral all-dielectric nanostructures to develop a broadband CPL photodetector with an impressive discrimination ratio of ~107 at the wavelength of 405 nm, significantly surpassing its counterparts by two orders of magnitude. Our device shows outstanding CPL discrimination capability across the visible band without requiring intensity calibration. Its function mechanism is based on the CPL-dependent near-field modes within achiral structures: under left or right CPL illumination, distinct near-field modes are excited, resulting in asymmetric irradiation of the two electrodes and generating a photovoltage with directions determined by the chirality of the incident light field. The proposed design strategy facilitates the realization of ultra-compact CPL detection across diverse materials, structures, and spectral ranges, presenting a novel avenue for achieving high-performance monolithic CPL detection.
Submitted 19 May, 2024;
originally announced May 2024.
-
Random Utility Models with Skewed Random Components: the Smallest versus Largest Extreme Value Distribution
Authors:
Richard T. Carson,
Derrick H. Sun,
Yixiao Sun
Abstract:
At the core of most random utility models (RUMs) is an individual agent with a random utility component following a largest extreme value Type I (LEVI) distribution. What if, instead, the random component follows its mirror image -- the smallest extreme value Type I (SEVI) distribution? Differences between these specifications, closely tied to the random component's skewness, can be quite profound. For the same preference parameters, the two RUMs, equivalent with only two choice alternatives, diverge progressively as the number of alternatives increases, resulting in substantially different estimates and predictions for key measures, such as elasticities and market shares.
The LEVI model imposes the well-known independence-of-irrelevant-alternatives property, while SEVI does not. Instead, the SEVI choice probability for a particular option involves enumerating all subsets that contain this option. The SEVI model, though more complex to estimate, is shown to have computationally tractable closed-form choice probabilities. Much of the paper delves into explicating the properties of the SEVI model and exploring implications of the random component's skewness.
Conceptually, the difference between the LEVI and SEVI models centers on whether information, known only to the agent, is more likely to increase or decrease the systematic utility parameterized using observed attributes. LEVI does the former; SEVI the latter. An immediate implication is that if choice is characterized by SEVI random components, then the observed choice is more likely to correspond to the systematic-utility-maximizing choice than if characterized by LEVI. Examining standard empirical examples from different applied areas, we find that the SEVI model outperforms the LEVI model, suggesting the relevance of its inclusion in applied researchers' toolkits.
Submitted 21 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
A Lightweight Transformer for Remote Sensing Image Change Captioning
Authors:
Dongwei Sun,
Yajie Bao,
Xiangyong Cao
Abstract:
Remote sensing image change captioning (RSICC) aims to automatically generate sentences that describe content differences in remote sensing bitemporal images. Recently, attention-based transformers have become a prevalent idea for capturing the features of global change. However, existing transformer-based RSICC methods face challenges, e.g., high parameters and high computational complexity caused by the self-attention operation in the transformer encoder component. To alleviate these issues, this paper proposes a Sparse Focus Transformer (SFT) for the RSICC task. Specifically, the SFT network consists of three main components, i.e. a high-level features extractor based on a convolutional neural network (CNN), a sparse focus attention mechanism-based transformer encoder network designed to locate and capture changing regions in dual-temporal images, and a description decoder that embeds images and words to generate sentences for captioning differences. The proposed SFT network can reduce the parameter number and computational complexity by incorporating a sparse attention mechanism within the transformer encoder network. Experimental results on various datasets demonstrate that even with a reduction of over 90\% in parameters and computational complexity for the transformer encoder, our proposed network can still obtain competitive performance compared to other state-of-the-art RSICC methods. The code can be available at
Submitted 10 May, 2024;
originally announced May 2024.
-
On the weak reducing pairs in critical Heegaard splitting
Authors:
Dongqi Sun,
Qiang E
Abstract:
A weak reducing pair in a Heegaard splitting M = V \cup_S W is a pair of disjoint essential disks D \in V and E \in W. A weakly reducible Heegaard splitting contains at least one weak reducing pair. A critical Heegaard splitting is a special case of a weakly reducible Heegaard splitting which contains at least two weak reducing pairs satisfying some special conditions. In this paper, we discuss the properties of weak reducing pairs in a critical Heegaard splitting and give a necessary condition for a Heegaard surface to be critical.
Submitted 28 April, 2024;
originally announced April 2024.
-
Cosmic Himalayas: The Highest Quasar Density Peak Identified in a 10,000 deg$^2$ Sky with Spatial Discrepancies between Galaxies, Quasars, and IGM HI
Authors:
Yongming Liang,
Masami Ouchi,
Dongsheng Sun,
Nobunari Kashikawa,
Zheng Cai,
Sebastiano Cantalupo,
Kentaro Nagamine,
Hidenobu Yajima,
Takanobu Kirihara,
Haibin Zhang,
Mingyu Li,
Rhythm Shimakawa,
Xiaohui Fan,
Kei Ito,
Masayuki Tanaka,
Yuichi Harikane,
J. Xavier Prochaska,
Andrea Travascio,
Weichen Wang,
Martin Elvis,
Giuseppina Fabbiano,
Junya Arita,
Masafusa Onoue,
John D. Silverman,
Dongdong Shi
, et al. (5 additional authors not shown)
Abstract:
We report the identification of a quasar overdensity in the BOSSJ0210 field, dubbed Cosmic Himalayas, consisting of 11 quasars at $z=2.16-2.20$, the densest overdensity of quasars ($17σ$) in the $\sim$10,000 deg$^2$ of the Sloan Digital Sky Survey. We present the spatial distributions of galaxies and quasars and an HI absorption map of the intergalactic medium (IGM). On the map of 465 galaxies selected from the MAMMOTH-Subaru survey, we find two galaxy density peaks that do not fall on the quasar overdensity but instead exist at the northwest and southeast sides, approximately 25 $h^{-1}$ comoving-Mpc apart from the quasar overdensity. With a spatial resolution of 15 $h^{-1}$ comoving Mpc in projection, we produce a three-dimensional HI tomography map by the IGM Ly$α$ forest in the spectra of 23 SDSS/eBOSS quasars behind the quasar overdensity. Surprisingly, the quasar overdensity coincides with neither an absorption peak nor a transmission peak of IGM HI but lies near the border separating opaque and transparent volumes, with the more luminous quasars located in an environment with lesser IGM HI. Hence remarkably, the overdensity region traced by the 11 quasars, albeit all in coherently active states, has no clear coincidence with peaks of galaxies or HI absorption densities. Current physical scenarios with mixtures of HI overdensities and quasar photoionization cannot fully interpret the emergence of Cosmic Himalayas, suggesting this peculiar structure is an excellent laboratory to unveil the interplay between galaxies, quasars, and the IGM.
Submitted 24 April, 2024;
originally announced April 2024.
-
Self-generated magnetic field in three-dimensional ablative Rayleigh-Taylor instability
Authors:
Dehua Zhang,
Xian Jiang,
Tao Tao,
Jun Li,
Rui Yan,
De-Jun Sun,
Jian Zheng
Abstract:
The self-generated magnetic field in three-dimensional (3D) single-mode ablative Rayleigh-Taylor instabilities (ARTI) relevant to the acceleration phase of a direct-drive inertial confinement fusion (ICF) implosion is investigated. It is found that stronger magnetic fields up to a few thousands of T can be generated by 3D ARTI than by its two-dimensional (2D) counterpart. The Nernst effects significantly alter the magnetic fields convection and amplify the magnetic fields. The scaling law for the magnetic flux obtained in the 2D simulations performs reasonably well in the 3D cases. While the magnetic field significantly accelerates the bubble growth in the short-wavelength 2D modes through modifying the heat fluxes, the magnetic field mostly accelerates the spike growth but has little influence on the bubble growth in 3D ARTI.
Submitted 24 April, 2024;
originally announced April 2024.
-
Human Behavior Modeling via Identification of Task Objective and Variability
Authors:
Sooyung Byeon,
Dawei Sun,
Inseok Hwang
Abstract:
Human behavior modeling is important for the design and implementation of human-automation interactive control systems. In this context, human behavior refers to a human's control input to systems. We propose a novel method for human behavior modeling that uses human demonstrations for a given task to infer the unknown task objective and the variability. The task objective represents the human's intent or desire. It can be inferred by the inverse optimal control and improve the understanding of human behavior by providing an explainable objective function behind the given human behavior. Meanwhile, the variability denotes the intrinsic uncertainty in human behavior. It can be described by a Gaussian mixture model and capture the uncertainty in human behavior which cannot be encoded by the task objective. The proposed method can improve the prediction accuracy of human behavior by leveraging both task objective and variability. The proposed method is demonstrated through human-subject experiments using an illustrative quadrotor remote control example.
Submitted 22 April, 2024;
originally announced April 2024.
-
Probing the 3D Awareness of Visual Foundation Models
Authors:
Mohamed El Banani,
Amit Raj,
Kevis-Kokitsi Maninis,
Abhishek Kar,
Yuanzhen Li,
Michael Rubinstein,
Deqing Sun,
Leonidas Guibas,
Justin Johnson,
Varun Jampani
Abstract:
Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also represent their 3D structure. In this work, we analyze the 3D awareness of visual foundation models. We posit that 3D awareness implies that representations (1) encode the 3D structure of the scene and (2) consistently represent the surface across views. We conduct a series of experiments using task-specific probes and zero-shot inference procedures on frozen features. Our experiments reveal several limitations of the current models. Our code and analysis can be found at https://github.com/mbanani/probe3d.
Submitted 12 April, 2024;
originally announced April 2024.
-
Revealing mechanism of pore defect formation in laser directed energy deposition of aluminum alloy via in-situ synchrotron X-ray imaging
Authors:
Wei Liu,
Yuxiao Li,
Chunxia Yao,
Dongsheng Zhang,
Darui Sun,
Sen Chen,
Yu Wu,
Jun Wang,
Lei Lud,
Sheng-Nian Luo,
Ye Tao,
Bingbing Zhang
Abstract:
Laser metal additive manufacturing technology is capable of producing components with complex geometries and compositions that cannot be realized by conventional manufacturing methods. However, a large number of pores generated during the additive manufacturing process greatly affect the mechanical properties of the additively manufactured parts, and the mechanism of such pore generation has not been clearly revealed by direct observation. Here, we report the mechanism of pore generation in the laser directed energy deposition process as revealed by in-situ high-speed high-resolution synchrotron X-ray imaging. We found that dissolution and re-precipitation of external gases and precipitation of metal vapors are the two main mechanisms of pore formation. We further explored the effects of different process parameters on the generation of pores and optimized the process to suppress pore generation. This work provides important insights into the formation of porosity defects during laser metal additive manufacturing, and can provide guidance for related process optimization.
Submitted 10 April, 2024;
originally announced April 2024.
-
Holographic supersymmetric Renyi entropies from hyperbolic black holes with scalar hair
Authors:
Jie Ren,
Dao-Quan Sun
Abstract:
We study holographic supersymmetric Renyi entropies from a family of hyperbolic black holes in an Einstein-Maxwell-dilaton (EMD) system under the BPS condition. We calculate the thermodynamic quantities of these hyperbolic black holes. We find a remarkably simple formula for the supersymmetric Renyi entropy that unifies (interpolates) 11 cases embeddable in 10- or 11-dimensional supergravity. It reproduces many known results in the literature, and gives new results with distinctive features. We show that the supersymmetric version of the modular entropy and the capacity of entanglement cannot be mapped to thermal quantities, due to the dependence of the temperature and the chemical potential by the BPS condition. We also calculate the entanglement spectrum. We derive the potential of the EMD system from a $V=0$ solution and obtain two neutral solutions with scalar hair as a byproduct.
Submitted 14 June, 2024; v1 submitted 8 April, 2024;
originally announced April 2024.
-
Adapting LLMs for Efficient Context Processing through Soft Prompt Compression
Authors:
Cangqing Wang,
Yutian Yang,
Ruisi Li,
Dan Sun,
Ruicong Cai,
Yuzhu Zhang,
Chengqian Fu,
Lillian Floyd
Abstract:
The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context window sizes and the computational burdens entailed by their operations. This investigation presents an innovative framework that strategically tailors LLMs for streamlined context processing by harnessing the synergies among natural language summarization, soft prompt compression, and augmented utility preservation mechanisms. Our methodology, dubbed SoftPromptComp, amalgamates natural language prompts extracted from summarization methodologies with dynamically generated soft prompts to forge a concise yet semantically robust depiction of protracted contexts. This depiction undergoes further refinement via a weighting mechanism optimizing information retention and utility for subsequent tasks. We substantiate that our framework markedly diminishes computational overhead and enhances LLMs' efficacy across various benchmarks, while upholding or even augmenting the caliber of the produced content. By amalgamating soft prompt compression with sophisticated summarization, SoftPromptComp confronts the dual challenges of managing lengthy contexts and ensuring model scalability. Our findings point towards a propitious trajectory for augmenting LLMs' applicability and efficiency, rendering them more versatile and pragmatic for real-world applications. This research enriches the ongoing discourse on optimizing language models, providing insights into the potency of soft prompts and summarization techniques as pivotal instruments for the forthcoming generation of NLP solutions.
Submitted 18 April, 2024; v1 submitted 7 April, 2024;
originally announced April 2024.
-
Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation
Authors:
Zhuoyuan Wang,
Dong Sun,
Xiangyun Zeng,
Ruodai Wu,
Yi Wang
Abstract:
The segmentation of organs in volumetric medical images plays an important role in computer-aided diagnosis and treatment/surgery planning. Conventional 2D convolutional neural networks (CNNs) can hardly exploit the spatial correlation of volumetric data. Current 3D CNNs can extract more powerful volumetric representations, but they usually require excessive memory and computation. In this study, we aim to enhance 2D networks with contextual information for better volumetric image segmentation. Accordingly, we propose a contextual embedding learning approach that helps 2D CNNs capture spatial information properly. Our approach leverages the learned embedding and slice-wise neighbor matching as a soft cue to guide the network. In this way, the contextual information can be transferred slice by slice, thus boosting the volumetric representation of the network. Experiments on a challenging prostate MRI dataset (PROMISE12) and an abdominal CT dataset (CHAOS) show that our contextual embedding learning can effectively leverage the inter-slice context and improve segmentation performance. The proposed approach is a plug-and-play, memory-efficient solution to enhance 2D networks for volumetric segmentation. Our code is publicly available at https://github.com/JuliusWang-7/CE_Block.
Submitted 17 May, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Accelerating preconditioned ADMM via degenerate proximal point mappings
Authors:
Defeng Sun,
Yancheng Yuan,
Guojun Zhang,
Xinyuan Zhao
Abstract:
In this paper, we aim to accelerate a preconditioned alternating direction method of multipliers (pADMM), whose proximal terms are convex quadratic functions, for solving linearly constrained convex optimization problems. To achieve this, we first reformulate the pADMM into a form of proximal point method (PPM) with a positive semidefinite preconditioner which can be degenerate due to the lack of strong convexity of the proximal terms in the pADMM. Then we accelerate the pADMM by accelerating the reformulated degenerate PPM (dPPM). Specifically, we first propose an accelerated dPPM by integrating the Halpern iteration and the fast Krasnosel'skiĭ-Mann iteration into it, achieving asymptotic $o(1/k)$ and non-asymptotic $O(1/k)$ convergence rates. Subsequently, building upon the accelerated dPPM, we develop an accelerated pADMM algorithm that exhibits both asymptotic $o(1/k)$ and non-asymptotic $O(1/k)$ nonergodic convergence rates concerning the Karush-Kuhn-Tucker residual and the primal objective function value gap. Preliminary numerical experiments validate the theoretical findings, demonstrating that the accelerated pADMM outperforms the pADMM in solving convex quadratic programming problems.
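As a rough illustration of the Halpern iteration mentioned in this abstract, the following minimal sketch applies the classical scheme $x_{k+1} = \frac{1}{k+2} x_0 + \frac{k+1}{k+2} T(x_k)$ to a toy nonexpansive operator. This is not the authors' accelerated pADMM or dPPM: the operator `rotate90`, the helper names, and the step-size choice $1/(k+2)$ are assumptions chosen only to show the anchoring idea on a case where plain fixed-point iteration does not converge.

```python
import math


def halpern(T, x0, num_iters):
    """Classical Halpern iteration: x_{k+1} = x0/(k+2) + (k+1)/(k+2) * T(x_k).

    T is assumed to be a nonexpansive operator on lists of floats.
    """
    x = list(x0)
    for k in range(num_iters):
        lam = 1.0 / (k + 2)  # anchoring weight on the starting point x0
        Tx = T(x)
        x = [lam * a + (1.0 - lam) * b for a, b in zip(x0, Tx)]
    return x


def rotate90(v):
    """Toy nonexpansive operator (hypothetical): rotate a 2D vector by 90 degrees.

    Its only fixed point is the origin; iterating it directly never converges.
    """
    return [-v[1], v[0]]


def fixed_point_residual(T, x):
    """Euclidean norm of x - T(x), a simple measure of how far x is from a fixed point."""
    Tx = T(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, Tx)))


x_final = halpern(rotate90, [1.0, 0.0], 200)
final_residual = fixed_point_residual(rotate90, x_final)
```

On this toy problem the residual shrinks roughly like $1/k$, whereas repeatedly applying `rotate90` alone keeps the iterate on the unit circle.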
Submitted 27 March, 2024;
originally announced March 2024.
-
Efficient Online Prediction for High-Dimensional Time Series via Joint Tensor Tucker Decomposition
Authors:
Zhenting Luan,
Defeng Sun,
Haoning Wang,
Liping Zhang
Abstract:
Real-time prediction plays a vital role in various control systems, such as traffic congestion control and wireless channel resource allocation. In these scenarios, the predictor usually needs to track the evolution of the latent statistical patterns in the modern high-dimensional streaming time series continuously and quickly, which presents new challenges for traditional prediction methods. This paper is the first to propose a novel online algorithm (TOPA) based on tensor factorization to predict streaming tensor time series. The proposed algorithm TOPA updates the predictor in a low-complexity online manner to adapt to the time-evolving data. Additionally, an automatically adaptive version of the algorithm (TOPA-AAW) is presented to mitigate the negative impact of stale data. Simulation results demonstrate that our proposed methods achieve prediction accuracy similar to that of conventional offline tensor prediction methods, while being much faster than them during long-term online prediction. Therefore, TOPA-AAW is an effective and efficient solution method for the online prediction of streaming tensor time series.
Submitted 13 August, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Unsupervised Feature Selection via Nonnegative Orthogonal Constrained Regularized Minimization
Authors:
Yan Li,
Defeng Sun,
Liping Zhang
Abstract:
Unsupervised feature selection has drawn wide attention in the era of big data since it is a primary technique for dimensionality reduction. However, many existing unsupervised feature selection models and solution methods were presented for the purpose of application and lack theoretical support, e.g., convergence analysis. In this paper, we first establish a novel unsupervised feature selection model based on regularized minimization with nonnegative orthogonal constraints, which has the advantages of embedding feature selection into nonnegative spectral clustering and preventing overfitting. An effective inexact augmented Lagrangian multiplier method is proposed to solve our model, which adopts the proximal alternating minimization method to solve the subproblem at each iteration. We show that the sequence generated by our method globally converges to a Karush-Kuhn-Tucker point of our model. Extensive numerical experiments on popular datasets demonstrate the stability and robustness of our method. Moreover, comparison results of algorithm performance show that our method outperforms some existing state-of-the-art methods.
Submitted 25 March, 2024;
originally announced March 2024.
-
Low-rank quaternion tensor completion for color video inpainting via a novel factorization strategy
Authors:
Zhenzhi Qin,
Zhenyu Ming,
Defeng Sun,
Liping Zhang
Abstract:
Recently, a quaternion tensor product named Qt-product was proposed, and then the singular value decomposition and the rank of a third-order quaternion tensor were given. From a more applicable perspective, we extend the Qt-product and propose a novel multiplication principle for third-order quaternion tensor named gQt-product. With the gQt-product, we introduce a brand-new singular value decomposition for third-order quaternion tensors named gQt-SVD and then define gQt-rank and multi-gQt-rank. We prove that the optimal low-rank approximation of a third-order quaternion tensor exists and some numerical experiments demonstrate the low-rankness of color videos. So, we apply the low-rank quaternion tensor completion to color video inpainting problems and present alternating least-square algorithms to solve the proposed low gQt-rank and multi-gQt-rank quaternion tensor completion models. The convergence analyses of the proposed algorithms are established and some numerical experiments on various color video datasets show the high recovery accuracy and computational efficiency of our methods.
Submitted 25 March, 2024;
originally announced March 2024.
-
SoK: Comprehensive Analysis of Rug Pull Causes, Datasets, and Detection Tools in DeFi
Authors:
Dianxiang Sun,
Wei Ma,
Liming Nie,
Yang Liu
Abstract:
Rug pulls pose a grave threat to the cryptocurrency ecosystem, leading to substantial financial loss and undermining trust in decentralized finance (DeFi) projects. With the emergence of new rug pull patterns, research on rug pulls is out of date. To fill this gap, we first conducted an extensive literature review, encompassing both scholarly and industry sources. By examining existing academic articles and industrial discussions on rug pull projects, we present a taxonomy inclusive of 34 root causes, introducing six new categories inspired by industry sources: burn, hidden owner, ownership transfer, unverified contract, external call, and fake LP lock. Based on the developed taxonomy, we evaluated current rug pull datasets and explored the effectiveness and limitations of existing detection mechanisms. Our evaluation indicates that the existing datasets, which document 2,448 instances, address only 7 of the 34 root causes, amounting to a mere 20% coverage. It indicates that existing open-source datasets need to be improved to study rug pulls. In response, we have constructed a more comprehensive dataset containing 2,360 instances, expanding the coverage to 54% with the best effort. In addition, the examination of 14 detection tools showed that they can identify 25 of the 34 root causes, achieving a coverage of 73.5%. There are nine root causes (Fake LP Lock, Hidden Fee, Destroy Token, Fake Money Transfer, Ownership Transfer, Liquidity Pool Block, Freeze Account, Wash-Trading, and Hedge) that the existing tools cannot cover. Our work indicates that there is a significant gap between current research and detection tools, and the actual situation of rug pulls.
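The coverage figures quoted in this abstract follow from simple ratios over the 34-cause taxonomy. The short sketch below recomputes them from the counts stated above; the function and variable names are hypothetical and are not taken from the paper.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Share of taxonomy root causes that are covered, as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


TOTAL_ROOT_CAUSES = 34  # taxonomy size stated in the abstract

# Existing datasets address 7 of the 34 root causes (about 20.6%, quoted as 20%).
dataset_coverage = coverage_percent(7, TOTAL_ROOT_CAUSES)

# The 14 detection tools identify 25 of the 34 root causes (73.5%).
tool_coverage = coverage_percent(25, TOTAL_ROOT_CAUSES)

# Root causes that no examined tool covers: 34 - 25 = 9.
uncovered_by_tools = TOTAL_ROOT_CAUSES - 25
```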
Submitted 24 March, 2024;
originally announced March 2024.
-
Protected group bias and stereotypes in Large Language Models
Authors:
Hadas Kotek,
David Q. Sun,
Zidi Xiu,
Margit Bowler,
Christopher Klein
Abstract:
As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion, and race. Second, we have the model generate stories about individuals who hold different types of occupations. We collect >10k sentence completions made by a publicly available LLM, which we subject to human annotation. We find bias across minoritized groups, but in particular in the domains of gender and sexuality, as well as Western bias, in model generations. The model not only reflects societal biases, but appears to amplify them. The model is additionally overly cautious in replies to queries relating to minoritized groups, providing responses that strongly emphasize diversity and equity to an extent that other group characteristics are overshadowed. This suggests that artificially constraining potentially harmful outputs may itself lead to harm, and should be applied in a careful and controlled manner.
Submitted 20 March, 2024;
originally announced March 2024.
-
Air Traffic Management for Collaborative Routing of Unmanned Aerial Vehicles via Potential Fields
Authors:
Josue N. Rivera,
Dengfeng Sun
Abstract:
Aerial cargo transport is anticipated to play a pivotal role in the distribution of goods within urban environments. The shift is propelled by the surge in e-commerce, the imperative to deliver essential supplies to isolated areas, and the growing demand for expedited and more accessible deliveries. Our research introduces a quantifiable standard for defining routing restrictions for Unmanned Aircraft System Traffic Management (UTM) using the concept of repulsive potential fields. Furthermore, we propose a scalable infrastructure that facilitates collaborative routing of cargo Unmanned Aerial Vehicles (UAVs) by independent shareholders. The practicality of the infrastructure is validated through a functional prototype implemented at a national scale.
Submitted 17 March, 2024;
originally announced March 2024.
-
Optimizing post-Newtonian parameters and fixing the BMS frame for numerical-relativity waveform hybridizations
Authors:
Dongze Sun,
Michael Boyle,
Keefe Mitman,
Mark A. Scheel,
Leo C. Stein,
Saul A. Teukolsky,
Vijay Varma
Abstract:
Numerical relativity (NR) simulations of binary black holes provide precise waveforms, but are typically too computationally expensive to produce waveforms with enough orbits to cover the whole frequency band of gravitational-wave observatories. Accordingly, it is important to be able to hybridize NR waveforms with analytic, post-Newtonian (PN) waveforms, which are accurate during the early inspiral phase. We show that to build such hybrids, it is crucial to both fix the Bondi-Metzner-Sachs (BMS) frame of the NR waveforms to match that of PN theory, and optimize over the PN parameters. We test such a hybridization procedure including all spin-weighted spherical harmonic modes with $|m|\leq \ell$ for $\ell\leq 8$, using 29 NR waveforms with mass ratios $q\leq 10$ and spin magnitudes $|χ_1|, |χ_2|\leq 0.8$. We find that for spin-aligned systems, the PN and NR waveforms agree very well. The difference is limited by the small nonzero orbital eccentricity of the NR waveforms, or equivalently by the lack of eccentric terms in the PN waveforms. To maintain full accuracy of the simulations, the matching window for spin-aligned systems should be at least 5 orbits long and end at least 15 orbits before merger. For precessing systems, the errors are larger than for spin-aligned cases. The errors are likely limited by the absence of precession-related spin-spin PN terms. Using $10^5\,M$ long NR waveforms, we find that there is no optimal choice of the matching window within this time span, because the hybridization result for precessing cases is always better if using earlier or longer matching windows. We provide the mean orbital frequency of the smallest acceptable matching window as a function of the target error between the PN and NR waveforms and the black hole spins.
Submitted 15 March, 2024;
originally announced March 2024.
-
3D Printed Waveguide for Augmented Reality
Authors:
Dechuan Sun,
Gregory Tanyi,
Alan Lee,
Chris French,
Younger Liang,
Christina Lim,
Ranjith R Unnithan
Abstract:
Mass production of augmented reality (AR) waveguides has been challenging due to the intricate nature of the fabrication technique and the high precision required for its optical characteristics. In this paper, we present a novel and low-cost approach for fabricating geometric optical waveguides designed for AR applications utilizing 3D printing techniques. To strike a balance between optical performance and fabrication feasibility, we have optimized the conventional geometric waveguide design to facilitate easier fabrication. It is worth noting that our proposed method does not require molding, dicing, or post-surface polishing after printing. A prototype based on this method has been successfully fabricated, showing the immersion between the virtual image and the real-world scene. The proposed method has great potential for adaptation to mass production in various AR applications.
Submitted 6 March, 2024;
originally announced March 2024.
-
SD-SLAM: A Semantic SLAM Approach for Dynamic Scenes Based on LiDAR Point Clouds
Authors:
Feiya Li,
Chunyun Fu,
Dongye Sun,
Jian Li,
Jianwen Wang
Abstract:
Point cloud maps generated via LiDAR sensors using extensive remotely sensed data are commonly used by autonomous vehicles and robots for localization and navigation. However, dynamic objects contained in point cloud maps not only downgrade localization accuracy and navigation performance but also jeopardize the map quality. In response to this challenge, we propose in this paper a novel semantic SLAM approach for dynamic scenes based on LiDAR point clouds, referred to as SD-SLAM hereafter. The main contributions of this work are in three aspects: 1) introducing a semantic SLAM framework dedicated to dynamic scenes based on LiDAR point clouds, 2) employing semantics and Kalman filtering to effectively differentiate between dynamic and semi-static landmarks, and 3) making full use of semi-static and pure static landmarks with semantic information in the SD-SLAM process to improve localization and mapping performance. To evaluate the proposed SD-SLAM, tests were conducted using the widely adopted KITTI odometry dataset. Results demonstrate that the proposed SD-SLAM effectively mitigates the adverse effects of dynamic objects on SLAM, improving vehicle localization and mapping performance in dynamic scenes, and simultaneously constructing a static semantic map with multiple semantic classes for enhanced environment understanding.
Submitted 28 February, 2024;
originally announced February 2024.
-
Combinatorial split-ring and spiral meta-resonator for efficient magnon-photon coupling
Authors:
Yuzan Xiong,
Andrew Christy,
Yun Dong,
Andrew Comstock,
Dali Sun,
Yi Li,
James F. Cahoon,
Binbin Yang,
Wei Zhang
Abstract:
Developing hybrid materials and structures for electromagnetic wave engineering has been a promising route towards novel functionalities and tunabilities in many modern applications and perspectives in new quantum technologies. Despite its established success in engineering optical light and terahertz waves, the implementation of meta-resonators operating at the microwave band is still emerging, especially those that allow for on-chip integration and size miniaturization, which has turned out to be crucial to developing hybrid quantum systems at the microwave band. In this work, we present a microwave meta-resonator consisting of split-ring and spiral resonators, and apply it to the investigation of photon-magnon coupling for hybrid magnonic applications. We observe broadened bandwidth to the split ring modes augmented by the additional spiral resonator, and, by coupling the modes to a magnetic sample, the resultant photon-magnon coupling can be significantly enhanced, by more than ten-fold. Our work suggests that combinatorial, hybrid microwave resonators may be a promising approach towards future development and implementation of photon-magnon coupling in hybrid magnonic systems.
Submitted 14 March, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
ASGNet: Adaptive Semantic Gate Networks for Log-Based Anomaly Diagnosis
Authors:
Haitian Yang,
Degang Sun,
Wen Liu,
Yanshu Li,
Yan Wang,
Weiqing Huang
Abstract:
Logs are widely used in the development and maintenance of software systems. Logs can help engineers understand the runtime behavior of systems and diagnose system failures. For anomaly diagnosis, existing methods generally use log event data extracted from historical logs to build diagnostic models. However, we find that existing methods do not make full use of two types of features: (1) statistical features: some inherent statistical features in log data, such as word frequency and abnormal label distribution, are not well exploited. Compared with raw log data, statistical features are deterministic and naturally compatible with corresponding tasks. (2) Semantic features: logs contain the execution logic behind software systems, thus log statements share deep semantic relationships. How to effectively combine statistical features and semantic features in log data to improve the performance of log anomaly diagnosis is the key point of this paper. In this paper, we propose adaptive semantic gate networks (ASGNet), which combine statistical features and semantic features to selectively use statistical features to consolidate log text semantic representation. Specifically, ASGNet encodes statistical features via a variational encoding module and fuses useful information through a well-designed adaptive semantic threshold mechanism. The threshold mechanism introduces the information flow into the classifier based on the confidence of the semantic features in the decision, which is conducive to training a robust classifier and can solve the overfitting problem caused by the use of statistical features. The experimental results on the real data set show that our proposed method is superior to all baseline methods in terms of various performance indicators.
Submitted 19 February, 2024;
originally announced February 2024.
-
LiGNN: Graph Neural Networks at LinkedIn
Authors:
Fedor Borisyuk,
Shihai He,
Yunbo Ouyang,
Morteza Ramezani,
Peng Du,
Xiaochen Hou,
Chengming Jiang,
Nitin Pasumarthy,
Priya Bannur,
Birjodh Tiwana,
Ping Liu,
Siddharth Dangi,
Daqi Sun,
Zhoutao Pei,
Xiao Shi,
Sirou Zhu,
Qianqi Shen,
Kuang-Hsuan Lee,
David Stein,
Baolei Li,
Haichao Wei,
Amol Ghoting,
Souvik Ghosh
Abstract:
In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) framework. We share our insights on developing and deploying GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built our large-scale training on LinkedIn graphs and sped it up by 7x with adaptive sampling of neighbors, grouping and slicing of training data batches, a specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to approximate relative improvements of 1% in job application hearing-back rate, 2% Ads CTR lift, 0.5% in Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying graph neural networks at large scale.
Submitted 16 February, 2024;
originally announced February 2024.