UELLM: A Unified and Efficient Approach for LLM Inference Serving

Authors: Yiyuan He, Minxian Xu, Jingfeng Wu, Wanyi Zheng, Kejiang Ye, Chengzhong Xu

Abstract: In the context of Machine Learning as a Service (MLaaS) clouds, the extensive use of Large Language Models (LLMs) often requires efficient management of significant query loads. When providing real-time inference services, several challenges arise. Firstly, increasing the number of GPUs may lead to a decrease in inference speed due to heightened communication overhead, while an inadequate number of GPUs can lead to out-of-memory errors. Secondly, different deployment strategies need to be evaluated to guarantee optimal utilization and minimal inference latency. Lastly, inefficient orchestration of inference queries can easily lead to significant Service Level Objective (SLO) violations. To address these challenges, we propose a Unified and Efficient approach for Large Language Model inference serving (UELLM), which consists of three main components: 1) resource profiler, 2) batch scheduler, and 3) LLM deployer. UELLM minimizes resource overhead, reduces inference latency, and lowers SLO violation rates. Compared with state-of-the-art (SOTA) techniques, UELLM reduces the inference latency by 72.3% to 90.3%, enhances GPU utilization by 1.2X to 4.1X, and increases throughput by 1.92X to 4.98X. It can also serve without violating the inference latency SLO.

Submitted 23 September, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

Comments: 15 pages, 5 figures, ICSOC 2024

arXiv:2409.14953 [pdf, other]

MSARS: A Meta-Learning and Reinforcement Learning Framework for SLO Resource Allocation and Adaptive Scaling for Microservices

Authors: Kan Hu, Linfeng Wen, Minxian Xu, Kejiang Ye

Abstract: Service Level Objectives (SLOs) aim to set thresholds for service time in cloud services to ensure acceptable quality of service (QoS) and user satisfaction. Currently, many studies consider SLOs as a system resource to be allocated, ensuring QoS meets the SLOs. Existing microservice auto-scaling frameworks that rely on SLO resources often utilize complex and computationally intensive models, requiring significant time and resources to determine appropriate resource allocation. This paper aims to rapidly allocate SLO resources and minimize resource costs while ensuring application QoS meets the SLO requirements in a dynamically changing microservice environment. We propose MSARS, a framework that leverages meta-learning to quickly derive SLO resource allocation strategies and employs reinforcement learning for adaptive scaling of microservice resources. It features three innovative components: First, MSARS uses graph convolutional networks to predict the most suitable SLO resource allocation scheme for the current environment. Second, MSARS utilizes meta-learning to enable the graph neural network to quickly adapt to environmental changes, ensuring adaptability in highly dynamic microservice environments. Third, MSARS generates auto-scaling policies for each microservice based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) model. The adaptive auto-scaling policy integrates the SLO resource allocation strategy into the scheduling algorithm to satisfy SLOs. Finally, we compare MSARS with state-of-the-art resource auto-scaling algorithms that utilize neural networks and reinforcement learning: MSARS takes 40% less time to adapt to new environments, reduces SLO violations by 38%, and lowers resource cost by 8%.

Submitted 23 September, 2024; originally announced September 2024.

Comments: 10 pages, 6 figures, IEEE ISPA 2024

arXiv:2409.14434 [pdf, ps, other]

The sparseness of g-convex functions

Authors: Yu Wang, Ke Ye

Abstract: The g-convexity of functions on manifolds is a generalization of the convexity of functions on $\mathbb{R}^n$. It plays an essential role in both differential geometry and non-convex optimization theory. This paper is concerned with g-convex smooth functions on manifolds. We establish criteria for the existence of a Riemannian metric (or connection) with respect to which a given function is g-convex. Using these criteria, we obtain three sparseness results for g-convex functions: (1) The set of g-convex functions on a compact manifold is nowhere dense in the space of smooth functions. (2) Most polynomials on $\mathbb{R}^n$ that are g-convex with respect to some geodesically complete connection have at most one critical point. (3) The density of g-convex univariate (resp. quadratic, monomial, additively separable) polynomials asymptotically decreases to zero.

Submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.14121 [pdf, other]

CONGRA: Benchmarking Automatic Conflict Resolution

Authors: Qingyu Zhang, Liangcai Su, Kai Ye, Chenxiong Qian

Abstract: Resolving conflicts from merging different software versions is a challenging task. To reduce the overhead of manual merging, researchers develop various program analysis-based tools which only solve specific types of conflicts and have a limited scope of application. With the development of language models, researchers treat conflict code as text, which theoretically allows for addressing almost all types of conflicts. However, the absence of effective conflict difficulty grading methods hinders a comprehensive evaluation of large language models (LLMs), making it difficult to gain a deeper understanding of their limitations. Furthermore, there is a notable lack of large-scale open benchmarks for evaluating the performance of LLMs in automatic conflict resolution. To address these issues, we introduce ConGra, a CONflict-GRAded benchmarking scheme designed to evaluate the performance of software merging tools under varying complexity conflict scenarios. We propose a novel approach to classify conflicts based on code operations and use it to build a large-scale evaluation dataset based on 44,948 conflicts from 34 real-world projects. We evaluate state-of-the-art LLMs on conflict resolution tasks using this dataset. By employing the dataset, we assess the performance of multiple state-of-the-art LLMs and code LLMs, ultimately uncovering two counterintuitive yet insightful phenomena. ConGra will be released at https://github.com/HKU-System-Security-Lab/ConGra.

Submitted 21 September, 2024; originally announced September 2024.

ACM Class: D.2; D.3

arXiv:2409.10227 [pdf, other]

Programmable multifunctional integrated microwave photonic circuit on thin-film lithium niobate

Authors: Chuangchuang Wei, Hanke Feng, Kaixuan Ye, Maarten Eijkel, Yvan Klaver, Zhaoxi Chen, Akshay Keloth, Cheng Wang, David Marpaung

Abstract: Microwave photonics, with its advanced high-frequency signal processing capabilities, is expected to play a crucial role in next-generation wireless communications and radar systems. The realization of highly integrated, high-performance, and multifunctional microwave photonic links will pave the way for its widespread deployment in practical applications, but remains a significant challenge. Here, leveraging a thin-film lithium niobate intensity modulator and programmable cascaded microring resonators, we demonstrate for the first time a tunable microwave photonic notch filter that simultaneously achieves a high level of integration along with high dynamic range, high link gain, low noise figure, and ultra-high rejection ratio. Additionally, this programmable on-chip system is multifunctional, allowing for dual-band notch filtering and the suppression of high-power interference signals. This work demonstrates the potential applications of the thin-film lithium niobate platform in the field of high-performance integrated microwave photonic filtering and signal processing, facilitating the advancement of microwave photonic systems towards practical applications.

Submitted 16 September, 2024; originally announced September 2024.

Comments: 18 pages, 8 figures, 1 table

arXiv:2409.05093 [pdf, other]

CloudNativeSim: a toolkit for modeling and simulation of cloud-native applications

Authors: Jingfeng Wu, Minxian Xu, Yiyuan He, Kejiang Ye, Chengzhong Xu

Abstract: Cloud-native applications are becoming increasingly popular in modern software design. Employing a microservice-based architecture in these applications is a prevalent strategy that enhances system availability and flexibility. However, cloud-native applications also introduce new challenges, such as frequent inter-service communication and the complexity of managing heterogeneous codebases and hardware, resulting in unpredictable complexity and dynamism. Furthermore, as applications scale, only a limited number of research teams or enterprises possess the resources for large-scale deployment and testing, which impedes progress in the cloud-native domain. To address these challenges, we propose CloudNativeSim, a simulator for cloud-native applications with a microservice-based architecture. CloudNativeSim offers several key benefits: (i) comprehensive and dynamic modeling for cloud-native applications, (ii) an extended simulation framework with new policy interfaces for scheduling cloud-native applications, and (iii) support for customized application scenarios and user feedback based on Quality of Service (QoS) metrics. CloudNativeSim can be easily deployed on standard computers to manage a high volume of requests and services. Its performance was validated through a case study, demonstrating higher than 94.5% accuracy in terms of response time. The study further highlights the feasibility of CloudNativeSim by illustrating the effects of various scaling policies.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 24 pages

arXiv:2409.04034 [pdf, ps, other]

Stability of ranks under field extensions

Authors: Qiyuan Chen, Ke Ye

Abstract: This paper studies the stability of tensor ranks under field extensions. Our main contributions are fourfold: (1) We prove that the analytic rank is stable under field extensions. (2) We establish the equivalence between the partition rank vs. analytic rank conjecture and the stability conjecture for partition rank. We also prove that they are equivalent to two other important conjectures. (3) We resolve the Adiprasito-Kazhdan-Ziegler conjecture on the stability of the slice rank of linear subspaces under field extensions. (4) As an application of (1), we show that the geometric rank is equal to the analytic rank up to a constant factor.

Submitted 6 September, 2024; originally announced September 2024.

Comments: 18 pages

arXiv:2408.14180 [pdf, other]

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Authors: Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji

Abstract: Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark for accurately assessing editing results and providing valuable insights for its further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of edited images produced by IIE models from multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions that cover both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: To ensure the alignment of our benchmark with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: By analyzing the advantages and disadvantages of existing IIE models across the 16 dimensions, we offer valuable research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models. The code, dataset, and generated images from all IIE models are provided on GitHub: https://github.com/cocoshe/I2EBench.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Tech report, 39 pages, 41 figures

arXiv:2408.07595 [pdf, other]

Progressive Radiance Distillation for Inverse Rendering with Gaussian Splatting

Authors: Keyang Ye, Qiming Hou, Kun Zhou

Abstract: We propose progressive radiance distillation, an inverse rendering method that combines physically-based rendering with Gaussian-based radiance field rendering using a distillation progress map. Taking multi-view images as input, our method starts from a pre-trained radiance field guidance, and distills physically-based light and material parameters from the radiance field using an image-fitting process. The distillation progress map is initialized to a small value, which favors radiance field rendering. During early iterations when fitted light and material parameters are far from convergence, the radiance field fallback ensures the sanity of image loss gradients and avoids local minima that attract under-fit states. As fitted parameters converge, the physical model gradually takes over and the distillation progress increases correspondingly. In the presence of light paths unmodeled by the physical model, the distillation progress never finishes on affected pixels and the learned radiance field stays in the final rendering. With this designed tolerance for physical model limitations, we prevent unmodeled color components from leaking into light and material parameters, alleviating relighting artifacts. Meanwhile, the remaining radiance field compensates for the limitations of the physical model, guaranteeing high-quality novel view synthesis. Experimental results demonstrate that our method significantly outperforms state-of-the-art techniques quality-wise in both novel view synthesis and relighting. The idea of progressive radiance distillation is not limited to Gaussian splatting. We show that it also has positive effects for prominently specular scenes when adapted to a mesh-based inverse rendering method.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.04453 [pdf, other]

Rational Curves on Real Classical Groups

Authors: Zijia Li, Ke Ye

Abstract: This paper is concerned with rational curves on real classical groups. Our contributions are three-fold: (i) We determine the structure of quadratic rational curves on real classical groups. As a consequence, we completely classify quadratic rational curves on $\mathrm{U}_n$, $\mathrm{O}_n(\mathbb{R})$, $\mathrm{O}_{n-1,1}(\mathbb{R})$ and $\mathrm{O}_{n-2,2}(\mathbb{R})$. (ii) We prove a decomposition theorem for rational curves on real classical groups, which can be regarded as a non-commutative generalization of the fundamental theorem of algebra and partial fraction decomposition. (iii) As an application of (i) and (ii), we generalize Kempe's Universality Theorem to rational curves on homogeneous spaces.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 50 pages

MSC Class: 14H45; 20G20; 26C15; 14L35; 14L30; 70B05

arXiv:2408.04102 [pdf, other]

ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling

Authors: William Yicheng Zhu, Keren Ye, Junjie Ke, Jiahui Yu, Leonidas Guibas, Peyman Milanfar, Feng Yang

Abstract: Recognizing and disentangling visual attributes from objects is a foundation for many computer vision applications. While large vision-language representations like CLIP have largely resolved the task of zero-shot object recognition, zero-shot visual attribute recognition remains a challenge because CLIP's contrastively-learned vision-language representation cannot effectively capture object-attribute dependencies. In this paper, we target this weakness and propose a sentence generation-based retrieval formulation for attribute recognition that is novel in 1) explicitly modeling a to-be-measured and retrieved object-attribute relation as a conditional probability graph, which converts the recognition problem into a dependency-sensitive language-modeling problem, and 2) applying a large pretrained Vision-Language Model (VLM) on this reformulation and naturally distilling its knowledge of image-object-attribute relations to use towards attribute recognition. Specifically, for each attribute to be recognized on an image, we measure the visual-conditioned probability of generating a short sentence encoding the attribute's relation to objects on the image. Unlike contrastive retrieval, which measures likelihood by globally aligning elements of the sentence to the image, generative retrieval is sensitive to the order and dependency of objects and attributes in the sentence. We demonstrate through experiments that generative retrieval consistently outperforms contrastive retrieval on two visual reasoning datasets, Visual Attribute in the Wild (VAW), and our newly-proposed Visual Genome Attribute Ranking (VGARank).

Submitted 24 September, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.21269 [pdf, other]

Atomic Structure of Self-Buffered BaZr(S,Se)$_3$ Epitaxial Thin Film Interfaces

Authors: Michael Xu, Kevin Ye, Ida Sadeghi, Rafael Jaramillo, James M. LeBeau

Abstract: Understanding and controlling the growth of chalcogenide perovskite thin films through interface design is important for tailoring film properties. Here, the film and interface structure of BaZr(S,Se)$_3$ thin films grown on LaAlO$_3$ by molecular beam epitaxy and post-growth anion exchange is resolved using aberration-corrected scanning transmission electron microscopy. Epitaxial films are achieved from self-assembly of an interface ``buffer'' layer, which accommodates the large film/substrate lattice mismatch of nearly 40\% for the alloy film studied here. The self-assembled buffer layer, occurring for both the as-grown sulfide and post-selenization alloy films, is shown to have rock-salt-like atomic stacking akin to a Ruddlesden-Popper phase. Above this buffer, the film quickly transitions to the perovskite structure. Overall, these results provide insights into oxide-chalcogenide heteroepitaxial film growth, illustrating a process that yields relaxed, crystalline, epitaxial chalcogenide perovskite films that support ongoing studies of optoelectronic and device properties.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.21075 [pdf, other]

Apple Intelligence Foundation Language Models

Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19415 [pdf, other]

Start from Video-Music Retrieval: An Inter-Intra Modal Loss for Cross Modal Retrieval

Authors: Zeyu Chen, Pengfei Zhang, Kai Ye, Wei Dong, Xin Feng, Yana Zhang

Abstract: The burgeoning short video industry has accelerated the advancement of video-music retrieval technology, assisting content creators in selecting appropriate music for their videos. In self-supervised training for video-to-music retrieval, the video and music samples in the dataset are separated from the same video work, so they are all one-to-one matches. This does not match the real situation. In reality, a video can use different music as background music, and a piece of music can be used as background music for different videos. Many videos and music that are not in a pair may be compatible, leading to false negative noise in the dataset. A novel inter-intra modal (II) loss is proposed as a solution. By reducing the variation of feature distribution within the two modalities before and after the encoder, II loss can reduce the model's overfitting to such noise without removing it in a costly and laborious way. The video-music retrieval framework, II-CLVM (Contrastive Learning for Video-Music Retrieval), incorporating the II loss, achieves state-of-the-art performance on the YouTube8M dataset. The framework II-CLVTM shows better performance when retrieving music using multi-modal video information (such as text in videos). Experiments are designed to show that II loss can effectively alleviate the problem of false negative noise in retrieval tasks. Experiments also show that II loss improves various self-supervised and supervised uni-modal and cross-modal retrieval tasks, and can obtain good retrieval models with a small number of training samples.

Submitted 28 July, 2024; originally announced July 2024.

Comments: 10 pages, 7 figures

ACM Class: I.2; I.4

arXiv:2407.13482 [pdf, ps, other]

Simple matrix models for the flag, Grassmann, and Stiefel manifolds

Authors: Lek-Heng Lim, Ke Ye

Abstract: We derive three families of orthogonally-equivariant matrix submanifold models for the Grassmann, flag, and Stiefel manifolds respectively. These families are exhaustive -- every orthogonally-equivariant submanifold model of the lowest dimension for any of these manifolds is necessarily a member of the respective family, with a small number of exceptions. They have several computationally desirable features. The orthogonal equivariance allows one to obtain, for various differential geometric objects and operations, closed-form analytic expressions that are readily computable with standard numerical linear algebra. The minimal dimension aspect translates directly to a speed advantage in computations. And having an exhaustive list of all possible matrix models permits one to identify the model with the lowest matrix condition number, which translates to an accuracy advantage in computations. As an interesting aside, we will see that the family of models for the Stiefel manifold is naturally parameterized by the Cartan manifold, i.e., the positive definite cone equipped with its natural Riemannian metric.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 17 pages

MSC Class: 14M15; 65J05; 90C48; 53Z30; 57S25; 22E70

arXiv:2407.12546 [pdf, ps, other]

Minimal equivariant embeddings of the Grassmannian and flag manifold

Authors: Lek-Heng Lim, Ke Ye

Abstract: We show that the flag manifold $\operatorname{Flag}(k_1,\dots, k_p, \mathbb{R}^n)$, with Grassmannian the special case $p=1$, has an $\operatorname{SO}_n(\mathbb{R})$-equivariant embedding in a Euclidean space of dimension $(n-1)(n+2)/2$, two orders of magnitude below the current best known result. We will show that the value $(n-1)(n+2)/2$ is the smallest possible and that any $\operatorname{SO}_n(\mathbb{R})$-equivariant embedding of $\operatorname{Flag}(k_1,\dots, k_p, \mathbb{R}^n)$ in an ambient space of minimal dimension is equivariantly equivalent to the aforementioned one.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 11 pages

MSC Class: 14M15; 57R40; 57S25; 14R20; 22E46; 22E70

arXiv:2407.10173 [pdf, other]

StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications

Authors: Linfeng Wen, Minxian Xu, Sukhpal Singh Gill, Muhammad Hafizhuddin Hilman, Satish Narayana Srirama, Kejiang Ye, Chengzhong Xu

Abstract: Microservice architecture has transformed traditional monolithic applications into lightweight components. Scaling these lightweight microservices is more efficient than scaling servers. However, scaling microservices still faces challenges resulting from unexpected spikes or bursts of requests, which are difficult to detect and can degrade performance instantaneously. To address this challenge and ensure the performance of microservice-based applications, we propose a status-aware and elastic scaling framework called StatuScale, which is based on a load status detector that can select appropriate elastic scaling strategies for differentiated resource scheduling in vertical scaling. Additionally, StatuScale employs a horizontal scaling controller that utilizes comprehensive evaluation and resource reduction to manage the number of replicas for each microservice. We also present a novel metric named correlation factor to evaluate the resource usage efficiency. Finally, we use Kubernetes, an open-source container orchestration and management platform, and realistic traces from Alibaba to validate our approach. The experimental results demonstrate that the proposed framework can reduce the average response time in the Sock-Shop application by 8.59% to 12.34%, and in the Hotel-Reservation application by 7.30% to 11.97%, decrease service level objective violations, and offer better performance in resource usage compared to baselines.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 26 pages

Journal ref: ACM Transactions on Autonomous and Adaptive Systems, 2024

arXiv:2407.10169 [pdf, other]

DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based Clusters

Authors: Haoyu Bai, Minxian Xu, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu

Abstract: Microservices have transformed monolithic applications into lightweight, self-contained, and isolated application components, establishing themselves as a dominant paradigm for application development and deployment in public clouds such as Google and Alibaba. Autoscaling emerges as an efficient strategy for managing resources allocated to microservices' replicas. However, the dynamic and intricate dependencies within microservice chains present challenges to the effective management of scaled microservices. Additionally, the centralized autoscaling approach can encounter scalability issues, especially in the management of large-scale microservice-based clusters. To address these challenges and enhance scalability, we propose an innovative distributed resource provisioning approach for microservices based on the Twin Delayed Deep Deterministic Policy Gradient algorithm. This approach enables effective autoscaling decisions and decentralizes responsibilities from a central node to distributed nodes. Comparative results with state-of-the-art approaches, obtained from a realistic testbed and traces, indicate that our approach reduces the average response time by 15% and the number of failed requests by 24%, validating improved scalability as the number of requests increases.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 12 pages

Journal ref: IEEE Transactions on Services Computing, 2024

arXiv:2407.04053 [pdf, other]

Edge AI: A Taxonomy, Systematic Review and Future Directions

Authors: Sukhpal Singh Gill, Muhammed Golec, Jianmin Hu, Minxian Xu, Junhui Du, Huaming Wu, Guneet Kaur Walia, Subramaniam Subramanian Murugesan, Babar Ali, Mohit Kumar, Kejiang Ye, Prabal Verma, Surendra Kumar, Felix Cuadrado, Steve Uhlig

Abstract: Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining its potential influence across many fields through encompassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network.
Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions.

Submitted 4 July, 2024; originally announced July 2024.

Comments: Preprint Version, 18 Figures

arXiv:2407.03765 [pdf, ps, other]

Design and Central Pattern Generator Control of a New Transformable Wheel-Legged Robot

Authors: Tyler Bishop, Keran Ye, Konstantinos Karydis

Abstract: This paper introduces a new wheel-legged robot and develops motion controllers based on central pattern generators (CPGs) for the robot to navigate over a range of terrains. A transformable leg-wheel design is considered and characterized in terms of key locomotion characteristics as a function of the design. Kinematic analysis is conducted based on a generalized four-bar mechanism driven by a coaxial hub arrangement. The analysis is used to inform the design of a central pattern generator to control the robot by mapping oscillator states to wheel-leg trajectories and implementing differential steering within the oscillator network. Three oscillator models are used as the basis of the CPGs, and their performance is compared over a range of inputs. The CPG-based controller is used to drive the developed robot prototype on level ground and over obstacles. Additional simulated tests are performed for uneven terrain negotiation and obstacle climbing. Results demonstrate the effectiveness of CPG control in transformable wheel-legged robots.

Submitted 4 July, 2024; originally announced July 2024.

Comments: ICRA 2024 in print

arXiv:2406.19377 [pdf, ps, other]

Grassmannian optimization is NP-hard

Authors: Zehua Lai, Lek-Heng Lim, Ke Ye

Abstract: We show that unconstrained quadratic optimization over a Grassmannian $\operatorname{Gr}(k,n)$ is NP-hard. Our results cover all scenarios: (i) when $k$ and $n$ are both allowed to grow; (ii) when $k$ is arbitrary but fixed; (iii) when $k$ is fixed at its lowest possible value $1$. We then deduce the NP-hardness of unconstrained cubic optimization over the Stiefel manifold $\operatorname{V}(k,n)$ and the orthogonal group $\operatorname{O}(n)$. As an addendum we demonstrate the NP-hardness of unconstrained quadratic optimization over the Cartan manifold, i.e., the positive definite cone $\mathbb{S}^n_{\scriptscriptstyle++}$ regarded as a Riemannian manifold, another popular example in manifold optimization. We will also establish the nonexistence of $\mathrm{FPTAS}$ in all cases.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 19 pages

MSC Class: 03D15; 90C26; 90C23; 65K10; 68Q25; 90C60

arXiv:2406.17911 [pdf, other]

X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This has become an urgent problem for RRG due to the highly patternized nature of these reports. In this work, we un-intuitively approach this problem by proposing the Layman's RRG framework, a layman's terms-based dataset, evaluation and training framework that systematically improves RRG with day-to-day language. We first contribute the translated Layman's terms dataset. Building upon the dataset, we then propose a semantics-based evaluation method, which is shown to mitigate the inflated numbers of BLEU and provides fairer evaluation. Last, we show that training on the layman's terms dataset encourages models to focus on the semantics of the reports, as opposed to overfitting to learning the report templates. We reveal a promising scaling law between the number of training examples and semantics gain provided by our dataset, compared to the inverse pattern brought by the original formats. Our code is available at https://github.com/hegehongcha/LaymanRRG.

Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.11821 [pdf, ps, other]

Simple matrix expressions for the curvatures of Grassmannian

Authors: Zehua Lai, Lek-Heng Lim, Ke Ye

Abstract: We show that modeling a Grassmannian as symmetric orthogonal matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong\{Q \in \mathbb{R}^{n \times n} : Q^{\scriptscriptstyle\mathsf{T}} Q = I, \; Q^{\scriptscriptstyle\mathsf{T}} = Q,\; \operatorname{tr}(Q)=2k - n\}$ yields exceedingly simple matrix formulas for various curvatures and curvature-related quantities, both intrinsic and extrinsic. These include Riemann, Ricci, Jacobi, sectional, scalar, mean, principal, and Gaussian curvatures; Schouten, Weyl, Cotton, Bach, Plebański, cocurvature, nonmetricity, and torsion tensors; first, second, and third fundamental forms; Gauss and Weingarten maps; and upper and lower delta invariants. We will derive explicit, simple expressions for the aforementioned quantities in terms of standard matrix operations that are stably computable with numerical linear algebra. Many of these aforementioned quantities have never before been presented for the Grassmannian.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 25 pages

MSC Class: 15A75; 14M15

arXiv:2406.02479 [pdf]

Applying Fine-Tuned LLMs for Reducing Data Needs in Load Profile Analysis

Authors: Yi Hu, Hyeonjin Kim, Kai Ye, Ning Lu

Abstract: This paper presents a novel method for utilizing fine-tuned Large Language Models (LLMs) to minimize data requirements in load profile analysis, demonstrated through the restoration of missing data in power system load profiles. A two-stage fine-tuning strategy is proposed to adapt a pre-trained LLM, i.e., GPT-3.5, for missing data restoration tasks. Through empirical evaluation, we demonstrate the effectiveness of the fine-tuned model in accurately restoring missing data, achieving comparable performance to state-of-the-art specifically designed models such as BERT-PIN. Key findings include the importance of prompt engineering and the optimal utilization of fine-tuning samples, highlighting the efficiency of few-shot learning in transferring knowledge from general user cases to specific target users. Furthermore, the proposed approach demonstrates notable cost-effectiveness and time efficiency compared to training models from scratch, making it a practical solution for scenarios with limited data availability and computing resources. This research has significant potential for application to other power system load profile analysis tasks. Consequently, it advances the use of LLMs in power system analytics, offering promising implications for enhancing the resilience and efficiency of power distribution systems.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.20560 [pdf, other]

Collaborative Resource Management and Workloads Scheduling in Cloud-Assisted Mobile Edge Computing across Timescales

Authors: Lujie Tang, Minxian Xu, Chengzhong Xu, Kejiang Ye

Abstract: Due to the limited resource capacity of edge servers and the high purchase costs of edge resources, service providers are facing the new challenge of how to take full advantage of the constrained edge resources for Internet of Things (IoT) service hosting and task scheduling to maximize system performance. In this paper, we study the joint optimization problem on service placement, resource provisioning, and workloads scheduling under resource and budget constraints, which is formulated as a mixed integer non-linear programming problem. Given that frequent service placement and resource provisioning will significantly increase system configuration costs and instability, we propose a two-timescale framework for resource management and workloads scheduling, named RMWS. RMWS consists of a Gibbs sampling algorithm and an alternating minimization algorithm to determine the service placement and resource provisioning on large timescales, and a sub-gradient descent method designed to solve the workload scheduling challenge on small timescales. We conduct comprehensive experiments under different parameter settings. RMWS consistently ensures a minimum 10% performance enhancement compared to other algorithms, showcasing its superiority. Theoretical proofs are also provided accordingly.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 11 pages, 10 figures

Journal ref: IEEE ICWS 2024

arXiv:2405.17241 [pdf, other]

NeurTV: Total Variation on the Neural Domain

Authors: Yisi Luo, Xile Zhao, Kai Ye, Deyu Meng

Abstract: Recently, we have witnessed the success of total variation (TV) for many imaging applications. However, traditional TV is defined on the original pixel domain, which limits its potential. In this work, we suggest a new TV regularization defined on the neural domain. Concretely, the discrete data is continuously and implicitly represented by a deep neural network (DNN), and we use the derivatives of DNN outputs w.r.t. input coordinates to capture local correlations of data. As compared with classical TV on the original domain, the proposed TV on the neural domain (termed NeurTV) enjoys two advantages. First, NeurTV is not limited to meshgrid but is suitable for both meshgrid and non-meshgrid data. Second, NeurTV can more exactly capture local correlations across data for any direction and any order of derivatives attributed to the implicit and continuous nature of neural domain. We theoretically reinterpret NeurTV under the variational approximation framework, which allows us to build the connection between classical TV and NeurTV and inspires us to develop variants (e.g., NeurTV with arbitrary resolution and space-variant NeurTV). Extensive numerical experiments with meshgrid data (e.g., color and hyperspectral images) and non-meshgrid data (e.g., point clouds and spatial transcriptomics) showcase the effectiveness of the proposed methods.

Submitted 27 May, 2024; originally announced May 2024.

MSC Class: 94A08; 68U10; 68T45

arXiv:2405.14206 [pdf, other]

LG-VQ: Language-Guided Codebook Learning

Authors: Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, Linfeng Luo

Abstract: Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal performance when the codebook is applied to multi-modal downstream tasks (e.g., text-to-image, image captioning) due to the existence of modal gaps. In this paper, we propose a novel language-guided codebook learning framework, called LG-VQ, which aims to learn a codebook that can be aligned with the text to improve the performance of multi-modal downstream tasks. Specifically, we first introduce pre-trained text semantics as prior knowledge, then design two novel alignment modules (i.e., Semantic Alignment Module, and Relationship Alignment Module) to transfer such prior knowledge into codes for achieving codebook text alignment. In particular, our LG-VQ method is model-agnostic, which can be easily integrated into existing VQ models. Experimental results show that our method achieves superior performance on reconstruction and various multi-modal downstream tasks.

Submitted 23 May, 2024; originally announced May 2024.

Comments: None

arXiv:2405.13190 [pdf, other]

Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation

Authors: Haoteng Tang, Guodong Liu, Siyuan Dai, Kai Ye, Kun Zhao, Wenlu Wang, Carl Yang, Lifang He, Alex Leow, Paul Thompson, Heng Huang, Liang Zhan

Abstract: The MRI-derived brain network serves as a pivotal instrument in elucidating both the structural and functional aspects of the brain, encompassing the ramifications of diseases and developmental processes. However, prevailing methodologies, often focusing on synchronous BOLD signals from functional MRI (fMRI), may not capture directional influences among brain regions and rarely tackle temporal functional dynamics. In this study, we first construct the brain-effective network via the dynamic causal model. Subsequently, we introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE). This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks via an ordinary differential equation (ODE) model, which characterizes spatial-temporal brain dynamics. Our framework is validated on several clinical phenotype prediction tasks using two independent publicly available datasets (HCP and OASIS). The experimental results clearly demonstrate the advantages of our model compared to several state-of-the-art methods.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.12635 [pdf, other]

TempoScale: A Cloud Workloads Prediction Approach Integrating Short-Term and Long-Term Information

Authors: Linfeng Wen, Minxian Xu, Adel N. Toosi, Kejiang Ye

Abstract: Cloud native solutions are widely applied in various fields, placing higher demands on the efficient management and utilization of resource platforms. To achieve this efficiency, load forecasting and elastic scaling have become crucial technologies for dynamically adjusting cloud resources to meet user demands and minimizing resource waste. However, existing prediction-based methods lack comprehensive analysis and integration of load characteristics across different time scales. For instance, long-term trend analysis helps reveal long-term changes in load and resource demand, thereby supporting proactive resource allocation over longer periods, while short-term volatility analysis can examine short-term fluctuations in load and resource demand, providing support for real-time scheduling and rapid response. In response to this, our research introduces TempoScale, which aims to enhance the comprehensive understanding of temporal variations in cloud workloads, enabling more intelligent and adaptive decision-making for elastic scaling. TempoScale utilizes the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise algorithm to decompose time-series load data into multiple Intrinsic Mode Functions (IMF) and a Residual Component (RC). First, we integrate the IMF, which represents both long-term trends and short-term fluctuations, into the time series prediction model to obtain intermediate results. Then, these intermediate results, along with the RC, are transferred into a fully connected layer to obtain the final result.
Finally, this result is fed into the resource management system based on Kubernetes for resource scaling. Our proposed approach can reduce the Mean Square Error by 5.80% to 30.43% compared to the baselines, and reduce the average response time by 5.58% to 31.15%.

Submitted 21 May, 2024; originally announced May 2024.

Comments: 11 pages, 11 figures, 4 tables

Journal ref: In proceedings of IEEE CLOUD 2024

arXiv:2405.09554 [pdf, ps, other]

Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto Prior

Authors: Yongfeng Huang, Zhendong Chen, Kun Ye, Lang Zhou, Haixin Sun

Abstract: In this letter, we investigate a new generalized double Pareto based off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of the source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.

Submitted 17 May, 2024; v1 submitted 18 April, 2024; originally announced May 2024.

arXiv:2405.09470 [pdf, other]

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer

Authors: Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security concerns have received much more attention than ever before, primarily due to the susceptibility of Deep Neural Networks. Previous studies have illustrated that surreptitiously crafting adversarial perturbations enables the manipulation of speech recognition systems, resulting in the production of malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modifications. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications based on optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the effect of Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieve a success rate of 82% in attacks, while preserving sound naturalness according to our user study.

Submitted 15 May, 2024; originally announced May 2024.

Comments: Accepted to SecTL (AsiaCCS Workshop) 2024

arXiv:2405.05128 [pdf, ps, other]

Degree of the Grassmannian as an affine variety

Authors: Lek-Heng Lim, Ke Ye

Abstract: The degree of the Grassmannian with respect to the Plücker embedding is well-known. However, the Plücker embedding, while ubiquitous in pure mathematics, is almost never used in applied mathematics. In applied mathematics, the Grassmannian is usually embedded as projection matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong \{P \in \mathbb{R}^{n \times n} : P^{\scriptscriptstyle\mathsf{T}} = P = P^2,\; \operatorname{tr}(P) = k\}$ or as involution matrices $\operatorname{Gr}(k,\mathbb{R}^n) \cong \{X \in \mathbb{R}^{n \times n} : X^{\scriptscriptstyle\mathsf{T}} = X,\; X^2 = I,\; \operatorname{tr}(X)=2k - n\}$. We will determine an explicit expression for the degree of the Grassmannian with respect to these embeddings. In so doing, we resolved a conjecture of Devriendt--Friedman--Sturmfels about the degree $\operatorname{Gr}(2, \mathbb{R}^n)$ and in fact generalized it to $\operatorname{Gr}(k, \mathbb{R}^n)$. We also proved a set theoretic variant of another conjecture of Devriendt--Friedman--Sturmfels about the limit of $\operatorname{Gr}(k,\mathbb{R}^n)$ in the sense of Gröbner degeneration.

Submitted 19 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 16 pages

MSC Class: 14E25; 14F45

arXiv:2404.18454 [pdf, other]

doi 10.1145/3641519.3657456

3D Gaussian Splatting with Deferred Reflection

Authors: Keyang Ye, Qiming Hou, Kun Zhou

Abstract: Neural and Gaussian-based radiance field methods have achieved great success in the field of novel view synthesis. However, specular reflection remains non-trivial, as the high frequency radiance field is notoriously difficult to fit stably and accurately. We present a deferred shading method to effectively render specular reflection with Gaussian splatting. The key challenge comes from the environment map reflection model, which requires accurate surface normals while simultaneously bottlenecking normal estimation with discontinuous gradients. We leverage the per-pixel reflection gradients generated by deferred shading to bridge the optimization process of neighboring Gaussians, allowing nearly correct normal estimations to gradually propagate and eventually spread over all reflective objects. Our method significantly outperforms state-of-the-art techniques and concurrent work in synthesizing high-quality specular reflection effects, demonstrating a consistent improvement of peak signal-to-noise ratio (PSNR) for both synthetic and real-world scenes, while running at a frame rate almost identical to vanilla Gaussian splatting.

Submitted 4 June, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.10541 [pdf, other]

MPCOM: Robotic Data Gathering with Radio Mapping and Model Predictive Communication

Authors: Zhiyou Ji, Guoliang Li, Ruihua Han, Shuai Wang, Bing Bai, Wei Xu, Kejiang Ye, Chengzhong Xu

Abstract: Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guided model predictive communication (MPCOM), which navigates the robot with both grid and radio maps for shape-aware collision avoidance and communication-aware trajectory generation in a dynamic environment. The proposed MPCOM is able to trade off the time spent on reaching the goal, avoiding collision, and improving communication. MPCOM captures high-order signal propagation characteristics using radio maps and incorporates the map-guided communication regularizer into the motion planning block. Experiments in IRSIM and CARLA simulators show that the proposed MPCOM outperforms other benchmarks in both LOS and NLOS cases. Real-world testing based on car-like robots is also provided to demonstrate the effectiveness of MPCOM in indoor environments.

Submitted 16 April, 2024; originally announced April 2024.

Comments: submit to IROS

arXiv:2404.08175 [pdf, ps, other]

A Novel Vision Transformer based Load Profile Analysis using Load Images as Inputs

Authors: Hyeonjin Kim, Yi Hu, Kai Ye, Ning Lu

Abstract: This paper introduces ViT4LPA, an innovative Vision Transformer (ViT) based approach for Load Profile Analysis (LPA). We transform time-series load profiles into load images. This allows us to leverage the ViT architecture, originally designed for image processing, as a pre-trained image encoder to uncover latent patterns within load data. ViT is pre-trained using an extensive load image dataset, comprising 1M load images derived from smart meter data collected over a two-year period from 2,000 residential users. The training methodology is self-supervised, masked image modeling, wherein masked load images are restored to reveal hidden relationships among image patches. The pre-trained ViT encoder is then applied to various downstream tasks, including the identification of electric vehicle (EV) charging loads and behind-the-meter solar photovoltaic (PV) systems and load disaggregation. Simulation results illustrate ViT4LPA's superior performance compared to existing neural network models in downstream tasks. Additionally, we conduct an in-depth analysis of the attention weights within the ViT4LPA model to gain insights into its information flow mechanisms.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2403.20031 [pdf, other]

A Unified Framework for Human-centric Point Cloud Video Understanding

Authors: Yiteng Xu, Kecheng Ye, Xiao Han, Yiming Ren, Xinge Zhu, Yuexin Ma

Abstract: Human-centric Point Cloud Video Understanding (PVU) is an emerging field focused on extracting and interpreting human-related features from sequences of human point clouds, further advancing downstream human-centric tasks and applications. Previous works usually focus on tackling one specific task and rely on huge amounts of labeled data, which leads to poor generalization capability. Considering that humans have specific characteristics, including the structural semantics of the human body and the dynamics of human motions, we propose a unified framework to make full use of the prior knowledge and explore the inherent features in the data itself for generalized human-centric point cloud video understanding. Extensive experiments demonstrate that our method achieves state-of-the-art performance on various human-related tasks, including action recognition and 3D pose estimation. All datasets and code will be released soon.

Submitted 29 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.09016 [pdf]

A Processing Route to Chalcogenide Perovskites Alloys with Tunable Band Gap via Anion Exchange

Authors: Kevin Ye, Ida Sadeghi, Michael Xu, Jack Van Sambeek, Tao Cai, Jessica Dong, Rishabh Kothari, James M. LeBeau, R. Jaramillo

Abstract: We demonstrate synthesis of BaZr(S,Se)3 chalcogenide perovskite alloys by selenization of BaZrS3 thin films. The anion-exchange process produces films with tunable composition and band gap without changing the orthorhombic perovskite crystal structure or the film microstructure. The direct band gap is tunable between 1.5 and 1.9 eV. The alloy films made in this way feature 100x stronger photoconductive response and a lower density of extended defects, compared to alloy films made by direct growth. The perovskite structure is stable in high-selenium-content thin films with and without epitaxy. The manufacturing-compatible process of selenization in H2Se gas may spur the development of chalcogenide perovskite solar cell technology.

Submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.08136 [pdf, other]

RoboCertProb: Property Specification for Probabilistic RoboChart Models

Authors: Kangfeng Ye, Jim Woodcock

Abstract: RoboChart is a core notation in the RoboStar framework which brings modern modelling and formal verification technologies into software engineering for robotics. It is a timed and probabilistic domain-specific language for robotics and provides a UML-like architectural and state machine modelling. This work presents RoboCertProb for specifying quantitative properties of probabilistic robotic systems modelled in RoboChart. RoboCertProb's semantics is based on PCTL*. To interpret RoboCertProb over RoboChart models, we give a Markov semantics (DTMCs and MDPs) to RoboChart, derived from its existing transformation semantics to the PRISM language. In addition to property specification, RoboCertProb also entitles us to configure loose constants and unspecified functions and operations in RoboChart models. It allows us to set up environmental inputs to verify reactive probabilistic systems not directly supported in probabilistic model checkers like PRISM because they employ a closed-world assumption. We implement RoboCertProb in an accompanying tool of RoboChart, RoboTool, for specifying properties and automatically generating PRISM properties from them to formally verify RoboChart models using PRISM. We have used it to analyse the behaviour of software controllers for two real robots: an industrial painting robot and an agricultural robot for treating plants with UV lights.

Submitted 12 March, 2024; originally announced March 2024.

Comments: 24 pages, 10 figures, 4 tables, submitted to the International Journal on Software and Systems Modeling (SoSyM)

arXiv:2403.00169 [pdf, other]

Quantitative Assurance and Synthesis of Controllers from Activity Diagrams

Authors: Kangfeng Ye, Fang Yan, Simos Gerasimou

Abstract: Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties for probabilistic models. However, capturing such systems, writing corresponding properties, and verifying them require domain knowledge. This makes it inaccessible to researchers and engineers who may not have the required knowledge. Previous studies have extended UML activity diagrams (ADs), developed transformations, and implemented accompanying tools for automation. The research, however, is not comprehensive and not fully open, which makes it hard to evaluate, extend, adapt, and access. In this paper, we propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantics interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language, supported by PRISM and Storm. Most importantly, we developed algorithms for transformation and implemented them in a tool, called QASCAD, using model-based techniques, for fully automated verification. We evaluated one case study where multiple robots are used for delivery in a hospital and further evaluated six other examples from the literature. With all these together, this work makes noteworthy contributions to the verification of ADs by improving evaluation, extensibility, adaptability, and accessibility.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 43 pages, 29 figures, 5 tables, submitted to Journal of Systems and Software (JSS)

ACM Class: D.2.4; F.3.1; F.3.2; F.4.3

arXiv:2402.18957 [pdf, other]

Vibrational properties differ between halide and chalcogenide perovskite semiconductors, and it matters for optoelectronic performance

Authors: K. Ye, M. Menahem, T. Salzillo, F. Knoop, B. Zhao, S. Niu, O. Hellman, J. Ravichandran, R. Jaramillo, O. Yaffe

Abstract: We report a comparative study of temperature-dependent photoluminescence and structural dynamics of two perovskite semiconductors, the chalcogenide BaZrS$_3$ (BZS) and the halide CsPbBr$_3$ (CPB). These materials have similar crystal structures and direct band gaps, but we find that they have quite distinct optoelectronic and vibrational properties. Both materials exhibit thermally-activated non-radiative recombination, but the non-radiative recombination rate in BZS is between two and four orders of magnitude faster than in CPB. Raman spectroscopy reveals that the effects of phonon anharmonicity are far more pronounced in CPB than in BZS. Further, although both materials feature a large dielectric response due to low-energy polar optical phonons, the phonons in CPB are substantially lower in energy than in BZS. Our results suggest that electron-phonon coupling in BZS is more effective at non-radiative recombination than in CPB, and that BZS may also have a substantially higher concentration of non-radiative recombination centers than CPB. The low defect concentration in CPB may be related to the ease of lattice reconfiguration, typified by anharmonic bonding. It remains to be seen to what extent these differences are inherent to the chalcogenide and halide perovskites and to what extent they can be affected by materials processing; comparing BZS single-crystals and thin films provides reason for optimism.

Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Main text - 12 pages with 5 figures and 1 table. Supplemental text - 16 pages with 6 figures and 5 tables

arXiv:2402.14255 [pdf]

Observation of temporal topological boundary states of light in a momentum bandgap

Authors: Yudong Ren, Kangpeng Ye, Qiaolu Chen, Fujia Chen, Li Zhang, Yuang Pan, Wenhao Li, Xinrui Li, Lu Zhang, Hongsheng Chen, Yihao Yang

Abstract: Topological phases have prevailed across diverse disciplines, spanning electronics, photonics, and acoustics. Hitherto, the understanding of these phases has centred on energy (frequency) bandstructures, showcasing topological boundary states at spatial interfaces. Recent strides have uncovered a unique category of bandstructures characterized by gaps in momentum, referred to as momentum bandgaps or k gaps, notably driven by breakthroughs in photonic time crystals. This discovery hints at abundant topological phases defined within momentum bands, alongside a wealth of topological boundary states in the time domain. Here, we report the first experimental observation of k-gap topology in a large-scale optical temporal synthetic lattice, manifesting as temporal topological boundary states. These boundary states are uniquely situated at temporal interfaces between two subsystems with distinct k-gap topology. Counterintuitively, despite the exponential amplification of k-gap modes within both subsystems, these topological boundary states exhibit decay in both temporal directions. Our findings mark a significant pathway for delving into k gaps, temporal topological states, and time-varying physics.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.08917 [pdf, other]

An Interference-aware Approach for Co-located Container Orchestration with Novel Metric

Authors: Xiang Li, Linfeng Wen, Minxian Xu, Kejiang Ye

Abstract: Container orchestration technologies are widely employed in cloud computing, facilitating the co-location of online and offline services on the same infrastructure. Online services demand rapid responsiveness and high availability, whereas offline services require extensive computational resources. However, this mixed deployment can lead to resource contention, adversely affecting the performance of online services, yet the metrics used by existing methods cannot accurately reflect the extent of interference. In this paper, we introduce scheduling latency as a novel metric for quantifying interference and compare it with existing metrics. Empirical evidence demonstrates that scheduling latency more accurately reflects the performance degradation of online services. We also utilize various machine learning techniques to predict potential interference on specific hosts for online services, providing reference information for subsequent scheduling decisions. Simultaneously, we propose a method for quantifying node interference based on scheduling latency. To enhance resource utilization, we train a model for online services that predicts CPU and MEM (memory) resource allocation based on workload type and QPS. Finally, we present a scheduling algorithm based on predictive modeling, aiming to reduce interference in online services while balancing node resource utilization. Through experiments and comparisons with three other baseline methods, we demonstrate the effectiveness of our approach. Compared with three baselines, our approach can reduce the average response time, 90th percentile response time, and 99th percentile response time of online services by 29.4%, 31.4%, and 14.5%, respectively.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 8 pages

Journal ref: In the Proceedings of IEEE SmartData 2023

arXiv:2402.04134 [pdf, other]

A quasi-optimal lower bound for skew polynomial multiplication

Authors: Qiyuan Chen, Ke Ye

Abstract: We establish a lower bound for the complexity of multiplying two skew polynomials. The lower bound coincides with the upper bound conjectured by Caruso and Borgne in 2017, up to a log factor. We present algorithms for three special cases, indicating that the aforementioned lower bound is quasi-optimal. In fact, our lower bound is quasi-optimal in the sense of bilinear complexity. In addition, we discuss the average bilinear complexity of simultaneous multiplication of skew polynomials and the complexity of skew polynomial multiplication in the case of towers of extensions.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.03456 [pdf, other]

Constrained Multiview Representation for Self-supervised Contrastive Learning

Authors: Siyuan Dai, Kai Ye, Kun Zhao, Ge Cui, Haoteng Tang, Liang Zhan

Abstract: Representation learning constitutes a pivotal cornerstone in contemporary deep learning paradigms, offering a conduit to elucidate distinctive features within the latent space and interpret the deep models. Nevertheless, the inherent complexity of anatomical patterns and the random nature of lesion distribution in medical image segmentation pose significant challenges to the disentanglement of representations and the understanding of salient features. Methods guided by the maximization of mutual information, particularly within the framework of contrastive learning, have demonstrated remarkable success and superiority in decoupling densely intertwined representations. However, the effectiveness of contrastive learning highly depends on the quality of the positive and negative sample pairs, i.e. the unselected average mutual information among multi-views would obstruct the learning strategy so the selection of the views is vital. In this work, we introduce a novel approach predicated on representation distance-based mutual information (MI) maximization for measuring the significance of different views, aiming at conducting more efficient contrastive learning and representation disentanglement. Additionally, we introduce an MI re-ranking strategy for representation selection, benefiting both the continuous MI estimating and representation significance distance measuring. Specifically, we harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information across varying frequencies, thereby facilitating a multifaceted contrastive learning approach to bolster semantic comprehension. The statistical results under the five metrics demonstrate that our proposed framework proficiently constrains the MI maximization-driven representation selection and steers the multi-view contrastive learning process.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 11 pages, 9 figures, 2 algorithms

arXiv:2401.13160 [pdf, other]

SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Authors: Ke Ye, Heinrich Jiang, Afshin Rostamizadeh, Ayan Chakrabarti, Giulia DeSalvo, Jean-François Kagy, Lazaros Karydas, Gui Citovsky, Sanjiv Kumar

Abstract: Pre-training large language models is known to be extremely resource intensive and often times inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and token replacement detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis on why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training, while enabling a 50% reduction in pre-training iterations and 40% reduction in total FLOPs. Alternatively, given the same amount of computing budget, we find that SpacTor results in significantly improved downstream benchmark performance.

Submitted 23 January, 2024; originally announced January 2024.

Comments: 9+13 pages, 5 figures

arXiv:2401.12651 [pdf, other]

Brillouin nonlinearity characterizations of a high refractive index silicon oxynitride platform

Authors: Kaixuan Ye, Akshay Keloth, Yvan Klaver, Alessio Baldazzi, Gioele Piccoli, Matteo Sanna, Lorenzo Pavesi, Mher Ghulinyan, David Marpaung

Abstract: Silicon oxynitride (SiON) is a low-loss and versatile material for linear and nonlinear photonics applications. Controlling the oxygen-to-nitrogen (O/N) ratio in SiON provides an effective way to engineer its optical and mechanical properties, making it a great platform for the investigation of on-chip optomechanical interactions, especially the stimulated Brillouin scattering (SBS). Here we report the Brillouin nonlinearity characterization of a SiON platform with a specific O/N ratio (characterized by a refractive index of $n=1.65$). First, we introduce this particular SiON platform with fabrication details. Subsequently, we discuss various techniques for the on-chip Brillouin nonlinearity characterizations. In particular, we focus on the intensity-modulated pump-probe lock-in amplifier technique, which enables ultra-sensitive characterization. Finally, we analyze the Brillouin nonlinearities of this SiON platform and compare them with other SiON platforms. This work underscores the potential of SiON for on-chip Brillouin-based applications. Moreover, it paves the way for Brillouin nonlinearity characterization across various material platforms.

Submitted 29 January, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.01484 [pdf, other]

Uncertainty Regularized Evidential Regression

Authors: Kai Ye, Tiejin Chen, Hua Wei, Liang Zhan

Abstract: The Evidential Regression Network (ERN) represents a novel approach that integrates deep learning with Dempster-Shafer's theory to predict a target and quantify the associated uncertainty. Guided by the underlying theory, specific activation functions must be employed to enforce non-negative values, which is a constraint that compromises model performance by limiting its ability to learn from all samples. This paper provides a theoretical analysis of this limitation and introduces an improvement to overcome it. Initially, we define the region where the models can't effectively learn from the samples. Following this, we thoroughly analyze the ERN and investigate this constraint. Leveraging the insights from our analysis, we address the limitation by introducing a novel regularization term that empowers the ERN to learn from the whole training set. Our extensive experiments substantiate our theoretical findings and demonstrate the effectiveness of the proposed solution.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Accepted to AAAI 2024 main track

arXiv:2312.13721 [pdf, ps, other]

Bundle-based similarity measurement for positive semidefinite matrices

Authors: Peng Liu, Ke Ye

Abstract: Positive semidefinite (PSD) matrices are indispensable in many fields of science. A similarity measurement for such matrices is usually an essential ingredient in the mathematical modelling of a scientific problem. This paper proposes a unified framework to construct similarity measurements for PSD matrices. The framework is obtained by exploring the fiber bundle structure of the cone of PSD matrices and generalizing the idea of the point-set distance previously developed for linear subspaces and positive definite (PD) matrices. The framework demonstrates both theoretical advantages and computational convenience: (1) We prove that the similarity measurement constructed by the framework can be recognized either as the cost of a parallel transport or as the length of a quasi-geodesic curve. (2) We extend commonly used divergences for equidimensional PD matrices to the non-equidimensional case. Examples include Kullback-Leibler divergence, Bhattacharyya divergence and Rényi divergence. We prove that these extensions enjoy the same consistency property as their counterpart for geodesic distance. (3) We apply our geometric framework to further extend those in (2) to similarity measurements for arbitrary PSD matrices. We also provide simple formulae to compute these similarity measurements in most situations.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11595 [pdf, other]

SPIRE: Semantic Prompt-Driven Image Restoration

Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi

Abstract: Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art, alongside offering the flexibility of text-based control over the restoration effects.

Submitted 16 July, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by ECCV 2024; Webpage: https://chenyangqiqi.github.io/tip

Showing 1–50 of 186 results for author: Ye, K