Search | arXiv e-print repository

From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integrated into this learning format, resulting in a variety of educational AI applications such as educational recommendation and intelligent tutoring. The emergence of intelligence in large language models (LLMs) has allowed for these educational enhancements to be built upon a unified foundational model, enabling deeper integration. In this context, we propose MAIC (Massive AI-empowered Course), a new form of online education that leverages LLM-driven multi-agent systems to construct an AI-augmented classroom, balancing scalability with adaptivity. Beyond exploring the conceptual framework and technical innovations, we conduct preliminary experiments at Tsinghua University, one of China's leading universities. Drawing from over 100,000 learning records of more than 500 students, we obtain a series of valuable observations and initial analyses. This project will continue to evolve, ultimately aiming to establish a comprehensive open platform that supports and unifies research, technology, and applications in exploring the possibilities of online education in the era of large model AI. We envision this platform as a collaborative hub, bringing together educators, researchers, and innovators to collectively explore the future of AI-driven online education.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.03296 [pdf, other]

An Efficient Two-Dimensional Functional Mixed-Effect Model Framework for Repeatedly Measured Functional Data

Authors: Cheng Cao, Jiguo Cao, Hao Pan, Yunting Zhang, Fan Jiang, Xinyue Li

Abstract: With the rapid development of wearable device technologies, accelerometers can record minute-by-minute physical activity for consecutive days, which provides important insight into a dynamic association between the intensity of physical activity and mental health outcomes for large-scale population studies. Using the Shanghai school adolescent cohort, we estimate the effect of health assessment results on physical activity profiles recorded by accelerometers throughout a week, which is recognized as repeatedly measured functional data. To achieve this goal, we propose an innovative two-dimensional functional mixed-effect model (2dFMM) for the specialized data, which smoothly varies over longitudinal day observations with covariate-dependent mean and covariance functions. The modeling framework characterizes the longitudinal and functional structures while incorporating two-dimensional fixed effects for covariates of interest. We also develop a fast three-stage estimation procedure to provide accurate fixed-effect inference for model interpretability and improve computational efficiency when encountering large datasets. We find strong evidence of intraday and interday varying significant associations between physical activity and mental health assessments among our cohort population, which shed light on possible intervention strategies targeting daily physical activity patterns to improve school adolescent mental health. Our method is also applied to environmental data to illustrate its wide applicability. Supplementary materials for this article are available online.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 50 pages, 8 figures in main, 6 figures in supp

arXiv:2409.03209 [pdf, other]

iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

Authors: Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang

Abstract: Stable diffusion has demonstrated strong image synthesis ability to given text descriptions, suggesting it to contain strong semantic clue for grouping objects. Inspired by this, researchers have explored employing stable diffusion for training-free segmentation. Most existing approaches either simply employ cross-attention map or refine it by self-attention map, to generate segmentation masks. We believe that iterative refinement with self-attention map would lead to better results. However, we empirically demonstrate that such a refinement is sub-optimal likely due to the self-attention map containing irrelevant global information which hampers accurately refining cross-attention map with multiple iterations. To address this, we propose an iterative refinement framework for training-free segmentation, named iSeg, having an entropy-reduced self-attention module which utilizes a gradient descent scheme to reduce the entropy of self-attention map, thereby suppressing the weak responses corresponding to irrelevant global information. Leveraging the entropy-reduced self-attention module, our iSeg stably improves refined cross-attention map with iterative refinement. Further, we design a category-enhanced cross-attention module to generate accurate cross-attention map, providing a better initial input for iterative refinement. Extensive experiments across different datasets and diverse segmentation tasks reveal the merits of proposed contributions, leading to promising performance on diverse segmentation tasks. For unsupervised semantic segmentation on Cityscapes, our iSeg achieves an absolute gain of 3.8% in terms of mIoU compared to the best existing training-free approach in literature. Moreover, our proposed iSeg can support segmentation with different kind of images and interactions.

Submitted 4 September, 2024; originally announced September 2024.

arXiv:2409.02025 [pdf, other]

Logarithmic regret in the ergodic Avellaneda-Stoikov market making model

Authors: Jialun Cao, David Šiška, Lukasz Szpruch, Tanut Treetanthiploet

Abstract: We analyse the regret arising from learning the price sensitivity parameter $κ$ of liquidity takers in the ergodic version of the Avellaneda-Stoikov market making model. We show that a learning algorithm based on a regularised maximum-likelihood estimator for the parameter achieves the regret upper bound of order $\ln^2 T$ in expectation. To obtain the result we need two key ingredients. The first are tight upper bounds on the derivative of the ergodic constant in the Hamilton-Jacobi-Bellman (HJB) equation with respect to $κ$. The second is the learning rate of the maximum-likelihood estimator which is obtained from concentration inequalities for Bernoulli signals. Numerical experiment confirms the convergence and the robustness of the proposed algorithm.

Submitted 3 September, 2024; originally announced September 2024.

MSC Class: Primary 93E35; Secondary 93C40; 93C41; 93E20; 91G80

arXiv:2408.16467 [pdf, other]

Spiking Diffusion Models

Authors: Jiahang Cao, Hanzhong Guo, Ziqing Wang, Deming Zhou, Hao Cheng, Qiang Zhang, Renjing Xu

Abstract: Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional Artificial Neural Networks (ANNs). Despite their distinguished properties, the application of SNNs in the computationally intensive field of image generation is still under exploration. In this paper, we propose the Spiking Diffusion Models (SDMs), an innovative family of SNN-based generative models that excel in producing high-quality samples with significantly reduced energy consumption. In particular, we propose a Temporal-wise Spiking Mechanism (TSM) that allows SNNs to capture more temporal features from a bio-plasticity perspective. In addition, we propose a threshold-guided strategy that can further improve the performances by up to 16.7% without any additional training. We also make the first attempt to use the ANN-SNN approach for SNN-based generation tasks. Extensive experimental results reveal that our approach not only exhibits comparable performance to its ANN counterpart with few spiking time steps, but also outperforms previous SNN-based generative models by a large margin. Moreover, we also demonstrate the high-quality generation ability of SDM on large-scale datasets, e.g., LSUN bedroom. This development marks a pivotal advancement in the capabilities of SNN-based generation, paving the way for future research avenues to realize low-energy and low-latency generative applications. Our code is available at https://github.com/AndyCao1125/SDM.

Submitted 29 August, 2024; originally announced August 2024.

Comments: Accepted by IEEE Transactions on Artificial Intelligence

arXiv:2408.15815 [pdf, other]

MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

Authors: Congying Xu, Songqiang Chen, Jiarong Wu, Shing-Chi Cheung, Valerio Terragni, Hengcheng Zhu, Jialun Cao

Abstract: While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy. In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, aiming to enable the encoded MRs to be reused with new source inputs. With typically only one pair of source and follow-up inputs available in an MR-encoded test case as the example, we leveraged LLMs to understand the intention of the test case and generate additional examples of source-followup input pairs. This helps to guide the generation of input transformations generalizable to multiple source inputs. Besides, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR-irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations based on encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance the test adequacy, increasing the line coverage and mutation score by 10.62% and 18.91%, respectively.

Submitted 28 August, 2024; originally announced August 2024.

Comments: This paper is accepted to ASE 2024

arXiv:2408.13487 [pdf, ps, other]

Towards Automatic Linearization via SMT Solving

Authors: Jian Cao, Liyong Lin, Lele Li

Abstract: Mathematical optimization is ubiquitous in modern applications. However, in practice, we often need to use nonlinear optimization models, for which the existing optimization tools such as Cplex or Gurobi may not be directly applicable and an (error-prone) manual transformation often has to be done. Thus, to address this issue, in this paper we investigate the problem of automatically verifying and synthesizing reductions, the solution of which may allow an automatic linearization of nonlinear models. We show that the synthesis of reductions can be formulated as an $\exists^* \forall^*$ synthesis problem, which can be solved by an SMT solver via the counter-example guided inductive synthesis approach (CEGIS).

Submitted 24 August, 2024; originally announced August 2024.

Comments: 4 pages, conference

arXiv:2408.13204 [pdf, other]

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation

Authors: Qiming Zhu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung

Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate the capabilities of Large Language Models (LLMs), providing insights into their strengths and weaknesses. However, current benchmarks primarily exercise LLMs' capability on common coding tasks (e.g., bubble sort, greatest common divisor), leaving domain-specific coding tasks (e.g., computation, system, cryptography) unexplored. To fill this gap, we propose a multi-domain code benchmark, DOMAINEVAL, designed to evaluate LLMs' coding capabilities thoroughly. Our pipeline works in a fully automated manner, enabling a push-button construction from code repositories into formatted subjects under study. Interesting findings are observed by evaluating 12 representative LLMs against DOMAINEVAL. We notice that LLMs are generally good at computation tasks while falling short on cryptography and system coding tasks. The performance gap can be as much as 68.94% (80.94% - 12.0%) in some LLMs. We also observe that generating more samples can increase the overall performance of LLMs, while the domain bias may even increase. The contributions of this study include a code generation benchmark dataset DOMAINEVAL, encompassing six popular domains, a fully automated pipeline for constructing code benchmarks, and an identification of the limitations of LLMs in code generation tasks based on their performance on DOMAINEVAL, providing directions for future research improvements. The leaderboard is available at https://domaineval.github.io/.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.13156 [pdf]

Ultrafast measurement of field-particle energy transfer during chorus emissions in space

Authors: C. M. Liu, B. N. Zhao, J. B. Cao, C. J. Pollock, C. T. Russell, Y. Y. Liu, X. N. Xing, P. A. Linqvist, J. L. Burch

Abstract: Chorus is one of the strongest electromagnetic emissions naturally occurring in space, and can cause hazardous radiations to humans and satellites [1-3]. Although chorus has attracted extreme interest and been intensively studied for decades [4-7], its generation and evolution remain highly debated, due to the complexity of the underlying physics and the limited capacity of previous spacecraft missions [7]. Chorus has also been believed to be governed by planetary magnetic dipolar fields [5,7]. Contrary to such conventional expectation, here we report unexpected observations of chorus in the terrestrial neutral sheet where magnetic dipolar effect is absent. Using unprecedentedly high-cadence data from the Magnetospheric Multiscale Mission, we present the first, ultrafast measurements of the wave dispersion relation and electron three-dimensional distributions within the waves, showing smoking-gun evidences for chorus-electron interactions and development of electron holes in the wave phase space. We estimate field-particle energy transfer inside the waves and find that the waves were extracting energy from local thermal electrons, in line with the wave positive growth rate derived from instability analysis. Our observations, opening new pathways for resolving long-standing controversies regarding the chorus emissions, are crucial for understanding nonlinear energy transport ubiquitously observed in space and astrophysical environments.

Submitted 23 August, 2024; originally announced August 2024.

Comments: under review; comments and suggestions are welcomed

arXiv:2408.13001 [pdf, other]

CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution

Authors: Ruiyang Xu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Ben He, Shing-Chi Cheung, Le Sun

Abstract: Code benchmarks such as HumanEval are widely adopted to evaluate Large Language Models' (LLMs) coding capabilities. However, there is an unignorable programming language bias in existing code benchmarks -- over 95% code generation benchmarks are dominated by Python, leaving the LLMs' capabilities in other programming languages such as Java and C/C++ unknown. Moreover, coding task bias is also crucial. Most benchmarks focus on code generation capability, while benchmarks for code reasoning (given input, reasoning output; and given output, reasoning input), an essential coding capability, are insufficient. Yet, constructing multi-lingual benchmarks can be expensive and labor-intensive, and codes in contest websites such as Leetcode suffer from data contamination during training. To fill this gap, we propose CRUXEVAL-X, a multi-lingual code reasoning benchmark that contains 19 programming languages. It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total. In particular, the construction pipeline of CRUXEVAL-X works in a fully automated and test-guided manner, which iteratively generates and repairs based on execution feedback. Also, to cross language barriers (e.g., dynamic/static type systems in Python/C++), we formulated various transition rules between language pairs to facilitate translation. Our intensive evaluation of 24 representative LLMs reveals the correlation between language pairs. For example, TypeScript and JavaScript show a significant positive correlation, while Racket has less correlation with other languages. More interestingly, even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages, revealing the cross-language generalization of LLMs.

Submitted 23 August, 2024; originally announced August 2024.

Comments: 13 pages

arXiv:2408.12405 [pdf, ps, other]

Non-extensive (3+1)-dimensional hydrodynamics for relativistic heavy-ion collisions

Authors: Jia-Hao Shi, Zhi-Ying Qin, Jin-Peng Zhang, Jian Cao, Ze-Fang Jiang, Wen-Chao Zhang, Hua Zheng

Abstract: A non-extensive (3+1)-dimensional hydrodynamic model for multi-particle production processes, NEX-CLVisc, is developed in the framework of CLVisc where the viscous corrections are turned off. It assumes that the non-extensive effects consistently exist in the initial conditions set by the optical Glauber model, the equation of state and the hadron kinetic freeze-out procedure. The model is then applied to simulate the pseudo-rapidity ($η$) distribution, the transverse momentum ($p_{\rm T}$) spectra and the $p_{\rm T}$-differential elliptic flow ($v_2$) of charged particles in Pb-Pb collisions at $\sqrt{s_{NN}}=$ 2.76 TeV and 5.02 TeV, respectively. It is found that the model can reasonably well reproduce the experimental data of the $η$ distribution and the charged-particle spectra in a $p_{\rm T}$ range up to 6-8 GeV/c. When compared with the ideal hydrodynamic model, the $p_{\rm T}$-differential $v_2$ of charged particles is suppressed in the NEX-CLVisc model, which is similar to that observed in the viscous hydrodynamic model. Moreover, due to the lack of the viscous corrections and the event-by-event fluctuation, the model can only describe the $p_{\rm T}$-differential $v_2$ up to 4 GeV/c, which is smaller than the applicable range for the particle $p_{\rm T}$ spectra.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 8 pages, 4 figures, 4 tables

arXiv:2408.10566 [pdf, other]

SparseGrow: Addressing Growth-Induced Forgetting in Task-Agnostic Continual Learning

Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song

Abstract: In continual learning (CL), model growth enhances adaptability over new data, improving knowledge retention for more tasks. However, improper model growth can lead to severe degradation of previously learned knowledge, an issue we name as growth-induced forgetting (GIFt), especially in task-agnostic CL using entire grown model for inference. Existing works, despite adopting model growth and random initialization for better adaptability, often fail to recognize the presence of GIFt caused by improper model growth. This oversight limits comprehensive control of forgetting and hinders full utilization of model growth. We are the first in CL to identify this issue and conduct an in-depth study on root cause of GIFt, where layer expansion stands out among model growth strategies, widening layers without affecting model functionality. Yet, direct adoption of layer expansion presents challenges. It lacks data-driven control and initialization of expanded parameters to balance adaptability and knowledge retention. This paper presents a novel SparseGrow approach to overcome the issue of GIFt while enhancing adaptability over new data. SparseGrow employs data-driven sparse layer expansion to control efficient parameter usage during growth, reducing GIFt from excessive growth and functionality changes. It also combines sparse growth with on-data initialization at training late-stage to create partially 0-valued expansions that fit learned distribution, enhancing retention and adaptability. To further minimize forgetting, freezing is applied by calculating the sparse mask, allowing data-driven preservation of important parameters. Through experiments across datasets with various settings, cases and task numbers, we demonstrate the necessity of layer expansion and showcase the effectiveness of SparseGrow in overcoming GIFt, highlighting its adaptability and knowledge retention for incremental tasks.

Submitted 26 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

Comments: This paper has been submitted to the AAAI conference. If accepted, the final version will be updated to reflect the conference proceedings

arXiv:2408.09260 [pdf]

doi 10.3390/rs16163030

Analysis of the Effect of Tilted Corner Cube Reflector Arrays on Lunar Laser Ranging

Authors: Jin Cao, Rufeng Tang, Kai Huang, Zhulian Li, Yongzhang Yang, Kai Huang, Jintao Li, Yuqiang Li

Abstract: This paper primarily investigates the effect of the tilt of corner cube reflector (CCR) arrays on lunar laser ranging (LLR). A mathematical model was established to study the random errors caused by the tilt of the CCR arrays. The study found that, ideally, when the laser ranging pulse width is 10 picoseconds or less, it is possible to distinguish from which specific corner cubes within the CCR array each peak in the echo signal originates. Consequently, partial data from the echo can be extracted for signal processing, significantly reducing random errors and improving the single-shot precision of LLR. The distance obtained by extracting part of the echo can be reduced to the center position of the array, thereby providing multiple higher-precision ranging results from each measurement. This not only improves the precision of LLR but also increases the data volume. A simulation experiment based on the 1.2 m laser ranging system at Yunnan Observatories was conducted. By extracting one peak for signal processing, the single-shot precision improved from 32.24 mm to 2.52 mm, validating the theoretical analysis results. Finally, an experimental laser ranging system based on a 53 cm binocular telescope system was established for ground experiments. The experimental results indicated that the echo signal could identify the tilt state of the CCR array. By extracting the peak returned by the central CCR for signal processing, the ranging precision was greatly improved. Through theoretical analyses, simulation experiments, and ground experiments, a solution to reduce the random errors caused by the tilt of the CCR array was provided. This offers an approach to enhance the single-shot precision of future LLR and provides a reference for upgrading ground-based equipment at future laser ranging stations.

Submitted 21 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

Journal ref: Remote Sens. 2024, 16(16), 3030

arXiv:2408.09086 [pdf, ps, other]

Composite solitary vortices of three-wave mixing in quasi-phase-matched photonic crystals

Authors: Chao Kong, Jinqing Li, Xinyi Tang, Xuli Li, Ju Jiao, Jun Cao, Haiming Deng

Abstract: We report the composite vortex solitons of three-wave mixing propagate stably in a three-dimensional (3D) quasi-phase-matched photonic crystals (QPM-PhC). The modulation of QPM-PhC is designed as a checkerboard pattern. The vortex solitons, composed by three waves ($ω_{1,2,3}$) propagating through the lattices, exhibit a four-spotted discrete type, which gives rise to four distinct modes: zero-vorticity, vortex, anti-vortex, and quadrupole. The composite vortex solitons result from combinations of these modes and lead to four cases: vortex doubling, hidden vortices, vortex up-conversion, and anti-vortex up-conversion. Our findings indicate that all solitons can propagate stably through the crystals for 10 centimeters; however, only the vortex-doubling case remains stable over longer distances. This work enhances the understanding of vortex beam manipulation within 3D QPM-PhCs.

Submitted 16 August, 2024; originally announced August 2024.

Comments: to be published in Chaos, Solitons & Fractals

arXiv:2408.08518 [pdf, other]

Visual-Friendly Concept Protection via Selective Adversarial Perturbations

Authors: Xiaoyue Mi, Fan Tang, Juan Cao, Peng Li, Yang Liu

Abstract: Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They utilize global adversarial perturbations, which introduce noticeable alterations to original images and significantly degrade visual quality. In this work, we propose the Visual-Friendly Concept Protection (VCPro) framework, which prioritizes the protection of key concepts chosen by the image owner through adversarial perturbations with lower perceptibility. To ensure these perturbations are as inconspicuous as possible, we introduce a relaxed optimization objective to identify the least perceptible yet effective adversarial perturbations, solved using the Lagrangian multiplier method. Qualitative and quantitative experiments validate that VCPro achieves a better trade-off between the visibility of perturbations and protection effectiveness, effectively prioritizing the protection of target concepts in images with less perceptible perturbations.

Submitted 16 August, 2024; originally announced August 2024.

Comments: Under Review

arXiv:2408.08091 [pdf, other]

HAIR: Hypernetworks-based All-in-One Image Restoration

Authors: Jin Cao, Yi Cao, Li Pang, Deyu Meng, Xiangyong Cao

Abstract: Image restoration aims to recover a high-quality clean image from its degraded version. Recent progress in image restoration has demonstrated the effectiveness of All-in-One image restoration models in addressing various degradations simultaneously. However, these existing methods typically utilize the same parameters to tackle images with different degradation types, thus forcing the model to balance the performance between different tasks and limiting its performance on each task. To alleviate this issue, we propose HAIR, a Hypernetworks-based All-in-One Image Restoration method that dynamically generates parameters based on input images. Specifically, HAIR consists of two main components, i.e., Classifier and Hyper Selecting Net (HSN). The Classifier is a simple image classification network used to generate a Global Information Vector (GIV) that contains the degradation information of the input image, and the HSN is a simple fully-connected neural network that receives the GIV and outputs parameters for the corresponding modules. Extensive experiments demonstrate that HAIR can significantly improve the performance of existing image restoration models in a plug-and-play manner, both in single-task and all-in-one settings. Notably, our innovative model, Res-HAIR, which integrates HAIR into the well-known Restormer, can obtain superior or comparable performance compared with current state-of-the-art methods. Moreover, we theoretically demonstrate that our proposed HAIR requires fewer parameters in contrast to the prevalent All-in-One methodologies. The code is available at https://github.com/toummHus/HAIR.

Submitted 28 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

Comments: 16 pages

arXiv:2408.07467 [pdf, other]

Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification

Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Cheng Lin, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

Abstract: Accurate classification of blood cells is of vital significance in the diagnosis of hematological disorders. However, in real-world scenarios, domain shifts caused by the variability in laboratory procedures and settings result in a rapid deterioration of the model's generalization performance. To address this issue, we propose a novel framework of domain-invariant representation learning (DoRL) via the Segment Anything Model (SAM) for blood cell classification. DoRL comprises two main components: a LoRA-based SAM (LoRA-SAM) and a cross-domain autoencoder (CAE). The advantage of DoRL is that it can extract domain-invariant representations from various blood cell datasets in an unsupervised manner. Specifically, we first leverage the large-scale foundation model of SAM, fine-tuned with LoRA, to learn general image embeddings and segment blood cells. Additionally, we introduce CAE to learn domain-invariant representations across datasets from different domains while mitigating image artifacts. To validate the effectiveness of domain-invariant representations, we employ five widely used machine learning classifiers to construct blood cell classification models. Experimental results on two public blood cell datasets and a private real dataset demonstrate that our proposed DoRL achieves a new state-of-the-art cross-domain performance, surpassing existing methods by a significant margin. The source code is available at https://github.com/AnoK3111/DoRL.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2408.06716 [pdf, other]

Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Segment Anything Model

Authors: Yongcheng Li, Lingcong Cai, Ying Lu, Yupeng Zhang, Jingyan Jiang, Genan Dai, Bowen Zhang, Jingzhou Cao, Xiangzhong Zhang, Xiaomao Fan

Abstract: Accurate classification of blood cells plays a vital role in hematological analysis as it aids physicians in diagnosing various medical conditions. In this study, we present a novel approach for classifying blood cell images known as BC-SAM. BC-SAM leverages the large-scale foundation model of Segment Anything Model (SAM) and incorporates a fine-tuning technique using LoRA, allowing it to extract general image embeddings from blood cell images. To enhance the applicability of BC-SAM across different blood cell image datasets, we introduce an unsupervised cross-domain autoencoder that focuses on learning intrinsic features while suppressing artifacts in the images. To assess the performance of BC-SAM, we employ four widely used machine learning classifiers (Random Forest, Support Vector Machine, Artificial Neural Network, and XGBoost) to construct blood cell classification models and compare them against existing state-of-the-art methods. Experimental results conducted on two publicly available blood cell datasets (Matek-19 and Acevedo-20) demonstrate that our proposed BC-SAM achieves a new state-of-the-art result, surpassing the baseline methods with a significant improvement. The source code of this paper is available at https://github.com/AnoK3111/BC-SAM.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.06286 [pdf, other]

Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering

Authors: Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, Luc Van Gool

Abstract: 3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering or filtering techniques towards primitives, the scale-specific information is not involved in Gaussians. In this paper, we propose a unified optimization method to make Gaussians adaptive for arbitrary scales by self-adjusting the primitive properties (e.g., color, shape and size) and distribution (e.g., position). Inspired by the mipmap technique, we design pseudo ground-truth for the target scale and propose a scale-consistency guidance loss to inject scale information into 3D Gaussians. Our method is a plug-in module, applicable for any 3DGS models to solve the zoom-in and zoom-out aliasing. Extensive experiments demonstrate the effectiveness of our method. Notably, our method outperforms 3DGS in PSNR by an average of 9.25 dB for zoom-in and 10.40 dB for zoom-out on the NeRF Synthetic dataset.

Submitted 12 August, 2024; originally announced August 2024.

Comments: 9 pages

arXiv:2408.05709 [pdf, other]

Moment&Cross: Next-Generation Real-Time Cross-Domain CTR Prediction for Live-Streaming Recommendation at Kuaishou

Authors: Jiangxia Cao, Shen Wang, Yue Li, Shenghui Wang, Jian Tang, Shiyao Wang, Shuang Yang, Zhaojie Liu, Guorui Zhou

Abstract: Kuaishou is one of the largest short-video and live-streaming platforms. Compared with short-video recommendation, live-streaming recommendation is more complex because: (1) live streams are only temporarily alive for distribution, (2) users may watch for a long time, which delays feedback, and (3) content is unpredictable and changes over time. Indeed, even if a user is interested in the live-streaming author, the result may still be a negative watch (e.g., a short view of less than 3 s) because the real-time content is not attractive enough. Therefore, live-streaming recommendation poses a challenging task: how do we recommend a live stream to users at the right moment? Additionally, the major exposure content on our platform is short-video, and the amount of exposed short-video is 9x that of exposed live-streaming. Users therefore leave more behaviors on short-videos, which leads to a serious data imbalance problem in which the live-streaming data cannot fully reflect user interests. This raises another challenging task: how do we utilize users' short-video behaviors to improve live-streaming recommendation?

Submitted 11 August, 2024; originally announced August 2024.

Comments: Work in progress

arXiv:2408.05492 [pdf, other]

doi 10.1145/3664647.3680676

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

Authors: Jin Liu, Huaibo Huang, Jie Cao, Ran He

Abstract: Diffusion-based text-to-image generation models have significantly advanced the field of art content synthesis. However, current portrait stylization methods generally require either model fine-tuning based on examples or the employment of DDIM Inversion to revert images to noise space, both of which substantially decelerate the image generation process. To overcome these limitations, this paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We observed that Latent Consistency Models employing consistency distillation can effectively extract representative Consistency Features from noisy images. To blend the Consistency Features extracted from both content and style images, we introduce a Style Enhancement Attention Control technique that meticulously merges content and style features within the attention space of the target image. Moreover, we propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control. Extensive experiments have validated the effectiveness of our proposed framework in enhancing stylization efficiency and fidelity. The code is available at https://github.com/liujin112/ZePo.

Submitted 10 August, 2024; originally announced August 2024.

Comments: Accepted by ACM MM 2024

arXiv:2408.05430 [pdf, other]

HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou

Authors: Xu Wang, Jiangxia Cao, Zhiyi Fu, Kun Gai, Guorui Zhou

Abstract: In this paper, we present the practical problems and the lessons learned at short-video services from Kuaishou. In industry, a widely-used multi-task framework is the Mixture-of-Experts (MoE) paradigm, which always introduces some shared and specific experts for each task and then uses gate networks to measure related experts' contributions. Although the MoE achieves remarkable improvements, we still observe three anomalies that seriously affect model performance in our iteration: (1) Expert Collapse: We found that experts' output distributions are significantly different, and some experts have over 90% zero activations with ReLU, making it hard for gate networks to assign fair weights to balance experts. (2) Expert Degradation: Ideally, the shared-expert aims to provide predictive information for all tasks simultaneously. Nevertheless, we find that some shared-experts are occupied by only one task, which indicates that these shared-experts have lost their ability and degenerated into specific-experts. (3) Expert Underfitting: In our services, we have dozens of behavior tasks that need to be predicted, but we find that some data-sparse prediction tasks tend to ignore their specific-experts and assign large weights to shared-experts. The reason might be that the shared-experts can perceive more gradient updates and knowledge from dense tasks, while specific-experts easily fall into underfitting due to their sparse behaviors. Motivated by those observations, we propose HoME to achieve a simple, efficient and balanced MoE system for multi-task learning.

Submitted 10 August, 2024; originally announced August 2024.

Comments: Work in progress

arXiv:2408.04201 [pdf, ps, other]

Exact solution of a quantum integrable system associated with the $G_2$ exceptional Lie algebra

Authors: Guang-Liang Li, Junpeng Cao, Wen-Li Yang, Kangjie Shi, Yupeng Wang

Abstract: A quantum integrable spin chain model associated with the $G_2$ exceptional Lie algebra is studied. By using the fusion technique, the closed recursive relations among the fused transfer matrices are obtained. These identities allow us to derive the exact energy spectrum and Bethe ansatz equations of the system based on polynomial analysis. The present method provides a unified treatment to investigate the Bethe ansatz solutions for both periodic and non-diagonal open boundary conditions associated with exceptional Lie algebras.

Submitted 27 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: 32 pages

arXiv:2408.04130 [pdf, ps, other]

X(2370) glueball-like particle productions in $e^+e^-$ collisions at the BESIII energy and in pp collisions at the LHC energy with PACIAE model

Authors: Jian Cao, Zhi-Lei She, Jin-Peng Zhang, Jia-Hao Shi, Zhi-Ying Qin, Wen-Chao Zhang, Hua Zheng, An-Ke Lei, Dai-Mei Zhou, Yu-Liang Yan, Ben-Hao Sa

Abstract: Inspired by the newest BESIII observation of the X(2370) glueball-like particle, we search for its production in both $e^+e^-$ collisions at $\sqrt{s}=$ 4.95 GeV and proton-proton (pp) collisions at $\sqrt{s}=$ 13 TeV with the parton and hadron cascade model PACIAE. In this model, the final partonic state (FPS) and the final hadronic state (FHS) are consecutively simulated and recorded. The X(2370) glueball- or tetraquark-state is then, respectively, recombined by two gluons or four quarks $ss\bar{s}\bar{s}$ in the FPS using the quantum statistical mechanics inspired dynamically constrained phase-space coalescence (DCPC) model. The X(2370) molecular-state is recombined by the baryon-antibaryon pair $Λ$-$\bar{Λ}$ or $Σ$-$\bar{Σ}$, or by three mesons of $π^+π^-η'$, $K^+K^-η'$, or $K_S^0K_S^0η'$ in the FHS using the DCPC model. In both $e^+e^-$ and pp collisions, significant discrepancies in the yields, the transverse momentum spectra and the rapidity distributions among the X(2370) glueball-, tetraquark-, and molecular-state are observed. These discrepancies are proposed as valuable criteria for distinguishing the different X(2370) states from each other. Our results not only support the BESIII observation of glueball-like particle X(2370) production in $e^+e^-$ collisions, but also serve as a prediction for the X(2370) production in pp collisions. We strongly suggest the experimental measurement of the X(2370) glueball-like particle production in pp collisions at the LHC energies.

Submitted 2 September, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: 6 pages, 5 figures

arXiv:2408.03695 [pdf, other]

Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

Authors: Zilyu Ye, Jinxiu Liu, Ruotian Peng, Jinjin Cao, Zhiyang Chen, Yiyang Zhang, Ziwei Xuan, Mingyuan Zhou, Xiaoqian Shen, Mohamed Elhoseiny, Qi Liu, Guo-Jun Qi

Abstract: Recent image generation models excel at creating high-quality images from brief captions. However, they fail to maintain consistency of multiple instances across images when encountering lengthy contexts. This inconsistency is largely due to the absence of granular instance feature labeling in existing training datasets. To tackle these issues, we introduce Openstory++, a large-scale dataset combining additional instance-level annotations with both images and text. Furthermore, we develop a training methodology that emphasizes entity-centric image-text generation, ensuring that the models learn to effectively interweave visual and textual information. Specifically, Openstory++ streamlines the process of keyframe extraction from open-domain videos, employing vision-language models to generate captions that are then polished by a large language model for narrative continuity. It surpasses previous datasets by offering a more expansive open-domain resource, which incorporates automated captioning, high-resolution imagery tailored for instance count, and extensive frame sequences for temporal consistency. Additionally, we present Cohere-Bench, a pioneering benchmark framework for evaluating image generation tasks when long multimodal context is provided, including the ability to keep the background, style, and instances in the given context coherent. Compared to existing benchmarks, our work fills critical gaps in multi-modal generation, propelling the development of models that can adeptly generate and interpret complex narratives in open-domain environments. Experiments conducted within Cohere-Bench confirm the superiority of Openstory++ in nurturing high-quality visual storytelling models, enhancing their ability to address open-domain generation tasks. More details can be found at https://openstorypp.github.io/

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.00621 [pdf, other]

CAVE: Crowdsourcing Passing-By Vehicles for Reliable In-Vehicle Edge Computing

Authors: Jiahe Cao, Qiang Liu, Dawei Chen, Kyungtae Han

Abstract: In-vehicle edge computing is a much anticipated paradigm to serve ever-increasing computation demands originating from the ego vehicle, such as passenger entertainment. In this paper, we explore the unique idea of crowdsourcing passing-by vehicles to augment computing of the ego vehicle. The challenges lie in the high dynamics of passing-by vehicles, time-correlated task computation, and the stringent requirement of computing reliability for individual user tasks. To this end, we formulate an optimization problem to minimize the end-to-end latency by optimizing the task assignment and resource allocation of user tasks. To address the complex problem, we propose a new algorithm (named CAVE) with multiple key designs. We build an end-to-end network and compute simulator and conduct extensive simulation to evaluate the performance of the proposed algorithm.

Submitted 1 August, 2024; originally announced August 2024.

Comments: This paper is accepted by IEEE GLOBECOM 2024

arXiv:2407.21258 [pdf, ps, other]

Exact surface energies and boundary excitations of the Izergin-Korepin model with generic boundary fields

Authors: Pengcheng Lu, Junpeng Cao, Wen-Li Yang, Ian Marquette, Yao-Zhong Zhang

Abstract: The Izergin-Korepin model is an integrable model with the simplest twisted quantum affine algebra $U_q(A_2^{(2)})$ symmetry. Applying the $t-W$ method, we derive the homogeneous zero roots Bethe ansatz equations and the corresponding zero root patterns of the Izergin-Korepin model with generic integrable boundaries. Based on these results, we analytically compute the surface energies and boundary excitations in different regimes of boundary parameters of the model. It is shown that in some regimes, a correlation effect appears between the two boundary fields.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.21075 [pdf, other]

Apple Intelligence Foundation Language Models

Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek , et al. (130 additional authors not shown)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.20870 [pdf, other]

Mean of Means: A 10-dollar Solution for Human Localization with Calibration-free and Unconstrained Camera Settings

Authors: Tianyi Zhang, Wengyu Zhang, Xulu Zhang, Jiaxin Wu, Xiao-Yong Wei, Jiannong Cao, Qing Li

Abstract: Accurate human localization is crucial for various applications, especially in the Metaverse era. Existing high precision solutions rely on expensive, tag-dependent hardware, while vision-based methods offer a cheaper, tag-free alternative. However, current vision solutions based on stereo vision face limitations due to rigid perspective transformation principles and error propagation in multi-stage SVD solvers. These solutions also require multiple high-resolution cameras with strict setup constraints. To address these limitations, we propose a probabilistic approach that considers all points on the human body as observations generated by a distribution centered around the body's geometric center. This enables us to improve sampling significantly, increasing the number of samples for each point of interest from hundreds to billions. By modeling the relation between the means of the distributions of world coordinates and pixel coordinates, leveraging the Central Limit Theorem, we ensure normality and facilitate the learning process. Experimental results demonstrate human localization accuracy of 95% within a 0.3m range and nearly 100% accuracy within a 0.5m range, achieved at a low cost of only 10 USD using two web cameras with a resolution of 640x480 pixels.

Submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.19130 [pdf]

Panoramic single-pixel imaging with megapixel resolution based on rotational subdivision

Authors: Huan Cui, Jie Cao, Haoyu Zhang, Chang Zhou, Haifeng Yao, Yingbo Wang, Qun Hao

Abstract: Single-pixel imaging (SPI) using a single-pixel detector is an unconventional imaging method with great application prospects for high-performance imaging in many fields. In particular, the recently proposed catadioptric panoramic ghost imaging (CPGI) extends the application potential of SPI to high-performance imaging at a wide field of view (FOV), for which demand has recently been growing. However, the resolution of CPGI is limited by hardware parameters of the digital micromirror device (DMD), which may not meet ultrahigh-resolution panoramic imaging needs that require detailed information. Therefore, to overcome the resolution limitation of CPGI, we propose a panoramic SPI based on rotational subdivision (RSPSI). The key of the proposed RSPSI is to obtain the entire panoramic scene by rotation-scanning with a rotating mirror tilted at 45°, so that a single pattern that covers only one small sub-FOV can complete an uninterrupted modulation of the entire panoramic FOV during a once-through pattern projection. Then, based on temporal resolution subdivision, image sequences of the sub-FOVs subdivided from the entire panoramic FOV can be reconstructed with pixel-level or even subpixel-level horizontal shifts between adjacent images. Experimental results using a proof-of-concept setup show that the panoramic image can be obtained at 10428×543 (5,662,404 pixels), which is more than 9.6 times higher than the resolution limit of the CPGI using the same DMD. To the best of our knowledge, the RSPSI is the first to achieve a megapixel resolution via SPI, which can provide potential applications in fields requiring imaging with ultrahigh resolution and wide FOV.

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.17940 [pdf, other]

Positive Text Reframing under Multi-strategy Optimization

Authors: Shutong Jia, Biwei Cao, Qingqing Gao, Jiuxin Cao, Bo Liu

Abstract: Differing from sentiment transfer, positive reframing seeks to substitute negative perspectives with positive expressions while preserving the original meaning. With the emergence of pre-trained language models (PLMs), it is possible to achieve acceptable results by fine-tuning PLMs. Nevertheless, generating fluent, diverse and task-constrained reframing text remains a significant challenge. To tackle this issue, a multi-strategy optimization framework (MSOF) is proposed in this paper. Starting from the objective of positive reframing, we first design positive sentiment reward and content preservation reward to encourage the model to transform the negative expressions of the original text while ensuring the integrity and consistency of the semantics. Then, different decoding optimization approaches are introduced to improve the quality of text generation. Finally, based on the modeling formula of positive reframing, we propose a multi-dimensional re-ranking method that further selects candidate sentences from three dimensions: strategy consistency, text similarity and fluency. Extensive experiments on two Seq2Seq PLMs, BART and T5, demonstrate our framework achieves significant improvements on unconstrained and controlled positive reframing tasks.

Submitted 27 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17867 [pdf, other]

Intrinsic Nonlinear Spin Hall Effect and Manipulation of Perpendicular Magnetization

Authors: Hui Wang, Huiying Liu, Xukun Feng, Jin Cao, Weikang Wu, Shen Lai, Weibo Gao, Cong Xiao, Shengyuan A. Yang

Abstract: We propose an intrinsic nonlinear spin Hall effect, which enables the generation of collinearly-polarized spin current in a large class of nonmagnetic materials with the corresponding linear response being symmetry-forbidden. This opens a new avenue for field-free switching of perpendicular magnetization, which is required for the next-generation information storage technology. We develop the microscopic theory of this effect, and clarify its quantum origin in band geometric quantities which can be enhanced by topological nodal features. Combined with first-principles calculations, we predict pronounced effects at room temperature in topological metals $\mathrm{PbTaSe_{2}}$ and PdGa. Our work establishes a fundamental nonlinear response in spin transport, and opens the door to exploring spintronic applications based on nonlinear spin Hall effect.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17120 [pdf, other]

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, understanding the mechanisms that dictate continual performance in this paradigm remains elusive. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics for continual scenarios using Neural Tangent Kernel (NTK) theory. With the aid of NTK as a mathematical analysis tool, we recast the challenge of test-time forgetting into the quantifiable generalization gaps during training, identifying three key factors that influence these gaps and the performance of PEFT-CL: training sample size, task-level feature orthogonality, and regularization. To address these challenges, we introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features. Aligning with theoretical guidance, NTK-CL triples the feature representation of each sample, theoretically and empirically reducing the magnitude of both task-interplay and task-specific generalization gaps. Grounded in NTK analysis, our approach imposes an adaptive exponential moving average mechanism and constraints on task-level feature orthogonality, maintaining intra-task NTK forms while attenuating inter-task NTK forms. Ultimately, by fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.16670 [pdf, other]

FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process

Authors: Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li

Abstract: As short-form video-sharing platforms become a significant channel for news consumption, fake news in short videos has emerged as a serious threat in the online information ecosystem, making developing detection methods for this new scenario an urgent need. Compared with that in text and image formats, fake news on short video platforms contains rich but heterogeneous information in various modalities, posing a challenge to effective feature utilization. Unlike existing works mostly focusing on analyzing what is presented, we introduce a novel perspective that considers how it might be created. Through the lens of the creative process behind news video production, our empirical analysis uncovers the unique characteristics of fake news videos in material selection and editing. Based on the obtained insights, we design FakingRecipe, a creative process-aware model for detecting fake news short videos. It captures the fake news preferences in material selection from sentimental and semantic aspects and considers the traits of material editing from spatial and temporal aspects. To improve evaluation comprehensiveness, we first construct FakeTT, an English dataset for this task, and conduct experiments on both FakeTT and the existing Chinese FakeSV dataset. The results show FakingRecipe's superiority in detecting fake news on short video platforms.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Will appear at ACM Multimedia 2024 (MM 2024), 13 pages, 15 figures

arXiv:2407.16224 [pdf, other]

OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person

Authors: Ke Sun, Jian Cao, Qi Wang, Linrui Tian, Xindi Zhang, Lian Zhuo, Bang Zhang, Liefeng Bo, Wenbo Zhou, Weiming Zhang, Daiheng Gao

Abstract: Virtual Try-On (VTON) has become a transformative technology, empowering users to experiment with fashion without ever having to physically try on clothing. However, existing methods often struggle with generating high-fidelity and detail-consistent results. While diffusion models, such as the Stable Diffusion series, have shown their capability in creating high-quality and photorealistic images, they encounter formidable challenges in conditional generation scenarios like VTON. Specifically, these models struggle to maintain a balance between control and consistency when generating images for virtual clothing trials. OutfitAnyone addresses these limitations by leveraging a two-stream conditional diffusion model, enabling it to adeptly handle garment deformation for more lifelike results. It distinguishes itself with scalability-modulating factors such as pose, body shape and broad applicability, extending from anime to in-the-wild images. OutfitAnyone's performance in diverse scenarios underscores its utility and readiness for real-world deployment. For more details and animated results, please see https://humanaigc.github.io/outfit-anyone/.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 10 pages, 13 figures

arXiv:2407.15481 [pdf, other]

Diverse Image Harmonization

Authors: Xinhao Tao, Tianyuan Qiu, Junyan Cao, Li Niu

Abstract: Image harmonization aims to adjust the foreground illumination in a composite image to make it harmonious. The existing harmonization methods can only produce one deterministic result for a composite image, ignoring that a composite image could have multiple plausible harmonization results due to multiple plausible reflectances. In this work, we first propose a reflectance-guided harmonization network, which can achieve better performance with the guidance of ground-truth foreground reflectance. Then, we also design a diverse reflectance generation network to predict multiple plausible foreground reflectances, leading to multiple plausible harmonization results. The extensive experiments on the benchmark datasets demonstrate the effectiveness of our method.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15464 [pdf, other]

The Diversity Bonus: Learning from Dissimilar Distributed Clients in Personalized Federated Learning

Authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang, Xiaotian Li, Jiannong Cao

Abstract: Personalized Federated Learning (PFL) is a commonly used framework that allows clients to collaboratively train their personalized models. PFL is particularly useful for handling situations where data from different clients are not independent and identically distributed (non-IID). Previous research in PFL implicitly assumes that clients can gain more benefits from those with similar data distributions. Correspondingly, methods such as personalized weight aggregation are developed to assign higher weights to similar clients during training. We pose a question: can a client benefit from other clients with dissimilar data distributions and if so, how? This question is particularly relevant in scenarios with a high degree of non-IID, where clients have widely different data distributions, and learning from only similar clients will lose knowledge from many other clients. We note that when dealing with clients with similar data distributions, methods such as personalized weight aggregation tend to enforce their models to be close in the parameter space. It is reasonable to conjecture that a client can benefit from dissimilar clients if we allow their models to depart from each other. Based on this idea, we propose DiversiFed which allows each client to learn from clients with diversified data distribution in personalized federated learning. DiversiFed pushes personalized models of clients with dissimilar data distributions apart in the parameter space while pulling together those with similar distributions.
In addition, to achieve the above effect without using prior knowledge of data distribution, we design a loss function that leverages the model similarity to determine the degree of attraction and repulsion between any two models. Experiments on several datasets show that DiversiFed can benefit from dissimilar clients and thus outperform the state-of-the-art methods.

Submitted 22 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures

arXiv:2407.14829 [pdf, other]

Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu , et al. (4 additional authors not shown)

Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its own dataset and baseline model. In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions. In this paper, we present the results of the challenge and a summary of the systems, highlighting commonalities and innovations among participating systems. Datasets and baseline models of the AI-Debater 2023 Challenge have already been released and can be accessed through the official website of the challenge.

Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.13205 [pdf, ps, other]

Transformer-based Single-Cell Language Model: A Survey

Authors: Wei Lan, Guohang He, Mingyang Liu, Qingfeng Chen, Junyue Cao, Wei Peng

Abstract: Transformers have achieved significant success in natural language processing owing to their outstanding parallel processing capabilities and highly flexible attention mechanism. In addition, an increasing number of transformer-based studies have been proposed to model single-cell data. In this review, we attempt to systematically summarize the single-cell language models and applications based on transformers. First, we provide a detailed introduction to the structure and principles of transformers. Then, we review the single-cell language models and large language models for single-cell data analysis. Moreover, we explore the datasets and applications of single-cell language models in downstream tasks such as batch correction, cell clustering, cell type annotation, gene regulatory network inference and perturbation response. Further, we discuss the challenges of single-cell language models and provide promising research directions. We hope this review will serve as an up-to-date reference for researchers interested in the direction of single-cell language models.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12421 [pdf, other]

SafePowerGraph: Safety-aware Evaluation of Graph Neural Networks for Transmission Power Grids

Authors: Salah Ghamizi, Aleksandar Bojchevski, Aoxiang Ma, Jun Cao

Abstract: Power grids are critical infrastructures of paramount importance to modern society, and their rapid evolution and interconnections have heightened the complexity of power systems (PS) operations. Traditional methods for grid analysis struggle with the computational demands of large-scale RES and ES integration, prompting the adoption of machine learning (ML) techniques, particularly Graph Neural Networks (GNNs). GNNs have proven effective in solving the alternating current (AC) Power Flow (PF) and Optimal Power Flow (OPF) problems, crucial for operational planning. However, existing benchmarks and datasets completely ignore safety and robustness requirements in their evaluation and never consider realistic safety-critical scenarios that most impact the operations of the power grids. We present SafePowerGraph, the first simulator-agnostic, safety-oriented framework and benchmark for GNNs in PS operations. SafePowerGraph integrates multiple PF and OPF simulators and assesses GNN performance under diverse scenarios, including energy price variations and power line outages. Our extensive experiments underscore the importance of self-supervised learning and graph attention architectures for GNN robustness. We provide our open-source repository, a comprehensive leaderboard, a dataset and a model zoo at https://github.com/yamizi/SafePowerGraph, and expect our framework to standardize and advance research in the critical field of GNN for power systems.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.12248 [pdf, other]

Mitigating Interference of Microservices with a Scoring Mechanism in Large-scale Clusters

Authors: Dingyu Yang, Kangpeng Zheng, Shiyou Qian, Jian Cao, Guangtao Xue

Abstract: Co-locating latency-critical services (LCSs) and best-effort jobs (BEJs) constitutes the principal approach for enhancing resource utilization in production. Nevertheless, the co-location practice hurts the performance of LCSs due to resource competition, even when employing isolation technology. Through an extensive analysis of voluminous real trace data derived from two production clusters, we observe that BEJs typically exhibit periodic execution patterns and serve as the primary sources of interference to LCSs. Furthermore, despite occupying the same level of resource consumption, the diverse compositions of BEJs can result in varying degrees of interference on LCSs. Subsequently, we propose PISM, a proactive Performance Interference Scoring and Mitigating framework for LCSs through the optimization of BEJ scheduling. Firstly, PISM adopts a data-driven approach to establish a characterization and classification methodology for BEJs. Secondly, PISM models the relationship between the composition of BEJs on servers and the response time (RT) of LCSs. Thirdly, PISM establishes an interference scoring mechanism in terms of RT, which serves as the foundation for BEJ scheduling. We assess the effectiveness of PISM on a small-scale cluster and through extensive data-driven simulations. The experiment results demonstrate that PISM can reduce cluster interference by up to 41.5%, and improve the throughput of long-tail LCSs by 76.4%.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11385 [pdf, other]

Grasping Diverse Objects with Simulated Humanoids

Authors: Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

Abstract: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them to follow randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simplistic reward, state, and object representations, our method shows favorable scalability on diverse objects and trajectories. For training, we do not need a dataset of paired full-body motion and object trajectories. At test time, we only require the object mesh and desired trajectories for grasping and transporting. To demonstrate the capabilities of our method, we show state-of-the-art success rates in following object trajectories and generalizing to unseen objects. Code and models will be released.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Project page: https://www.zhengyiluo.com/Omnigrasp/

arXiv:2407.10613 [pdf, other]

Global destabilization of drift-tearing mode with coupling to discretized electron drift-wave instability

Authors: J. Bao, W. L. Zhang, Z. Lin, H. S. Cai, D. J. Liu, H. T. Chen, C. Dong, J. T. Cao, D. Li

Abstract: The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable solutions. It is found that in the small EDD frequency (omega_*e) regime, the DTM growth rate agrees well with local theory and is reduced with increasing omega_*e. However, when omega_*e exceeds a critical threshold omega_*crit, strong linear coupling between DTM and other discretized EDW instabilities occurs, so that the free energies from the current and pressure channels can be released together and thus enhance the DTM, whose growth rate increases with increasing omega_*e and deviates qualitatively from local theory results. Correspondingly, a cross-scale mode structure forms with mixed polarization: the phi perturbation is dominated by an electrostatically polarized short-wavelength oscillation characteristic of the EDW instability, while the A_para perturbation remains a typical tearing mode solution with an Alfvenic polarized macroscopic structure. Within omega_*e > omega_*crit, the additional IDD causes the phi oscillating structure to shift towards the small density gradient domain, which cancels the extra drive from the ion channel, and thus the DTM growth rate is insensitive to the IDD frequency.
Compared to EDD effects, the IDD effect alone with zero omega_*e only leads to the stabilization of RTM, for which global simulation and local theory agree; this is no longer the case in the DTM regime. These results are useful for clarifying the global properties of DTM and the underlying physics mechanisms, which occur in the regime of omega_*e >> gamma_c that is relevant to present-day tokamak discharges with hot plasmas.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 23 pages, 15 figures

arXiv:2407.10486 [pdf, other]

IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. Large language models (LLMs) have shown impressive capability in textual understanding through large-scale pretraining, which implies great potential for extractive snippet generation. In this paper, we systematically investigate two indispensable characteristics that LLM-based QFS models should harness: Lengthy Document Summarization and Efficiently Fine-grained Query-LLM Alignment. Correspondingly, we propose two modules called Query-aware HyperExpert and Query-focused Infini-attention to address the aforementioned characteristics. These innovations pave the way for broader application and accessibility in the field of QFS technology. Extensive experiments conducted on existing QFS benchmarks indicate the effectiveness and generalizability of the proposed approach. Our code is publicly available at https://github.com/DCDmllm/IDEAL_Summary.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08136 [pdf, other]

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma

Abstract: The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos. While they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to the relatively weaker audio signal, while methods driven exclusively by facial key points, although more stable in driving, can result in unnatural outcomes due to the excessive control of key point information. To address these challenges, we introduce a novel approach named EchoMimic. EchoMimic is concurrently trained using both audios and facial landmarks. Through the implementation of a novel training strategy, EchoMimic is capable of generating portrait videos not only by audios and facial landmarks individually, but also by a combination of both audios and selected facial landmarks. EchoMimic has been comprehensively compared with alternative algorithms across various public datasets and our collected dataset, showcasing superior performance in both quantitative and qualitative evaluations. Additional visualization and access to the source code can be found on the EchoMimic project page.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07690 [pdf]

High power GaSb-based distributed feedback laser with laterally coupled dielectric gratings at 1.95μm

Authors: Zhengqing Ding, Juntian Cao, Kun Zhan, Yihang Chen, Lidan Zhou, Hao Tan, Chenao Yang, Ying Yu, Zhichuan Niu, Siyuan Yu

Abstract: Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce additional absorption loss that limits device efficiency and output power. In this paper, we introduce a novel laterally coupled dielectric Bragg grating structure, which enables highly controllable, deterministic, and stable coupling between the grating and the optical mode. Our device demonstrates a continuous-wave output power of 47.02 mW at room temperature, exhibiting stable single-mode operation from 300-1000 mA and achieving a maximum side mode suppression ratio of 46.7 dB. These results underscore the laterally coupled dielectric grating as a feasible and technologically superior approach for fabricating DFB and DBR lasers, with universal applicability across different material platforms and wavelength bands.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures, 1 table

MSC Class: 78A60 ACM Class: J.2.6

arXiv:2407.05130 [pdf, other]

doi 10.1016/j.ijheatmasstransfer.2022.122966

Heat transfer enhancement by mist/air two-phase flow in a high-temperature channel

Authors: Junxian Cao, Mengqi Ye, Haiwang Li, Tianyou Wang, Zhizhao Che

Abstract: Mist/air two-phase flow is a promising cooling technique for many applications such as internal cooling of gas turbine blades. A significant enhancement of heat transfer can be achieved with a low mass fraction of droplets by utilizing the latent heat of the droplets. Using newly designed atomizers to accurately control the mist droplets, this study experimentally explores the heat transfer performance of mist/air flow in a high-temperature channel with a maximum temperature of 880 K. The effects of the mist/air mass ratio, droplet size, Reynolds number, and wall heat flux are studied. The results show that the cooling performance of the test section can be significantly improved even by adding a small amount of droplets. Considering mist droplets of different sizes, larger droplets can cause more remarkable temperature reduction, while smaller droplets can improve the uniformity of temperature distribution. For large droplets, the cooling effect in the upstream is more obvious than that in the downstream due to the interaction between the wall and the droplets, and with the increase of mist/air mass ratio, the area with obvious cooling extends downstream. The performance of mist/air cooling is tested by increasing the heat flux until the maximum temperature at the outlet reaches a predetermined value. Compared with air-only cooling, the increment in the wall heat flux by the mist/air cooling with a mass ratio of 3% can be up to 18.4%.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures

Journal ref: International Journal of Heat and Mass Transfer, Volume 193, 1 September 2022, 122966

arXiv:2407.03937 [pdf, other]

TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

Authors: Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin

Abstract: Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowledge-intensive tasks. In response to this dilemma, we propose TongGu (meaning "understanding ancient and modern"), the first CCU-specific LLM, underpinned by three core contributions. First, we construct a two-stage instruction-tuning dataset, ACCN-INS, derived from rich classical Chinese corpora, aiming to unlock the full CCU potential of LLMs. Second, we propose Redundancy-Aware Tuning (RAT) to prevent catastrophic forgetting, enabling TongGu to acquire new capabilities while preserving its foundational knowledge. Third, we present a CCU Retrieval-Augmented Generation (CCU-RAG) technique to reduce hallucinations based on knowledge-grounding. Extensive experiments across 24 diverse CCU tasks validate TongGu's superior ability, underscoring the effectiveness of RAT and CCU-RAG. The model and dataset will be publicly available.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02759 [pdf]

Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

Authors: Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a shared objective and allows for strategy communication to boost overall performance. Our results show marked improvements in metrics such as click-through rate (CTR), conversion rate, and total sales, confirming our method's efficacy in practical settings.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation IEEE (ISBN: 979-8-3503-6617-4)

arXiv:2407.00692 [pdf, ps, other]

The motivic fundamental group of a punctured elliptic curve and algebraic cycles

Authors: Jin Cao, Tomohide Terasoma

Abstract: In this paper, we consider the motivic fundamental group of the punctured elliptic curves as a DG complex in the DG category of elliptic motives and describe its resolution via Schur complexes. During this process, we find the algebraic cycles analogous to the Bloch-Totaro cycles.

Submitted 30 June, 2024; originally announced July 2024.

MSC Class: 14C15; 14C25

Showing 1–50 of 1,647 results for author: Cao, J