Search | arXiv e-print repository

Deep potential for interaction between hydrated Cs+ and graphene

Authors: Yangjun Qin, Xiao Wan, Liuhua Mu, Zhicheng Zong, Tianhao Li, Nuo Yang

Abstract: The influence of hydrated cation-π interaction forces on the adsorption and filtration capabilities of graphene-based membrane materials is significant. However, the lack of an interaction potential between hydrated Cs+ and graphene limits the scope of adsorption studies. Here, a deep neural network potential function model is developed to predict the interaction force between hydrated Cs+ and graphene. The deep potential has DFT-level accuracy, enabling accurate property prediction. This deep potential is employed to investigate the properties of the graphene surface solution, including the density distribution, mean square displacement, and vibrational power spectrum of water. Furthermore, calculations of the molecular orbital electron distributions indicate the presence of electron migration in the molecular orbitals of graphene and hydrated Cs+, resulting in a strong electrostatic interaction force. The method provides a powerful tool to study the adsorption behavior of hydrated cations on graphene surfaces and offers a new solution for handling radionuclides.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2406.19421 [pdf, other]

The Belle II Detector Upgrades Framework Conceptual Design Report

Authors: H. Aihara, A. Aloisio, D. P. Auguste, M. Aversano, M. Babeluk, S. Bahinipati, Sw. Banerjee, M. Barbero, J. Baudot, A. Beaubien, F. Becherer, T. Bergauer, F. U. Bernlochner, V. Bertacchi, G. Bertolone, C. Bespin, M. Bessner, S. Bettarini, A. J. Bevan, B. Bhuyan, M. Bona, J. F. Bonis, J. Borah, F. Bosi, R. Boudagga, et al. (186 additional authors not shown)

Abstract: We describe the planned near-term and potential longer-term upgrades of the Belle II detector at the SuperKEKB electron-positron collider operating at the KEK laboratory in Tsukuba, Japan. These upgrades will allow increasingly sensitive searches for possible new physics beyond the Standard Model in flavor, tau, electroweak and dark sector physics that are both complementary to and competitive with the LHC and other experiments.

Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: Editor: F. Forti; 170 pages

Report number: KEK-REPORT-2024-1, BELLE2-REPORT-2024-042

arXiv:2406.11831 [pdf, other]

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Authors: Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu

Abstract: Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using a large language model as the prompt encoder significantly degrades the prompt-following ability in image generation. We identified two main obstacles behind this issue. One is the misalignment between the next token prediction training in LLM and the requirement for discriminative prompt features in diffusion models. The other is the intrinsic positional bias introduced by the decoder-only architecture. To deal with this issue, we propose a novel framework to fully harness the capabilities of LLMs. Through the carefully designed usage guidance, we effectively enhance the text representation capability for prompt encoding and eliminate its inherent positional bias. This allows us to integrate state-of-the-art LLMs into the text-to-image generation model flexibly. Furthermore, we also provide an effective manner to fuse multiple LLMs into our framework. Considering the excellent performance and scaling capabilities demonstrated by the transformer architecture, we further design an LLM-Infused Diffusion Transformer (LI-DiT) based on the framework. We conduct extensive experiments to validate LI-DiT across model size and data size. Benefiting from the inherent ability of the LLMs and our innovative designs, the prompt understanding performance of LI-DiT easily surpasses state-of-the-art open-source models as well as mainstream closed-source commercial models including Stable Diffusion 3, DALL-E 3, and Midjourney V6. The powerful LI-DiT-10B will be available through the online platform and API after further optimization and security checks.

Submitted 21 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.03520 [pdf, other]

VideoPhy: Evaluating Physical Commonsense for Video Generation

Authors: Hritik Bansal, Zongyu Lin, Tianyi Xie, Zeshun Zong, Michal Yarom, Yonatan Bitton, Chenfanfu Jiang, Yizhou Sun, Kai-Wei Chang, Aditya Grover

Abstract: Recent advances in internet-scale video data pretraining have led to the development of text-to-video generative models that can create high-quality videos across a broad range of visual concepts and styles. Due to their ability to synthesize realistic motions and render complex objects, these generative models have the potential to become general-purpose simulators of the physical world. However, it is unclear how far we are from this goal with the existing text-to-video generative models. To this end, we present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities (e.g., marbles will roll down when placed on a slanted surface). Specifically, we curate a list of 688 captions that involve interactions between various material types in the physical world (e.g., solid-solid, solid-fluid, fluid-fluid). We then generate videos conditioned on these captions from diverse state-of-the-art text-to-video generative models, including open models (e.g., VideoCrafter2) and closed models (e.g., Lumiere from Google, Pika). Further, our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts, while also lacking physical commonsense. Specifically, the best performing model, Pika, generates videos that adhere to the caption and physical laws for only 19.7% of the instances. VideoPhy thus highlights that the video generative models are far from accurately simulating the physical world. Finally, we also supplement the dataset with an auto-evaluator, VideoCon-Physics, to assess semantic adherence and physical commonsense at scale.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 36 pages, 26 figures, 8 tables

arXiv:2405.18515 [pdf, other]

Atlas3D: Physically Constrained Self-Supporting Text-to-3D for Simulation and Fabrication

Authors: Yunuo Chen, Tianyi Xie, Zeshun Zong, Xuan Li, Feng Gao, Yin Yang, Ying Nian Wu, Chenfanfu Jiang

Abstract: Existing diffusion-based text-to-3D generation methods primarily focus on producing visually realistic shapes and appearances, often neglecting the physical constraints necessary for downstream tasks. Generated models frequently fail to maintain balance when placed in physics-based simulations or 3D printed. This balance is crucial for satisfying user design intentions in interactive gaming, embodied AI, and robotics, where stable models are needed for reliable interaction. Additionally, stable models ensure that 3D-printed objects, such as figurines for home decoration, can stand on their own without requiring additional supports. To fill this gap, we introduce Atlas3D, an automatic and easy-to-implement method that enhances existing Score Distillation Sampling (SDS)-based text-to-3D tools. Atlas3D ensures the generation of self-supporting 3D models that adhere to physical laws of stability under gravity, contact, and friction. Our approach combines a novel differentiable simulation-based loss function with physically inspired regularization, serving as either a refinement or a post-processing module for existing frameworks. We verify Atlas3D's efficacy through extensive generation tasks and validate the resulting 3D models in both simulated and real-world environments.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.13046 [pdf, other]

MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Authors: Zhuofan Zong, Bingqi Ma, Dazhong Shen, Guanglu Song, Hao Shao, Dongzhi Jiang, Hongsheng Li, Yu Liu

Abstract: As the key component in multimodal large language models (MLLMs), the ability of the visual encoder greatly affects an MLLM's understanding of diverse image content. Although some large-scale pretrained vision encoders such as vision encoders in CLIP and DINOv2 have brought promising performance, we found that there is still no single vision encoder that can dominate various image content understanding, e.g., the CLIP vision encoder leads to outstanding results on general image understanding but poor performance on document or chart content. To alleviate the bias of the CLIP vision encoder, we first delve into the inherent behavior of different pre-trained vision encoders and then propose MoVA, a powerful and novel MLLM, adaptively routing and fusing task-specific vision experts with a coarse-to-fine mechanism. In the coarse-grained stage, we design a context-aware expert routing strategy to dynamically select the most suitable vision experts according to the user instruction, input image, and expertise of vision experts. This benefits from the powerful model function understanding ability of the large language model (LLM) equipped with expert-routing low-rank adaptation (LoRA). In the fine-grained stage, we elaborately conduct the mixture-of-vision-expert adapter (MoV-Adapter) to extract and fuse task-specific knowledge from various experts. This coarse-to-fine paradigm effectively leverages representations from experts based on multimodal context and model expertise, further enhancing the generalization ability. We conduct extensive experiments to evaluate the effectiveness of the proposed approach. Without any bells and whistles, MoVA can achieve significant performance gains over current state-of-the-art methods in a wide range of challenging multimodal benchmarks. Codes and models will be available at https://github.com/TempleX98/MoVA.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.03653 [pdf, other]

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Authors: Dongzhi Jiang, Guanglu Song, Xiaoshi Wu, Renrui Zhang, Dazhong Shen, Zhuofan Zong, Yu Liu, Hongsheng Li

Abstract: Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensively investigated. We observe that the misalignment is caused by inadequate token attention activation. We further attribute this phenomenon to the diffusion model's insufficient condition utilization, which is caused by its training paradigm. To address the issue, we propose CoMat, an end-to-end diffusion model fine-tuning strategy with an image-to-text concept matching mechanism. We leverage an image captioning model to measure image-to-text alignment and guide the diffusion model to revisit ignored tokens. A novel attribute concentration module is also proposed to address the attribute binding problem. Without any image or human preference data, we use only 20K text prompts to fine-tune SDXL to obtain CoMat-SDXL. Extensive experiments show that CoMat-SDXL significantly outperforms the baseline model SDXL in two text-to-image alignment benchmarks and achieves state-of-the-art performance.

Submitted 3 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Project Page: https://caraj7.github.io/comat

arXiv:2403.16999 [pdf, other]

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Authors: Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li

Abstract: Multi-Modal Large Language Models (MLLMs) have demonstrated impressive performance in various VQA tasks. However, they often lack interpretability and struggle with complex visual inputs, especially when the resolution of the input image is high or when the region of interest that could provide key information for answering the question is small. To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions. Additionally, about 98k of these pairs are annotated with detailed reasoning steps. Importantly, we propose a multi-turn processing pipeline that dynamically focuses on visual inputs and provides interpretable thoughts. We also introduce the related benchmark to evaluate the MLLMs in scenarios requiring specific local region identification. Extensive experiments demonstrate the effectiveness of our framework and shed light on better inference strategies. The Visual CoT dataset, benchmark, and pre-trained models are released to foster further research in this direction.

Submitted 7 July, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

Comments: Code: https://github.com/deepcs233/Visual-CoT

arXiv:2403.13783 [pdf, other]

A Convex Formulation of Frictional Contact for the Material Point Method and Rigid Bodies

Authors: Zeshun Zong, Chenfanfu Jiang, Xuchen Han

Abstract: In this paper, we introduce a novel convex formulation that seamlessly integrates the Material Point Method (MPM) with articulated rigid body dynamics in frictional contact scenarios. We extend the linear corotational hyperelastic model into the realm of elastoplasticity and include an efficient return mapping algorithm. This approach is particularly effective for MPM simulations involving significant deformation and topology changes, while preserving the convexity of the optimization problem. Our method ensures global convergence, enabling the use of large simulation time steps without compromising robustness. We have validated our approach through rigorous testing and performance evaluations, highlighting its superior capabilities in managing complex simulations relevant to robotics. Compared to previous MPM based robotic simulators, our method significantly improves the stability of contact resolution -- a critical factor in robot manipulation tasks. We make our method available in the open-source robotics toolkit, Drake.

Submitted 22 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: The supplemental video is available at https://youtu.be/5jrQtF5D0DA

arXiv:2401.15318 [pdf, other]

Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering

Authors: Yutao Feng, Xiang Feng, Yintong Shang, Ying Jiang, Chang Yu, Zeshun Zong, Tianjia Shao, Hongzhi Wu, Kun Zhou, Chenfanfu Jiang, Yin Yang

Abstract: We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian Splatting and Position-Based Dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to GaussianShader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids. Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. For more information, please visit our project page at https://gaussiansplashing.github.io/.

Submitted 23 July, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

arXiv:2312.06160 [pdf, ps, other]

Open WDVV Equations and Frobenius Structures for Toric Calabi-Yau 3-Folds

Authors: Song Yu, Zhengyu Zong

Abstract: Let $X$ be a toric Calabi-Yau 3-fold and let $L\subset X$ be an Aganagic-Vafa outer brane. We prove two versions of open WDVV equations for the open Gromov-Witten theory of $(X,L)$. The first version of the open WDVV equation leads to the construction of a semi-simple (formal) Frobenius manifold and the second version leads to the construction of a flat (formal) $F$-manifold.

Submitted 2 June, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 26 pages

MSC Class: 14N35; 53D45

arXiv:2311.12198 [pdf, other]

PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics

Authors: Tianyi Xie, Zeshun Zong, Yuxing Qiu, Xuan Li, Yutao Feng, Yin Yang, Chenfanfu Jiang

Abstract: We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS$^2$)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Our project page is at: https://xpandora.github.io/PhysGaussian/

Submitted 15 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: Accepted by CVPR 2024

arXiv:2310.17790 [pdf, other]

doi 10.1145/3610548.3618207

Neural Stress Fields for Reduced-order Elastoplasticity and Fracture

Authors: Zeshun Zong, Xuan Li, Minchen Li, Maurizio M. Chiaramonte, Wojciech Matusik, Eitan Grinspun, Kevin Carlberg, Chenfanfu Jiang, Peter Yichen Chen

Abstract: We propose a hybrid neural network and physics framework for reduced-order modeling of elastoplasticity and fracture. State-of-the-art scientific computing models like the Material Point Method (MPM) faithfully simulate large-deformation elastoplasticity and fracture mechanics. However, their long runtime and large memory consumption render them unsuitable for applications constrained by computation time and memory usage, e.g., virtual reality. To overcome these barriers, we propose a reduced-order framework. Our key innovation is training a low-dimensional manifold for the Kirchhoff stress field via an implicit neural representation. This low-dimensional neural stress field (NSF) enables efficient evaluations of stress values and, correspondingly, internal forces at arbitrary spatial locations. In addition, we also train neural deformation and affine fields to build low-dimensional manifolds for the deformation and affine momentum fields. These neural stress, deformation, and affine fields share the same low-dimensional latent space, which uniquely embeds the high-dimensional simulation state. After training, we run new simulations by evolving in this single latent space, which drastically reduces the computation time and memory consumption. Our general continuum-mechanics-based reduced-order framework is applicable to any phenomena governed by the elastodynamics equation. To showcase the versatility of our framework, we simulate a wide range of material behaviors, including elastica, sand, metal, non-Newtonian fluids, fracture, contact, and collision. We demonstrate dimension reduction by up to 100,000X and time savings by up to 10X.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.02638 [pdf, other]

P2CADNet: An End-to-End Reconstruction Network for Parametric 3D CAD Model from Point Clouds

Authors: Zhihao Zong, Fazhi He, Rubin Fan, Yuxin Liu

Abstract: Computer Aided Design (CAD), especially feature-based parametric CAD, plays an important role in modern industry and society. However, the reconstruction of featured CAD models is more challenging than the reconstruction of other CAD models. To this end, this paper proposes an end-to-end network to reconstruct featured CAD models from point clouds (P2CADNet). Initially, the proposed P2CADNet architecture combines a point cloud feature extractor, a CAD sequence reconstructor and a parameter optimizer. Subsequently, in order to reconstruct the featured CAD model in an autoregressive way, the CAD sequence reconstructor applies two transformer decoders, one with a target mask and the other without a mask. Finally, to predict parameters more precisely, we design a parameter optimizer with a cross-attention mechanism to further refine the CAD feature parameters. We evaluate P2CADNet on a public dataset, and the experimental results show that P2CADNet has excellent reconstruction quality and accuracy. To the best of our knowledge, P2CADNet is the first end-to-end network to reconstruct featured CAD models from point clouds, and can be regarded as a baseline for future work. Therefore, we open the source code at https://github.com/Blice0415/P2CADNet.

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2309.01473 [pdf, ps, other]

Twisted Equivariant Gromov-Witten Theory of the Classifying Space of a Finite Group

Authors: Zhuoming Lan, Zhengyu Zong

Abstract: For any finite group $G$, the equivariant Gromov-Witten invariants of $[\mathbb{C}^r/G]$ can be viewed as certain twisted Gromov-Witten invariants of the classifying stack $\mathcal{B} G$. In this paper, we use Tseng's orbifold quantum Riemann-Roch theorem to express the equivariant Gromov-Witten invariants of $[\mathbb{C}^r/G]$ as a sum over Feynman graphs, where the weight of each graph is expressed in terms of descendant integrals over moduli spaces of stable curves and representations of $G$.

Submitted 4 September, 2023; originally announced September 2023.

Comments: This paper is a non-abelian generalization of arXiv:1310.4812

arXiv:2308.02164 [pdf]

Using Targeted Phonon Excitation to Modulate Thermal Conductivity of Boron Nitride

Authors: Dongkai Pan, Xiao Wan, Zhicheng Zong, Yangjun Qin, Nuo Yang

Abstract: Modulation of thermal conductivity has become a hotspot in the field of heat conduction. A novel strategy based on targeted phonon excitation has been recently proposed for efficient and reversible modulation of thermal conductivity. In this article, the effectiveness of that strategy is further evaluated on hexagonal boron nitride through ab initio methods. Results indicate that thermal conductivity can be increased from 885 W m$^{-1}$ K$^{-1}$ to 1151 W m$^{-1}$ K$^{-1}$ or decreased to 356 W m$^{-1}$ K$^{-1}$, thereby broadening the scope of applicability of this strategy.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 12 pages, 3 figures

arXiv:2306.05326 [pdf, ps, other]

Torus knots in Lens spaces, open Gromov-Witten invariants, and topological recursion

Authors: Jinghao Yu, Zhengyu Zong

Abstract: Starting from a torus knot $\mathcal{K}$ in the lens space $L(p,-1)$, we construct a Lagrangian sub-manifold $L_{\mathcal{K}}$ in $\mathcal{X}=\big(\mathcal{O}_{\mathbb{P}^1}(-1)\oplus \mathcal{O}_{\mathbb{P}^1}(-1)\big)/\mathbb{Z}_p$ under the conifold transition. We prove a mirror theorem which relates the all genus open-closed Gromov-Witten invariants of $(\mathcal{X},L_{\mathcal{K}})$ to the topological recursion on the B-model spectral curve. This verifies a conjecture in \cite{Bor-Bri} in the case of lens space.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 43 pages, 6 figures

arXiv:2305.19877 [pdf]

Enhancing interfacial thermal conductance of Si/PVDF by strengthening atomic couplings

Authors: Zhicheng Zong, Shichen Deng, Yangjun Qin, Xiao Wan, Jiahong Zhan, Dengke Ma, Nuo Yang

Abstract: Thermal transport across inorganic/organic interfaces attracts interest from both academia and industry due to its wide applications in flexible electronics and related areas. Here, the interfacial thermal conductance of inorganic/organic interfaces consisting of silicon and polyvinylidene fluoride is systematically investigated by molecular dynamics simulations. Interestingly, it is demonstrated that a modified silicon surface with hydroxyl groups can drastically enhance the conductance by 698%. These results are elucidated based on interfacial couplings and lattice dynamics insights. This study not only provides feasible strategies to effectively modulate the interfacial thermal conductance of inorganic/organic interfaces but also deepens the understanding of the fundamental physics underlying phonon transport across interfaces.

Submitted 10 June, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.18295 [pdf, other]

RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths

Authors: Zeyue Xue, Guanglu Song, Qiushan Guo, Boxiao Liu, Zhuofan Zong, Yu Liu, Ping Luo

Abstract: Text-to-image generation has recently witnessed remarkable achievements. We introduce a text-conditional image diffusion model, termed RAPHAEL, to generate highly artistic images, which accurately portray the text prompts, encompassing multiple nouns, adjectives, and verbs. This is achieved by stacking tens of mixture-of-experts (MoEs) layers, i.e., space-MoE and time-MoE layers, enabling billions of diffusion paths (routes) from the network input to the output. Each path intuitively functions as a "painter" for depicting a particular textual concept onto a specified image region at a diffusion timestep. Comprehensive experiments reveal that RAPHAEL outperforms recent cutting-edge models, such as Stable Diffusion, ERNIE-ViLG 2.0, DeepFloyd, and DALL-E 2, in terms of both image quality and aesthetic appeal. Firstly, RAPHAEL exhibits superior performance in switching images across diverse styles, such as Japanese comics, realism, cyberpunk, and ink illustration. Secondly, a single model with three billion parameters, trained on 1,000 A100 GPUs for two months, achieves a state-of-the-art zero-shot FID score of 6.61 on the COCO dataset. Furthermore, RAPHAEL significantly surpasses its counterparts in human evaluation on the ViLG-300 benchmark. We believe that RAPHAEL holds the potential to propel the frontiers of image generation research in both academia and industry, paving the way for future breakthroughs in this rapidly evolving field. More details can be found on a webpage: https://raphael-painter.github.io/.

Submitted 9 March, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.16143 [pdf, other]

Condensed Prototype Replay for Class Incremental Learning

Authors: Jiangtao Kong, Zhenyu Zong, Tianyi Zhou, Huajie Shao

Abstract: Incremental learning (IL) suffers from catastrophic forgetting of old tasks when learning new tasks. This can be addressed by replaying previous tasks' data stored in a memory, which however is usually prone to size limits and privacy leakage. Recent studies store only class centroids as prototypes and augment them with Gaussian noises to create synthetic data for replay. However, they cannot effectively avoid class interference near their margins that leads to forgetting. Moreover, the injected noises distort the rich structure between real data and prototypes, hence even detrimental to IL. In this paper, we propose YONO that You Only Need to replay One condensed prototype per class, which for the first time can even outperform memory-costly exemplar-replay methods. To this end, we develop a novel prototype learning method that (1) searches for more representative prototypes in high-density regions by an attentional mean-shift algorithm and (2) moves samples in each class to their prototype to form a compact cluster distant from other classes. Thereby, the class margins are maximized, which effectively reduces interference causing future forgetting. In addition, we extend YONO to YONO+, which creates synthetic replay data by random sampling in the neighborhood of each prototype in the representation space. We show that the synthetic data can further improve YONO. Extensive experiments on IL benchmarks demonstrate the advantages of YONO/YONO+ over existing IL methods in terms of both accuracy and forgetting.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2304.00967 [pdf, other]

Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction

Authors: Zhuofan Zong, Dongzhi Jiang, Guanglu Song, Zeyue Xue, Jingyong Su, Hongsheng Li, Yu Liu

Abstract: In this paper, we propose a new paradigm, named Historical Object Prediction (HoP) for multi-view 3D detection to leverage temporal information more effectively. The HoP approach is straightforward: given the current timestamp t, we generate a pseudo Bird's-Eye View (BEV) feature of timestamp t-k from its adjacent frames and utilize this feature to predict the object set at timestamp t-k. Our approach is motivated by the observation that enforcing the detector to capture both the spatial location and temporal motion of objects occurring at historical timestamps can lead to more accurate BEV feature learning. First, we elaborately design short-term and long-term temporal decoders, which can generate the pseudo BEV feature for timestamp t-k without the involvement of its corresponding camera images. Second, an additional object decoder is flexibly attached to predict the object targets using the generated pseudo BEV feature. Note that we only perform HoP during training, thus the proposed method does not introduce extra overheads during inference. As a plug-and-play approach, HoP can be easily incorporated into state-of-the-art BEV detection frameworks, including BEVFormer and BEVDet series. Furthermore, the auxiliary HoP approach is complementary to prevalent temporal modeling methods, leading to significant performance gains. Extensive experiments are conducted to evaluate the effectiveness of the proposed HoP on the nuScenes dataset. We choose the representative methods, including BEVFormer and BEVDet4D-Depth to evaluate our method. Surprisingly, HoP achieves 68.5% NDS and 62.4% mAP with ViT-L on nuScenes test, outperforming all the 3D object detectors on the leaderboard. Codes will be available at https://github.com/Sense-X/HoP.

Submitted 3 April, 2023; originally announced April 2023.

Comments: Tech report. Codes will be available at https://github.com/Sense-X/HoP

arXiv:2303.01098 [pdf, other]

doi 10.1007/s11433-023-2315-0

Determination of Molecular Energies via Quantum Imaginary Time Evolution in a Superconducting Qubit System

Authors: Zhiwen Zong, Sainan Huai, Tianqi Cai, Wenyan Jin, Ze Zhan, Zhenxing Zhang, Kunliang Bu, Liyang Sui, Ying Fei, Yicong Zheng, Shengyu Zhang, Jianlan Wu, Yi Yin

Abstract: As a valid tool for solving ground state problems, imaginary time evolution (ITE) is widely used in physical and chemical simulations. Different ITE-based algorithms in their quantum counterpart have recently been proposed and applied to some real systems. We experimentally realize the variational-based quantum imaginary time evolution (QITE) algorithm to simulate the ground state energy of hydrogen (H2) and lithium hydride (LiH) molecules in a superconducting qubit system. The H2 molecule is directly simulated using the 3-qubit circuit with unitary-coupled clusters (UCC) ansatz. We also combine QITE with the cluster mean-field (CMF) method to obtain an effective Hamiltonian. The LiH molecule is correspondingly simulated using the 3-qubit circuit with hardware-efficient ansatz. For comparison, the LiH molecule is also directly simulated using the 4-qubit circuit with UCC ansatz at the equilibrium point. All the experimental results show a convergence within 4 iterations, with high-fidelity ground state energy obtained. For a more complex system in the future, the CMF may allow further grouping of interactions to obtain an effective Hamiltonian, then the hybrid QITE algorithm can possibly simulate a relatively large-scale system with fewer qubits.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 11 pages, 5 figures

arXiv:2211.12860 [pdf, other]

DETRs with Collaborative Hybrid Assignments Training

Authors: Zhuofan Zong, Guanglu Song, Yu Liu

Abstract: In this paper, we provide the observation that too few queries assigned as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment manners. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training the multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster RCNN. In addition, we conduct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. In inference, these auxiliary heads are discarded and thus our method introduces no additional parameters and computational cost to the original detector while requiring no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, incorporated with ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Codes are available at https://github.com/Sense-X/Co-DETR.

Submitted 10 August, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: ICCV 2023. Codes are available at https://github.com/Sense-X/Co-DETR

arXiv:2211.05910 [pdf, other]

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee, Yoonsik Choe, Jinwoo Jeong, Sungjei Kim, Maciej Smyl, Tomasz Latkowski, Pawel Kubik, Michal Sokolski, Yujie Ma, Jiahao Chao, Zhou Zhou, Hongfan Gao, Zhengfeng Yang, Zhenbing Zeng, Zhengyang Zhuge, Chenghua Li , et al. (71 additional authors not shown)

Abstract: Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs having many computational and memory constraints. In this Mobile AI challenge, we address this problem and propose the participants to design an efficient quantized image super-resolution solution that can demonstrate a real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do a high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.07825, arXiv:2105.08826, arXiv:2211.04470, arXiv:2211.03885, arXiv:2211.05256

arXiv:2211.00683 [pdf, other]

Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation

Authors: Cody Blakeney, Jessica Zosa Forde, Jonathan Frankle, Ziliang Zong, Matthew L. Leavitt

Abstract: Methods for improving the efficiency of deep network training (i.e. the resources required to achieve a given level of model quality) are of immediate benefit to deep learning practitioners. Distillation is typically used to compress models or improve model quality, but it's unclear if distillation actually improves training efficiency. Can the quality improvements of distillation be converted into training speed-ups, or do they simply increase final model quality with no resource savings? We conducted a series of experiments to investigate whether and how distillation can be used to accelerate training using ResNet-50 trained on ImageNet and BERT trained on C4 with a masked language modeling objective and evaluated on GLUE, using common enterprise hardware (8x NVIDIA A100). We found that distillation can speed up training by up to 1.96x in ResNet-50 trained on ImageNet and up to 1.42x on BERT when evaluated on GLUE. Furthermore, distillation for BERT yields optimal results when it is only performed for the first 20-50% of training. We also observed that training with distillation is almost always more efficient than training without distillation, even when using the poorest-quality model as a teacher, in both ResNet-50 and BERT. Finally, we found that it's possible to gain the benefit of distilling from an ensemble of teacher models, which has O(n) runtime cost, by randomly sampling a single teacher from the pool of teacher models on each step, which only has a O(1) runtime cost. Taken together, these results show that distillation can substantially improve training efficiency in both image classification and language modeling, and that a few simple optimizations to distillation protocols can further enhance these efficiency improvements.

Submitted 1 November, 2022; originally announced November 2022.
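
Illustrative note on the entry above: the abstract contrasts distilling from a full ensemble of teachers (O(n) teacher evaluations per step) with sampling a single teacher per step (O(1)). The following is a minimal sketch of that contrast only, not the authors' implementation. The class CountingTeacher, the functions ensemble_targets and sampled_targets, and the toy "teachers" that merely scale their input are all hypothetical stand-ins.

```python
import random


class CountingTeacher:
    """Hypothetical stand-in for a teacher model; counts how often it is evaluated."""

    def __init__(self, scale):
        self.scale = scale
        self.calls = 0

    def __call__(self, batch):
        self.calls += 1
        return [self.scale * x for x in batch]


def ensemble_targets(teachers, batch):
    """Average the outputs of every teacher: n teacher evaluations per step."""
    outputs = [teacher(batch) for teacher in teachers]
    n = len(teachers)
    return [sum(column) / n for column in zip(*outputs)]


def sampled_targets(teachers, batch, rng):
    """Evaluate one uniformly sampled teacher: a single evaluation per step."""
    return rng.choice(teachers)(batch)


# Toy run: five training steps with a pool of three teachers.
teachers = [CountingTeacher(s) for s in (1.0, 2.0, 3.0)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible
steps = 5
for _ in range(steps):
    sampled_targets(teachers, [1.0, 2.0], rng)

# One evaluation per step in total, regardless of the pool size.
total_calls_sampled = sum(t.calls for t in teachers)
```

In this sketch the sampled variant performs exactly one teacher evaluation per step, whereas ensemble_targets would perform one per teacher in the pool.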

arXiv:2210.11153 [pdf, other]

Reversed Image Signal Processing and RAW Reconstruction. AIM 2022 Challenge Report

Authors: Marcos V. Conde, Radu Timofte, Yibin Huang, Jingyang Peng, Chang Chen, Cheng Li, Eduardo Pérez-Pellitero, Fenglong Song, Furui Bai, Shuai Liu, Chaoyu Feng, Xiaotao Wang, Lei Lei, Yu Zhu, Chenghua Li, Yingying Jiang, Yong A, Peisong Wang, Cong Leng, Jian Cheng, Xiaoyu Liu, Zhicun Yin, Zhilu Zhang, Junyi Li, Ming Liu , et al. (18 additional authors not shown)

Abstract: Cameras capture sensor RAW images and transform them into pleasant RGB images, suitable for the human eyes, using their integrated Image Signal Processor (ISP). Numerous low-level vision tasks operate in the RAW domain (e.g. image denoising, white balance) due to its linear relationship with the scene irradiance, wide-range of information at 12bits, and sensor designs. Despite this, RAW image datasets are scarce and more expensive to collect than the already large and public RGB datasets. This paper introduces the AIM 2022 Challenge on Reversed Image Signal Processing and RAW Reconstruction. We aim to recover raw sensor images from the corresponding RGBs without metadata and, by doing this, "reverse" the ISP transformation. The proposed methods and benchmark establish the state-of-the-art for this low-level vision inverse problem, and generating realistic raw sensor readings can potentially benefit other tasks such as denoising and super-resolution.

Submitted 20 October, 2022; originally announced October 2022.

Comments: ECCV 2022 Advances in Image Manipulation (AIM) workshop

arXiv:2210.11078 [pdf, other]

Large-batch Optimization for Dense Visual Predictions

Authors: Zeyue Xue, Jianming Liang, Guanglu Song, Zhuofan Zong, Liang Chen, Yu Liu, Ping Luo

Abstract: Training a large-scale deep neural network in a large-scale dataset is challenging and time-consuming. The recent breakthrough of large-batch optimization is a promising way to tackle this challenge. However, although the current advanced algorithms such as LARS and LAMB succeed in classification models, the complicated pipelines of dense visual predictions such as object detection and segmentation still suffer from the heavy performance drop in the large-batch training regime. To address this challenge, we propose a simple yet effective algorithm, named Adaptive Gradient Variance Modulator (AGVM), which can train dense visual predictors with very large batch size, enabling several benefits more appealing than prior arts. Firstly, AGVM can align the gradient variances between different modules in the dense visual predictors, such as backbone, feature pyramid network (FPN), detection, and segmentation heads. We show that training with a large batch size can fail with the gradient variances misaligned among them, which is a phenomenon primarily overlooked in previous work. Secondly, AGVM is a plug-and-play module that generalizes well to many different architectures (e.g., CNNs and Transformers) and different tasks (e.g., object detection, instance segmentation, semantic segmentation, and panoptic segmentation). It is also compatible with different optimizers (e.g., SGD and AdamW). Thirdly, a theoretical analysis of AGVM is provided. Extensive experiments on the COCO and ADE20K datasets demonstrate the superiority of AGVM. For example, it can train Faster R-CNN+ResNet50 in 4 minutes without losing performance. AGVM enables training an object detector with one billion parameters in just 3.5 hours, reducing the training time by 20.9x, whilst achieving 62.2 mAP on COCO. The deliverables are released at https://github.com/Sense-X/AGVM.

Submitted 20 October, 2022; originally announced October 2022.

Comments: 23 pages, 6 figures

Journal ref: NeurIPS 2022

arXiv:2209.09694 [pdf]

Modulating Thermal Conductivity via Targeted Phonon Excitation

Authors: Xiao Wan, Dongkai Pan, Jing-Tao Lü, Sebastian Volz, Lifa Zhang, Qing Hao, Yangjun Qin, Zhicheng Zong, Nuo Yang

Abstract: Thermal conductivity is a critical material property in numerous applications, such as those related to thermoelectric devices and heat dissipation. Effectively modulating thermal conductivity has become a great concern in the field of heat conduction. In this study, a quantum strategy is proposed to modulate thermal conductivity by exciting targeted phonons. The results show that the thermal conductivity of graphene can be tailored in the range of 1559 W/m-K (49%) to 4093 W/m-K (128%), compared with the intrinsic value of 3189 W/m-K. A similar trend is also observed for graphene nanoribbons. The results are obtained through both ab initio calculations and molecular dynamics simulations. This brand-new quantum strategy to modulate thermal conductivity paves a way for quantum heat conduction.

Submitted 5 April, 2023; v1 submitted 20 September, 2022; originally announced September 2022.
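
Illustrative note on the entry above: the percentages quoted in the abstract can be recomputed from the three stated conductivities. This is only a worked arithmetic check, assuming the percentages are ratios to the intrinsic value. The names INTRINSIC, TAILORED and percent_of_intrinsic are hypothetical and do not come from the paper.

```python
# Values quoted in the abstract, in W/m-K.
INTRINSIC = 3189.0
TAILORED = {"lowest": 1559.0, "highest": 4093.0}


def percent_of_intrinsic(value, intrinsic=INTRINSIC):
    """Express a conductivity as a percentage of the intrinsic value."""
    return 100.0 * value / intrinsic


# Round to whole percent, as the abstract does.
percentages = {name: round(percent_of_intrinsic(v)) for name, v in TAILORED.items()}
print(percentages)
```

Rounded to whole percent, the two ratios come out at 49 and 128, matching the figures in the abstract.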

arXiv:2208.04844 [pdf, other]

Topology Optimization with Frictional Self-Contact

Authors: Zeshun Zong, Xuan Li, Jianping Ye, Sian Wen, Yin Yang, Danny M. Kaufman, Minchen Li, Chenfanfu Jiang

Abstract: Contact-aware topology optimization faces challenges in robustness, accuracy, and applicability to internal structural surfaces under self-contact. This work builds on the recently proposed barrier-based Incremental Potential Contact (IPC) model and presents a new self-contact-aware topology optimization framework. A combination of SIMP, adjoint sensitivity analysis, and the IPC frictional-contact model is investigated. Numerical examples for optimizing varying objective functions under contact are presented. The resulting algorithm proposed solves topology optimization for large deformation and complex frictionally contacting scenarios with accuracy and robustness.

Submitted 24 August, 2022; v1 submitted 6 August, 2022; originally announced August 2022.

arXiv:2208.03620 [pdf, other]

Learning Omnidirectional Flow in 360-degree Video via Siamese Representation

Authors: Keshav Bhandari, Bin Duan, Gaowen Liu, Hugo Latapie, Ziliang Zong, Yan Yan

Abstract: Optical flow estimation in omnidirectional videos faces two significant issues: the lack of benchmark datasets and the challenge of adapting perspective video-based methods to accommodate the omnidirectional nature. This paper proposes the first perceptually natural-synthetic omnidirectional benchmark dataset with a 360-degree field of view, FLOW360, with 40 different videos and 4,000 video frames. We conduct comprehensive characteristic analysis and comparisons between our dataset and existing optical flow datasets, which manifest perceptual realism, uniqueness, and diversity. To accommodate the omnidirectional nature, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). We train our network in a contrastive manner with a hybrid loss function that combines contrastive loss and optical flow loss. Extensive experiments verify the proposed framework's effectiveness and show up to 40% performance improvement over the state-of-the-art approaches. Our FLOW360 dataset and code are available at https://siamlof.github.io/.

Submitted 6 August, 2022; originally announced August 2022.

Comments: Accepted to ECCV22

arXiv:2207.06540 [pdf, other]

Lipschitz Continuity Retained Binary Neural Network

Authors: Yuzhang Shang, Dan Xu, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Abstract: Relying on the premise that the performance of a binary neural network can be largely restored with eliminated quantization error between full-precision weight vectors and their corresponding binary vectors, existing works of network binarization frequently adopt the idea of model robustness to reach the aforementioned objective. However, robustness remains an ill-defined concept without solid theoretical support. In this work, we introduce the Lipschitz continuity, a well-defined functional property, as the rigorous criterion to define the model robustness for BNN. We then propose to retain the Lipschitz continuity as a regularization term to improve the model robustness. Particularly, while the popular Lipschitz-involved regularization methods often collapse in BNN due to its extreme sparsity, we design the Retention Matrices to approximate spectral norms of the targeted weight matrices, which can be deployed as the approximation for the Lipschitz constant of BNNs without the exact Lipschitz constant computation (NP-hard). Our experiments prove that our BNN-specific regularization method can effectively strengthen the robustness of BNN (testified on ImageNet-C), achieving state-of-the-art performance on CIFAR and ImageNet.

Submitted 16 July, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: Paper accepted to ECCV 2022

arXiv:2207.05785 [pdf, other]

Domain Gap Estimation for Source Free Unsupervised Domain Adaptation with Many Classifiers

Authors: Ziyang Zong, Jun He, Lei Zhang, Hai Huan

Abstract: In theory, the success of unsupervised domain adaptation (UDA) largely relies on domain gap estimation. However, for source free UDA, the source domain data cannot be accessed during adaptation, which poses a great challenge for measuring the domain gap. In this paper, we propose to use many classifiers to learn the source domain decision boundaries, which provides a tighter upper bound of the domain gap, even if the data of both domains cannot be accessed simultaneously. The source model is trained to push away each pair of classifiers whilst ensuring the correctness of the decision boundaries. In this sense, our many classifiers model separates the different source categories as far as possible, which induces the maximum disagreement of many classifiers in the target domain, thus the transferable source domain knowledge is maximized. For adaptation, the source model is adapted to maximize the agreement among pairs of the classifiers. Thus the target features are pushed away from the decision boundaries. Experiments on several datasets of UDA show that our approach achieves state of the art performance among source free UDA approaches and can even compete with source available UDA methods.

Submitted 2 October, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

Comments: 31 pages

arXiv:2207.02970 [pdf, other]

Network Binarization via Contrastive Learning

Authors: Yuzhang Shang, Dan Xu, Ziliang Zong, Liqiang Nie, Yan Yan

Abstract: Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit. However, there is still a huge performance gap between Binary Neural Networks (BNNs) and their full-precision (FP) counterparts. As the quantization error caused by weights binarization has been reduced in earlier works, the activations binarization becomes the major obstacle for further improvement of the accuracy. BNN characterises a unique and interesting structure, where the binary and latent FP activations exist in the same forward pass (i.e., $\text{Binarize}(\mathbf{a}_F) = \mathbf{a}_B$). To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework while training BNNs through the lens of Mutual Information (MI) maximization. MI is introduced as the metric to measure the information shared between binary and FP activations, which assists binarization with contrastive learning. Specifically, the representation ability of the BNNs is greatly strengthened via pulling the positive pairs with binary and FP activations from the same input samples, as well as pushing negative pairs from different samples (the number of negative pairs can be exponentially large). This benefits the downstream tasks, not only classification but also segmentation and depth estimation, etc. The experimental results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods and can remarkably improve the performance over them on CIFAR-10/100 and ImageNet, in addition to the great generalization ability on NYUD-v2.

Submitted 16 July, 2022; v1 submitted 6 July, 2022; originally announced July 2022.

Comments: Accepted to ECCV 2022

arXiv:2205.05675 [pdf, other]

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang , et al. (86 additional authors not shown)

Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The rank of the teams were determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered. And the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.
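
The rank-sum rule described for sub-track one (per-metric rankings summed into a final ranking) can be illustrated with a minimal sketch. The team names, metric values, and tie handling below are hypothetical and are not taken from the challenge report; lower values are assumed to be better for both metrics.

```python
def rank_positions(values):
    """Return 1-based ranks where the smallest value gets rank 1.

    Tied values share the best available rank (an assumption; the
    abstract does not say how ties were handled).
    """
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_sum(teams, metrics):
    """Sum each team's per-metric ranks and order teams by the total."""
    names = list(teams)
    per_metric = [rank_positions([teams[n][m] for n in names]) for m in metrics]
    totals = {n: sum(ranks[i] for ranks in per_metric) for i, n in enumerate(names)}
    order = sorted(names, key=lambda n: totals[n])
    return order, totals


# Hypothetical submissions: parameters (millions) and FLOPs (billions).
teams = {
    "A": {"params": 0.5, "flops": 30.0},
    "B": {"params": 0.3, "flops": 25.0},
    "C": {"params": 0.7, "flops": 20.0},
}
order, totals = rank_sum(teams, ["params", "flops"])
print(order, totals)
```

Passing all five metric names instead of two would give the analogous sum for sub-track two, under the same assumptions.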

Submitted 11 May, 2022; originally announced May 2022.

Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

arXiv:2201.12712 [pdf, other]

Win the Lottery Ticket via Fourier Analysis: Frequencies Guided Network Pruning

Authors: Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Abstract: With the remarkable success of deep learning recently, efficient network compression algorithms are urgently demanded for releasing the potential computational power of edge devices, such as smartphones or tablets. However, optimal network pruning is a non-trivial task which mathematically is an NP-hard problem. Previous researchers explain training a pruned network as buying a lottery ticket. In this paper, we investigate the Magnitude-Based Pruning (MBP) scheme and analyze it from a novel perspective through Fourier analysis on the deep learning model to guide model designation. Besides explaining the generalization ability of MBP using Fourier transform, we also propose a novel two-stage pruning approach, where one stage is to obtain the topological structure of the pruned network and the other stage is to retrain the pruned network to recover the capacity using knowledge distillation from lower to higher on the frequency domain. Extensive experiments on CIFAR-10 and CIFAR-100 demonstrate the superiority of our novel Fourier analysis based MBP compared to other traditional MBP algorithms.

Submitted 29 January, 2022; originally announced January 2022.

Comments: accepted to ICASSP 2022

arXiv:2111.12624 [pdf, other]

Self-slimmed Vision Transformer

Authors: Zhuofan Zong, Kunchang Li, Guanglu Song, Yali Wang, Yu Qiao, Biao Leng, Yu Liu

Abstract: Vision transformers (ViTs) have become the popular structures and outperformed convolutional neural networks (CNNs) on various vision tasks. However, such powerful transformers bring a huge computation burden, because of the exhausting token-to-token comparison. The previous works focus on dropping insignificant tokens to reduce the computational cost of ViTs. But when the dropping ratio increases, this hard manner will inevitably discard the vital tokens, which limits its efficiency. To solve the issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation. As a general method of token hard dropping, our TSM softly integrates redundant tokens into fewer informative ones. It can dynamically zoom visual attention without cutting off discriminative token relations in the images, even with a high slimming ratio. Furthermore, we introduce a concise Feature Recalibration Distillation (FRD) framework, wherein we design a reverse version of TSM (RTSM) to recalibrate the unstructured token in a flexible auto-encoder manner. Due to the similar structure between teacher and student, our FRD can effectively leverage structure knowledge for better convergence. Finally, we conduct extensive experiments to evaluate our SiT. It demonstrates that our method can speed up ViTs by 1.7x with negligible accuracy drop, and even speed up ViTs by 3.6x while maintaining 97% of their performance. Surprisingly, by simply arming LV-ViT with our SiT, we achieve new state-of-the-art performance on ImageNet. Code is available at https://github.com/Sense-X/SiT.

Submitted 12 September, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

Comments: Accepted by ECCV 2022. Code is available at https://github.com/Sense-X/SiT

arXiv:2110.12130 [pdf, other]

doi 10.1145/3474085.3475708

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Authors: Zhuofan Zong, Qianggang Cao, Biao Leng

Abstract: Feature pyramid networks (FPN) are widely exploited for multi-scale feature fusion in existing advanced object detection frameworks. Numerous previous works have developed various structures for bidirectional feature fusion, all of which are shown to improve the detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which introduces longer pipelines and reduces the inference speed. Moreover, semantics from non-adjacent levels are diluted in the feature pyramid since only features at adjacent pyramid levels are merged by the local fusion operation in a sequence manner. To address these issues, we propose a novel architecture named RCNet, which consists of Reverse Feature Pyramid (RevFP) and Cross-scale Shift Network (CSN). RevFP utilizes local bidirectional feature fusion to simplify the bidirectional pyramid inference pipeline. CSN directly propagates representations to both adjacent and non-adjacent levels to enable multi-scale features more correlative. Extensive experiments on the MS COCO dataset demonstrate RCNet can consistently bring significant improvements over both one-stage and two-stage detectors with subtle extra computational overhead. In particular, RetinaNet is boosted to 40.2 AP, which is 3.7 points higher than baseline, by replacing FPN with our proposed model. On COCO test-dev, RCNet can achieve very competitive performance with a single-model single-scale 50.5 AP. Codes will be made available.

Submitted 23 October, 2021; originally announced October 2021.

Comments: Accepted by ACM MM2021

arXiv:2110.04397 [pdf, other]

Measure Twice, Cut Once: Quantifying Bias and Fairness in Deep Neural Networks

Authors: Cody Blakeney, Gentry Atkinson, Nathaniel Huish, Yan Yan, Vangelis Metris, Ziliang Zong

Abstract: Algorithmic bias is of increasing concern, both to the research community, and society at large. Bias in AI is more abstract and unintuitive than traditional forms of discrimination and can be more difficult to detect and mitigate. A clear gap exists in the current literature on evaluating the relative bias in the performance of multi-class classifiers. In this work, we propose two simple yet effective metrics, Combined Error Variance (CEV) and Symmetric Distance Error (SDE), to quantitatively evaluate the class-wise bias of two models in comparison to one another. By evaluating the performance of these new metrics and by demonstrating their practical application, we show that they can be used to measure fairness as well as bias. These demonstrations show that our metrics can address specific needs for measuring bias in multi-class classification.

Submitted 8 October, 2021; originally announced October 2021.

arXiv:2110.00941 [pdf, ps, other]

Experimental Determination of Multi-Qubit Ground State via a Cluster Mean-Field Algorithm

Authors: Ze Zhan, Chongxin Run, Zhiwen Zong, Liang Xiang, Ying Fei, Wenyan Jin, Zhilong Jia, Peng Duan, Jianlan Wu, Yi Yin, Guoping Guo

Abstract: A quantum eigensolver is designed under a multi-layer cluster mean-field (CMF) algorithm by partitioning a quantum system into spatially-separated clusters. For each cluster, a reduced Hamiltonian is obtained after a partial average over its environment cluster. The products of eigenstates from different clusters construct a compressed Hilbert space, in which an effective Hamiltonian is diagonalized to determine certain eigenstates of the whole Hamiltonian. The CMF method is numerically verified in multi-spin chains and experimentally studied in a fully-connected three-spin network, both yielding an excellent prediction of their ground states.

Submitted 3 October, 2021; originally announced October 2021.

arXiv:2108.12905 [pdf, other]

Lipschitz Continuity Guided Knowledge Distillation

Authors: Yuzhang Shang, Bin Duan, Ziliang Zong, Liqiang Nie, Yan Yan

Abstract: Knowledge distillation has become one of the most important model compression techniques by distilling knowledge from larger teacher networks to smaller student ones. Although great success has been achieved by prior distillation methods via delicately designing various types of knowledge, they overlook the functional properties of neural networks, which makes the process of applying those techniques to new tasks unreliable and non-trivial. To alleviate such problem, in this paper, we initially leverage Lipschitz continuity to better represent the functional characteristic of neural networks and guide the knowledge distillation process. In particular, we propose a novel Lipschitz Continuity Guided Knowledge Distillation framework to faithfully distill knowledge by minimizing the distance between two neural networks' Lipschitz constants, which enables teacher networks to better regularize student networks and improve the corresponding performance. We derive an explainable approximation algorithm with an explicit theoretical derivation to address the NP-hard problem of calculating the Lipschitz constant. Experimental results have shown that our method outperforms other benchmarks over several knowledge distillation tasks (e.g., classification, segmentation and object detection) on CIFAR-100, ImageNet, and PASCAL VOC datasets.

Submitted 29 August, 2021; originally announced August 2021.

Comments: This work has been accepted by ICCV 2021

arXiv:2108.04462 [pdf, other]

Deep Reinforcement Learning for Demand Driven Services in Logistics and Transportation Systems: A Survey

Authors: Zefang Zong, Tao Feng, Tong Xia, Depeng Jin, Yong Li

Abstract: Recent technology development brings the booming of numerous new Demand-Driven Services (DDS) into urban lives, including ridesharing, on-demand delivery, express systems and warehousing. In DDS, a service loop is an elemental structure, including its service worker, the service providers and corresponding service targets. The service workers should transport either humans or parcels from the providers to the target locations. Various planning tasks within DDS can thus be classified into two individual stages: 1) Dispatching, which is to form service loops from demand/supply distributions, and 2) Routing, which is to decide specific serving orders within the constructed loops. Generating high-quality strategies in both stages is important to develop DDS but faces several challenges. Meanwhile, deep reinforcement learning (DRL) has been developed rapidly in recent years. It is a powerful tool to solve these problems since DRL can learn a parametric model without relying on too many problem-based assumptions and optimize long-term effect by learning sequential decisions. In this survey, we first define DDS, then highlight common applications and important decision/control problems within. For each problem, we comprehensively introduce the existing DRL solutions. We also introduce open simulation environments for development and evaluation of DDS applications. Finally, we analyze remaining challenges and discuss further research opportunities in DRL solutions for DDS.

Submitted 23 March, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: 21 pages. survey preprint

arXiv:2106.07849 [pdf, other]

Simon Says: Evaluating and Mitigating Bias in Pruned Neural Networks with Knowledge Distillation

Authors: Cody Blakeney, Nathaniel Huish, Yan Yan, Ziliang Zong

Abstract: In recent years the ubiquitous deployment of AI has posed great concerns in regards to algorithmic bias, discrimination, and fairness. Compared to traditional forms of bias or discrimination caused by humans, algorithmic bias generated by AI is more abstract and unintuitive therefore more difficult to explain and mitigate. A clear gap exists in the current literature on evaluating and mitigating bias in pruned neural networks. In this work, we strive to tackle the challenging issues of evaluating, mitigating, and explaining induced bias in pruned neural networks. Our paper makes three contributions. First, we propose two simple yet effective metrics, Combined Error Variance (CEV) and Symmetric Distance Error (SDE), to quantitatively evaluate the induced bias prevention quality of pruned models. Second, we demonstrate that knowledge distillation can mitigate induced bias in pruned neural networks, even with unbalanced datasets. Third, we reveal that model similarity has strong correlations with pruning induced bias, which provides a powerful method to explain why bias occurs in pruned neural networks. Our code is available at https://github.com/codestar12/pruning-distilation-bias

Submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.01532 [pdf, other]

Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting

Authors: Ang Li, Qiuhong Ke, Xingjun Ma, Haiqin Weng, Zhiyuan Zong, Feng Xue, Rui Zhang

Abstract: Deep image inpainting aims to restore damaged or missing regions in an image with realistic contents. While having a wide range of applications such as object removal and image recovery, deep inpainting techniques also have the risk of being manipulated for image forgery. A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image. In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network can generalize well when detecting different deep inpainting methods. To this end, we first propose a novel data generation approach to generate a universal training dataset, which imitates the noise discrepancies that exist in real versus inpainted image contents to train universal detectors. We then design a Noise-Image Cross-fusion Network (NIX-Net) to effectively exploit the discriminative information contained in both the images and their noise patterns. We empirically show, on multiple benchmark datasets, that our approach outperforms existing detection methods by a large margin and generalizes well to unseen deep inpainting techniques. Our universal training dataset can also significantly boost the generalizability of existing detection methods.

Submitted 2 June, 2021; originally announced June 2021.

Comments: Accepted by IJCAI 2021

arXiv:2105.03333 [pdf, ps, other]

Quantify the Non-Markovian Process with Intervening Projections in a Superconducting Processor

Authors: Liang Xiang, Zhiwen Zong, Ze Zhan, Ying Fei, Chongxin Run, Yaozu Wu, Wenyan Jin, Zhilong Jia, Peng Duan, Jianlan Wu, Yi Yin, Guoping Guo

Abstract: A Markov assumption considers a physical system memoryless to simplify its dynamics. Whereas memory effect or the non-Markovian phenomenon is more general in nature. In the quantum regime, it is challenging to define or quantify the non-Markovianity because the measurement of a quantum system often interferes with it. We simulate the open quantum dynamics in a superconducting processor, then characterize and quantify the non-Markovian process. With the complete set of intervening projections and the final measurement of the qubit, a restricted process tensor can be determined to account for the qubit-environment interaction. We apply the process tensor to predict the quantum state with memory effect, yielding an average fidelity of $99.86\%\pm 1.1\unicode{x2030}$. We further derive the Choi state of the rest process conditioned on history operations and quantify the non-Markovianity with a clear operational interpretation.

Submitted 18 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

arXiv:2104.11936 [pdf, ps, other]

doi 10.1103/PhysRevApplied.15.064005

Optimization of Controlled-Z Gate with Data-Driven Gradient Ascent Pulse Engineering in a Superconducting Qubit System

Authors: Zhiwen Zong, Zhenhai Sun, Zhangjingzi Dong, Chongxin Run, Liang Xiang, Ze Zhan, Qianlong Wang, Ying Fei, Yaozu Wu, Wenyan Jin, Cong Xiao, Zhilong Jia, Peng Duan, Jianlan Wu, Yi Yin, Guoping Guo

Abstract: The experimental optimization of a two-qubit controlled-Z (CZ) gate is realized following two different data-driven gradient ascent pulse engineering (GRAPE) protocols in the aim of optimizing the gate operator and the output quantum state, respectively. For both GRAPE protocols, the key computation of gradients utilizes mixed information of the input Z-control pulse and the experimental measurement. With an imperfect initial pulse in a flattop waveform, our experimental implementation shows that the CZ gate is quickly improved and the gate fidelities subject to the two optimized pulses are around 99%. Our experimental study confirms the applicability of the data-driven GRAPE protocols in the problem of the gate optimization.

Submitted 24 April, 2021; originally announced April 2021.

Journal ref: Phys. Rev. Applied 15, 064005 (2021)

arXiv:2103.06098 [pdf, ps, other]

doi 10.1103/PhysRevApplied.16.034050

Experimental Determination of Electronic States via Digitized Shortcut-to-Adiabaticity and Sequential Digitized Adiabaticity

Authors: Ze Zhan, Chongxin Run, Zhiwen Zong, Liang Xiang, Ying Fei, Zhenhai Sun, Yaozu Wu, Zhilong Jia, Peng Duan, Jianlan Wu, Yi Yin, Guoping Guo

Abstract: A combination of the digitized shortcut-to-adiabaticity (STA) and the sequential digitized adiabaticity is implemented in a superconducting quantum device to determine electronic states in two example systems, the H2 molecule and the topological Bernevig-Hughes-Zhang (BHZ) model. For H2, a short internuclear distance is chosen as a starting point, at which the ground and excited states are obtained via the digitized STA. From this starting point, a sequence of internuclear distances is built. The eigenstates at each distance are sequentially determined from those at the previous distance via the digitized adiabaticity, leading to the potential energy landscapes of H2. The same approach is applied to the BHZ model, and the valence and conduction bands are excellently obtained along the X-Γ-X linecut of the first Brillouin zone. Furthermore, a numerical simulation of this method is performed to successfully extract the ground states of hydrogen chains with the lengths of 3 to 6 atoms.

Submitted 18 June, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

Journal ref: Phys. Rev. Applied 16, 034050 (2021)

arXiv:2012.03096 [pdf, other]

Parallel Blockwise Knowledge Distillation for Deep Neural Network Compression

Authors: Cody Blakeney, Xiaomin Li, Yan Yan, Ziliang Zong

Abstract: Deep neural networks (DNNs) have been extremely successful in solving many challenging AI tasks in natural language processing, speech recognition, and computer vision nowadays. However, DNNs are typically computation intensive, memory demanding, and power hungry, which significantly limits their usage on platforms with constrained resources. Therefore, a variety of compression techniques (e.g. quantization, pruning, and knowledge distillation) have been proposed to reduce the size and power consumption of DNNs. Blockwise knowledge distillation is one of the compression techniques that can effectively reduce the size of a highly complex DNN. However, it is not widely adopted due to its long training time. In this paper, we propose a novel parallel blockwise distillation algorithm to accelerate the distillation process of sophisticated DNNs. Our algorithm leverages local information to conduct independent blockwise distillation, utilizes depthwise separable layers as the efficient replacement block architecture, and properly addresses limiting factors (e.g. dependency, synchronization, and load balancing) that affect parallelism. The experimental results running on an AMD server with four Geforce RTX 2080Ti GPUs show that our algorithm can achieve 3x speedup plus 19% energy savings on VGG distillation, and 3.5x speedup plus 29% energy savings on ResNet distillation, both with negligible accuracy loss. The speedup of ResNet distillation can be further improved to 3.87 when using four RTX6000 GPUs in a distributed cluster.

Submitted 5 December, 2020; originally announced December 2020.

arXiv:2011.09290 [pdf, other]

Practical Privacy Attacks on Vertical Federated Learning

Authors: Haiqin Weng, Juntao Zhang, Xingjun Ma, Feng Xue, Tao Wei, Shouling Ji, Zhiyuan Zong

Abstract: Federated learning (FL) is a privacy-preserving learning paradigm that allows multiple parties to jointly train a powerful machine learning model without sharing their private data. According to the form of collaboration, FL can be further divided into horizontal federated learning (HFL) and vertical federated learning (VFL). In HFL, participants share the same feature space and collaborate on data samples, while in VFL, participants share the same sample IDs and collaborate on features. VFL has a broader scope of applications and is arguably more suitable for joint model training between large enterprises. In this paper, we focus on VFL and investigate potential privacy leakage in real-world VFL frameworks. We design and implement two practical privacy attacks: reverse multiplication attack for the logistic regression VFL protocol; and reverse sum attack for the XGBoost VFL protocol. We empirically show that the two attacks are (1) effective - the adversary can successfully steal the private training data, even when the intermediate outputs are encrypted to protect data privacy; (2) evasive - the attacks do not deviate from the protocol specification nor deteriorate the accuracy of the target model; and (3) easy - the adversary needs little prior knowledge about the data distribution of the target participant. We also show the leaked information is as effective as the raw training data in training an alternative classifier. We further discuss potential countermeasures and their challenges, which we hope can lead to several promising research directions.

Submitted 22 July, 2022; v1 submitted 18 November, 2020; originally announced November 2020.

arXiv:2010.08055 [pdf, other]

Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset

Authors: Keshav Bhandari, Mario A. DeLaGarza, Ziliang Zong, Hugo Latapie, Yan Yan

Abstract: Recently, there has been a growing interest in wearable sensors which provides new research perspectives for 360° video analysis. However, the lack of 360° datasets in literature hinders the research in this field. To bridge this gap, in this paper we propose a novel Egocentric (first-person) 360° Kinetic human activity video dataset (EgoK360). The EgoK360 dataset contains annotations of human activity with different sub-actions, e.g., activity Ping-Pong with four sub-actions which are pickup-ball, hit, bounce-ball and serve. To the best of our knowledge, EgoK360 is the first dataset in the domain of first-person activity recognition with a 360° environmental setup, which will facilitate the egocentric 360° video understanding. We provide experimental results and comprehensive analysis of variants of the two-stream network for 360 egocentric activity recognition. The EgoK360 dataset can be downloaded from https://egok360.github.io/.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 5 pages, 5 figures, 1 table, 2020 IEEE International Conference on Image Processing (ICIP)

arXiv:2010.08045 [pdf, other]

Revisiting Optical Flow Estimation in 360 Videos

Authors: Keshav Bhandari, Ziliang Zong, Yan Yan

Abstract: Nowadays 360 video analysis has become a significant research topic in the field since the appearance of high-quality and low-cost 360 wearable devices. In this paper, we propose a novel LiteFlowNet360 architecture for 360 videos optical flow estimation. We design LiteFlowNet360 as a domain adaptation framework from perspective video domain to 360 video domain. We adapt it from simple kernel transformation techniques inspired by Kernel Transformer Network (KTN) to cope with inherent distortion in 360 videos caused by the sphere-to-plane projection. First, we apply an incremental transformation of convolution layers in feature pyramid network and show that further transformation in inference and regularization layers are not important, hence reducing the network growth in terms of size and computation cost. Second, we refine the network by training with augmented data in a supervised manner. We perform data augmentation by projecting the images in a sphere and re-projecting to a plane. Third, we train LiteFlowNet360 in a self-supervised manner using target domain 360 videos. Experimental results show the promising results of 360 video optical flow estimation using the proposed novel architecture.

Submitted 15 October, 2020; originally announced October 2020.

Comments: 8 Pages, 7 figures, 1 Table, 5 Equations, 25th International Conference on Pattern Recognition Milan, Italy

Showing 1–50 of 109 results for author: Zong, Z