UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

Authors: Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang

Abstract: The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research in VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate with existing multi-modal document transformers to develop more robust VrD-NER models. The UNER head considers the VrD-NER task as a combination of sequence labeling and reading order prediction, effectively addressing the issues of discontinuous entities in documents. Experimental evaluations on diverse datasets demonstrate the effectiveness of UNER in improving entity extraction performance. Moreover, the UNER head enables a supervised pre-training stage on various VrD-NER datasets to enhance the document transformer backbones and exhibits substantial knowledge transfer from the pre-training stage to the fine-tuning stage. By incorporating universal layout understanding, a pre-trained UNER-based model demonstrates significant advantages in few-shot and cross-linguistic scenarios and exhibits zero-shot entity extraction abilities.

Submitted 11 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

Comments: accepted by ACM Multimedia 2024

arXiv:2407.20255 [pdf]

Preparation and electrochemical properties of nitrogen-doped starch hard carbon anode materials for lithium-ion battery

Authors: Aoqi Huang, Yibo Tu, Qichao Yu

Abstract: Here, we report the synthesis of hard carbon materials (CSH) made from corn starch and their application as an anode in lithium-ion batteries. The study shows that the microstructure and electrochemical properties of CSHs are affected by nitrogen doping. It is found that nitrogen is embedded in the carbon layer in the form of graphite nitrogen, pyridine nitrogen, and pyrrole nitrogen, which changes the surface morphology and reduces the disorder of the materials. The electrochemical test results show that the introduction of nitrogen can increase the reversible capacity of the material, with the first discharge capacity reaching above 426.35 mAh g$^{-1}$, and the rate performance also improves. When triethylenetetramine and pre-carbonized corn starch are carbonized at a mass ratio of 1:9, the obtained material has a reversible capacity of 122.04 mAh g$^{-1}$ at a rate of 2 C. During the carbonization process, the nitrogen in triethylenetetramine is doped into the carbon materials, improving the electrochemical performance of the material. Keywords: Lithium-ion battery; Hard carbon; Corn starch; Nitrogen doping.

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.17779 [pdf, other]

doi 10.1145/3664647.3680859

DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction

Authors: Chaofan Gan, Yuanpeng Tu, Yuxi Li, Weiyao Lin

Abstract: With the recent burst of 2D and 3D data, cross-modal retrieval has attracted increasing attention. However, manual labeling by non-experts will inevitably introduce corrupted annotations given ambiguous 2D/3D content. Though previous works have addressed this issue by designing a naive division strategy with hand-crafted thresholds, their performance generally exhibits great sensitivity to the threshold value. Besides, they fail to fully utilize the valuable supervisory signals within each divided subset. To tackle this problem, we propose a Divide-and-conquer 2D-3D cross-modal Alignment and Correction framework (DAC), which comprises Multimodal Dynamic Division (MDD) and Adaptive Alignment and Correction (AAC). Specifically, the former performs accurate sample division by adaptive credibility modeling for each sample based on the compensation information within multimodal loss distribution. Then in AAC, samples in distinct subsets are exploited with different alignment strategies to fully enhance the semantic compactness and meanwhile alleviate over-fitting to noisy labels, where a self-correction strategy is introduced to improve the quality of representation. Moreover, to evaluate the effectiveness in real-world scenarios, we introduce a challenging noisy benchmark, namely Objaverse-N200, which comprises 200k-level samples annotated with 1156 realistic noisy labels. Extensive experiments on both traditional and the newly proposed benchmarks demonstrate the generality and superiority of our DAC, where DAC outperforms state-of-the-art models by a large margin (i.e., with +5.9% gain on ModelNet40 and +5.8% on Objaverse-N200).

Submitted 25 July, 2024; originally announced July 2024.

Comments: accepted by ACM MM 2024

arXiv:2407.17436 [pdf, other]

AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in recent regulations and policies, which makes it challenging to evaluate and compare FMs across these benchmarks. To bridge this gap, we introduce AIR-Bench 2024, the first AI safety benchmark aligned with emerging government regulations and company policies, following the regulation-based safety categories grounded in our AI risks study, AIR 2024. AIR 2024 decomposes 8 government regulations and 16 company policies into a four-tiered safety taxonomy with 314 granular risk categories in the lowest tier. AIR-Bench 2024 contains 5,694 diverse prompts spanning these categories, with manual curation and human auditing to ensure quality. We evaluate leading language models on AIR-Bench 2024, uncovering insights into their alignment with specified safety concerns. By bridging the gap between public benchmarks and practical AI risks, AIR-Bench 2024 provides a foundation for assessing model safety across jurisdictions, fostering the development of safer and more responsible AI systems.

Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.17075 [pdf, other]

SAFETY-J: Evaluating Safety with Critique

Authors: Yixiu Liu, Yuxiang Zheng, Shijie Xia, Jiajun Li, Yi Tu, Chaoling Song, Pengfei Liu

Abstract: The deployment of Large Language Models (LLMs) in content generation raises significant safety concerns, particularly regarding the transparency and interpretability of content evaluations. Current methods, primarily focused on binary safety classifications, lack mechanisms for detailed critique, limiting their utility for model improvement and user trust. To address these limitations, we introduce SAFETY-J, a bilingual generative safety evaluator for English and Chinese with critique-based judgment. SAFETY-J utilizes a robust training dataset that includes diverse dialogues and augmented query-response pairs to assess safety across various scenarios comprehensively. We establish an automated meta-evaluation benchmark that objectively assesses the quality of critiques with minimal human intervention, facilitating scalable and continuous improvement. Additionally, SAFETY-J employs an iterative preference learning technique to dynamically refine safety assessments based on meta-evaluations and critiques. Our evaluations demonstrate that SAFETY-J provides more nuanced and accurate safety evaluations, thereby enhancing both critique quality and predictive reliability in complex content scenarios. To facilitate further research and application, we open-source SAFETY-J's training protocols, datasets, and code at https://github.com/GAIR-NLP/Safety-J.

Submitted 13 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.11683 [pdf, other]

Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning

Authors: Yunbin Tu, Liang Li, Li Su, Chenggang Yan, Qingming Huang

Abstract: Change captioning aims to succinctly describe the semantic change between a pair of similar images, while being immune to distractors (illumination and viewpoint changes). Under these distractors, unchanged objects often appear pseudo changes about location and scale, and certain objects might overlap others, resulting in perturbational and discrimination-degraded features between two images. However, most existing methods directly capture the difference between them, which risk obtaining error-prone difference features. In this paper, we propose a distractors-immune representation learning network that correlates the corresponding channels of two image representations and decorrelates different ones in a self-supervised manner, thus attaining a pair of stable image representations under distractors. Then, the model can better interact them to capture the reliable difference features for caption generation. To yield words based on the most related difference features, we further design a cross-modal contrastive regularization, which regularizes the cross-modal alignment by maximizing the contrastive alignment between the attended difference features and generated words. Extensive experiments show that our method outperforms the state-of-the-art methods on four public datasets. The code is available at https://github.com/tuyunbin/DIRL.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.08032 [pdf, other]

Rossby Wave Instability and Substructure Formation in 3D Non-Ideal MHD Wind-Launching Disks

Authors: Chun-Yen Hsu, Zhi-Yun Li, Yisheng Tu, Xiao Hu, Min-Kai Lin

Abstract: Rings and gaps are routinely observed in the dust continuum emission of protoplanetary discs (PPDs). How they form and evolve remains debated. Previous studies have demonstrated the possibility of spontaneous gas rings and gaps formation in wind-launching disks. Here, we show that such gas substructures are unstable to the Rossby Wave Instability (RWI) through numerical simulations. Specifically, shorter wavelength azimuthal modes develop earlier, and longer wavelength ones dominate later, forming elongated (arc-like) anti-cyclonic vortices in the rings and (strongly magnetized) cyclonic vortices in the gaps that persist until the end of the simulation. Highly elongated vortices with aspect ratios of 10 or more are found to decay with time in our non-ideal MHD simulation, in contrast with the hydro case. This difference could be caused by magnetically induced motions, particularly strong meridional circulations with large values of the azimuthal component of the vorticity, which may be incompatible with the columnar structure preferred by vortices. The cyclonic and anti-cyclonic RWI vortices saturate at moderate levels, modifying but not destroying the rings and gaps in the radial gas distribution of the disk. In particular, they do not shut off the poloidal magnetic flux accumulation in low-density regions and the characteristic meridional flow patterns that are crucial to the ring and gap formation in wind-launching disks. Nevertheless, the RWI and their associated vortices open up the possibility of producing non-axisymmetric dust features observed in a small fraction of protoplanetary disks through non-ideal MHD, although detailed dust treatment is needed to explore this possibility.

Submitted 13 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by MNRAS. 18 pages, 16 figures

arXiv:2407.05620 [pdf, other]

Geometry of optimal control in chemical reaction networks

Authors: Yikuan Zhang, Qi Ouyang, Yuhai Tu

Abstract: Although optimal control (OC) has been studied in stochastic thermodynamics for systems with continuous state variables, less is known in systems with discrete state variables, such as Chemical Reaction Networks (CRNs). Here, we develop a general theoretical framework to study OC of CRNs for changing the system from an initial distribution of states to a final distribution with minimum dissipation. We derive a "Kirchhoff's law" for the probability current in the adiabatic limit, from which the optimal kinetic rates are determined analytically for any given probability trajectory. By using the optimal rates, we show that the total dissipation is determined by an $L_2$-distance measure in the probability space and derive an analytical expression for the metric tensor that depends on the probability distribution, network topology, and capacity of each link. Minimizing the total dissipation leads to the geodesic trajectory in the probability space, and the corresponding OC protocol is determined by the Kirchhoff's law. To demonstrate our general approach, we use it to find a lower bound for the minimum dissipation that is tighter than existing bounds obtained with only global constraints. We also apply it to simple networks, e.g., fully connected 3-state CRNs with different local constraints, and show that indirect pathways and non-functional transient states can play a crucial role in switching between different probability distributions efficiently. Future directions in studying OC in CRNs by using our general framework are discussed.

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, with supplement

arXiv:2407.01664 [pdf, other]

Negative intercept of the apparent zero-temperature extrapolated linear-in-$T$ metallic resistivity

Authors: Yi-Ting Tu, Sankar Das Sarma

Abstract: We consider the well-known phonon scattering induced high-temperature linear-in-$T$ metallic resistivity, showing that a naive extrapolation of the effective linearity from high temperatures to $T=0$ leads to an apparent zero-temperature negative resistivity. The precise magnitude of this extrapolated $T=0$ negative resistivity depends on the temperature regime from where the extrapolation is carried out, and approaches the correct physical result of zero resistivity at $T=0$ only if the extrapolation starts from $T\gg T_D$, where $T_D$ is the Debye temperature. We establish a theoretical relationship between the negative intercept and the slope of the linear-in-$T$ resistivity as a function of the temperature $T$ from where the extrapolation is carried out. Experimental implications of our finding are discussed for the much-discussed Planckian behavior of the transport scattering rate.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 7 pages, 6 figures

arXiv:2407.00391 [pdf, ps, other]

Towards Quantifying Requirements Technical Debt for Software Requirements concerning Veracity: A Perspective and Research Roadmap

Authors: Judith Perera, Ewan Tempero, Yu-Cheng Tu, Kelly Blincoe, Matthias Galster

Abstract: Software practitioners can make sub-optimal decisions concerning requirements during gathering, documenting, prioritizing, and implementing requirements as software features or architectural design decisions -- this is captured by the metaphor `Requirements Technical Debt (RTD).' In our prior work, we developed a conceptual model to understand the quantification of RTD and support its management. In this paper, we present our perspective and the vision to apply the lens of RTD to software requirements concerning veracity, i.e., requirements related to truth, trust, authenticity, and demonstrability in software-intensive systems. Our goal is to cultivate awareness of veracity as an important concern and eventually support the management of RTD for software requirements concerning veracity, what we term as `Veracity Debt,' through its quantification.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.17621 [pdf]

Quasiphase transition of a single-file water chain influenced by atomic charges in water model using orientational-biased replica exchange Monte Carlo simulations

Authors: Liang Zhao, Junqing Ni, Zhi Zhu, Yusong Tu, Chunlei Wang

Abstract: The recently observed temperature-dependent quasiphase transition of the single-file water chain confined within a carbon nanotube in experiments has been validated by simple lattice theory and molecular dynamic simulations. It has been pointed out that the choice of atomic charges in the water model is an important issue, yet how the values affect the structural details and thermodynamic properties of the quasiphase transition has not been fully revealed. In this work, we performed orientational-biased replica exchange Monte Carlo simulations in the canonical ensemble to explore the effect of atomic charges in the SPC/E water model on the quasiphase transition of a single-file water chain. Based on the atomic charge values reported in the literature, three distinct quasiphases are reproduced, comprising a fully hydrogen-bonded water chain at lower temperatures, a more ordered dipolar orientation along the x-axis at intermediate temperatures, and a completely disordered structure at higher temperatures. Then, by increasing the atomic charge value, we find that the fragmentation of the entire water chain into shorter water segments, the orientational ordering of water dipoles, and the transition towards complete disorder are all inhibited. Consequently, the transition temperatures between the three quasiphases are shifted to higher temperatures. The thermodynamic analysis demonstrates that the increased atomic charge values enhance the hydrogen bonding between neighbouring water molecules as well as the electrostatic attraction within the water chain, leading to a longer water dipole correlation length even at higher temperatures. These findings shed light on the vital role of atomic charges in water models and of the electrostatic interaction in regulating the orientational ordering of water molecules under nanoconfinement.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 14 pages and 7 figures in Main text, 4 figures in Appendix

arXiv:2405.20810 [pdf, other]

Context-aware Difference Distilling for Multi-change Captioning

Authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang

Abstract: Multi-change captioning aims to describe complex and coupled changes within an image pair in natural language. Compared with single-change captioning, this task requires the model to have higher-level cognition ability to reason an arbitrary number of changes. In this paper, we propose a novel context-aware difference distilling (CARD) network to capture all genuine changes for yielding sentences. Given an image pair, CARD first decouples context features that aggregate all similar/dissimilar semantics, termed common/difference context features. Then, the consistency and independence constraints are designed to guarantee the alignment/discrepancy of common/difference context features. Further, the common context features guide the model to mine locally unchanged features, which are subtracted from the pair to distill locally difference features. Next, the difference context features augment the locally difference features to ensure that all changes are distilled. In this way, we obtain an omni-representation of all changes, which is translated into linguistic sentences by a transformer decoder. Extensive experiments on three public datasets show CARD performs favourably against state-of-the-art methods. The code is available at https://github.com/tuyunbin/CARD.

Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted by ACL 2024 main conference (long paper)

arXiv:2405.03602 [pdf, other]

One nose but two nostrils: Learn to align with sparse connections between two olfactory cortices

Authors: Bo Liu, Shanshan Qin, Venkatesh Murthy, Yuhai Tu

Abstract: The integration of neural representations in the two hemispheres is an important problem in neuroscience. Recent experiments revealed that odor responses in cortical neurons driven by separate stimulation of the two nostrils are highly correlated. This bilateral alignment points to structured inter-hemispheric connections, but the detailed mechanism remains unclear. Here, we hypothesized that continuous exposure to environmental odors shapes these projections and modeled it as online learning with a local Hebbian rule. We found that Hebbian learning with sparse connections achieves bilateral alignment, exhibiting a linear trade-off between speed and accuracy. We identified an inverse scaling relationship between the number of cortical neurons and the inter-hemispheric projection density required for desired alignment accuracy, i.e., more cortical neurons allow sparser inter-hemispheric projections. We next compared the alignment performance of the local Hebbian rule and the global stochastic-gradient-descent (SGD) learning for artificial neural networks. We found that although SGD leads to the same alignment accuracy with modestly sparser connectivity, the same inverse scaling relation holds. We showed that their similar performance originates from the fact that the update vectors of the two learning rules align significantly throughout the learning process. This insight may inspire efficient sparse local learning algorithms for more complex problems.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.01622 [pdf, other]

doi 10.1103/PhysRevB.109.214309

Interacting quasiperiodic spin chains in the prethermal regime

Authors: Yi-Ting Tu, David M. Long, Sankar Das Sarma

Abstract: Recent progress in the study of many-body localization (MBL) in strongly disordered interacting spin chains has emphasized the importance of distinguishing finite time prethermal behavior from long time and large volume asymptotics. We re-examine a reported non-ergodic extended (NEE) regime in the interacting quasiperiodic Ganeshan-Pixley-Das Sarma model from this perspective, and propose that this regime is a prethermal feature. Indeed, we argue that the NEE regime may be identified through a change in the functional form of spin-spin autocorrelation functions, demonstrating that the NEE regime is distinguishable within intermediate-time dynamics. This is in contrast with existing conjectures relating the NEE regime to the presence of an asymptotic mobility edge in the single-particle spectrum. Thus, we propose a mechanism for the formation of an NEE regime which does not rely on asymptotic properties of the spin chain. Namely, we propose that the NEE regime emerges due to regularly spaced deep wells in the disorder potential. The highly detuned sites suppress spin transport across the system, effectively cutting the chain, and producing a separation of time scales between the spreading of different operators. To support this proposal, we show that the NEE phenomenology also occurs in random models with deep wells but with no mobility edges, and does not occur in quasiperiodic models with mobility edges but with no deep wells. Our results support the broad conclusion that there is not a sharp distinction between the dynamics of quasiperiodically and randomly disordered systems in the prethermal regime. More specifically, we find that generic interacting quasiperiodic models do not have stable intermediate dynamical phases arising from their single-particle mobility edges, and that NEE phenomenology in such models is transient.

Submitted 21 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

Comments: 13 pages, 9 figures

Journal ref: Phys. Rev. B 109, 214309 (2024)

arXiv:2405.01552 [pdf, other]

Enhancing 3T Retinotopic Maps Using Diffeomorphic Registration

Authors: Negar Jalili-Mallak, Yanshuai Tu, Zhong-Lin Lu, Yalin Wang

Abstract: Retinotopic mapping aims to uncover the relationship between visual stimuli on the retina and neural responses on the visual cortical surface. This study advances retinotopic mapping by applying diffeomorphic registration to the 3T NYU retinotopy dataset, encompassing analyze-PRF and mrVista data. Diffeomorphic Registration for Retinotopic Maps (DRRM) quantifies the diffeomorphic condition, ensuring accurate alignment of retinotopic maps without topological violations. Leveraging the Beltrami coefficient and topological condition, DRRM significantly enhances retinotopic map accuracy. Evaluation against existing methods demonstrates DRRM's superiority on various datasets, including 3T and 7T retinotopy data. The application of diffeomorphic registration improves the interpretability of low-quality retinotopic maps, holding promise for clinical applications.

Submitted 1 March, 2024; originally announced May 2024.

Comments: 5 pages, 1 figure, 2 tables, 2024 IEEE International Symposium on Biomedical Imaging

arXiv:2404.18243 [pdf, other]

LEGENT: Open Platform for Embodied Agents

Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun

Abstract: Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platform for developing embodied agents using LLMs and LMMs. LEGENT offers a dual approach: a rich, interactive 3D environment with communicable and actionable agents, paired with a user-friendly interface, and a sophisticated data generation pipeline utilizing advanced algorithms to exploit supervision from simulated worlds at scale. In our experiments, an embryonic vision-language-action model trained on LEGENT-generated data surpasses GPT-4V in embodied tasks, showcasing promising generalization capabilities.

Submitted 11 August, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: ACL 2024 System Demonstration

arXiv:2404.11577 [pdf, other]

Towards Reliable Empirical Machine Unlearning Evaluation: A Game-Theoretic View

Authors: Yiwen Tu, Pingbang Hu, Jiaqi Ma

Abstract: Machine unlearning is the process of updating machine learning models to remove the information of specific training data samples, in order to comply with data protection regulations that allow individuals to request the removal of their personal data. Despite the recent development of numerous unlearning algorithms, reliable evaluation of these algorithms remains an open research question. In this work, we focus on membership inference attack (MIA) based evaluation, one of the most common approaches for evaluating unlearning algorithms, and address various pitfalls of existing evaluation metrics that lack reliability. Specifically, we propose a game-theoretic framework that formalizes the evaluation process as a game between unlearning algorithms and MIA adversaries, measuring the data removal efficacy of unlearning algorithms by the capability of the MIA adversaries. Through careful design of the game, we demonstrate that the natural evaluation metric induced from the game enjoys provable guarantees that the existing evaluation metrics fail to satisfy. Furthermore, we propose a practical and efficient algorithm to estimate the evaluation metric induced from the game, and demonstrate its effectiveness through both theoretical analysis and empirical experiments. This work presents a novel and reliable approach to empirically evaluating unlearning algorithms, paving the way for the development of more effective unlearning techniques.

Submitted 12 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.10332 [pdf, other]

Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning

Authors: Rui Hu, Yahan Tu, Jitao Sang

Abstract: Despite achieving outstanding performance on various cross-modal tasks, current large vision-language models (LVLMs) still suffer from hallucination issues, manifesting as inconsistencies between their generated responses and the corresponding images. Prior research has indicated that the low quality of instruction data, particularly the skewed balance between positive and negative samples, is a significant contributor to model hallucinations. Recently, researchers have proposed high-quality instruction datasets, such as LRV-Instruction, to mitigate model hallucination. Nonetheless, our investigation reveals that hallucinatory concepts from different LVLMs exhibit specificity, i.e. the distribution of hallucinatory concepts varies significantly across models. Existing datasets did not consider the hallucination specificity of different models in the design processes, thereby diminishing their efficacy in mitigating model hallucination. In this paper, we propose a targeted instruction data generation framework named DFTG that is tailored to the hallucination specificity of different models. Concretely, DFTG consists of two stages: hallucination diagnosis, which extracts the necessary information from the model's responses and images for hallucination diagnosis; and targeted data generation, which generates targeted instruction data based on diagnostic results. The experimental results on hallucination benchmarks demonstrate that the targeted instruction data generated by our method are more effective in mitigating hallucinations compared to previous datasets.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.06819 [pdf, other]

Enc2DB: A Hybrid and Adaptive Encrypted Query Processing Framework

Authors: Hui Li, Jingwen Shi, Qi Tian, Zheng Li, Yan Fu, Bingqing Shen, Yaofeng Tu

Abstract: As cloud computing gains traction, data owners are outsourcing their data to cloud service providers (CSPs) for Database Service (DBaaS), bringing in a deviation of data ownership and usage, and intensifying privacy concerns, especially with potential breaches by hackers or CSP insiders. To address that, encrypted database services propose encrypting every tuple and query statement before submitting to the CSP, ensuring data confidentiality when the CSP is honest-but-curious, or even compromised. Existing solutions either employ property preserving cryptography schemes, which can perform certain operations over ciphertext without decrypting the data over the CSP, or utilize trusted execution environment (TEE) to safeguard data and computations from the CSP. Based on these efforts, we introduce Enc2DB, a novel secure database system, following a hybrid strategy on PostgreSQL and openGauss. We present a micro-benchmarking test and self-adaptive mode switch strategy that can dynamically choose the best execution path (cryptography or TEE) to answer a given query. Besides, we also design and implement a ciphertext index compatible with native cost model and query optimizers to accelerate query processing. Empirical study over TPC-C test justifies that Enc2DB outperforms pure TEE and cryptography solutions, and our ciphertext index implementation also outperforms the state-of-the-art cryptographic-based system.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 33 pages, 33 figures, DASAFAA24

arXiv:2404.06505 [pdf, other]

doi 10.1117/12.2653656

Characterizing visual cortical magnification with topological smoothing and optimal transportation

Authors: Yujian Xiong, Yanshuai Tu, Zhong-Lin Lu, Yalin Wang

Abstract: Human vision has different concentration on visual fields. Cortical magnification factor (CMF) is a popular measurement on visual acuity and cortex concentration. In order to achieve thorough measurement of CMF across the whole visual field, we propose a method to measure planar CMF upon retinotopic maps generated by pRF decoding, with help of our proposed methods: optimal transportation and topological smoothing. The optimal transportation re-calculates vertex location in retinotopic mapping, and topological smoothing guarantees topological conditions in retinotopic maps, which allow us to calculate planar CMF with the proposed 1-ring patch method. The pipeline was applied to the HCP 7T dataset, giving new planar results on CMF measurement across all 181 subjects, which illustrate novel concentration behavior on visual fields and their individual difference.

Submitted 9 April, 2024; originally announced April 2024.

Comments: Accepted by SPIE 2023

Journal ref: Proc. SPIE 12464, Medical Imaging 2023: Image Processing, 124641Z (3 April 2023)

arXiv:2404.06395 [pdf, other]

MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to a trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce MiniCPM, specifically the 1.2B and 2.4B non-embedding parameter variants, which not only excel in their respective categories but also demonstrate capabilities on par with 7B-13B LLMs. While focusing on SLMs, our approach exhibits scalability in both model and data dimensions for future LLM research. Regarding model scaling, we employ extensive model wind tunnel experiments for stable and optimal scaling. For data scaling, we introduce a Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), conducive to continuous training and domain adaptation. We present an in-depth analysis of the intriguing training dynamics that occurred in the WSD LRS. With WSD LRS, we are now able to efficiently study the data-model scaling law without extensive retraining experiments on both axes of model and data, from which we derive a much higher compute-optimal data-model ratio than Chinchilla Optimal. Additionally, we introduce the MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE and MiniCPM-128K, whose excellent performance further cements MiniCPM's foundation in diverse SLM applications. MiniCPM models are available publicly at https://github.com/OpenBMB/MiniCPM .

Submitted 3 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: revise according to peer review
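
Note: the Warmup-Stable-Decay (WSD) scheduler named in the abstract above can be illustrated with a minimal sketch. The abstract gives only the three phase names, so everything else here is an assumption made for illustration: the function name `wsd_lr`, its parameters, the linear warmup, and the linear decay shape are hypothetical and are not taken from the MiniCPM paper or code.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Hypothetical three-phase learning-rate schedule (illustrative only).

    Phase 1 (warmup): linear ramp up to peak_lr.
    Phase 2 (stable): constant peak_lr.
    Phase 3 (decay):  linear ramp down to min_lr (decay shape is an assumption).
    """
    if step < warmup_steps:
        # Linear warmup; step 0 already yields a non-zero rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Plateau: the rate stays at its peak value.
        return peak_lr
    # Fraction of the decay phase completed, clamped to [0, 1].
    done = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr + (min_lr - peak_lr) * done


if __name__ == "__main__":
    for s in (0, 9, 50, 120, 130):
        print(s, wsd_lr(s, peak_lr=1.0, warmup_steps=10, stable_steps=100, decay_steps=20))
```

Under these assumptions, a checkpoint saved anywhere on the plateau could be continued at the same rate, and a short decay phase could be started from it later.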

arXiv:2404.04193 [pdf, other]

ToolEENet: Tool Affordance 6D Pose Estimation

Authors: Yunlong Wang, Lei Zhang, Yuyang Tu, Hui Zhang, Kaixin Bai, Zhaopeng Chen, Jianwei Zhang

Abstract: The exploration of robotic dexterous hands utilizing tools has recently attracted considerable attention. A significant challenge in this field is the precise awareness of a tool's pose when grasped, as occlusion by the hand often degrades the quality of the estimation. Additionally, the tool's overall pose often fails to accurately represent the contact interaction, thereby limiting the effectiveness of vision-guided, contact-dependent activities. To overcome this limitation, we present the innovative TOOLEE dataset, which, to the best of our knowledge, is the first to feature affordance segmentation of a tool's end-effector (EE) along with its defined 6D pose based on its usage. Furthermore, we propose the ToolEENet framework for accurate 6D pose estimation of the tool's EE. This framework begins by segmenting the tool's EE from raw RGBD data, then uses a diffusion model-based pose estimator for 6D pose estimation at a category-specific level. Addressing the issue of symmetry in pose estimation, we introduce a symmetry-aware pose representation that enhances the consistency of pose estimation. Our approach excels in this field, demonstrating high levels of precision and generalization. Furthermore, it shows great promise for application in contact-based manipulation scenarios. All data and codes are available on the project website: https://yuyangtu.github.io/projectToolEENet.html

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.13409 [pdf, ps, other]

Influence of concentration-dependent material properties on the fracture and debonding of electrode particles with core-shell structure

Authors: Y. Tu, B. Wu, W. Ai, E. Martínez-Pañeda

Abstract: Core-shell electrode particle designs offer a route to improved lithium-ion battery performance. However, they are susceptible to mechanical damage such as fracture and debonding, which can significantly reduce their lifetime. Using a coupled finite element model, we explore the impacts of diffusion-induced stresses on the failure mechanisms of an exemplar system with an NMC811 core and an NMC111 shell. In particular, we systematically compare the implications of assuming constant material properties against using Li concentration-dependent diffusion coefficient and partial molar volume. With constant material properties, our results show that smaller cores with thinner shells avoid debonding and fracture regimes. When factoring in a concentration-dependent partial molar volume, the maximum values of tensile hoop stress in the shell are found to be significantly lower than those predicted with constant properties, reducing the likelihood of fracture. Furthermore, with a concentration-dependent diffusion coefficient, significant barriers to full electrode utilisation are observed due to reduced lithium mobility at high states of lithiation. This provides a possible explanation for the reduced accessible capacity observed in experiments. Shell thickness is found to be the dominant factor in precluding structural integrity once the concentration dependency is accounted for. These findings shed new light on the performance and effective design of core-shell electrode particles.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.09890 [pdf, other]

doi 10.1103/PhysRevB.109.235118

Role of many phonon modes on the high-temperature linear-in-$T$ electronic resistivity

Authors: Sankar Das Sarma, Yi-Ting Tu

Abstract: We theoretically consider the possibility that phonons may be playing a role in the observed linear-in-$T$ resistivity in cuprates by focusing on the obvious question: How can phonon scattering be consistent with a linear-in-$T$ resistivity with a constant slope given that cuprates have many phonon modes with different energies and electron-phonon couplings (e.g. 21 phonon modes for LSCO)? We show using an arbitrarily large number of independent phonon modes that, within a model Boltzmann transport theory, the emergent high-$T$ linear-in-$T$ resistivity manifests an approximately constant slope independent of the number of phonon modes except in some fine-tuned narrow temperature regimes. We also comment on the quantitative magnitude of the linear-in-$T$ resistivity in cuprates pointing out the constraints on the effective electron-phonon coupling necessary to produce the observed resistivity.

Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 10 pages, 3 figures

Journal ref: Phys. Rev. B 109, 235118 (2024)

arXiv:2403.07777 [pdf, other]

doi 10.1093/mnras/stae1639

Fragmentation of Dense Rotation-Dominated Structures Fed by Collapsing Gravomagneto-Sheetlets and Origin of Misaligned 100 au-Scale Binaries and Multiple Systems

Authors: Yisheng Tu, Zhi-Yun Li, Zhaohuan Zhu, Chun-Yen Hsu

Abstract: The majority of stars are in binary/multiple systems. How such systems form in turbulent, magnetized cores of molecular clouds in the presence of non-ideal MHD effects remains relatively under-explored. Through ATHENA++-based non-ideal MHD AMR simulations with ambipolar diffusion, we show that the collapsing protostellar envelope is dominated by dense gravo-magneto-sheetlets, a turbulence-warped version of the classic pseudodisk produced by anisotropic magnetic resistance to the gravitational collapse, in agreement with previous simulations of turbulent, magnetized single-star formation. The sheetlets feed mass, magnetic fields, and angular momentum to a Dense ROtation-Dominated (DROD) structure, which fragments into binary/multiple systems. This DROD fragmentation scenario is a more dynamic variant of the traditional disk fragmentation scenario for binary/multiple formation, with dense spiral filaments created by inhomogeneous feeding from the highly structured larger-scale sheetlets rather than the need for angular momentum transport, which is dominated by magnetic braking. Provided that the local material is sufficiently demagnetized, with a plasma-$β$ of 10 or more, collisions between the dense spiraling filaments play a key role in facilitating gravitational collapse and stellar companion formation by pushing the local magnetic Toomre parameter $Q_\mathrm{m}$ below unity. This mechanism can naturally produce {\it in situ} misaligned systems on the 100-au scale, often detected with high-resolution Atacama Large Millimeter Array (ALMA) observations. Our simulations also highlight the importance of non-ideal MHD effects, which affect whether fragmentation occurs and, if so, the masses and orbital parameters of the stellar companions formed.

Submitted 23 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.17989 [pdf]

Hydrogen bonding in water under extreme confinement unveiled by nanoscale vibrational spectroscopy and simulations

Authors: Xintong Xu, Xin Jin, Matthias Kuehne, De-Liang Bao, Joel Martis, Yu-Ming Tu, Cody L. Ritt, Juan Carlos Idrobo, Michael S. Strano, Arun Majumdar, Sokrates T. Pantelides, Jordan A. Hachtel

Abstract: Fluids under extreme confinement exhibit distinctly new properties compared to their bulk analogs. Understanding the structure and intermolecular bonding of confined water lays the foundation for creating and improving applications at the water-energy nexus. However, probing confined water experimentally at the length scale of intermolecular and surface forces has remained a challenge. Here, we report a combined experiment/theory framework to reveal changes in H-bonding environment and the underlying molecular structure of confined water inside individual carbon nanotubes. H-bonding is directly probed through the O-H stretch frequency with vibrational electron energy-loss spectroscopy and compared to spectra from molecular-dynamics simulations based on density-functional-theory. Experimental spectra show that water in larger carbon nanotubes exhibits the bonded O-H vibrations of bulk water, but at smaller diameters, the frequency blueshifts to near the 'free' O-H stretch found in water vapor and hydrophobic surfaces. The matching simulations reveal that, in addition to steric confinement, the tube's vibrations play a key role in breaking up the H-bond network, resulting in an orientationally-dispersed, non-H-bonded phase. Furthermore, the temperature-dependence of the vibrations is investigated, providing insights into phase transitions and the confined-water density. This research demonstrates the potential of the experiment/theory framework to explore unprecedented aspects of structure and bonding in confined fluids.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.02379 [pdf, other]

Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective

Authors: Chong Zhang, Yixi Zhao, Chenshu Yuan, Yi Tu, Ya Guo, Qi Zhang

Abstract: Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the information extraction ability of PTLMs, due to inadequate annotations within the benchmarks. Therefore, we claim the necessary standards for an ideal benchmark to evaluate the information extraction ability of PTLMs. We then introduce EC-FUNSD, an entity-centric benchmark designed for the evaluation of semantic entity recognition and entity linking on visually-rich documents. This dataset contains diverse formats of document layouts and annotations of semantic-driven entities and their relations. Moreover, this dataset disentangles the falsely coupled annotation of segment and entity that arises from the block-level annotation of FUNSD. Experiment results demonstrate that state-of-the-art PTLMs exhibit overfitting tendencies on the prevailing benchmarks, as their performance decreases sharply when the dataset bias is removed.

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01650 [pdf]

Effect of trip attributes on ridehailing driver trip request acceptance

Authors: Yuanjie Tu, Moein Khaloei, Nazmul Arefin Khan, Don MacKenzie

Abstract: A generalized additive mixed model was estimated to investigate the factors that impact ridehailing driver trip request acceptance choices, relying on 200 responses from a stated preference survey in Seattle, US. Several policy recommendations were proposed to promote trip request acceptance based on ridehailing drivers' willingness to accept compensation for undesired trip features. The findings could be useful for transportation agencies to improve ridehailing service efficiency, better fulfill urban mobility needs, and reduce environmental burden.

Submitted 9 January, 2024; originally announced February 2024.

Comments: Paper in print at Journal of Sustainable Transportation

arXiv:2402.00866 [pdf, other]

doi 10.1103/PhysRevB.109.165307

Energetic comparison of exciton gas versus electron-hole plasma in a bilayer two-dimensional electron-hole system

Authors: Yi-Ting Tu, Seth M. Davis, Sankar Das Sarma

Abstract: We study the zero-temperature phase diagram of a symmetric electron-hole bilayer system by comparing the ground state energies of two distinct limiting cases, characterized by an electron-hole plasma or an exciton gas, respectively. For the electron-hole plasma, the random phase approximation is used; for the exciton gas, we consider three different approximations: the unscreened Coulomb interaction, the statically screened one, and the dynamically screened one under the plasmon-pole approximation. Our results suggest that the exciton gas is stable at small layer separation. However, static screening in general suppresses the formation of excitons, and dynamic screening gives different results depending on the representative energy scale we used in the plasmon-pole approximation. We conclude that energetic considerations alone are very sensitive to the approximation schemes, and the phase diagram of the system may depend crucially on exactly how the electron-hole attraction is treated in the theory. For very small and very large densities, however, all our approximations show the exciton gas to have lower energy than the plasma.

Submitted 20 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 109, 165307 (2024)

arXiv:2401.18032 [pdf, other]

DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification

Authors: Shuguang Dou, Xiangyang Jiang, Yuanpeng Tu, Junyao Gao, Zefan Qu, Qingsong Zhao, Cairong Zhao

Abstract: The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID). Unlike mainstream approaches using global features for simultaneous multi-task learning of ReID and human parsing, or relying on semantic information for attention guidance, DROP argues that the inferior performance of the former is due to distinct granularity requirements for ReID and human parsing features. ReID focuses on instance part-level differences between pedestrian parts, while human parsing centers on semantic spatial context, reflecting the internal structure of the human body. To address this, DROP decouples features for ReID and human parsing, proposing detail-preserving upsampling to combine varying resolution feature maps. Parsing-specific features for human parsing are decoupled, and human position information is exclusively added to the human parsing branch. In the ReID branch, a part-aware compactness loss is introduced to enhance instance-level part differences. Experimental results highlight the efficacy of DROP, especially achieving a Rank-1 accuracy of 76.8% on Occluded-Duke, surpassing two mainstream methods. The codebase is accessible at https://github.com/shuguang-52/DROP.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.15641 [pdf, other]

PRE: A Peer Review Based Large Language Model Evaluator

Authors: Zhumin Chu, Qingyao Ai, Yiteng Tu, Haitao Li, Yiqun Liu

Abstract: The impressive performance of large language models (LLMs) has attracted considerable attention from the academic and industrial communities. Besides how to construct and train LLMs, how to effectively evaluate and compare the capacity of LLMs has also been well recognized as an important yet difficult problem. Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs on different tasks. However, these paradigms often suffer from high cost, low generalizability, and inherited biases in practice, which make them incapable of supporting the sustainable development of LLMs in the long term. In order to address these issues, inspired by the peer review systems widely used in the academic publication process, we propose a novel framework that can automatically evaluate LLMs through a peer-review process. Specifically, for the evaluation of a specific task, we first construct a small qualification exam to select "reviewers" from a couple of powerful LLMs. Then, to actually evaluate the "submissions" written by different candidate LLMs, i.e., the evaluatees, we use the reviewer LLMs to rate or compare the submissions. The final ranking of evaluatee LLMs is generated based on the results provided by all reviewers. We conducted extensive experiments on text summarization tasks with eleven LLMs including GPT-4. The results demonstrate the existence of bias when evaluating using a single LLM. Also, our PRE model outperforms all the baselines, illustrating the effectiveness of the peer review mechanism.

Submitted 3 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: 11 pages
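
Note: the three steps described in the abstract above (qualification exam, reviewer ratings, aggregated ranking) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the accuracy threshold, the numeric score format, and the unweighted mean used for aggregation are all hypothetical. The abstract says only that the final ranking is based on the results from all reviewers.

```python
def select_reviewers(exam_accuracy, threshold):
    """Keep candidate reviewer models whose qualification-exam accuracy
    meets the threshold (the threshold rule is an assumption)."""
    return [name for name, acc in exam_accuracy.items() if acc >= threshold]


def rank_evaluatees(ratings, reviewers):
    """Rank evaluatee models by their mean score across qualified reviewers.

    ratings[reviewer][evaluatee] is a numeric score. Unweighted averaging
    is an illustrative choice; ties are broken by name.
    """
    collected = {}
    for reviewer in reviewers:
        for evaluatee, score in ratings[reviewer].items():
            collected.setdefault(evaluatee, []).append(score)
    mean = {e: sum(v) / len(v) for e, v in collected.items()}
    return sorted(mean, key=lambda e: (-mean[e], e))


if __name__ == "__main__":
    # Hypothetical exam results and ratings, invented for the example.
    exam = {"reviewer_a": 0.9, "reviewer_b": 0.5, "reviewer_c": 0.8}
    ratings = {
        "reviewer_a": {"model_x": 4, "model_y": 2},
        "reviewer_b": {"model_x": 1, "model_y": 5},
        "reviewer_c": {"model_x": 3, "model_y": 4},
    }
    qualified = select_reviewers(exam, threshold=0.7)
    print(qualified)
    print(rank_evaluatees(ratings, qualified))
```

In this toy data the reviewer that fails the exam would have reversed the ranking, which is the kind of single-evaluator effect the qualification step is meant to limit.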

arXiv:2401.13960 [pdf, other]

Temperature Compensation through Kinetic Regulation in Biochemical Oscillators

Authors: Haochen Fu, Chenyi Fei, Qi Ouyang, Yuhai Tu

Abstract: Nearly all circadian clocks maintain a period that is insensitive to temperature changes, a phenomenon known as temperature compensation (TC). Yet, it is unclear whether there is any common feature among different systems that exhibit TC. From a general timescale invariance, we show that TC relies on the existence of certain period-lengthening reactions wherein the period of the system increases strongly with the rates in these reactions. By studying several generic oscillator models, we show that this counter-intuitive dependence is nonetheless a common feature of oscillators in the nonlinear (far-from-onset) regime where the oscillation can be separated into fast and slow phases. The increase of the period with the period-lengthening reaction rates occurs when the amplitude of the slow phase in the oscillation increases with these rates while the progression-speed in the slow phase is controlled by other rates of the system. The positive dependence of the period on the period-lengthening rates balances its inverse dependence on other kinetic rates in the system, which gives rise to robust TC in a wide range of parameters. We demonstrate the existence of such period-lengthening reactions and their relevance for TC in all four model systems we considered. Theoretical results for a model of the Kai system are supported by experimental data. A study of the energy dissipation also shows that better TC performance requires higher energy consumption. Our study unveils a general mechanism by which a biochemical oscillator achieves TC by operating at regimes far from the onset where period-lengthening reactions exist.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 19 pages, 11 figures (main text + supplementary information)

arXiv:2401.13325 [pdf, other]

Memory Consistency Guided Divide-and-Conquer Learning for Generalized Category Discovery

Authors: Yuanpeng Tu, Zhun Zhong, Yuxi Li, Hengshuang Zhao

Abstract: Generalized category discovery (GCD) aims at addressing a more realistic and challenging setting of semi-supervised learning, where only part of the category labels are assigned to certain training samples. Previous methods generally employ naive contrastive learning or unsupervised clustering schemes for all the samples. Nevertheless, they usually ignore the inherent critical information within the historical predictions of the model being trained. Specifically, we empirically reveal that a significant number of salient unlabeled samples yield consistent historical predictions corresponding to their ground truth category. From this observation, we propose a Memory Consistency guided Divide-and-conquer Learning framework (MCDL). In this framework, we introduce two memory banks to record historical predictions of unlabeled data, which are exploited to measure the credibility of each sample in terms of its prediction consistency. With the guidance of credibility, we can design a divide-and-conquer learning strategy to fully utilize the discriminative information of unlabeled data while alleviating the negative influence of noisy labels. Extensive experimental results on multiple benchmarks demonstrate the generality and superiority of our method, where our method outperforms state-of-the-art models by a large margin on both seen and unseen classes of the generic image recognition and challenging semantic shift settings (i.e., with a +8.4% gain on CUB and +8.1% on Stanford Cars).

Submitted 31 January, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.04755 [pdf]

Understanding working time and relocation choices of ridehailing drivers

Authors: Yuanjie Tu, Moein Khaloei, Natalia Zuniga-Garcia, Don MacKenzie

Abstract: We identified four types of ridehailing drivers and jointly modeled driver working time and relocation choices using a stated preference survey of 200 drivers in Seattle, US.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Paper major revision at Transportation

arXiv:2401.03145 [pdf, other]

Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection

Authors: Yuanpeng Tu, Boshen Zhang, Liang Liu, Yuxi Li, Xuhai Chen, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Abstract: Industrial anomaly detection is generally addressed as an unsupervised task that aims at locating defects with only normal training samples. Recently, numerous 2D anomaly detection methods have been proposed and have achieved promising results; however, using only the 2D RGB data as input is not sufficient to identify imperceptible geometric surface anomalies. Hence, in this work, we focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets, i.e., ImageNet, to construct feature databases. We empirically find that directly using these pre-trained models is not optimal: it can either fail to detect subtle defects or mistake abnormal features for normal ones. This may be attributed to the domain gap between target industrial data and source data. To address this problem, we propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection. Both intra-modal adaptation and cross-modal alignment are optimized from a local-to-global perspective in LSFA to ensure the representation quality and consistency in the inference stage. Extensive experiments demonstrate that our method not only brings a significant performance boost to feature embedding based approaches, but also outperforms previous State-of-The-Art (SoTA) methods prominently on both MVTec-3D AD and Eyecandies datasets, e.g., LSFA achieves 97.1% I-AUROC on MVTec-3D, surpassing the previous SoTA by +3.4%.

Submitted 17 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

arXiv:2401.02169 [pdf]

Factors resisting protein adsorption on hydrophilic/hydrophobic self-assembled monolayers terminated with hydrophilic hydroxyl groups

Authors: Dangxin Mao, Yuan-Yan Wu, Yusong Tu

Abstract: The hydroxyl-terminated self-assembled monolayer (OH-SAM), as a surface resistant to protein adsorption, exhibits substantial potential in applications such as ship navigation and medical implants, and the appropriate strategies for designing anti-fouling surfaces are crucial. Here, we employ molecular dynamics simulations and alchemical free energy calculations to systematically analyze the factors influencing resistance to protein adsorption on the SAMs terminated with single or double OH groups at three packing densities (Σ = 2.0 nm$^{-2}$, 4.5 nm$^{-2}$, and 6.5 nm$^{-2}$), respectively. For the first time, we observe that the compactness and order of interfacial water enhance its physical barrier effect, subsequently enhancing the resistance of SAM to protein adsorption. Notably, the weak spatial hindrance effect of SAM leads to the embedding of protein into SAM, resulting in a lack of resistance of SAM towards protein. Furthermore, the number of hydroxyl groups per unit area of double OH-terminated SAM at Σ = 6.5 nm$^{-2}$ is approximately 2 to 3 times that of single OH-terminated SAM at Σ = 6.5 nm$^{-2}$ and 4.5 nm$^{-2}$, consequently yielding a weaker resistance of double OH-terminated SAM towards protein. Meanwhile, due to the structure of SAM itself, i.e., the formation of a nearly perfect ice-like hydrogen bond structure, the SAM exhibits the weakest resistance towards protein. This study will complement and improve the mechanism of OH-SAM resistance to protein adsorption, especially the traditional barrier effect of interfacial water.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.17424 [pdf, other]

Time-reversal symmetry breaking in the chemosensory array reveals mechanisms for dissipation-enhanced cooperative sensing

Authors: David Hathcock, Qiwei Yu, Yuhai Tu

Abstract: The Escherichia coli chemoreceptors form an extensive array that achieves cooperative and adaptive sensing of extracellular signals. The receptors control the activity of histidine kinase CheA, which drives a nonequilibrium phosphorylation-dephosphorylation reaction cycle for response regulator CheY. Cooperativity and dissipation are both important aspects of chemotaxis signaling, yet their consequences have only been studied separately. Recent single-cell FRET measurements revealed that kinase activity of the array spontaneously switches between active and inactive states, with asymmetric switching times that signify time-reversal symmetry breaking in the underlying dynamics. Here, we present a nonequilibrium lattice model of the chemosensory array, which demonstrates that the observed asymmetric switching dynamics can only be explained by an interplay between the dissipative reactions within individual core units and the cooperative coupling between neighboring units. Microscopically, the switching time asymmetry originates from irreversible transition paths. The model shows that strong dissipation enables sensitive and rapid signaling response by relieving the speed-sensitivity trade-off, which can be tested by future single-cell experiments. Overall, our model provides a general framework for studying biological complexes composed of coupled subunits that are individually driven by dissipative cycles and the rich nonequilibrium physics within.

Submitted 8 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: 11 pages, 5 figures. SI included as an ancillary PDF file

arXiv:2312.01229 [pdf, other]

Fast Commitment for Geo-Distributed Transactions via Decentralized Co-coordinators

Authors: Zihao Zhang, Huiqi Hu, Xuan Zhou, Yaofeng Tu, Weining Qian, Aoying Zhou

Abstract: In a geo-distributed database, data shards and their respective replicas are deployed in distinct datacenters across multiple regions, enabling regional-level disaster recovery and the ability to serve global users locally. However, transaction processing in geo-distributed databases requires multiple cross-region communications, especially during the commit phase, which can significantly impact system performance. To optimize the performance of geo-distributed transactions, we propose Decentralized Two-phase Commit (D2PC), a new transaction commit protocol aiming to minimize the negative impact of cross-region communication. In D2PC, we employ multiple co-coordinators that perform commit coordination in parallel. Each co-coordinator is responsible for collecting 2PC votes and making a PreCommit decision in its local region. This approach allows for the concurrent invocation of multiple cross-region network round trips, and each region can conclude its concurrency control locally before replication is complete, thus significantly reducing the chances of blocking and enhancing system concurrency. Moreover, we propose the bypass leader replication reply method, leveraging decentralized co-coordinators to bypass the leader for message transmission, thereby reducing the commit latency. Experimental results have demonstrated that D2PC can reduce commit latency by 43% and improve throughput by up to 2.43 times compared to the existing alternative geo-distributed transaction processing methods.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2312.01194 [pdf, other]

Stochastic dynamics of granular hopper flows: a configurational mode controls the stability of clogs

Authors: David Hathcock, Sam Dillavou, Jesse M. Hanlan, Douglas J. Durian, Yuhai Tu

Abstract: Granular flows in small-outlet hoppers exhibit several characteristic but poorly understood behaviors: temporary clogs (pauses) that last for an extended period before flow spontaneously restarts, permanent clogs that last indefinitely, and non-Gaussian, non-monotonic flow-rate statistics. These aspects have been extensively studied independently, but a model of hopper flow that includes all three has not been formulated. Here, we introduce such a phenomenological model that provides a unifying dynamical explanation of all three behaviors: a coupling between the flow rate and a hidden configurational mode that controls the stability of clogs. In the theory, flow rate evolves according to Langevin dynamics with multiplicative noise and an absorbing state at zero flow, conditional on the hidden mode. The model fully reproduces the statistics of pause and clog events taken from a large ($>40,000$ flows) experimental dataset, including non-exponentially distributed clogging times and non-Gaussian flow rate distribution, and explains the stretched-exponential growth of the average clogging time with outlet size. Further, we highlight several microscopic configurational features of this hidden mode, including size and smoothness of the static arch structure formed during pauses and clogs. Our work provides a unifying framework for several poorly understood granular phenomena, and suggests numerous new paths toward further understanding of this complex system.

Submitted 9 July, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: 7 pages, 4 figures; SM included as an ancillary PDF file

arXiv:2312.00884 [pdf, other]

Modeling CN Zeeman Effect Observations of the Envelopes of a Low-Mass Protostellar Disk and a Massive Protostar

Authors: Renato Mazzei, Zhi-Yun Li, Che-Yu Chen, Yisheng Tu, Laura Fissel, Richard I. Klein

Abstract: We use the POLARIS radiative transfer code to produce simulated circular polarization Zeeman emission maps of the CN $J = 1 - 0$ molecular line transition for two types of protostellar envelope magnetohydrodynamic simulations. Our first model is a low mass disk envelope system (box length $L = 200\text{ au}$), and our second model is the envelope of a massive protostar ($L = 10^4\text{ au}$) with a protostellar wind and a CN enhanced outflow shell. We compute the velocity-integrated Stokes $I$ and $V$, as well as the implied $V/I$ polarization percentage, for each detector pixel location in our simulated emission maps. Our results show that both types of protostellar environment are in principle accessible with current circular polarization instruments, with each containing swaths of envelope area that yield percentage polarizations that exceed the 1.8\% nominal sensitivity limit for circular polarization experiments with the Atacama Large Millimeter/submillimeter Array (ALMA). In both systems, high polarization ($\gtrsim$1.8\%) pixels tend to lie at an intermediate distance away from the central star and where the line-center opacity of the CN emission is moderately optically thin ($\tau_{LC} \sim 0.1-1$). Furthermore, our computed $V/I$ values scale roughly with the density weighted mean line-of-sight magnetic field strength, indicating that Zeeman observations can effectively diagnose the strength of envelope-scale magnetic fields. We also find that pixels with large $V/I$ are preferentially co-located where the absolute value of the velocity-integrated $V$ is also large, suggesting that locations with favorable percentage polarization are also favorable in terms of raw signal.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 16 pages, 13 figures, accepted for publication in MNRAS

arXiv:2311.18314 [pdf, other]

A Collaborative Jamming Algorithm Based on Multi-UAV Scheduling

Authors: Yixin Jiang, Lingyun Zhou, Yijia Tang, Ya Tu, Chunhong Liu, Qingjiang Shi

Abstract: In this paper, we consider the problem of multi-unmanned aerial vehicles' scheduling for cooperative jamming, where UAVs equipped with directional antennas perform collaborative jamming tasks against several targets of interest. To ensure effective jamming towards the targets, we formulate it as a non-convex optimization problem, aiming to minimize the communication performance of the targets by jointly optimizing UAVs' deployment and directional antenna orientations. Due to the unique structure of the problem, we derive an equivalent transformation by introducing a set of auxiliary matrices. Subsequently, we propose an efficient iterative algorithm based on the alternating direction method of multipliers, which decomposes the problem into multiple tractable subproblems solved in closed form or by the gradient projection method. Extensive simulations validate the efficacy of the proposed algorithm.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.15627 [pdf, other]

Phonetic-aware speaker embedding for far-field speaker verification

Authors: Zezhong Jin, Youzhi Tu, Man-Wai Mak

Abstract: When a speaker verification (SV) system operates far from the sound source, significant challenges arise due to the interference of noise and reverberation. Studies have shown that incorporating phonetic information into speaker embedding can improve the performance of text-independent SV. Inspired by this observation, we propose a joint-training speech recognition and speaker recognition (JTSS) framework to exploit phonetic content for far-field SV. The framework encourages speaker embeddings to preserve phonetic information by matching the frame-based feature maps of a speaker embedding network with wav2vec's vectors. The intuition is that phonetic information can preserve low-level acoustic dynamics with speaker information and thus partly compensate for the degradation due to noise and reverberation. Results show that the proposed framework outperforms the standard speaker embedding on the VOiCES Challenge 2019 evaluation set and the VoxCeleb1 test set. This indicates that leveraging phonetic information under far-field conditions is effective for learning robust speaker representations.

Submitted 27 November, 2023; originally announced November 2023.

Comments: submitted to ICASSP2024

arXiv:2311.00285 [pdf, ps, other]

Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

Authors: Zhenbang Du, Jiayu An, Yunlu Tu, Jiahao Hong, Dongrui Wu

Abstract: Open Set Domain Adaptation (OSDA) aims to cope with the distribution and label shifts between the source and target domains simultaneously, performing accurate classification for known classes while identifying unknown class samples in the target domain. Most existing OSDA approaches, depending on the final image feature space of deep models, require manually-tuned thresholds, and may easily misclassify unknown samples as known classes. Mixture-of-Experts (MoE) could be a remedy. Within a MoE, different experts handle distinct input features, producing unique expert routing patterns for various classes in a routing feature space. As a result, unknown class samples may display expert routing patterns that differ from those of known classes. In this paper, we propose Dual-Space Detection, which exploits the inconsistencies between the image feature space and the routing feature space to detect unknown class samples without any threshold. Graph Router is further introduced to make better use of the spatial information among image patches. Experiments on three different datasets validated the effectiveness and superiority of our approach.

Submitted 3 July, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2310.19832 [pdf]

Photomolecular Effect: Visible Light Interaction with Air-Water Interface

Authors: Guangxin Lv, Yaodong Tu, James H. Zhang, Gang Chen

Abstract: Although water is almost transparent to visible light, we demonstrate that the air-water interface interacts strongly with visible light via what we hypothesize as the photomolecular effect. In this effect, transverse-magnetic polarized photons cleave off water clusters from the air-water interface. We use over 10 different experiments to demonstrate the existence of this effect and its dependence on the wavelength, incident angle and polarization of visible light. We further demonstrate that visible light heats up thin fogs, suggesting that this process can impact weather, climate, and the earth's water cycle. Our study suggests that the photomolecular effect should happen widely in nature, from clouds to fogs, ocean to soil surfaces, and plant transpiration, and can also lead to new applications in energy and clear water.

Submitted 28 October, 2023; originally announced October 2023.

arXiv:2310.11016 [pdf, other]

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

Authors: Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

Abstract: Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting the BIO entity tags for tokens, following the typical setting of NLP. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs where text is recognized and arranged by OCR systems. This reading order issue hinders the accurate marking of entities by the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address the reading order issue, we introduce Token Path Prediction (TPP), a simple prediction head to predict entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph of tokens, and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents which can reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method, and suggest its potential to be a universal solution to various information extraction tasks on documents.

Submitted 17 October, 2023; originally announced October 2023.

Comments: Accepted as a long paper in the main conference of EMNLP 2023

arXiv:2310.09743 [pdf, other]

Decoding Modular Reconfigurable Robots: A Survey on Mechanisms and Design

Authors: Guanqi Liang, Di Wu, Yuxiao Tu, Tin Lun Lam

Abstract: The intrinsic modularity and reconfigurability of modular reconfigurable robots (MRR) confer advantages such as versatility, fault tolerance, and economic efficacy, thereby showcasing considerable potential across diverse applications. The continuous evolution of the technology landscape and the emergence of diverse conceptual designs have generated multiple MRR categories, each described by its respective morphology or capability characteristics, leading to some ambiguity in the taxonomy. This paper conducts a comprehensive survey encompassing the entirety of MRR hardware and design, spanning from the inception in 1985 to 2023. This paper introduces an innovative, unified conceptual framework for understanding MRR hardware, which encompasses three pivotal elements: connectors, actuators, and homogeneity. Through the utilization of this trilateral framework, this paper provides an intuitive understanding of the diverse spectrum of MRR hardware iterations while systematically deciphering and classifying the entire range, offering a more structured perspective. This survey elucidates the fundamental attributes characterizing MRRs and their compositional aspects, providing insights into their design, technology, functionality, and categorization. Augmented by the proposed trilateral framework, this paper also elaborates on the trajectory of evolution, prevailing trends, principal challenges, and potential prospects within the field of MRRs.

Submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.00644 [pdf, other]

On the Hardness of $\sf{S|LWE\rangle}$ with Gaussian and Other Amplitudes

Authors: Yilei Chen, Zihan Hu, Qipeng Liu, Han Luo, Yaxin Tu

Abstract: The learning with errors problem (LWE) is one of the most important building blocks for post-quantum cryptography. To better understand the quantum hardness of LWE, it is crucial to explore quantum variants of LWE, show quantum algorithms for those variants, or prove they are as hard as standard LWE. To this end, Chen, Liu, and Zhandry [Eurocrypt 2022] define the $\sf{S|LWE\rangle}$ problem, which encodes the error of LWE samples into quantum amplitudes. They then show efficient quantum algorithms for $\sf{S|LWE\rangle}$ with a few interesting amplitudes. However, the hardness of the most interesting amplitude, Gaussian, was not addressed by Chen et al., or only known for some restricted settings (for example, when the number of $\sf{S|LWE\rangle}$ samples is very small, it is well known that $\sf{S|LWE\rangle}$ is as hard as standard LWE). In this paper, we show new hardness and algorithms for $\sf{S|LWE\rangle}$ with Gaussian and other amplitudes. Our main results are 1. There exist quantum reductions from standard LWE or worst-case GapSVP to $\sf{S|LWE\rangle}$ with Gaussian amplitude with unknown phase, and arbitrarily many $\sf{S|LWE\rangle}$ samples. 2. There is a $2^{\widetilde{O}(\sqrt{n})}$-time algorithm for $\sf{S|LWE\rangle}$ with Gaussian amplitude with known phase, given $2^{\widetilde{O}(\sqrt{n})}$ many quantum samples. The algorithm is modified from Kuperberg's sieve, and in fact works for more general amplitudes as long as the amplitudes and phases are completely known. One way of interpreting our result is: to show a sub-exponential time quantum algorithm for standard LWE, all we need is to handle phases in $\sf{S|LWE\rangle}$ amplitudes better, either in the algorithm or the reduction.

Submitted 1 October, 2023; originally announced October 2023.

Comments: 40 pages, 3 figures

arXiv:2309.16348 [pdf, other]

Smoothing the Nonsmoothness

Authors: Chaohua Dong, Jiti Gao, Bin Peng, Yundong Tu

Abstract: To tackle difficulties for theoretical studies in situations involving nonsmooth functions, we propose a sequence of infinitely differentiable functions to approximate the nonsmooth function under consideration. A rate of approximation is established and an illustration of its application is then provided.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.16283 [pdf, other]

Self-supervised Cross-view Representation Reconstruction for Change Captioning

Authors: Yunbin Tu, Liang Li, Li Su, Zheng-Jun Zha, Chenggang Yan, Qingming Huang

Abstract: Change captioning aims to describe the difference between a pair of similar images. Its key challenge is how to learn a stable difference representation under pseudo changes caused by viewpoint change. In this paper, we address this by proposing a self-supervised cross-view representation reconstruction (SCORER) network. Concretely, we first design a multi-head token-wise matching to model relationships between cross-view features from similar/dissimilar images. Then, by maximizing cross-view contrastive alignment of two similar images, SCORER learns two view-invariant image representations in a self-supervised way. Based on these, we reconstruct the representations of unchanged objects by cross-attention, thus learning a stable difference representation for caption generation. Further, we devise a cross-modal backward reasoning to improve the quality of the caption. This module reversely models a "hallucination" representation with the caption and "before" representation. By pushing it closer to the "after" representation, we enforce the caption to be informative about the difference in a self-supervised manner. Extensive experiments show our method achieves state-of-the-art results on four datasets. The code is available at https://github.com/tuyunbin/SCORER.

Submitted 28 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV 2023

arXiv:2309.13253 [pdf, other]

Contrastive Speaker Embedding With Sequential Disentanglement

Authors: Youzhi Tu, Man-Wai Mak, Jen-Tzung Chien

Abstract: Contrastive speaker embedding assumes that the contrast between the positive and negative pairs of speech segments is attributed to speaker identity only. However, this assumption is incorrect because speech signals contain not only speaker identity but also linguistic content. In this paper, we propose a contrastive learning framework with sequential disentanglement to remove linguistic content by incorporating a disentangled sequential variational autoencoder (DSVAE) into the conventional SimCLR framework. The DSVAE aims to disentangle speaker factors from content factors in an embedding space so that only the speaker factors are used for constructing a contrastive loss objective. Because content factors have been removed from the contrastive learning, the resulting speaker embeddings will be content-invariant. Experimental results on VoxCeleb1-test show that the proposed method consistently outperforms SimCLR. This suggests that applying sequential disentanglement is beneficial to learning speaker-discriminative embeddings.

Submitted 23 September, 2023; originally announced September 2023.

Comments: Submitted to ICASSP 2024
