Search | arXiv e-print repository

arXiv:2407.01987 [pdf, other]

AHMsys: An Automated HVAC Modeling System for BIM Project

Authors: Long Hoang Dang, Duy-Hung Nguyen, Thai Quang Le, Thinh Truong Nguyen, Clark Mei, Vu Hoang

Abstract: This paper presents a novel system, named AHMsys, designed to automate the process of generating 3D Heating, Ventilation, and Air Conditioning (HVAC) models from 2D Computer-Aided Design (CAD) drawings, a key component of Building Information Modeling (BIM). By automatically preprocessing and extracting essential HVAC object information then creating detailed 3D models, our proposed AHMsys significantly reduced the 20 percent work schedule of the BIM process in Akila. This advancement highlights the essential impact of integrating AI technologies in managing the lifecycle of a digital representation of the building.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01983 [pdf, other]

SADL: An Effective In-Context Learning Method for Compositional Visual QA

Authors: Long Hoang Dang, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran

Abstract: Large vision-language models (LVLMs) offer a novel capability for performing in-context learning (ICL) in Visual QA. When prompted with a few demonstrations of image-question-answer triplets, LVLMs have demonstrated the ability to discern underlying patterns and transfer this latent knowledge to answer new questions about unseen images without the need for expensive supervised fine-tuning. However, designing effective vision-language prompts, especially for compositional questions, remains poorly understood. Adapting language-only ICL techniques may not necessarily work because we need to bridge the visual-linguistic semantic gap: Symbolic concepts must be grounded in visual content, which does not share the syntactic linguistic structures. This paper introduces SADL, a new visual-linguistic prompting framework for the task. SADL revolves around three key components: SAmpling, Deliberation, and Pseudo-Labeling of image-question pairs. Given an image-question query, we sample image-question pairs from the training data that are in semantic proximity to the query. To address the compositional nature of questions, the deliberation step decomposes complex questions into a sequence of subquestions. Finally, the sequence is progressively annotated one subquestion at a time to generate a sequence of pseudo-labels. We investigate the behaviors of SADL under OpenFlamingo on large-scale Visual QA datasets, namely GQA, GQA-OOD, CLEVR, and CRIC. The evaluation demonstrates the critical roles of sampling in the neighborhood of the image, the decomposition of complex questions, and the accurate pairing of the subquestions and labels. These findings do not always align with those found in language-only ICL, suggesting fresh insights in vision-language settings.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.01271 [pdf, other]

Long-range ballistic propagation of 80$\%$-excitonic-fraction polaritons in a perovskite metasurface at room temperature

Authors: Nguyen Ha My Dang, Simone Zanotti, Emmanuel Drouard, Céline Chevalier, Gaëlle Trippé-Allard, Emmanuelle Deleporte, Christian Seassal, Dario Gerace, Hai Son Nguyen

Abstract: Exciton-polaritons, hybrid light-matter elementary excitations arising from the strong coupling regime between excitons in semiconductors and photons in photonic nanostructures, offer a fruitful playground to explore the physics of quantum fluids of light as well as to develop all-optical devices. However, achieving room temperature propagation of polaritons with a large excitonic fraction, which would be crucial, e.g., for nonlinear light transport in prospective devices, remains a significant challenge. Here we report on experimental studies of exciton-polariton propagation at room temperature in resonant metasurfaces made from a sub-wavelength lattice of perovskite pillars. Thanks to the large Rabi splitting, an order of magnitude larger than the optical phonon energy, the lower polariton band is completely decoupled from the phonon bath of perovskite crystals. The long lifetime of these cooled polaritons, in combination with the high group velocity achieved through the metasurface design, enables long-range propagation regardless of the polariton excitonic fraction. Remarkably, we observed propagation distances exceeding hundreds of micrometers at room temperature, even when the polaritons possess a very high excitonic component, approximately 80$\%$. Furthermore, the design of the metasurface introduces an original mechanism for directing uni-directional propagation through polarization control. This discovery of a ballistic propagation mode, leveraging high-speed cooled polaritons, heralds a promising avenue for the development of advanced polaritonic devices.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.01845 [pdf, other]

The Hurwitz tree obstruction for the refined local lifting problem

Authors: Huy Dang

Abstract: In this manuscript, we formulate the differential Hurwitz tree obstructions for the refined local lifting problem. We specifically explore the circumstances under which these obstructions vanish for cyclic covers. The constructions presented in this paper will be used to address the refined local lifting problem in our forthcoming works.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2403.15956 [pdf, other]

Nanoimprinted Exciton-Polaritons Metasurfaces: Cost-Effective, Large-Scale, High Homogeneity, and Room Temperature Operation

Authors: Nguyen Ha My Dang, Paul Bouteyre, Gaëlle Trippé-Allard, Céline Chevalier, Emmanuelle Deleporte, Emmanuel Drouard, Christian Seassal, Hai Son Nguyen

Abstract: Exciton-polaritons represent a promising platform that combines the strengths of both photonic and electronic systems for future optoelectronic devices. However, their application is currently limited to laboratory research due to the high cost and complexity of fabrication methods, which are not compatible with the mature CMOS technology developed for microelectronics. In this work, we develop an innovative, low-cost, and CMOS-compatible method for fabricating large surface polaritonic devices. This is achieved by direct patterning of a halide-perovskite thin film via thermal nanoimprint. As a result, we observe highly homogeneous polaritonic modes of quality factor $Q\approx 300$ at room temperature across a centimetric scale. Impressively, the process provides high reproducibility and fidelity, as the same mold can be reused more than 10 times to imprint the perovskite layer on different types of substrates. Our results could pave the way for the production of low-cost integrated polaritonic devices operating at room temperature.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.03435 [pdf, ps, other]

VLSP 2023 -- LTER: A Summary of the Challenge on Legal Textual Entailment Recognition

Authors: Vu Tran, Ha-Thanh Nguyen, Trung Vo, Son T. Luu, Hoang-Anh Dang, Ngoc-Cam Le, Thi-Thuy Le, Minh-Tien Nguyen, Truong-Son Nguyen, Le-Minh Nguyen

Abstract: In this new era of rapid AI development, especially in language processing, the demand for AI in the legal domain is increasingly critical. In the context where research in other languages such as English, Japanese, and Chinese has been well-established, we introduce the first fundamental research for the Vietnamese language in the legal domain: legal textual entailment recognition through the Vietnamese Language and Speech Processing workshop. In analyzing participants' results, we discuss certain linguistic aspects critical in the legal domain that pose challenges that need to be addressed.

Submitted 5 March, 2024; originally announced March 2024.

arXiv:2402.13745 [pdf]

Infrared Imaging using thermally stable HgTe/CdS nanocrystals

Authors: Huichen Zhang, Yoann Prado, Rodolphe Alchaar, Henri Lehouelleur, Mariarosa Cavallo, Tung Huu Dang, Adrien Khalili, Erwan Bossavit, Corentin Dabard, Nicolas Ledos, Mathieu G Silly, Ali Madouri, Daniele Fournier, James K. Utterback, Debora Pierucci, Victor Parahyba, Pierre Potet, David Darson, Sandrine Ithurria, Bartłomiej Szafran, Benjamin T. Diroll, Juan I. Climente, Emmanuel Lhuillier

Abstract: Transferring the nanocrystals (NCs) from the laboratory environment toward practical applications has raised new challenges. In the case of NCs for display and lighting, the focus was on reduced Auger recombination and maintaining luminescence at high temperatures. When it comes to infrared sensing, narrow band gap materials are required and HgTe appears as the most spectrally tunable platform. Its low-temperature synthesis reduces the growth energy cost yet also favors sintering. As a result, once coupled to a read-out circuit, the Joule effect aggregates the particles leading to a poorly defined optical edge and dramatically large dark current. Here, we demonstrate that CdS shells bring the expected thermal stability (no redshift upon annealing, reduced tendency to form amalgams and preservation of photoconduction after an atomic layer deposition process). The peculiar electronic structure of these confined particles is unveiled using k.p self-consistent simulations showing a significant exciton binding energy at around 200 meV. After shelling, the material displays a p-type behavior that favors the generation of photoconductive gain. The latter is then used to increase the external quantum…

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.08464 [pdf, other]

Computationally Predicted Electronic Properties and Energetics of Native Defects in Cubic Boron Nitride

Authors: Ngoc Linh Nguyen, Hung The Dang, Tien Lam Pham, Thi Minh Hoa Nghiem

Abstract: In this study, we employ a first-principles approach to conduct a comprehensive investigation of the properties of nine common native point defects in cubic boron nitride. This analysis combines standard semi-local and dielectric hybrid density-exchange-correlation functional calculations, encompassing vacancies, interstitials, antisites, and their complexes. Our findings elucidate the influence of these defects on the structural and electronic characteristics of cubic boron nitride, such as local structures, formation energy, magnetism, and the energies of defect states within the band gap. Notably, we accurately simulate the photoluminescent spectra of cubic boron nitride induced by these defects, demonstrating excellent agreement with experimental observations. This outcome indicates that the prominent peaks in the photoluminescent spectrum at 2.5 and 2.8 eV can be attributed to the nitrogen to boron antisite (N$_{\rm B}$) and boron interstitial (B$_{\rm i}$) defects, respectively. Additionally, we investigate the energetic stability of defects under various charge states, providing valuable references for benchmarking purposes.

Submitted 13 February, 2024; originally announced February 2024.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2401.03169 [pdf, other]

QCD anomalies in electromagnetic processes: A solution to the $γ\to3π$ puzzle

Authors: Zanbin Xing, Hao Dang, M. Atif Sultan, Khépani Raya, Lei Chang

Abstract: In this work, the $γ\to3π$ form factor is calculated within the Dyson-Schwinger equations framework using a contact interaction model within the so-called modified rainbow ladder truncation. The present calculation takes into account the pseudovector component in the pion Bethe-Salpeter amplitude (BSA) and $π-π$ scattering effects, producing a $γ\to3π$ anomaly which is $1+6\mathcal{R}_π^2$ larger than the low energy prediction. Here $\mathcal{R}_π$ is the relative ratio of the pseudovector and pseudoscalar components in the pion BSA; with our input parameters, this correction raises the $γ\to3π$ anomaly by around $10\%$. The main outcome of this work is the unveiling of the origin of such correction, which could be a possible explanation of the discrepancy between the existing experimental data and the low energy prediction. Moreover, it is highlighted how the magnitude of the anomaly is affected in effective theories that require an irremovable ultraviolet cutoff. We find that for both the anomalous processes $π\to2γ$ and $γ\to 3π$, the missing contribution to the anomaly can be compensated by the additional structures related to the quark anomalous magnetic moment.

Submitted 11 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: 10 pages, 3 figures, references added

arXiv:2401.02058 [pdf, other]

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model

Authors: Hien Dang, Tho Tran, Tan Nguyen, Nhat Ho

Abstract: The current paradigm of training deep neural networks for classification tasks includes minimizing the empirical risk that pushes the training loss value towards zero, even after the training error has vanished. In this terminal phase of training, it has been observed that the last-layer features collapse to their class-means and these class-means converge to the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is termed Neural Collapse (NC). To theoretically understand this phenomenon, recent works employ a simplified unconstrained feature model to prove that NC emerges at the global solutions of the training problem. However, when the training dataset is class-imbalanced, some NC properties will no longer be true. For example, the class-means geometry will skew away from the simplex ETF when the loss converges. In this paper, we generalize NC to the imbalanced regime for cross-entropy loss under the unconstrained ReLU feature model. We prove that, while the within-class features collapse property still holds in this setting, the class-means will converge to a structure consisting of orthogonal vectors with different lengths. Furthermore, we find that the classifier weights are aligned to the scaled and centered class-means with scaling factors that depend on the number of training samples of each class, which generalizes NC in the class-balanced setting. We empirically verify our results through experiments on practical architectures and datasets.

Submitted 6 June, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: 2024 International Conference on Machine Learning

arXiv:2312.16346 [pdf, other]

An efficient approach to characterize spatio-temporal dependence in cortical surface fMRI data

Authors: Huy Dang, Marzia Cremona, Nicole Lazar, Francesca Chiaromonte

Abstract: Functional magnetic resonance imaging (fMRI) is a neuroimaging technique known for its ability to capture brain activity non-invasively and at fine spatial resolution (2-3 mm). Cortical surface fMRI (cs-fMRI) is a recent development of fMRI that focuses on signals from tissues that have neuronal activities, as opposed to the whole brain. cs-fMRI data is plagued with non-stationary spatial correlations and long temporal dependence which, if inadequately accounted for, can hinder downstream statistical analyses. We propose a fully integrated approach that captures both spatial non-stationarity and varying ranges of temporal dependence across regions of interest. More specifically, we impose non-stationary spatial priors on the latent activation fields and model temporal dependence via fractional Gaussian errors of varying Hurst parameters, which can be studied through a wavelet transformation and its coefficients' variances at different scales. We demonstrate the performance of our proposed approach through simulations and an application to a visual working memory task cs-fMRI dataset.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.05920 [pdf, other]

Local Randomized Neural Networks with Hybridized Discontinuous Petrov-Galerkin Methods for Stokes-Darcy Flows

Authors: Haoning Dang, Fei Wang

Abstract: This paper introduces a new numerical approach that integrates local randomized neural networks (LRNNs) and the hybridized discontinuous Petrov-Galerkin (HDPG) method for solving coupled fluid flow problems. The proposed method partitions the domain of interest into several subdomains and constructs an LRNN on each subdomain. Then, the HDPG scheme is used to couple the LRNNs to approximate the unknown functions. We develop LRNN-HDPG methods based on velocity-stress formulation to solve two types of problems: Stokes-Darcy problems and Brinkman equations, which model the flow in porous media and free flow. We devise a simple and effective way to deal with the interface conditions in the Stokes-Darcy problems without adding extra terms to the numerical scheme. We conduct extensive numerical experiments to demonstrate the stability, efficiency, and robustness of the proposed method. The numerical results show that the LRNN-HDPG method can achieve high accuracy with a small number of degrees of freedom.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 21 pages, 4 figures

MSC Class: 65N30; 41A46

arXiv:2312.00640 [pdf, ps, other]

One to beat them all: "RYU'' -- a unifying framework for the construction of safe balls

Authors: Thu-Le Tran, Clément Elvira, Hong-Phuong Dang, Cédric Herzet

Abstract: In this paper, we put forth a novel framework (named ``RYU'') for the construction of ``safe'' balls, i.e. regions that provably contain the dual solution of a target optimization problem. We concentrate on the standard setup where the cost function is the sum of two terms: a closed, proper, convex Lipschitz-smooth function and a closed, proper, convex function. The RYU framework is shown to generalize or improve upon all the results proposed in the last decade for the considered family of optimization problems.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 19 pages, 1 table

arXiv:2311.11086 [pdf]

LightBTSeg: A lightweight breast tumor segmentation model using ultrasound images via dual-path joint knowledge distillation

Authors: Hongjiang Guo, Shengwen Wang, Hao Dang, Kangle Xiao, Yaru Yang, Wenpei Liu, Tongtong Liu, Yiying Wan

Abstract: The accurate segmentation of breast tumors is an important prerequisite for lesion detection, which has significant clinical value for breast tumor research. The mainstream deep learning-based methods have achieved a breakthrough. However, these high-performance segmentation methods are difficult to implement in clinical scenarios since they always entail high computation complexity, massive parameters, slow inference speed, and huge memory consumption. To tackle this problem, we propose LightBTSeg, a dual-path joint knowledge distillation framework, for lightweight breast tumor segmentation. Concretely, we design a double-teacher model to represent the fine-grained feature of breast ultrasound according to different semantic feature realignments of benign and malignant breast tumors. Specifically, we leverage the bottleneck architecture to reconstruct the original Attention U-Net. It is regarded as a lightweight student model named Simplified U-Net. Then, the prior knowledge of benign and malignant categories is utilized to design the teacher network combined with dual-path joint knowledge distillation, which distills the knowledge from cumbersome benign and malignant teachers to a lightweight student model. Extensive experiments conducted on the breast ultrasound images (Dataset BUSI) and Breast Ultrasound Dataset B (Dataset B) datasets demonstrate that LightBTSeg outperforms various counterparts.

Submitted 18 November, 2023; originally announced November 2023.

Comments: 7 pages, 7 figures, conference

arXiv:2309.12153 [pdf, ps, other]

$a$-Numbers of Cyclic Degree $p^2$ Covers of the Projective Line

Authors: Huy Dang, Steven R. Groen

Abstract: We investigate the $a$-numbers of $\mathbb{Z}/p^2\mathbb{Z}$-covers in characteristic $p>2$ and extend a technique originally introduced by Farnell and Pries for $\mathbb{Z}/p\mathbb{Z}$-covers. As an application of our approach, we demonstrate that the $a$-numbers of ``minimal'' $\mathbb{Z}/9\mathbb{Z}$-covers can be deduced from the associated branching datum.

Submitted 21 September, 2023; originally announced September 2023.

Comments: 37 pages. Comments welcome!

arXiv:2308.13355 [pdf, other]

WorldSmith: Iterative and Expressive Prompting for World Building with a Generative AI

Authors: Hai Dang, Frederik Brudy, George Fitzmaurice, Fraser Anderson

Abstract: Crafting a rich and unique environment is crucial for fictional world-building, but can be difficult to achieve since illustrating a world from scratch requires time and significant skill. We investigate the use of recent multi-modal image generation systems to enable users to iteratively visualize and modify elements of their fictional world using a combination of text input, sketching, and region-based filling. WorldSmith enables novice world builders to quickly visualize a fictional world with layered edits and hierarchical compositions. Through a formative study (4 participants) and first-use study (13 participants) we demonstrate that WorldSmith offers more expressive interactions with prompt-based models. With this work, we explore how creatives can be empowered to leverage prompt-based generative AI as a tool in their creative process, beyond current "click-once" prompting UI paradigms.

Submitted 25 August, 2023; originally announced August 2023.

Comments: User Interface Software and Technology 2023

arXiv:2307.15442 [pdf, other]

Predicting pedestrian trajectories at different densities: A multi-criteria empirical analysis

Authors: Raphael Korbmacher, Huu-Tu Dang, Antoine Tordeux

Abstract: Predicting human trajectories is a challenging task due to the complexity of pedestrian behavior, which is influenced by external factors such as the scene's topology and interactions with other pedestrians. A special challenge arises from the dependence of the behaviour on the density of the scene. In the literature, deep learning algorithms show the best performance in predicting pedestrian trajectories, but so far just for situations with low densities. In this study, we aim to investigate the suitability of these algorithms for high-density scenarios by evaluating them on different error metrics and comparing their accuracy to that of knowledge-based models that have long been used in the literature. The findings indicate that deep learning algorithms provide improved trajectory prediction accuracy in the distance metrics for all tested densities. Nevertheless, we observe a significant number of collisions in the predictions, especially in high-density scenarios. This issue arises partly due to the absence of a collision avoidance mechanism within the algorithms and partly because the distance-based collision metric is inadequate for dense situations. To address these limitations, we propose the introduction of a novel continuous collision metric based on pedestrians' time-to-collision. Subsequently, we outline how this metric can be utilized to enhance the training of the algorithms.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.03892 [pdf, other]

Embedding Mental Health Discourse for Community Recommendation

Authors: Hy Dang, Bang Nguyen, Noah Ziems, Meng Jiang

Abstract: Our paper investigates the use of discourse embedding techniques to develop a community recommendation system that focuses on mental health support groups on social media. Social media platforms provide a means for users to anonymously connect with communities that cater to their specific interests. However, with the vast number of online communities available, users may face difficulties in identifying relevant groups to address their mental health concerns. To address this challenge, we explore the integration of discourse information from various subreddit communities using embedding techniques to develop an effective recommendation system. Our approach involves the use of content-based and collaborative filtering techniques to enhance the performance of the recommendation system. Our findings indicate that the proposed approach outperforms the use of each technique separately and provides interpretability in the recommendation process.

Submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted to the 4th workshop on Computational Approaches to Discourse (CODI-2023) at ACL 2023

arXiv:2306.14711 [pdf, ps, other]

doi 10.1093/imrn/rnae060

The moduli space of cyclic covers in positive characteristic

Authors: Huy Dang, Matthias Hippold

Abstract: We study the $p$-rank stratification of the moduli space $\mathcal{ASW}_{(d_1,d_2,\ldots,d_n)}$, which represents $\mathbb{Z}/p^n$-covers in characteristic $p>0$ whose $\mathbb{Z}/p^i$-subcovers have conductor $d_i$. In particular, we identify the irreducible components of the moduli space and determine their dimensions. To achieve this, we analyze the ramification data of the represented curves and use it to classify all the irreducible components of the space. In addition, we provide a comprehensive list of pairs $(p,(d_1,d_2,\ldots,d_n))$ for which $\mathcal{ASW}_{(d_1,d_2,\ldots,d_n)}$ in characteristic $p$ is irreducible. Finally, we investigate the geometry of $\mathcal{ASW}_{(d_1,d_2,\ldots,d_n)}$ by studying the deformations of cyclic covers which vary the $p$-rank and the number of branch points.

Submitted 22 November, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Fix a mistake in Theorem 4.16, remove Proposition 4.14, and make some minor changes

MSC Class: 14H30; 14H10; 11S31

Journal ref: International Mathematics Research Notices, rnae060, 2024

arXiv:2306.05023 [pdf, other]

Beyond Vanilla Variational Autoencoders: Detecting Posterior Collapse in Conditional and Hierarchical Variational Autoencoders

Authors: Hien Dang, Tho Tran, Tan Nguyen, Nhat Ho

Abstract: The posterior collapse phenomenon in variational autoencoder (VAE), where the variational posterior distribution closely matches the prior distribution, can hinder the quality of the learned latent variables. As a consequence of posterior collapse, the latent variables extracted by the encoder in VAE preserve less information from the input data and thus fail to produce meaningful representations as input to the reconstruction process in the decoder. While this phenomenon has been an actively addressed topic related to VAE performance, the theory for posterior collapse remains underdeveloped, especially beyond the standard VAE. In this work, we advance the theoretical understanding of posterior collapse to two important and prevalent yet less studied classes of VAE: conditional VAE and hierarchical VAE. Specifically, via a non-trivial theoretical analysis of linear conditional VAE and hierarchical VAE with two levels of latent, we prove that the cause of posterior collapses in these models includes the correlation between the input and output of the conditional VAE and the effect of learnable encoder variance in the hierarchical VAE. We empirically validate our theoretical findings for linear conditional and hierarchical VAE and demonstrate that these results are also predictive for non-linear cases with extensive experiments.

Submitted 13 May, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: Accepted (Poster) at the Twelfth International Conference on Learning Representations

arXiv:2306.04307 [pdf, other]

The chiral anomaly and the pion transition form factor: beyond the cutoff

Authors: Hao Dang, Zanbin Xing, M. Atif Sultan, Khépani Raya, Lei Chang

Abstract: In the presence of a momentum cutoff, effective theories seem unable to faithfully reproduce the so-called chiral anomaly in the Standard Model. A novel prospect to overcome this related issue is discussed herein via the calculation of the $γ^{*}π^0γ$ transition form factor, $G^{γ^* π^0 γ}(Q^2)$, whose normalization is intimately connected with the chiral anomaly and dynamical chiral symmetry breaking (DCSB). To compute such a transition, we employ a contact interaction model of Quantum Chromodynamics (QCD) under a modified rainbow ladder truncation, which automatically generates a quark anomalous magnetic moment term, weighted by a strength parameter $ξ$. This term, whose origin is also connected with DCSB, is interpreted as an additional interaction that mimics the complex dynamics beyond the cutoff. By fixing $ξ$ to produce the value of $G^{γ^* π^0 γ}(0)$ dictated by the chiral anomaly, the computed transition form factor, as well as the interaction radius and neutral pion decay width, turn out to be comparable with QCD-based studies and experimental data.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 9 pages, 2 figures

arXiv:2306.01768 [pdf, other]

A Quantitative Review on Language Model Efficiency Research

Authors: Meng Jiang, Hy Dang, Lingbo Tong

Abstract: Language models (LMs) are being scaled and becoming powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient Transformers that have become an indispensable staple in the field of NLP. However, in the section of "On Evaluation", they left an open question "which fundamental efficient Transformer one should consider," answered by "still a mystery" because "many research papers select their own benchmarks." Unfortunately, there was no quantitative analysis of the performance of Transformers on any benchmarks. Moreover, state space models (SSMs) have demonstrated their ability to model long-range sequences with non-attention mechanisms, which were not discussed in the prior review. This article presents a meta-analysis of the results from a set of papers on efficient Transformers as well as those on SSMs. It provides a quantitative review on LM efficiency research and gives suggestions for future research.

Submitted 28 May, 2023; originally announced June 2023.

Comments: 29 pages, 24 tables

arXiv:2305.07524 [pdf]

Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolution

Authors: Hoai Nam Dang, Vladimir Golkov, Thomas Wimmer, Daniel Cremers, Andreas Maier, Moritz Zaiss

Abstract: Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo sequences (TSE) the sequence parameters can have a strong influence on the actual resolution of the acquired image and have consequently a considerable impact on the performance of the NN. We propose a known-operator learning approach to perform an end-to-end optimization of MR sequence and neural network parameters for SR-TSE. This MR-physics-informed training procedure jointly optimizes the radiofrequency pulse train of a proton density- (PD-) and T2-weighted TSE and a subsequently applied convolutional neural network to predict the corresponding PDw and T2w super-resolution TSE images. The found radiofrequency pulse train designs generate an optimal signal for the NN to perform the SR task. Our method generalizes from the simulation-based optimization to in vivo measurements and the acquired physics-informed SR images show higher correlation with a time-consuming segmented high-resolution TSE sequence compared to a pure network training approach.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 13 pages, 4 figures, 3 tables, submitted to MICCAI 2023 for review

arXiv:2305.00105 [pdf]

Modeling of mixed-mechanism stimulation for the enhancement of geothermal reservoirs

Authors: Hau Trung Dang, Eirik Keilegavlen, Inga Berre

Abstract: Hydraulic stimulation is a critical process for increasing the permeability of fractured geothermal reservoirs. This technique relies on coupled hydromechanical processes induced by reservoir stimulation through pressurized fluid injection into the rock formation. The injection of fluids causes poromechanical stress changes that can lead to the dilation of fractures due to fracture slip and to tensile fracture opening and propagation, so-called mixed-mechanism stimulation. The effective permeability of the rock is particularly enhanced when new fractures connect with pre-existing fractures. Mixed-mechanism stimulation can significantly improve the productivity of geothermal reservoirs, and the technique is especially important in reservoirs where the natural permeability of the rock is insufficient to allow for commercial flow rates. This paper presents a modeling approach for simulating the deformation and expansion of fracture networks in porous media under the influence of anisotropic stress and fluid injection. It utilizes a coupled hydromechanical model for poroelastic, fractured media. Fractures are governed by contact mechanics and allowed to grow and connect through a fracture propagation model. To conduct numerical simulations, we employ a two-level approach, combining a finite volume method for poroelasticity with a finite element method for fracture propagation. The study investigates the impact of injection rate, matrix permeability, and stress anisotropy on stimulation outcomes. By analyzing these factors, we can better understand the behavior of fractured geothermal reservoirs under mixed-mechanism stimulation.

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2304.05864 [pdf, other]

Scale-Equivariant Deep Learning for 3D Data

Authors: Thomas Wimmer, Vladimir Golkov, Hoai Nam Dang, Moritz Zaiss, Andreas Maier, Daniel Cremers

Abstract: The ability of convolutional neural networks (CNNs) to recognize objects regardless of their position in the image is due to the translation-equivariance of the convolutional operation. Group-equivariant CNNs transfer this equivariance to other transformations of the input. Dealing appropriately with objects and object parts of different scale is challenging, and scale can vary for multiple reasons such as the underlying object size or the resolution of the imaging modality. In this paper, we propose a scale-equivariant convolutional network layer for three-dimensional data that guarantees scale-equivariance in 3D CNNs. Scale-equivariance lifts the burden of having to learn each possible scale separately, allowing the neural network to focus on higher-level learning goals, which leads to better results and better data-efficiency. We provide an overview of the theoretical foundations and scientific work on scale-equivariant neural networks in the two-dimensional domain. We then transfer the concepts from 2D to the three-dimensional space and create a scale-equivariant convolutional layer for 3D data. Using the proposed scale-equivariant layer, we create a scale-equivariant U-Net for medical image segmentation and compare it with a non-scale-equivariant baseline method. Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariance for 3D medical image analysis. We publish our code at https://github.com/wimmerth/scale-equivariant-3d-convnet for further research and application.

Submitted 12 April, 2023; originally announced April 2023.

Comments: 12 pages, 4 figures

arXiv:2303.10359 [pdf, other]

A conforming discontinuous Galerkin finite element method for Brinkman equations

Authors: Haoning Dang, Qilong Zhai, Zhongshu Zhao

Abstract: In this paper, we present a conforming discontinuous Galerkin (CDG) finite element method for Brinkman equations. The velocity stabilizer is removed by employing higher-degree polynomials to compute the weak gradient. The theoretical analysis shows that the CDG method is stable and accurate for the Brinkman equations. Optimal order error estimates are established in the $H^1$ and $L^2$ norms. Finally, numerical experiments verify the stability and accuracy of the CDG numerical scheme.

Submitted 18 March, 2023; originally announced March 2023.

Comments: 24 pages, 8 tables, 3 figures

MSC Class: 65N30

arXiv:2303.03199 [pdf, other]

doi 10.1145/3544548.3580969

Choice Over Control: How Users Write with Large Language Models using Diegetic and Non-Diegetic Prompting

Authors: Hai Dang, Sven Goller, Florian Lehmann, Daniel Buschek

Abstract: We propose a conceptual perspective on prompts for Large Language Models (LLMs) that distinguishes between (1) diegetic prompts (part of the narrative, e.g. "Once upon a time, I saw a fox..."), and (2) non-diegetic prompts (external, e.g. "Write about the adventures of the fox."). With this lens, we study how 129 crowd workers on Prolific write short texts with different user interfaces (1 vs 3 suggestions, with/out non-diegetic prompts; implemented with GPT-3): When the interface offered multiple suggestions and provided an option for non-diegetic prompting, participants preferred choosing from multiple suggestions over controlling them via non-diegetic prompts. When participants provided non-diegetic prompts it was to ask for inspiration, topics or facts. Single suggestions in particular were guided both with diegetic and non-diegetic information. This work informs human-AI interaction with generative models by revealing that (1) writing non-diegetic prompts requires effort, (2) people combine diegetic and non-diegetic prompting, and (3) they use their draft (i.e. diegetic information) and suggestion timing to strategically guide LLMs.

Submitted 6 March, 2023; originally announced March 2023.

Comments: 17 pages, 9 figures, 3 tables, ACM CHI 2023

ACM Class: H.5.2; I.2.7

arXiv:2301.00437 [pdf, other]

Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data

Authors: Hien Dang, Tho Tran, Stanley Osher, Hung Tran-The, Nhat Ho, Tan Nguyen

Abstract: Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when trained until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model". In this context, we take a step further and prove the NC occurrences in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for MSE loss and present the first geometric analysis of NC under a bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.

Submitted 18 June, 2023; v1 submitted 1 January, 2023; originally announced January 2023.

Comments: 75 pages, 20 figures, 4 tables. Hien Dang and Tho Tran contributed equally to this work
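
Illustrative note: the abstract above mentions class-means forming the vertices of a simplex Equiangular Tight Frame (ETF). As a minimal sketch of that geometry only, and not of the paper's analysis, the standard construction sqrt(K/(K-1)) * (I - (1/K) 11^T) yields K unit vectors whose pairwise inner products all equal -1/(K-1). The function names below are hypothetical.

```python
import math

def simplex_etf(k):
    """Return k vectors (as rows) forming a standard simplex ETF in R^k.

    Uses the textbook construction sqrt(k/(k-1)) * (I - (1/k) * ones);
    the matrix is symmetric, so rows and columns coincide.
    """
    if k < 2:
        raise ValueError("a simplex ETF needs at least 2 vectors")
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if r == c else 0.0) - 1.0 / k) for c in range(k)]
        for r in range(k)
    ]

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Example: 4 classes give unit-norm vertices with pairwise cosine -1/3.
vertices = simplex_etf(4)
norm_sq = dot(vertices[0], vertices[0])
cosine = dot(vertices[0], vertices[1])
```

This covers only the balanced-class picture; the orthogonal geometry the abstract reports for imbalanced data is a different structure and is not sketched here.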

arXiv:2211.00899 [pdf, other]

LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

Authors: Hao Dang, Yuekai Zhang, Xingqun Qi, Wanting Zhou, Muyi Sun

Abstract: In recent years, deep convolutional neural networks (DCNNs) have shown great promise in coronary artery vessel segmentation. However, it is difficult to deploy complicated models in clinical scenarios since high-performance approaches have excessive parameters and high computation costs. To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework, for lightweight coronary artery vessel segmentation. Primarily, we propose a Feature-wise Similarity Distillation (FSD) module for semantic-shift modeling. Specifically, we calculate the feature similarity between the symmetric layers from the encoder and decoder. Then the similarity is transferred as knowledge from a cumbersome teacher network to a non-trained lightweight student network. Meanwhile, for encouraging the student model to learn more pixel-wise semantic information, we introduce the Adversarial Similarity Distillation (ASD) module. Concretely, the ASD module aims to construct the spatial adversarial correlation between the annotation and prediction from the teacher and student models, respectively. Through the ASD module, the student model obtains fine-grained subtle edge segmented results of the coronary artery vessel. Extensive experiments conducted on Clinical Coronary Artery Vessel Dataset demonstrate that LightVessel outperforms various knowledge distillation counterparts.

Submitted 25 February, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 5 pages, 7 figures, conference

arXiv:2210.12150 [pdf, other]

Formalizing Chemical Physics using the Lean Theorem Prover

Authors: Maxwell P. Bobbin, Samiha Sharlin, Parivash Feyzishendi, An Hong Dang, Catherine M. Wraback, Tyler R. Josephson

Abstract: Chemical theory can be made more rigorous using the Lean theorem prover, an interactive theorem prover for complex mathematics. We formalize the Langmuir and BET theories of adsorption, making each scientific premise clear and every step of the derivations explicit. Lean's math library, mathlib, provides formally verified theorems for infinite geometric series, which are central to BET theory. While writing these proofs, Lean prompts us to include mathematical constraints that were not originally reported. We also illustrate how Lean flexibly enables the reuse of proofs that build on more complex theories through the use of functions, definitions, and structures. Finally, we construct scientific frameworks for interoperable proofs, by creating structures for classical thermodynamics and kinematics, using them to formalize gas law relationships like Boyle's Law and equations of motion underlying Newtonian mechanics, respectively. This approach can be extended to other fields, enabling the formalization of rich and complex theories in science and engineering.

Submitted 8 December, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2209.02971 [pdf, other]

Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech

Authors: Huu-Tien Dang, Thi-Hai-Yen Vuong, Xuan-Hieu Phan

Abstract: Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transforming NSWs into pronounceable syllables, such as URL, email address, hashtag, and contact name. In this paper, we propose a new two-phase normalization approach to deal with these challenges. First, a model-based tagger is designed to detect NSWs. Then, depending on NSW types, a rule-based normalizer expands those NSWs into their final verbal forms. We conducted three empirical experiments for NSW detection using Conditional Random Fields (CRFs), BiLSTM-CNN-CRF, and BERT-BiGRU-CRF models on a manually annotated dataset including 5819 sentences extracted from Vietnamese news articles. In the second phase, we propose a forward lexicon-based maximum matching algorithm to split the hashtag, email, URL, and contact name. The experimental results of the tagging phase show that the average F1 scores of the BiLSTM-CNN-CRF and CRF models are above 90.00%, reaching the highest F1 of 95.00% with the BERT-BiGRU-CRF model. Overall, our approach has low sentence error rates, at 8.15% with CRF and 7.11% with BiLSTM-CNN-CRF taggers, and only 6.67% with BERT-BiGRU-CRF tagger.

Submitted 7 September, 2022; originally announced September 2022.

Comments: The 14th International Conference on Knowledge and Systems Engineering (KSE 2022)
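
Illustrative note: the abstract above names a forward lexicon-based maximum matching algorithm for splitting hashtags, emails, URLs, and contact names, without giving details. The sketch below is the generic greedy longest-match-first procedure, assuming a plain set of known syllables. The function name, the single-character fallback, and the toy lexicon are assumptions for illustration, not the authors' implementation.

```python
def forward_max_match(text, lexicon):
    """Greedy left-to-right segmentation using the longest lexicon match.

    Hypothetical helper: at each position, try the longest candidate
    first and shrink it until it is found in the lexicon. If nothing
    matches, emit a single character so the scan always advances.
    """
    # Longest entry bounds the candidate window; at least 1 to guarantee progress.
    max_len = max([len(word) for word in lexicon] + [1])
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

# Toy lexicon (an assumption, unaccented for simplicity).
toy_lexicon = {"xin", "chao", "viet", "nam", "vietnam"}
segments = forward_max_match("xinchaovietnam", toy_lexicon)
# segments == ["xin", "chao", "vietnam"]
```

Because the match is greedy, "vietnam" wins over "viet" + "nam" whenever the longer entry is in the lexicon, which is the defining behavior of maximum matching.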

arXiv:2209.01390 [pdf, other]

How to Prompt? Opportunities and Challenges of Zero- and Few-Shot Learning for Human-AI Interaction in Creative Applications of Generative Models

Authors: Hai Dang, Lukas Mecke, Florian Lehmann, Sven Goller, Daniel Buschek

Abstract: Deep generative models have the potential to fundamentally change the way we create high-fidelity digital content but are often hard to control. Prompting a generative model is a promising recent development that in principle enables end-users to creatively leverage zero-shot and few-shot learning to assign new tasks to an AI ad-hoc, simply by writing them down. However, for the majority of end-users writing effective prompts is currently largely a trial and error process. To address this, we discuss the key opportunities and challenges for interactive creative applications that use prompting as a new paradigm for Human-AI interaction. Based on our analysis, we propose four design goals for user interfaces that support prompting. We illustrate these with concrete UI design sketches, focusing on the use case of creative writing. The research community in HCI and AI can take these as starting points to develop adequate user interfaces for models capable of zero- and few-shot learning.

Submitted 3 September, 2022; originally announced September 2022.

Comments: 7 pages, 6 figures; Generative AI and HCI Workshop at CHI 2022

ACM Class: H.5.2; I.2.7

arXiv:2208.09323 [pdf, other]

doi 10.1145/3526113.3545672

Beyond Text Generation: Supporting Writers with Continuous Automatic Text Summaries

Authors: Hai Dang, Karim Benharrak, Florian Lehmann, Daniel Buschek

Abstract: We propose a text editor to help users plan, structure and reflect on their writing process. It provides continuously updated paragraph-wise summaries as margin annotations, using automatic text summarization. Summary levels range from full text, to selected (central) sentences, down to a collection of keywords. To understand how users interact with this system during writing, we conducted two user studies (N=4 and N=8) in which people wrote analytic essays about a given topic and article. As a key finding, the summaries gave users an external perspective on their writing and helped them to revise the content and scope of their drafted paragraphs. People further used the tool to quickly gain an overview of the text and developed strategies to integrate insights from the automated summaries. More broadly, this work explores and highlights the value of designing AI tools for writers, with Natural Language Processing (NLP) capabilities that go beyond direct text generation and correction.

Submitted 19 August, 2022; originally announced August 2022.

Comments: 13 pages, 6 figures, 2 tables, ACM UIST 2022

ACM Class: H.5.2; I.2.7

arXiv:2208.01946 [pdf, other]

Mixed Fault Tolerance Protocols with Trusted Execution Environment

Authors: Mingyuan Gao, Hung Dang, Ee-Chien Chang, Jialin Li

Abstract: Blockchain systems are designed, built and operated in the presence of failures. There are two dominant failure models, namely crash fault and Byzantine fault. Byzantine fault tolerance (BFT) protocols offer stronger security guarantees, and thus are widely used in blockchain systems. However, their security guarantees come at a dear cost to their performance and scalability. Several works have improved BFT protocols, and Trusted Execution Environment (TEE) has been shown to be an effective solution. However, existing such works typically assume that each participating node is equipped with TEE. For blockchain systems wherein participants typically have different hardware configurations, i.e., some nodes feature TEE while others do not, existing TEE-based BFT protocols are not applicable. This work studies the setting wherein not all participating nodes feature TEE, under which we propose a new fault model called mixed fault. We explore a new approach to designing efficient distributed fault-tolerant protocols under the mixed fault model. In general, mixed fault tolerance (MFT) protocols assume a network of $n$ nodes, among which up to $f = \frac{n-2}{3}$ can be subject to mixed faults. We identify two key principles for designing efficient MFT protocols, namely, (i) prioritizing non-equivocating nodes in leading the protocol, and (ii) advocating the use of public-key cryptographic primitives that allow authenticated messages to be aggregated. We showcase these design principles by prescribing an MFT protocol, namely MRaft. We implemented a prototype of MRaft using Intel SGX, integrated it into the CCF blockchain framework, conducted experiments, and showed that MFT protocols can obtain the same security guarantees as their BFT counterparts while still providing better performance (both transaction throughput and latency) and scalability.

Submitted 3 August, 2022; originally announced August 2022.

Comments: 12 pages, 3 figures

arXiv:2208.00870 [pdf, other]

doi 10.1145/3543758.3543947

Suggestion Lists vs. Continuous Generation: Interaction Design for Writing with Generative Models on Mobile Devices Affect Text Length, Wording and Perceived Authorship

Authors: Florian Lehmann, Niklas Markert, Hai Dang, Daniel Buschek

Abstract: Neural language models have the potential to support human writing. However, questions remain on their integration and influence on writing and output. To address this, we designed and compared two user interfaces for writing with AI on mobile devices, which manipulate levels of initiative and control: 1) Writing with continuously generated text, the AI adds text word-by-word and user steers. 2) Writing with suggestions, the AI suggests phrases and user selects from a list. In a supervised online study (N=18), participants used these prototypes and a baseline without AI. We collected touch interactions, ratings on inspiration and authorship, and interview data. With AI suggestions, people wrote less actively, yet felt they were the author. Continuously generated text reduced this perceived authorship, yet increased editing behavior. In both designs, AI increased text length and was perceived to influence wording. Our findings add new empirical evidence on the impact of UI design decisions on user experience and output with co-creative systems.

Submitted 1 August, 2022; originally announced August 2022.

Comments: Pre-Print, to appear in MuC '22: Mensch und Computer 2022

arXiv:2206.05718 [pdf, other]

smoothEM: a new approach for the simultaneous assessment of smooth patterns and spikes

Authors: Huy Dang, Marzia Cremona, Francesca Chiaromonte

Abstract: We consider functional data where an underlying smooth curve is composed not just with errors, but also with irregular spikes. We propose an approach that, combining regularized spline smoothing and an Expectation-Maximization algorithm, allows one to both identify spikes and estimate the smooth component. Imposing some assumptions on the error distribution, we prove consistency of EM estimates. Next, we demonstrate the performance of our proposal on finite samples and its robustness to assumptions violations through simulations. Finally, we apply our proposal to data on the annual heatwaves index in the US and on weekly electricity consumption in Ireland. In both datasets, we are able to characterize underlying smooth trends and to pinpoint irregular/extreme behaviors.

Submitted 16 July, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

arXiv:2203.02245 [pdf, other]

doi 10.1103/PhysRevLett.129.083602

Unveiling the Enhancement of Spontaneous Emission at Exceptional Points

Authors: Lydie Ferrier, Paul Bouteyre, Adi Pick, Sébastien Cueff, Nguyen Ha My Dang, Carole Diederichs, Ali Belarouci, Taha Benyattou, Jiaxin Zhao, Rui Su, Jun Xing, Qihua Xiong, Hai-Son Nguyen

Abstract: Exceptional points (EPs), singularities of non-Hermitian physics where complex spectral resonances degenerate, are one of the most exotic features of nonequilibrium open systems with unique properties. For instance, the emission rate of quantum emitters placed near resonators with EPs is enhanced (compared to the free-space emission rate) by a factor that scales quadratically with the resonance quality factor. Here, we verify the theory of spontaneous emission at EPs by measuring photoluminescence from photonic-crystal slabs that are embedded with a high-quantum-yield active material. While our experimental results verify the theoretically predicted enhancement, it also highlights the practical limitations on the enhancement due to material loss. Our designed structures can be used in applications that require enhanced and controlled emission, such as quantum sensing and imaging.

Submitted 16 July, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

arXiv:2203.00987 [pdf, other]

Beyond GAP screening for Lasso by exploiting new dual cutting half-spaces with supplementary material

Authors: Thu-Le Tran, Clément Elvira, Hong-Phuong Dang, Cédric Herzet

Abstract: In this paper, we propose a novel safe screening test for Lasso. Our procedure is based on a safe region with a dome geometry and exploits a canonical representation of the set of half-spaces (referred to as "dual cutting half-spaces" in this paper) containing the dual feasible set. The proposed safe region is shown to be always included in the state-of-the-art "GAP Sphere" and "GAP Dome" proposed by Fercoq et al. (and strictly so under very mild conditions) while involving the same computational burden. Numerical experiments confirm that our new dome enables to devise more powerful screening tests than GAP regions and lead to significant acceleration to solve Lasso.

Submitted 2 March, 2022; originally announced March 2022.

Comments: 6 pages, 2 figures

arXiv:2202.02053 [pdf, other]

doi 10.1145/3490100.3516471

SummaryLens -- A Smartphone App for Exploring Interactive Use of Automated Text Summarization in Everyday Life

Authors: Karim Benharrak, Florian Lehmann, Hai Dang, Daniel Buschek

Abstract: We present SummaryLens, a concept and prototype for a mobile tool that leverages automated text summarization to enable users to quickly scan and summarize physical text documents. We further combine this with a text-to-speech system to read out the summary on demand. With this concept, we propose and explore a concrete application case of bringing ongoing progress in AI and Natural Language Processing to a broad audience with interactive use cases in everyday life. Based on our implemented features, we describe a set of potential usage scenarios and benefits, including support for low-vision, low-literate and dyslexic users. A first usability study shows that the interactive use of automated text summarization in everyday life has noteworthy potential. We make the prototype available as an open-source project to facilitate further research on such tools.

Submitted 4 February, 2022; originally announced February 2022.

Comments: 4 pages, 1 figure, ACM IUI 2022 Companion

ACM Class: H.5.2

arXiv:2202.00965 [pdf, other]

doi 10.1145/3491102.3502141

GANSlider: How Users Control Generative Models for Images using Multiple Sliders with and without Feedforward Information

Authors: Hai Dang, Lukas Mecke, Daniel Buschek

Abstract: We investigate how multiple sliders with and without feedforward visualizations influence users' control of generative models. In an online study (N=138), we collected a dataset of people interacting with a generative adversarial network (StyleGAN2) in an image reconstruction task. We found that more control dimensions (sliders) significantly increase task difficulty and user actions. Visual feedforward partly mitigates this by enabling more goal-directed interaction. However, we found no evidence of faster or more accurate task performance. This indicates a tradeoff between feedforward detail and implied cognitive costs, such as attention. Moreover, we found that visualizations alone are not always sufficient for users to understand individual control dimensions. Our study quantifies fundamental UI design factors and resulting interaction behavior in this context, revealing opportunities for improvement in the UI design for interactive applications of generative models. We close by discussing design directions and further aspects.

Submitted 2 February, 2022; originally announced February 2022.

Comments: 15 pages, 10 figures, ACM CHI 2022

ACM Class: H.5.2

arXiv:2201.02898 [pdf, other]

doi 10.1103/PhysRevResearch.3.043170

Spin to charge conversion at Rashba-split SrTiO$_3$ interfaces from resonant tunneling

Authors: D. Q. To, T. H. Dang, L. Vila, J. P. Attané, M. Bibes, H. Jaffrès

Abstract: Spin-charge interconversion is a very active direction in spintronics. Yet, the complex behaviour of some of the most promising systems such as SrTiO$_3$ (STO) interfaces is not fully understood. Here, on the basis of a 6-band $\boldsymbol{k.p}$ method combined with spin-resolved scattering theory, we give a theoretical demonstration of transverse spin-charge interconversion physics in STO Rashba interfaces. Calculations involve injection of spin current from a ferromagnetic contact by resonant tunneling into the native Rashba-split resonant levels of the STO triangular quantum well. We compute an asymmetric tunneling electronic transmission yielding a transverse charge current flowing in plane, with a dependence with gate voltage in a very good agreement with existing experimental data.

Submitted 8 January, 2022; originally announced January 2022.

Comments: 21 pages, 6 figures

Journal ref: Phys. Rev. Research 3, 043170 (2021)

arXiv:2112.01811 [pdf]

doi 10.1016/j.ijrmms.2022.105248

Multiscale simulation of injection-induced fracture slip and wing-crack propagation in poroelastic media

Authors: Hau Trung Dang, Inga Berre, Eirik Keilegavlen

Abstract: In fractured poroelastic media under high differential stress, the shearing of fractures and faults and the corresponding propagation of wing cracks can be induced by fluid injection. Focusing on low-pressure stimulation with fluid pressures below the minimum principal stress but above the threshold required to overcome the fracture's frictional resistance to slip, this paper presents a mathematical model and a numerical solution approach for coupling fluid flow with fracture shearing and propagation. Numerical challenges are related to the strong coupling between hydraulic and mechanical processes, the material discontinuity the fractures represent in the medium, the wide range of spatial scales involved, and the strong effect that fracture deformation and propagation have on the physical processes. The solution approach is based on a multiscale strategy. In the macroscale model, flow in and poroelastic deformation of the matrix are coupled with the flow in the fractures and fracture contact mechanics, allowing fractures to frictionally slide. Fracture propagation is handled at the microscale, where the maximum tangential stress criterion triggers the propagation of fractures, and Paris' law governs the fracture growth processes. Simulations show how the shearing of a fracture due to fluid injection is linked to fracture propagation, including cases with hydraulically and mechanically interacting fractures.

Submitted 22 March, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

arXiv:2111.15632 [pdf, other]

Wound Healing Modeling Using Partial Differential Equation and Deep Learning

Authors: Hy Dang

Abstract: The process of wound healing has been an active area of research around the world. The problem is the wounds of different patients heal differently. For example, patients with a background of diabetes may have difficulties in healing [1]. By clearly understanding this process, we can determine the type and quantity of medicine to give to patients with varying types of wounds. In this research, we use a variation of the Alternating Direction Implicit method to solve a partial differential equation that models part of the wound healing process. Wound images are used as the dataset that we analyze. To segment the image's wound, we implement deep learning-based models. We show that the combination of a variant of the Alternating Direction Implicit method and Deep Learning provides a reasonably accurate model for the process of wound healing. To the best of our knowledge, this is the first attempt to combine both numerical PDE and deep learning techniques in an automated system to capture the long-term behavior of wound healing.

Submitted 13 September, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: This paper was part of an undergraduate honors thesis at Texas Christian University, written while being advised by Dr. Ken Richardson

arXiv:2110.15785 [pdf, other]

doi 10.1002/adom.202102386

Realization of room temperature polaritonic vortex in momentum space with hybrid Perovskite metasurface

Authors: Nguyen Ha My Dang, Simone Zanotti, Céline Chevalier, Gaëlle Trippé-Allard, Emmanuelle Deleporte, Mohamed Amara, Vincenzo Ardizzone, Daniele Sanvitto, Lucio Claudio Andreani, Christian Seassal, Dario Gerace, Hai Son Nguyen

Abstract: Exciton-polaritons are mixed light-matter excitations that result from the strong coupling regime between an active excitonic material and photonic resonances. Harnessing these hybrid excitations provides a rich playground to explore fascinating fundamental features, such as out-of-equilibrium Bose-Einstein condensation and quantum fluids of light, as well as novel mechanisms to be exploited in optoelectronic devices. Here, we investigate experimentally the formation of exciton-polaritons arising from the mixing between hybrid inorganic-organic perovskite excitons and an optical Bound state In a Continuum (BIC) of a subwavelength-scale metasurface, at room temperature. These polaritonic eigenmodes, hereby called polariton BICs (pol-BICs) are revealed in both reflectivity, resonant scattering, and photoluminescence measurements. Although pol-BICs only exhibit a finite quality factor that is bounded by the non-radiative losses of the excitonic component, they fully inherit BIC peculiar features: a full uncoupling from the radiative continuum in the vertical direction, which is associated to a locally vanishing farfield radiation in momentum space. Most importantly, our experimental results confirm that the topological nature of the photonic BIC is perfectly transferred to the pol-BIC. This is evidenced with the observation of a polarization vortex in the farfield of polaritonic emission. Our results pave the way to engineer BIC physics of interacting bosons, as well as novel room temperature polaritonic devices.

Submitted 29 October, 2021; originally announced October 2021.

arXiv:2108.11695 [pdf, other]

PAENet: A Progressive Attention-Enhanced Network for 3D to 2D Retinal Vessel Segmentation

Authors: Zhuojie Wu, Zijian Wang, Wenxuan Zou, Fan Ji, Hao Dang, Wanting Zhou, Muyi Sun

Abstract: 3D to 2D retinal vessel segmentation is a challenging problem in Optical Coherence Tomography Angiography (OCTA) images. Accurate retinal vessel segmentation is important for the diagnosis and prevention of ophthalmic diseases. However, making full use of the 3D data of OCTA volumes is a vital factor for obtaining satisfactory segmentation results. In this paper, we propose a Progressive Attention-Enhanced Network (PAENet) based on attention mechanisms to extract rich feature representation. Specifically, the framework consists of two main parts, the three-dimensional feature learning path and the two-dimensional segmentation path. In the three-dimensional feature learning path, we design a novel Adaptive Pooling Module (APM) and propose a new Quadruple Attention Module (QAM). The APM captures dependencies along the projection direction of volumes and learns a series of pooling coefficients for feature fusion, which efficiently reduces feature dimension. In addition, the QAM reweights the features by capturing four-group cross-dimension dependencies, which makes maximum use of 4D feature tensors. In the two-dimensional segmentation path, to acquire more detailed information, we propose a Feature Fusion Module (FFM) to inject 3D information into the 2D path. Meanwhile, we adopt the Polarized Self-Attention (PSA) block to model the semantic interdependencies in spatial and channel dimensions respectively.
Experimentally, our extensive experiments on the OCTA-500 dataset show that our proposed algorithm achieves state-of-the-art performance compared with previous methods.

Submitted 16 December, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: Accepted by BIBM 2021

arXiv:2107.09854 [pdf, other]

Phase diagram of a pseudogap Anderson model with application to graphene

Authors: Hung T. Dang, Hoa T. M. Nghiem

Abstract: The Anderson model of an $s$-wave single-orbital correlated impurity placed on a noninteracting honeycomb lattice, a simplified model for studying an impurity on graphene, is used to investigate pseudogap Kondo problem. In this model, there are two quantum phases: the phase of free impurity local moment and the Kondo phase where this local moment is fully screened. The transition between these two phases is under investigation. The work focuses mostly on the case where the impurity is placed on top of a lattice site. In this case, the full phase diagram is constructed using three parameters: the Hubbard interaction $U$, the hybridization strength $v_0$ and the impurity energy level $ε_d$. The phase diagram exhibits linear $(U^c, ε_d^c)$ phase boundary, the slope of which, as well as the critical value $ε_d^c$, depends strongly on $v_0^2$. Further analysis shows that the real part of the self energy at zero frequency and the impurity occupancy can help to understand the behaviors of the phase boundaries. The dependence of the phase transition on the impurity position is briefly discussed, revealing difficulties that one needs to solve in order to realize the pseudogap Kondo model in the realistic graphene lattice.

Submitted 20 July, 2021; originally announced July 2021.

Comments: 14 pages, 9 figures

arXiv:2107.01780 [pdf, ps, other]

The refined local lifting problem for cyclic covers of order four

Authors: Huy Dang

Abstract: Suppose $φ$ is a $\mathbb{Z}/4$-cover of a curve over an algebraically closed field $k$ of characteristic $2$, and $Φ_1$ is a \emph{nice} lift of $φ$'s $\mathbb{Z}/2$-sub-cover to a complete discrete valuation ring $R$ in characteristic zero. We show that there exist a finite extension $R'$ of $R$, which is determined by $Φ_1$, and a lift $Φ$ of $φ$ to $R'$ whose $\mathbb{Z}/2$-sub-cover isomorphic to $Φ_1 \otimes_R R'$. That result gives a non-trivial family of cyclic covers where Sa{ï}di's refined lifting conjecture holds. In addition, the manuscript exhibits some phenomena that may shed some light on the mysterious moduli space of wildly ramified Galois covers.

Submitted 17 September, 2023; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: There were some gaps in section 4 of the previous version. The current result is now weaker than a known one but uses a different approach. We may consider replacing this manuscript with another one that yields a stronger result

MSC Class: 14H30; 14H10; 11S15

arXiv:2106.13432 [pdf, other]

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Authors: Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran

Abstract: Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and interactions. Toward reaching this goal we propose an object-oriented reasoning approach in that video is abstracted as a dynamic stream of interacting objects. At each stage of the video event flow, these objects interact with each other, and their interactions are reasoned about with respect to the query and under the overall context of a video. This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture called Hierarchical Object-oriented Spatio-Temporal Reasoning (HOSTR) networks. This neural model maintains the objects' consistent lifelines in the form of a hierarchically nested spatio-temporal graph. Within this graph, the dynamic interactive object-oriented representations are built up along the video sequence, hierarchically abstracted in a bottom-up manner, and converge toward the key information for the correct answer. The method is evaluated on multiple major Video QA datasets and establishes new state-of-the-arts in these tasks. Analysis into the model's behavior indicates that object-oriented reasoning is a reliable, interpretable and efficient approach to Video QA.

Submitted 25 August, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

Comments: Accepted by IJCAI 2021. Please cite the conference version

arXiv:2106.00548 [pdf, other]

Enhanced Error Estimates for Augmented Subspace Method

Authors: Haikun Dang, Yifan Wang, Hehu Xie, Chenguang Zhou

Abstract: In this paper, some enhanced error estimates are derived for the augmented subspace methods which are designed for solving eigenvalue problems. We will show that the augmented subspace methods have the second order convergence rate which is better than the existing results. These sharper estimates provide a new dependence of convergence rate on the coarse spaces in augmented subspace methods. These new results are also validated by some numerical examples.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 19 pages, 24 figures

MSC Class: 65N30; 65N25; 65L15; 65B99

arXiv:2105.07777 [pdf, ps, other]

doi 10.1119/1.3139531

Diet Soda and Liquid Nitrogen

Authors: Bart H. McGuyer, Justin M. Brown, Hoan B. Dang

Abstract: Letter to the Editor about how the diet soda and Mentos reaction can be produced by the direct immersion of a plastic soda bottle in liquid nitrogen.

Submitted 14 May, 2021; originally announced May 2021.

Comments: 1 page

Journal ref: American Journal of Physics 77, 677 (2009)

Showing 1–50 of 126 results for author: Dang, H