Search | arXiv e-print repository

arXiv:2406.16987 [pdf]

AI for Equitable Tennis Training: Leveraging AI for Equitable and Accurate Classification of Tennis Skill Levels and Training Phases

Authors: Gyanna Gao, Hao-Yu Liao, Zhenhong Hu

Abstract: Numerous studies have demonstrated the manifold benefits of tennis, such as increasing overall physical and mental health. Unfortunately, many children and youth from low-income families are unable to engage in this sport, mainly due to financial constraints such as private lesson expenses as well as logistical concerns about traveling to and from such lessons and clinics. While several tennis self-training systems exist, they are often tailored for professionals and are prohibitively expensive. The present study aims to classify tennis players' skill levels and classify tennis strokes into phases characterized by motion attributes for a future development of an AI-based tennis self-training model for affordable and convenient applications running on devices used in daily life such as an iPhone or an Apple Watch for tennis skill improvement. We collected motion data, including Motion Yaw, Roll and Pitch from inertial measurement units (IMUs) worn by participating junior tennis players. For this pilot study, data from twelve participants were processed using Support Vector Machine (SVM) algorithms. The SVM models demonstrated an overall accuracy of 77% in classifying players as beginners or intermediates, with low rates of false positives and false negatives, effectively distinguishing skill levels. Additionally, the tennis swings were successfully classified into five phases based on the collected motion data. These findings indicate that SVM-based classification can be a reliable foundation for developing an equitable and accessible AI-driven tennis training system.

Submitted 23 June, 2024; originally announced June 2024.

Comments: 21 pages, 9 figures, 1 table

arXiv:2406.16873 [pdf, other]

A Survey of Machine Learning Techniques for Improving Global Navigation Satellite Systems

Authors: Adyasha Mohanty, Grace Gao

Abstract: Global Navigation Satellite Systems (GNSS)-based positioning plays a crucial role in various applications, including navigation, transportation, logistics, mapping, and emergency services. Traditional GNSS positioning methods are model-based and they utilize satellite geometry and the known properties of satellite signals. However, model-based methods have limitations in challenging environments and often lack adaptability to uncertain noise models. This paper highlights recent advances in Machine Learning (ML) and its potential to address these limitations. It covers a broad range of ML methods, including supervised learning, unsupervised learning, deep learning, and hybrid approaches. The survey provides insights into positioning applications related to GNSS such as signal analysis, anomaly detection, multi-sensor integration, prediction, and accuracy enhancement using ML. It discusses the strengths, limitations, and challenges of current ML-based approaches for GNSS positioning, providing a comprehensive overview of the field.

Submitted 29 March, 2024; originally announced June 2024.

Comments: Under consideration for EURASIP Journal on Advances in Signal Processing

arXiv:2406.16679 [pdf, other]

Multi-Robot Collaborative Localization and Planning with Inter-Ranging

Authors: Derek Knowles, Adam Dai, Grace Gao

Abstract: Robots often use feature-based image tracking to identify their position in their surrounding environment; however, feature-based image tracking is prone to errors in low-textured and poorly lit environments. Specifically, we investigate a scenario where robots are tasked with exploring the surface of the Moon and are required to have an accurate estimate of their position to be able to correctly geotag scientific measurements. To reduce localization error, we complement traditional feature-based image tracking with ultra-wideband (UWB) distance measurements between the robots. The robots use an advanced mesh-ranging protocol that allows them to continuously share distance measurements amongst each other rather than relying on the common "anchor" and "tag" UWB architecture. We develop a decentralized multi-robot coordination algorithm that actively plans paths based on measurement line-of-sight vectors amongst all robots to minimize collective localization error. We then demonstrate the emergent behavior of the proposed multi-robot coordination algorithm both in simulation and hardware to lower a geometry-based uncertainty metric and reduce localization error.

Submitted 24 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2406.09759 [pdf, other]

Autonomous Constellation Fault Monitoring with Inter-satellite Links: A Rigidity-Based Approach

Authors: Keidai Iiyama, Daniel Neamati, Grace Gao

Abstract: To address the need for robust positioning, navigation, and timing services in lunar and Martian environments, this paper proposes a novel fault detection framework for satellite constellations using inter-satellite ranging (ISR). Traditional fault monitoring methods rely on intense monitoring from ground-based stations, which is impractical for lunar and Martian missions due to cost constraints. Our approach leverages graph-rigidity theory to detect faults without relying on precise ephemeris. We model satellite constellations as graphs where satellites are vertices and inter-satellite links are edges. By analyzing the Euclidean Distance Matrix (EDM) derived from ISR measurements, we identify faults through the singular values of the geometric-centered EDM (GCEDM). A neural network predictor is employed to handle the diverse geometry of the graph, enhancing fault detection robustness. The proposed method is validated through simulations of constellations around Mars and the Moon, demonstrating its effectiveness in various configurations. This research contributes to the reliable operation of satellite constellations for future lunar and Martian exploration missions.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Submitted to ION GNSS+ 2024 Conference
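Illustration (not taken from the paper): the abstract above says faults are identified through the singular values of the geometric-centered EDM. The sketch below shows only the textbook property behind that idea, assuming noise-free ranges between every pair of nodes: for points in 3D, the doubly centered squared-distance matrix has rank at most 3, so a biased range shows up as extra non-negligible singular values. The node positions, the injected bias, and all function names are hypothetical; the paper's neural network predictor and its actual test statistic are not reproduced here.

```python
import numpy as np


def squared_edm(points):
    """Squared Euclidean distance matrix for an (n, d) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def centered_singular_values(edm):
    """Singular values of the geometrically centered EDM, -1/2 * J @ D @ J."""
    n = edm.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n  # J = I - (1/n) * 1 1^T
    return np.linalg.svd(-0.5 * centering @ edm @ centering, compute_uv=False)


# Hypothetical node positions in 3D (arbitrary units, not a real constellation).
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))
clean = squared_edm(points)

# Inject a hypothetical range bias on the link between nodes 0 and 1.
faulty = clean.copy()
biased_range = np.sqrt(clean[0, 1]) + 0.5
faulty[0, 1] = faulty[1, 0] = biased_range ** 2

sv_clean = centered_singular_values(clean)
sv_faulty = centered_singular_values(faulty)

# Ratio of the 4th singular value to the largest one: close to zero when
# all ranges are consistent with a 3D geometry, clearly larger otherwise.
ratio_clean = sv_clean[3] / sv_clean[0]
ratio_faulty = sv_faulty[3] / sv_faulty[0]
print("clean ratio:", ratio_clean)
print("faulty ratio:", ratio_faulty)
```

With measurement noise or a partially connected graph, the clean ratio is no longer near zero and a detection threshold has to be chosen; that is the harder part the paper addresses.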

arXiv:2406.07061 [pdf, other]

Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

Authors: Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu

Abstract: Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibility to improve diagnostic determinations. A potential early route towards clinical adoption for 3D pathology is to rely on pathologists for final diagnosis based on viewing familiar 2D H&E-like image sections from the 3D datasets. However, manual examination of the massive 3D pathology datasets is infeasible. To address this, we present CARP3D, a deep learning triage approach that automatically identifies the highest-risk 2D slices within 3D volumetric biopsy, enabling time-efficient review by pathologists. For a given slice in the biopsy, we estimate its risk by performing attention-based aggregation of 2D patches within each slice, followed by pooling of the neighboring slices to compute a context-aware 2.5D risk score. For prostate cancer risk stratification, CARP3D achieves an area under the curve (AUC) of 90.4% for triaging slices, outperforming methods relying on independent analysis of 2D sections (AUC=81.3%). These results suggest that integrating additional depth context enhances the model's discriminative capabilities. In conclusion, CARP3D has the potential to improve pathologist diagnosis via accurate triage of high-risk slices within large-volume 3D pathology datasets.

Submitted 11 June, 2024; originally announced June 2024.

Comments: CVPR CVMI 2024

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6955-6965

arXiv:2405.15227 [pdf, other]

Neural Elevation Models for Terrain Mapping and Path Planning

Authors: Adam Dai, Shubh Gupta, Grace Gao

Abstract: This work introduces Neural Elevation Models (NEMos), which adapt Neural Radiance Fields to a 2.5D continuous and differentiable terrain model. In contrast to traditional terrain representations such as digital elevation models, NEMos can be readily generated from imagery, a low-cost data source, and provide a lightweight representation of terrain through an implicit continuous and differentiable height field. We propose a novel method for jointly training a height field and radiance field within a NeRF framework, leveraging quantile regression. Additionally, we introduce a path planning algorithm that performs gradient-based optimization of a continuous cost function for minimizing distance, slope changes, and control effort, enabled by differentiability of the height field. We perform experiments on simulated and real-world terrain imagery, demonstrating NEMos' ability to generate high-quality reconstructions and produce smoother paths compared to discrete path planning methods. Future work will explore the incorporation of features and semantics into the height field, creating a generalized terrain model.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.05474 [pdf]

(Dis)placed Contributions: Uncovering Hidden Hurdles to Collaborative Writing Involving Non-Native Speakers, Native Speakers, and AI-Powered Editing Tools

Authors: Yimin Xiao, Yuewen Chen, Naomi Yamashita, Yuexi Chen, Zhicheng Liu, Ge Gao

Abstract: Content creation today often takes place via collaborative writing. A longstanding interest of CSCW research lies in understanding and promoting the coordination between co-writers. However, little attention has been paid to individuals who write in their non-native language and to co-writer groups involving them. We present a mixed-method study that fills the above gap. Our participants included 32 co-writer groups, each consisting of one native speaker (NS) of English and one non-native speaker (NNS) with limited proficiency. They performed collaborative writing adopting two different workflows: half of the groups began with NNSs taking the first editing turn and half had NNSs act after NSs. Our data revealed a "late-mover disadvantage" exclusively experienced by NNSs: an NNS's ideational contributions to the joint document were suppressed when their editing turn was placed after an NS's turn, as opposed to ahead of it. Surprisingly, editing help provided by AI-powered tools did not exempt NNSs from being disadvantaged. Instead, it triggered NSs' overestimation of NNSs' English proficiency and agency displayed in the writing, introducing unintended tensions into the collaboration. These findings shed light on the fair assessment and effective promotion of a co-writer's contributions in language-diverse settings. In particular, they underscore the necessity of disentangling contributions made to the ideational, expressional, and lexical aspects of the joint writing.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.15269 [pdf, other]

Aligning LLM Agents by Learning Latent Preference from User Edits

Authors: Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Misra

Abstract: We study interactive learning of LLM-based language agents based on user edits made to the agent's output. In a typical setting such as writing assistants, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent response to personalize it based on their latent preference, in addition to improving the correctness. The edit feedback is naturally generated, making it a suitable candidate for improving the agent's alignment with the user's preference, and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, that infers a description of the user's latent preference based on historic edit data. The inferred user preference descriptions are used to define prompts for generating responses in the future. This avoids fine-tuning the agent, which is costly, challenging to scale with the number of users, and may even degrade its performance on other tasks. Furthermore, learning descriptive preference improves interpretability, allowing the user to view and modify the learned preference. However, user preference can be complex, subtle, and vary based on context, making it challenging to learn. To address this, we propose a simple yet effective algorithm named CIPHER that leverages the LLM to infer the user preference for a given context based on user edits. In the future, CIPHER retrieves inferred preferences from the k-closest contexts in the history, and forms an aggregate preference for response generation. We introduce two interactive environments -- summarization and email writing, and use a GPT-4 simulated user for evaluation. On both tasks, CIPHER outperforms several baselines by achieving the lowest edit distance cost while only having a small overhead in LLM query cost. Our analysis reports that user preferences learned by CIPHER show significant similarity to the ground truth latent preferences.

Submitted 9 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.13409 [pdf, other]

"I Wish There Were an AI": Challenges and AI Potential in Cancer Patient-Provider Communication

Authors: Ziqi Yang, Xuhai Xu, Bingsheng Yao, Jiachen Li, Jennifer Bagdasarian, Guodong Gao, Dakuo Wang

Abstract: Patient-provider communication has been crucial to cancer patients' survival after their cancer treatments. However, the research community and patients themselves often overlook the communication challenges after cancer treatments as they are overshadowed by the severity of the patient's illness and the variety and rarity of the cancer disease itself. Meanwhile, the recent technical advances in AI, especially in Large Language Models (LLMs) with versatile natural language interpretation and generation ability, demonstrate great potential to support communication in complex real-world medical situations. By interviewing six healthcare providers and eight cancer patients, our goal is to explore the providers' and patients' communication barriers in the post-cancer treatment recovery period, their expectations for future communication technologies, and the potential of AI technologies in this context. Our findings reveal several challenges in current patient-provider communication, including the knowledge and timing gaps between cancer patients and providers, their collaboration obstacles, and resource limitations. Moreover, based on providers' and patients' needs and expectations, we summarize a set of design implications for intelligent communication systems, especially with the power of LLMs. Our work sheds light on the design of future AI-powered systems for patient-provider communication under high-stakes and high-uncertainty situations.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 18 pages, 2 figures, submission to CSCW'24

arXiv:2404.13273 [pdf, other]

Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection

Authors: Junpu Wang, Guili Xu, Chunlei Li, Guangshuai Gao, Yuehua Cheng

Abstract: Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: poorly distinguishable information in image reconstruction, and anomalies being reconstructed too well because of model over-generalization. To overcome the above issues, we convert the image reconstruction into a combination of parallel feature restorations and propose a multi-feature reconstruction network, MFRNet, using crossed-mask restoration in this paper. Specifically, a multi-scale feature aggregator is first developed to generate more discriminative hierarchical representations of the input images from a pre-trained model. Subsequently, a crossed-mask generator is adopted to randomly cover the extracted feature map, followed by a restoration network based on the transformer structure for high-quality repair of the missing regions. Finally, a hybrid loss is used to guide model training and anomaly estimation, which gives consideration to both the pixel and structural similarity. Extensive experiments show that our method is highly competitive with or significantly outperforms other state-of-the-art methods on four publicly available datasets and one self-made dataset.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12617 [pdf, other]

Greedy Detection and Exclusion of Multiple Faults using Euclidean Distance Matrices

Authors: Derek Knowles, Grace Gao

Abstract: Numerous methods have been proposed for global navigation satellite system (GNSS) receivers to detect faulty GNSS signals. One such fault detection and exclusion (FDE) method is based on the mathematical concept of Euclidean distance matrices (EDMs). This paper outlines a greedy approach that uses an improved Euclidean distance matrix-based fault detection and exclusion algorithm. The novel greedy EDM FDE method implements a new fault detection test statistic and fault exclusion strategy that drastically simplifies the complexity of the algorithm over previous work. To validate the novel greedy EDM FDE algorithm, we created a simulated dataset using receiver locations from around the globe. The simulated dataset allows us to verify our results on 2,601 different satellite geometries. Additionally, we tested the greedy EDM FDE algorithm using a real-world dataset from seven different Android phones. Across both the simulated and real-world datasets, the Python implementation of the greedy EDM FDE algorithm is shown to be computed an order of magnitude more rapidly than a comparable greedy residual FDE method while obtaining similar fault exclusion accuracy. We provide discussion on the comparative time complexities of greedy EDM FDE, greedy residual FDE, and solution separation. We also explain potential modifications to greedy residual FDE that can be added to alter performance characteristics.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Submitted to NAVIGATION: Journal of the Institute of Navigation

arXiv:2404.09155 [pdf, other]

Mitigating Heterogeneity among Factor Tensors via Lie Group Manifolds for Tensor Decomposition Based Temporal Knowledge Graph Embedding

Authors: Jiang Li, Xiangdong Su, Yeyun Gong, Guanglai Gao

Abstract: Recent studies have highlighted the effectiveness of tensor decomposition methods in the Temporal Knowledge Graph Embedding (TKGE) task. However, we found that inherent heterogeneity among factor tensors in tensor decomposition significantly hinders the tensor fusion process and further limits the performance of link prediction. To overcome this limitation, we introduce a novel method that maps factor tensors onto a unified smooth Lie group manifold to make the distribution of factor tensors approximately homogeneous in tensor decomposition. We provide the theoretical proof of our motivation that homogeneous tensors are more effective than heterogeneous tensors in tensor fusion and approximating the target for tensor decomposition based TKGE methods. The proposed method can be directly integrated into existing tensor decomposition based TKGE methods without introducing extra parameters. Extensive experiments demonstrate the effectiveness of our method in mitigating the heterogeneity and in enhancing the tensor decomposition based TKGE models.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08854 [pdf, other]

gnss_lib_py: Analyzing GNSS Data with Python

Authors: Derek Knowles, Ashwin Vivek Kanhere, Daniel Neamati, Grace Gao

Abstract: This paper presents gnss_lib_py, a Python library used to parse, analyze, and visualize data from a variety of GNSS (Global Navigation Satellite Systems) data sources. The gnss_lib_py library's ease of use, modular capabilities, testing coverage, and extensive documentation make it an attractive tool not only for scientific and industry users wanting a quick, out-of-the-box solution but also for advanced GNSS users developing new GNSS algorithms. gnss_lib_py has already demonstrated its usefulness and impact through presentation in academic conferences, use in research papers, and adoption in graduate-level university course curricula.

Submitted 12 April, 2024; originally announced April 2024.

Comments: Submitted to the SoftwareX journal

arXiv:2404.06180 [pdf, other]

YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images

Authors: Chenguang Liu, Guangshuai Gao, Ziyue Huang, Zhenghui Hu, Qingjie Liu, Yunhong Wang

Abstract: Detecting objects from aerial images poses significant challenges due to the following factors: 1) Aerial images typically have very large sizes, generally with millions or even hundreds of millions of pixels, while computational resources are limited. 2) Small object size leads to insufficient information for effective detection. 3) Non-uniform object distribution leads to computational resource wastage. To address these issues, we propose YOLC (You Only Look Clusters), an efficient and effective framework that builds on an anchor-free object detector, CenterNet. To overcome the challenges posed by large-scale images and non-uniform object distribution, we introduce a Local Scale Module (LSM) that adaptively searches cluster regions for zooming in for accurate detection. Additionally, we modify the regression loss using Gaussian Wasserstein distance (GWD) to obtain high-quality bounding boxes. Deformable convolution and refinement methods are employed in the detection head to enhance the detection of small objects. We perform extensive experiments on two aerial image datasets, including Visdrone2019 and UAVDT, to demonstrate the effectiveness and superiority of our proposed approach. Code is available at https://github.com/dawn-ech/YOLC.

Submitted 16 June, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: accepted to TITS

arXiv:2404.02880 [pdf, other]

doi 10.1145/3613904.3642608

Fragmented Moments, Balanced Choices: How Do People Make Use of Their Waiting Time?

Authors: Jian Zheng, Ge Gao

Abstract: Everyone spends some time waiting every day. HCI research has developed tools for boosting productivity while waiting. However, little is known about how people naturally spend their waiting time. We conducted an experience sampling study with 21 working adults who used a mobile app to report their daily waiting time activities over two weeks. The aim of this study is to understand the activities people do while waiting and the effect of situational factors. We found that participants spent about 60% of their waiting time on leisure activities, 20% on productive activities, and 20% on maintenance activities. These choices are sensitive to situational factors, including accessible device, location, and certain routines of the day. Our study complements previous ones by demonstrating that people purpose waiting time for various goals beyond productivity and to maintain work-life balance. Our findings shed light on future empirical research and system design for time management.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 14 pages. 6 figures. Published at ACM CHI'24

ACM Class: H.5.m

arXiv:2403.13310 [pdf, other]

A Semantic Search Engine for Mathlib4

Authors: Guoxiong Gao, Haocheng Ju, Jiedong Jiang, Zihan Qin, Bin Dong

Abstract: The interactive theorem prover, Lean, enables the verification of formal mathematical proofs and is backed by an expanding community. Central to this ecosystem is its mathematical library, mathlib4, which lays the groundwork for the formalization of an expanding range of mathematical theories. However, searching for theorems in mathlib4 can be challenging. To successfully search in mathlib4, users often need to be familiar with its naming conventions or documentation strings. Therefore, creating a semantic search engine that can be used easily by individuals with varying familiarity with mathlib4 is very important. In this paper, we present a semantic search engine for mathlib4 that accepts informal queries and finds the relevant theorems. We also establish a benchmark for assessing the performance of various search engines for mathlib4.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.06064 [pdf, other]

L^2GC: Lorentzian Linear Graph Convolutional Networks for Node Classification

Authors: Qiuyu Liang, Weihua Wang, Feilong Bao, Guanglai Gao

Abstract: Linear Graph Convolutional Networks (GCNs) are used to classify the nodes in graph data. However, we note that most existing linear GCN models perform neural network operations in Euclidean space, which do not explicitly capture the tree-like hierarchical structure exhibited in real-world datasets that are modeled as graphs. In this paper, we attempt to introduce hyperbolic space into linear GCN and propose a novel framework for Lorentzian linear GCN. Specifically, we map the learned features of graph nodes into hyperbolic space, and then perform a Lorentzian linear feature transformation to capture the underlying tree-like structure of data. Experimental results on standard citation network datasets with semi-supervised learning show that our approach yields new state-of-the-art accuracy of 74.7% on Citeseer and 81.3% on PubMed. Furthermore, we observe that our approach can be trained up to two orders of magnitude faster than other nonlinear GCN models on the PubMed dataset. Our code is publicly available at https://github.com/llqy123/LLGC-master.

Submitted 14 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.05817 [pdf, other]

SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

Authors: Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu

Abstract: LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent works have attempted to construct fully sparse detectors to solve this issue; nevertheless, the resulting models either rely on a complex multi-stage pipeline or exhibit inferior performance. In this work, we propose SAFDNet, a straightforward yet highly effective architecture, tailored for fully sparse 3D object detection. In SAFDNet, an adaptive feature diffusion strategy is designed to address the center feature missing problem. We conducted extensive experiments on Waymo Open, nuScenes, and Argoverse2 datasets. SAFDNet performed slightly better than the previous SOTA on the first two datasets but much better on the last dataset, which features long-range detection, verifying the efficacy of SAFDNet in scenarios where long-range detection is required. Notably, on Argoverse2, SAFDNet surpassed the previous best hybrid detector HEDNet by 2.6% mAP while being 2.1x faster, and yielded 2.1% mAP gains over the previous best sparse detector FSDv2 while being 1.3x faster. The code will be available at https://github.com/zhanggang001/HEDNet.

Submitted 22 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024 (Oral)

arXiv:2403.04096 [pdf, ps, other]

Assisting International Migrants with Everyday Information Seeking: From the Providers' Lens

Authors: Yongle Zhang, Ge Gao

Abstract: International migrants face difficulties obtaining information for a quality life and well-being in the host country. Prior research indicates that international migrants often seek information from their co-national cohort or contacts from the same country. The downside of this practice, however, is that people can end up clustering in a small-world environment, hindering the information seekers' social adaptation in the long run. In the current research, we investigated the ongoing practices and future opportunities to connect international migrants with others beyond their co-national contacts. Our work zooms in on the providers' perspectives, which complements previous studies that pay exclusive attention to the information seekers. Specifically, we conducted in-depth interviews with 21 participants assisting the needs of international migrants in the United States. Some of these people are fellow migrants from a different home country than the information seeker, whereas the rest are domestic residents. Our data revealed how these participants dealt with language barriers, overcame knowledge disparities, and calibrated their effort commitment as information providers. Based on these findings, we discuss directions for future information and communication technologies (ICT) design that can facilitate international migrants' daily information seeking by accounting for the provider's needs and concerns.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.02274 [pdf, other]

NatSGD: A Dataset with Speech, Gestures, and Demonstrations for Robot Learning in Natural Human-Robot Interaction

Authors: Snehesh Shrestha, Yantian Zha, Saketh Banagiri, Ge Gao, Yiannis Aloimonos, Cornelia Fermuller

Abstract: Recent advancements in multimodal Human-Robot Interaction (HRI) datasets have highlighted the fusion of speech and gesture, expanding robots' capabilities to absorb explicit and implicit HRI insights. However, existing speech-gesture HRI datasets often focus on elementary tasks, like object pointing and pushing, revealing limitations in scaling to intricate domains and prioritizing human command data over robot behavior records. To bridge these gaps, we introduce NatSGD, a multimodal HRI dataset encompassing human commands through speech and gestures that are natural, synchronized with robot behavior demonstrations. NatSGD serves as a foundational resource at the intersection of machine learning and HRI research, and we demonstrate its effectiveness in training robots to understand tasks through multimodal human commands, emphasizing the significance of jointly considering speech and gestures. We have released our dataset, simulator, and code to facilitate future research in human-robot interaction system learning; access these resources at https://www.snehesh.com/natsgd/

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.13876 [pdf, other]

Scene Prior Filtering for Depth Map Super-Resolution

Authors: Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Jian Yang, Ying Tai, Guangwei Gao

Abstract: Multi-modal fusion is vital to the success of super-resolution of depth maps. However, commonly used fusion strategies, such as addition and concatenation, fall short of effectively bridging the modal gap. As a result, guided image filtering methods have been introduced to mitigate this issue. Nevertheless, it is observed that their filter kernels usually encounter significant texture interference and edge inaccuracy. To tackle these two challenges, we introduce a Scene Prior Filtering network, SPFNet, which utilizes the priors surface normal and semantic map from large-scale models. Specifically, we design an All-in-one Prior Propagation that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce the texture interference. In addition, we present a One-to-one Prior Embedding that continuously embeds each single-modal prior into depth using Mutual Guided Filtering, further alleviating the texture interference while enhancing edges. Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.

Submitted 23 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 14 pages

arXiv:2402.01681 [pdf, other]

Emojis Decoded: Leveraging ChatGPT for Enhanced Understanding in Social Media Communications

Authors: Yuhang Zhou, Paiheng Xu, Xiyao Wang, Xuan Lu, Ge Gao, Wei Ai

Abstract: Emojis, which encapsulate semantics beyond mere words or phrases, have become prevalent in social network communications. This has spurred increasing scholarly interest in exploring their attributes and functionalities. However, emoji-related research and application face two primary challenges. First, researchers typically rely on crowd-sourcing to annotate emojis in order to understand their sentiments, usage intentions, and semantic meanings. Second, subjective interpretations by users can often lead to misunderstandings of emojis and cause the communication barrier. Large Language Models (LLMs) have achieved significant success in various annotation tasks, with ChatGPT demonstrating expertise across multiple domains. In our study, we assess ChatGPT's effectiveness in handling previously annotated and downstream tasks. Our objective is to validate the hypothesis that ChatGPT can serve as a viable alternative to human annotators in emoji research and that its ability to explain emoji meanings can enhance clarity and transparency in online communications. Our findings indicate that ChatGPT has extensive knowledge of emojis. It is adept at elucidating the meaning of emojis across various application scenarios and demonstrates the potential to replace human annotators in a range of tasks.

Submitted 16 February, 2024; v1 submitted 22 January, 2024; originally announced February 2024.

Comments: 12 pages, 2 page appendix

arXiv:2401.04429 [pdf, other]

i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

Authors: Haoyang Chen, Peiyan Sun, Qiyuan Song, Wanyuan Wang, Weiwei Wu, Wencan Zhang, Guanyu Gao, Yan Lyu

Abstract: Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the recommendation or not on their own. We propose i-Rebalance, a personalized vehicle reposition technique with deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting reposition recommendations through an on-field user study involving 99 real drivers. To optimize supply-demand balance and enhance preference satisfaction simultaneously, i-Rebalance has a sequential reposition strategy with dual DRL agents: Grid Agent to determine the reposition order of idle vehicles, and Vehicle Agent to provide personalized recommendations to each vehicle in the pre-defined order. This sequential learning strategy facilitates more effective policy training within a smaller action space compared to traditional joint-action methods. Evaluation of real-world trajectory data shows that i-Rebalance improves driver acceptance rate by 38.07% and total driver income by 9.97%.

Submitted 2 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.02292 [pdf, other]

GridFormer: Point-Grid Transformer for Surface Reconstruction

Authors: Shengtao Li, Ge Gao, Yudong Liu, Yu-Shen Liu, Ming Gu

Abstract: Implicit neural networks have emerged as a crucial technology in 3D surface reconstruction. To reconstruct continuous surfaces from discrete point clouds, encoding the input points into regular grid features (plane or volume) has been commonly employed in existing approaches. However, these methods typically use the grid as an index for uniformly scattering point features. Compared with the irregular point features, the regular grid features may sacrifice some reconstruction details but improve efficiency. To take full advantage of these two types of features, we introduce a novel and high-efficiency attention mechanism between the grid and point features named Point-Grid Transformer (GridFormer). This mechanism treats the grid as a transfer point connecting the space and point cloud. Our method maximizes the spatial expressiveness of grid features and maintains computational efficiency. Furthermore, optimizing predictions over the entire space could potentially result in blurred boundaries. To address this issue, we further propose a boundary optimization strategy incorporating margin binary cross-entropy loss and boundary sampling. This approach enables us to achieve a more precise representation of the object structure. Our experiments validate that our method is effective and outperforms the state-of-the-art approaches under widely used benchmarks by producing more precise geometry reconstructions. The code is available at https://github.com/list17/GridFormer.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.14931 [pdf]

A Parallel IFC Normalization Algorithm for Incremental Storage and Version Control

Authors: Han Liu, Ge Gao, Ming Gu

Abstract: Industry Foundation Classes (IFC) files are commonly used for data exchange of Building Information Models (BIMs). Due to the equivalent transformations in the graph structure of IFC data, it is a challenge to perform version comparison and incremental storage on IFC files. In this paper, an IFC normalization method is proposed, which can reduce the influence of the equivalent transformations, so that the normalized IFC file can be directly used in Git-like tools for version comparison and incremental storage. The algorithm is also designed for getting stable results when running on multi-threads. Experiments show the efficiency of the algorithm and its potential in Common Data Environment (CDE) applications.

Submitted 12 September, 2023; originally announced December 2023.

Comments: in: 30th International Workshop on Intelligent Computing in Engineering (EG-ICE), 2023: 511-520

arXiv:2312.13977 [pdf, other]

NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views

Authors: Han Huang, Yulun Wu, Junsheng Zhou, Ge Gao, Ming Gu, Yu-Shen Liu

Abstract: Recently, neural implicit functions have demonstrated remarkable results in the field of multi-view reconstruction. However, most existing methods are tailored for dense views and exhibit unsatisfactory performance when dealing with sparse views. Several latest methods have been proposed for generalizing implicit reconstruction to address the sparse view reconstruction task, but they still suffer from high training costs and are merely valid under carefully selected perspectives. In this paper, we propose a novel sparse view reconstruction framework that leverages on-surface priors to achieve highly faithful surface reconstruction. Specifically, we design several constraints on global geometry alignment and local geometry refinement for jointly optimizing coarse shapes and fine details. To achieve this, we train a neural network to learn a global implicit field from the on-surface points obtained from SfM and then leverage it as a coarse geometric constraint. To exploit local geometric consistency, we project on-surface points onto seen and unseen views, treating the consistent loss of projected features as a fine geometric constraint. The experimental results with DTU and BlendedMVS datasets in two prevalent sparse settings demonstrate significant improvements over the state-of-the-art methods.

Submitted 21 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024. Project page: https://alvin528.github.io/NeuSurf/

arXiv:2312.02605 [pdf, other]

doi 10.1109/PCS60826.2024.10566283

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

Authors: Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

Abstract: In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in terms of compression efficiency. However, most learning-based video compression models are associated with high computational complexity and latency, in particular at the decoder side, which limits their deployment in practical applications. In this paper, we present a novel model-agnostic pruning scheme based on gradient decay and adaptive layer-wise distillation. Gradient decay enhances parameter exploration during sparsification whilst preventing runaway sparsity and is superior to the standard Straight-Through Estimation. The adaptive layer-wise distillation regulates the sparse training in various stages based on the distortion of intermediate features. This stage-wise design efficiently updates parameters with minimal computational overhead. The proposed approach has been applied to three popular end-to-end learnt video codecs, FVC, DCVC, and DCVC-HEM. Results confirm that our method yields up to 65% reduction in MACs and 2x speed-up with less than 0.3dB drop in BD-PSNR. Supporting code and supplementary material can be downloaded from: https://jasminepp.github.io/lightweightdvc/

Submitted 5 December, 2023; originally announced December 2023.

Report number: 2312.02605

arXiv:2312.00093 [pdf, other]

GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs

Authors: Gege Gao, Weiyang Liu, Anpei Chen, Andreas Geiger, Bernhard Schölkopf

Abstract: As pretrained text-to-image diffusion models become increasingly powerful, recent efforts have been made to distill knowledge from these text-to-image pretrained models for optimizing a text-guided 3D model. Most of the existing methods generate a holistic 3D model from a plain text input. This can be problematic when the text describes a complex scene with multiple objects, because the vectorized text embeddings are inherently unable to capture a complex description with multiple entities and relationships. Holistic 3D modeling of the entire scene further prevents accurate grounding of text entities and concepts. To address this limitation, we propose GraphDreamer, a novel framework to generate compositional 3D scenes from scene graphs, where objects are represented as nodes and their interactions as edges. By exploiting node and edge information in scene graphs, our method makes better use of the pretrained text-to-image diffusion model and is able to fully disentangle different objects without image-level supervision. To facilitate modeling of object-wise relationships, we use signed distance fields as representation and impose a constraint to avoid inter-penetration of objects. To avoid manual scene graph creation, we design a text prompt for ChatGPT to generate scene graphs based on text inputs. We conduct both qualitative and quantitative experiments to validate the effectiveness of GraphDreamer in generating high-fidelity compositional 3D scenes with disentangled object entities.

Submitted 10 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

Comments: CVPR 2024 (18 pages, 11 figures, https://graphdreamer.github.io/)

arXiv:2311.16114 [pdf]

Learning Noise-Robust Joint Representation for Multimodal Emotion Recognition under Incomplete Data Scenarios

Authors: Qi Fan, Haolin Zuo, Rui Liu, Zheng Lian, Guanglai Gao

Abstract: Multimodal emotion recognition (MER) in practical scenarios is significantly challenged by the presence of missing or incomplete data across different modalities. To overcome these challenges, researchers have aimed to simulate incomplete conditions during the training phase to enhance the system's overall robustness. Traditional methods have often involved discarding data or substituting data segments with zero vectors to approximate these incompletenesses. However, such approaches neither accurately represent real-world conditions nor adequately address the issue of noisy data availability. For instance, a blurry image cannot be simply replaced with zero vectors, and still retain information. To tackle this issue and develop a more precise MER system, we introduce a novel noise-robust MER model that effectively learns robust multimodal joint representations from noisy data. This approach includes two pivotal components: firstly, a noise scheduler that adjusts the type and level of noise in the data to emulate various realistic incomplete situations. Secondly, a Variational AutoEncoder (VAE)-based module is employed to reconstruct these robust multimodal joint representations from the noisy inputs. Notably, the introduction of the noise scheduler enables the exploration of an entirely new type of incomplete data condition, which is impossible with existing methods. Extensive experimental evaluations on the benchmark datasets IEMOCAP and CMU-MOSEI demonstrate the effectiveness of the noise scheduler and the excellent performance of our proposed model.

Submitted 7 May, 2024; v1 submitted 21 September, 2023; originally announced November 2023.

arXiv:2310.20234 [pdf, other]

HEDNet: A Hierarchical Encoder-Decoder Network for 3D Object Detection in Point Clouds

Authors: Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Xiaolin Hu

Abstract: 3D object detection in point clouds is important for autonomous driving systems. A primary challenge in 3D object detection stems from the sparse distribution of points within the 3D scene. Existing high-performance methods typically employ 3D sparse convolutional neural networks with small kernels to extract features. To reduce computational costs, these methods resort to submanifold sparse convolutions, which prevent the information exchange among spatially disconnected features. Some recent approaches have attempted to address this problem by introducing large-kernel convolutions or self-attention mechanisms, but they either achieve limited accuracy improvements or incur excessive computational costs. We propose HEDNet, a hierarchical encoder-decoder network for 3D object detection, which leverages encoder-decoder blocks to capture long-range dependencies among features in the spatial space, particularly for large and distant objects. We conducted extensive experiments on the Waymo Open and nuScenes datasets. HEDNet achieved superior detection accuracy on both datasets than previous state-of-the-art methods with competitive efficiency. The code is available at https://github.com/zhanggang001/HEDNet.

Submitted 31 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.16924 [pdf, other]

Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors

Authors: Nikita Mehandru, Sweta Agrawal, Yimin Xiao, Elaine C Khoong, Ge Gao, Marine Carpuat, Niloufar Salehi

Abstract: A major challenge in the practical use of Machine Translation (MT) is that users lack guidance to make informed decisions about when to rely on outputs. Progress in quality estimation research provides techniques to automatically assess MT quality, but these techniques have primarily been evaluated in vitro by comparison against human judgments outside of a specific context of use. This paper evaluates quality estimation feedback in vivo with a human study simulating decision-making in high-stakes medical settings. Using Emergency Department discharge instructions, we study how interventions based on quality estimation versus backtranslation assist physicians in deciding whether to show MT outputs to a patient. We find that quality estimation improves appropriate reliance on MT, but backtranslation helps physicians detect more clinically harmful errors that QE alone often misses.

Submitted 25 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.11834 [pdf, other]

HB-net: Holistic bursting cell cluster integrated network for occluded multi-objects recognition

Authors: Xudong Gao, Xiao Guang Gao, Jia Rong, Xiaowei Chen, Xiang Liao, Jun Chen

Abstract: Within the realm of image recognition, a specific category of multi-label classification (MLC) challenges arises when objects within the visual field may occlude one another, demanding simultaneous identification of both occluded and occluding objects. Traditional convolutional neural networks (CNNs) can tackle these challenges; however, those models tend to be bulky and can only attain modest levels of accuracy. Leveraging insights from cutting-edge neural science research, specifically the Holistic Bursting (HB) cell, this paper introduces a pioneering integrated network framework named HB-net. Built upon the foundation of HB cell clusters, HB-net is designed to address the intricate task of simultaneously recognizing multiple occluded objects within images. Various Bursting cell cluster structures are introduced, complemented by an evidence accumulation mechanism. Testing is conducted on multiple datasets comprising digits and letters. The results demonstrate that models incorporating the HB framework exhibit a significant $2.98\%$ enhancement in recognition accuracy compared to models without the HB framework ($1.0298$ times, $p=0.0499$). Although in high-noise settings, standard CNNs exhibit slightly greater robustness when compared to HB-net models, the models that combine the HB framework and EA mechanism achieve a comparable level of accuracy and resilience to ResNet50, despite having only three convolutional layers and approximately $1/30$ of the parameters.
The findings of this study offer valuable insights for improving computer vision algorithms. The essential code is provided at https://github.com/d-lab438/hb-net.git.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.07718 [pdf]

doi 10.1109/MCOM.001.2300461

Long-term and Real-time High-speed Underwater Wireless Optical Communications in Deep Sea

Authors: Jialiang Zhang, Sujing Wang, Ziqi Ma, Guanjun Gao, Yonggang Guo, Fei Zhang, Shanguo Huang, Jie Zhang

Abstract: Seafloor observation networks can perform all-weather, long-term, continuous, real-time, and in-situ observation of the ocean by combining various observation methods including cabled seafloor nodes, self-contained nodes, and mobile platforms, for which reliable, long-term, high-speed underwater wireless communication becomes an essential demand. Recently, underwater wireless optical communication (UWOC) has emerged as a highly promising solution and is rapidly becoming a research hotspot for meeting this requirement. In this article, we demonstrate the experiment and application of a high-speed UWOC system for a deep-sea seafloor observation network. To the best of our knowledge, this is the first long-term real-time deep-sea UWOC link with a bitrate as high as 125 Mbps. Over a distance of 30 m and at a depth of 1650 m, two-way Ethernet UWOC links are realized with a 125 Mbps direction-adjustable green light link and a 6.25 Mbps non-line-of-sight (NLOS) blue light link. High-quality video transmission at 8K 30 FPS and 4K 120 FPS is realized through the high-speed 125 Mbps green light link, with 100% peak signal-to-noise ratio (PSNR) agreement, showing the capability of transmitting high-quality videos losslessly. The 30-day long-term measurement results show that the BER of both the 125 Mbps and 6.25 Mbps links is lower than $10^{-5}$, proving the stability and reliability of this UWOC system at a depth of 1650 m. The maximum transmission distances for the green and blue light links are estimated to be 117.7 m and 128.3 m when geometry loss is considered, which can be extended to 231.6 m and 337.5 m without geometry loss. As the first long-term and real-time UWOC system in the deep sea, we believe this demonstration can provide valuable experience for further UWOC studies and converged ocean observation networking with cabled and cable-less observation platforms.

Submitted 13 December, 2023; v1 submitted 23 July, 2023; originally announced October 2023.

arXiv:2310.07123 [pdf, other]

Off-Policy Evaluation for Human Feedback

Authors: Qitong Gao, Ge Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic

Abstract: Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available, as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimates challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.

Submitted 14 October, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted to NeurIPS 2023
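
As background for the "existing OPE methods" mentioned in the abstract above, the following is a minimal sketch of one classical estimator, per-trajectory importance sampling. It is not the OPEHF framework itself. The function name, the `(state, action, reward)` trajectory layout, and the two probability callbacks are assumptions made for illustration.

```python
def importance_sampling_estimate(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Estimate the target policy's expected return from offline trajectories.

    trajectories: list of trajectories, each a list of (state, action, reward)
                  tuples collected under the behavior policy (assumed layout).
    target_prob / behavior_prob: callables (state, action) -> probability.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0        # cumulative likelihood ratio for this trajectory
        discounted = 0.0    # discounted return observed in the logged data
        for t, (state, action, reward) in enumerate(trajectory):
            weight *= target_prob(state, action) / behavior_prob(state, action)
            discounted += (gamma ** t) * reward
        total += weight * discounted
    return total / len(trajectories)


# Hypothetical logged data: two short trajectories with binary actions.
logged = [
    [("s", 0, 1.0), ("s", 1, 0.0)],
    [("s", 1, 2.0)],
]
uniform = lambda state, action: 0.5                          # behavior policy
always_one = lambda state, action: 1.0 if action == 1 else 0.0  # target policy

on_policy_value = importance_sampling_estimate(logged, uniform, uniform)
off_policy_value = importance_sampling_estimate(logged, always_one, uniform)
```

When the target and behavior policies coincide, every weight is 1 and the estimate reduces to the average logged return. Sparse signals such as human feedback leave most of the reward terms empty, which is one way to see why the abstract calls direct application of such estimators challenging.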

arXiv:2310.06534 [pdf]

Disk failure prediction based on multi-layer domain adaptive learning

Authors: Guangfu Gao, Peng Wu, Hussain Dawood

Abstract: Large-scale data storage is susceptible to failure. As disks are damaged and replaced, traditional machine learning models, which rely on historical data to make predictions, struggle to accurately predict disk failures. This paper presents a novel method for predicting disk failures by leveraging multi-layer domain adaptive learning techniques. First, disk data with numerous faults is selected as the source domain, and disk data with fewer faults is selected as the target domain. The feature extraction network is then trained on the selected source and target domains. The contrast between the two domains facilitates the transfer of diagnostic knowledge from the source domain to the target domain. The experimental findings demonstrate that the proposed technique can generate a reliable prediction model and improve the ability to predict failures on disk data with few failure samples.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.06368 [pdf, other]

CoinSeg: Contrast Inter- and Intra- Class Representations for Incremental Segmentation

Authors: Zekang Zhang, Guangyu Gao, Jianbo Jiao, Chi Harold Liu, Yunchao Wei

Abstract: Class incremental semantic segmentation aims to strike a balance between the model's stability and plasticity by maintaining old knowledge while adapting to new concepts. However, most state-of-the-art methods use the freeze strategy for stability, which compromises the model's plasticity. In contrast, releasing parameter training for plasticity could lead to the best performance for all categories, but this requires discriminative feature representation. Therefore, we prioritize the model's plasticity and propose the Contrast inter- and intra-class representations for Incremental Segmentation (CoinSeg), which pursues discriminative representations for flexible parameter tuning. Inspired by the Gaussian mixture model that samples from a mixture of Gaussian distributions, CoinSeg emphasizes intra-class diversity with multiple contrastive representation centroids. Specifically, we use mask proposals to identify regions with strong objectness that are likely to be diverse instances/centroids of a category. These mask proposals are then used for contrastive representations to reinforce intra-class diversity. Meanwhile, to avoid bias from intra-class diversity, we also apply category-level pseudo-labels to enhance category-level consistency and inter-category diversity. Additionally, CoinSeg ensures the model's stability and alleviates forgetting through a specific flexible tuning strategy. We validate CoinSeg on Pascal VOC 2012 and ADE20K datasets with multiple incremental scenarios and achieve superior results compared to previous state-of-the-art methods, especially in more challenging and realistic long-term scenarios. Code is available at https://github.com/zkzhang98/CoinSeg.

Submitted 10 October, 2023; originally announced October 2023.

Comments: Accepted by ICCV 2023

arXiv:2310.05353 [pdf, ps, other]

Complexity of null dynamical systems and Sauer--Shelah lemmas

Authors: Guorong Gao, Jie Ma, Mingyuan Rong, Tuan Tran

Abstract: The topological entropy of a topological dynamical system, introduced in a foundational paper by Adler, Konheim and McAndrew [Trans. Am. Math. Soc., 1965], is a nonnegative number that measures the uncertainty or disorder of the system. Compared with positive entropy systems, zero entropy systems are much less understood. In order to distinguish between zero entropy systems, Huang and Ye [Adv. Math., 2009] introduced the concept of maximal pattern entropy of a topological dynamical system. At the heart of their analysis is a Sauer-Shelah type lemma. In the present paper, we provide a shorter and more conceptual proof of a strengthening of this lemma, and discuss its surprising connections to dynamical systems, combinatorics and a recent breakthrough in communication complexity. We also improve one of the main results of Huang and Ye on the maximal pattern entropy of zero-dimensional systems, by proving a new Sauer-Shelah type lemma, which unifies and enhances various extremal results on VC-dimension, Natarajan dimension and Steele dimension.

Submitted 11 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.
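
For orientation, the classical Sauer-Shelah lemma, which the "Sauer-Shelah type" results in the abstract above build on, can be stated as follows. This is the standard textbook statement, not the strengthened lemma proved in the paper; the symbols $\mathcal{F}$, $n$ and $d$ are notation chosen here.

```latex
Let $\mathcal{F} \subseteq 2^{[n]}$ be a family of subsets of
$[n] = \{1, \dots, n\}$ whose VC-dimension is at most $d$. Then
\[
  |\mathcal{F}| \;\le\; \sum_{i=0}^{d} \binom{n}{i}.
\]
```

The bound is tight: the family of all subsets of $[n]$ of size at most $d$ has VC-dimension $d$ and attains equality.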

arXiv:2310.04407 [pdf, other]

Policy-Gradient Training of Language Models for Ranking

Authors: Ge Gao, Jonathan D. Chang, Claire Cardie, Kianté Brantley, Thorsten Joachim

Abstract: Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines, ranging from chat-based web search to question answering systems. Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance, but training LLM-based retrievers via typical contrastive losses requires intricate heuristics, including selecting hard negatives and using additional supervision as learning signals. This reliance on heuristics stems from the fact that the contrastive loss itself is heuristic and does not directly optimize the downstream metrics of decision quality at the end of the processing pipeline. To address this issue, we introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating an LLM as a Plackett-Luce ranking policy. Neural PG-RANK provides a principled method for end-to-end training of retrieval models as part of larger decision systems via policy gradient, with little reliance on complex heuristics, and it effectively unifies the training objective with downstream decision-making quality. We conduct extensive experiments on various text retrieval benchmarks. The results demonstrate that when the training objective aligns with the evaluation setup, Neural PG-RANK yields remarkable in-domain performance improvement, with substantial out-of-domain generalization to some critical datasets employed in downstream question answering tasks.

Submitted 6 October, 2023; originally announced October 2023.
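
The Plackett-Luce ranking policy named in the abstract above can be illustrated with a small, generic sketch. It assumes the model has already produced one relevance score per candidate document; the function names are hypothetical, and this is the textbook Plackett-Luce distribution with a score-function (REINFORCE-style) gradient, not the paper's training code.

```python
import math
import random


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (a permutation of item indices).

    Plackett-Luce fills positions top to bottom; at each position the next
    item is drawn with softmax probability over the items not yet placed.
    """
    log_prob = 0.0
    for pos, item in enumerate(ranking):
        rest = ranking[pos:]
        peak = max(scores[j] for j in rest)  # subtract max for stability
        log_norm = peak + math.log(sum(math.exp(scores[j] - peak) for j in rest))
        log_prob += scores[item] - log_norm
    return log_prob


def sample_ranking(scores, rng):
    """Draw one ranking from the Plackett-Luce distribution."""
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        peak = max(scores[j] for j in remaining)
        weights = [math.exp(scores[j] - peak) for j in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def grad_log_prob(scores, ranking):
    """Gradient of the log-probability with respect to each score."""
    grad = [0.0] * len(scores)
    for pos, item in enumerate(ranking):
        rest = ranking[pos:]
        peak = max(scores[j] for j in rest)
        exps = {j: math.exp(scores[j] - peak) for j in rest}
        norm = sum(exps.values())
        grad[item] += 1.0
        for j in rest:
            grad[j] -= exps[j] / norm
    return grad


example_scores = [2.0, 0.5, -1.0]
sampled = sample_ranking(example_scores, random.Random(0))
```

A policy-gradient step would weight `grad_log_prob(scores, sampled)` by the utility of the sampled ranking (for example, a downstream ranking metric) and average over several samples; how the scores are produced and how variance is reduced are left out of this sketch.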

arXiv:2309.15490 [pdf, other]

Survey on Deep Face Restoration: From Non-blind to Blind and Beyond

Authors: Wenjie Li, Mei Wang, Kai Zhang, Juncheng Li, Xiaoming Li, Yuhang Zhang, Guangwei Gao, Weihong Deng, Chia-Wen Lin

Abstract: Face restoration (FR) is a specialized field within image restoration that aims to recover low-quality (LQ) face images into high-quality (HQ) face images. Recent advances in deep learning technology have led to significant progress in FR methods. In this paper, we begin by examining the prevalent factors responsible for real-world LQ images and introduce degradation techniques used to synthesize LQ images. We also discuss notable benchmarks commonly utilized in the field. Next, we categorize FR methods based on different tasks and explain their evolution over time. Furthermore, we explore the various facial priors commonly utilized in the restoration process and discuss strategies to enhance their effectiveness. In the experimental section, we thoroughly evaluate the performance of state-of-the-art FR methods across various tasks using a unified benchmark. We analyze their performance from different perspectives. Finally, we discuss the challenges faced in the field of FR and propose potential directions for future advancements. The open-source repository corresponding to this work can be found at https://github.com/24wenjie-li/Awesome-Face-Restoration.

Submitted 8 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Face restoration, Survey, Deep learning, Non-blind/Blind, Joint restoration tasks, Facial priors

arXiv:2309.09357 [pdf, other]

Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model

Authors: Ziqi Yang, Xuhai Xu, Bingsheng Yao, Shao Zhang, Ethan Rogers, Stephen Intille, Nawar Shara, Guodong Gordon Gao, Dakuo Wang

Abstract: Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their powerful natural conversation and summarization capability. However, there is a limited understanding of LLMs' role during the communication. We first conducted two interview studies with both older adults (N=10) and healthcare providers (N=9) to understand their needs and opportunities for LLMs in patient-provider asynchronous communication. Based on the insights, we built an LLM-powered communication system, Talk2Care, and designed interactive components for both groups: (1) For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection. (2) For health providers, we built an LLM-based dashboard to summarize and present important health information based on older adults' conversations with the VA. We further conducted two user studies with older adults and providers to evaluate the usability of the system. The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time. We envision our work as an initial exploration of LLMs' capability in the intersection of healthcare and interpersonal communication.

Submitted 3 February, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: Under submission to IMWUT'23, 26 pages

MSC Class: 68U35 ACM Class: H.5.2; I.2.7

arXiv:2309.08649 [pdf]

An inspection technology of inner surface of the fine hole based on machine vision

Authors: Rongfang He, Weibin Zhang, Guofang Gao

Abstract: Fine holes are an important structural feature of industrial components, and their inner surface quality is closely related to their function. In order to inspect the quality of the inner surface of a fine hole, a special optical measurement system is investigated in this paper. A sight pipe is employed to guide the external illumination light into the fine hole and output the relevant images simultaneously. A flexible light array is introduced to suit the narrow space, and the effective field of view is analyzed. In addition, the arc surface projection error and the manufacturing assembly error of the device are analyzed, then compensated or ignored if small enough. In tests on prefabricated circular defects (diameters φ0.1 mm and φ0.2 mm, 0.4 mm distance distribution) and fissure defects with a width of 0.3 mm, the maximum standard deviations of the measurement error are all about 10 μm. The minimum diameter of the measured fine hole is 4 mm and the depth can reach 47 mm.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.16360 [pdf, other]

Emoji Promotes Developer Participation and Issue Resolution on GitHub

Authors: Yuhang Zhou, Xuan Lu, Ge Gao, Qiaozhu Mei, Wei Ai

Abstract: Although remote working has been increasingly adopted during the pandemic, many are concerned by its low efficiency. Missing in text-based communication are non-verbal cues such as facial expressions and body language, which hinders effective communication and negatively impacts work outcomes. Prevalent on social media platforms, emojis, as alternative non-verbal cues, are gaining popularity in virtual workspaces as well. In this paper, we study how emoji usage influences developer participation and issue resolution in virtual workspaces. To this end, we collect GitHub issues for a one-year period and apply causal inference techniques to measure the causal effect of emojis on the outcome of issues, controlling for confounders such as issue content, repository, and author information. We find that emojis can significantly reduce the resolution time of issues and attract more user participation. We also compare the heterogeneous effect on different types of issues. These findings deepen our understanding of the developer communities, and they provide design implications on how to facilitate interactions and broaden developer participation.

Submitted 16 April, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted by the 18th International AAAI Conference on Web and Social Media (ICWSM 2024)

arXiv:2307.07848 [pdf, ps, other]

Fully Scalable MPC Algorithms for Clustering in High Dimension

Authors: Artur Czumaj, Guichen Gao, Shaofeng H. -C. Jiang, Robert Krauthgamer, Pavel Veselý

Abstract: We design new parallel algorithms for clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model, and are fully scalable, meaning that the local memory in each machine may be $n^σ$ for arbitrarily small fixed $σ>0$. Importantly, the local memory may be substantially smaller than the number of clusters $k$, yet all our algorithms are fast, i.e., run in $O(1)$ rounds. We first devise a fast MPC algorithm for $O(1)$-approximation of uniform facility location. This is the first fully-scalable MPC algorithm that achieves $O(1)$-approximation for any clustering problem in a general geometric setting; previous algorithms only provide $\mathrm{poly}(\log n)$-approximation or apply to restricted inputs, like low dimension or a small number of clusters $k$; e.g. [Bhaskara and Wijewardena, ICML'18; Cohen-Addad et al., NeurIPS'21; Cohen-Addad et al., ICML'22]. We then build on this facility location result and devise a fast MPC algorithm that achieves $O(1)$-bicriteria approximation for $k$-Median and for $k$-Means, namely, it computes $(1+\varepsilon)k$ clusters of cost within $O(1/\varepsilon^2)$-factor of the optimum for $k$ clusters. A primary technical tool that we introduce, and that may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing for every data point a statistic of its approximate neighborhood, for statistics like range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension, and is based on consistent hashing (aka sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].

Submitted 14 November, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

arXiv:2307.04692 [pdf, other]

Spoofing-Resilient LiDAR-GPS Factor Graph Localization with Chimera Authentication

Authors: Adam Dai, Tara Minda, Ashwin Kanhere, Grace Gao

Abstract: Many vehicle platforms typically use sensors such as LiDAR or camera for locally-referenced navigation with GPS for globally-referenced navigation. However, due to the unencrypted nature of GPS signals, all civilian users are vulnerable to spoofing attacks, where a malicious spoofer broadcasts fabricated signals and causes the user to track a false position fix. To protect against such GPS spoofing attacks, Chips-Message Robust Authentication (Chimera) has been developed and will be tested on the Navigation Technology Satellite 3 (NTS-3) satellite being launched later this year. However, Chimera authentication is not continuously available and may not provide sufficient protection for vehicles which rely on more frequent GPS measurements. In this paper, we propose a factor graph-based state estimation framework which integrates LiDAR and GPS while simultaneously detecting and mitigating spoofing attacks experienced between consecutive Chimera authentications. Our proposed framework combines GPS pseudorange measurements with LiDAR odometry to provide a robust navigation solution. A chi-squared detector, based on pseudorange residuals, is used to detect and mitigate any potential GPS spoofing attacks. We evaluate our method using real-world LiDAR data from the KITTI dataset and simulated GPS measurements, both nominal and with spoofing. Across multiple trajectories and Monte Carlo runs, our method consistently achieves position errors under 5 m during nominal conditions, and successfully bounds positioning error to within odometry drift levels during spoofed conditions.

Submitted 10 July, 2023; originally announced July 2023.
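
The chi-squared detector on pseudorange residuals mentioned in the abstract above can be sketched generically as follows. This is a minimal illustration, not the paper's detector: it assumes independent zero-mean Gaussian measurement noise with known standard deviations, so that the normalized residual sum follows a chi-squared distribution whose degrees of freedom equal the number of residuals. The function names, the sample residuals, and the 2 m noise level are all hypothetical.

```python
def chi_squared_statistic(residuals, sigmas):
    """Sum of squared residuals, each normalized by its noise std. dev."""
    if len(residuals) != len(sigmas):
        raise ValueError("residuals and sigmas must have the same length")
    return sum((r / s) ** 2 for r, s in zip(residuals, sigmas))


def spoofing_detected(residuals, sigmas, threshold):
    """Flag the measurement epoch when the statistic exceeds the threshold."""
    return chi_squared_statistic(residuals, sigmas) > threshold


# Approximate 0.99 quantile of the chi-squared distribution with 4 degrees
# of freedom (standard table value), i.e. a 1% false-alarm rate under the
# noise assumption stated above.
THRESHOLD_4DOF_99 = 13.277

# Hypothetical pseudorange residuals in metres for four satellites.
sigmas = [2.0, 2.0, 2.0, 2.0]
nominal_residuals = [1.2, -0.8, 0.5, -1.5]
spoofed_residuals = [9.0, -7.5, 8.2, 10.1]

nominal_flag = spoofing_detected(nominal_residuals, sigmas, THRESHOLD_4DOF_99)
spoofed_flag = spoofing_detected(spoofed_residuals, sigmas, THRESHOLD_4DOF_99)
```

In a full estimator the residuals would come from comparing measured pseudoranges with those predicted from the current state estimate, and the degrees of freedom would need to account for the estimated states; both are simplified away here.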

arXiv:2306.14580 [pdf, other]

TransERR: Translation-based Knowledge Graph Embedding via Efficient Relation Rotation

Authors: Jiang Li, Xiangdong Su, Fujun Zhang, Guanglai Gao

Abstract: This paper presents a translation-based knowledge graph embedding method via efficient relation rotation (TransERR), a straightforward yet effective alternative to traditional translation-based knowledge graph embedding models. Different from the previous translation-based models, TransERR encodes knowledge graphs in the hypercomplex-valued space, thus enabling it to possess a higher degree of translation freedom in mining latent information between the head and tail entities. To further minimize the translation distance, TransERR adaptively rotates the head entity and the tail entity with their corresponding unit quaternions, which are learnable in model training. We also provide mathematical proofs to demonstrate the ability of TransERR in modeling various relation patterns, including symmetry, antisymmetry, inversion, composition, and subrelation patterns. The experiments on 10 benchmark datasets validate the effectiveness and the generalization of TransERR. The results also indicate that TransERR can better encode large-scale datasets with fewer parameters than the previous translation-based models. Our code and datasets are available at https://github.com/dellixx/TransERR.

Submitted 9 March, 2024; v1 submitted 26 June, 2023; originally announced June 2023.
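
The idea in the abstract above of rotating head and tail entities with unit quaternions before measuring a translation distance can be sketched as below. The exact scoring function is an assumption here (rotate the head, add the relation translation, subtract the rotated tail), and each embedding is reduced to a single quaternion for readability; the helper names are hypothetical and this is not the authors' released code.

```python
import math


def hamilton(p, q):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def norm(q):
    return math.sqrt(sum(x * x for x in q))


def normalize(q):
    """Scale a non-zero quaternion to unit length so it acts as a rotation."""
    n = norm(q)
    return tuple(x / n for x in q)


def translation_distance(head, rot_head, relation, tail, rot_tail):
    """Assumed score: || head * rot_head + relation - tail * rot_tail ||."""
    rotated_head = hamilton(head, normalize(rot_head))
    rotated_tail = hamilton(tail, normalize(rot_tail))
    diff = tuple(h + r - t for h, r, t in zip(rotated_head, relation, rotated_tail))
    return norm(diff)


identity = (1.0, 0.0, 0.0, 0.0)
head = (0.3, -0.1, 0.7, 0.2)
tail = (0.5, 0.4, 0.6, -0.3)
relation = tuple(t - h for h, t in zip(head, tail))

# With identity rotations the score reduces to the plain translation ||h + r - t||.
plain_distance = translation_distance(head, identity, relation, tail, identity)
```

Because multiplying by a unit quaternion preserves the norm, the rotations change direction but not scale, which is what allows them to be learned freely alongside the translation vector.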

arXiv:2306.09818 [pdf, other]

doi 10.5555/3666122.3669299

HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation

Authors: Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull

Abstract: Learning-based video compression is currently a popular research topic, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limits their representation capability. In this paper, we propose HiNeRV, an INR that combines lightweight layers with novel hierarchical positional encodings. We employ depth-wise convolutional, MLP and interpolation layers to build a deep and wide network architecture with high capacity. HiNeRV is also a unified representation encoding videos in both frames and patches at the same time, which offers higher performance and flexibility than existing methods. We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).

Submitted 26 January, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.17193 [pdf]

AI-based analysis of super-resolution microscopy: Biological discovery in the absence of ground truth

Authors: Ivan R. Nabi, Ben Cardoen, Ismail M. Khater, Guang Gao, Timothy H. Wong, Ghassan Hamarneh

Abstract: Super-resolution microscopy, or nanoscopy, enables the use of fluorescent-based molecular localization tools to study molecular structure at the nanoscale level in the intact cell, bridging the mesoscale gap to classical structural biology methodologies. Analysis of super-resolution data by artificial intelligence (AI), such as machine learning, offers tremendous potential for discovery of new biology, that, by definition, is not known and lacks ground truth. Herein, we describe the application of weakly supervised paradigms to super-resolution microscopy and its potential to enable the accelerated exploration of the nanoscale architecture of subcellular macromolecules and organelles.

Submitted 27 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 26 pages, 4 figures

arXiv:2305.16353 [pdf, other]

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

Authors: Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li

Abstract: Audio Deepfake Detection (ADD), an emerging topic, aims to detect fake audio generated by text-to-speech (TTS), voice conversion (VC), replay, etc. Traditionally we take the mono signal as input and focus on robust feature extraction and effective classifier design. However, the dual-channel stereo information in the audio signal also includes important cues for deepfake detection, which has not been studied in prior work. In this paper, we propose a novel ADD model, termed M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process. We first project the mono signal to a stereo signal using a pretrained stereo synthesizer, then employ a dual-branch neural architecture to process the left and right channel signals, respectively. In this way, we effectively reveal the artifacts in the fake audio, thus improving the ADD performance. The experiments on the ASVspoof2019 database show that M2S-ADD outperforms all baselines that take mono input. We release the source code at https://github.com/AI-S2-Lab/M2S-ADD.

Submitted 24 May, 2023; originally announced May 2023.

Comments: To appear at InterSpeech2023

arXiv:2305.12473 [pdf, other]

Continually Improving Extractive QA via Human Feedback

Authors: Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi

Abstract: We study continually improving an extractive question answering (QA) system via human user feedback. We design and deploy an iterative approach, where information-seeking users ask questions, receive model-predicted answers, and provide feedback. We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time. Our experiments show effective improvement from user feedback of extractive QA models over time across different data regimes, including significant potential for domain adaptation.

Submitted 3 November, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

Comments: EMNLP 2023

arXiv:2305.10201 [pdf]

Echoes of Biases: How Stigmatizing Language Affects AI Performance

Authors: Yizhi Liu, Weiguang Wang, Guodong Gordon Gao, Ritu Agarwal

Abstract: Electronic health records (EHRs) serve as an essential data source for the envisioned artificial intelligence (AI)-driven transformation in healthcare. However, clinician biases reflected in EHR notes can lead to AI models inheriting and amplifying these biases, perpetuating health disparities. This study investigates the impact of stigmatizing language (SL) in EHR notes on mortality prediction using a Transformer-based deep learning model and explainable AI (XAI) techniques. Our findings demonstrate that SL written by clinicians adversely affects AI performance, particularly so for black patients, highlighting SL as a source of racial disparity in AI model development. To explore an operationally efficient way to mitigate SL's impact, we investigate patterns in the generation of SL through a clinicians' collaborative network, identifying central clinicians as having a stronger impact on racial disparity in the AI model. We find that removing SL written by central clinicians is a more efficient bias reduction strategy than eliminating all SL in the entire corpus of data. This study provides actionable insights for responsible AI development and contributes to understanding clinician behavior and EHR note writing in healthcare.

Submitted 12 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: 54 pages, 9 figures

Showing 1–50 of 167 results for author: Gao, G