Showing 1–50 of 244 results for author: Gong, M

Searching in archive cs.
  1. arXiv:2410.06407  [pdf, other]

    cs.LG stat.ME stat.ML

    A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery

    Authors: Yingyu Lin, Yuxing Huang, Wenqin Liu, Haoran Deng, Ignavier Ng, Kun Zhang, Mingming Gong, Yi-An Ma, Biwei Huang

    Abstract: Real-world data often violates the equal-variance assumption (homoscedasticity), making it essential to account for heteroscedastic noise in causal discovery. In this work, we explore heteroscedastic symmetric noise models (HSNMs), where the effect $Y$ is modeled as $Y = f(X) + \sigma(X)N$, with $X$ as the cause and $N$ as independent noise following a symmetric distribution. We introduce a novel crite…

    Submitted 8 October, 2024; originally announced October 2024.

  2. arXiv:2410.05041  [pdf]

    cs.CV cs.LG

    Systematic Literature Review of Vision-Based Approaches to Outdoor Livestock Monitoring with Lessons from Wildlife Studies

    Authors: Stacey D. Scott, Zayn J. Abbas, Feerass Ellid, Eli-Henry Dykhne, Muhammad Muhaiminul Islam, Weam Ayad, Kristina Kacmorova, Dan Tulpan, Minglun Gong

    Abstract: Precision livestock farming (PLF) aims to improve the health and welfare of livestock animals and farming outcomes through the use of advanced technologies. Computer vision, combined with recent advances in machine learning and deep learning artificial intelligence approaches, offers a possible solution to the PLF ideal of 24/7 livestock monitoring that helps facilitate early detection of animal h…

    Submitted 7 October, 2024; originally announced October 2024.

    Comments: 28 pages, 5 figures, 2 tables

    Report number: CSL-2024-01

    ACM Class: I.2.10; I.2.6; J.7

  3. arXiv:2409.17547  [pdf, other]

    cs.CV cs.AI

    Triple Point Masking

    Authors: Jiaming Liu, Linghe Kong, Yue Wu, Maoguo Gong, Hao Li, Qiguang Miao, Wenping Ma, Can Qin

    Abstract: Existing 3D mask learning methods encounter performance bottlenecks under limited data, and our objective is to overcome this limitation. In this paper, we introduce a triple point masking scheme, named TPM, which serves as a scalable framework for pre-training of masked autoencoders to achieve multi-mask learning for 3D point clouds. Specifically, we augment the baselines with two additional mask…

    Submitted 15 October, 2024; v1 submitted 26 September, 2024; originally announced September 2024.

  4. arXiv:2409.12207  [pdf, other]

    cs.GR

    Architectural Co-LOD Generation

    Authors: Runze Zhang, Shanshan Pan, Chenlei Lv, Minglun Gong, Hui Huang

    Abstract: Managing the level-of-detail (LOD) in architectural models is crucial yet challenging, particularly for effective representation and visualization of buildings. Traditional approaches often fail to deliver controllable detail alongside semantic consistency, especially when dealing with noisy and inconsistent inputs. We address these limitations with Co-LOD, a new approach specifically desig…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: ACM Transactions on Graphics (SIGGRAPH Asia 2024); Project page: https://vcc.tech/research/2024/CoLOD

  5. arXiv:2408.13498  [pdf, other]

    cs.LG

    Rethinking State Disentanglement in Causal Reinforcement Learning

    Authors: Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of al…

    Submitted 24 August, 2024; originally announced August 2024.

  6. arXiv:2408.11564  [pdf, other]

    cs.CV

    AutoDirector: Online Auto-scheduling Agents for Multi-sensory Composition

    Authors: Minheng Ni, Chenfei Wu, Huaying Yuan, Zhengyuan Yang, Ming Gong, Lijuan Wang, Zicheng Liu, Wangmeng Zuo, Nan Duan

    Abstract: With the advancement of generative models, the synthesis of different sensory elements such as music, visuals, and speech has achieved significant realism. However, the approach to generate multi-sensory outputs has not been fully explored, limiting the application on high-value scenarios such as of directing a film. Developing a movie director agent faces two major challenges: (1) Lack of paralle…

    Submitted 21 August, 2024; originally announced August 2024.

  7. arXiv:2408.11225  [pdf, ps, other]

    cs.DS cs.DM

    Approximately covering vertices by order-$5$ or longer paths

    Authors: Mingyang Gong, Zhi-Zhong Chen, Guohui Lin, Lusheng Wang

    Abstract: This paper studies $MPC^{5+}_v$, which is to cover as many vertices as possible in a given graph $G=(V,E)$ by vertex-disjoint $5^+$-paths (i.e., paths each with at least five vertices). $MPC^{5+}_v$ is NP-hard and admits an existing local-search-based approximation algorithm which achieves a ratio of $\frac{19}{7} \approx 2.714$ and runs in $O(|V|^6)$ time. In this paper, we present a new approximat…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Full version of an extended abstract in COCOON 2024

  8. arXiv:2408.08342  [pdf, other]

    cs.GR cs.CV

    CT4D: Consistent Text-to-4D Generation with Animatable Meshes

    Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

    Abstract: Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user…

    Submitted 15 August, 2024; originally announced August 2024.

  9. arXiv:2407.10132  [pdf, other]

    cs.LG stat.ME

    Optimal Kernel Choice for Score Function-based Causal Discovery

    Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

    Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appr…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML2024

  10. arXiv:2407.09733  [pdf, other]

    cs.CV

    Textured-GS: Gaussian Splatting with Spatially Defined Color and Opacity

    Authors: Zhentao Huang, Minglun Gong

    Abstract: In this paper, we introduce Textured-GS, an innovative method for rendering Gaussian splatting that incorporates spatially defined color and opacity variations using Spherical Harmonics (SH). This approach enables each Gaussian to exhibit a richer representation by accommodating varying colors and opacities across its surface, significantly enhancing rendering quality compared to traditional metho…

    Submitted 10 September, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: 9 pages

    ACM Class: I.4.0

  11. arXiv:2406.14434  [pdf, other]

    cs.CL

    Towards Truthful Multilingual Large Language Models: Benchmarking and Alignment Strategies

    Authors: Weihao Liu, Ning Wu, Wenbiao Ding, Shining Liang, Ming Gong, Dongmei Zhang

    Abstract: In the era of large language models (LLMs), building multilingual large language models (MLLMs) that can serve users worldwide holds great significance. However, existing research seldom focuses on the truthfulness of MLLMs. Meanwhile, contemporary multilingual aligning technologies struggle to balance massive languages and often exhibit serious truthfulness gaps across different languages, especi…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages

  12. arXiv:2406.14275  [pdf, other]

    cs.CL cs.AI

    Step-Back Profiling: Distilling User History for Personalized Scientific Writing

    Authors: Xiangru Tang, Xingyao Zhang, Yanjun Shao, Jie Wu, Yilun Zhao, Arman Cohan, Ming Gong, Dongmei Zhang, Mark Gerstein

    Abstract: Large language models (LLM) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we introduce STEP-BACK PROFILING to personalize LLMs by distilling user history into concise profiles, including essential traits and preferences of users. To…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  13. arXiv:2406.13327  [pdf, other]

    cs.CV

    Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

    Authors: Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

    Abstract: While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, w…

    Submitted 19 June, 2024; originally announced June 2024.

  14. arXiv:2406.13272  [pdf, other]

    cs.CV

    AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models

    Authors: Ken Chen, Sachith Seneviratne, Wei Wang, Dongting Hu, Sanjay Saha, Md. Tarek Hasan, Sanka Rasnayaka, Tamasha Malepathirana, Mingming Gong, Saman Halgamuge

    Abstract: Face reenactment refers to the process of transferring the pose and facial expressions from a reference (driving) video onto a static facial (source) image while maintaining the original identity of the source image. Previous research in this domain has made significant progress by training controllable deep generative models to generate faces based on specific identity, pose and expression condit…

    Submitted 19 June, 2024; originally announced June 2024.

  15. arXiv:2406.09383  [pdf, other]

    cs.CV

    Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

    Authors: Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

    Abstract: Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capab…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024

  16. arXiv:2406.05855  [pdf, other]

    cs.LG cs.AI stat.ML

    Self-Distilled Disentangled Learning for Counterfactual Prediction

    Authors: Xinshu Li, Mingming Gong, Lina Yao

    Abstract: The advancements in disentangled representation learning significantly enhance the accuracy of counterfactual predictions by granting precise control over instrumental variables, confounders, and adjustable variables. An appealing method for achieving the independent separation of these factors is mutual information minimization, a task that presents challenges in numerous machine learning scenari…

    Submitted 14 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  17. arXiv:2406.05485  [pdf, other]

    cs.CV

    Training-Free Robust Interactive Video Object Segmentation

    Authors: Xiaoli Wei, Zhaoqing Wang, Yandong Guo, Chunxia Zhang, Tongliang Liu, Mingming Gong

    Abstract: Interactive video object segmentation is a crucial video task, having various applications from video editing to data annotating. However, current approaches struggle to accurately segment objects across diverse domains. Recently, Segment Anything Model (SAM) introduces interactive visual prompts and demonstrates impressive performance across different domains. In this paper, we propose a training…

    Submitted 8 June, 2024; originally announced June 2024.

  18. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.02191  [pdf, other]

    stat.ML cs.LG

    On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

    Authors: Shunxing Fan, Mingming Gong, Kun Zhang

    Abstract: We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such insta…

    Submitted 9 September, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  20. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  21. arXiv:2405.03711  [pdf, other]

    cs.LG cs.AI cs.NE eess.SY

    Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

    Authors: Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

    Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates g…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted to appear on IEEE Access, Mar. 2024

    Journal ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

  22. arXiv:2404.00362  [pdf, other]

    cs.CV eess.IV

    STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario

    Authors: Renyang Liu, Kwok-Yan Lam, Wei Zhou, Sixing Wu, Jun Zhao, Dongting Hu, Mingming Gong

    Abstract: Many attack techniques have been proposed to explore the vulnerability of DNNs and further help to improve their robustness. Despite the significant progress made recently, existing black-box attack methods still suffer from unsatisfactory performance due to the vast number of queries needed to optimize desired perturbations. Besides, the other critical challenge is that adversarial examples built…

    Submitted 30 March, 2024; originally announced April 2024.

  23. arXiv:2403.18038  [pdf]

    cs.CV

    TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images

    Authors: Liping Yang, Joshua Driscol, Ming Gong, Shujie Wang, Catherine G. Potts

    Abstract: Line detection is a classic and essential problem in image processing, computer vision and machine intelligence. Line detection has many important applications, including image vectorization (e.g., document recognition and art design), indoor mapping, and important societal challenges (e.g., sea ice fracture line extraction from satellite imagery). Many line detection algorithms and methods have b…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Our TGGLinesPlus Python implementation is open source. 27 pages, 8 figures and 4 tables

  24. arXiv:2403.16502  [pdf, other]

    cs.CV

    Medical Image Registration and Its Application in Retinal Images: A Review

    Authors: Qiushi Nie, Xiaoqing Zhang, Yan Hu, Mingdao Gong, Jiang Liu

    Abstract: Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information of images, which may be captured under different times, angles, or modalities. Although several surveys have reviewed the development of medical image registration, these surveys have not systematically summarized methodologies of existing medical image registration methods. To thi…

    Submitted 25 March, 2024; originally announced March 2024.

  25. arXiv:2403.15711  [pdf, other]

    cs.LG stat.ME stat.ML

    Identifiable Latent Neural Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Mingming Gong, Biwei Huang, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Causal representation learning seeks to uncover latent, high-level causal representations from low-level observed data. It is particularly good at predictions under unseen distribution shifts, because these shifts can generally be interpreted as consequences of interventions. Hence leveraging seen distribution shifts becomes a natural strategy to help identifying causal representations, which in…

    Submitted 23 March, 2024; originally announced March 2024.

  26. arXiv:2403.01698  [pdf, other]

    cs.CL cs.AI

    Hypertext Entity Extraction in Webpage

    Authors: Yifei Yang, Tianqiao Liu, Bo Shao, Hai Zhao, Linjun Shou, Ming Gong, Daxin Jiang

    Abstract: Webpage entity extraction is a fundamental natural language processing task in both research and applications. Nowadays, the majority of webpage entity extraction models are trained on structured datasets which strive to retain textual content and its structure information. However, existing datasets all overlook the rich hypertext features (e.g., font color, font size) which show their effectiven…

    Submitted 3 March, 2024; originally announced March 2024.

  27. arXiv:2403.00782  [pdf, other]

    q-fin.ST cs.AI cs.CL

    Ploutos: Towards interpretable stock movement prediction with financial large language model

    Authors: Hanshuang Tong, Jun Li, Ning Wu, Ming Gong, Dongmei Zhang, Qi Zhang

    Abstract: Recent advancements in large language models (LLMs) have opened new pathways for many domains. However, the full potential of LLMs in financial investments remains largely untapped. There are two main challenges for typical deep learning-based methods for quantitative finance. First, they struggle to fuse textual and numerical information flexibly for stock movement prediction. Second, traditional…

    Submitted 18 February, 2024; originally announced March 2024.

    Comments: 8 pages, 4 figures

  28. arXiv:2402.19014  [pdf, other]

    cs.CV

    Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models

    Authors: Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun

    Abstract: Recently, the advent of Large Visual-Language Models (LVLMs) has received increasing attention across various domains, particularly in the field of visual document understanding (VDU). Different from conventional vision-language tasks, VDU is specifically concerned with text-rich scenarios containing abundant document elements. Nevertheless, the importance of fine-grained features remains largely…

    Submitted 29 February, 2024; originally announced February 2024.

  29. arXiv:2402.18695  [pdf, other]

    cs.CV cs.CL

    Grounding Language Models for Visual Entity Recognition

    Authors: Zilin Xiao, Ming Gong, Paola Cascante-Bonilla, Xingyao Zhang, Jie Wu, Vicente Ordonez

    Abstract: We introduce AutoVER, an Autoregressive model for Visual Entity Recognition. Our model extends an autoregressive Multi-modal Large Language Model by employing retrieval augmented constrained generation. It mitigates low performance on out-of-domain entities while excelling in queries that require visually-situated reasoning. Our method learns to distinguish similar entities within a vast label spa…

    Submitted 26 July, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: ECCV 2024

  30. Enhancing Hyperspectral Images via Diffusion Model and Group-Autoencoder Super-resolution Network

    Authors: Zhaoyang Wang, Dongyang Li, Mingyang Zhang, Hao Luo, Maoguo Gong

    Abstract: Existing hyperspectral image (HSI) super-resolution (SR) methods struggle to effectively capture the complex spectral-spatial relationships and low-level details, while diffusion models represent a promising generative model known for their exceptional performance in modeling complex relations and learning high and low-level visual features. The direct application of diffusion models to HSI SR is…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI2024

    Report number: Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5794-5804

  31. arXiv:2402.14401  [pdf, other]

    cs.CV cs.LG eess.IV

    Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

    Authors: Zhaoyang Wang, Bo Hu, Mingyang Zhang, Jie Li, Leida Li, Maoguo Gong, Xinbo Gao

    Abstract: Existing free-energy guided No-Reference Image Quality Assessment (NR-IQA) methods still suffer from finding a balance between learning feature information at the pixel level of the image and capturing high-level feature information and the efficient utilization of the obtained high-level feature information remains a challenge. As a novel class of state-of-the-art (SOTA) generative model, the dif…

    Submitted 22 February, 2024; originally announced February 2024.

  32. arXiv:2402.13510  [pdf, other]

    cs.CV

    SealD-NeRF: Interactive Pixel-Level Editing for Dynamic Scenes by Neural Radiance Fields

    Authors: Zhentao Huang, Yukun Shi, Neil Bruce, Minglun Gong

    Abstract: The widespread adoption of implicit neural representations, especially Neural Radiance Fields (NeRF), highlights a growing need for editing capabilities in implicit 3D models, essential for tasks like scene post-processing and 3D content creation. Despite previous efforts in NeRF editing, challenges remain due to limitations in editing flexibility and quality. The key issue is developing a neural…

    Submitted 20 February, 2024; originally announced February 2024.

    MSC Class: 68T45

  33. arXiv:2402.08960  [pdf, other]

    cs.CV cs.AI

    Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision

    Authors: Zhaoqing Wang, Xiaobo Xia, Ziye Chen, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu

    Abstract: Current state-of-the-art open-vocabulary segmentation methods typically rely on image-mask-text triplet annotations for supervision. However, acquiring such detailed annotations is labour-intensive and poses scalability challenges in complex real-world scenarios. While existing weakly-supervised approaches leverage image-text pairs to reduce the expansive annotation cost, the lack of mask supervis…

    Submitted 11 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: 27 pages, 18 figures, 10 tables

  34. arXiv:2402.06223  [pdf, other]

    cs.LG cs.CV stat.ML

    Revealing Multimodal Contrastive Representation Learning through Latent Partial Causal Models

    Authors: Yuhang Liu, Zhen Zhang, Dong Gong, Biwei Huang, Mingming Gong, Anton van den Hengel, Kun Zhang, Javen Qinfeng Shi

    Abstract: Multimodal contrastive representation learning methods have proven successful across a range of domains, partly due to their ability to generate meaningful shared representations of complex phenomena. To enhance the depth of analysis and understanding of these acquired representations, we introduce a unified causal model specifically designed for multimodal data. By examining this model, we show t…

    Submitted 9 February, 2024; originally announced February 2024.

  35. arXiv:2402.05394  [pdf, other]

    cs.CV

    Enhancing Zero-shot Counting via Language-guided Exemplar Learning

    Authors: Mingjie Wang, Jun Zhou, Yong Dai, Eric Buys, Minglun Gong

    Abstract: Recently, Class-Agnostic Counting (CAC) problem has garnered increasing attention owing to its intriguing generality and superior efficiency compared to Category-Specific Counting (CSC). This paper proposes a novel ExpressCount to enhance zero-shot object counting by delving deeply into language-guided exemplar learning. Specifically, the ExpressCount is comprised of an innovative Language-oriente…

    Submitted 7 February, 2024; originally announced February 2024.

  36. arXiv:2402.03941  [pdf, other]

    cs.LG cs.AI stat.ME

    Discovery of the Hidden World with Large Language Models

    Authors: Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, Kun Zhang

    Abstract: Science originates with discovering new causal knowledge from a combination of known facts and observations. Traditional causal discovery approaches mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. However, the causal variables are usually unavailable in a wide range of real-world applications. The rise of large language models (LLMs) that a…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Preliminary version of an ongoing project; Chenxi and Yongqiang contributed equally; 26 pages, 41 figures; Project page: https://causalcoat.github.io/

  37. arXiv:2401.10632  [pdf, other]

    cs.LG

    Interventional Fairness on Partially Known Causal Graphs: A Constrained Optimization Approach

    Authors: Aoqi Zuo, Yiqing Li, Susan Wei, Mingming Gong

    Abstract: Fair machine learning aims to prevent discrimination against individuals or sub-populations based on sensitive attributes such as gender and race. In recent years, causal inference methods have been increasingly used in fair machine learning to measure unfairness by causal effects. However, current methods assume that the true causal graph is given, which is often not true in real-world applicatio…

    Submitted 8 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR24

  38. arXiv:2401.03476  [pdf, other]

    cs.MM cs.AI cs.HC cs.SD eess.AS

    Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness

    Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu

    Abstract: Current talking avatars mostly generate co-speech gestures based on audio and text of the utterance, without considering the non-speaking motion of the speaker. Furthermore, previous works on co-speech gesture generation have designed network structures based on individual gesture datasets, which results in limited data volume, compromised generalizability, and restricted speaker movements. To tac…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 6 pages, 3 figures, ICASSP 2024

  39. arXiv:2401.02566  [pdf]

    cs.SD cs.LG cs.MM eess.AS

    Siamese Residual Neural Network for Musical Shape Evaluation in Piano Performance Assessment

    Authors: Xiaoquan Li, Stephan Weiss, Yijun Yan, Yinhe Li, Jinchang Ren, John Soraghan, Ming Gong

    Abstract: Understanding and identifying musical shape plays an important role in music education and performance assessment. To simplify the otherwise time- and cost-intensive musical shape evaluation, in this paper we explore how artificial intelligence (AI) driven models can be applied. Considering musical shape evaluation as a classification problem, a light-weight Siamese residual neural network (S-ResN…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: X.Li, S.Weiss, Y.Yan, Y.Li, J.Ren, J.Soraghan, M.Gong,"Siamese residual neural network for musical shape evaluation in piano performance assessment" in Proc. of the 31st European Signal Processing Conference, Helsinki, Finland

  40. arXiv:2401.01510  [pdf, other]

    cs.CV

    Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering

    Authors: Haopeng Li, Qiuhong Ke, Mingming Gong, Tom Drummond

    Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been largely overlooked in existing research. This paper seeks to bridge that gap by incorporating VideoQA into a curriculum learning (CL) framework that progressively trains models from simpler to more complex dat…

    Submitted 2 January, 2024; originally announced January 2024.

  41. arXiv:2312.12227  [pdf, other]

    cs.CV cs.AI

    HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

    Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang

    Abstract: We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhan…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 Main Track

  42. arXiv:2312.11112  [pdf, other]

    cs.CV

    ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

    Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia, Dacheng Tao

    Abstract: Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e.g., spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires hi…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code: https://github.com/LHDuan/ConDaFormer

  43. arXiv:2312.09498  [pdf, other

    cs.LG cs.AI

    Neural Gaussian Similarity Modeling for Differential Graph Structure Learning

    Authors: Xiaolong Fan, Maoguo Gong, Yue Wu, Zedong Tang, Jieyi Liu

    Abstract: Graph Structure Learning (GSL) has demonstrated considerable potential in the analysis of graph-unknown non-Euclidean data across a wide range of domains. However, constructing an end-to-end graph structure learning model poses a challenge due to the impediment of gradient flow caused by the nearest neighbor sampling strategy. In this paper, we construct a differential graph structure learning mod…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  44. arXiv:2312.06117  [pdf, other

    cs.CV

    M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking

    Authors: Jiaming Liu, Yue Wu, Maoguo Gong, Qiguang Miao, Wenping Ma, Can Qin

    Abstract: 3D Single Object Tracking (SOT) stands as a forefront task of computer vision, proving essential for applications like autonomous driving. Sparse and occluded data in scene point clouds introduce variations in the appearance of tracked objects, adding complexity to the task. In this research, we unveil M3SOT, a novel 3D SOT framework, which synergizes multiple input frames (template sets), multiple r…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 12 pages, 10 figures, 10 tables, AAAI 2024

    Journal ref: AAAI 2024

  45. arXiv:2312.06063  [pdf, other

    cs.CV cs.AI

    PCRDiffusion: Diffusion Probabilistic Models for Point Cloud Registration

    Authors: Yue Wu, Yongzhe Yuan, Xiaolong Fan, Xiaoshui Huang, Maoguo Gong, Qiguang Miao

    Abstract: We propose a new framework that formulates point cloud registration as a denoising diffusion process from noisy transformation to object transformation. During the training stage, object transformation diffuses from ground-truth transformation to a random distribution, and the model learns to reverse this noising process. In the sampling stage, the model refines a randomly generated transformation to the outp…

    Submitted 10 December, 2023; originally announced December 2023.

  46. arXiv:2312.04333  [pdf, other

    cs.CL

    Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers

    Authors: Nuo Chen, Ning Wu, Shining Liang, Ming Gong, Linjun Shou, Dongmei Zhang, Jia Li

    Abstract: This paper presents an in-depth analysis of Large Language Models (LLMs), focusing on LLaMA, a prominent open-source foundational model in natural language processing. Instead of assessing LLaMA through its generative output, we design multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation. We examine the model horizontally, comparing diffe…

    Submitted 9 January, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 15 pages

  47. arXiv:2311.09233  [pdf, other

    cs.LG cs.GR cs.RO

    Neural Packing: from Visual Sensing to Reinforcement Learning

    Authors: Juzhan Xu, Minglun Gong, Hao Zhang, Hui Huang, Ruizhen Hu

    Abstract: We present a novel learning framework to solve the transport-and-packing (TAP) problem in 3D. It constitutes a full solution pipeline from partial observations of input objects via RGBD sensing and recognition to final box placement, via robotic motion planning, to arrive at a compact packing in a target container. The technical core of our method is a neural network for TAP, trained via reinforce…

    Submitted 16 October, 2023; originally announced November 2023.

  48. arXiv:2311.03253  [pdf, other

    cs.CL cs.AI

    Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency

    Authors: Zilin Xiao, Linjun Shou, Xingyao Zhang, Jie Wu, Ming Gong, Jian Pei, Daxin Jiang

    Abstract: Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often struggle to capture explicit discourse-level dependencies, resulting in incoherent predictions at the abstract level (e.g., topic or category). We propose CoherentED,…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Findings

  49. arXiv:2311.03250  [pdf, other

    cs.CL cs.AI

    Instructed Language Models with Retrievers Are Powerful Entity Linkers

    Authors: Zilin Xiao, Ming Gong, Jie Wu, Xingyao Zhang, Linjun Shou, Jian Pei, Daxin Jiang

    Abstract: Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, thus unsuitable for entity-centric tasks like entity linking (EL) requiring precise entity predictions over a large knowledge base. We present Instructed Generati…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Main

  50. arXiv:2310.20246  [pdf, other

    cs.CL cs.AI

    Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

    Authors: Nuo Chen, Zinan Zheng, Ning Wu, Ming Gong, Dongmei Zhang, Jia Li

    Abstract: Existing research predominantly focuses on developing powerful large language models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multil…

    Submitted 16 October, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Work in Progress