Showing 1–50 of 367 results for author: Jin, Q

  1. arXiv:2408.11840  [pdf]

    cs.CV cs.AI

    Joint PET-MRI Reconstruction with Diffusion Stochastic Differential Model

    Authors: Taofeng Xie, Zhuoxu Cui, Congcong Liu, Chen Luo, Huayu Wang, Yuanzhi Zhang, Xuemei Wang, Yihang Zhou, Qiyu Jin, Guoqing Chen, Dong Liang, Haifeng Wang

    Abstract: PET suffers from a low signal-to-noise ratio. Meanwhile, k-space data acquisition for MRI is time-consuming in PET-MRI systems. We aim to accelerate MRI and improve PET image quality. This paper proposes a novel joint reconstruction model using diffusion stochastic differential equations, based on learning the joint probability distribution of PET and MRI. Compare the results underscore the…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted as ISMRM 2024 Digital poster 6575. 04-09 May 2024 Singapore

    Journal ref: ISMRM 2024 Digital poster 6575

  2. arXiv:2408.08157  [pdf, ps, other]

    math.LO

    A novel axiomatic approach to L-valued rough sets within an L-universe via inner product and outer product of L-subsets

    Authors: Lingqiang Li, Qiu Jin

    Abstract: The fuzzy rough approximation operator serves as the cornerstone of fuzzy rough set theory and its practical applications. Axiomatization is a crucial approach in the exploration of fuzzy rough sets, aiming to offer a clear and direct characterization of fuzzy rough approximation operators. Among the fundamental tools employed in this process, the inner product and outer product of fuzzy sets stan…

    Submitted 15 August, 2024; originally announced August 2024.

    MSC Class: 03B52; 06B23

  3. How to Best Combine Demosaicing and Denoising?

    Authors: Yu Guo, Qiyu Jin, Jean-Michel Morel, Gabriele Facciolo

    Abstract: Image demosaicing and denoising play a critical role in the raw imaging pipeline. These processes have often been treated as independent, without considering their interactions. Indeed, most classic denoising methods handle noisy RGB images, not raw images. Conversely, most demosaicing methods address the demosaicing of noise free images. The real problem is to jointly denoise and demosaic noisy r…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by Inverse Problems and Imaging in October 2023

    Journal ref: Inverse Problems and Imaging, 2024, 18(3):571-599

  4. Deep Inertia $L_p$ Half-Quadratic Splitting Unrolling Network for Sparse View CT Reconstruction

    Authors: Yu Guo, Caiying Wu, Yaxin Li, Qiyu Jin, Tieyong Zeng

    Abstract: Sparse view computed tomography (CT) reconstruction poses a challenging ill-posed inverse problem, necessitating effective regularization techniques. In this letter, we employ $L_p$-norm ($0<p<1$) regularization to induce sparsity and introduce inertial steps, leading to the development of the inertial $L_p$-norm half-quadratic splitting algorithm. We rigorously prove the convergence of this algor…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: This paper was accepted by IEEE Signal Processing Letters on July 28, 2024

    Journal ref: IEEE Signal Processing Letters, 2024, 31:2030-2034

  5. arXiv:2408.05726  [pdf]

    cond-mat.supr-con cond-mat.mtrl-sci cond-mat.str-el

    Superconductivity Discovered in Niobium Polyhydride at High Pressures

    Authors: X. He, C. L. Zhang, Z. W. Li, K. Lu, S. J. Zhang, B. S. Min, J. Zhang, L. C. Shi, S. M. Feng, Q. Q. Liu, J. Song, X. C. Wang, Y. Peng, L. H. Wang, V. B. Prakapenka, S. Chariton, H. Z. Liu, C. Q. Jin

    Abstract: Niobium polyhydride was synthesized at high pressure and high temperature conditions using a diamond anvil cell combined with in situ high pressure laser heating techniques. High pressure electric transport experiments demonstrate that a superconducting transition occurs with a critical temperature (Tc) of 42 K at 187 GPa. The shift of Tc as a function of externally applied magnetic field is in consistent to…

    Submitted 19 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by Materials Today Physics

  6. arXiv:2408.00727  [pdf, other]

    cs.CL cs.AI

    Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions

    Authors: Guangzhi Xiong, Qiao Jin, Xiao Wang, Minjia Zhang, Zhiyong Lu, Aidong Zhang

    Abstract: The emergent abilities of large language models (LLMs) have demonstrated great potential in solving medical questions. They can possess considerable medical knowledge, but may still hallucinate and are inflexible in the knowledge updates. While Retrieval-Augmented Generation (RAG) has been proposed to enhance the medical question-answering capabilities of LLMs with external knowledge bases, it may…

    Submitted 1 August, 2024; originally announced August 2024.

  7. arXiv:2408.00588  [pdf, other]

    cs.CL cs.AI

    Closing the gap between open-source and commercial large language models for medical evidence summarization

    Authors: Gongbo Zhang, Qiao Jin, Yiliang Zhou, Song Wang, Betina R. Idnay, Yiming Luo, Elizabeth Park, Jordan G. Nestor, Matthew E. Spotnitz, Ali Soroush, Thomas Campion, Zhiyong Lu, Chunhua Weng, Yifan Peng

    Abstract: Large language models (LLMs) hold great promise in summarizing medical evidence. Most recent studies focus on the application of proprietary LLMs. Using proprietary LLMs introduces multiple risk factors, including a lack of transparency and vendor dependency. While open-source LLMs allow better transparency and customization, their performance falls short compared to proprietary ones. In this stud…

    Submitted 25 July, 2024; originally announced August 2024.

  8. arXiv:2407.19376  [pdf, other]

    cs.CE

    CIDER: Counterfactual-Invariant Diffusion-based GNN Explainer for Causal Subgraph Inference

    Authors: Qibin Zhang, Chengshang Lyu, Lingxi Chen, Qiqi Jin, Luonan Chen

    Abstract: Inferring causal links or subgraphs corresponding to a specific phenotype or label based solely on measured data is an important yet challenging task, which is also different from inferring causal nodes. While Graph Neural Network (GNN) Explainers have shown potential in subgraph identification, existing methods with GNN often offer associative rather than causal insights. This lack of transparenc…

    Submitted 27 July, 2024; originally announced July 2024.

  9. arXiv:2407.11468  [pdf, other]

    cs.CV

    AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

    Authors: Qiaoqiao Jin, Rui Shi, Yishun Dou, Bingbing Ni

    Abstract: Current Facial Action Unit (FAU) detection methods generally encounter difficulties due to the scarcity of labeled video training data and the limited number of training face IDs, which renders the trained feature extractor insufficient coverage for modeling the large diversity of inter-person facial structures and movements. To explicitly address the above challenges, we propose a novel video-lev…

    Submitted 16 July, 2024; originally announced July 2024.

  10. arXiv:2407.10810  [pdf, other]

    cs.CV cs.AI cs.AR cs.LG

    FabGPT: An Efficient Large Multimodal Model for Complex Wafer Defect Knowledge Queries

    Authors: Yuqi Jiang, Xudong Lu, Qian Jin, Qi Sun, Hanming Wu, Cheng Zhuo

    Abstract: Intelligence is key to advancing integrated circuit (IC) fabrication. Recent breakthroughs in Large Multimodal Models (LMMs) have unlocked unparalleled abilities in understanding images and text, fostering intelligent fabrication. Leveraging the power of LMMs, we introduce FabGPT, a customized IC fabrication large multimodal model for wafer defect knowledge query. FabGPT manifests expertise in con…

    Submitted 15 July, 2024; originally announced July 2024.

  11. arXiv:2407.00431  [pdf, other]

    cs.CV

    Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones

    Authors: Qiangguo Jin, Jiapeng Huang, Changming Sun, Hui Cui, Ping Xuan, Ran Su, Leyi Wei, Yu-Jie Wu, Chia-An Wu, Henry B. L. Duh, Yueh-Hsun Lu

    Abstract: The precise diagnosis of urinary stones is crucial for devising effective treatment strategies. The diagnostic process, however, is often complicated by the low contrast between stones and surrounding tissues, as well as the variability in stone locations across different patients. To address this issue, we propose a novel location embedding based pairwise distance learning network (LEPD-Net) that…

    Submitted 29 June, 2024; originally announced July 2024.

    Journal ref: MICCAI 2024

  12. arXiv:2406.17755  [pdf, other]

    cs.CL

    Accelerating Clinical Evidence Synthesis with Large Language Models

    Authors: Zifeng Wang, Lang Cao, Benjamin Danek, Yichi Zhang, Qiao Jin, Zhiyong Lu, Jimeng Sun

    Abstract: Automatic medical discovery by AI is a dream of many. One step toward that goal is to create an AI model to understand clinical studies and synthesize clinical evidence from the literature. Clinical evidence synthesis currently relies on systematic reviews of clinical trials and retrospective analyses from medical literature. However, the rapid expansion of publications presents challenges in effi…

    Submitted 25 June, 2024; originally announced June 2024.

  13. arXiv:2406.16814  [pdf, other]

    math.NA math.OC

    Convergence analysis of a stochastic heavy-ball method for linear ill-posed problems

    Authors: Qinian Jin, Yanjun Liu

    Abstract: In this paper we consider a stochastic heavy-ball method for solving linear ill-posed inverse problems. With suitable choices of the step-sizes and the momentum coefficients, we establish the regularization property of the method under a priori selection of the stopping index and derive the rate of convergence under a benchmark source condition on the sought solution. Numerical results are p…

    Submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.16578  [pdf, other]

    cs.RO cs.AI

    QuadrupedGPT: Towards a Versatile Quadruped Agent in Open-ended Worlds

    Authors: Ye Wang, Yuting Mei, Sipeng Zheng, Qin Jin

    Abstract: While pets offer companionship, their limited intelligence restricts advanced reasoning and autonomous interaction with humans. Considering this, we propose QuadrupedGPT, a versatile agent designed to master a broad range of complex tasks with agility comparable to that of a pet. To achieve this goal, the primary challenges include: i) effectively leveraging multimodal observations for decision-ma…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Under review

  15. arXiv:2406.16537  [pdf, other]

    cs.CV cs.AI

    Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization

    Authors: Yuhang Ma, Wenting Xu, Jiji Tang, Qinfeng Jin, Rongsheng Zhang, Zeng Zhao, Changjie Fan, Zhipeng Hu

    Abstract: Customized image generation, which seeks to synthesize images with consistent characters, holds significant relevance for applications such as storytelling, portrait generation, and character design. However, previous approaches have encountered challenges in preserving characters with high-fidelity consistency due to inadequate feature extraction and concept confusion of reference characters. The…

    Submitted 3 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  16. arXiv:2406.16301  [pdf, other]

    cs.CV cs.AI cs.MM

    UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

    Authors: Yuting Mei, Linli Yao, Qin Jin

    Abstract: With the surge in the amount of video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting more and more attention. However, unimodal summarization inevitably loses the rich semantics of the video. In this paper, we focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV). Specifica…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by ACM International Conference on Multimedia Retrieval (ICMR'24)

    Journal ref: Proceedings of the 2024 International Conference on Multimedia Retrieval, May 2024, Pages 1034-1042

  17. arXiv:2406.12259  [pdf]

    cs.AI

    Adversarial Attacks on Large Language Models in Medicine

    Authors: Yifan Yang, Qiao Jin, Furong Huang, Zhiyong Lu

    Abstract: The integration of Large Language Models (LLMs) into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care. However, the susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts. This study investigates the vulnerability of LLMs to two types of a…

    Submitted 18 June, 2024; originally announced June 2024.

  18. arXiv:2406.12036  [pdf, other]

    cs.CL cs.AI

    MedCalc-Bench: Evaluating Large Language Models for Medical Calculations

    Authors: Nikhil Khandekar, Qiao Jin, Guangzhi Xiong, Soren Dunn, Serina S Applebaum, Zain Anwar, Maame Sarfo-Gyamfi, Conrad W Safranek, Abid A Anwar, Andrew Zhang, Aidan Gilson, Maxwell B Singer, Amisha Dave, Andrew Taylor, Aidong Zhang, Qingyu Chen, Zhiyong Lu

    Abstract: As opposed to evaluating computation and logic-based reasoning, current benchmarks for evaluating large language models (LLMs) in medicine are primarily focused on question-answering involving domain knowledge and descriptive reasoning. While such qualitative capabilities are vital to medical diagnosis, in real-world scenarios, doctors frequently use clinical calculators that follow quantitative e…

    Submitted 30 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: GitHub link: https://github.com/ncbi-nlp/MedCalc-Bench; HuggingFace link: https://huggingface.co/datasets/nsk7153/MedCalc-Bench

  19. arXiv:2406.10960  [pdf, other]

    cs.CL

    ESCoT: Towards Interpretable Emotional Support Dialogue Systems

    Authors: Tenggan Zhang, Xinjie Zhang, Jinming Zhao, Li Zhou, Qin Jin

    Abstract: Understanding the reason for emotional support response is crucial for establishing connections between users and emotional support dialogue systems. Previous works mostly focus on generating better responses but ignore interpretability, which is extremely important for constructing reliable dialogue systems. To empower the system with better interpretability, we propose an emotional support respo…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 (Long Paper)

  20. arXiv:2406.10911  [pdf, other]

    cs.SD eess.AS

    SingMOS: An extensive Open-Source Singing Voice Dataset for MOS Prediction

    Authors: Yuxun Tang, Jiatong Shi, Yuning Wu, Qin Jin

    Abstract: In speech generation tasks, human subjective ratings, usually referred to as the opinion score, are considered the "gold standard" for speech quality evaluation, with the mean opinion score (MOS) serving as the primary evaluation metric. Due to the high cost of human annotation, several MOS prediction systems have emerged in the speech domain, demonstrating good performance. These MOS prediction m…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  21. arXiv:2406.10710  [pdf, other]

    cs.AI cs.CL

    SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task

    Authors: Ziije Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, Xiaofan Zhang

    Abstract: Integrating Large Language Models (LLMs) with existing Knowledge Graph (KG) databases presents a promising avenue for enhancing LLMs' efficacy and mitigating their "hallucinations". Given that most KGs reside in graph databases accessible solely through specialized query languages (e.g., Cypher), there exists a critical need to bridge the divide between LLMs and KG databases by automating the tran…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 19 pages, 15 figures, 8 tables

  22. arXiv:2406.08997  [pdf, ps, other]

    cs.CV

    Adaptive Temporal Motion Guided Graph Convolution Network for Micro-expression Recognition

    Authors: Fengyuan Zhang, Zhaopei Huang, Xinjie Zhang, Qin Jin

    Abstract: Micro-expressions serve as essential cues for understanding individuals' genuine emotional states. Recognizing micro-expressions attracts increasing research attention due to its various applications in fields such as business negotiation and psychotherapy. However, the intricate and transient nature of micro-expressions poses a significant challenge to their accurate recognition. Most existing wo…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by ICME 2024

  23. arXiv:2406.08905  [pdf, other]

    cs.SD eess.AS

    SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction from Speech Models

    Authors: Yuxun Tang, Yuning Wu, Jiatong Shi, Qin Jin

    Abstract: Discrete representation has shown advantages in speech generation tasks, wherein discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, the direct application of speech SSL models to singing generation encounters domain gaps between speech and singing. Furthermore, singing generation necessitates a more refined representation th…

    Submitted 20 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  24. arXiv:2406.08416  [pdf, other]

    cs.SD eess.AS

    TokSing: Singing Voice Synthesis based on Discrete Tokens

    Authors: Yuning Wu, Chunlei zhang, Jiatong Shi, Yuxun Tang, Shan Yang, Qin Jin

    Abstract: Recent advancements in speech synthesis witness significant benefits by leveraging discrete tokens extracted from self-supervised learning (SSL) models. Discrete tokens offer higher storage efficiency and greater operability in intermediate representations compared to traditional continuous Mel spectrograms. However, when it comes to singing voice synthesis (SVS), achieving higher levels of melody…

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  25. arXiv:2406.07725  [pdf, ps, other]

    cs.SD eess.AS

    The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

    Authors: Xuankai Chang, Jiatong Shi, Jinchuan Tian, Yuning Wu, Yuxun Tang, Yihan Wu, Shinji Watanabe, Yossi Adi, Xie Chen, Qin Jin

    Abstract: Representing speech and audio signals in discrete units has become a compelling alternative to traditional high-dimensional feature vectors. Numerous studies have highlighted the efficacy of discrete units in various applications such as speech compression and restoration, speech recognition, and speech generation. To foster exploration in this domain, we introduce the Interspeech 2024 Challenge,…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: This manuscript has been accepted by Interspeech 2024

  26. arXiv:2406.03688  [pdf, other]

    eess.IV cs.CV

    Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

    Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. Summers

    Abstract: In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation,…

    Submitted 5 June, 2024; originally announced June 2024.

  27. arXiv:2406.02016  [pdf, other]

    math.OC cs.LG stat.ML

    Adaptive and Optimal Second-order Optimistic Methods for Minimax Optimization

    Authors: Ruichen Jiang, Ali Kavis, Qiujiang Jin, Sujay Sanghavi, Aryan Mokhtari

    Abstract: We propose adaptive, line search-free second-order methods with optimal rate of convergence for solving convex-concave min-max problems. By means of an adaptive step size, our algorithms feature a simple update rule that requires solving only one linear system per iteration, eliminating the need for line search or backtracking mechanisms. Specifically, we base our algorithms on the optimistic meth…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  28. arXiv:2405.21063  [pdf, other]

    cs.LG cs.AI

    Neural Network Verification with Branch-and-Bound for General Nonlinearities

    Authors: Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

    Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide whic…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint

  29. arXiv:2405.17719  [pdf, other]

    cs.CV

    EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?

    Authors: Boshen Xu, Ziheng Wang, Yang Du, Zhinan Song, Sipeng Zheng, Qin Jin

    Abstract: Egocentric video-language pretraining is a crucial paradigm to advance the learning of egocentric hand-object interactions (EgoHOI). Despite the great success on existing testbeds, these benchmarks focus more on closed-set visual concepts or limited scenarios. Due to the occurrence of diverse EgoHOIs in the real world, we propose an open-vocabulary benchmark named EgoHOIBench to reveal the diminis…

    Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/xuboshen/EgoNCEpp

  30. arXiv:2405.16205  [pdf]

    cs.AI cs.CL

    GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases

    Authors: Zhizheng Wang, Qiao Jin, Chih-Hsuan Wei, Shubo Tian, Po-Ting Lai, Qingqing Zhu, Chi-Ping Day, Christina Ross, Zhiyong Lu

    Abstract: Gene set knowledge discovery is essential for advancing human functional genomics. Recent studies have shown promising performance by harnessing the power of Large Language Models (LLMs) on this task. Nonetheless, their results are subject to several limitations common in LLMs such as hallucinations. In response, we present GeneAgent, a first-of-its-kind language agent featuring self-verification…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 30 pages with 10 figures and/or tables

  31. arXiv:2405.14040  [pdf, other]

    cs.MM

    Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline

    Authors: Dingyi Yang, Chunru Zhan, Ziheng Wang, Biao Wang, Tiezheng Ge, Bo Zheng, Qin Jin

    Abstract: Video storytelling is engaging multimedia content that utilizes video and its accompanying narration to attract the audience, where a key challenge is creating narrations for recorded visual scenes. Previous studies on dense video captioning and video story generation have made some progress. However, in practical applications, we typically require synchronized narrations for ongoing visual scenes…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 15 pages, 13 figures

  32. arXiv:2405.13340  [pdf, other]

    math.NA math.OC

    Randomized block coordinate descent method for linear ill-posed problems

    Authors: Qinian Jin, Duo Liu

    Abstract: Consider the linear ill-posed problems of the form $\sum_{i=1}^{b} A_i x_i =y$, where, for each $i$, $A_i$ is a bounded linear operator between two Hilbert spaces $X_i$ and ${\mathcal Y}$. When $b$ is huge, solving the problem by an iterative method using the full gradient at each iteration step is both time-consuming and memory insufficient. Although randomized block coordinate descent (RBCD) meth…

    Submitted 22 May, 2024; originally announced May 2024.

    MSC Class: 65J20; 65J22; 65J10; 94A08

  33. arXiv:2405.10860  [pdf, other]

    cs.CL

    ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains

    Authors: Zhaopei Huang, Jinming Zhao, Qin Jin

    Abstract: Understanding the process of emotion generation is crucial for analyzing the causes behind emotions. Causal Emotion Entailment (CEE), an emotion-understanding task, aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance. However, current works in CEE mainly focus on modeling semantic and emotional interactions in conversations, neglecti…

    Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  34. arXiv:2405.08269  [pdf, ps, other]

    math.NA math.OC

    On saturation of the discrepancy principle for nonlinear Tikhonov regularization in Hilbert spaces

    Authors: Qinian Jin

    Abstract: In this paper we revisit the discrepancy principle for Tikhonov regularization of nonlinear ill-posed problems in Hilbert spaces and provide some new and improved saturation results under less restrictive conditions, compared with the existing results in the literature.

    Submitted 13 May, 2024; originally announced May 2024.

  35. arXiv:2404.16731  [pdf, ps, other]

    math.OC

    Non-asymptotic Global Convergence Analysis of BFGS with the Armijo-Wolfe Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we establish the first explicit and non-asymptotic global convergence analysis of the BFGS method when deployed with an inexact line search scheme that satisfies the Armijo-Wolfe conditions. We show that BFGS achieves a global convergence rate of $(1-\frac{1}{\kappa})^k$ for $\mu$-strongly convex functions with $L$-Lipschitz gradients, where $\kappa=\frac{L}{\mu}$ denotes the condition number. Furthe…

    Submitted 25 April, 2024; originally announced April 2024.

  36. arXiv:2404.16635  [pdf, other]

    cs.CV

    TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning

    Authors: Liang Zhang, Anwen Hu, Haiyang Xu, Ming Yan, Yichen Xu, Qin Jin, Ji Zhang, Fei Huang

    Abstract: Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in terms of parameters and computational requirements limits their use in resource-constrained environments. In this paper, we present TinyChart, an efficien…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages, 11 figures

  37. arXiv:2404.16361  [pdf, other]

    cs.LG cs.NE cs.SC

    Evolutionary Causal Discovery with Relative Impact Stratification for Interpretable Data Analysis

    Authors: Ou Deng, Shoji Nishimura, Atsushi Ogihara, Qun Jin

    Abstract: This study proposes Evolutionary Causal Discovery (ECD) for causal discovery that tailors response variables, predictor variables, and corresponding operators to research datasets. Utilizing genetic programming for variable relationship parsing, the method proceeds with the Relative Impact Stratification (RIS) algorithm to assess the relative impact of predictor variables on the response variable,…

    Submitted 25 April, 2024; originally announced April 2024.

  38. arXiv:2404.14705  [pdf, other]

    cs.CV

    Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

    Authors: Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin

    Abstract: This work addresses the 3D situated reasoning task which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and generalization ability. Inspired by the recent success of levera…

    Submitted 22 April, 2024; originally announced April 2024.

  39. arXiv:2404.13370  [pdf, other]

    cs.CV cs.CL cs.MM

    Movie101v2: Improved Movie Narration Benchmark

    Authors: Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin

    Abstract: Automatic movie narration aims to create video-aligned plot descriptions to assist visually impaired audiences. It differs from standard video captioning in that it requires not only describing key visual details but also inferring the plots developed across multiple movie shots, thus posing unique and ongoing challenges. To advance the development of automatic movie narrating systems, we fir…

    Submitted 20 April, 2024; originally announced April 2024.

  40. arXiv:2404.05900  [pdf, ps, other]

    math.OC

    Distributionally Robust Optimization with Decision-Dependent Information Discovery

    Authors: Qing Jin, Angelos Georghiou, Phebe Vayanos, Grani A. Hanasusanto

    Abstract: We study two-stage distributionally robust optimization (DRO) problems with decision-dependent information discovery (DDID) wherein (a portion of) the uncertain parameters are revealed only if an (often costly) investment is made in the first stage. This class of problems finds many important applications in selection problems (e.g., in hiring, project portfolio optimization, or optimal sensor loc…

    Submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.03218  [pdf, other]

    math.NA math.OC

    An adaptive heavy ball method for ill-posed inverse problems

    Authors: Qinian Jin, Qin Huang

    Abstract: In this paper we consider ill-posed inverse problems, both linear and nonlinear, by a heavy ball method in which a strongly convex regularization function is incorporated to detect the feature of the sought solution. We develop ideas on how to adaptively choose the step-sizes and the momentum coefficients to achieve acceleration over the Landweber-type method. We then analyze the method and establ…

    Submitted 4 April, 2024; originally announced April 2024.

  42. arXiv:2404.01267  [pdf, other]

    math.OC

    Non-asymptotic Global Convergence Rates of BFGS with Exact Line Search

    Authors: Qiujiang Jin, Ruichen Jiang, Aryan Mokhtari

    Abstract: In this paper, we explore the non-asymptotic global convergence rates of the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method implemented with exact line search. Notably, due to Dixon's equivalence result, our findings are also applicable to other quasi-Newton methods in the convex Broyden class employing exact line search, such as the Davidon-Fletcher-Powell (DFP) method. Specifically, we focus on…

    Submitted 1 April, 2024; originally announced April 2024.

  43. arXiv:2403.15033  [pdf, other]

    cs.CV

    Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

    Authors: Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni

    Abstract: Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve the above problems, we propose a brand-new…

    Submitted 16 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  44. arXiv:2403.12895  [pdf, other]

    cs.CV

    mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

    Authors: Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou

    Abstract: Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability but lack general structure understanding abilities for text-rich document images. In this work, we emphasize the importance of structure informatio…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 21 pages, 15 figures

  45. Inter- and intra-uncertainty based feature aggregation model for semi-supervised histopathology image segmentation

    Authors: Qiangguo Jin, Hui Cui, Changming Sun, Yang Song, Jiangbin Zheng, Leilei Cao, Leyi Wei, Ran Su

    Abstract: Acquiring pixel-level annotations is often limited in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertaint…

    Submitted 19 March, 2024; originally announced March 2024.

    Journal ref: Expert Systems with Applications, 2024, 238: 122093

  46. arXiv:2403.12460  [pdf, ps, other]

    math.NA math.OC

    Stochastic variance reduced gradient method for linear ill-posed inverse problems

    Authors: Qinian Jin, Liuhong Chen

    Abstract: In this paper we apply the stochastic variance reduced gradient (SVRG) method, which is a popular variance reduction method in optimization for accelerating the stochastic gradient method, to solve large scale linear ill-posed systems in Hilbert spaces. Under a priori choices of stopping indices, we derive a convergence rate result when the sought solution satisfies a benchmark source condit…

    Submitted 19 March, 2024; originally announced March 2024.

  47. arXiv:2403.05874  [pdf, other]

    cs.CV cs.RO

    SPAFormer: Sequential 3D Part Assembly with Transformers

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy…

    Submitted 3 June, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/xuboshen/SPAFormer

  48. POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view. However, current methods struggle to replicate this ability of view adaptation from third-person to first-person. Although some approaches attempt to learn view-agnostic representation from large-scale video datasets, they ignore the relationships among multiple third-person views…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MM 2023. Project page: https://xuboshen.github.io/

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023). Association for Computing Machinery, New York, NY, USA, 2807-2816

  49. arXiv:2403.05680  [pdf, other]

    cs.AI cs.CL cs.CV

    How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses

    Authors: Qingqing Zhu, Benjamin Hou, Tejas S. Mathai, Pritam Mukherjee, Qiao Jin, Xiuying Chen, Zhizheng Wang, Ruida Cheng, Ronald M. Summers, Zhiyong Lu

    Abstract: Automatically interpreting CT scans can ease the workload of radiologists. However, this is challenging mainly due to the scarcity of adequate datasets and reference standards for evaluation. This study aims to bridge this gap by introducing a novel evaluation framework, named "GPTRadScore". This framework assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini…

    Submitted 18 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  50. Fast, nonlocal and neural: a lightweight high quality solution to image denoising

    Authors: Yu Guo, Axel Davy, Gabriele Facciolo, Jean-Michel Morel, Qiyu Jin

    Abstract: With the widespread application of convolutional neural networks (CNNs), the traditional model-based denoising algorithms are now outperformed. However, CNNs face two problems. First, they are computationally demanding, which makes their deployment especially difficult for mobile terminals. Second, experimental evidence shows that CNNs often over-smooth regular textures present in images, in contr…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 5 pages. This paper was accepted by IEEE Signal Processing Letters on July 1, 2021

    Journal ref: IEEE Signal Processing Letters, 2021, 28:1515-1519