Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 687 results for author: Song, J

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01598  [pdf

    cs.LG cs.AI

    Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

    Authors: Yifan Hu, Fukang Yin, Weimin Zhang, Kaijun Ren, Junqiang Song, Kefeng Deng, Di Zhang

    Abstract: Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods difficult to learn small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is not only related to spectral bias but also to distortions brought by proce… ▽ More

    Submitted 25 June, 2024; originally announced July 2024.

  2. arXiv:2407.00081  [pdf, other

    cs.DC cs.AI cs.ET cs.LG cs.NI

    Semantic Revolution from Communications to Orchestration for 6G: Challenges, Enablers, and Research Directions

    Authors: Masoud Shokrnezhad, Hamidreza Mazandarani, Tarik Taleb, Jaeseung Song, Richard Li

    Abstract: In the context of emerging 6G services, the realization of everything-to-everything interactions involving a myriad of physical and digital entities presents a crucial challenge. This challenge is exacerbated by resource scarcity in communication infrastructures, necessitating innovative solutions for effective service implementation. Exploring the potential of Semantic Communications (SemCom) to… ▽ More

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Accepted at IEEE Network magazine special issue: Goal-oriented Semantic Communication and Networking

  3. arXiv:2406.18151  [pdf, other

    cs.CV

    SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery

    Authors: Jian Song, Hongruixuan Chen, Weihao Xuan, Junshi Xia, Naoto Yokoya

    Abstract: Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution by being easily accessible and thu… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.14984  [pdf, ps, other

    cs.DS

    Colorful Priority $k$-Supplier

    Authors: Chandra Chekuri, Junkai Song

    Abstract: In the Priority $k$-Supplier problem the input consists of a metric space $(F \cup C, d)$ over set of facilities $F$ and a set of clients $C$, an integer $k > 0$, and a non-negative radius $r_v$ for each client $v \in C$. The goal is to select $k$ facilities $S \subseteq F$ to minimize $\max_{v \in C} \frac{d(v,S)}{r_v}$ where $d(v,S)$ is the distance of $v$ to the closes facility in $S$. This pro… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  5. arXiv:2406.13929  [pdf, other

    cs.CL cs.AI cs.LG

    Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination

    Authors: Jongyoon Song, Sangwon Yu, Sungroh Yoon

    Abstract: In this paper, we identify a new category of bias that induces input-conflicting hallucinations, where large language models (LLMs) generate responses inconsistent with the content of the input context. This issue we have termed the false negative problem refers to the phenomenon where LLMs are predisposed to return negative judgments when assessing the correctness of a statement given the context… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  6. arXiv:2406.12907  [pdf, other

    cs.LG cs.CL

    Reconciling Kaplan and Chinchilla Scaling Laws

    Authors: Tim Pearce, Jinyeop Song

    Abstract: Kaplan et al. [2020] (`Kaplan') and Hoffmann et al. [2022] (`Chinchilla') studied the scaling behavior of transformers trained on next-token language prediction. These studies produced different estimates for how the number of parameters ($N$) and training tokens ($D$) should be set to achieve the lowest possible loss for a given compute budget ($C$). Kaplan: $N_\text{optimal} \propto C^{0.73}$, C… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  7. arXiv:2406.12315  [pdf, other

    cs.AI

    PruningBench: A Comprehensive Benchmark of Structural Pruning

    Authors: Haoling Li, Changhao Li, Mengqi Xue, Gongfan Fang, Sheng Zhou, Zunlei Feng, Huiqiong Wang, Yong Wang, Lechao Cheng, Mingli Song, Jie Song

    Abstract: Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed \textit{PruningBench}, for structural pruning. PruningBench showcases the following three c… ▽ More

    Submitted 28 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  8. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11503  [pdf, other

    cs.CV cs.CL

    GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation

    Authors: Shihao Cai, Keqin Bao, Hangyu Guo, Jizhi Zhang, Jun Song, Bo Zheng

    Abstract: Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source da… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  10. arXiv:2406.11128  [pdf, other

    cs.LG cs.RO

    Model Adaptation for Time Constrained Embodied Control

    Authors: Jaehyun Song, Minjong Yoo, Honguk Woo

    Abstract: When adopting a deep learning model for embodied agents, it is required that the model structure be optimized for specific tasks and operational conditions. Such optimization can be static such as model compression or dynamic such as adaptive inference. Yet, these techniques have not been fully investigated for embodied control systems subject to time constraints, which necessitate sequential deci… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures, Accepted in The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

  11. arXiv:2406.10744   

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Jose Alvarez, Coert van Gemeren, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou , et al. (77 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 27 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The author list and contents need to be verified by all authors

  12. arXiv:2406.10289  [pdf, other

    cs.CL cs.AI cs.IR

    VeraCT Scan: Retrieval-Augmented Fake News Detection with Justifiable Reasoning

    Authors: Cheng Niu, Yang Guan, Yuanhao Wu, Juno Zhu, Juntong Song, Randy Zhong, Kaihua Zhu, Siliang Xu, Shizhe Diao, Tong Zhang

    Abstract: The proliferation of fake news poses a significant threat not only by disseminating misleading information but also by undermining the very foundations of democracy. The recent advance of generative artificial intelligence has further exacerbated the challenge of distinguishing genuine news from fabricated stories. In response to this challenge, we introduce VeraCT Scan, a novel retrieval-augmente… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  13. CircuitVAE: Efficient and Scalable Latent Circuit Optimization

    Authors: Jialin Song, Aidan Swope, Robert Kirby, Rajarshi Roy, Saad Godil, Jonathan Raiman, Bryan Catanzaro

    Abstract: Automatically designing fast and space-efficient digital circuits is challenging because circuits are discrete, must exactly implement the desired logic, and are costly to simulate. We address these challenges with CircuitVAE, a search algorithm that embeds computation graphs in a continuous space and optimizes a learned surrogate of physical simulation by gradient descent. By carefully controllin… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Design Automation Conference (DAC) 2024; the first two authors contributed equally

  14. arXiv:2406.09181  [pdf, other

    cs.CV cs.AI

    A Large-scale Universal Evaluation Benchmark For Face Forgery Detection

    Authors: Yijun Bei, Hengrui Lou, Jinsong Geng, Erteng Liu, Lechao Cheng, Jie Song, Mingli Song, Zunlei Feng

    Abstract: With the rapid development of AI-generated content (AIGC) technology, the production of realistic fake facial images and videos that deceive human visual perception has become possible. Consequently, various face forgery detection techniques have been proposed to identify such fake facial content. However, evaluating the effectiveness and generalizability of these detection techniques remains a si… ▽ More

    Submitted 13 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: This is a paper about constructing a large-scale universal evaluation benchmark for face forgery detection.The full text is 30 pages

  15. arXiv:2406.06580  [pdf, other

    cs.CL cs.AI

    Break the Chain: Large Language Models Can be Shortcut Reasoners

    Authors: Mengru Ding, Hanmeng Liu, Zhizhang Fu, Jian Song, Wenbo Xie, Yue Zhang

    Abstract: Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integratio… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  16. arXiv:2406.06562  [pdf, other

    cs.CL cs.AI

    Achieving Sparse Activation in Small Language Models

    Authors: Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

    Abstract: Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages

  17. arXiv:2406.06558  [pdf, other

    cs.CL cs.AI

    Enhancing Text Authenticity: A Novel Hybrid Approach for AI-Generated Text Detection

    Authors: Ye Zhang, Qian Leng, Mengran Zhu, Rui Ding, Yue Wu, Jintong Song, Yulu Gong

    Abstract: The rapid advancement of Large Language Models (LLMs) has ushered in an era where AI-generated text is increasingly indistinguishable from human-generated content. Detecting AI-generated text has become imperative to combat misinformation, ensure content authenticity, and safeguard against malicious uses of AI. In this paper, we propose a novel hybrid approach that combines traditional TF-IDF tech… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  18. arXiv:2406.06340  [pdf, other

    cs.LG cs.AI

    Optimisation of federated learning settings under statistical heterogeneity variations

    Authors: Basem Suleiman, Muhammad Johan Alibasa, Rizka Widyarini Purwanto, Lewis Jeffries, Ali Anaissi, Jacky Song

    Abstract: Federated Learning (FL) enables local devices to collaboratively learn a shared predictive model by only periodically sharing model parameters with a central aggregator. However, FL can be disadvantaged by statistical heterogeneity produced by the diversity in each local devices data distribution, which creates different levels of Independent and Identically Distributed (IID) data. Furthermore, th… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 27 pages, 17 figures

  19. arXiv:2406.05135  [pdf

    cs.RO math.OC

    Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics

    Authors: Xi Cheng, Gaofeng Su, Siyuan Feng, Ke Liu, Chen Zhu, Hui Lin, Jilin Song, Jianan Chen

    Abstract: Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducte… ▽ More

    Submitted 14 May, 2024; originally announced June 2024.

  20. arXiv:2406.03912  [pdf, other

    cs.AI cs.LG cs.RO eess.SY

    GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

    Authors: Zhehua Zhou, Xuan Xie, Jiayang Song, Zhan Shu, Lei Ma

    Abstract: Although deep reinforcement learning has demonstrated impressive achievements in controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its inherent reliance on random exploration raises safety concerns in their real-world applications. To improve system safety during the learning process, a variety of Safe Reinforcement Learning (SRL) algorithms have been proposed,… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  21. arXiv:2406.03097  [pdf, other

    cs.LG cs.AI

    Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Luyao Peng, Jun Song

    Abstract: Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian labe… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  22. arXiv:2406.01595  [pdf, other

    cs.CV

    MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

    Authors: Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

    Abstract: We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering i… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Project page: https://eth-ait.github.io/MultiPly/

  23. arXiv:2406.00505  [pdf, other

    cs.CV

    Improving Text Generation on Images with Synthetic Captions

    Authors: Jun Young Koh, Sang Hyun Park, Joy Song

    Abstract: The recent emergence of latent diffusion models such as SDXL and SD 1.5 has shown significant capability in generating highly detailed and realistic images. Despite their remarkable ability to produce images, generating accurate text within images still remains a challenging task. In this paper, we examine the validity of fine-tuning approaches in generating legible text within the image. We propo… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 9 pages, 12 figures

  24. arXiv:2405.20701  [pdf, other

    cs.CL cs.AI

    Unveiling the Lexical Sensitivity of LLMs: Combinatorial Optimization for Prompt Enhancement

    Authors: Pengwei Zhan, Zhen Xu, Qian Tan, Jie Song, Ru Xie

    Abstract: Large language models (LLMs) demonstrate exceptional instruct-following ability to complete various downstream tasks. Although this impressive ability makes LLMs flexible task solvers, their performance in solving tasks also heavily relies on instructions. In this paper, we reveal that LLMs are over-sensitive to lexical variations in task instructions, even when the variations are imperceptible to… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  25. arXiv:2405.19811  [pdf, ps, other

    cs.LG cs.MA

    Approximate Global Convergence of Independent Learning in Multi-Agent Systems

    Authors: Ruiyang Jin, Zaiwei Chen, Yiheng Lin, Jie Song, Adam Wierman

    Abstract: Independent learning (IL), despite being a popular approach in practice to achieve scalability in large-scale multi-agent systems, usually lacks global convergence guarantees. In this paper, we study two representative algorithms, independent $Q$-learning and independent natural actor-critic, within value-based and policy-based frameworks, and provide the first finite-sample analysis for approxima… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  26. arXiv:2405.19704  [pdf, other

    stat.ML cs.LG stat.ME

    Enhancing Sufficient Dimension Reduction via Hellinger Correlation

    Authors: Seungbeom Hong, Ilmun Kim, Jun Song

    Abstract: In this work, we develop a new theory and method for sufficient dimension reduction (SDR) in single-index models, where SDR is a sub-field of supervised dimension reduction based on conditional independence. Our work is primarily motivated by the recent introduction of the Hellinger correlation as a dependency measure. Utilizing this measure, we develop a method capable of effectively detecting th… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  27. arXiv:2405.19665  [pdf

    eess.SY cs.AI cs.LG

    A novel fault localization with data refinement for hydroelectric units

    Authors: Jialong Huang, Junlin Song, Penglong Lian, Mengjie Gan, Zhiheng Su, Benhao Wang, Wenji Zhu, Xiaomin Pu, Jianxiao Zou, Shicai Fan

    Abstract: Due to the scarcity of fault samples and the complexity of non-linear and non-smooth characteristics data in hydroelectric units, most of the traditional hydroelectric unit fault localization methods are difficult to carry out accurate localization. To address these problems, a sparse autoencoder (SAE)-generative adversarial network (GAN)-wavelet noise reduction (WNR)- manifold-boosted deep learni… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 6pages,4 figures,Conference on Decision and Control(CDC) conference

  28. arXiv:2405.18844  [pdf, other

    cs.IT eess.SP

    Optical IRS for Visible Light Communication: Modeling, Design, and Open Issues

    Authors: Shiyuan Sun, Fang Yang, Weidong Mei, Jian Song, Zhu Han, Rui Zhang

    Abstract: Optical intelligent reflecting surface (OIRS) offers a new and effective approach to resolving the line-of-sight blockage issue in visible light communication (VLC) by enabling redirection of light to bypass obstacles, thereby dramatically enhancing indoor VLC coverage and reliability. This article provides a comprehensive overview of OIRS for VLC, including channel modeling, design techniques, an… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  29. arXiv:2405.18093  [pdf, other

    cs.DC cs.LG

    Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters

    Authors: Jinkyu Yim, Jaeyong Song, Yerim Choi, Jaebeen Lee, Jaewon Jung, Hongsun Jang, Jinho Lee

    Abstract: Training large language models (LLMs) is known to be challenging because of the huge computational and memory capacity requirements. To address these issues, it is common to use a cluster of GPUs with 3D parallelism, which splits a model along the data batch, pipeline stage, and intra-layer tensor dimensions. However, the use of 3D parallelism produces the additional challenge of finding the optim… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: published at DATE 2024

  30. arXiv:2405.18004  [pdf, other

    cs.CV

    SkinCAP: A Multi-modal Dermatology Dataset Annotated with Rich Medical Captions

    Authors: Juexiao Zhou, Liyuan Sun, Yan Xu, Wenbin Liu, Shawn Afvari, Zhongyi Han, Jiaoyan Song, Yongzhi Ji, Xiaonan He, Xin Gao

    Abstract: With the widespread application of artificial intelligence (AI), particularly deep learning (DL) and vision-based large language models (VLLMs), in skin disease diagnosis, the need for interpretability becomes crucial. However, existing dermatology datasets are limited in their inclusion of concept-level meta-labels, and none offer rich medical descriptions in natural language. This deficiency imp… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  31. arXiv:2405.16605  [pdf, other

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  32. arXiv:2405.16248  [pdf

    eess.IV cs.CV cs.LG q-bio.QM

    Combining Radiomics and Machine Learning Approaches for Objective ASD Diagnosis: Verifying White Matter Associations with ASD

    Authors: Junlin Song, Yuzhuo Chen, Yuan Yao, Zetong Chen, Renhao Guo, Lida Yang, Xinyi Sui, Qihang Wang, Xijiao Li, Aihua Cao, Wei Li

    Abstract: Autism Spectrum Disorder is a condition characterized by a typical brain development leading to impairments in social skills, communication abilities, repetitive behaviors, and sensory processing. There have been many studies combining brain MRI images with machine learning algorithms to achieve objective diagnosis of autism, but the correlation between white matter and autism has not been fully u… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  33. arXiv:2405.15738  [pdf, other

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  34. arXiv:2405.15356  [pdf, other

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  35. arXiv:2405.14303  [pdf, other

    cs.LG

    Similarity-Navigated Conformal Prediction for Graph Neural Networks

    Authors: Jianqing Song, Jianguo Huang, Wenyu Jiang, Baoming Zhang, Shuangjie Li, Chongjun Wang

    Abstract: Graph Neural Networks have achieved remarkable accuracy in semi-supervised node classification tasks. However, these results lack reliable uncertainty estimates. Conformal prediction methods provide a theoretical guarantee for node classification tasks, ensuring that the conformal prediction set contains the ground-truth label with a desired probability (e.g., 95%). In this paper, we empirically s… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  36. arXiv:2405.14055  [pdf, other

    cs.CL cs.AI cs.ET

    How Many Bytes Can You Take Out Of Brain-To-Text Decoding?

    Authors: Richard Antonello, Nihita Sarma, Jerry Tang, Jiaru Song, Alexander Huth

    Abstract: Brain-computer interfaces have promising medical and scientific applications for aiding speech and studying the brain. In this work, we propose an information-based evaluation metric for brain-to-text decoders. Using this metric, we examine two methods to augment existing state-of-the-art continuous text decoders. We show that these methods, in concert, can improve brain decoding performance by up… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  37. arXiv:2405.13037  [pdf, other

    cs.CL cs.AI

    Enhancing Dialogue State Tracking Models through LLM-backed User-Agents Simulation

    Authors: Cheng Niu, Xingguang Wang, Xuxin Cheng, Juntong Song, Tong Zhang

    Abstract: Dialogue State Tracking (DST) is designed to monitor the evolving dialogue state in the conversations and plays a pivotal role in developing task-oriented dialogue systems. However, obtaining the annotated data for the DST task is usually a costly endeavor. In this paper, we focus on employing LLMs to generate dialogue data to reduce dialogue collection and annotation costs. Specifically, GPT-4 is… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  38. arXiv:2405.12801  [pdf, other

    cs.CL cs.IR cs.LG

    Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

    Authors: Jonghyun Song, Cheyon Jin, Wenlong Zhao, Jay-Yoon Lee

    Abstract: A common retrieve-and-rerank paradigm involves retrieving a broad set of relevant candidates using a scalable bi-encoder, followed by expensive but more accurate cross-encoders to a limited candidate set. However, this small subset often leads to error propagation from the bi-encoders, thereby restricting the performance of the overall pipeline. To address these issues, we propose the Comparing Mu… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

  39. arXiv:2405.12710  [pdf, other

    cs.CV

    Text-Video Retrieval with Global-Local Semantic Consistent Learning

    Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

    Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, l… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 9 pages

  40. arXiv:2405.12533  [pdf

    cs.CV

    Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

    Authors: Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu

    Abstract: The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding and interaction with Urdu-language visual data. This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new mul… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by the International Conference on Document Analysis and Recognition (ICDAR) 2024

  41. arXiv:2405.11437  [pdf, other

    cs.CV

    The First Swahili Language Scene Text Detection and Recognition Dataset

    Authors: Fadila Wendigoundi Douamba, Jianjun Song, Ling Fu, Yuliang Liu, Xiang Bai

    Abstract: Scene text recognition is essential in many applications, including automated translation, information retrieval, driving assistance, and enhancing accessibility for individuals with visual impairments. Much research has been done to improve the accuracy and performance of scene text detection and recognition models. However, most of this research has been conducted in the most common languages, E… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted to ICDAR 2024

  42. arXiv:2405.11252  [pdf, other

    cs.CV

    Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching

    Authors: Xingyu Miao, Haoran Duan, Varun Ojha, Jun Song, Tejal Shah, Yang Long, Rajiv Ranjan

    Abstract: In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversi… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  43. arXiv:2405.10452  [pdf, other

    cs.CL cs.LG

    Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter Optimisation

    Authors: Junhao Song, Yingfang Yuan, Kaiwen Chang, Bing Xu, Jin Xuan, Wei Pang

    Abstract: To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiments, cognitive pathways of the masses concerning circular products and digital technology, and recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned acro… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  44. arXiv:2405.09883  [pdf, other

    cs.CV

    RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

    Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

    Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within… ▽ More

    Submitted 19 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Technical report. 32 pages, 21 figures, 13 tables. https://github.com/xiaosu-zhu/RoScenes

  45. arXiv:2405.08298  [pdf, other

    cs.LG

    Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments

    Authors: Ke Liu, Fan Hu, Hui Lin, Xi Cheng, Jianan Chen, Jilin Song, Siyuan Feng, Gaofeng Su, Chen Zhu

    Abstract: This paper explores the optimization of Ground Delay Programs (GDP), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. Employing Reinforcement Learning (RL) to manage the inherent uncertainties in the national airspace system-such as weather variability, fluctuating flight demands, and airport arrival rates-we… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  46. arXiv:2405.08293  [pdf, other

    cs.LG

    Airport Delay Prediction with Temporal Fusion Transformers

    Authors: Ke Liu, Kaijing Ding, Xi Cheng, Jianan Chen, Siyuan Feng, Hui Lin, Jilin Song, Chen Zhu

    Abstract: Since flight delay hurts passengers, airlines, and airports, its prediction becomes crucial for the decision-making of all stakeholders in the aviation industry and thus has been attempted by various previous research. However, previous delay predictions are often categorical and at a highly aggregated level. To improve that, this study proposes to apply the novel Temporal Fusion Transformer model… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  47. arXiv:2405.07260  [pdf

    cs.LG cs.AI eess.SP

    A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

    Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

    Abstract: This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentiallyn improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  48. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  49. arXiv:2405.00558  [pdf, other

    cs.NI

    Cross-Cluster Networking to Support Extended Reality Services

    Authors: Theodoros Theodoropoulos, Luis Rosa, Abderrahmane Boudi, Tarik Zakaria Benmerar, Antonios Makris, Tarik Taleb, Luis Cordeiro, Konstantinos Tserpes, JaeSeung Song

    Abstract: Extented Reality (XR) refers to a class of contemporary services that are intertwined with a plethora of rather demanding Quality of Service (QoS) and functional requirements. Despite Kubernetes being the de-facto standard in terms of deploying and managing contemporary containerized microservices, it lacks adequate support for cross-cluster networking, hindering service-to-service communication a… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  50. arXiv:2404.19149  [pdf, other

    cs.CV

    SAGS: Structure-Aware 3D Gaussian Splatting

    Authors: Evangelos Ververas, Rolandos Alexandros Potamias, Jifei Song, Jiankang Deng, Stefanos Zafeiriou

    Abstract: Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inhe… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures, 3 tables