Showing 1–50 of 1,053 results for author: Hu, J

Searching in archive cs.
  1. arXiv:2409.03583  [pdf, other]

    cs.CV

    Text-Guided Mixup Towards Long-Tailed Image Categorization

    Authors: Richard Franklin, Jiawei Yao, Deyang Zhong, Qi Qian, Juhua Hu

    Abstract: In many real-world applications, the frequency distribution of class labels for training data can exhibit a long-tailed distribution, which challenges traditional approaches of training deep neural networks that require heavy amounts of balanced data. Gathering and labeling data to balance out the class label distribution can be both costly and time-consuming. Many existing solutions that enable e…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Accepted by BMVC'24, code is available at https://github.com/rsamf/text-guided-mixup

  2. arXiv:2409.03412  [pdf]

    cs.CV physics.med-ph

    TG-LMM: Enhancing Medical Image Segmentation Accuracy through Text-Guided Large Multi-Modal Model

    Authors: Yihao Zhao, Enhao Zhong, Cuiyun Yuan, Yang Li, Man Zhao, Chunxia Li, Jun Hu, Chenbin Liu

    Abstract: We propose TG-LMM (Text-Guided Large Multi-Modal Model), a novel approach that leverages textual descriptions of organs to enhance segmentation accuracy in medical images. Existing medical image segmentation methods face several challenges: current medical automatic segmentation models do not effectively utilize prior knowledge, such as descriptions of organ locations; previous text-visual models…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: 11 pages, 2 figures

    MSC Class: 68T07

  3. arXiv:2409.02512  [pdf, other]

    cs.LG cs.AI

    Continual Diffuser (CoD): Mastering Continual Offline Reinforcement Learning with Experience Rehearsal

    Authors: Jifeng Hu, Li Shen, Sili Huang, Zhejian Yang, Hechang Chen, Lichao Sun, Yi Chang, Dacheng Tao

    Abstract: Artificial neural networks, especially recent diffusion-based models, have shown remarkable superiority in gaming, control, and QA systems, where the training tasks' datasets are usually static. However, in real-world applications, such as robotic control of reinforcement learning (RL), the tasks are changing, and new tasks arise in a sequential order. This situation poses the new challenge of pla…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.01688  [pdf, ps, other]

    cs.DS cs.AI cs.LG stat.ML

    Differentially Private Kernel Density Estimation

    Authors: Erzhi Liu, Jerry Yao-Chieh Hu, Alex Reneau, Zhao Song, Han Liu

    Abstract: We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only improved privacy-utility tradeoff but also better efficiency over prior results. Specifically, we study the mathematical problem: given a similarity function $f$ (or DP KDE) and a private dataset $X \subset \mathbb{R}^d$, our goal is to preprocess $X$ so that for any query…

    Submitted 3 September, 2024; originally announced September 2024.

  5. arXiv:2409.01552  [pdf, other]

    cs.CL cs.AI

    Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs

    Authors: Zhuo Li, Yuhao Du, Jinpeng Hu, Xiang Wan, Anningzhe Gao

    Abstract: Large language models (LLMs) have shown success in generating high-quality responses. In order to achieve better alignment of LLMs with human preference, various works have been proposed based on specific optimization processes, which, however, are not suitable for Black-Box LLMs like GPT-4, due to inaccessible parameters. In Black-Box LLMs case, their performance is highly dependent on the quality of the…

    Submitted 2 September, 2024; originally announced September 2024.

  6. arXiv:2409.01348  [pdf, other]

    cs.CV cs.CE cs.LG

    PatternPaint: Generating Layout Patterns Using Generative AI and Inpainting Techniques

    Authors: Guanglei Zhou, Bhargav Korrapati, Gaurav Rajavendra Reddy, Jiang Hu, Yiran Chen, Dipto G. Thakurta

    Abstract: Generation of VLSI layout patterns is essential for a wide range of Design For Manufacturability (DFM) studies. In this study, we investigate the potential of generative machine learning models for creating design rule legal metal layout patterns. Our results demonstrate that the proposed model can generate legal patterns in complex design rule settings and achieves a high diversity score. The des…

    Submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2409.00510  [pdf, other]

    cs.CV cs.AI

    Streamlining Forest Wildfire Surveillance: AI-Enhanced UAVs Utilizing the FLAME Aerial Video Dataset for Lightweight and Efficient Monitoring

    Authors: Lemeng Zhao, Junjie Hu, Jianchao Bi, Yanbing Bai, Erick Mas, Shunichi Koshimura

    Abstract: In recent years, unmanned aerial vehicles (UAVs) have played an increasingly crucial role in supporting disaster emergency response efforts by analyzing aerial images. While current deep-learning models focus on improving accuracy, they often overlook the limited computing resources of UAVs. This study recognizes the imperative for real-time data processing in disaster response scenarios and intro…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Accepted by Proceedings of the International Conference on Intelligent Robots and Systems (2024 IROS)

  8. arXiv:2409.00342  [pdf, other]

    cs.CV

    AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation

    Authors: Zanlin Ni, Yulin Wang, Renping Zhou, Rui Lu, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Yuan Yao, Gao Huang

    Abstract: Recent studies have demonstrated the effectiveness of token-based methods for visual content generation. As a representative work, non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps. However, NATs usually necessitate configuring a complicated generation policy comprising multiple manually-designed scheduling rules. These heuristic-dr…

    Submitted 30 August, 2024; originally announced September 2024.

    Comments: Accepted by ECCV2024

  9. arXiv:2408.15542  [pdf, other]

    cs.CV cs.AI cs.MM

    Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

    Authors: Jiajun Liu, Yibing Wang, Hanghang Ma, Xiaoping Wu, Xiaoqi Ma, Xiaoming Wei, Jianbin Jiao, Enhua Wu, Jie Hu

    Abstract: Rapid advancements have been made in extending Large Language Models (LLMs) to Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data remains a challenging endeavor, especially for long videos. Due to insufficient access to large-scale high-quality video data and the excessive compression of visual features, current methods exhibit limitations in effectively proce…

    Submitted 28 August, 2024; originally announced August 2024.

  10. arXiv:2408.15251  [pdf, other]

    cs.CV cs.LG

    TrajFM: A Vehicle Trajectory Foundation Model for Region and Task Transferability

    Authors: Yan Lin, Tonglong Wei, Zeyu Zhou, Haomin Wen, Jilin Hu, Shengnan Guo, Youfang Lin, Huaiyu Wan

    Abstract: Vehicle trajectories provide valuable movement information that supports various downstream tasks and powers real-world applications. A desirable trajectory learning model should transfer between different regions and tasks without retraining, thus improving computational efficiency and effectiveness with limited training data. However, a model's ability to transfer across regions is limited by th…

    Submitted 9 August, 2024; originally announced August 2024.

  11. arXiv:2408.15205  [pdf, other]

    cs.CV

    Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation

    Authors: Jian Hu, Jiayi Lin, Junchi Yan, Shaogang Gong

    Abstract: Promptable segmentation typically requires instance-specific manual prompts to guide the segmentation of each desired object. To minimize such a need, task-generic promptable segmentation has been introduced, which employs a single task-generic prompt to segment various images of different objects in the same task. Current methods use Multimodal Large Language Models (MLLMs) to reason detailed ins…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: We propose using hallucinations as prior knowledge to extract and validate task-related information, which helps generate instance-specific prompts for reducing reliance on manual prompts in promptable segmentation

  12. arXiv:2408.14460  [pdf, other]

    eess.SP cs.NI

    Cloud-Based Federation Framework and Prototype for Open, Scalable, and Shared Access to NextG and IoT Testbeds

    Authors: Maxwell McManus, Tenzin Rinchen, Annoy Dey, Sumanth Thota, Zhaoxi Zhang, Jiangqi Hu, Xi Wang, Mingyue Ji, Nicholas Mastronarde, Elizabeth Serena Bentley, Michael Medley, Zhangyu Guan

    Abstract: In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments. The framework aims to reduce the federation complexity for testbeds developers by automating tedious backend operations, thereby providing scalable federation and remote access…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  13. arXiv:2408.13963  [pdf, other]

    cs.CV

    Shifted Window Fourier Transform And Retention For Image Captioning

    Authors: Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

    Abstract: Image Captioning is an important Language and Vision task that finds application in a variety of contexts, ranging from healthcare to autonomous vehicles. As many real-world applications rely on devices with limited resources, much effort in the field was put into the development of lighter and faster models. However, much of the current optimizations focus on the Transformer architecture in contr…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Pre-print version of paper accepted for ICONIP 2024

  14. arXiv:2408.13959  [pdf, other]

    cs.CL

    Bidirectional Awareness Induction in Autoregressive Seq2Seq Models

    Authors: Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

    Abstract: Autoregressive Sequence-To-Sequence models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. Despite that, they still present significant limitations. For instance, when errors occur in the early steps of the prediction, the whole output is severely affected. Such reliance on previously predicted tokens and the inherent c…

    Submitted 25 August, 2024; originally announced August 2024.

  15. arXiv:2408.13351  [pdf, other]

    cs.CV cs.LG

    SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning

    Authors: Qi Qian, Yuanhong Xu, Juhua Hu

    Abstract: Deep features extracted from certain layers of a pre-trained deep model show superior performance over the conventional hand-crafted features. Compared with fine-tuning or linear probing that can explore diverse augmentations, e.g., random crop/flipping, in the original input space, the appropriate augmentations for learning with fixed deep features are more challenging and have been less investiga…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: accepted by ECCV'24

  16. arXiv:2408.13320  [pdf, other]

    cs.CV cs.LG

    Online Zero-Shot Classification with CLIP

    Authors: Qi Qian, Juhua Hu

    Abstract: Vision-language pre-training such as CLIP enables zero-shot transfer that can classify images according to the candidate class names. While CLIP demonstrates an impressive zero-shot performance on diverse downstream tasks, the distribution from the target data has not been leveraged sufficiently. In this work, we study a novel online zero-shot transfer scenario, where each image arrives in a rando…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: accepted by ECCV'24

  17. arXiv:2408.12599  [pdf, other]

    cs.CL

    Controllable Text Generation for Large Language Models: A Survey

    Authors: Xun Liang, Hanyu Wang, Yezhaohui Wang, Shichao Song, Jiawei Yang, Simin Niu, Jie Hu, Dan Liu, Shunyu Yao, Feiyu Xiong, Zhiyu Li

    Abstract: In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated high text generation quality. However, in real-world applications, LLMs must meet increasingly complex requirements. Beyond avoiding misleading or inappropriate content, LLMs are also expected to cater to specific user needs, such as imitating particular writing styles or generating text with poetic richness. Thes…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 52 pages, 11 figures, 7 tables, 11 equations

    ACM Class: A.2; I.2.7

  18. arXiv:2408.11609  [pdf, other]

    cs.CL cs.AI

    Xinyu: An Efficient LLM-based System for Commentary Generation

    Authors: Yiquan Wu, Bo Tang, Chenyang Xi, Yu Yu, Pengyu Wang, Yifei Liu, Kun Kuang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Jie Hu, Peng Cheng, Zhonghao Wang, Yi Wang, Yi Luo, Mingchuan Yang

    Abstract: Commentary provides readers with a deep understanding of events by presenting diverse arguments and evidence. However, creating commentary is a time-consuming task, even for skilled commentators. Large language models (LLMs) have simplified the process of natural language generation, but their direct application in commentary creation still faces challenges due to unique task requirements. These r…

    Submitted 22 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    ACM Class: I.2.7

  19. arXiv:2408.11518  [pdf, other]

    cs.CV

    EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention

    Authors: Yihong Lin, Liang Peng, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei, Xianjia Wu, Huang Xu

    Abstract: The creation of increasingly vivid 3D virtual digital humans has become a hot topic in recent years. Currently, most speech-driven work focuses on training models to learn the relationship between phonemes and visemes to achieve more realistic lips. However, they fail to capture the correlations between emotions and facial expressions effectively. To solve this problem, we propose a new model, ter…

    Submitted 21 August, 2024; originally announced August 2024.

  20. arXiv:2408.10614  [pdf, other]

    cs.CV cs.AI

    Generalizable Facial Expression Recognition

    Authors: Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng

    Abstract: SOTA facial expression recognition (FER) methods fail on test sets that have domain gaps with the train set. Recent domain adaptation FER methods need to acquire labeled or unlabeled samples of target domains to fine-tune the FER model, which might be infeasible in real-world deployment. In this paper, we aim to improve the zero-shot generalization ability of FER methods on different unseen test s…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024

  21. arXiv:2408.10000  [pdf, other]

    cs.HC

    Working in Extended Reality in the Wild: Worker and Bystander Experiences of XR Virtual Displays in Real-World Settings

    Authors: Leonardo Pavanatto, Verena Biener, Jennifer Chandran, Snehanjali Kalamkar, Feiyu Lu, John J. Dudley, Jinghui Hu, G. Nikki Ramirez-Saffy, Per Ola Kristensson, Alexander Giovannelli, Luke Schlueter, Jörg Müller, Jens Grubert, Doug A. Bowman

    Abstract: Although access to sufficient screen space is crucial to knowledge work, workers often find themselves with limited access to display infrastructure in remote or public settings. While virtual displays can be used to extend the available screen space through extended reality (XR) head-worn displays (HWD), we must better understand the implications of working with them in public settings from both…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2310.09786

  22. arXiv:2408.09790  [pdf, other]

    cs.LG

    Structure-enhanced Contrastive Learning for Graph Clustering

    Authors: Xunlian Wu, Jingqi Hu, Anqi Zhang, Yining Quan, Qiguang Miao, Peng Gang Sun

    Abstract: Graph clustering is a crucial task in network analysis with widespread applications, focusing on partitioning nodes into distinct groups with stronger intra-group connections than inter-group ones. Recently, contrastive learning has achieved significant progress in graph clustering. However, most methods suffer from the following issues: 1) an over-reliance on meticulously designed data augmentati…

    Submitted 19 August, 2024; originally announced August 2024.

  23. arXiv:2408.09757  [pdf, other]

    cs.LG cs.CL cs.CY

    Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning

    Authors: Jingyu Hu, Weiru Liu, Mengnan Du

    Abstract: Recent studies highlight the effectiveness of using in-context learning (ICL) to steer large language models (LLMs) in processing tabular data, a challenging task given the structured nature of such data. Despite advancements in performance, the fairness implications of these methods are less understood. This study investigates how varying demonstrations within ICL prompts influence the fairness o…

    Submitted 19 August, 2024; originally announced August 2024.

  24. arXiv:2408.09297  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    Out-of-distribution materials property prediction using adversarial learning based fine-tuning

    Authors: Qinyang Li, Nicholas Miklaucic, Jianjun Hu

    Abstract: The accurate prediction of material properties is crucial in a wide range of scientific and engineering disciplines. Machine learning (ML) has advanced the state of the art in this field, enabling scientists to discover novel materials and design materials with specific desired properties. However, one major challenge that persists in material property prediction is the generalization of models to…

    Submitted 17 August, 2024; originally announced August 2024.

  25. arXiv:2408.08959  [pdf, other]

    cs.AI cs.CL

    Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning

    Authors: Jinwei Hu, Yi Dong, Xiaowei Huang

    Abstract: Guardrails have become an integral part of Large language models (LLMs), by moderating harmful or toxic response in order to maintain LLMs' alignment to human expectations. However, the existing guardrail methods do not consider different needs and access rights of individual users, and treat all the users with the same rule. This study introduces an adaptive guardrail mechanism, supported by trus…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Under Review

  26. arXiv:2408.08050  [pdf, other]

    cs.CV

    CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

    Authors: Xunfa Lai, Zhiyu Yang, Jie Hu, Shengchuan Zhang, Liujuan Cao, Guannan Jiang, Zhiyu Wang, Songan Zhang, Rongrong Ji

    Abstract: Existing camouflaged object detection (COD) methods depend heavily on large-scale pixel-level annotations. However, acquiring such annotations is laborious due to the inherent camouflage characteristics of the objects. Semi-supervised learning offers a promising solution to this challenge. Yet, its application in COD is hindered by significant pseudo-label noise, both pixel-level and instance-level. W…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  27. arXiv:2408.07476  [pdf, other]

    cs.CV

    One Step Diffusion-based Super-Resolution with Time-Aware Distillation

    Authors: Xiao He, Huaao Tang, Zhijun Tu, Junchao Zhang, Kun Cheng, Hanting Chen, Yong Guo, Mingrui Zhu, Nannan Wang, Xinbo Gao, Jie Hu

    Abstract: Diffusion-based image super-resolution (SR) methods have shown promise in reconstructing high-resolution images with fine details from low-resolution counterparts. However, these approaches typically require tens or even hundreds of iterative samplings, resulting in significant latency. Recently, techniques have been devised to enhance the sampling efficiency of diffusion-based SR models via knowl…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages

  28. arXiv:2408.05517  [pdf, other]

    cs.CL

    SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning

    Authors: Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

    Abstract: Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged Attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal task…

    Submitted 18 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  29. arXiv:2408.05102  [pdf, other]

    cs.CL

    How Well Do LLMs Identify Cultural Unity in Diversity?

    Authors: Jialin Li, Junli Wang, Junjie Hu, Ming Jiang

    Abstract: Much work on the cultural awareness of large language models (LLMs) focuses on the models' sensitivity to geo-cultural diversity. However, in addition to cross-cultural differences, there also exists common ground across cultures. For instance, a bridal veil in the United States plays a similar cultural-relevant role as a honggaitou in China. In this study, we introduce a benchmark dataset CUNIT f…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  30. arXiv:2408.04957   

    cs.CV cs.AI

    LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

    Authors: Yizhang Jin, Jian Li, Jiangning Zhang, Jianlong Hu, Zhenye Gan, Xin Tan, Yong Liu, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: Visual Spatial Description (VSD) aims to generate texts that describe the spatial relationships between objects within images. Traditional visual spatial relationship classification (VSRC) methods typically output the spatial relationship between two objects in an image, often neglecting world knowledge and lacking general language capabilities. In this paper, we propose a Large Language-and-Visio…

    Submitted 28 August, 2024; v1 submitted 9 August, 2024; originally announced August 2024.

    Comments: We have discovered a significant error in the paper that affects the main conclusions. To ensure the accuracy of our research, we have decided to withdraw this paper and will resubmit it after making the necessary corrections

  31. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images, including COVID-19, common pneumonia, and normal cases, for rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  32. arXiv:2408.04238  [pdf, other]

    cs.OS

    Crash Consistency in DRAM-NVM-Disk Hybrid Storage System

    Authors: Guoyu Wang, Xilong Che, Haoyang Wei, Chenju Pei, Juncheng Hu

    Abstract: NVM is used as a new hierarchy in the storage system, due to its intermediate speed and capacity between DRAM, and its byte granularity. However, consistency problems emerge when we attempt to put DRAM, NVM, and disk together as an efficient whole. In this paper, we discuss the challenging consistency problems faced by heterogeneous storage systems, and propose our solution to the problems. The di…

    Submitted 8 August, 2024; originally announced August 2024.

  33. arXiv:2408.03539  [pdf, other]

    cs.RO cs.LG

    Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

    Authors: Chen Tang, Ben Abbatematteo, Jiaheng Hu, Rohan Chandra, Roberto Martín-Martín, Peter Stone

    Abstract: Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of inte…

    Submitted 15 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: The first three authors contributed equally. Accepted to Annual Review of Control, Robotics, and Autonomous Systems

  34. arXiv:2408.03350  [pdf, other]

    cs.AI cs.CL cs.LG

    miniCTX: Neural Theorem Proving with (Long-)Contexts

    Authors: Jiewen Hu, Thomas Zhu, Sean Welleck

    Abstract: We introduce miniCTX, which tests a model's ability to prove formal mathematical theorems that depend on new definitions, lemmas, or other contextual information that was not observed during training. miniCTX contains theorems sourced from real Lean projects and textbooks, each associated with a context that can span tens of thousands of tokens. Models are tasked with proving a theorem given acces…

    Submitted 5 August, 2024; originally announced August 2024.

  35. arXiv:2408.03001  [pdf, other]

    cs.CV cs.MM

    Multitask and Multimodal Neural Tuning for Large Models

    Authors: Hao Sun, Yu Song, Jihong Hu, Yen-Wei Chen, Lanfen Lin

    Abstract: In recent years, large-scale multimodal models have demonstrated impressive capabilities across various domains. However, enabling these models to effectively perform multiple multimodal tasks simultaneously remains a significant challenge. To address this, we introduce a novel tuning method called neural tuning, designed to handle diverse multimodal tasks concurrently, including reasoning segment…

    Submitted 6 August, 2024; originally announced August 2024.

  36. arXiv:2408.02911  [pdf, other]

    cs.OS

    NVPC: A Transparent NVM Page Cache

    Authors: Guoyu Wang, Xilong Che, Haoyang Wei, Shuo Chen, Puyi He, Juncheng Hu

    Abstract: Towards a compatible utilization of NVM, NVM-specialized kernel file systems and NVM-based disk file system accelerators have been proposed. However, these studies only focus on one or several characteristics of NVM, while failing to exploit its best practice by putting NVM in the proper position of the whole storage stack. In this paper, we present NVPC, a transparent acceleration to existing ker…

    Submitted 5 August, 2024; originally announced August 2024.

  37. arXiv:2408.01669  [pdf, other]

    cs.CV cs.MM

    SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses

    Authors: Chaolei Tan, Zihang Lin, Junfu Pu, Zhongang Qi, Wei-Yi Pei, Zhi Qu, Yexin Wang, Ying Shan, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: Video grounding is a fundamental problem in multimodal content understanding, aiming to localize specific natural language queries in an untrimmed video. However, current video grounding datasets merely focus on simple events and are either limited to shorter videos or brief sentences, which hinders the model from evolving toward stronger multimodal understanding capabilities. To address these lim…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024. Project page: https://synopground.github.io/

  38. arXiv:2408.00346  [pdf, other]

    cs.LG cs.AI

    Neural Graph Matching for Video Retrieval in Large-Scale Video-driven E-commerce

    Authors: Houye Ji, Ye Tang, Zhaoxin Chen, Lixi Deng, Jun Hu, Lei Su

    Abstract: With the rapid development of the short video industry, traditional e-commerce has encountered a new paradigm, video-driven e-commerce, which leverages attractive videos for product showcases and provides both video and item services for users. Benefitting from the dynamic and visualized introduction of items, video-driven e-commerce has shown huge potential in stimulating consumer confidence and p…

    Submitted 1 August, 2024; originally announced August 2024.

  39. arXiv:2407.21661  [pdf]

    cs.AR

    Towards Error Correction for Computing in Racetrack Memory

    Authors: Preston Brazzle, Benjamin F. Morris III, Evan McKinney, Peipei Zhou, Jingtong Hu, Asif Ali Khan, Alex K. Jones

    Abstract: Computing-in-memory (CIM) promises to alleviate the Von Neumann bottleneck and accelerate data-intensive applications. Depending on the underlying technology and configuration, CIM enables implementing compute primitives in place, such as multiplication, search operations, and bulk bitwise logic operations. Emerging nonvolatile memory technologies such as spintronic Racetrack memory (RTM) promise…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 4 pages, 6 figures, to be submitted to IEEE CAL

  40. arXiv:2407.21475  [pdf, other]

    cs.CV cs.AI

    Fine-gained Zero-shot Video Sampling

    Authors: Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu

    Abstract: Incorporating a temporal dimension into pretrained image diffusion models for video generation is a prevalent approach. However, this method is computationally demanding and necessitates large-scale video datasets. More critically, the heterogeneity between image and video datasets often results in catastrophic forgetting of the image expertise. Recent attempts to directly extract video snippets f…

    Submitted 31 July, 2024; originally announced July 2024.

  41. arXiv:2407.21428  [pdf, other]

    cs.GR cs.AI

    Deformable 3D Shape Diffusion Model

    Authors: Dengsheng Chen, Jie Hu, Xiaoming Wei, Enhua Wu

    Abstract: The Gaussian diffusion model, initially designed for image generation, has recently been adapted for 3D point cloud generation. However, these adaptations have not fully considered the intrinsic geometric characteristics of 3D shapes, thereby constraining the diffusion model's potential for 3D shape manipulation. To address this limitation, we introduce a novel deformable 3D shape diffusion model…

    Submitted 31 July, 2024; originally announced July 2024.

  42. arXiv:2407.20761  [pdf, other]

    cs.AI

    OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

    Authors: Yongqiang Yao, Jingru Tan, Jiahao Hu, Feizhao Zhang, Xin Jin, Bo Li, Ruihao Gong, Pengfei Liu

    Abstract: Recently, vision-language instruct-tuning models have made significant progress due to their more comprehensive understanding of the world. In this work, we discovered that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture diffe…

    Submitted 30 July, 2024; originally announced July 2024.

  43. arXiv:2407.20053  [pdf, other]

    cs.LG physics.ao-ph

    Orca: Ocean Significant Wave Height Estimation with Spatio-temporally Aware Large Language Models

    Authors: Zhe Li, Ronghui Xu, Jilin Hu, Zhong Peng, Xi Lu, Chenjuan Guo, Bin Yang

    Abstract: Significant wave height (SWH) is a vital metric in marine science, and accurate SWH estimation is crucial for various applications, e.g., marine energy development, fishery, early warning systems for potential risks, etc. Traditional SWH estimation methods that are based on numerical models and physical theories are hindered by computational inefficiencies. Recently, machine learning has emerged a…

    Submitted 29 July, 2024; originally announced July 2024.

  44. arXiv:2407.19832  [pdf, other]

    cs.CV cs.AI cs.CL

    ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2

    Authors: Wenjun Huang, Jiakai Pan, Jiahao Tang, Yanyu Ding, Yifei Xing, Yuhe Wang, Zhengzhuo Wang, Jianguo Hu

    Abstract: Multimodal Large Language Models (MLLMs) have attracted much attention for their multifunctionality. However, traditional Transformer architectures incur significant overhead due to their secondary computational complexity. To address this issue, we introduce ML-Mamba, a multimodal language model, which utilizes the latest and efficient Mamba-2 model for inference. Mamba-2 is known for its linear…

    Submitted 21 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  45. arXiv:2407.19789  [pdf, other]

    cs.CV

    Interpreting Low-level Vision Models with Causal Effect Maps

    Authors: Jinfan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong

    Abstract: Deep neural networks have significantly improved the performance of low-level vision tasks but also increased the difficulty of interpretability. A deep understanding of deep models is beneficial for both network design and practical reliability. To take up this challenge, we introduce causality theory to interpret low-level vision models and propose a model-/task-agnostic method called Causal Eff…

    Submitted 29 July, 2024; originally announced July 2024.

  46. arXiv:2407.19768  [pdf, other]

    cs.CV

    Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network

    Authors: Wenjie Li, Heng Guo, Xuannan Liu, Kongming Liang, Jiani Hu, Zhanyu Ma, Jun Guo

    Abstract: Face super-resolution aims to reconstruct a high-resolution face image from a low-resolution face image. Previous methods typically employ an encoder-decoder structure to extract facial structural features, where the direct downsampling inevitably introduces distortions, especially to high-frequency features such as edges. To address this issue, we propose a wavelet-based feature enhancement netwo…

    Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  47. arXiv:2407.18046  [pdf, other]

    cs.CV cs.AI

    GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

    Authors: Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

    Abstract: Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 13 pages, 12 figures

  48. arXiv:2407.16128  [pdf, other]

    cs.CV cs.AI

    Advancing Brain Imaging Analysis Step-by-step via Progressive Self-paced Learning

    Authors: Yanwu Yang, Hairui Chen, Jiesi Hu, Xutao Guo, Ting Ma

    Abstract: Recent advancements in deep learning have shifted the development of brain imaging analysis. However, several challenges remain, such as heterogeneity, individual variations, and the contradiction between the high dimensionality and small size of brain imaging datasets. These issues complicate the learning process, preventing models from capturing intrinsic, meaningful patterns and potentially lea…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: miccai-2024

  49. arXiv:2407.15424  [pdf, other]

    cs.CV

    Bidirectional skip-frame prediction for video anomaly detection with intra-domain disparity-driven attention

    Authors: Jiahao Lyu, Minghua Zhao, Jing Hu, Runtao Xi, Xuewen Huang, Shuangli Du, Cheng Shi, Tian Ma

    Abstract: With the widespread deployment of video surveillance devices and the demand for intelligent system development, video anomaly detection (VAD) has become an important part of constructing intelligent surveillance systems. Expanding the discriminative boundary between normal and abnormal events to enhance performance is the common goal and challenge of VAD. To address this problem, we propose a Bidi…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures, 4 tables

  50. arXiv:2407.15085  [pdf, other]

    cs.CV

    Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization

    Authors: Jiajun Hu, Jian Zhang, Lei Qi, Yinghuan Shi, Yang Gao

    Abstract: Domain generalization (DG) aims to avoid the performance degradation of the model when the distribution shift between the limited training data and unseen test data occurs. Recently, foundation models with enormous parameters have been pre-trained with huge datasets, demonstrating strong generalization ability and showing promising direction for solving the DG problem. However, fully Fine-Tuning (…

    Submitted 21 July, 2024; originally announced July 2024.