
Showing 1–50 of 188 results for author: Fu, L

Searching in archive cs.
  1. arXiv:2409.10016  [pdf, other]

    cs.CL cs.AI

    AceParse: A Comprehensive Dataset with Diverse Structured Texts for Academic Literature Parsing

    Authors: Huawei Ji, Cheng Deng, Bo Xue, Zhouyang Jin, Jiaxin Ding, Xiaoying Gan, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: With the development of data-centric AI, the focus has shifted from model-driven approaches to improving data quality. Academic literature, as one of the crucial types, is predominantly stored in PDF formats and needs to be parsed into texts before further processing. However, parsing diverse structured texts in academic literature remains challenging due to the lack of datasets that cover various…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures, 3 tables

  2. arXiv:2409.08349  [pdf, other]

    physics.soc-ph cs.IT cs.SI

    Scientific and technological knowledge grows linearly over time

    Authors: Huquan Kang, Luoyi Fu, Russell J. Funk, Xinbing Wang, Jiaxin Ding, Shiyu Liang, Jianghao Wang, Lei Zhou, Chenghu Zhou

    Abstract: The past few centuries have witnessed a dramatic growth in scientific and technological knowledge. However, the nature of that growth - whether exponential or otherwise - remains controversial, perhaps partly due to the lack of quantitative characterizations. We evaluated knowledge as a collective thinking structure, using citation networks as a representation, by examining extensive datasets that…

    Submitted 12 September, 2024; originally announced September 2024.

  3. arXiv:2408.15980  [pdf, other]

    cs.RO cs.AI

    In-Context Imitation Learning via Next-Token Prediction

    Authors: Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg

    Abstract: We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor traj…

    Submitted 28 August, 2024; originally announced August 2024.

  4. arXiv:2408.06787  [pdf, other]

    cs.CL

    Unlock the Power of Frozen LLMs in Knowledge Graph Completion

    Authors: Bo Xue, Yi Xu, Yunchong Song, Yiming Pang, Yuyang Ren, Jiaxin Ding, Luoyi Fu, Xinbing Wang

    Abstract: Classical knowledge graph completion (KGC) methods rely solely on structural information, struggling with the inherent sparsity of knowledge graphs (KGs). Large Language Models (LLMs) learn extensive knowledge from large corpora with powerful context modeling, which is ideal for mitigating the limitations of previous methods. Directly fine-tuning LLMs offers great capability but comes at the cost…

    Submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2408.06646  [pdf, other]

    cs.CV

    Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

    Authors: Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangming Chen, Lean Fu, Xing Mei

    Abstract: Stable Diffusion Models (SDMs) have shown remarkable proficiency in image synthesis. However, their broad application is impeded by their large model sizes and intensive computational requirements, which typically require expensive cloud servers for deployment. On the flip side, while there are many compact models tailored for edge devices that can reduce these demands, they often compromise on se…

    Submitted 13 August, 2024; originally announced August 2024.

  6. arXiv:2408.04673  [pdf, other]

    cs.CL cs.AI cs.LG

    AutoFAIR : Automatic Data FAIRification via Machine Reading

    Authors: Tingyan Ma, Wei Liu, Bin Lu, Xiaoying Gan, Yunqiang Zhu, Luoyi Fu, Chenghu Zhou

    Abstract: The explosive growth of data fuels data-driven research, facilitating progress across diverse domains. The FAIR principles emerge as a guiding standard, aiming to enhance the findability, accessibility, interoperability, and reusability of data. However, current efforts primarily focus on manual data FAIRification, which can only handle targeted data and lack efficiency. To address this issue, we…

    Submitted 7 August, 2024; originally announced August 2024.

  7. arXiv:2408.04667  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    LLM Stability: A detailed analysis with some surprises

    Authors: Berk Atil, Alexa Chittams, Liseng Fu, Ferhan Ture, Lixinyu Xu, Breck Baldwin

    Abstract: LLM (large language model) practitioners commonly notice that outputs can vary for the same inputs, but we have been unable to find work that evaluates LLM stability as the main objective. In our study of 6 deterministically configured LLMs across 8 common tasks with 5 identical runs, we see accuracy variations up to 10%. In addition, no LLM consistently delivers repeatable accuracy across all ta…

    Submitted 12 September, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  8. arXiv:2407.18483  [pdf]

    cs.CL cs.AI

    A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation

    Authors: Laiyi Fu, Binbin Fan, Hongkai Du, Yanxiang Feng, Chunhua Li, Huping Song

    Abstract: Ophthalmology consultations are crucial for diagnosing, treating, and preventing eye diseases. However, the growing demand for consultations exceeds the availability of ophthalmologists. By leveraging large pre-trained language models, we can design effective dialogues for specific scenarios, aiding in consultations. Traditional fine-tuning strategies for question-answering tasks are impractical d…

    Submitted 31 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  9. arXiv:2407.15537  [pdf, other]

    cs.LG cs.RO

    Exterior Penalty Policy Optimization with Penalty Metric Network under Constraints

    Authors: Shiqing Gao, Jiaxin Ding, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: In Constrained Reinforcement Learning (CRL), agents explore the environment to learn the optimal policy while satisfying constraints. The penalty function method has recently been studied as an effective approach for handling constraints, which imposes constraints penalties on the objective to transform the constrained problem into an unconstrained one. However, it is challenging to choose appropr…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: To be published in the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024)

  10. arXiv:2407.01245  [pdf, other]

    cs.AI cs.CY

    SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model

    Authors: Lingyue Fu, Hao Guan, Kounianhua Du, Jianghao Lin, Wei Xia, Weinan Zhang, Ruiming Tang, Yasheng Wang, Yong Yu

    Abstract: Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question, which is a crucial task in intelligent tutoring systems (ITS). In educational KT scenarios, transductive ID-based methods often face severe data sparsity and cold start problems, where interactions between individual students and questions are sparse, and new questions and concepts consistently a…

    Submitted 23 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00928  [pdf, other]

    cs.LG cs.CL

    FoldGPT: Simple and Effective Large Language Model Compression Scheme

    Authors: Songwei Liu, Chao Zeng, Lianqiang Li, Chenqian Yan, Lean Fu, Xing Mei, Fangmin Chen

    Abstract: The demand for deploying large language models (LLMs) on mobile devices continues to increase, driven by escalating data security concerns and cloud costs. However, network bandwidth and memory limitations pose challenges for deploying billion-level models on mobile devices. In this study, we investigate the outputs of different layers across various scales of LLMs and found that the outputs of mos…

    Submitted 30 June, 2024; originally announced July 2024.

  12. arXiv:2406.07992  [pdf, other]

    cs.LG eess.SP

    A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

    Authors: Jingwen Tong, Xinran Li, Liqun Fu, Jun Zhang, Khaled B. Letaief

    Abstract: Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.00779  [pdf, other]

    cs.LG

    Differentiation of Multi-objective Data-driven Decision Pipeline

    Authors: Peng Li, Lixia Wu, Chaoqun Feng, Haoyuan Hu, Lei Fu, Jieping Ye

    Abstract: Real-world scenarios frequently involve multi-objective data-driven optimization problems, characterized by unknown problem coefficients and multiple conflicting objectives. Traditional two-stage methods independently apply a machine learning model to estimate problem coefficients, followed by invoking a solver to tackle the predicted optimization problem. The independent use of optimization solve…

    Submitted 2 June, 2024; originally announced June 2024.

  14. arXiv:2405.17158  [pdf, other]

    cs.CV

    PatchScaler: An Efficient Patch-Independent Diffusion Model for Super-Resolution

    Authors: Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang

    Abstract: Diffusion models significantly improve the quality of super-resolved images with their impressive content generation capabilities. However, the huge computational costs limit the applications of these methods. Recent efforts have explored reasonable inference acceleration to reduce the number of sampling steps, but the computational cost remains high as each step is performed on the entire image. Th…

    Submitted 11 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  15. arXiv:2405.12533  [pdf]

    cs.CV

    Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering

    Authors: Hiba Maryam, Ling Fu, Jiajun Song, Tajrian ABM Shafayet, Qidi Luo, Xiang Bai, Yuliang Liu

    Abstract: The development of Urdu scene text detection, recognition, and Visual Question Answering (VQA) technologies is crucial for advancing accessibility, information retrieval, and linguistic diversity in digital content, facilitating better understanding and interaction with Urdu-language visual data. This initiative seeks to bridge the gap between textual and visual comprehension. We propose a new mul…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted by the International Conference on Document Analysis and Recognition (ICDAR) 2024

  16. arXiv:2405.11437  [pdf, other]

    cs.CV

    The First Swahili Language Scene Text Detection and Recognition Dataset

    Authors: Fadila Wendigoundi Douamba, Jianjun Song, Ling Fu, Yuliang Liu, Xiang Bai

    Abstract: Scene text recognition is essential in many applications, including automated translation, information retrieval, driving assistance, and enhancing accessibility for individuals with visual impairments. Much research has been done to improve the accuracy and performance of scene text detection and recognition models. However, most of this research has been conducted in the most common languages, E…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: Accepted to ICDAR 2024

  17. arXiv:2405.08245  [pdf]

    cs.CV cs.AI

    Progressive enhancement and restoration for mural images under low-light and defected conditions based on multi-receptive field strategy

    Authors: Xiameng Wei, Binbin Fan, Ying Wang, Yanxiang Feng, Laiyi Fu

    Abstract: Ancient murals are valuable cultural heritage with great archaeological value. They provide insights into ancient religions, ceremonies, folklore, among other things through their content. However, due to long-term oxidation and inadequate protection, ancient murals have suffered continuous damage, including peeling and mold etc. Additionally, since ancient murals were typically painted indoors, t…

    Submitted 16 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  18. arXiv:2405.07233  [pdf, other]

    cs.LG cs.AI physics.ao-ph

    OXYGENERATOR: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

    Authors: Bin Lu, Ze Zhao, Luyu Han, Xiaoying Gan, Yuntao Zhou, Lei Zhou, Luoyi Fu, Xinbing Wang, Chenghu Zhou, Jing Zhang

    Abstract: Accurately reconstructing the global ocean deoxygenation over a century is crucial for assessing and protecting marine ecosystem. Existing expert-dominated numerical simulations fail to catch up with the dynamic variation caused by global warming and human activities. Besides, due to the high-cost data collection, the historical observations are severely sparse, leading to big challenge for precis…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  19. arXiv:2405.02818  [pdf, other]

    cs.IT

    Site-Specific Deployment Optimization of Intelligent Reflecting Surface for Coverage Enhancement

    Authors: Dongsheng Fu, Xintong Chen, Jiangbin Lyu, Liqun Fu

    Abstract: Intelligent Reflecting Surface (IRS) is a promising technology for next generation wireless networks. Despite substantial research in IRS-aided communications, the assumed antenna and channel models are typically simplified without considering site-specific characteristics, which in turn critically affect the IRS deployment and performance in a given environment. In this paper, we first investigat…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 7 pages, 7 figures. To appear in VTC2024-Spring

  20. arXiv:2405.02660  [pdf, other]

    cs.IT eess.SP

    AFDM Channel Estimation in Multi-Scale Multi-Lag Channels

    Authors: Rongyou Cao, Yuheng Zhong, Jiangbin Lyu, Deqing Wang, Liqun Fu

    Abstract: Affine Frequency Division Multiplexing (AFDM) is a brand new chirp-based multi-carrier (MC) waveform for high mobility communications, with promising advantages over Orthogonal Frequency Division Multiplexing (OFDM) and other MC waveforms. Existing AFDM research focuses on wireless communication at high carrier frequency (CF), which typically considers only Doppler frequency shift (DFS) as a resul…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate AFDM under underwater multi-scale multi-lag channels. Derive the new input-output formula with the impact of Doppler time scaling. Propose two new channel estimation methods to tackle different levels of Doppler factors. Perform diversity analysis based on CFR overlap probability (COP) and mutual incoherent property (MIP)

  21. arXiv:2405.02655  [pdf, other]

    cs.IT

    Fast Online Movement Optimization of Aerial Base Stations Based on Global Connectivity Map

    Authors: Yiling Wang, Jiangbin Lyu, Liqun Fu

    Abstract: Unmanned aerial vehicles (UAVs) can serve as aerial base stations (ABSs) to provide wireless connectivity for ground users (GUs) in diverse scenarios. However, it is an NP-hard problem with exponential complexity in $M$ and $N$, in order to maximize the coverage rate (CR) of $M$ GUs by jointly placing $N$ ABSs with limited coverage range. This problem becomes even more intricate when the coverage…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures. Investigate site-specific movement optimization of UAV-mounted aerial base stations to cover a group of moving ground users, based on site-specific Global Connectivity Map. arXiv admin note: text overlap with arXiv:2312.10490

  22. arXiv:2405.02355  [pdf, other]

    cs.SE cs.AI

    CodeGRAG: Extracting Composed Syntax Graphs for Retrieval Augmented Cross-Lingual Code Generation

    Authors: Kounianhua Du, Renting Rui, Huacan Chai, Lingyue Fu, Wei Xia, Yasheng Wang, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: Utilizing large language models to generate codes has shown promising meaning in software development revolution. Despite the intelligence shown by the general large language models, their specificity in code generation can still be improved due to the syntactic gap and mismatched vocabulary existing among natural language and different programming languages. In addition, programming languages are…

    Submitted 2 May, 2024; originally announced May 2024.

  23. arXiv:2404.19563  [pdf, other]

    cs.CL

    RepEval: Effective Text Evaluation with LLM Representation

    Authors: Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

    Abstract: Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the…

    Submitted 30 April, 2024; originally announced April 2024.

  24. arXiv:2404.15282  [pdf, other]

    cs.DL cs.AI

    Patent Value Characterization -- An Empirical Analysis of Elevator Industry Patents

    Authors: Yuhang Guan, Runzheng Wang, Lei Fu, Huanle Zhang

    Abstract: The global patent application count has steadily increased, achieving eight consecutive years of growth. The global patent industry has shown a general trend of expansion. This is attributed to the increasing innovation activities, particularly in the fields of technology, healthcare, and biotechnology. Some emerging market countries, such as China and India, have experienced significant growth in…

    Submitted 20 February, 2024; originally announced April 2024.

  25. arXiv:2404.10378  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  26. arXiv:2404.07493  [pdf, other]

    cs.LG cs.AI

    Characterizing the Influence of Topology on Graph Learning Tasks

    Authors: Kailong Wu, Yule Xie, Jiaxin Ding, Yuxiang Ren, Luoyi Fu, Xinbing Wang, Chenghu Zhou

    Abstract: Graph neural networks (GNN) have achieved remarkable success in a wide range of tasks by encoding features combined with topology to create effective representations. However, the fundamental problem of understanding and analyzing how graph topology influences the performance of learning models on downstream tasks has not yet been well understood. In this paper, we propose a metric, TopoInf, which…

    Submitted 11 April, 2024; originally announced April 2024.

  27. arXiv:2404.05595  [pdf, other]

    cs.CV

    UniFL: Improve Stable Diffusion via Unified Feedback Learning

    Authors: Jiacheng Zhang, Jie Wu, Yuxi Ren, Xin Xia, Huafeng Kuang, Pan Xie, Jiashi Li, Xuefeng Xiao, Min Zheng, Lean Fu, Guanbin Li

    Abstract: Diffusion models have revolutionized the field of image generation, leading to the proliferation of high-quality models and diverse downstream applications. However, despite these significant advancements, the current competitive solutions still suffer from several limitations, including inferior visual quality, a lack of aesthetic appeal, and inefficient inference, without a comprehensive solutio…

    Submitted 22 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  28. arXiv:2404.04860  [pdf, other]

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including: i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we p…

    Submitted 7 April, 2024; originally announced April 2024.

  29. From Learning to Analytics: Improving Model Efficacy with Goal-Directed Client Selection

    Authors: Jingwen Tong, Zhenzhen Chen, Liqun Fu, Jun Zhang, Zhu Han

    Abstract: Federated learning (FL) is an appealing paradigm for learning a global model among distributed clients while preserving data privacy. Driven by the demand for high-quality user experiences, evaluating the well-trained global model after the FL process is crucial. In this paper, we propose a closed-loop model analytics framework that allows for effective evaluation of the trained global model using…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: This work was partly presented at IEEE ICC 2022

    MSC Class: 14J60; ACM Class: I.2.7

  30. arXiv:2403.16112  [pdf, other]

    cs.CV cs.AI cs.LG

    Opportunities and challenges in the application of large artificial intelligence models in radiology

    Authors: Liangrui Pan, Zhenyu Zhao, Ying Lu, Kewei Tang, Liyong Fu, Qingchun Liang, Shaoliang Peng

    Abstract: Influenced by ChatGPT, artificial intelligence (AI) large models have witnessed a global upsurge in large model research and development. As people enjoy the convenience by this AI large model, more and more large models in subdivided fields are gradually being proposed, especially large models in radiology imaging field. This article first introduces the development history of large models, techn…

    Submitted 24 March, 2024; originally announced March 2024.

  31. arXiv:2403.14275  [pdf, other]

    cs.CL

    Is Reference Necessary in the Evaluation of NLG Systems? When and Where?

    Authors: Shuqian Sheng, Yi Xu, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

    Abstract: The majority of automatic metrics for evaluating NLG systems are reference-based. However, the challenge of collecting human annotation results in a lack of reliable references in numerous application scenarios. Despite recent advancements in reference-free metrics, it has not been well understood when and where they can be used as an alternative to reference-based metrics. In this study, by emplo…

    Submitted 21 March, 2024; originally announced March 2024.

  32. arXiv:2403.10494  [pdf, other]

    cs.RO

    Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2

    Authors: Adam Rashid, Chung Min Kim, Justin Kerr, Letian Fu, Kush Hari, Ayah Ahmad, Kaiyuan Chen, Huang Huang, Marcus Gualtieri, Michael Wang, Christian Juette, Nan Tian, Liu Ren, Ken Goldberg

    Abstract: Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: See project webpage at: https://sites.google.com/berkeley.edu/lifelonglerf/home

  33. arXiv:2403.08479  [pdf, other]

    eess.IV cs.CV physics.med-ph

    MD-Dose: A Diffusion Model based on the Mamba for Radiotherapy Dose Prediction

    Authors: Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yali Shen, Yu Yao

    Abstract: Radiation therapy is crucial in cancer treatment. Experienced experts typically iteratively generate high-quality dose distribution maps, forming the basis for excellent radiation therapy plans. Therefore, automated prediction of dose distribution maps is significant in expediting the treatment process and providing a better starting point for developing radiation therapy plans. With the remarkabl…

    Submitted 13 March, 2024; originally announced March 2024.

  34. arXiv:2403.06877  [pdf, other]

    cs.RO cs.CV

    SiLVR: Scalable Lidar-Visual Reconstruction with Neural Radiance Fields for Robotic Inspection

    Authors: Yifu Tao, Yash Bhalgat, Lanke Frank Tarimo Fu, Matias Mattamala, Nived Chebrolu, Maurice Fallon

    Abstract: We present a neural-field-based large-scale reconstruction system that fuses lidar and vision data to generate high-quality reconstructions that are geometrically accurate and capture photo-realistic textures. This system adapts the state-of-the-art neural radiance field (NeRF) representation to also incorporate lidar data which adds strong geometric constraints on the depth and surface normals. W…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at ICRA 2024; Website: https://ori-drs.github.io/projects/silvr/

  35. arXiv:2403.02576  [pdf, other]

    cs.DL cs.LG cs.SI

    AceMap: Knowledge Discovery through Academic Graph

    Authors: Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou, et al. (1 additional author not shown)

    Abstract: The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publicatio…

    Submitted 14 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Technical Report for AceMap (https://www.acemap.info)

  36. arXiv:2403.02084  [pdf, other]

    cs.CV

    ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

    Authors: Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

    Abstract: Recent advancement in text-to-image models (e.g., Stable Diffusion) and corresponding personalized technologies (e.g., DreamBooth and LoRA) enables individuals to generate high-quality and imaginative images. However, they often suffer from limitations when generating images with resolutions outside of their trained domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 21 pages, 16 figures

  37. arXiv:2402.17360  [pdf, other]

    cs.CV cs.AI cs.RO

    CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer

    Authors: Lian Fu, Ryoichi Ishikawa, Yoshihiro Sato, Takeshi Oishi

    Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using Transformer. CAPT uses an end-to-end transformer-based architecture for joint parameter and state estimation of articulated objects from a single point cloud. The proposed CAPT methods accurat…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA 2024

  38. arXiv:2402.14245  [pdf, other]

    cs.RO cs.AI cs.LG

    Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models

    Authors: Jinyi Liu, Yifu Yuan, Jianye Hao, Fei Ni, Lingzhi Fu, Yibin Chen, Yan Zheng

    Abstract: Recently, there has been considerable attention towards leveraging large language models (LLMs) to enhance decision-making processes. However, aligning the natural language text instructions generated by LLMs with the vectorized operations required for execution presents a significant challenge, often necessitating task-specific details. To circumvent the need for such task-specific granularity, i…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Presented at AAAI 2024 RL+LLMs Workshop

  39. arXiv:2402.13232  [pdf, other]

    cs.CV cs.RO

    A Touch, Vision, and Language Dataset for Multimodal Alignment

    Authors: Letian Fu, Gaurav Datta, Huang Huang, William Chung-Ho Panitch, Jaimyn Drake, Joseph Ortiz, Mustafa Mukadam, Mike Lambeta, Roberto Calandra, Ken Goldberg

    Abstract: Touch is an important sensing modality for humans, but it has not yet been incorporated into a multimodal generative language model. This is partially due to the difficulty of obtaining natural language labels for tactile data and the complexity of aligning tactile readings with both visual observations and language descriptions. As a step towards bridging that gap, this work introduces a new data…

    Submitted 20 February, 2024; originally announced February 2024.

  40. arXiv:2401.14391  [pdf, other]

    cs.CV

    Rethinking Patch Dependence for Masked Autoencoders

    Authors: Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg

    Abstract: In this work, we re-examine inter-patch dependencies in the decoding mechanism of masked autoencoders (MAE). We decompose this decoding mechanism for masked patch reconstruction in MAE into self-attention and cross-attention. Our investigations suggest that self-attention between mask patches is not essential for learning good representations. To this end, we propose a novel pretraining framework:…

    Submitted 25 January, 2024; originally announced January 2024.

  41. arXiv:2401.13998  [pdf, other]

    eess.IV cs.CV

    WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

    Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

    Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this appro…

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  42. arXiv:2401.08664  [pdf, other]

    cs.AI cs.CL

    Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges

    Authors: Qingyao Li, Lingyue Fu, Weiming Zhang, Xianyu Chen, Jingwei Yu, Wei Xia, Weinan Zhang, Ruiming Tang, Yong Yu

    Abstract: Online education platforms, leveraging the internet to distribute education resources, seek to provide convenient education but often fall short in real-time communication with students. They often struggle to address the diverse obstacles students encounter throughout their learning journey. Solving the problems encountered by students poses a significant challenge for traditional deep learning m…

    Submitted 26 April, 2024; v1 submitted 27 December, 2023; originally announced January 2024.

    Comments: 31 pages, 5 figures, 1 table

  43. arXiv:2401.00434  [pdf, other]

    cs.CL

    GeoGalactica: A Scientific Large Language Model in Geoscience

    Authors: Zhouhan Lin, Cheng Deng, Le Zhou, Tianhang Zhang, Yi Xu, Yutong Xu, Zhongmou He, Yuanyuan Shi, Beiya Dai, Yunchong Song, Boyi Zeng, Qiyuan Chen, Yuxun Miao, Bo Xue, Shu Wang, Luoyi Fu, Weinan Zhang, Junxian He, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou

    Abstract: Large language models (LLMs) have achieved huge success for their general knowledge and ability to solve a wide spectrum of tasks in natural language processing (NLP). Due to their impressive abilities, LLMs have shed light on potential inter-discipline applications to foster scientific discoveries of a specific domain by using artificial intelligence (AI for science, AI4S). In the meantime, utili…

    Submitted 13 April, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    ACM Class: I.2.7; F.4.1

  44. arXiv:2312.10490  [pdf, other]

    cs.IT cs.LG

    Spatial Deep Learning for Site-Specific Movement Optimization of Aerial Base Stations

    Authors: Jiangbin Lyu, Xu Chen, Jiefeng Zhang, Liqun Fu

    Abstract: Unmanned aerial vehicles (UAVs) can be utilized as aerial base stations (ABSs) to provide wireless connectivity for ground users (GUs) in various emergency scenarios. However, maximizing the coverage rate of $M$ GUs by jointly placing $N$ ABSs with limited coverage range is an NP-hard problem with exponential complexity in $M$ and $N$. The problem is further complicated when the cover…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Manuscript submitted to IEEE Trans. Wireless Communications on 15 Jan. 2023; revised 11 Sep. 2023; accepted 5 Dec. 2023

  45. arXiv:2312.10475  [pdf, ps, other]

    cs.IT eess.SP

    IRS-Aided Sectorized Base Station Design and 3D Coverage Performance Analysis

    Authors: Xintong Chen, Jiangbin Lyu, Liqun Fu

    Abstract: Intelligent reflecting surface (IRS) is regarded as a revolutionary paradigm that can reconfigure the wireless propagation environment for enhancing the desired signal and/or weakening the interference, and thus improving the quality of service (QoS) for communication systems. In this paper, we propose an IRS-aided sectorized BS design where the IRS is mounted in front of a transmitter (TX) and re…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Manuscript submitted to IEEE IWQoS 2023 on 12 Feb. 2023; accepted 13 April 2023; published 27 July 2023. An associated Chinese patent was applied for on 9 Aug. 2022 and granted on 1 Sep. 2023, under No. ZL202210948626.X

  46. arXiv:2312.06187  [pdf, other]

    eess.IV cs.CV

    SP-DiffDose: A Conditional Diffusion Model for Radiation Dose Prediction Based on Multi-Scale Fusion of Anatomical Structures, Guided by SwinTransformer and Projector

    Authors: Linjie Fu, Xia Li, Xiuding Cai, Yingkai Wang, Xueyao Wang, Yu Yao, Yali Shen

    Abstract: Radiation therapy serves as an effective and standard method for cancer treatment. Excellent radiation therapy plans always rely on high-quality dose distribution maps obtained through repeated trial and error by experienced experts. However, due to individual differences and complex clinical situations, even seasoned expert teams may need help to achieve the best treatment plan every time quickly…

    Submitted 11 December, 2023; originally announced December 2023.

  47. arXiv:2311.17624  [pdf, other]

    eess.SP cs.NI

    Combating Multi-path Interference to Improve Chirp-based Underwater Acoustic Communication

    Authors: Wenjun Xie, Enqi Zhang, Lizhao You, Deqing Wang, Zhaorui Wang, Liqun Fu

    Abstract: Linear chirp-based underwater acoustic communication has been widely used due to its reliability and long-range transmission capability. However, unlike the counterpart chirp technology in wireless -- LoRa, its throughput is severely limited by the number of modulated chirps in a symbol. The fundamental challenge lies in the underwater multi-path channel, where the delayed copies of one symbol may…

    Submitted 29 November, 2023; originally announced November 2023.

  48. arXiv:2311.16555  [pdf, other]

    cs.CV

    Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

    Authors: Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

    Abstract: Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and time-consuming. As a solution, researchers have widely adopted synthetic text images as a complementary resource to real text images during pre-training. Yet there…

    Submitted 28 November, 2023; originally announced November 2023.

  49. arXiv:2311.16278  [pdf, other]

    cs.CV

    VehicleGAN: Pair-flexible Pose Guided Image Synthesis for Vehicle Re-identification

    Authors: Baolu Li, Ping Liu, Lan Fu, Jinlong Li, Jianwu Fang, Zhigang Xu, Hongkai Yu

    Abstract: Vehicle Re-identification (Re-ID) has been broadly studied in the last decade; however, differing camera view angles, which lead to confused discrimination in the feature subspace for vehicles of various poses, remain challenging for Vehicle Re-ID models in the real world. To promote the Vehicle Re-ID models, this paper proposes to synthesize a large number of vehicle images in the target…

    Submitted 17 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  50. arXiv:2311.13230  [pdf, other]

    cs.CL cs.AI

    Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus

    Authors: Tianhang Zhang, Lin Qiu, Qipeng Guo, Cheng Deng, Yue Zhang, Zheng Zhang, Chenghu Zhou, Xinbing Wang, Luoyi Fu

    Abstract: Large Language Models (LLMs) have gained significant popularity for their impressive performance across diverse fields. However, LLMs are prone to hallucinate untruthful or nonsensical outputs that fail to meet user expectations in many real-world applications. Existing works for detecting hallucinations in LLMs either rely on external knowledge for reference retrieval or require sampling multiple…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2023 (main conference)