
Showing 1–50 of 428 results for author: Song, H

Searching in archive cs.
  1. arXiv:2407.18483  [pdf]

    cs.CL cs.AI

    A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation

    Authors: Laiyi Fu, Binbin Fan, Hongkai Du, Yanxiang Feng, Chunhua Li, Huping Song

    Abstract: Ophthalmology consultations are crucial for diagnosing, treating, and preventing eye diseases. However, the growing demand for consultations exceeds the availability of ophthalmologists. By leveraging large pre-trained language models, we can design effective dialogues for specific scenarios, aiding in consultations. Traditional fine-tuning strategies for question-answering tasks are impractical d…

    Submitted 30 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.14198  [pdf]

    cs.CV eess.IV

    Double-Shot 3D Shape Measurement with a Dual-Branch Network

    Authors: Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

    Abstract: The structured light (SL)-based 3D measurement techniques with deep learning have been widely studied, among which speckle projection profilometry (SPP) and fringe projection profilometry (FPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we prop…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2407.11793  [pdf, other]

    cs.CV cs.AI cs.GR

    Click-Gaussian: Interactive Segmentation to Any 3D Gaussians

    Authors: Seokhun Choi, Hyeonseop Song, Jaechul Kim, Taehyeong Kim, Hoseok Do

    Abstract: Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, the current methods suffer from time-consuming post-processing to deal with noisy segmentation output. Also, they struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D s…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024. The first two authors contributed equally to this work

  4. arXiv:2407.11449  [pdf, other]

    cs.CV cs.AI

    Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

    Authors: Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai

    Abstract: Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentu…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  5. arXiv:2407.10990  [pdf]

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med…

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages, 4 figures

  6. arXiv:2407.10186  [pdf, other]

    cs.NI

    Toward Explainable Reasoning in 6G: A Proof of Concept Study on Radio Resource Allocation

    Authors: Farhad Rezazadeh, Sergio Barrachina-Muñoz, Hatim Chergui, Josep Mangues, Mehdi Bennis, Dusit Niyato, Houbing Song, Lingjia Liu

    Abstract: The move toward artificial intelligence (AI)-native sixth-generation (6G) networks has put more emphasis on the importance of explainability and trustworthiness in network management operations, especially for mission-critical use-cases. Such desired trust transcends traditional post-hoc explainable AI (XAI) methods to using contextual explanations for guiding the learning process in an in-hoc way…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 11 Figures, 5 Tables

  7. arXiv:2407.08252  [pdf, other]

    eess.IV cs.CV

    Spatially-Variant Degradation Model for Dataset-free Super-resolution

    Authors: Shaojie Guo, Haofei Song, Qingli Li, Yan Wang

    Abstract: This paper focuses on the dataset-free Blind Image Super-Resolution (BISR). Unlike existing dataset-free BISR methods that focus on obtaining a degradation kernel for the entire image, we are the first to explicitly design a spatially-variant degradation model for each pixel. Our method also benefits from having a significantly smaller number of learnable parameters compared to data-driven spatial…

    Submitted 11 July, 2024; originally announced July 2024.

  8. arXiv:2407.04218  [pdf, other]

    cs.CV cs.AI

    Batch Transformer: Look for Attention in Batch

    Authors: Myung Beom Her, Jisu Jeong, Hojoon Song, Ji-Hyeong Han

    Abstract: Facial expression recognition (FER) has received considerable attention in computer vision, with "in-the-wild" environments such as human-computer interaction. However, FER images contain uncertainties such as occlusion, low resolution, pose variation, illumination variation, and subjectivity, which includes some expressions that do not match the target label. Consequently, little information is o…

    Submitted 4 July, 2024; originally announced July 2024.

  9. arXiv:2407.01320  [pdf, other]

    cs.LG cs.AI cs.CL

    Increasing Model Capacity for Free: A Simple Strategy for Parameter Efficient Fine-tuning

    Authors: Haobo Song, Hao Zhao, Soumajit Majumder, Tao Lin

    Abstract: Fine-tuning large pre-trained foundation models, such as the 175B GPT-3, has attracted more attention for downstream tasks recently. While parameter-efficient fine-tuning methods have been proposed and proven effective without retraining all model parameters, their performance is limited by the capacity of incremental modules, especially under constrained parameter budgets. To overcome this cha…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICLR 2024. Code at https://github.com/LINs-lab/CapaBoost

  10. arXiv:2407.00908  [pdf, other]

    cs.CL cs.AI

    FineSurE: Fine-grained Summarization Evaluation using LLMs

    Authors: Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour

    Abstract: Automated evaluation is crucial for streamlining text summarization benchmarking and model development, given the costly and time-consuming nature of human evaluation. Traditional methods like ROUGE do not correlate well with human judgment, while recently proposed LLM-based metrics provide only summary-level assessment using Likert-scale scores. This limits deeper model analysis, e.g., we can onl…

    Submitted 22 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 (main, long)

  11. arXiv:2407.00224  [pdf, other]

    cs.CV stat.AP

    Multimodal Prototyping for cancer survival prediction

    Authors: Andrew H. Song, Richard J. Chen, Guillaume Jaume, Anurag J. Vaidya, Alexander S. Baras, Faisal Mahmood

    Abstract: Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and transcriptomic profiles are particularly promising for patient prognostication and stratification. Current approaches involve tokenizing the WSIs into smaller patches (>10,000 patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes. However, this proc…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: ICML 2024

  12. arXiv:2406.19812  [pdf, other]

    cs.SE cs.AI

    Fuzzy Logic Guided Reward Function Variation: An Oracle for Testing Reinforcement Learning Programs

    Authors: Shiyu Zhang, Haoyang Song, Qixin Wang, Yu Pei

    Abstract: Reinforcement Learning (RL) has gained significant attention across various domains. However, the increasing complexity of RL programs presents testing challenges, particularly the oracle problem: defining the correctness of the RL program. Conventional human oracles struggle to cope with the complexity, leading to inefficiencies and potential unreliability in RL testing. To alleviate this problem…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

    MSC Class: 68T05; 68T27; 93C42 ACM Class: D.2.5; I.2.3

  13. arXiv:2406.16192  [pdf, other]

    cs.CV

    HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis

    Authors: Guillaume Jaume, Paul Doucet, Andrew H. Song, Ming Y. Lu, Cristina Almagro-Pérez, Sophia J. Wagner, Anurag J. Vaidya, Richard J. Chen, Drew F. K. Williamson, Ahrong Kim, Faisal Mahmood

    Abstract: Spatial transcriptomics (ST) enables interrogating the molecular composition of tissue with ever-increasing resolution, depth, and sensitivity. However, costs, rapidly evolving technology, and lack of standards have constrained computational methods in ST to narrow tasks and small cohorts. In addition, the underlying tissue morphology as reflected by H&E-stained whole slide images (WSIs) encodes r…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  14. arXiv:2406.15333  [pdf, other]

    cs.CV

    GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation

    Authors: Chubin Zhang, Hongliang Song, Yi Wei, Yu Chen, Jiwen Lu, Yansong Tang

    Abstract: In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation a…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/alibaba-yuanjing-aigclab/GeoLRM

  15. arXiv:2406.14876  [pdf, other]

    cs.LG cs.AI

    Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization

    Authors: Deokjae Lee, Hyun Oh Song, Kyunghyun Cho

    Abstract: Active learning is increasingly adopted for expensive multi-objective combinatorial optimization problems, but it involves a challenging subset selection problem, optimizing the batch acquisition score that quantifies the goodness of a batch for evaluation. Due to the excessively large search space of the subset selection problem, prior methods optimize the batch acquisition on the latent space, w…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: ICML 2024; Codes at https://github.com/snu-mllab/GreedyPolicyForMOCO

  16. arXiv:2406.13855  [pdf, other]

    cs.CR

    Advancing Blockchain Scalability: An Introduction to Layer 1 and Layer 2 Solutions

    Authors: Han Song, Zhongche Qu, Yihao Wei

    Abstract: Bitcoin's rise has put blockchain technology into the mainstream, amplifying its potential and broad utility. While Bitcoin has become incredibly famous, its transaction rate has not seen a corresponding increase. It still takes approximately 10 minutes to mine a block and add it to the chain. This limitation highlights the importance of seeking scale-up solutions that solve the low throughput…

    Submitted 19 June, 2024; originally announced June 2024.

  17. arXiv:2406.13133  [pdf, other]

    cs.CL cs.LG q-bio.GN

    PathoLM: Identifying pathogenicity from the DNA sequence through the Genome Foundation Model

    Authors: Sajib Acharjee Dip, Uddip Acharjee Shuvo, Tran Chau, Haoqiu Song, Petra Choi, Xuan Wang, Liqing Zhang

    Abstract: Pathogen identification is pivotal in diagnosing, treating, and preventing diseases, crucial for controlling infections and safeguarding public health. Traditional alignment-based methods, though widely used, are computationally intense and reliant on extensive reference databases, often failing to detect novel pathogens due to their low sensitivity and specificity. Similarly, conventional machine…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 9 pages, 3 figures

  18. arXiv:2406.12837  [pdf, other]

    cs.LG cs.CV

    LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

    Authors: Jinuk Kim, Marwa El Halabi, Mingi Ji, Hyun Oh Song

    Abstract: Recent works show that reducing the number of layers in a convolutional neural network can enhance efficiency while maintaining the performance of the network. Existing depth compression methods remove redundant non-linear activation functions and merge the consecutive convolution layers into a single layer. However, these methods suffer from a critical drawback; the kernel size of the merged laye…

    Submitted 8 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  19. arXiv:2406.12738  [pdf, other]

    cs.CL cs.AI

    Large Language Model as a Universal Clinical Multi-task Decoder

    Authors: Yujiang Wu, Hongjian Song, Jiawen Zhang, Xumeng Wen, Shun Zheng, Jiang Bian

    Abstract: The development of effective machine learning methodologies for enhancing the efficiency and accuracy of clinical systems is crucial. Despite significant research efforts, managing a plethora of diversified clinical tasks and adapting to emerging new tasks remain significant challenges. This paper presents a novel paradigm that employs a pre-trained large language model as a universal clinical mul…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Work in progress

  20. arXiv:2406.11389  [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  21. arXiv:2406.09684  [pdf, other]

    cs.LG cs.AI cs.CR

    Explainable AI for Comparative Analysis of Intrusion Detection Models

    Authors: Pap M. Corea, Yongxin Liu, Jian Wang, Shuteng Niu, Houbing Song

    Abstract: Explainable Artificial Intelligence (XAI) has become a widely discussed topic; the related technologies facilitate better understanding of conventional black-box models such as Random Forest and Neural Networks. However, domain-specific applications of XAI are still insufficient. To fill this gap, this research analyzes various machine learning models on the tasks of binary and multi-class class…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE MeditCom 2024 - WS-05

  22. arXiv:2406.07061  [pdf, other]

    eess.IV cs.CV

    Triage of 3D pathology data via 2.5D multiple-instance learning to guide pathologist assessments

    Authors: Gan Gao, Andrew H. Song, Fiona Wang, David Brenes, Rui Wang, Sarah S. L. Chow, Kevin W. Bishop, Lawrence D. True, Faisal Mahmood, Jonathan T. C. Liu

    Abstract: Accurate patient diagnoses based on human tissue biopsies are hindered by current clinical practice, where pathologists assess only a limited number of thin 2D tissue slices sectioned from 3D volumetric tissue. Recent advances in non-destructive 3D pathology, such as open-top light-sheet microscopy, enable comprehensive imaging of spatially heterogeneous tissue morphologies, offering the feasibili…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR CVMI 2024

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6955-6965

  23. arXiv:2406.05967  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 9 June, 2024; originally announced June 2024.

  24. arXiv:2406.05431  [pdf]

    cs.CL

    MaTableGPT: GPT-based Table Data Extractor from Materials Science Literature

    Authors: Gyeong Hoon Yi, Jiwoo Choi, Hyeongyun Song, Olivia Miano, Jaewoong Choi, Kihoon Bang, Byungju Lee, Seok Su Sohn, David Buttler, Anna Hiszpanski, Sang Soo Han, Donghun Kim

    Abstract: Efficiently extracting data from tables in the scientific literature is pivotal for building large-scale databases. However, the tables reported in materials science papers exist in highly diverse forms; thus, rule-based extractions are an ineffective approach. To overcome this challenge, we present MaTableGPT, which is a GPT-based table data extractor from the materials science literature. MaTabl…

    Submitted 8 June, 2024; originally announced June 2024.

  25. arXiv:2406.04064  [pdf, other]

    cs.CL cs.AI cs.CY

    Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models

    Authors: Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, Jong C. Park

    Abstract: Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities. To fully understand such social bias in large language models (LLMs), it is essential to consider the composite of social perceptions from diverse perspectives among identities. Previous studies have either evaluated biases in LLMs by indirectly assessing the presence of sentiment…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Findings of ACL 2024

  26. arXiv:2406.00439  [pdf, other]

    cs.RO cs.CV

    Learning Manipulation by Predicting Interaction

    Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

    Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

  27. arXiv:2406.00030  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Model Pruning

    Authors: Hanjuan Huang, Hao-Jia Song, Hsing-Kuo Pao

    Abstract: We surely enjoy the larger the better models for their superior performance in the last couple of years when both the hardware and software support the birth of such extremely huge models. The applied fields include text mining and others. In particular, the success of LLMs on text understanding and text generation draws attention from researchers who have worked on NLP and related areas for years…

    Submitted 24 May, 2024; originally announced June 2024.

    Comments: 17 pages, 7 figures, 2 tables

  28. arXiv:2405.19257  [pdf, other]

    cs.RO cs.DC

    Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots

    Authors: Zekai Sun, Xiuxian Guan, Junming Wang, Haoze Song, Yuhao Qing, Tianxiang Shen, Dong Huang, Fangming Liu, Heming Cui

    Abstract: The rapid advancements in machine learning techniques have led to significant achievements in various real-world robotic tasks. These tasks heavily rely on fast and energy-efficient inference of deep neural network (DNN) models when deployed on robots. To enhance inference performance, distributed inference has emerged as a promising approach, parallelizing inference across multiple powerful GPU d…

    Submitted 29 May, 2024; originally announced May 2024.

  29. arXiv:2405.16152  [pdf, other]

    cs.CV cs.HC

    SuDA: Support-based Domain Adaptation for Sim2Real Motion Capture with Flexible Sensors

    Authors: Jiawei Fang, Haishan Song, Chengxu Zuo, Xiaoxia Gao, Xiaowei Chen, Shihui Guo, Yipeng Qin

    Abstract: Flexible sensors hold promise for human motion capture (MoCap), offering advantages such as wearability, privacy preservation, and minimal constraints on natural movement. However, existing flexible sensor-based MoCap methods rely on deep learning and necessitate large and diverse labeled datasets for training. These data typically need to be collected in MoCap studios with specialized equipment a…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 20 pages conference, accepted ICML paper

  30. arXiv:2405.15082  [pdf, other]

    cs.RO

    Advancements in Translation Accuracy for Stereo Visual-Inertial Initialization

    Authors: Han Song, Zhongche Qu, Zhi Zhang, Zihan Ye, Cong Liu

    Abstract: As the current initialization method in the state-of-the-art Stereo Visual-Inertial SLAM framework, ORB-SLAM3 has limitations. Its success depends on the performance of the pure stereo SLAM system and is based on the underlying assumption that pure visual SLAM can accurately estimate the camera trajectory, which is essential for inertial parameter estimation. Meanwhile, the further improved initia…

    Submitted 20 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  31. arXiv:2405.13993  [pdf, other]

    cs.CV cs.CE cs.IR cs.LG

    AutoLCZ: Towards Automatized Local Climate Zone Mapping from Rule-Based Remote Sensing

    Authors: Chenying Liu, Hunsoo Song, Anamika Shreevastava, Conrad M Albrecht

    Abstract: Local climate zones (LCZs) established a standard classification system to categorize the landscape universe for improved urban climate studies. Existing LCZ mapping is guided by human interaction with geographic information systems (GIS) or modelled from remote sensing (RS) data. GIS-based methods do not scale to large areas. However, RS-based methods leverage machine learning techniques to autom…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: accepted at 2024 IGARSS

  32. arXiv:2405.13014  [pdf, other]

    cs.CL cs.AI

    QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models

    Authors: Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao

    Abstract: Deploying large language models (LLMs) poses challenges in terms of resource limitations and inference efficiency. To address these challenges, recent research has focused on using smaller task-specific language models, which are enhanced by distilling the knowledge rationales generated by LLMs. However, previous works mostly emphasize the effectiveness of positive knowledge, while overlooking the…

    Submitted 14 May, 2024; originally announced May 2024.

  33. arXiv:2405.11794  [pdf, other]

    cs.CV

    ViViD: Video Virtual Try-on using Diffusion Models

    Authors: Zixun Fang, Wei Zhai, Aimin Su, Hongliang Song, Kai Zhu, Mao Wang, Yu Chen, Zhiheng Liu, Yang Cao, Zheng-Jun Zha

    Abstract: Video virtual try-on aims to transfer a clothing item onto the video of a target person. Directly applying the technique of image-based try-on to the video domain in a frame-wise manner will cause temporal-inconsistent outcomes while previous video-based try-on solutions can only generate low visual quality and blurring results. In this work, we present ViViD, a novel framework employing powerful…

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  34. arXiv:2405.11643  [pdf, other]

    cs.CV cs.LG stat.AP

    Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology

    Authors: Andrew H. Song, Richard J. Chen, Tong Ding, Drew F. K. Williamson, Guillaume Jaume, Faisal Mahmood

    Abstract: Representation learning of pathology whole-slide images (WSIs) has primarily relied on weak supervision with Multiple Instance Learning (MIL). However, the slide representations resulting from this approach are highly tailored to specific clinical tasks, which limits their expressivity and generalization, particularly in scenarios with limited data. Instead, we hypothesize that morphologi…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  35. arXiv:2405.11618  [pdf, other]

    cs.CV cs.AI

    Transcriptomics-guided Slide Representation Learning in Computational Pathology

    Authors: Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya, Richard J. Chen, Drew F. K. Williamson, Thomas Peeters, Andrew H. Song, Faisal Mahmood

    Abstract: Self-supervised learning (SSL) has been successful in building patch embeddings of small histology images (e.g., 224x224 pixels), but scaling these models to learn slide embeddings from the entirety of giga-pixel whole-slide images (WSIs) remains challenging. Here, we leverage complementary information from gene expression profiles to guide slide representation learning using multimodal pre-traini…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: CVPR'24, Oral

  36. arXiv:2405.10647  [pdf, other]

    cs.LG cs.AI cs.DC

    Cyclical Weight Consolidation: Towards Solving Catastrophic Forgetting in Serial Federated Learning

    Authors: Haoyue Song, Jiacheng Wang, Liansheng Wang

    Abstract: Federated Learning (FL) has gained attention for addressing data scarcity and privacy concerns. While parallel FL algorithms like FedAvg exhibit remarkable performance, they face challenges in scenarios with diverse network speeds and concerns about centralized control, especially in multi-institutional collaborations like the medical domain. Serial FL presents an alternative solution, circumventi…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 8 figures

  37. arXiv:2405.10096  [pdf, other]

    cs.LG cs.CR cs.DC

    The Effect of Quantization in Federated Learning: A Rényi Differential Privacy Perspective

    Authors: Tianqu Kang, Lumin Liu, Hengtao He, Jun Zhang, S. H. Song, Khaled B. Letaief

    Abstract: Federated Learning (FL) is an emerging paradigm that holds great promise for privacy-preserving machine learning using distributed data. To enhance privacy, FL can be combined with Differential Privacy (DP), which involves adding Gaussian noise to the model weights. However, FL faces a significant challenge in terms of large communication overhead when transmitting these model weights. To address…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures, submitted to 2024 IEEE MeditCom

  38. arXiv:2405.05989  [pdf, other]

    cs.LG cs.AI

    Clustering-based Multitasking Deep Neural Network for Solar Photovoltaics Power Generation Prediction

    Authors: Hui Song, Zheng Miao, Ali Babalhavaeji, Saman Mehrnia, Mahdi Jalili, Xinghuo Yu

    Abstract: The increasing installation of Photovoltaics (PV) cells leads to more generation of renewable energy sources (RES), but results in increased uncertainties of energy scheduling. Predicting PV power generation is important for energy management and dispatch optimization in smart grid. However, the PV power generation data is often collected across different types of customers (e.g., residential, agr…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  39. I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

    Authors: Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

    Abstract: Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution fr…

    Submitted 5 May, 2024; originally announced May 2024.

  40. arXiv:2405.01920  [pdf]

    cs.CV

    Lightweight Change Detection in Heterogeneous Remote Sensing Images with Online All-Integer Pruning Training

    Authors: Chengyang Zhang, Weiming Li, Gang Li, Huina Song, Zhaohui Song, Xueqian Wang, Antonio Plaza

    Abstract: Detection of changes in heterogeneous remote sensing images is vital, especially in response to emergencies like earthquakes and floods. Current homogenous transformation-based change detection (CD) methods often suffer from high computation and memory costs, which are not friendly to edge-computation devices like onboard CD devices at satellites. To address this issue, this paper proposes a new l…

    Submitted 3 May, 2024; originally announced May 2024.

  41. arXiv:2404.11996  [pdf, other]

    cs.AI

    DST-GTN: Dynamic Spatio-Temporal Graph Transformer Network for Traffic Forecasting

    Authors: Songtao Huang, Hongjin Song, Tianqi Jiang, Akbar Telikani, Jun Shen, Qingguo Zhou, Binbin Yong, Qiang Wu

    Abstract: Accurate traffic forecasting is essential for effective urban planning and congestion management. Deep learning (DL) approaches have gained colossal success in traffic forecasting but still face challenges in capturing the intricacies of traffic dynamics. In this paper, we identify and address these challenges by emphasizing that spatial features are inherently dynamic and change over time. A novel…

    Submitted 18 April, 2024; originally announced April 2024.

  42. arXiv:2404.11936  [pdf, other]

    cs.LG cs.AI cs.CV

    LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

    Authors: Thibault Castells, Hyoung-Kyu Song, Bo-Kyeong Kim, Shinkook Choi

    Abstract: Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains a complex issue, presenting challenges such as memory consumption and inference speed. To address this issue, we introduce LD-Pruner, a novel performance-preserving structured prunin…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 8 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  43. arXiv:2404.11925  [pdf, other]

    cs.LG cs.AI cs.CV

    EdgeFusion: On-Device Text-to-Image Generation

    Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim

    Abstract: The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 4 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  44. arXiv:2404.11905  [pdf, other]

    cs.LG cs.CR

    FedMID: A Data-Free Method for Using Intermediate Outputs as a Defense Mechanism Against Poisoning Attacks in Federated Learning

    Authors: Sungwon Han, Hyeonho Song, Sungwon Park, Meeyoung Cha

    Abstract: Federated learning combines local updates from clients to produce a global model, which is susceptible to poisoning attacks. Most previous defense strategies relied on vectors derived from projections of local updates on a Euclidean space; however, these methods fail to accurately represent the functionality and structure of local models, resulting in inconsistent performance. Here, we present a n…

    Submitted 18 April, 2024; originally announced April 2024.

  45. arXiv:2404.04841  [pdf, other]

    cs.CR

    Unveiling Decentralization: A Comprehensive Review of Technologies, Comparison, Challenges in Bitcoin, Ethereum, and Solana Blockchain

    Authors: Han Song, Yihao Wei, Zhongche Qu, Weihan Wang

    Abstract: Bitcoin stands as a groundbreaking development in decentralized exchange throughout human history, enabling transactions without the need for intermediaries. By leveraging cryptographic proof mechanisms, Bitcoin eliminates the reliance on third-party financial institutions. Ethereum, ranking as the second-largest cryptocurrency by market capitalization, builds upon Bitcoin's groundwork by introduc…

    Submitted 7 April, 2024; originally announced April 2024.

  46. arXiv:2404.01524  [pdf, other]

    cs.CV cs.AI

    On Train-Test Class Overlap and Detection for Image Retrieval

    Authors: Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang, Shunghyun Choi, Yeong Hyeon Gu, Yannis Avrithis

    Abstract: How important is it for training and evaluation sets to not have class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. By comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods, our findings…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  47. arXiv:2404.01156  [pdf, other]

    cs.CV cs.AI

    SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

    Authors: Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, Shunghyun Choi, Yeong Hyeon Gu

    Abstract: Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are no…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Accepted

  48. arXiv:2403.19886  [pdf, other]

    cs.RO

    BundledSLAM: An Accurate Visual SLAM System Using Multiple Cameras

    Authors: Han Song, Cong Liu, Huafeng Dai

    Abstract: Multi-camera SLAM systems offer a plethora of advantages, primarily stemming from their capacity to amalgamate information from a broader field of view, thereby resulting in heightened robustness and improved localization accuracy. In this research, we present a significant extension and refinement of the state-of-the-art stereo SLAM system, known as ORB-SLAM2, with the objective of attaining even…

    Submitted 1 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  49. arXiv:2403.18193  [pdf, other]

    cs.CV

    Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking

    Authors: Qiming Wang, Yongqiang Bai, Hongxing Song

    Abstract: RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet, it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a paramet…

    Submitted 9 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  50. arXiv:2403.17980  [pdf, other]

    cs.CR cs.LG

    EG-ConMix: An Intrusion Detection Method based on Graph Contrastive Learning

    Authors: Lijin Wu, Shanshan Lei, Feilong Liao, Yuanjun Zheng, Yuxin Liu, Wentao Fu, Hao Song, Jiajun Zhou

    Abstract: As the number of IoT devices increases, security concerns become more prominent. The impact of threats can be minimized by deploying a Network Intrusion Detection System (NIDS) that monitors network traffic, detects and discovers intrusions, and issues security alerts promptly. Most intrusion detection research in recent years has been directed towards the pair of traffic itself without conside…

    Submitted 24 March, 2024; originally announced March 2024.