Showing 1–50 of 165 results for author: Han, M

Searching in archive cs.
  1. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chen Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li , et al. (30 additional authors not shown)

    Abstract: A modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2406.10655  [pdf, ps, other]

    cs.CR

    E-SAGE: Explainability-based Defense Against Backdoor Attacks on Graph Neural Networks

    Authors: Dingqiang Yuan, Xiaohua Xu, Lei Yu, Tongchang Han, Rongchang Li, Meng Han

    Abstract: Graph Neural Networks (GNNs) have recently been widely adopted in multiple domains. Yet, they are notably vulnerable to adversarial and backdoor attacks. In particular, backdoor attacks based on subgraph insertion have been shown to be effective in graph classification tasks while being stealthy, successfully circumventing various existing defense methods. In this paper, we propose E-SAGE, a novel…

    Submitted 15 June, 2024; originally announced June 2024.

  3. arXiv:2405.19758  [pdf, other]

    cs.RO

    InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning

    Authors: Muzhi Han, Yifeng Zhu, Song-Chun Zhu, Ying Nian Wu, Yuke Zhu

    Abstract: Learning abstract state representations and knowledge is crucial for long-horizon robot planning. We present InterPreT, an LLM-powered framework for robots to learn symbolic predicates from language feedback of human non-experts during embodied interaction. The learned predicates provide relational abstractions of the environment state, facilitating the learning of symbolic operators that capture…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: RSS 2024; https://interpret-robot.github.io

  4. arXiv:2405.15322  [pdf, other]

    cs.CR cs.AR

    Dishonest Approximate Computing: A Coming Crisis for Cloud Clients

    Authors: Ye Wang, Jian Dong, Ming Han, Jin Wu, Gang Qu

    Abstract: Approximate Computing (AC) has emerged as a promising technique for achieving energy-efficient architectures and is expected to become an effective technique for reducing the electricity cost for cloud service providers (CSP). However, the potential misuse of AC has not received adequate attention, which is a coming crisis behind the blueprint of AC. Driven by the pursuit of illegal financial prof…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures

  5. arXiv:2405.12079  [pdf, other]

    cs.DC cs.OS

    PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation

    Authors: Zhuobin Huang, Xingda Wei, Yingyi Hao, Rong Chen, Mingcong Han, Jinyu Gu, Haibo Chen

    Abstract: Checkpointing (C) and restoring (R) are key components for GPU tasks. POS is an OS-level GPU C/R system: It can transparently checkpoint or restore processes that use the GPU, without requiring any cooperation from the application, a key feature required by modern systems like the cloud. Moreover, POS is the first OS-level C/R system that can concurrently execute C/R with the application execution…

    Submitted 20 May, 2024; originally announced May 2024.

  6. arXiv:2405.08318  [pdf, other]

    cs.LG

    No-Regret Learning of Nash Equilibrium for Black-Box Games via Gaussian Processes

    Authors: Minbiao Han, Fengxue Zhang, Yuxin Chen

    Abstract: This paper investigates the challenge of learning in black-box games, where the underlying utility function is unknown to any of the agents. While there is an extensive body of literature on the theoretical analysis of algorithms for computing the Nash equilibrium with complete information about the game, studies on Nash equilibrium in black-box games are less common. In this paper, we focus on le…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 40th Conference on Uncertainty in Artificial Intelligence (UAI 2024)

  7. arXiv:2405.04752  [pdf, other]

    eess.AS cs.SD

    HILCodec: High Fidelity and Lightweight Neural Audio Codec

    Authors: Sunghwan Ahn, Beom Jun Woo, Min Hyun Han, Chanyeong Moon, Nam Soo Kim

    Abstract: The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consist…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2404.19334  [pdf, other]

    cs.CV

    Multi-Scale Heterogeneity-Aware Hypergraph Representation for Histopathology Whole Slide Images

    Authors: Minghao Han, Xukun Zhang, Dingkang Yang, Tao Liu, Haopeng Kuang, Jinghui Feng, Lihua Zhang

    Abstract: Survival prediction is a complex ordinal regression task that aims to predict the survival coefficient ranking among a cohort of patients, typically achieved by analyzing patients' whole slide images. Existing deep learning approaches mainly adopt multiple instance learning or graph neural networks under weak supervision. Most of them are unable to uncover the diverse interactions between differen…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 9 pages, 6 figures, accepted by ICME2024

  9. arXiv:2404.17521  [pdf, other]

    cs.RO cs.CV

    Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

    Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task repre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project website and open-source code: https://xiaoyao-li.github.io/research/ag2manip

  10. arXiv:2404.10358  [pdf, other]

    cs.CV

    Improving Bracket Image Restoration and Enhancement with Flow-guided Alignment and Enhanced Feature Aggregation

    Authors: Wenjie Lin, Zhen Liu, Chengzhi Jiang, Mingyan Han, Ting Jiang, Shuaicheng Liu

    Abstract: In this paper, we address the Bracket Image Restoration and Enhancement (BracketIRE) task using a novel framework, which requires restoring a high-quality high dynamic range (HDR) image from a sequence of noisy, blurred, and low dynamic range (LDR) multi-exposure RAW inputs. To overcome this challenge, we present the IREANet, which improves the multiple exposure alignment and aggregation with a Fl…

    Submitted 16 April, 2024; originally announced April 2024.

  11. arXiv:2404.10220  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

    Authors: Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang

    Abstract: Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. We present COME-robot, the first closed-loop framework utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. We meticulously construct a library of action primitives for robot exploration, navigation, a…

    Submitted 15 April, 2024; originally announced April 2024.

  12. arXiv:2404.03384  [pdf, other]

    cs.CV

    LongVLM: Efficient Long Video Understanding via Large Language Models

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang

    Abstract: Empowered by Large Language Models (LLMs), recent advancements in VideoLLMs have driven progress in various video understanding tasks. These models encode video representations through pooling or query aggregation over a vast number of visual tokens, making computational and memory costs affordable. Despite successfully providing an overall comprehension of video content, existing VideoLLMs still…

    Submitted 10 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  13. arXiv:2404.03310  [pdf]

    physics.ao-ph cs.LG

    Site-specific Deterministic Temperature and Humidity Forecasts with Explainable and Reliable Machine Learning

    Authors: MengMeng Han, Tennessee Leeuwenburg, Brad Murphy

    Abstract: Site-specific weather forecasts are essential to accurate prediction of power demand and are consequently of great interest to energy operators. However, weather forecasts from current numerical weather prediction (NWP) models lack the fine-scale detail to capture all important characteristics of localised real-world sites. Instead they provide weather information representing a rectangular gridbo…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 27 Pages, 16 Figures, 11 Tables

  14. arXiv:2403.17801  [pdf, other]

    cs.CV eess.IV

    Towards 3D Vision with Low-Cost Single-Photon Cameras

    Authors: Fangzhou Mu, Carter Sifferman, Sacha Jungerman, Yiquan Li, Mark Han, Michael Gleicher, Mohit Gupta, Yin Li

    Abstract: We present a method for reconstructing 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse as it returns back from the scene at a high temporal resolution. We propose to mo…

    Submitted 29 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  15. arXiv:2403.11552  [pdf, other]

    cs.RO cs.AI

    LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning

    Authors: Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu

    Abstract: Conventional Task and Motion Planning (TAMP) approaches rely on manually crafted interfaces connecting symbolic task planning with continuous motion generation. These domain-specific and labor-intensive modules are limited in addressing emerging tasks in real-world settings. Here, we present LLM^3, a novel Large Language Model (LLM)-based TAMP framework featuring a domain-independent interface. Sp…

    Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS 2024. Codes available: https://github.com/AssassinWS/LLM-TAMP

  16. arXiv:2403.08010  [pdf, other]

    cs.CL

    Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM

    Authors: Jingcong Liang, Rong Ye, Meng Han, Ruofei Lai, Xinyu Zhang, Xuanjing Huang, Zhongyu Wei

    Abstract: How can we construct an automated debate judge to evaluate an extensive, vibrant, multi-turn debate? This task is challenging, as judging a debate involves grappling with lengthy texts, intricate argument relationships, and multi-dimensional assessments. At the same time, current research mainly focuses on short dialogues, rarely touching upon the evaluation of an entire debate. In this paper, by…

    Submitted 19 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  17. arXiv:2403.02075  [pdf, other]

    cs.CV

    DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction

    Authors: Weiyi Lv, Yuhang Huang, Ning Zhang, Ruei-Sung Lin, Mei Han, Dan Zeng

    Abstract: In Multiple Object Tracking, objects often exhibit non-linear motion of acceleration and deceleration, with irregular direction changes. Tracking-by-detection (TBD) trackers with Kalman Filter motion prediction work well in pedestrian-dominant scenarios but fall short in complex situations when multiple objects perform non-linear and diverse motion simultaneously. To tackle the complex non-linear m…

    Submitted 20 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  18. arXiv:2403.00257  [pdf]

    cs.CV cs.LG

    Robust deep labeling of radiological emphysema subtypes using squeeze and excitation convolutional neural networks: The MESA Lung and SPIROMICS Studies

    Authors: Artur Wysoczanski, Nabil Ettehadi, Soroush Arabshahi, Yifei Sun, Karen Hinkley Stukovsky, Karol E. Watson, MeiLan K. Han, Erin D Michos, Alejandro P. Comellas, Eric A. Hoffman, Andrew F. Laine, R. Graham Barr, Elsa D. Angelini

    Abstract: Pulmonary emphysema, the progressive, irreversible loss of lung tissue, is conventionally categorized into three subtypes identifiable on pathology and on lung computed tomography (CT) images. Recent work has led to the unsupervised learning of ten spatially-informed lung texture patterns (sLTPs) on lung CT, representing distinct patterns of emphysematous lung parenchyma based on both textural app…

    Submitted 29 February, 2024; originally announced March 2024.

  19. arXiv:2402.04356  [pdf, other]

    cs.SD cs.CV eess.AS

    Bidirectional Autoregressive Diffusion Model for Dance Generation

    Authors: Canyu Zhang, Youbao Tang, Ning Zhang, Ruei-Sung Lin, Mei Han, Jing Xiao, Song Wang

    Abstract: Dance serves as a powerful medium for expressing human emotions, but the lifelike generation of dance is still a considerable challenge. Recently, diffusion models have showcased remarkable generative abilities across various domains. They hold promise for human motion generation due to their adaptable many-to-many nature. Nonetheless, current diffusion-based motion generation models often create…

    Submitted 22 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  20. arXiv:2402.03688  [pdf, other]

    cs.CR cs.AI cs.LG

    A Survey of Privacy Threats and Defense in Vertical Federated Learning: From Model Life Cycle Perspective

    Authors: Lei Yu, Meng Han, Yiming Li, Changting Lin, Yao Zhang, Mingyang Zhang, Yan Liu, Haiqin Weng, Yuseok Jeon, Ka-Ho Chow, Stacy Patterson

    Abstract: Vertical Federated Learning (VFL) is a federated learning paradigm where multiple participants, who share the same set of samples but hold different features, jointly train machine learning models. Although VFL enables collaborative machine learning without sharing raw data, it is still susceptible to various privacy threats. In this paper, we conduct the first comprehensive survey of the state-of…

    Submitted 5 February, 2024; originally announced February 2024.

  21. arXiv:2312.14973  [pdf, other]

    cs.GR

    Interactive Visualization of Time-Varying Flow Fields Using Particle Tracing Neural Networks

    Authors: Mengjiao Han, Jixian Li, Sudhanshu Sane, Shubham Gupta, Bei Wang, Steve Petruzza, Chris R. Johnson

    Abstract: In this paper, we present a comprehensive evaluation to establish a robust and efficient framework for Lagrangian-based particle tracing using deep neural networks (DNNs). Han et al. (2021) first proposed a DNN-based approach to learn Lagrangian representations and demonstrated accurate particle tracing for an analytic 2D flow field. In this paper, we extend and build upon this prior work in signi…

    Submitted 15 May, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted by Pacific Vis 2024

  22. arXiv:2312.13746  [pdf, other]

    cs.CV

    Video Recognition in Portrait Mode

    Authors: Mingfei Han, Linjie Yang, Xiaojie Jin, Jiashi Feng, Xiaojun Chang, Heng Wang

    Abstract: The creation of new datasets often presents new challenges for video recognition and can inspire novel ideas while addressing these challenges. While existing datasets mainly comprise landscape mode videos, our paper seeks to introduce portrait mode videos to the research community and highlight the unique challenges associated with this video format. With the growing popularity of smartphones and…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: See mingfei.info/PMV for data and code information

  23. arXiv:2312.13608  [pdf, other]

    cs.CL

    Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation

    Authors: Jiayu Lin, Rong Ye, Meng Han, Qi Zhang, Ruofei Lai, Xinyu Zhang, Zhao Cao, Xuanjing Huang, Zhongyu Wei

    Abstract: Counter-argument generation -- a captivating area in computational linguistics -- seeks to craft statements that offer opposing views. While most research has ventured into paragraph-level generation, sentence-level counter-argument generation beckons with its unique constraints and brevity-focused challenges. Furthermore, the diverse nature of counter-arguments poses challenges for evaluating mod…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: EMNLP2023, main conference

  24. arXiv:2312.12575  [pdf, other]

    cs.CR

    LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks

    Authors: Saad Ullah, Mingji Han, Saurabh Pujar, Hammond Pearce, Ayse Coskun, Gianluca Stringhini

    Abstract: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set…

    Submitted 13 April, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted for publication in IEEE Symposium on Security and Privacy 2024

  25. arXiv:2312.10300  [pdf, other]

    cs.CV

    Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

    Authors: Mingfei Han, Linjie Yang, Xiaojun Chang, Heng Wang

    Abstract: A short video clip may contain a progression of multiple events and an interesting story line. A human needs to capture the event in every shot and associate the events with one another to understand the story behind it. In this work, we present a new multi-shot video understanding benchmark Shot2Story20K with detailed shot-level captions and comprehensive video summaries. To facilitate better semantic und…

    Submitted 18 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: See https://mingfei.info/shot2story for updates and more information

  26. arXiv:2312.09869  [pdf, other]

    cs.GT cs.LG

    Learning in Online Principal-Agent Interactions: The Power of Menus

    Authors: Minbiao Han, Michael Albert, Haifeng Xu

    Abstract: We study a ubiquitous learning challenge in online principal-agent problems during which the principal learns the agent's private information from the agent's revealed preferences in historical interactions. This paradigm includes important special cases such as pricing and contract design, which have been widely studied in recent literature. However, existing work considers the case where the pri…

    Submitted 28 December, 2023; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: AAAI Conference on Artificial Intelligence (AAAI 2024)

  27. arXiv:2312.06065  [pdf, other]

    eess.AS cs.SD

    EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings

    Authors: Sung Hwan Mun, Min Hyun Han, Canyeong Moon, Nam Soo Kim

    Abstract: In recent years, there have been studies to further improve the end-to-end neural speaker diarization (EEND) systems. This letter proposes the EEND-DEMUX model, a novel framework utilizing demultiplexed speaker embeddings. In this work, we focus on disentangling speaker-relevant information in the latent space and then transform each separated latent variable into its corresponding speech activity…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  28. arXiv:2312.02226  [pdf, other]

    cs.CV

    Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition

    Authors: Chengyou Jia, Minnan Luo, Xiaojun Chang, Zhuohang Dang, Mingfei Han, Mengmeng Wang, Guang Dai, Sizhe Dang, Jingdong Wang

    Abstract: Exploring open-vocabulary video action recognition is a promising venture, which aims to recognize previously unseen actions within any arbitrary set of categories. Existing methods typically adapt pretrained image-text models to the video domain, capitalizing on their inherent strengths in generalization. A common thread among such methods is the augmentation of visual embeddings with temporal in…

    Submitted 3 December, 2023; originally announced December 2023.

  29. arXiv:2312.00874  [pdf, other]

    cs.CL

    Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining

    Authors: Jingcong Liang, Rong Ye, Meng Han, Qi Zhang, Ruofei Lai, Xinyu Zhang, Zhao Cao, Xuanjing Huang, Zhongyu Wei

    Abstract: The knowledge graph is a structure to store and represent knowledge, and recent studies have discussed its capability to assist language models for various applications. Some variations of knowledge graphs aim to record arguments and their relations for computational argumentation tasks. However, many must simplify semantic types to fit specific schemas, thus losing flexibility and expression abil…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: to be published in EMNLP 2023

  30. arXiv:2311.18397  [pdf, other]

    cs.CL

    IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions

    Authors: Zhebin Zhang, Xinyu Zhang, Yuanhang Ren, Saijiang Shi, Meng Han, Yongkang Wu, Ruofei Lai, Zhao Cao

    Abstract: Retrieval-Augmented Generation (RAG), by incorporating external knowledge with parametric memory of language models, has become the state-of-the-art architecture for open-domain QA tasks. However, common knowledge bases are inherently constrained by limited coverage and noisy information, making retrieval-based approaches inadequate to answer implicit reasoning questions. In this paper, we propose…

    Submitted 30 November, 2023; originally announced November 2023.

  31. arXiv:2311.16459  [pdf, other]

    cs.LG cs.DC cs.GT

    On the Effect of Defections in Federated Learning and How to Prevent Them

    Authors: Minbiao Han, Kumar Kshitij Patel, Han Shao, Lingxiao Wang

    Abstract: Federated learning is a machine learning protocol that enables a large population of agents to collaborate over multiple rounds to produce a single consensus model. There are several federated learning applications where agents may choose to defect permanently, essentially withdrawing from the collaboration, if they are content with their instantaneous model in that round. This work demonstrates…

    Submitted 27 November, 2023; originally announced November 2023.

  32. arXiv:2311.11046  [pdf]

    q-bio.QM cs.LG q-bio.NC

    DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features

    Authors: Vladimir Belov, Tracy Erwin-Grabner, Ling-Li Zeng, Christopher R. K. Ching, Andre Aleman, Alyssa R. Amod, Zeynep Basgoze, Francesco Benedetti, Bianca Besteher, Katharina Brosch, Robin Bülow, Romain Colle, Colm G. Connolly, Emmanuelle Corruble, Baptiste Couvy-Duchesne, Kathryn Cullen, Udo Dannlowski, Christopher G. Davey, Annemiek Dols, Jan Ernsting, Jennifer W. Evans, Lukas Fisch, Paola Fuentes-Claramonte, Ali Saffet Gonul, Ian H. Gotlib , et al. (63 additional authors not shown)

    Abstract: Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, h…

    Submitted 18 November, 2023; originally announced November 2023.

  33. arXiv:2310.18954  [pdf, other]

    cs.CV cs.AI

    Mask Propagation for Efficient Video Semantic Segmentation

    Authors: Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

    Abstract: Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence. Prior work in this field has demonstrated promising results by extending image semantic segmentation models to exploit temporal relationships across video frames; however, these approaches often incur significant computational costs. In this paper, we propose an efficient mask propagation frame…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  34. arXiv:2310.17843  [pdf, other]

    cs.LG cs.GT

    A Data-Centric Online Market for Machine Learning: From Discovery to Pricing

    Authors: Minbiao Han, Jonathan Light, Steven Xia, Sainyam Galhotra, Raul Castro Fernandez, Haifeng Xu

    Abstract: Data fuels machine learning (ML) - rich and high-quality training data is essential to the success of ML. However, to transform ML from the race among a few large corporations to an accessible technology that serves numerous normal users' data analysis requests, there still exist important challenges. One gap we observed is that many ML users can benefit from new data that other data owners posses…

    Submitted 26 October, 2023; originally announced October 2023.

  35. arXiv:2310.04152  [pdf, other]

    cs.CV

    Improving Neural Radiance Field using Near-Surface Sampling with Point Cloud Generation

    Authors: Hye Bin Yoo, Hyun Min Han, Sung Soo Hwang, Il Yong Chun

    Abstract: Neural radiance field (NeRF) is an emerging view synthesis method that samples points in a three-dimensional (3D) space and estimates their existence and color probabilities. The disadvantage of NeRF is that it requires a long training time since it samples many 3D points. In addition, if one samples points from occluded regions or in the space where an object is unlikely to exist, the rendering q…

    Submitted 17 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: 14 figures, 3 tables

  36. arXiv:2309.09730  [pdf, other]

    cs.CV

    Scribble-based 3D Multiple Abdominal Organ Segmentation via Triple-branch Multi-dilated Network with Pixel- and Class-wise Consistency

    Authors: Meng Han, Xiangde Luo, Wenjun Liao, Shichuan Zhang, Shaoting Zhang, Guotai Wang

    Abstract: Multi-organ segmentation in abdominal Computed Tomography (CT) images is of great importance for diagnosis of abdominal lesions and subsequent treatment planning. Though deep learning based methods have attained high performance, they rely heavily on large-scale pixel-level annotations that are time-consuming and labor-intensive to obtain. Due to its low dependency on annotation, weakly supervised…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 10 pages, 3 figures, MICCAI2023

  37. arXiv:2309.07841  [pdf, other]

    cs.CR cs.AI

    Two Timin': Repairing Smart Contracts With A Two-Layered Approach

    Authors: Abhinav Jain, Ehan Masud, Michelle Han, Rohan Dhillon, Sumukh Rao, Arya Joshi, Salar Cheema, Saurav Kumar

    Abstract: Due to the modern relevance of blockchain technology, smart contracts present both substantial risks and benefits. Vulnerabilities within them can trigger a cascade of consequences, resulting in significant losses. Many current papers primarily focus on classifying smart contracts for malicious intent, often relying on limited contract characteristics, such as bytecode or opcode. This paper propos…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to the 2023 ICI Conference

  38. arXiv:2309.05451  [pdf, other]

    cs.CV cs.MM

    Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

    Authors: Yabing Wang, Shuhui Wang, Hao Luo, Jianfeng Dong, Fan Wang, Meng Han, Xun Wang, Meng Wang

    Abstract: Current research on cross-modal retrieval is mostly English-oriented, owing to the availability of a large number of English-oriented human-labeled vision-language corpora. In order to break the limit of non-English labeled data, cross-lingual cross-modal retrieval (CCR) has attracted increasing attention. Most CCR methods construct pseudo-parallel vision-language corpora via Machine Translation (MT) to…

    Submitted 11 September, 2023; originally announced September 2023.

  39. FuseFPS: Accelerating Farthest Point Sampling with Fusing KD-tree Construction for Point Clouds

    Authors: Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Chenhao Zhang, Xilong Xie, Shuai Zheng, Jin Dong

    Abstract: Point cloud analytics has become a critical workload for embedded and mobile platforms across various applications. Farthest point sampling (FPS) is a fundamental and widely used kernel in point cloud processing. However, the heavy external memory access makes FPS a performance bottleneck for real-time point cloud processing. Although bucket-based farthest point sampling can significantly reduce u…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: conference for ASP-DAC 2024

    Report number: 979-8-3503-9354-5

    Journal ref: ASP-DAC 2024

  40. arXiv:2308.13714  [pdf, other]

    cs.LG cs.CR cs.CY

    Uncovering Promises and Challenges of Federated Learning to Detect Cardiovascular Diseases: A Scoping Literature Review

    Authors: Sricharan Donkada, Seyedamin Pouriyeh, Reza M. Parizi, Meng Han, Nasrin Dehbozorgi, Nazmus Sakib, Quan Z. Sheng

    Abstract: Cardiovascular diseases (CVD) are the leading cause of death globally, and early detection can significantly improve outcomes for patients. Machine learning (ML) models can help diagnose CVDs early, but their performance is limited by the data available for model training. Privacy concerns in healthcare make it harder to acquire data to train accurate ML models. Federated learning (FL) is an emerg…

    Submitted 25 August, 2023; originally announced August 2023.

  41. arXiv:2308.12757  [pdf, other]

    cs.CV

    PartSeg: Few-shot Part Segmentation via Part-aware Prompt Learning

    Authors: Mengya Han, Heliang Zheng, Chaoyue Wang, Yong Luo, Han Hu, Jing Zhang, Yonggang Wen

    Abstract: In this work, we address the task of few-shot part segmentation, which aims to segment the different parts of an unseen object using very few labeled examples. It is found that leveraging the textual space of a powerful pre-trained image-language model (such as CLIP) can be beneficial in learning visual features. Therefore, we develop a novel method termed PartSeg for few-shot part segmentation ba…

    Submitted 24 August, 2023; originally announced August 2023.

  42. Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling

    Authors: Yang Li, Wenjie Ma, Fanjin Bu, Zhen Yang, Bin Wang, Meng Han

    Abstract: In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algor…

    Submitted 2 September, 2023; v1 submitted 24 August, 2023; originally announced August 2023.

    Comments: in Chinese language, Accepted by Electric Power Construction

    Journal ref: Electric Power Construction 45 (2024) 59-70

  43. arXiv:2308.07418  [pdf, other]

    cs.LG stat.ML

    Locally Adaptive and Differentiable Regression

    Authors: Mingxuan Han, Varun Shankar, Jeff M Phillips, Chenglong Ye

    Abstract: Over-parameterized models like deep nets and random forests have become very popular in machine learning. However, the natural goals of continuity and differentiability, common in regression models, are now often ignored in modern overparametrized, locally-adaptive models. We propose a general framework to construct a global continuous and differentiable model based on a weighted average of locall…

    Submitted 12 October, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

    Journal ref: Journal of Machine Learning for Modeling and Computing 2023

  44. arXiv:2308.06962  [pdf, other]

    cs.CV

    Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

    Authors: Licheng Zhong, Lixin Yang, Kailin Li, Haoyu Zhen, Mei Han, Cewu Lu

    Abstract: The reconstruction of object surfaces from multi-view images or monocular video is a fundamental issue in computer vision. However, much of the recent research concentrates on reconstructing geometry through implicit or explicit methods. In this paper, we shift our focus towards reconstructing mesh in conjunction with color. We remove the view-dependent color from neural volume rendering while ret…

    Submitted 19 December, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

  45. arXiv:2307.16420  [pdf, other]

    cs.RO

    Part-level Scene Reconstruction Affords Robot Interaction

    Authors: Zeyu Zhang, Lexing Zhang, Zaijin Wang, Ziyuan Jiao, Muzhi Han, Yixin Zhu, Song-Chun Zhu, Hangxin Liu

    Abstract: Existing methods for reconstructing interactive scenes primarily focus on replacing reconstructed objects with CAD models retrieved from a limited database, resulting in significant discrepancies between the reconstructed and observed scenes. To address this issue, our work introduces a part-level reconstruction approach that reassembles objects using primitive shapes. This enables us to precisely…

    Submitted 31 July, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    Comments: IROS 2023 paper

  46. arXiv:2307.13343  [pdf, other]

    eess.AS cs.CR cs.SD

    On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer

    Authors: Md Asif Jalal, Pablo Peso Parada, Jisi Zhang, Karthikeyan Saravanan, Mete Ozay, Myoungji Han, Jung In Lee, Seokyeong Jung

    Abstract: Smart devices serviced by large-scale AI models necessitate user data transfer to the cloud for inference. For speech applications, this means transferring private user information, e.g., speaker identity. Our paper proposes a privacy-enhancing framework that targets speaker identity anonymization while preserving speech recognition accuracy for our downstream task: Automatic Speech Recognition…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Proceedings of INTERSPEECH 2023

  47. arXiv:2306.06288  [pdf, other]

    cs.CV

    SAGE-NDVI: A Stereotype-Breaking Evaluation Metric for Remote Sensing Image Dehazing Using Satellite-to-Ground NDVI Knowledge

    Authors: Zepeng Liu, Zhicheng Yang, Mingye Zhu, Andy Wong, Yibing Wei, Mei Han, Jun Yu, Jui-Hsin Lai

    Abstract: Image dehazing is a meaningful low-level computer vision task and can be applied to a variety of contexts. In our industrial deployment scenario based on remote sensing (RS) images, the quality of image dehazing directly affects the grade of our crop identification and growth monitoring products. However, the widely used peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) prov…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted by ICME 2023 Industry Track

  48. arXiv:2305.19972  [pdf, other]

    eess.AS cs.AI cs.CL

    VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition

    Authors: Ziyi Ni, Minglun Han, Feilong Chen, Linghui Meng, Jing Shi, Pin Lv, Bo Xu

    Abstract: Enhancing automatic speech recognition (ASR) performance by leveraging additional multimodal information has shown promising results in previous studies. However, most of these works have primarily focused on utilizing visual cues derived from human lip motions. In fact, context-dependent visual and linguistic cues can also be beneficial in many scenarios. In this paper, we first propose ViLaS (Vision a…

    Submitted 18 December, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted to ICASSP 2024

  49. arXiv:2305.19051  [pdf, other]

    eess.AS cs.AI cs.SD

    Towards single integrated spoofing-aware speaker verification embeddings

    Authors: Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

    Abstract: This study aims to develop single integrated spoofing-aware speaker verification (SASV) embeddings that satisfy two aspects. First, rejecting non-target speakers' input as well as target speakers' spoofed inputs should be addressed. Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outpe…

    Submitted 1 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted by INTERSPEECH 2023. Code and models are available in https://github.com/sasv-challenge/ASVSpoof5-SASVBaseline

  50. arXiv:2305.14022  [pdf, other]

    cs.CV eess.IV

    Realistic Noise Synthesis with Diffusion Models

    Authors: Qi Wu, Mingyan Han, Ting Jiang, Haoqiang Fan, Bing Zeng, Shuaicheng Liu

    Abstract: Deep image denoising models often rely on large amounts of training data for high-quality performance. However, it is challenging to obtain a sufficient amount of data under real-world scenarios for supervised training. As such, synthesizing realistic noise becomes an important solution. However, existing techniques have limitations in modeling complex noise distributions, resulting in residu…

    Submitted 3 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.