Author: Wan, Bo

research-article

SARS: A Personalized Federated Learning Framework Towards Fairness and Robustness against Backdoor Attacks

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 8, Issue 4, Article No.: 140, Pages 1–24, https://doi.org/10.1145/3678571

Federated Learning (FL), an emerging distributed machine learning framework that enables each client to collaboratively train a global model by sharing local knowledge without disclosing local private data, is vulnerable to backdoor model poisoning ...

research-article

Open Access

ByteMQ: A Cloud-native Streaming Data Layer in ByteDance

SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 774–791, https://doi.org/10.1145/3698038.3698536

Real-time streaming data is generated in high volumes and consumed for statistical and analytical purposes, requiring efficient and effective management by Message Queuing Systems (MQS) that ensure high throughput and low latency. ByteDance relies ...

Article

Animate Your Motion: Turning Still Images into Dynamic Videos

Computer Vision – ECCV 2024, Pages 409–425, https://doi.org/10.1007/978-3-031-72848-8_24

Abstract

In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing ...

Article

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

Computer Vision – ECCV 2024, Pages 75–95, https://doi.org/10.1007/978-3-031-72784-9_5

Abstract

Parameter-efficient transfer learning (PETL) has emerged as a flourishing research field for adapting large pre-trained models to downstream tasks, greatly reducing trainable parameters while grappling with memory challenges during fine-tuning. To ...

research-article

Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs

Information Processing and Management: an International Journal (IPRM), Volume 61, Issue 5, https://doi.org/10.1016/j.ipm.2024.103805

Abstract

Medical Visual Question Answering (MedVQA) is designed to answer natural language questions related to medical images. Existing methods, which largely adopt the cross-modal pre-training and fine-tuning paradigm, face limitations in accuracy due to ...

research-article

Hierarchical reinforcement learning from imperfect demonstrations through reachable coverage-based subgoal filtering

Knowledge-Based Systems (KNBS), Volume 294, Issue C, https://doi.org/10.1016/j.knosys.2024.111736

Abstract

Reinforcement learning (RL) has shown remarkable success in navigating complex robotic and gaming landscapes. However, achieving such results often requires a substantial number of interaction episodes between the agent and its environment, ...

Highlights

The use of HRLfD vastly improves the performance of RL in large and complex tasks.
We greatly alleviate the inevitable problem of imperfect demonstrations in LfD.
We propose a novel measure to discriminate negative noise ...

research-article

Multimodal transformer with adaptive modality weighting for multimodal sentiment analysis

Neurocomputing (NEUROC), Volume 572, Issue C, https://doi.org/10.1016/j.neucom.2023.127181

Abstract

Multimodal Sentiment Analysis (MSA) constitutes a pivotal technology in the realm of multimedia research. The efficacy of MSA models largely hinges on the quality of multimodal fusion. Notably, when conveying information pertinent to specific ...

Highlights

Novel multimodal adaptive weight matrix enables accurate sentiment analysis by considering unique contributions of each modality.
Multimodal attention mechanism addresses over-focusing on intra-modality attention.
Multiple Softmax ...

research-article

Gist, Content, Target-Oriented: A 3-Level Human-Like Framework for Video Moment Retrieval

IEEE Transactions on Multimedia (TOM), Volume 26, Pages 11044–11056, https://doi.org/10.1109/TMM.2024.3443672

Video moment retrieval (VMR) aims to locate corresponding moments in an untrimmed video via a given natural language query. While most existing approaches treat this task as a cross-modal content matching or boundary prediction problem, recent studies ...

research-article

Anonymity in Attribute-Based Access Control: Framework and Metric

IEEE Transactions on Dependable and Secure Computing (TDSC), Volume 21, Issue 1, Pages 463–475, https://doi.org/10.1109/TDSC.2023.3261309

Anonymous access is an effective method for preserving privacy in access control. This study assumes that anonymous access control requires both frameworks and policies. Numerous solutions have been proposed for anonymous access at the framework level. In ...

research-article

Global semantic enhancement network for video captioning

Pattern Recognition (PATT), Volume 145, Issue C, https://doi.org/10.1016/j.patcog.2023.109906

Abstract

Video captioning aims to briefly describe the content of a video in accurate and fluent natural language, which is a hot research topic in multimedia processing. As a bridge between video and natural language, video captioning is a challenging ...

Highlights

A video captioning framework called global semantic enhancement network is proposed.
It highlights features of informative frames in aggregated video representations.
It enhances semantic correlations between video and language ...

research-article

Language-Guided Visual Aggregation Network for Video Question Answering

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 5195–5203, https://doi.org/10.1145/3581783.3613909

Video Question Answering (VideoQA) aims to comprehend intricate relationships, actions, and events within video content, as well as the inherent links between objects and scenes, to answer text-based questions accurately. Transferring knowledge from the ...

Article

Enhancing CLIP-Based Text-Person Retrieval by Leveraging Negative Samples

Pattern Recognition and Computer Vision, Pages 271–283, https://doi.org/10.1007/978-981-99-8540-1_22

Abstract

Text-person retrieval (TPR) is a fine-grained cross-modal retrieval task that aims to find matching person images through detailed text descriptions. Various recent cross-modal pre-trained vision-language based models (e.g., CLIP [12]) have ...

research-article

Bi-Attention enhanced representation learning for image-text matching

Pattern Recognition (PATT), Volume 140, Issue C, https://doi.org/10.1016/j.patcog.2023.109548

Highlights

A novel image-text representation learning network BAERL is proposed.
It captures both inter- and intra-modality correlations between image regions and words.
Self-similarity polynomial loss is used to enhance the matching ...

Abstract

Image-text matching has become a research hotspot in recent years. The key point of image-text matching is to accurately measure the similarity between an image and a sentence. However, most existing methods either focus on the inter-modality ...

research-article

A Task Scheduling Algorithm for Micro-cloud Platform Based on Task Real-time

CNCIT '23: Proceedings of the 2023 2nd International Conference on Networks, Communications and Information Technology, Pages 196–200, https://doi.org/10.1145/3605801.3605839

To enhance the efficiency of different subsystems within embedded devices, thereby augmenting their capabilities, we incorporate cloud computing architecture within these devices to create an in-device micro-cloud platform. This allows subsystems, which ...

research-article

VM performance-aware virtual machine migration method based on ant colony optimization in cloud environment

Journal of Parallel and Distributed Computing (JPDC), Volume 176, Issue C, Pages 17–27, https://doi.org/10.1016/j.jpdc.2023.02.003

Abstract

Many virtual machine (VM) allocation methods have been proposed to reduce the number of physical machines (PMs) and improve resource utilization for cloud service providers. If VMs are migrated on the same PM, then there will be substantial resource ...

Highlights

VM Performance-Aware VMM method of improving users' experience and cloud service providers' benefits.
Optimizations include: maximizing VM performance, and minimizing the number of active PMs and the total migration cost.
An ...

research-article

A deep neural network model for coreference resolution in geological domain

Information Processing and Management: an International Journal (IPRM), Volume 60, Issue 3, https://doi.org/10.1016/j.ipm.2023.103268

Highlights

The coreference resolution of geological entities in the geological domain is explored for the first time.
A novel framework for geological entity coreference resolution based on deep learning methods is proposed.
A CNN-based multi-...

Abstract

Coreference resolution of geological entities is an important task in geological information mining. Although the existing generic coreference resolution models can handle geological texts, a dramatic decline in their performance can occur ...

research-article

Dynamic multi-channel metric network for joint pose-aware and identity-invariant facial expression recognition

Information Sciences: an International Journal (ISCI), Volume 578, Issue C, Pages 195–213, https://doi.org/10.1016/j.ins.2021.07.034

Abstract

Facial expression recognition (FER) is challenging because the appearance of an expression varies significantly depending on head pose and inter-subject characteristics. With existing techniques, it is often difficult to learn both pose-aware and ...

research-article

Single Image 3D Object Estimation with Primitive Graph Networks

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 2353–2361, https://doi.org/10.1145/3474085.3475398

Reconstructing a 3D object from a single image (RGB or depth) is a fundamental problem in visual scene understanding and yet remains challenging due to its ill-posed nature and complexity in real-world scenes. To address those challenges, we adopt a ...

research-article

Retrieving point cloud models of target objects in a scene from photographed images

Multimedia Tools and Applications (MTAA), Volume 80, Issue 4, Pages 6311–6328, https://doi.org/10.1007/s11042-020-09879-2

Abstract

Image-based reconstruction is devoted to recovering the 3D point cloud models of target objects from scene images photographed at different viewpoints, and the existing methods often produce a large number of redundant background points, which ...

research-article

Diagnosis and Prediction of Bearing Fault Using EEMD and CNN

ICASIT 2020: Proceedings of the 2020 International Conference on Aviation Safety and Information Technology, Pages 282–289, https://doi.org/10.1145/3434581.3434669

The rolling bearing is an essential component of industrial machinery. A bearing fault could cause a significant loss. Therefore, it is necessary to perform fault diagnosis and prediction on the bearing. This paper combines Ensemble Empirical Mode ...
