- research-article, November 2024
SARS: A Personalized Federated Learning Framework Towards Fairness and Robustness against Backdoor Attacks
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 8, Issue 4, Article No.: 140, Pages 1–24, https://doi.org/10.1145/3678571
Federated Learning (FL), an emerging distributed machine learning framework that enables each client to collaboratively train a global model by sharing local knowledge without disclosing local private data, is vulnerable to backdoor model poisoning ...
- research-article, November 2024
ByteMQ: A Cloud-native Streaming Data Layer in ByteDance
- Yancan Mao,
- Ruohang Yin,
- Liyuan Lei,
- Peng Ye,
- Shengfu Zou,
- Shizheng Tang,
- Yunzhe Guo,
- Ye Yuan,
- Xiaochen Yu,
- Bo Wan,
- Yunfei Gong,
- Changli Gao,
- Guanghui Zhang,
- Jian Shen,
- Rui Shi,
- Richard T. B. Ma
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 774–791, https://doi.org/10.1145/3698038.3698536
Real-time streaming data is generated in high volumes and consumed for statistical and analytical purposes, requiring efficient and effective management by Message Queuing Systems (MQS) that ensure high throughput and low latency. ByteDance relies ...
- Article, November 2024
Animate Your Motion: Turning Still Images into Dynamic Videos
Abstract: In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing ...
- Article, September 2024
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Abstract: Parameter-efficient transfer learning (PETL) has emerged as a flourishing research field for adapting large pre-trained models to downstream tasks, greatly reducing trainable parameters while grappling with memory challenges during fine-tuning. To ...
- research-article, September 2024
Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs
Information Processing and Management: an International Journal (IPRM), Volume 61, Issue 5, https://doi.org/10.1016/j.ipm.2024.103805
Abstract: Medical Visual Question Answering (MedVQA) is designed to answer natural language questions related to medical images. Existing methods, which largely adopt the cross-modal pre-training and fine-tuning paradigm, face limitations in accuracy due to ...
- research-article, June 2024
Hierarchical reinforcement learning from imperfect demonstrations through reachable coverage-based subgoal filtering
Abstract: Reinforcement learning (RL) has shown remarkable success in navigating complex robotic and gaming landscapes. However, achieving such results often requires a substantial number of interaction episodes between the agent and its environment, ...
Highlights:
- The use of HRLfD vastly improves the performance of RL in large and complex tasks.
- We greatly alleviate the inevitable problem of imperfect demonstrations in LfD.
- We propose a novel measure to discriminate negative noise ...
- research-article, March 2024
Multimodal transformer with adaptive modality weighting for multimodal sentiment analysis
Abstract: Multimodal Sentiment Analysis (MSA) constitutes a pivotal technology in the realm of multimedia research. The efficacy of MSA models largely hinges on the quality of multimodal fusion. Notably, when conveying information pertinent to specific ...
Highlights:
- Novel multimodal adaptive weight matrix enables accurate sentiment analysis by considering unique contributions of each modality.
- Multimodal attention mechanism addresses over-focusing on intra-modality attention.
- Multiple Softmax ...
- research-article, January 2024
Gist, Content, Target-Oriented: A 3-Level Human-Like Framework for Video Moment Retrieval
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 11044–11056, https://doi.org/10.1109/TMM.2024.3443672
Video moment retrieval (VMR) aims to locate corresponding moments in an untrimmed video via a given natural language query. While most existing approaches treat this task as a cross-modal content matching or boundary prediction problem, recent studies ...
- research-article, January 2024
Anonymity in Attribute-Based Access Control: Framework and Metric
IEEE Transactions on Dependable and Secure Computing (TDSC), Volume 21, Issue 1, Pages 463–475, https://doi.org/10.1109/TDSC.2023.3261309
Anonymous access is an effective method for preserving privacy in access control. This study assumes that anonymous access control requires both frameworks and policies. Numerous solutions have been proposed for anonymous access at the framework level. In ...
- research-article, January 2024
Global semantic enhancement network for video captioning
Abstract: Video captioning aims to briefly describe the content of a video in accurate and fluent natural language, which is a hot research topic in multimedia processing. As a bridge between video and natural language, video captioning is a challenging ...
Highlights:
- A video captioning framework called global semantic enhancement network is proposed.
- It highlights features of informative frames in aggregated video representations.
- It enhances semantic correlations between video and language ...
- research-article, October 2023
Language-Guided Visual Aggregation Network for Video Question Answering
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 5195–5203, https://doi.org/10.1145/3581783.3613909
Video Question Answering (VideoQA) aims to comprehend intricate relationships, actions, and events within video content, as well as the inherent links between objects and scenes, to answer text-based questions accurately. Transferring knowledge from the ...
- Article, December 2023
Enhancing CLIP-Based Text-Person Retrieval by Leveraging Negative Samples
Abstract: Text-person retrieval (TPR) is a fine-grained cross-modal retrieval task that aims to find matching person images through detailed text descriptions. Various recent cross-modal pre-trained vision-language based models (e.g., CLIP [12]) have ...
- research-article, August 2023
Bi-Attention enhanced representation learning for image-text matching
Highlights:
- A novel image-text representation learning network BAERL is proposed.
- It captures both inter- and intra-modality correlations between image regions and words.
- Self-similarity polynomial loss is used to enhance the matching ...
Image-text matching has become a research hotspot in recent years. The key point of image-text matching is to accurately measure the similarity between an image and a sentence. However, most existing methods either focus on the inter-modality ...
- research-article, August 2023
A Task Scheduling Algorithm for Micro-cloud Platform Based on Task Real-time
CNCIT '23: Proceedings of the 2023 2nd International Conference on Networks, Communications and Information Technology, Pages 196–200, https://doi.org/10.1145/3605801.3605839
To enhance the efficiency of different subsystems within embedded devices, thereby augmenting their capabilities, we incorporate cloud computing architecture within these devices to create an in-device micro-cloud platform. This allows subsystems, which ...
- research-article, June 2023
VM performance-aware virtual machine migration method based on ant colony optimization in cloud environment
Journal of Parallel and Distributed Computing (JPDC), Volume 176, Issue C, Pages 17–27, https://doi.org/10.1016/j.jpdc.2023.02.003
Abstract: Many virtual machine (VM) allocation methods have been proposed to reduce the number of physical machines (PMs) and improve resource utilization for cloud service providers. If VMs are migrated on the same PM, then there will be substantial resource ...
Highlights:
- VM Performance-Aware VMM method of improving users' experience and cloud service providers' benefits.
- Optimizations include: maximizing VM performance, and minimizing the number of active PMs and the total migration cost.
- An ...
- research-article, May 2023
A deep neural network model for coreference resolution in geological domain
Information Processing and Management: an International Journal (IPRM), Volume 60, Issue 3, https://doi.org/10.1016/j.ipm.2023.103268
Highlights:
- The coreference resolution of geological entities in the geological domain is explored for the first time.
- A novel framework for geological entity coreference resolution based on deep learning methods is proposed.
- A CNN-based multi-...
Coreference resolution of geological entities is an important task in geological information mining. Although the existing generic coreference resolution models can handle geological texts, a dramatic decline in their performance can occur ...
- research-article, November 2021
Dynamic multi-channel metric network for joint pose-aware and identity-invariant facial expression recognition
Information Sciences: an International Journal (ISCI), Volume 578, Issue C, Pages 195–213, https://doi.org/10.1016/j.ins.2021.07.034
Abstract: Facial expression recognition (FER) is challenging because the appearance of an expression varies significantly depending on head pose and inter-subject characteristics. With existing techniques, it is often difficult to learn both pose-aware and ...
- research-article, October 2021
Single Image 3D Object Estimation with Primitive Graph Networks
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 2353–2361, https://doi.org/10.1145/3474085.3475398
Reconstructing a 3D object from a single image (RGB or depth) is a fundamental problem in visual scene understanding, yet it remains challenging due to its ill-posed nature and the complexity of real-world scenes. To address those challenges, we adopt a ...
- research-article, February 2021
Retrieving point cloud models of target objects in a scene from photographed images
Multimedia Tools and Applications (MTAA), Volume 80, Issue 4, Pages 6311–6328, https://doi.org/10.1007/s11042-020-09879-2
Abstract: Image-based reconstruction is devoted to recovering the 3D point cloud models of target objects from scene images photographed at different viewpoints, and the existing methods often produce a large number of redundant background points, which ...
- research-article, December 2020
Diagnosis and Prediction of Bearing Fault Using EEMD and CNN
ICASIT 2020: Proceedings of the 2020 International Conference on Aviation Safety and Information Technology, Pages 282–289, https://doi.org/10.1145/3434581.3434669
Rolling bearings are essential components of industrial machinery, and a bearing fault can cause significant losses. Therefore, it is necessary to perform fault diagnosis and prediction on bearings. This paper combines Ensemble Empirical Mode ...