- Article, November 2024
Conditional Prototypical Optimal Transport for Enhanced Clue Identification in Multiple Choice Question Answering
Abstract: This paper introduces the Conditional Prototypical Optimal Transport (CPOT) algorithm for clue identification in Multiple Choice Question Answering (MCQA) tasks. Existing clue-based methods suffer from inefficiencies, often relying on pseudo-...
- Article, November 2024
Frequency-Domain Transformation-Based Dynamic Gesture Recognition with Skeleton
Abstract: Graph convolutional networks (GCNs) have been widely used in skeleton-based hand gesture recognition due to their strong ability to mine non-Euclidean features. However, GCNs cannot effectively extract long-term temporal information. To address this issue, ...
- research-article, October 2024
Improving smartphone GNSS positioning in challenging urban environments using GA-BPNN
Abstract: Smartphones have become the mainstream terminals in the field of location services due to their low cost, portability, and ubiquity. In highly dynamic situations, the challenging urban environment causes the received Global Navigation Satellite ...
- Article, September 2024
Extractive Question Answering with Contrastive Puzzles and Reweighted Clues
Document Analysis and Recognition - ICDAR 2024, Pages 97–112, https://doi.org/10.1007/978-3-031-70552-6_6
Abstract: The task of Extractive Question Answering (EQA) involves identifying correct answer spans in response to provided questions and passages. The emergence of Pretrained Language Models (PLMs) has sparked increased interest in leveraging these models ...
- Article, September 2024
ConClue: Conditional Clue Extraction for Multiple Choice Question Answering
Document Analysis and Recognition - ICDAR 2024, Pages 183–198, https://doi.org/10.1007/978-3-031-70552-6_11
Abstract: The task of Multiple Choice Question Answering (MCQA) aims to identify the correct answer from a set of candidates, given a background passage and an associated question. Considerable research efforts have been dedicated to addressing this task, ...
- research-article, July 2024
Static graph convolution with learned temporal and channel-wise graph topology generation for skeleton-based action recognition
Computer Vision and Image Understanding (CVIU), Volume 244, Issue C, https://doi.org/10.1016/j.cviu.2024.104012
Abstract: Graph convolutional networks (GCNs) are widely used in skeleton-based action recognition. It is known that the graph topology is a vital part of GCNs, and different kinds of graph topologies have been proposed for skeleton-based action ...
Highlights:
  - Temporal frame-wise and channel-wise topology based GCNs (TC-GCNs) are developed instead of using a predefined topology.
  - The proposed TC-GCNs can be integrated with the conventional dynamic graph to improve performance.
  - Extensive ...
- research-article, July 2024
DFN: A deep fusion network for flexible single and multi-modal action recognition
Expert Systems with Applications: An International Journal (EXWA), Volume 245, Issue C, https://doi.org/10.1016/j.eswa.2024.123145
Abstract: Multi-modal action recognition methods can be generally classified into two categories: (1) fusing multi-modal features with simple concatenation or fusing the classification scores of individual modalities without considering the interaction ...
Highlights:
  - End-to-end trainable deep fusion network (DFN) for action recognition.
  - DFN outperforms the commonly used fusion methods.
  - DFN performs better than single-modality cases when one modality is missing.
  - Competitive performance ...
- research-article, June 2024
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 20, Issue 8, Article No.: 250, Pages 1–19, https://doi.org/10.1145/3663570
Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, ...
- research-article, February 2024
An attention-based CNN for automatic whole-body postural assessment
Expert Systems with Applications: An International Journal (EXWA), Volume 238, Issue PF, https://doi.org/10.1016/j.eswa.2023.122391
Abstract: Fully automatic postural assessment is highly useful but has been challenging. Conventional methods either require manual assessment by ergonomists or depend on special devices that are intrusive, thus being hardly feasible in daily activities ...
Highlights:
  - A novel attention-based CNN for automatic whole-body postural assessment.
  - The network works directly on single color images rather than 3D skeletons.
  - A new multi-view and multi-modality dataset is created for postural assessment ...
- research-article, July 2024
Hierarchical Aggregated Graph Neural Network for Skeleton-Based Action Recognition
IEEE Transactions on Multimedia (TOM), Volume 26, Pages 11003–11017, https://doi.org/10.1109/TMM.2024.3428330
Supervised human action recognition methods based on skeleton data have achieved impressive performance recently. However, many current works emphasize the design of different contrastive strategies to gain stronger supervised signals, ignoring the ...
- research-article, November 2023
mmHSV: In-Air Handwritten Signature Verification via Millimeter-Wave Radar
ACM Transactions on Internet of Things (TIOT), Volume 4, Issue 4, Article No.: 27, Pages 1–22, https://doi.org/10.1145/3614443
Electronic signatures are widely used in financial business, telecommuting, and identity authentication. Offline electronic signatures are vulnerable to copy or replay attacks. Contact-based online electronic signatures are limited by indirect contact ...
- research-article, September 2023
A consensus model under framework of prospect theory with acceptable adjustment and endo-confidence
Highlights:
  - Considering the dynamic reference in prospect theory to reflect the idea of DMs.
Consensus is an important issue in group decision making to make a reliable and scientific decision, and it has become a hot topic recently. Due to the complexity and uncertainty of decision-making problems, several aspects of ...
- research-article, July 2023
Modeling Long-range Dependencies and Epipolar Geometry for Multi-view Stereo
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 6, Article No.: 200, Pages 1–17, https://doi.org/10.1145/3596445
This article proposes a network, referred to as Multi-View Stereo TRansformer (MVSTR), for depth estimation from multi-view images. By modeling long-range dependencies and epipolar geometry, the proposed MVSTR is capable of extracting dense features with ...
- research-article, June 2023
Neural network model based on global and local features for multi-view mammogram classification
- Lili Xia,
- Jianpeng An,
- Chao Ma,
- Hongjun Hou,
- Yanpeng Hou,
- Linyang Cui,
- Xuheng Jiang,
- Wanqing Li,
- Zhongke Gao
Abstract: Mammography is an important screening criterion for breast cancer, one of the major diseases causing numerous deaths among female patients. Meanwhile, manual diagnosis of mammography is a time-consuming and labor-intensive task. ...
- research-article, May 2023
Novel View Synthesis from a Single Unposed Image via Unsupervised Learning
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 19, Issue 6, Article No.: 186, Pages 1–23, https://doi.org/10.1145/3587467
Novel view synthesis aims to generate novel views from one or more given source views. Although existing methods have achieved promising performance, they usually require paired views with different poses to learn a pixel transformation. This article ...
- research-article, March 2023
Sign language recognition via dimensional global–local shift and cross-scale aggregation
Neural Computing and Applications (NCAA), Volume 35, Issue 17, Pages 12481–12493, https://doi.org/10.1007/s00521-023-08380-9
Abstract: Sign languages generally consist of a sequence of upper body gestures and are cooperative processes among various parts such as the hands, arms, and face. Therefore, the dynamics of the parts as well as the holistic appearance of the upper body ...
- Article, March 2023
Learning Using Privileged Information for Zero-Shot Action Recognition
Abstract: Zero-Shot Action Recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a shared semantic space between seen and unseen actions and intend to directly learn a mapping from a ...
- Article, March 2023
Focal and Global Spatial-Temporal Transformer for Skeleton-Based Action Recognition
Abstract: Despite the great progress achieved by transformers in various vision tasks, they remain underexplored for skeleton-based action recognition, with only a few attempts. Moreover, these methods directly calculate the pair-wise global self-attention equally ...
- research-article, April 2024
Towards video text visual question answering: benchmark and baseline
- Minyi Zhao,
- Bingjia Li,
- Jie Wang,
- Wanqing Li,
- Wenjing Zhou,
- Lan Zhang,
- Shijie Xuyang,
- Zhihang Yu,
- Xinkun Yu,
- Guangze Li,
- Aobotao Dai,
- Shuigeng Zhou
NIPS '22: Proceedings of the 36th International Conference on Neural Information Processing Systems, Article No.: 2576, Pages 35549–35562
In recent years, several text-based visual question answering (TextVQA) benchmarks have been proposed for developing machines' ability to answer questions based on text in images. However, models developed on these benchmarks cannot work effectively in many ...
- Article, October 2022
Contrastive Positive Mining for Unsupervised 3D Action Representation Learning
Abstract: Recent contrastive-based 3D action representation learning has made great progress. However, the strict positive/negative constraint is yet to be relaxed and the use of non-self positives is yet to be explored. In this paper, a Contrastive Positive ...