DOI: 10.1145/3581783.3612652
Research article

DPNET: Dynamic Poly-attention Network for Trustworthy Multi-modal Classification

Published: 27 October 2023

Abstract

With advances in sensing technology, multi-modal data collected from different sources are increasingly available. Multi-modal classification aims to integrate complementary information across modalities to improve classification performance. However, existing multi-modal classification methods generally struggle to integrate global structural information and to provide trustworthy multi-modal fusion, which matters especially in safety-sensitive applications such as medical diagnosis. In this paper, we propose a novel Dynamic Poly-attention Network (DPNET) for trustworthy multi-modal classification. DPNET has four merits: (i) To capture intrinsic modality-specific structural information, we design a structure-aware feature aggregation module that learns a structure-preserving, globally compact feature representation for each modality. (ii) A transparent fusion strategy based on modality confidence estimation tracks information variation within each modality to enable dynamic fusion. (iii) To make multi-modal fusion more effective and efficient, we introduce a cross-modal low-rank fusion module that reduces the complexity of tensor-based fusion and emphasizes informative rank-wise features via a rank attention mechanism. (iv) A label confidence estimation module drives the network to produce more credible confidence scores, and an intra-class attention loss supervises network training. Extensive experiments on four real-world multi-modal biomedical datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art approaches.
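To make merit (iii) concrete, below is a minimal illustrative sketch, not the authors' released code. It follows the general low-rank tensor fusion idea of Liu et al. (2018): each modality's feature vector is projected by a rank-R factor, the per-modality projections are combined by element-wise product, and the rank dimension is then collapsed. The attention over the rank dimension (a learned softmax replacing fixed rank weights) is one plausible reading of the paper's "rank attention mechanism"; the module name and all hyper-parameters here are assumptions for illustration.

import torch
import torch.nn as nn

class LowRankFusionWithRankAttention(nn.Module):
    """Illustrative sketch (not the authors' code): low-rank multimodal
    fusion in the style of Liu et al. (2018), with fixed rank weights
    replaced by learned attention over the rank dimension."""

    def __init__(self, input_dims, output_dim, rank=4):
        super().__init__()
        # One rank-R factor per modality; the extra input row models a bias.
        self.factors = nn.ParameterList(
            [nn.Parameter(0.05 * torch.randn(rank, d + 1, output_dim))
             for d in input_dims]
        )
        # Scores each rank-wise slice of the fused tensor (hypothetical design).
        self.rank_attn = nn.Linear(output_dim, 1)

    def forward(self, feats):
        # feats: list of (batch, d_m) tensors, one per modality.
        batch = feats[0].size(0)
        fused = None
        for z, w in zip(feats, self.factors):
            ones = torch.ones(batch, 1, device=z.device)
            z_aug = torch.cat([z, ones], dim=1)             # (batch, d_m + 1)
            proj = torch.einsum('bd,rdo->bro', z_aug, w)    # (batch, rank, out)
            # Element-wise product fuses modalities without materializing the
            # full outer-product tensor, keeping complexity linear in the rank.
            fused = proj if fused is None else fused * proj
        # Rank attention: weight each rank-wise slice, then sum over ranks.
        scores = torch.softmax(self.rank_attn(fused), dim=1)  # (batch, rank, 1)
        return (scores * fused).sum(dim=1)                    # (batch, out)

# Example: fuse three hypothetical biomedical modalities into a 64-d vector.
# fusion = LowRankFusionWithRankAttention([200, 200, 300], output_dim=64)
# out = fusion([torch.randn(8, 200), torch.randn(8, 200), torch.randn(8, 300)])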


Cited By

  • Building Trust in Decision with Conformalized Multi-view Deep Classification. Proceedings of the 32nd ACM International Conference on Multimedia (2024), 7278-7287. https://doi.org/10.1145/3664647.3681297
  • Heterogeneous Graph Guided Contrastive Learning for Spatially Resolved Transcriptomics Data. Proceedings of the 32nd ACM International Conference on Multimedia (2024), 8287-8295. https://doi.org/10.1145/3664647.3680941
  • DAI-Net: Dual Adaptive Interaction Network for Coordinated Medication Recommendation. IEEE Journal of Biomedical and Health Informatics 28, 10 (2024), 6201-6211. https://doi.org/10.1109/JBHI.2024.3425833

Information

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023


Author Tags

  1. attention
  2. confidence estimation
  3. cross-modal low-rank fusion
  4. dynamical fusion
  5. trustworthy multi-modal classification

Qualifiers

  • Research-article

Funding Sources

  • the National Science Foundation of China under Grant
  • the National Key R&D Program of China under Grant

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Article Metrics

  • Downloads (last 12 months): 210
  • Downloads (last 6 weeks): 25
Reflects downloads up to 14 Jan 2025.

