
Multimodal Boosting: Addressing Noisy Modalities and Identifying Modality Contribution

Published: 01 January 2024

Abstract

In multimodal representation learning, different modalities do not contribute equally. In particular, when learning with noisy modalities that convey non-discriminative information, predictions based on the multimodal representation are often biased and may even ignore the knowledge carried by informative modalities. In this paper, we aim to address the noisy modality problem and balance the contributions of multiple modalities dynamically and in parallel. Specifically, we construct multiple base learners and formulate our framework as a boosting-like algorithm in which different base learners focus on different aspects of multimodal learning. To identify the contributions of individual base learners, we develop a contribution learning network that dynamically determines the contribution and noise level of each base learner. In contrast to the commonly used attention mechanism, we use a transformation of the predictive loss as the supervision signal for training the contribution learning network, which enables more accurate learning of modality importance. We derive the final prediction by combining the base learners' predictions according to their contributions. Notably, unlike late fusion, we devise a multimodal base learner to explore cross-modal interactions. To update the network, we design a 'complementary update mechanism': for each base learner, we assign higher weights to the samples that the other base learners predict incorrectly. In this way, we leverage the available information to predict each sample correctly to the greatest extent possible and encourage different base learners to capture different aspects of the multimodal information. Extensive experiments demonstrate that the proposed method achieves superior performance on multimodal sentiment analysis and emotion recognition.
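
The abstract describes two mechanisms at a high level: fusing base-learner predictions by learned contribution weights, and a complementary update that, for each base learner, up-weights the samples its peers get wrong. The snippet below is a minimal NumPy sketch of those two ideas only, written from the abstract alone; the function names (fuse_predictions, complementary_sample_weights), the specific weighting formula, and the tensor shapes are illustrative assumptions, not the authors' implementation, and the loss-transformation supervision of the contribution network is not reproduced here.

```python
# Hedged sketch (not the authors' code): toy stand-ins for (i) contribution-weighted
# fusion of base-learner predictions and (ii) the complementary update mechanism
# described in the abstract. All names and formulas are illustrative assumptions.
import numpy as np

def fuse_predictions(base_preds, contribution_weights):
    """Combine base-learner outputs by their learned contributions.

    base_preds: (K, N, C) class scores from K base learners for N samples, C classes.
    contribution_weights: (K, N) per-sample contribution of each base learner,
        assumed normalised over K (e.g. the output of a contribution learning network).
    Returns the fused (N, C) prediction.
    """
    return np.einsum('kn,knc->nc', contribution_weights, base_preds)

def complementary_sample_weights(base_preds, labels):
    """Complementary update idea: for base learner k, up-weight the samples that
    the *other* base learners predict incorrectly, so each learner focuses on
    what its peers miss."""
    K, N, _ = base_preds.shape
    correct = base_preds.argmax(axis=-1) == labels          # (K, N) correctness flags
    weights = np.empty((K, N))
    for k in range(K):
        peers = np.delete(correct, k, axis=0)               # drop learner k's own flags
        peer_errors = (~peers).sum(axis=0)                   # 0 .. K-1 errors per sample
        weights[k] = 1.0 + peer_errors                       # larger where peers fail
        weights[k] /= weights[k].sum()                        # normalise per learner
    return weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, N, C = 3, 8, 2                                        # 3 base learners, 8 samples, 2 classes
    preds = rng.random((K, N, C))
    labels = rng.integers(0, C, size=N)
    contrib = rng.random((K, N))
    contrib /= contrib.sum(axis=0)                            # normalise over learners
    print(fuse_predictions(preds, contrib).shape)             # -> (8, 2)
    print(complementary_sample_weights(preds, labels).shape)  # -> (3, 8)
```

Treating the peer-error count as the sample weight is just one plausible reading of "assign higher weights to samples incorrectly predicted by other base learners"; the paper's exact weighting scheme may differ.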

Information

Published In
IEEE Transactions on Multimedia, Volume 26, 2024, 11427 pages

Publisher
IEEE Press

Qualifiers
• Research-article