Gaussian Adaptive Attention is All You Need: Robust Contextual Representations Across Multiple Modalities

Ioannides, Georgios; Chadha, Aman; Elkins, Aaron

arXiv:2401.11143v1 (cs)

[Submitted on 20 Jan 2024 (this version), latest version 29 Sep 2024 (v4)]

Abstract: We propose the Multi-Head Gaussian Adaptive Attention Mechanism (GAAM), a novel probabilistic attention framework, and the Gaussian Adaptive Transformer (GAT), designed to enhance information aggregation across multiple modalities, including speech, text, and vision. GAAM integrates a learnable mean and variance into its attention mechanism, implemented in a multi-headed framework, enabling it to collectively model any probability distribution for dynamic recalibration of feature significance. This method demonstrates significant improvements, especially with highly non-stationary data, surpassing state-of-the-art attention techniques in model performance (by up to approximately 20% in accuracy) by identifying key elements within the feature space. GAAM's compatibility with dot-product-based attention models and its relatively low number of parameters showcase its adaptability and potential to boost existing attention frameworks. Empirically, GAAM exhibits superior adaptability and efficacy across a diverse range of tasks, including emotion recognition in speech, image classification, and text classification, thereby establishing its robustness and versatility in handling multi-modal data. Furthermore, we introduce the Importance Factor (IF), a new learning-based metric that enhances the explainability of models trained with GAAM-based methods. Overall, GAAM represents an advancement toward the development of better-performing and more explainable attention models across multiple modalities.
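
The idea of attention weights driven by a learnable mean and variance can be illustrated with a minimal sketch. This is one possible reading of the abstract, not the authors' implementation: the function names, the `mean_offset` and `scale` parameters (stand-ins for the learnable mean and variance), and the choice to average the heads are all assumptions made for illustration.

```python
import math


def gaussian_attention_weights(values, mean_offset=0.0, scale=1.0, eps=1e-8):
    """Weight each value by an unnormalized Gaussian centred near the sample mean.

    mean_offset and scale are hypothetical stand-ins for the learnable
    mean and variance parameters mentioned in the abstract.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    centre = mean + mean_offset      # shifted centre of the Gaussian
    spread = scale * (var + eps)     # effective variance; eps avoids division by zero
    return [math.exp(-((v - centre) ** 2) / (2.0 * spread)) for v in values]


def multi_head_gaussian_attention(values, heads):
    """Average the weights of several Gaussian heads, then reweight the input.

    heads is a list of (mean_offset, scale) pairs. Averaging the heads is an
    assumption of this sketch, chosen so that several Gaussians can jointly
    approximate a more complex weighting.
    """
    combined = [0.0] * len(values)
    for mean_offset, scale in heads:
        weights = gaussian_attention_weights(values, mean_offset, scale)
        combined = [c + w / len(heads) for c, w in zip(combined, weights)]
    return [v * c for v, c in zip(values, combined)]


# Example: the value closest to the sample mean gets the largest weight,
# and the outlier gets the smallest.
features = [0.0, 1.0, 2.0, 3.0, 10.0]
single_head = gaussian_attention_weights(features)
reweighted = multi_head_gaussian_attention(features, [(0.0, 1.0), (-1.0, 0.5)])
```

In a trained model the offset and scale of each head would be updated by gradient descent rather than fixed by hand as they are here.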

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
Cite as:	arXiv:2401.11143 [cs.LG]
	(or arXiv:2401.11143v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2401.11143

Submission history

From: Georgios Ioannides
[v1] Sat, 20 Jan 2024 06:42:32 UTC (4,158 KB)
[v2] Thu, 25 Jan 2024 22:48:28 UTC (4,160 KB)
[v3] Wed, 31 Jan 2024 01:22:43 UTC (4,157 KB)
[v4] Sun, 29 Sep 2024 00:45:46 UTC (4,151 KB)
