DAM: Dynamic Adapter Merging for Continual Video QA Learning

Cheng, Feng; Wang, Ziyang; Sung, Yi-Lin; Lin, Yan-Bo; Bansal, Mohit; Bertasius, Gedas

arXiv:2403.08755 (cs)

[Submitted on 13 Mar 2024 (v1), last revised 22 Apr 2024 (this version, v2)]

Abstract: We present a parameter-efficient method for continual video question-answering (VidQA) learning. Our method, named DAM, uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, (iii) handle inputs from unknown datasets during inference, and (iv) enable knowledge sharing across similar dataset domains. Given a set of continually streaming VidQA datasets, we sequentially train dataset-specific adapters for each dataset while freezing the parameters of a large pretrained video-language backbone. During inference, given a video-question sample from an unknown domain, our method first uses the proposed non-parametric router function to compute a probability for each adapter, reflecting how relevant that adapter is to the current video-question input instance. Subsequently, the proposed dynamic adapter merging scheme aggregates all the adapter weights into a new adapter instance tailored for that particular test sample to compute the final VidQA prediction, mitigating the impact of inaccurate router predictions and facilitating knowledge sharing across domains. Our DAM model outperforms prior state-of-the-art continual learning approaches by 9.1% while exhibiting 1.9% less forgetting on 6 VidQA datasets spanning various domains. We further extend DAM to continual image classification and image QA and outperform prior methods by a large margin. The code is publicly available.
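The two inference steps described in the abstract (routing, then merging) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the non-parametric router scores each adapter by the cosine similarity between the input feature and a stored per-dataset centroid and turns the scores into probabilities with a temperature-scaled softmax, and that merging is a probability-weighted average of adapter parameters. The names `router_probs`, `merge_adapters`, `centroids`, and `temperature` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def router_probs(feature, centroids, temperature=0.1):
    """Assumed router: softmax over cosine similarities to dataset centroids.

    It has no trainable parameters; it only compares the input feature
    with one stored centroid per dataset-specific adapter.
    """
    scores = [cosine(feature, c) / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_adapters(adapters, probs):
    """Assumed merge: probability-weighted average of adapter parameters.

    Each adapter is a dict mapping a parameter name to a flat list of
    floats; all adapters share the same names and shapes.
    """
    merged = {}
    for name, values in adapters[0].items():
        merged[name] = [
            sum(p * adapter[name][i] for p, adapter in zip(probs, adapters))
            for i in range(len(values))
        ]
    return merged


# Toy example: two datasets, hence two centroids and two adapters.
centroids = [[1.0, 0.0], [0.0, 1.0]]
adapters = [{"down": [1.0, 1.0]}, {"down": [3.0, 5.0]}]

probs = router_probs([1.0, 0.0], centroids)  # input resembles dataset 0
merged = merge_adapters(adapters, probs)     # one adapter for this sample
```

Because every adapter contributes in proportion to its probability, a wrong top-1 router choice does not discard the correct adapter entirely, which is the robustness property the abstract attributes to merging.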

Comments:	The first two authors contribute equally
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2403.08755 [cs.CV]
	(or arXiv:2403.08755v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2403.08755

Submission history

From: Feng Cheng
[v1] Wed, 13 Mar 2024 17:53:47 UTC (2,668 KB)
[v2] Mon, 22 Apr 2024 19:17:49 UTC (2,668 KB)
