In the Era of Prompt Learning with Vision-Language Models

Jha, Ankit

arXiv:2411.04892 (cs)

[Submitted on 7 Nov 2024]

Abstract: Large-scale foundation models like CLIP have shown strong zero-shot generalization but struggle with domain shifts, limiting their adaptability. In our work, we introduce StyLIP, a novel domain-agnostic prompt learning strategy for Domain Generalization (DG). StyLIP disentangles visual style and content in CLIP's vision encoder by using style projectors to learn domain-specific prompt tokens and combining them with content features. Trained contrastively, this approach enables seamless adaptation across domains, outperforming state-of-the-art methods on multiple DG benchmarks. Additionally, we propose AD-CLIP for unsupervised domain adaptation (DA), leveraging CLIP's frozen vision backbone to learn domain-invariant prompts through image style and content features. By aligning domains in embedding space with entropy minimization, AD-CLIP effectively handles domain shifts, even when only target domain samples are available. Lastly, we outline future work on class discovery using prompt learning for semantic segmentation in remote sensing, focusing on identifying novel or rare classes in unstructured environments. This paves the way for more adaptive and generalizable models in complex, real-world scenarios.
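
The abstract mentions entropy minimization on target-domain predictions but gives no formula. The following is a minimal sketch of that general idea, not the AD-CLIP implementation: it assumes CLIP-style classification, where class scores are temperature-scaled cosine similarities between image embeddings and per-class text (prompt) embeddings. The function names, the temperature value, and the random embeddings are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def clip_style_logits(image_emb, text_emb, temperature=0.01):
    """Cosine similarity between image and class-prompt embeddings.

    image_emb: (num_images, dim), text_emb: (num_classes, dim).
    The temperature of 0.01 is an assumed value, not taken from the paper.
    """
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return image_emb @ text_emb.T / temperature


def mean_prediction_entropy(logits):
    """Average Shannon entropy of the per-image class distributions.

    Minimizing this quantity on unlabeled target images pushes the model
    toward confident predictions, which is the usual role of an
    entropy-minimization term in unsupervised domain adaptation.
    """
    probs = softmax(logits, axis=1)
    # Small constant avoids log(0) for classes with vanishing probability.
    per_sample = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return float(per_sample.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target_images = rng.normal(size=(5, 8))   # stand-in image embeddings
    class_prompts = rng.normal(size=(3, 8))   # stand-in prompt embeddings
    logits = clip_style_logits(target_images, class_prompts)
    print("mean entropy on target batch:", mean_prediction_entropy(logits))
```

In an actual training loop this entropy would be computed with an autodiff framework and combined with a supervised loss on labeled source data; the NumPy version here only shows what quantity is being minimized.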

Comments:	ICVGIP 2024, Young Faculty Symposium
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2411.04892 [cs.CV]
	(or arXiv:2411.04892v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2411.04892

Submission history

From: Ankit Jha
[v1] Thu, 7 Nov 2024 17:31:21 UTC (1,335 KB)
