OutfitTransformer: Learning Outfit Representations for Fashion Recommendation

Sarkar, Rohan; Bodla, Navaneeth; Vasileva, Mariya; Lin, Yen-Liang; Beniwal, Anurag; Lu, Alan; Medioni, Gerard

Computer Science > Computer Vision and Pattern Recognition

arXiv:2204.04812v1 (cs)

[Submitted on 11 Apr 2022 (this version), latest version 15 Apr 2022 (v2)]

Abstract: Learning an effective outfit-level representation is critical for predicting the compatibility of items in an outfit and for retrieving complementary items for a partial outfit. We present a framework, OutfitTransformer, that uses the proposed task-specific tokens and leverages the self-attention mechanism to learn effective outfit-level representations that encode the compatibility relationships among all items in the outfit, addressing both compatibility prediction and complementary item retrieval. For compatibility prediction, we design an outfit token to capture a global outfit representation and train the framework using a classification loss. For complementary item retrieval, we design a target item token that additionally takes the target item specification (in the form of a category or text description) into consideration. We train our framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit and a target item specification as inputs. The generated target item embedding is then used to retrieve compatible items that match the rest of the outfit. Additionally, we adopt a pre-training approach and a curriculum learning strategy to improve retrieval performance. Since our framework learns at the outfit level, it can learn a single embedding that captures higher-order relations among multiple items in the outfit more effectively than pairwise methods. Experiments demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks. We further validate the quality of our retrieval results with a user study.
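The outfit-token idea in the abstract can be illustrated with a toy sketch. The plain-Python code below is an assumption-laden illustration, not the authors' implementation: it prepends a hypothetical learned outfit token to the item embeddings, runs a single head of scaled dot-product self-attention, and maps the outfit token's output to a compatibility probability trained with binary cross-entropy (one standard classification loss). All function names (`compatibility_score`, `retrieve`, and so on) are hypothetical, the weights are random placeholders, and there is no multi-head attention, feed-forward layer, normalization, or training loop. Retrieval uses cosine-similarity nearest neighbours as an assumed stand-in; the abstract does not specify the form of the set-wise outfit ranking loss or the target item token, so neither is sketched here.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix; matrices are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        # Each output row is an attention-weighted average of all value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def compatibility_score(items, outfit_token, params):
    """Prepend a (hypothetical) outfit token, attend over the whole outfit, and
    map the outfit token's output row to a compatibility probability."""
    x = [outfit_token] + items
    h = self_attention(x, params["wq"], params["wk"], params["wv"])
    logit = sum(a * b for a, b in zip(h[0], params["w_out"]))
    return 1.0 / (1.0 + math.exp(-logit))


def bce_loss(p, label, eps=1e-7):
    """Binary cross-entropy for compatible (label 1) vs. incompatible (label 0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query, candidates, k):
    """Indices of the k candidate embeddings most similar to the query embedding."""
    ranked = sorted(range(len(candidates)),
                    key=lambda i: cosine(query, candidates[i]), reverse=True)
    return ranked[:k]


def random_matrix(rows, cols, rng):
    """Placeholder weights drawn from a small Gaussian (stand-in for learned parameters)."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4  # toy embedding size
    params = {
        "wq": random_matrix(d, d, rng),
        "wk": random_matrix(d, d, rng),
        "wv": random_matrix(d, d, rng),
        "w_out": random_matrix(1, d, rng)[0],
    }
    outfit_token = random_matrix(1, d, rng)[0]
    items = random_matrix(3, d, rng)  # e.g. embeddings for a top, bottom, and shoes
    p = compatibility_score(items, outfit_token, params)
    print(f"compatibility={p:.3f}  loss_if_compatible={bce_loss(p, 1):.3f}")
    print(retrieve(items[0], items, k=2))
```

Because this sketch adds no positional encoding, the outfit token's output is invariant to the order in which items are listed, which suits treating an outfit as an unordered set of items.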

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2204.04812 [cs.CV] (or arXiv:2204.04812v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2204.04812

Submission history

From: Rohan Sarkar
[v1] Mon, 11 Apr 2022 00:55:40 UTC (36,314 KB)
[v2] Fri, 15 Apr 2022 23:28:15 UTC (36,313 KB)