Cross-domain images appear in an increasing number of applications. This trend creates demand for cross-domain image retrieval (CDIR), which finds images in one visual domain given a query image from another visual domain. Although image retrieval has been studied extensively, research on CDIR remains at an early stage. This study systematically surveys the methods and applications of CDIR. Because images from different visual domains exhibit different features, the main challenge of CDIR is to learn discriminative feature representations while preserving the domain-invariant features of images from different visual domains. Existing CDIR methods are categorized and analyzed according to the stage at which the features of images from different visual domains are transformed: one category is based on feature space migration, and the other is based on image domain migration. Then, applications of CDIR in clothing, infrared, remote sensing, sketch, and other scenarios are summarized. Finally, conclusions are drawn on the existing CDIR schemes, and new directions for future research are proposed.
Funding was provided by the Beijing Natural Science Foundation (Grant No. 4202017), the Key Research and Development Program of Anhui Province of China (Grant No. 202104a07020017), and the Youth Talent Support Program of Beijing Municipal Education Commission (Grant No. CIT&TCD201904050).
Appendix 1: Terminologies of cross-domain image retrieval
Figure 5 presents an example to illustrate the main concepts of CDIR. Suppose that one day you see someone on the street wearing a pair of very beautiful shoes, and you want to buy the same pair. However, asking directly seems abrupt and taking a picture seems impolite, so you silently make a note of the style of the shoes. When you get home, you sketch the shoe and search e-commerce sites with the sketch to find the same model. First, you upload the hand-drawn sketch (source domain) to the e-commerce site. The e-commerce site analyzes the sketch, extracts the sketch features, and stores them in the feature space. Subsequently, similar features are found in the database based on the extracted sketch features. Note that the images in the database of the e-commerce site are preprocessed pictures. A mapping function maps the retrieved features and the sketch features into the same space (common space). Finally, the search result is output. The frequently used terminologies are listed in Table 4.
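The retrieval flow in this example can be illustrated with a minimal Python sketch. It is not taken from any surveyed method: the linear mapping functions, the toy feature vectors, the use of cosine similarity, and all names (project, retrieve, SKETCH_MAP, PHOTO_MAP, GALLERY) are assumptions made purely for illustration. In practice the mappings would be learned, for example by a neural network.

```python
import math

def project(features, weights):
    """Apply a linear mapping function: one row of weights per output dimension."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def cosine_similarity(a, b):
    """Cosine similarity between two vectors in the common space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_features, query_map, gallery, gallery_map, top_k=2):
    """Rank gallery items by similarity to the query in the common space."""
    q = project(query_features, query_map)
    scored = [
        (cosine_similarity(q, project(feats, gallery_map)), name)
        for name, feats in gallery.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]

# Hypothetical mappings: 3-D sketch features and 2-D photo features
# are both mapped into a 2-D common space.
SKETCH_MAP = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
PHOTO_MAP = [[0.5, 0.0], [0.0, 0.5]]

# Hypothetical preprocessed photo features stored by the e-commerce site.
GALLERY = {"sneaker": [2.0, 0.2], "boot": [0.2, 2.0], "sandal": [1.0, 1.0]}

# Features extracted from the uploaded hand-drawn sketch (toy values).
results = retrieve([0.9, 0.1, 0.3], SKETCH_MAP, GALLERY, PHOTO_MAP)
print(results)
```

Here the sketch and the photos start with features of different dimensionality, and they only become comparable after both are mapped into the common space.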
Appendix 2: Evaluation metrics of cross-domain image retrieval
The commonly used evaluation metrics for CDIR are shown in Table 5.
Table 6 shows the computation of TP, FN, FP, and TN. In Table 6, P denotes a positive prediction of the model and N denotes a negative prediction, while T (true) and F (false) indicate whether that prediction is correct or wrong. Precision is defined as TP divided by the sum of TP and FP, and recall is defined as TP divided by the sum of TP and FN.
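As a worked illustration of the precision and recall definitions above, the following Python sketch computes both for a single query. The function name precision_recall and the example image identifiers are hypothetical. It assumes each retrieved image counts as a positive prediction and each relevant image that was not retrieved counts as a false negative.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from sets of image IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    tp = len(retrieved_set & relevant_set)   # relevant images that were retrieved
    fp = len(retrieved_set - relevant_set)   # retrieved images that are not relevant
    fn = len(relevant_set - retrieved_set)   # relevant images that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy query: four images retrieved, three images actually relevant.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(precision, recall)
```

In this toy query, TP = 2, FP = 2, and FN = 1, so precision is 2/4 and recall is 2/3.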
