A review of disentangled representation learning for visual data processing and analysis

Li Yating; Xiao Jing; Liao Liang; Wang Zheng; Chen Wenyi; Wang Mi

doi:10.11834/jig.211261


2023, Vol. 28, No. 4, pp. 903-934
Print publication date: 2023-04-16

Li Yating, Xiao Jing, Liao Liang, Wang Zheng, Chen Wenyi, Wang Mi. 2023. A review of disentangled representation learning for visual data processing and analysis. Journal of Image and Graphics, 28(04): 0903-0934. DOI: 10.11834/jig.211261.

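Summary

Representation learning is one of the core problems in machine learning research. The input representations of machine learning algorithms have shifted from the once-dominant hand-crafted features to learned latent representations of multimedia data, which has brought large gains in algorithm performance. However, representations of visual data are usually highly entangled: all information components of the input are encoded into the same feature space, where they affect one another and are hard to tell apart, so the representation is difficult to interpret. Disentangled representation learning aims to learn a low-dimensional, interpretable, abstract representation that identifies and separates the different latent factors of variation hidden in high-dimensional observations. With disentangled representation learning, the information of a single factor of variation can be captured and controlled through its corresponding latent subspace, which makes the representation more interpretable. Disentangled representations can improve sample efficiency and tolerance to irrelevant nuisance factors, provide a robust representation of complex variations in data, and yield semantic information that is valuable for downstream artificial intelligence tasks such as recognition, classification, and domain adaptation. This paper first introduces and analyzes the state of research on disentangled representations and their causal mechanism, and summarizes three important properties of disentangled representations. It then divides disentangled representation learning algorithms into four categories and compares them in terms of mathematical description, characteristics, and scope of application. Next, it categorizes and summarizes the loss functions, datasets, and objective evaluation metrics commonly used in existing work. Finally, it summarizes applications of disentangled representation learning to practical problems and discusses future directions.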

Abstract

Representation learning is essential to modern machine learning. Input representations have moved from hand-crafted features to learned representations of multimedia data, and algorithm performance has improved greatly as a result. However, the representations of visual data are often highly entangled: all information components are encoded into the same feature space, which makes them hard to interpret. Disentangled representation learning (DRL) aims to learn a low-dimensional, interpretable, abstract representation that separates the multiple factors of variation in high-dimensional observations. In a disentangled representation, the information of a single factor of variation can be captured and manipulated through the corresponding latent subspace, which makes the representation more interpretable. DRL can improve sample efficiency and tolerance to nuisance variables, and it offers a robust representation of complex variations. The semantic information it extracts benefits downstream artificial intelligence (AI) tasks such as recognition, classification, and domain adaptation. This review briefly introduces the definition, research development, and applications of DRL. Because DRL is closely related to the identifiability problem of nonlinear independent component analysis (nonlinear ICA), some nonlinear ICA research is covered as well. We also discuss the causal mechanism behind DRL: high-dimensional real-world data is assumed to be generated by a set of unobserved factors of variation (generative factors), and DRL models these factors in a latent representation so as to recover the generation process of the observed data. We summarize the properties that a well-defined disentangled representation should have into three aspects: 1) modularity, 2) compactness, and 3) explicitness.
Explicitness in turn consists of two sub-requirements, completeness and informativeness. Current DRL methods are then categorized into four types: 1) dimension-wise disentanglement, 2) semantic-based disentanglement, 3) hierarchical disentanglement, and 4) nonlinear ICA, and are compared in terms of formulation, characteristics, and scope of application. Dimension-wise disentanglement assumes that the generative factors are mutually independent and that each can be separated and mapped to a single dimension of the latent vector; it is suitable for learning disentangled representations of simple synthetic visual data. Semantic-based disentanglement assumes that certain groups of semantic information are likewise independent: the generative factors are disentangled in groups according to specific semantics and mapped to different latent spaces, which suits complicated real-world data. Hierarchical disentanglement assumes that generative factors at different levels of abstraction are correlated: the factors are disentangled group by group from the bottom up and mapped to latent spaces at different levels of semantic abstraction, forming a hierarchical disentangled representation. Nonlinear ICA provides an identifiable way to disentangle the unknown generative factors that are mixed in the observed data through a nonlinear invertible generator.
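Three of the categories above can be written compactly under some assumed notation. The symbols below ($g$ for the generator, $f$ for the learned encoder, $\mathbf{z}$ for the generative factors, $\mathbf{r}$ for the learned representation) are illustrative choices and are not taken from the paper; hierarchical disentanglement is omitted because its factors are not assumed independent across levels. This is a sketch of the assumptions, not a formal definition.

```latex
% Assumed notation for illustration only.
\begin{align}
\text{data generation:}\quad & \mathbf{x} = g(\mathbf{z}), \qquad \mathbf{z} = (z_1, \dots, z_K) \\
\text{dimension-wise:}\quad & p(\mathbf{z}) = \prod_{k=1}^{K} p(z_k), \qquad
  \mathbf{r} = f(\mathbf{x}) \in \mathbb{R}^{K}, \quad r_k \text{ changes only when } z_k \text{ changes} \\
\text{semantic-based:}\quad & \mathbf{z} = \bigl(\mathbf{z}^{(1)}, \dots, \mathbf{z}^{(M)}\bigr), \qquad
  p(\mathbf{z}) = \prod_{m=1}^{M} p\bigl(\mathbf{z}^{(m)}\bigr), \quad
  \mathbf{r}^{(m)} \text{ changes only when } \mathbf{z}^{(m)} \text{ changes} \\
\text{nonlinear ICA:}\quad & g \text{ nonlinear and invertible}, \qquad
  f(\mathbf{x}) \approx g^{-1}(\mathbf{x})
\end{align}
```

In the nonlinear ICA case, recovery is generally possible only up to indeterminacies such as a permutation and a component-wise transformation of the factors, and only under additional assumptions on the data.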
The loss functions commonly used in disentangled representation learning are grouped into three categories according to their motivation: 1) modularity constraints, which restrict a single latent variable to capture only a single factor of variation, or a single group of factors, and thereby promote the mutual separation of the factors of variation; 2) explicitness constraints, which encourage each latent variable to encode the ground-truth value of its corresponding generative factor effectively, and the latent representation as a whole to contain complete information about all generative factors; and 3) multi-purpose constraints, which optimize several properties of the disentangled representation at the same time, including modularity, compactness, and explicitness. A model can combine multiple loss terms to form its final hybrid objective function. We compare the scope of application and limitations of each type of loss function and further summarize classical disentangled representation works that use a hybrid objective function.
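To make the idea of a hybrid objective concrete, the sketch below combines two loss terms in the style of a β-VAE: a squared-error reconstruction term and a weighted KL regulariser on a diagonal Gaussian posterior. This is one well-known example chosen for illustration, not necessarily how the survey formulates or categorizes its losses; the function names `gaussian_kl` and `hybrid_loss` and the default weight `beta=4.0` are hypothetical.

```python
import math


def gaussian_kl(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I) for one sample.

    mu and logvar are equal-length sequences of floats, one entry per latent
    dimension. The closed form is 0.5 * sum(mu^2 + var - 1 - log var).
    """
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def hybrid_loss(x, x_recon, mu, logvar, beta=4.0):
    """Assumed beta-VAE-style hybrid objective for a single sample.

    Adds a squared-error reconstruction term to a beta-weighted KL term.
    A larger beta puts more pressure on the latent code to match the
    factorised prior, at the cost of reconstruction quality.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + beta * gaussian_kl(mu, logvar)


# Toy usage with hand-written values (not real model outputs).
x = [0.2, 0.8, 0.5]
x_recon = [0.25, 0.7, 0.5]
mu = [0.1, -0.3]
logvar = [0.0, -0.5]
loss = hybrid_loss(x, x_recon, mu, logvar, beta=4.0)
```

In a real model the inputs would come from an encoder and decoder, and further terms (for example a supervised or adversarial term) could be added to the same sum with their own weights.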

Keywords

disentangled representation learning; visual data; latent representation; factors of variation; latent space

references

Achille A， Eccles T， Matthey L， Burgess C， Watters N， Lerchner A and Higgins I. 2018. Life-long disentangled representation learning with cross-domain latent homologies//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc

Aifanti N， Papachristou C and Delopoulos A. 2010. The MUG facial expression database//Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services WIAMIS 10. Desenzano del Garda， Italy： IEEE： 1-4

Aubry M， Maturana D， Efros A A， Russell B C and Sivic J. 2014. Seeing 3D chairs： exemplar part-based 2D-3D alignment using a large dataset of CAD models//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus， USA： IEEE： 3762-3769 ［DOI： 10.1109/CVPR.2014.487http://dx.doi.org/10.1109/CVPR.2014.487］

Bai J W， Kong S F and Gomes C. 2020a. Disentangled variational autoencoder based multi-label classification with covariance-aware multivariate probit model//Proceedings of the 29th International Joint Conference on Artificial Intelligence. Yokohama， Japan： IJCAI.org： 4313-4321 ［DOI： 10.24963/ijcai.2020/595http://dx.doi.org/10.24963/ijcai.2020/595］

Bai Y， Lou Y H， Dai Y X， Liu J， Chen Z Q and Duan L Y. 2020b. Disentangled feature learning network for vehicle re-identification//Proceedings of the 29th International Joint Conference on Artificial Intelligence. Yokohama， Japan： IJCAI.org： 474-480 ［DOI： 10.24963/ijcai.2020/66http://dx.doi.org/10.24963/ijcai.2020/66］

Baktashmotlagh M， Faraki M， Drummond T and Salzmann M. 2018. Learning factorized representations for open-set domain adaptation. ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1805.12277.pdfhttps://arxiv.org/pdf/1805.12277.pdf

Bass C， da Silva M， Sudre C， Tudosiu P D， Smith S M and Robinson E C. 2020. ICAM： interpretable classification via disentangled representations and feature attribution mapping//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 7697-7709

Bengio Y， Courville A and Vincent P. 2013. Representation learning： a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence， 35（8）： 1798-1828 ［DOI： 10.1109/TPAMI.2013.50http://dx.doi.org/10.1109/TPAMI.2013.50］

Bepler T， Zhong E D， Kelley K， Brignole E and Berger B. 2019. Explicitly disentangling image content from translation and rotation with spatial-VAE//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates， Inc.： 15435-15445

Bi S， Sunkavalli K， Perazzi F， Shechtman E， Kim V G and Ramamoorthi R. 2019. Deep CG2Real： synthetic-to-real translation via image disentanglement//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 2730-2739 ［DOI： 10.1109/ICCV.2019.00282http://dx.doi.org/10.1109/ICCV.2019.00282］

Blank M， Gorelick L， Shechtman E， Irani M and Basri R. 2005. Actions as space-time shapes//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing， China： IEEE： 1395-1402 ［DOI： 10.1109/ICCV.2005.28http://dx.doi.org/10.1109/ICCV.2005.28］

Bouchacourt D， Tomioka R and Nowozin S. 2018. Multi-level variational autoencoder： learning disentangled representations from grouped observations. Proceedings of 2018 AAAI Conference on Artificial Intelligence， 32（1）： 2095-2102 ［DOI： 10.1609/aaai.v32i1.11867http://dx.doi.org/10.1609/aaai.v32i1.11867］

Bromley J， Guyon I， LeCun Y， Säckinger E and Shah R. 1993. Signature verification using a “Siamese” time delay neural network//Proceedings of the 6th International Conference on Neural Information Processing Systems. Denver， Colorado， USA： Morgan Kaufmann Publishers Inc.： 737-744

Burgess C and Kim H. 2018. 3D shapes dataset ［EB/OL］. ［2022-01-21］. https://github.com/deepmind/3d-shapeshttps://github.com/deepmind/3d-shapes

Burgess C P， Higgins I， Pal A， Matthey L， Watters N， Desjardins G and Lerchner A. 2018. Understanding disentangling in β-VAE ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1804.03599.pdfhttps://arxiv.org/pdf/1804.03599.pdf

Cai R C， Li Z J， Wei P F， Qiao J， Zhang K and Hao Z F. 2019. Learning disentangled semantic representation for domain adaptation//Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao， China： IJCAI.org： 2060-2066 ［DOI： 10.24963/ijcai.2019/285http://dx.doi.org/10.24963/ijcai.2019/285］

Carbonneau M A， Zaïdi J， Boilard J and Gagnon G. 2022. Measuring disentanglement： a review of metrics.IEEE Transactions on Neural Networks and Learning Systems. 2022：1-15 ［DOI： 10.1109/TNNLS.2022.3218982http://dx.doi.org/10.1109/TNNLS.2022.3218982］

Chang M B， Ullman T， Torralba A and Tenenbaum J B. 2017. A compositional object-based approach to learning physical dynamics ［EB/OL］. ［2022-01-21］. http：//arxiv.org/pdf/1612.00341.pdfhttp://arxiv.org/pdf/1612.00341.pdf

Chartsias A， Joyce T， Papanastasiou G， Semple S， Williams M， Newby D E， Dharmakumar R and Tsaftaris S A. 2019. Disentangled representation learning in cardiac image analysis. Medical Image Analysis， 58： #101535 ［DOI： 10.1016/j.media.2019.101535http://dx.doi.org/10.1016/j.media.2019.101535］

Chartsias A， Papanastasiou G， Wang C J， Stirrat C， Semple S， Newby D， Dharmakumar R and Tsaftaris S A. 2020. Multimodal cardiac segmentation using disentangled representation learning//Proceedings of the 10th International Workshop on Statistical Atlases and Computational Models of the Heart. Multi-Sequence CMR Segmentation， CRT-EPiggy and LV Full Quantification Challenges. Shenzhen， China： Springer： 128-137 ［DOI： 10.1007/978-3-030-39074-7_14http://dx.doi.org/10.1007/978-3-030-39074-7_14］

Chen H Y， Chen F and He H J. 2021a. SSC-GAN： a novel gan based on the same solution constraints of first-order ODEs. International Journal of Pattern Recognition and Artificial Intelligence. 35（11）： #2152018 ［DOI： 10.1142/S0218001421530062http://dx.doi.org/10.1142/S0218001421530062］

Chen H， Lagadec B and Bremond F. 2021b. ICE： inter-instance contrastive encoding for unsupervised person re-identification//Proceedings of 2021 IEEE International Conference on Computer Vision. IEEE： 14960-14969

Chen R T Q， Li X C， Grosse R and Duvenaud D. 2019. Isolating sources of disentanglement in variational autoencoders ［EB/OL］.［2022-01-21］. https://arxiv.org/pdf/1802.04942.pdfhttps://arxiv.org/pdf/1802.04942.pdf

Chen X， Duan Y， Houthooft R， Schulman J， Sutskever I and Abbeel P. 2016. InfoGAN： interpretable representation learning by information maximizing generative adversarial nets//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona， Spain： Curran Associates Inc.： 2180-2188

Chen X， Lian C F， Wang L， Deng H N， Kuang T S， Fung S H， Gateno J， Shen D G， Xia J J and Yap P T. 2021c. Diverse data augmentation for learning image segmentation with cross-modality annotations//Medical Image Analysis. 71： #102060 ［DOI： 10.1016/j.media.2021.102060］

Choi Y J， Choi M J， Kim M Y， Ha J W， Kim S H and Choo J. 2018. StarGAN： unified generative adversarial networks for multi-domain image-to-image translation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 8789-8797

Cohen G， Afshar S， Tapson J and van Schaik A. 2017. EMNIST： an extension of MNIST to handwritten letters//Proceedings of 2017 International Joint Conference on Neural Networks （IJCNN）. Anchorage， USA： IEEE： 2921-2926 ［DOI： 10.1109/IJCNN.2017.7966217http://dx.doi.org/10.1109/IJCNN.2017.7966217］

Creager E， Madras D， Jacobsen J H， Weis M A， Swersky K， Pitassi T and Zemel R. 2019. Flexibly fair representation learning by disentanglement//Proceedings of the 36th International Conference on Machine Learning. Long Beach， USA： PMLR： 1436-1445

Deng Y， Yang J L， Chen D， Wen F and Tong X. 2020. Disentangled and controllable face image generation via 3D imitative-contrastive learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 5153-5162 ［DOI： 10.1109/CVPR42600.2020.00520http://dx.doi.org/10.1109/CVPR42600.2020.00520］

Denton E and Birodkar V. 2017. Unsupervised learning of disentangled representations from video//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach， USA： Curran Associates Inc.： 4417-4426

Detlefsen N S and Hauberg S. 2019. Explicit disentanglement of appearance and perspective in generative models//Proceedings of the 33rd International Conference on Neural Information Processing Systems.Curran Associates， Inc.： 1018-1028

Ding W， Li L， Huang L and Zhuang X. 2022. Unsupervised multi-modality registration network based on spatially encoded gradient information//Statistical Atlases and Computational Models of the Heart. Multi-Disease， Multi-View， and Multi-Center Right Ventricular Segmentation in Cardiac MRI Challenge. Strasbourg， France： Cham： Springer International Publishing： 151-159 ［DOI： 10.1007/978-3-030-93722-5_17http://dx.doi.org/10.1007/978-3-030-93722-5_17］

Ding Z， Xu Y F， Xu W J， Parmar G， Yang Y， Welling M and Tu Z W. 2020. Guided variational autoencoder for disentanglement learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Seattle， USA： IEEE： 7917-7926 ［DOI： 10.1109/CVPR42600.2020.00794http://dx.doi.org/10.1109/CVPR42600.2020.00794］

Dinh L， Sohl-Dickstein J and Bengio S. 2017. Density estimation using real NVP ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1605.08803.pdfhttps://arxiv.org/pdf/1605.08803.pdf

Dou Q， Ouyang C， Chen C， Chen H， Glocker B， Zhuang X H and Heng P A. 2019. PnP-AdaNet： plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation. IEEE Access. 7： 99065-99076 ［DOI： 10.1109/ACCESS.2019.2929258http://dx.doi.org/10.1109/ACCESS.2019.2929258］

Duan B Y， Fu C Y， Li Y， Song X G and He R. 2020. Cross-spectral face hallucination via disentangling independent factors//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 7927-7935 ［DOI： 10.1109/CVPR42600.2020.00795http://dx.doi.org/10.1109/CVPR42600.2020.00795］

Dupont E. 2018. Learning disentangled joint continuous and discrete representations//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 708-718

Dutta A and Akata Z. 2019. Semantically tied paired cycle consistency for zero-shot sketch-based image retrieval//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 5089-5098

Dutta T， Singh A and Biswas S. 2021. StyleGuide： zero-shot sketch-based image retrieval using style-guided image generation. IEEE Transactions on Multimedia. 23： 2833-2842 ［DOI： 10.1109/TMM.2020.3017918http://dx.doi.org/10.1109/TMM.2020.3017918］

Eastwood C and Williams C K I. 2018. A framework for the quantitative evaluation of disentangled representations//Proceedings of the 6th International Conference on Learning Representations. Vancouver， Canada： OpenReview.net

Eitz M， Richter R， Boubekeur T and Hildebrand K. 2012. Sketch-based shape retrieval. ACM Transactions on graphics （TOG）. 31（4）： 1-10 ［DOI： 10.1145/2185520.2185527http://dx.doi.org/10.1145/2185520.2185527］

Eom C and Ham B. 2019. Learning disentangled representation for robust person re-identification//Proceedings of the 33rd International Conference on Neural Information Processing Systems.Curran Associates， Inc.： 5297-5308

Esmaeili B， Wu H， Jain S， Bozkurt A， Siddharth N， Paige B， Brooks D H， Dy J and Van de Meent J W. 2019. Structured disentangled representations//Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. Naha， Japan： PMLR： 2525-2534

Estermann B， Marks M and Yanik M F. 2020. Robust disentanglement of a few factors at a time using rPU-VAE//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 13387-13398

Fidler S， Dickinson S and Urtasun R. 2012. 3D object detection and viewpoint estimation with a deformable 3D cuboid model//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe， USA： Curran Associates Inc.： 611-619

Fraccaro M， Kamronn S， Paquet U and Winther O. 2017. A disentangled recognition and nonlinear dynamics model for unsupervised learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach， USA： Curran Associates Inc.： 3604-3613

Fu Y， Wei Y， Zhou Y， Shi H， Huang G， Wang X， Yao Z and Huang T. 2019. Horizontal pyramid matching for person re-identification.Proceedings of 2019 AAAI Conference on Artificial Intelligence， 33（1）， 8295-8302 ［DOI： 10.1609/aaai.v33i01.33018295http://dx.doi.org/10.1609/aaai.v33i01.33018295］

Gilbert A， Collomosse J， Jin H L and Price B. 2018. Disentangling structure and aesthetics for style-aware image completion//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 1848-1856 ［DOI： 10.1109/CVPR.2018.00198http://dx.doi.org/10.1109/CVPR.2018.00198］

Gondal M W， Wüthrich M， Miladinović Đ， Locatello F， Breidt M， Volchkov V， Akpo J， Bachem O， Schölkopf B and Bauer S. 2019. On the transfer of inductive bias from simulation to the real world： a new disentanglement dataset//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 15740-15751

Gonzalez-Garcia A， Van de Weijer J and Bengio Y. 2018. Image-to-image translation for cross-domain disentanglement//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 1294-1305

Goodfellow I J， Pouget-Abadie J， Mirza M， Xu B， Warde-Farley D， Ozair S， Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal， Canada： MIT Press： 2672-2680

Gowal S， Qin C L， Huang P S， Cemgil T， Dvijotham K， Mann T and Kohli P. 2020. Achieving robustness in the wild via adversarial mixing with disentangled representations//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 1208-1217 ［DOI： 10.1109/CVPR42600.2020.00129http://dx.doi.org/10.1109/CVPR42600.2020.00129］

Grathwohl W and Wilson A. 2016. Disentangling space and time in video with hierarchical variational auto-encoders ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1612.04440.pdfhttps://arxiv.org/pdf/1612.04440.pdf

Gulrajani I， Kumar K， Ahmed F， Taïga A A， Visin F， V􀅡zquez D and Courville A C. 2016. PixelVAE： a latent variable model for natural images. ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1611.05013.pdfhttps://arxiv.org/pdf/1611.05013.pdf

Guo W K， Huang H B， Kong X W and He R. 2019. Learning disentangled representation for cross-modal retrieval with deep mutual information estimation//Proceedings of the 27th ACM International Conference on Multimedia. Nice， France： ACM： 1712-1720 ［DOI： 10.1145/3343031.3351053http://dx.doi.org/10.1145/3343031.3351053］

Hadad N， Wolf L and Shahar M. 2018. A two-step disentanglement method//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 772-780 ［DOI： 10.1109/CVPR.2018.00087http://dx.doi.org/10.1109/CVPR.2018.00087］

Hamaguchi R， Sakurada K and Nakamura R. 2019. Rare event detection using disentangled representation learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 9319-9327 ［DOI： 10.1109/CVPR.2019.00955http://dx.doi.org/10.1109/CVPR.2019.00955］

Higgins I， Amos D， Pfau D， Racaniere S， Matthey L， Rezende D and Lerchner A. 2018. Towards a definition of disentangled representations ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1812.02230.pdfhttps://arxiv.org/pdf/1812.02230.pdf

Higgins I， Matthey L， Pal A， Burgess C， Glorot X， Botvinick M， Mohamed S and Lerchner A. 2017. β-VAE： learning basic visual concepts with a constrained variational framework//Proceedings of the 5th International Conference on Learning Representations. Toulon， France： OpenReview. net

Hinton G E and Salakhutdinov R R. 2006. Reducing the dimensionality of data with neural networks. Science， 313（5786）： 504-507 ［DOI： 10.1126/science.1127647http://dx.doi.org/10.1126/science.1127647］

Hochreiter S and Schmidhuber J. 1997. Long short-term memory. Neural Computation， 9（8）： 1735-1780 ［DOI： 10.1162/neco.1997.9.8.1735http://dx.doi.org/10.1162/neco.1997.9.8.1735］

Hsieh J T， Liu B B， Huang D A， Li F F and Niebles J C. 2018. Learning to decompose and disentangle representations for video prediction//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 515-524

Hsu W N， Zhang Y and Glass J. 2017. Unsupervised learning of disentangled and interpretable representations from sequential data//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach， USA： Curran Associates Inc.： 1876-1887

Huang X， Liu M Y， Belongie S and Kautz J. 2018. Multimodal unsupervised image-to-image translation//Proceedings of the 15th European Conference on Computer Vision （ECCV）. Munich， Germany： Springer： 172-189

Hwang H， Kim G H， Hong S and Kim K E. 2020. Variational interaction information maximization for cross-domain disentanglement//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 22479-22491

Jiang Z H， Wu Q Y， Chen K Y and Zhang J Y. 2019. Disentangled representation learning for 3D face shape//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 11949-11958 ［DOI： 10.1109/CVPR.2019.01223http://dx.doi.org/10.1109/CVPR.2019.01223］

Jung D， Lee J， Yi J H and Yoon S. 2020. ICAPS： an interpretable classifier via disentangled capsule networks//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 314-330 ［DOI： 10.1007/978-3-030-58529-7_19http://dx.doi.org/10.1007/978-3-030-58529-7_19］

Kim H and Mnih A. 2018. Disentangling by factorising//Proceedings of the 35th International Conference on Machine Learning. Stockholm， Sweden： PMLR： 2649-2658

Khemakhem I， Kingma D， Monti R and Hyvarinen A. 2020. Variational autoencoders and nonlinear ICA： a unifying framework//Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. Palermo， Italy： PMLR： 2207-2217

Kingma D P and Welling M. 2013. Auto-encoding variational Bayes ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1312.6114v1.pdfhttps://arxiv.org/pdf/1312.6114v1.pdf

Klindt D， Schott L， Sharma Y， Ustyuzhaninov I， Brendel W， Bethge M and Paiton D. 2021. Towards nonlinear disentanglement in natural data with temporal sparse coding. ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/2007.10930.pdfhttps://arxiv.org/pdf/2007.10930.pdf

Kondo R， Kawano K， Koide S and Kutsuna T. 2019. Flow-based image-to-image translation with feature disentanglement//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 4168-4178

Kotovenko D， Sanakoyeu A， Lang S and Ommer B. 2019. Content and style disentanglement for artistic style transfer//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 4421-4430 ［DOI： 10.1109/ICCV.2019.00452http://dx.doi.org/10.1109/ICCV.2019.00452］

Krause J， Stark M， Deng J and Li F F. 2013. 3D object representations for fine-grained categorization//Proceedings of 2013 IEEE International Conference on Computer Vision （ICCV） Workshops. Sydney， Australia： IEEE： 554-561

Kulkarni T D， Whitney W F， Kohli P and Tenenbaum J B. 2015. Deep convolutional inverse graphics network//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal， Canada： MIT Press： 2539-2547

Kumar A， Sattigeri P and Balakrishnan A. 2018. Variational inference of disentangled latent concepts from unlabeled observations ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1711.00848.pdfhttps://arxiv.org/pdf/1711.00848.pdf

Lai C S， You Z Z， Huang C C， Tsai Y H and Chiu W C. 2020. Colorization of depth map via disentanglement//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 450-466 ［DOI： 10.1007/978-3-030-58571-6_27http://dx.doi.org/10.1007/978-3-030-58571-6_27］

LeCun Y， Bottou L， Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE， 86（11）： 2278-2324 ［DOI： 10.1109/5.726791http://dx.doi.org/10.1109/5.726791］

LeCun Y， Huang F J and Bottou L. 2004. Learning methods for generic object recognition with invariance to pose and lighting//Proceedings of 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Washington， USA： IEEE： 97-104 ［DOI： 10.1109/CVPR.2004.144http://dx.doi.org/10.1109/CVPR.2004.144］

Lee H Y， Tseng H Y， Huang J B， Singh M and Yang M H. 2018. Diverse image-to-image translation via disentangled representations//Proceedings of the 15th European Conference on Computer Vision （ECCV）. Munich， Germany： Springer： 36-52 ［DOI： 10.1007/978-3-030-01246-5_3http://dx.doi.org/10.1007/978-3-030-01246-5_3］

Lee W， Kim D， Hong S and Lee H. 2020. High-fidelity synthesis with disentangled representation//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 157-174 ［DOI： 10.1007/978-3-030-58574-7_10http://dx.doi.org/10.1007/978-3-030-58574-7_10］

Li P P， Huang H B， Hu Y B， Wu X， He R and Sun Z N. 2020a. Hierarchical face aging through disentangled latent characteristics//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 86-101 ［DOI： 10.1007/978-3-030-58580-8_6http://dx.doi.org/10.1007/978-3-030-58580-8_6］

Li P P， Liu Y L， Shi H L， Wu X， Hu Y B， He R and Sun Z N. 2020b. Dual-structure disentangling variational generation for data-limited face parsing//Proceedings of the 28th ACM International Conference on Multimedia. Seattle， USA： ACM： 556-564 ［DOI： 10.1145/3394171.3413919http://dx.doi.org/10.1145/3394171.3413919］

Li S， Hooi B and Lee G H. 2020c. Identifying through flows for recovering latent representations ［EB/OL］.［2022-01-21］. https://arxiv.org/pdf/1909.12555.pdfhttps://arxiv.org/pdf/1909.12555.pdf

Li W， Zhao R， Xiao T and Wang X. 2014. DeepReID： deep filter pairing neural network for person re-identification//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus， USA： IEEE： 152-159

Li X， Jin X， Lin J X， Liu S， Wu Y J， Yu T， Zhou W and Chen Z B. 2020d. Learning disentangled feature representation for hybrid-distorted image restoration//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 313-329 ［DOI： 10.1007/978-3-030-58526-6_19http://dx.doi.org/10.1007/978-3-030-58526-6_19］

Li X， Makihara Y， Xu C， Yagi Y and Ren M W. 2020e. Gait recognition via semi-supervised disentangled representation learning to identity and covariate features//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 13306-13316 ［DOI： 10.1109/CVPR42600.2020.01332http://dx.doi.org/10.1109/CVPR42600.2020.01332］

Li Y H， Singh K K， Ojha U and Lee Y J. 2020f. MixNMatch： multifactor disentanglement and encoding for conditional image generation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 8036-8045 ［DOI： 10.1109/CVPR42600.2020.00806http://dx.doi.org/10.1109/CVPR42600.2020.00806］

Li Y Z and Mandt S. 2018. Disentangled sequential autoencoder//Proceedings of the 35th International Conference on Machine Learning. Stockholm， Sweden： PMLR： 5670-5679

Li Z Y， Murkute J V， Gyawali P K and Wang L W. 2020g. Progressive learning and disentanglement of hierarchical representations ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/2002.10549.pdfhttps://arxiv.org/pdf/2002.10549.pdf

Liao L， Hu R M， Xiao J and Wang Z Y. 2019. Artist-Net： decorating the inferred content with unified style for image inpainting. IEEE Access， 7： 36921-36933 ［DOI： 10.1109/ACCESS.2019.2905268http://dx.doi.org/10.1109/ACCESS.2019.2905268］

Liu A H， Liu Y C， Yeh Y Y and Wang Y C F. 2018a. A unified feature disentangler for multi-domain image translation and manipulation//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 2595-2604

Liu F， Zhu R H， Zeng D， Zhao Q J and Liu X M. 2018b. Disentangling features in 3D face shapes for joint face reconstruction and recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 5216-5225 ［DOI： 10.1109/CVPR.2018.00547http://dx.doi.org/10.1109/CVPR.2018.00547］

Liu Y， Wang Z W， Jin H L and Wassell I. 2018c. Multi-task adversarial network for disentangled feature learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 3743-3751 ［DOI： 10.1109/CVPR.2018.00394http://dx.doi.org/10.1109/CVPR.2018.00394］

Liu Y， Wei F Y， Shao J， Sheng L， Yan J J and Wang X G. 2018e. Exploring disentangled feature representation beyond face identification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 2080-2089 ［DOI： 10.1109/CVPR.2018.00222http://dx.doi.org/10.1109/CVPR.2018.00222］

Liu Y C， Yeh Y Y， Fu T C， Wang S D， Chiu W C and Wang Y C F. 2018d. Detach and adapt： learning cross-domain disentangled deep representation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 8867-8876 ［DOI： 10.1109/CVPR.2018.00924http://dx.doi.org/10.1109/CVPR.2018.00924］

Liu Z W， Luo P， Wang X G and Tang X O. 2015. Deep learning face attributes in the wild//Proceedings of 2015 IEEE International Conference on Computer Vision （ICCV）. Santiago， Chile： IEEE： 3730-3738 ［DOI： 10.1109/ICCV.2015.425］

Liu Z Y， Zhang H W， Chen Z H， Wang Z Y and Ouyang W L. 2020. Disentangling and unifying graph convolutions for skeleton-based action recognition//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 140-149 ［DOI： 10.1109/CVPR42600.2020.00022］

Locatello F， Abbati G， Rainforth T， Bauer S， Schölkopf B and Bachem O. 2019a. On the fairness of disentangled representations//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 14611-14624

Locatello F， Bauer S， Lucic M， Raetsch G， Gelly S， Schölkopf B and Bachem O. 2019b. Challenging common assumptions in the unsupervised learning of disentangled representations//Proceedings of the 36th International Conference on Machine Learning. Long Beach， USA： Curran Associates， Inc.： 7247-7283

Lorenz D， Bereska L， Milbich T and Ommer B. 2019. Unsupervised part-based disentangling of object shape and appearance//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 10947-10956 ［DOI： 10.1109/CVPR.2019.01121］

Lu B Y， Chen J C and Chellappa R. 2019. Unsupervised domain-specific deblurring via disentangled representations//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 10217-10226 ［DOI： 10.1109/CVPR.2019.01047］

Ma J X， Zhou C， Cui P， Yang H X and Zhu W W. 2019. Learning disentangled representations for recommendation//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 5711-5722

Ma L Q， Sun Q R， Georgoulis S， Van Gool L， Schiele B and Fritz M. 2018. Disentangled person image generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 99-108 ［DOI： 10.1109/CVPR.2018.00018］

Massagué A C， Zhang C， Feric Z， Camps O and Yu R. 2020. Learning disentangled representations of video with missing data//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 3625-3635

Matthey L， Higgins I， Hassabis D and Lerchner A. 2017. dSprites： disentanglement testing sprites dataset ［EB/OL］. ［2022-01-21］. https://github.com/deepmind/dsprites-dataset/

Miyato T， Kataoka T， Koyama M and Yoshida Y. 2018. Spectral normalization for generative adversarial networks//Proceedings of the 35th International Conference on Machine Learning. Stockholm， Sweden： PMLR

Netzer Y， Wang T， Coates A， Bissacco A， Wu B and Ng A Y. 2011. Reading digits in natural images with unsupervised feature learning//NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011

Nie Q， Liu Z W and Liu Y H. 2020a. Unsupervised 3D human pose representation with viewpoint and pose disentanglement//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 102-118 ［DOI： 10.1007/978-3-030-58529-7_7］

Nie W L， Karras T， Garg A， Debnath S， Patney A， Patel A B and Anandkumar A. 2020b. Semi-supervised StyleGAN for disentanglement learning//Proceedings of the 37th International Conference on Machine Learning. Virtual： JMLR.org： 7360-7369

Niu X S， Yu Z T， Han H， Li X B， Shan S G and Zhao G Y. 2020. Video-based remote physiological measurement via cross-verified feature disentangling//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 295-310 ［DOI： 10.1007/978-3-030-58536-5_18］

Ojha U， Singh K K， Hsieh C J and Lee Y J. 2020. Elastic-InfoGAN： unsupervised disentangled representation learning in class-imbalanced data//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 18063-18075

Ouyang C， Biffi C， Chen C， Kart T， Qiu H Q and Rueckert D. 2022. Self-supervised learning for few-shot medical image segmentation. IEEE Transactions on Medical Imaging， 41（7）： 1837-1848 ［DOI： 10.1109/TMI.2022.3150682］

Painter M， Hare J and Prugel-Bennett A. 2020. Linear disentangled representations and unsupervised action estimation//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates Inc.： 13297-13307

Paysan P， Knothe R， Amberg B， Romdhani S and Vetter T. 2009. A 3D face model for pose and illumination invariant face recognition//Proceedings of the 6th IEEE International Conference on Advanced Video and Signal based Surveillance. Genova， Italy： IEEE： 296-301 ［DOI： 10.1109/AVSS.2009.58］

Peebles W， Peebles J， Zhu J Y， Efros A and Torralba A. 2020. The hessian penalty： a weak prior for unsupervised disentanglement//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 581-597 ［DOI： 10.1007/978-3-030-58539-6_35］

Pei C H， Wu F P， Huang L Q and Zhuang X H. 2021. Disentangle domain features for cross-modality cardiac image segmentation. Medical Image Analysis， 71： #102078 ［DOI： 10.1016/j.media.2021.102078］

Peng X， Yu X， Sohn K， Metaxas D N and Chandraker M. 2017. Reconstruction-based disentanglement for pose-invariant face recognition//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice， Italy： IEEE： 1632-1641 ［DOI： 10.1109/ICCV.2017.180］

Peng X C， Huang Z J， Sun X M and Saenko K. 2019. Domain agnostic learning with disentangled representations//Proceedings of the 36th International Conference on Machine Learning. Long Beach， USA： PMLR： 5102-5112

Pu N， Chen W， Liu Y， Bakker E M and Lew M S. 2020. Dual Gaussian-based variational subspace disentanglement for visible-infrared person re-identification//Proceedings of the 28th ACM International Conference on Multimedia. Seattle， USA： ACM： 2149-2158 ［DOI： 10.1145/3394171.3413673］

Reed S， Sohn K， Zhang Y T and Lee H. 2014. Learning to disentangle factors of variation with manifold interaction//Proceedings of the 31st International Conference on Machine Learning. Beijing， China： JMLR.org： 1431-1439

Rezende D J， Mohamed S and Wierstra D. 2014. Stochastic backpropagation and approximate inference in deep generative models//Proceedings of the 31st International Conference on Machine Learning. Beijing， China： JMLR.org： 1278-1286

Rezende D J and Mohamed S. 2015. Variational inference with normalizing flows//Proceedings of the 32nd International Conference on Machine Learning. Lille， France： JMLR.org： 1530-1538

Ridgeway K and Mozer M C. 2018. Learning deep disentangled embeddings with the f-statistic loss//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 185-194

Roberson P L， McLaughlin P W， Narayana V， Troyer S， Hixson G V and Kessler M L. 2005. Use and uncertainties of mutual information for computed tomography/magnetic resonance （CT/MR） registration post permanent implant of the prostate. Medical Physics， 32（2）： 473-482

Ruan D L， Yan Y， Chen S， Xue J H and Wang H Z. 2020. Deep disturbance-disentangled learning for facial expression recognition//Proceedings of the 28th ACM International Conference on Multimedia. Seattle， USA： ACM： 2833-2841 ［DOI： 10.1145/3394171.3413907］

Sanchez E H， Serrurier M and Ortner M. 2020. Learning disentangled representations via mutual information estimation//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 205-221 ［DOI： 10.1007/978-3-030-58542-6_13］

Sangkloy P， Burnell N， Ham C and Hays J. 2016. The sketchy database： learning to retrieve badly drawn bunnies. ACM Transactions on Graphics， 35（4）： 1-12 ［DOI： 10.1145/2897824.2925954］

Schuldt C， Laptev I and Caputo B. 2004. Recognizing human actions： a local SVM approach//Proceedings of the 17th International Conference on Pattern Recognition. Cambridge， UK： IEEE： 32-36 ［DOI： 10.1109/ICPR.2004.1334462］

Shen Z Q， Huang M Y， Shi J P， Xue X Y and Huang T S. 2019. Towards instance-level image-to-image translation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 3683-3692

Singh K K， Ojha U and Lee Y J. 2019. FineGAN： unsupervised hierarchical disentanglement for fine-grained object generation and discovery//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 6483-6492 ［DOI： 10.1109/CVPR.2019.00665］

Sønderby C K， Raiko T， Maaløe L， Sønderby S K and Winther O. 2016. Ladder variational autoencoders//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona， Spain： Curran Associates Inc.： 3745-3753

Soomro K， Zamir A R and Shah M. 2012. UCF101： a dataset of 101 human actions classes from videos in the wild ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1212.0402.pdf

Sorrenson P， Rother C and Köthe U. 2020. Disentanglement by nonlinear ICA with general incompressible-flow networks （GIN） ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/2001.04872.pdf

Srivastava N， Mansimov E and Salakhutdinov R. 2015. Unsupervised learning of video representations using LSTMs//Proceedings of the 32nd International Conference on Machine Learning. Lille， France： JMLR.org： 843-852

Sun H L， Mehta R， Zhou H， Huang Z C， Johnson S， Prabhakaran V and Singh V. 2019a. DUAL-GLOW： conditional flow-based generative model for modality transfer//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 10610-10619 ［DOI： 10.1109/ICCV.2019.01071］

Sun Y， Ye Y， Liu W， Gao W P， Fu Y L and Mei T. 2019b. Human mesh recovery from monocular images via a skeleton-disentangled representation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 5348-5357 ［DOI： 10.1109/ICCV.2019.00545］

Tong B， Wang C， Klinkigt M， Kobayashi Y and Nonaka Y. 2019. Hierarchical disentanglement of discriminative latent features for zero-shot learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 11459-11468 ［DOI： 10.1109/CVPR.2019.01173］

Tran L， Yin X and Liu X M. 2017. Disentangled representation learning GAN for pose-invariant face recognition//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu， USA： IEEE： 1283-1292 ［DOI： 10.1109/CVPR.2017.141］

Tsai Y H H， Liang P P， Zadeh A， Morency L P and Salakhutdinov R. 2019. Learning factorized multimodal representations ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1806.06176.pdf

Tulyakov S， Liu M Y， Yang X D and Kautz J. 2018. MoCoGAN： decomposing motion and content for video generation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 1526-1535 ［DOI： 10.1109/CVPR.2018.00165］

Van Steenkiste S， Locatello F， Schmidhuber J and Bachem O. 2019. Are disentangled representations helpful for abstract visual reasoning？//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 14245-14258

Wah C， Branson S， Welinder P， Perona P and Belongie S. 2011. The Caltech-UCSD Birds-200-2011 Dataset. California Institute of Technology

Wang G Q， Han H， Shan S G and Chen X L. 2020a. Cross-domain face presentation attack detection via multi-domain disentangled representation learning//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 6677-6686 ［DOI： 10.1109/CVPR42600.2020.00671］

Wang H， Deng C， Liu T and Tao D. 2021. Transferable coupled network for zero-shot sketch-based image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence， 44（12）： 9181-9194 ［DOI： 10.1109/TPAMI.2021.3123315］

Wang W J， Shi Y F， Chen S M， Peng Q M， Zheng F and You X G. 2021. Norm-guided adaptive visual embedding for zero-shot sketch-based image retrieval//Proceedings of the 30th International Joint Conference on Artificial Intelligence. 2021：1106-1112 ［DOI： 10.24963/ijcai.2021/153］

Wang Y H， Bilinski P， Bremond F and Dantcheva A. 2020b. G3AN： disentangling appearance and motion for video generation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 5263-5272 ［DOI： 10.1109/CVPR42600.2020.00531］

Wei L， Zhang S， Gao W and Tian Q. 2018. Person transfer GAN to bridge domain gap for person re-identification//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 79-88

Wu R L and Lu S J. 2020. LEED： label-free expression editing via disentanglement//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 781-798 ［DOI： 10.1007/978-3-030-58610-2_46］

Wu S， Deng G C， Li J C， Li R， Yu Z W and Wong H S. 2019. Enhancing TripleGAN for semi-supervised conditional instance synthesis and classification//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 10091-10100

Xiao F Y， Liu H T and Lee Y J. 2019a. Identity from here， pose from there： self-supervised disentanglement and generation of objects using unlabeled videos//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul， Korea （South）： IEEE： 7012-7021 ［DOI： 10.1109/ICCV.2019.00711］

Xiao H， Rasul K and Vollgraf R. 2017. Fashion-MNIST： a novel image dataset for benchmarking machine learning algorithms ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1708.07747.pdf

Xiao J， Liao L， Liu Q G and Hu R M. 2019b. CISI-net： explicit latent content inference and imitated style rendering for image inpainting. Proceedings of 2019 AAAI Conference on Artificial Intelligence， 33（1）： 354-362 ［DOI： 10.1609/aaai.v33i01.3301354］

Xu X X， Yang M L， Yang Y H and Wang H. 2021. Progressive domain-independent feature decomposition network for zero-shot sketch-based image retrieval//Proceedings of the 29th International Joint Conference on Artificial Intelligence. Yokohama， Japan： IJCAI.org： 984-990

Xuan S Y and Zhang S L. 2021. Intra-inter camera similarity for unsupervised person re-identification//Proceedings of 2021 IEEE Conference on Computer Vision and Pattern Recognition. IEEE： 11926-11935

Yang J L， Dvornek N C， Zhang F， Chapiro J， Lin M D and Duncan J S. 2019. Unsupervised domain adaptation via disentangled representations： application to cross-modality liver segmentation//Proceedings of the 22nd International Conference on Medical Image Computing and Computer Assisted Intervention. Shenzhen， China： Springer： 255-263 ［DOI： 10.1007/978-3-030-32245-8_29］

Yang J M， Reed S， Yang M H and Lee H. 2015. Weakly-supervised disentangling with recurrent transformations for 3D view synthesis//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montreal， Canada： MIT Press： 1099-1107

Yang L L and Yao A. 2019. Disentangling latent hands for image synthesis and pose estimation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 9869-9878 ［DOI： 10.1109/CVPR.2019.01011］

Yin G J， Liu B， Sheng L， Yu N H， Wang X G and Shao J. 2019. Semantics disentangling for text-to-image generation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 2322-2331 ［DOI： 10.1109/CVPR.2019.00243］

Yu X M， Chen Y Q， Li T， Liu S and Li G. 2019. Multi-mapping image-to-image translation via learning disentanglement//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver， Canada： Curran Associates， Inc.： 2994-3004

Yu X M， Ying Z Q， Li T， Liu S and Li G. 2018. Multi-mapping image-to-image translation with central biasing normalization ［EB/OL］. ［2022-01-21］. https://arxiv.org/pdf/1806.10050.pdf

Zang X H， Li G， Gao W and Shu X J. 2021. Learning to disentangle scenes for person re-identification. Image and Vision Computing， 116： #104330 ［DOI： 10.1016/j.imavis.2021.104330］

Zhang J F， Huang Y Y， Li Y Y， Zhao W J and Zhang L Q. 2019a. Multi-attribute transfer via disentangled representation. Proceedings of 2019 AAAI Conference on Artificial Intelligence， 33（1）： 9195-9202 ［DOI： 10.1609/aaai.v33i01.33019195］

Zhang K Y， Yao T P， Zhang J， Tai Y， Ding S H， Li J L， Huang F Y， Song H C and Ma L Z. 2020. Face anti-spoofing via disentangled representation learning//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 641-657 ［DOI： 10.1007/978-3-030-58529-7_38］

Zhang Z Y， Tran L， Yin X， Atoum Y， Liu X M， Wan J and Wang N X. 2019b. Gait recognition via disentangled representation learning//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 4705-4714 ［DOI： 10.1109/CVPR.2019.00484］

Zhao J， Cheng Y， Cheng Y， Yang Y， Zhao F， Li J S， Liu H Z， Yan S C and Feng J S. 2019. Look across elapse： disentangled representation learning and photorealistic cross-age face synthesis for age-invariant face recognition. Proceedings of 2019 AAAI Conference on Artificial Intelligence， 33（1）： 9251-9258 ［DOI： 10.1609/aaai.v33i01.33019251］

Zhao S J， Song J M and Ermon S. 2017. Learning hierarchical features from deep generative models//Proceedings of the 34th International Conference on Machine Learning. Sydney， Australia： JMLR.org： 4091-4099

Zhao Y， Xiong Y J and Lin D H. 2018. Recognize actions by disentangling components of dynamics//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 6566-6575 ［DOI： 10.1109/CVPR.2018.00687］

Zheng L， Shen L， Tian L， Wang S， Wang J and Tian Q. 2015. Scalable person re-identification： a benchmark//Proceedings of 2015 IEEE International Conference on Computer Vision （ICCV）. Santiago， Chile： IEEE： 1116-1124

Zheng Z， Zheng L and Yang Y. 2017. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice， Italy： IEEE： 3754-3762

Zheng Z， Yang X， Yu Z， Zheng L， Yang Y and Kautz J. 2019. Joint discriminative and generative learning for person re-identification//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach， USA： IEEE： 2138-2147

Zhu J Y， Park T， Isola P and Efros A A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice， Italy： IEEE： 2242-2251 ［DOI： 10.1109/ICCV.2017.244］

Zhu J Y， Zhang Z T， Zhang C K， Wu J J， Torralba A， Tenenbaum J B and Freeman W T. 2018. Visual object networks： image generation with disentangled 3D representation//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal， Canada： Curran Associates Inc.： 118-129

Zhu X Q， Xu C and Tao D C. 2020a. Learning disentangled representations with latent variation predictability//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 684-700 ［DOI： 10.1007/978-3-030-58607-2_40］

Zhu Y Z， Min M R， Kadav A and Graf H P. 2020b. S3VAE： self-supervised sequential VAE for representation disentanglement and data generation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle， USA： IEEE： 6537-6546 ［DOI： 10.1109/CVPR42600.2020.00657］

Zhu Z Y， Luo P， Wang X G and Tang X O. 2014. Multi-view perceptron： a deep model for learning face identity and view representations//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal， Canada： MIT Press： 217-225

Zou Y， Yang X D， Yu Z D， Vijaya Kumar B V K and Kautz J. 2020. Joint disentangling and adaptation for cross-domain person re-identification//Proceedings of the 16th European Conference on Computer Vision. Glasgow， UK： Springer： 87-104 ［DOI： 10.1007/978-3-030-58536-5_6］

Zwicker M， Hu Q Y， Szabó A， Portenier T and Favaro P. 2018. Disentangling factors of variation by mixing them//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City， USA： IEEE： 3399-3407 ［DOI： 10.1109/CVPR.2018.00358］
