DOI: 10.1145/3123266.3123270

SketchParse: Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks

Published: 19 October 2017

Abstract

The ability to semantically interpret hand-drawn line sketches, although very challenging, can pave the way for novel applications in multimedia. We propose SKETCHPARSE, the first deep-network architecture for fully automatic parsing of freehand object sketches. SKETCHPARSE is configured as a two-level fully convolutional network. The first level contains shared layers common to all object categories. The second level contains a number of expert sub-networks, each of which specializes in parsing sketches from object categories that contain structurally similar parts. This two-level configuration enables our architecture to scale up efficiently as additional categories are added. We introduce a router layer which (i) relays sketch features from the shared layers to the correct expert and (ii) eliminates the need to manually specify the object category during inference. To bypass laborious part-level annotation, we sketchify photos from semantic object-part image datasets and use them for training. Our architecture also incorporates object pose prediction as a novel auxiliary task, which boosts overall performance while providing supplementary information about the sketch. We demonstrate SKETCHPARSE's abilities (i) on two challenging large-scale sketch datasets, (ii) in parsing unseen, semantically related object categories, and (iii) in improving fine-grained sketch-based image retrieval. As a novel application, we also outline how SKETCHPARSE's output can be used to generate caption-style descriptions for hand-drawn sketches.
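
The control flow described in the abstract (shared layers, then a router that selects an expert, plus an auxiliary pose output) can be illustrated with a toy sketch. This is not the authors' implementation: every function below is a hypothetical stand-in written in plain Python, the routing rule and pose rule are arbitrary placeholders for learned layers, and the category groups, part names, and pose labels are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Features = List[float]


@dataclass
class Expert:
    """Hypothetical expert for one group of structurally similar categories."""
    name: str
    part_names: Sequence[str]

    def parse(self, features: Features) -> List[str]:
        # Toy stand-in for a part-segmentation head: bucket each feature
        # value in [0, 1] into one of the expert's part names.
        n = len(self.part_names)
        return [self.part_names[min(int(f * n), n - 1)] for f in features]


def shared_layers(sketch: Sequence[float]) -> Features:
    # Toy stand-in for the first level: scale magnitudes into [0, 1].
    peak = max(abs(v) for v in sketch) or 1.0
    return [abs(v) / peak for v in sketch]


def router(features: Features, experts: Sequence[Expert]) -> Expert:
    # Placeholder routing rule (assumption): choose by mean activation.
    # The point is only that no category label is supplied by the caller.
    mean = sum(features) / len(features)
    return experts[min(int(mean * len(experts)), len(experts) - 1)]


def pose_head(features: Features) -> str:
    # Placeholder auxiliary task (assumption): compare the two halves.
    half = len(features) // 2
    return "left" if sum(features[:half]) > sum(features[half:]) else "right"


def parse_sketch(sketch: Sequence[float], experts: Sequence[Expert]) -> Dict[str, object]:
    features = shared_layers(sketch)       # level 1: shared by all categories
    expert = router(features, experts)     # router relays features to one expert
    return {
        "expert": expert.name,
        "parts": expert.parse(features),   # level 2: expert-specific parsing
        "pose": pose_head(features),       # auxiliary output
    }


EXPERTS = [
    Expert("animals", ("head", "torso", "legs")),  # hypothetical group
    Expert("vehicles", ("body", "wheel")),         # hypothetical group
]

result = parse_sketch([0.0, 1.0, 2.0, 4.0], EXPERTS)
print(result)
```

In this toy setup, supporting a new group of categories means appending another `Expert` to the list while `shared_layers` stays unchanged, which mirrors the scaling argument made in the abstract.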



Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia
October 2017
2028 pages
ISBN: 9781450349062
DOI: 10.1145/3123266

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. multi-task learning
  3. object segmentation
  4. sketch
  5. transfer learning

Qualifiers

  • Research-article

Conference

MM '17: ACM Multimedia Conference
October 23 - 27, 2017
Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;
Overall Acceptance Rate 995 of 4,171 submissions, 24%

Cited By

  • (2024) CreativeSeg: Semantic Segmentation of Creative Sketches. IEEE Transactions on Image Processing 33, 2266-2278. DOI: 10.1109/TIP.2024.3374196. Online publication date: 2024
  • (2024) A Systematic Literature Review of Deep Learning Approaches for Sketch-Based Image Retrieval: Datasets, Metrics, and Future Directions. IEEE Access 12, 14847-14869. DOI: 10.1109/ACCESS.2024.3357939. Online publication date: 2024
  • (2024) Annotation-Free Human Sketch Quality Assessment. International Journal of Computer Vision 132:8, 2743-2764. DOI: 10.1007/s11263-024-02001-1. Online publication date: 17-Feb-2024
  • (2023) Deep learning for studying drawing behavior: A review. Frontiers in Psychology 14. DOI: 10.3389/fpsyg.2023.992541. Online publication date: 8-Feb-2023
  • (2023) Deep Learning for Free-Hand Sketch: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:1, 285-312. DOI: 10.1109/TPAMI.2022.3148853. Online publication date: 1-Jan-2023
  • (2023) Prediction With Visual Evidence: Sketch Classification Explanation via Stroke-Level Attributions. IEEE Transactions on Image Processing 32, 4393-4406. DOI: 10.1109/TIP.2023.3297404. Online publication date: 2023
  • (2023) A sketch semantic segmentation method using novel local feature aggregation and segment-level self-attention. Neural Computing and Applications 35:21, 15295-15313. DOI: 10.1007/s00521-023-08504-1. Online publication date: 8-Apr-2023
  • (2023) cGAN-Based Garment Line Draft Colorization Using a Garment-Line Dataset. Advances in Computer Graphics, 337-348. DOI: 10.1007/978-3-031-50072-5_27. Online publication date: 29-Dec-2023
  • (2022) Instance GNN: A Learning Framework for Joint Symbol Segmentation and Recognition in Online Handwritten Diagrams. IEEE Transactions on Multimedia 24, 2580-2594. DOI: 10.1109/TMM.2021.3087000. Online publication date: 2022
  • (2022) Exploring Local Detail Perception for Scene Sketch Semantic Segmentation. IEEE Transactions on Image Processing 31, 1447-1461. DOI: 10.1109/TIP.2022.3142511. Online publication date: 2022
