research-article

SketchParse: Towards Rich Descriptions for Poorly Drawn Sketches using Multi-Task Hierarchical Deep Networks

Authors: Ravi Kiran Sarvadevabhatla, Abhijat Biswas, Venkatesh Babu R.

MM '17: Proceedings of the 25th ACM international conference on Multimedia

Pages 10-18

https://doi.org/10.1145/3123266.3123270

Published: 19 October 2017

Abstract

The ability to semantically interpret hand-drawn line sketches, although very challenging, can pave the way for novel applications in multimedia. We propose SketchParse, the first deep-network architecture for fully automatic parsing of freehand object sketches. SketchParse is configured as a two-level fully convolutional network. The first level contains shared layers common to all object categories. The second level contains a number of expert sub-networks, each specializing in parsing sketches from object categories that contain structurally similar parts. In effect, this two-level configuration lets the architecture scale efficiently as additional categories are added. We introduce a router layer which (i) relays sketch features from the shared layers to the correct expert and (ii) eliminates the need to manually specify the object category during inference. To bypass laborious part-level annotation, we sketchify photos from semantic object-part image datasets and use them for training. Our architecture also incorporates object pose prediction as a novel auxiliary task, which boosts overall performance while providing supplementary information about the sketch. We demonstrate SketchParse's abilities (i) on two challenging large-scale sketch datasets, (ii) in parsing unseen, semantically related object categories, and (iii) in improving fine-grained sketch-based image retrieval. As a novel application, we also outline how SketchParse's output can be used to generate caption-style descriptions for hand-drawn sketches.
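The shared-layers, router and expert arrangement, together with the auxiliary pose task, can be illustrated with a toy forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: each "layer" is a single random 1x1 convolution standing in for the paper's deeper fully convolutional layers, the router is assumed to be a linear classifier over globally pooled shared features whose argmax selects one expert, and the pose head and the loss weight `lam` are hypothetical. The names `TwoLevelParser`, `add_expert` and `multitask_loss` are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in); returns (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def log_softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


class TwoLevelParser:
    """Hypothetical toy model: shared trunk -> router -> one expert head per category group."""

    def __init__(self, in_ch, feat_ch, parts_per_expert, n_poses):
        self.shared = rng.normal(scale=0.1, size=(feat_ch, in_ch))  # level 1: shared by all categories
        self.experts = [rng.normal(scale=0.1, size=(k, feat_ch)) for k in parts_per_expert]  # level 2
        self.router = rng.normal(scale=0.1, size=(len(parts_per_expert), feat_ch))  # assumed linear router
        self.pose = rng.normal(scale=0.1, size=(n_poses, feat_ch))  # auxiliary pose classifier

    def add_expert(self, n_parts):
        # A new group of structurally similar categories adds one expert head and one
        # router row; the shared trunk is reused unchanged.
        feat_ch = self.shared.shape[0]
        self.experts.append(rng.normal(scale=0.1, size=(n_parts, feat_ch)))
        self.router = np.vstack([self.router, rng.normal(scale=0.1, size=(1, feat_ch))])

    def forward(self, sketch):
        feats = np.maximum(conv1x1(sketch, self.shared), 0.0)  # shared features with ReLU
        pooled = feats.mean(axis=(1, 2))  # global average pooling -> (feat_ch,)
        expert_id = int(np.argmax(self.router @ pooled))  # router picks the expert; no category input
        part_logits = conv1x1(feats, self.experts[expert_id])  # per-pixel part scores
        pose_logits = self.pose @ pooled  # auxiliary pose scores
        return expert_id, part_logits, pose_logits


def multitask_loss(part_logits, part_labels, pose_logits, pose_label, lam=0.5):
    """Mean per-pixel cross-entropy for parts plus lam times cross-entropy for pose."""
    h, w = part_labels.shape
    logp = log_softmax(part_logits, axis=0)
    seg = -logp[part_labels, np.arange(h)[:, None], np.arange(w)[None, :]].mean()
    pose = -log_softmax(pose_logits, axis=0)[pose_label]
    return seg + lam * pose


model = TwoLevelParser(in_ch=1, feat_ch=16, parts_per_expert=[5, 4], n_poses=8)
sketch = (rng.random((1, 32, 32)) > 0.9).astype(float)  # sparse binary "strokes"
expert_id, part_logits, pose_logits = model.forward(sketch)
labels = rng.integers(0, part_logits.shape[0], size=(32, 32))
loss = multitask_loss(part_logits, labels, pose_logits, pose_label=3)
print(expert_id, part_logits.shape, pose_logits.shape, float(loss))

model.add_expert(6)  # a hypothetical new category group with 6 parts
print(len(model.experts), model.router.shape)
```

In this toy version, adding a category group only appends one expert head and one router row while the shared weights stay untouched, which is one way to read the abstract's claim that the two-level design scales as categories are added.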

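The caption-style descriptions mentioned at the end of the abstract can be pictured as filling a sentence template from the parser's outputs. The following is a hypothetical sketch only: it assumes a part-label map in which 0 is background and label i corresponds to `part_names[i - 1]`, a made-up pose vocabulary `POSES`, and a pixel-fraction threshold `min_frac` for deciding that a part is present. None of these names or rules come from the paper.

```python
import numpy as np

# Hypothetical pose vocabulary; the paper's actual pose classes are not assumed here.
POSES = ["facing left", "facing right", "facing the viewer", "facing away"]


def join_words(words):
    """Join a list of words as 'a', 'a and b', or 'a, b and c'."""
    if len(words) <= 1:
        return "".join(words)
    return ", ".join(words[:-1]) + " and " + words[-1]


def describe(category, part_names, label_map, pose_id, min_frac=0.01):
    """Template caption from a part-label map (0 = background, i = part_names[i - 1])."""
    total = label_map.size
    present = [
        name
        for idx, name in enumerate(part_names, start=1)
        if np.count_nonzero(label_map == idx) / total >= min_frac
    ]
    article = "an" if category[0].lower() in "aeiou" else "a"
    caption = f"A sketch of {article} {category} {POSES[pose_id]}"
    if present:
        caption += f", showing its {join_words(present)}"
    return caption + "."


label_map = np.zeros((10, 10), dtype=int)
label_map[0:3, 0:3] = 1  # "head" region
label_map[3:8, 2:9] = 2  # "torso" region; no "legs" pixels
print(describe("horse", ["head", "torso", "legs"], label_map, pose_id=0))
# → A sketch of a horse facing left, showing its head and torso.
```

Spatial relations between parts (for example, derived from part centroids) could enrich such a template; the pixel-fraction rule here is only the simplest stand-in.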
Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Image segmentation
      2. Computer vision representations
        Shape representations
  2. Machine learning
    1. Learning paradigms
      1. Multi-task learning
        Transfer learning

Published In

MM '17: Proceedings of the 25th ACM international conference on Multimedia

October 2017

2028 pages

ISBN: 9781450349062

DOI: 10.1145/3123266

General Chairs:
Qiong Liu (FXPAL, USA), Rainer Lienhart (Universität Augsburg, Germany), Haohong Wang (TCL America, USA)

Program Chairs:
Sheng-Wei "Kuan-Ta" Chen (Academia Sinica, Taiwan), Susanne Boll (University of Oldenburg, Germany), Phoebe Chen (La Trobe University, Australia), Gerald Friedland (Lawrence Livermore National Lab, USA), Jia Li (Google, USA), Shuicheng Yan (Qihoo 360, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MM '17

Sponsor:

SIGMM

MM '17: ACM Multimedia Conference

October 23-27, 2017

Mountain View, California, USA

Acceptance Rates

MM '17 Paper Acceptance Rate 189 of 684 submissions, 28%;

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Bibliometrics

Article Metrics

Total Citations: 27
Total Downloads: 373
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 16 Oct 2024