Pose-Aware Placement of Objects with Semantic Labels - Brandname-based Affordance Prediction and Cooperative Dual-Arm Active Manipulation
Pages 4760–4767
Abstract
The Amazon Picking Challenge and the Amazon Robotics Challenge have driven significant progress in picking objects from cluttered scenes, yet object placement remains challenging. Pose-aware placement that respects the human- and machine-readable labels on an object is useful in practice; for example, the brandname of an object placed on a shelf should face the human customers. The object placement task poses two robotic vision challenges: a) the semantics and the geometry of the object to be placed need to be analysed jointly, and b) occlusions among objects in a cluttered scene can hinder proper understanding and manipulation. To overcome these challenges, we develop a pose-aware placement approach that spots the semantic labels (e.g., brandnames) of objects in a cluttered tote and then carries out a sequence of actions to place the objects on a shelf or a conveyor with the desired poses. Our major contributions are 1) an open benchmark dataset of objects and brandnames with multi-view segmentation for training and evaluation; 2) a comprehensive evaluation of our brandname-based fully convolutional network (FCN), which predicts the affordance and grasp needed for pose-aware placement and whose success rates decrease as clutter increases; and 3) a demonstration that active manipulation with two cooperative manipulators and grippers can effectively handle occlusion of brandnames. We analyze the success rates and discuss the failure cases to provide insights for future applications. All data and benchmarks are available at https://text-pick-n-place.github.io/TextPNP/
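The pipeline summarized above (segment the brandname with an FCN, read the per-pixel affordance map, and pick the highest-scoring pixel as the grasp point) can be illustrated with a minimal sketch. The code below is not the authors' implementation; the model choice (torchvision's fcn_resnet50), the two-class setup, and the pick-the-best-pixel heuristic are assumptions made purely for illustration.

    # Minimal sketch of brandname-based affordance prediction (not the authors' code).
    # Assumptions: a 2-class FCN (background vs. brandname) and a simple
    # "grasp at the highest-scoring pixel" rule; weights here are untrained placeholders.
    import torch
    import torchvision

    NUM_CLASSES = 2        # assumed: background vs. brandname region
    BRANDNAME_CLASS = 1

    model = torchvision.models.segmentation.fcn_resnet50(num_classes=NUM_CLASSES)
    model.eval()

    def predict_grasp_pixel(rgb: torch.Tensor):
        """rgb: (3, H, W) float tensor in [0, 1]; returns (row, col) of the best grasp pixel."""
        with torch.no_grad():
            logits = model(rgb.unsqueeze(0))["out"]                  # (1, C, H, W)
            affordance = logits.softmax(dim=1)[0, BRANDNAME_CLASS]   # (H, W) affordance map
        flat_idx = int(affordance.argmax())                          # index of best-scoring pixel
        return divmod(flat_idx, affordance.shape[1])                 # (row, col)

    # Example: a dummy 480x640 frame
    row, col = predict_grasp_pixel(torch.rand(3, 480, 640))

In the paper's setting, the returned pixel would be back-projected with depth to a 3D grasp pose, and an occluded brandname would trigger the cooperative dual-arm re-orientation described in the abstract; those steps are beyond this sketch.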
Information
Published In
6597 pages
Copyright © 2019.
Publisher
IEEE Press
Publication History
Published: 01 November 2019
Qualifiers
- Research-article