Deep learning for detecting robotic grasps

Published: 01 April 2015

Abstract

We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to this problem, which avoids the time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. To make detection fast and robust, we present a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs effectively, for which we present a structured regularization method for the network weights based on multimodal group regularization. We show that our method improves performance on an RGB-D robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.
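
To make the cascade concrete, here is a minimal sketch of its control flow: a small, fast scorer evaluates every candidate, and only the top-K survivors are re-scored by a larger, slower model. This is illustrative, not the paper's implementation: the feature dimension, network shapes, and names such as score_small, score_large, and TOP_K are assumptions, and trained networks are stood in for by random-weight scorers.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128  # per-candidate feature dimension (illustrative)

# Stand-ins for the two trained networks: random weights here, since this
# sketch only illustrates the control flow of the cascade.
w_small = rng.standard_normal(D)          # small, fast first stage
w1_large = rng.standard_normal((D, 64))   # larger, slower second stage
w2_large = rng.standard_normal(64)

def score_small(x):
    # Cheap linear scorer standing in for the small first network.
    return x @ w_small

def score_large(x):
    # One ReLU hidden layer standing in for the larger second network.
    return np.maximum(x @ w1_large, 0.0) @ w2_large

# Exhaustively enumerated candidate grasps, faked as random feature vectors.
candidates = rng.standard_normal((10_000, D))

# Stage 1: score every candidate cheaply, keep only the top-K detections.
TOP_K = 100
keep = np.argsort(score_small(candidates))[-TOP_K:]

# Stage 2: re-evaluate only the survivors with the expensive network.
best = candidates[keep[np.argmax(score_large(candidates[keep]))]]
```

The expensive network thus runs on 100 candidates rather than 10,000, which is what makes exhaustive candidate enumeration tractable.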
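Similarly, a minimal sketch of a multimodal group penalty on first-layer weights, assuming the rows of a weight matrix W are partitioned by input modality (e.g. depth versus RGB channels). The L2-per-group form and the names multimodal_group_penalty and modality_slices are illustrative assumptions; the paper's exact group norm may differ.

```python
import numpy as np

def multimodal_group_penalty(W, modality_slices):
    # Sum, over modalities and hidden units, of the L2 norm of each
    # (modality, hidden-unit) weight group; penalizing this sum encourages
    # each hidden unit to draw on only a few modalities.
    penalty = 0.0
    for sl in modality_slices:  # rows of W belonging to one modality
        penalty += np.linalg.norm(W[sl], axis=0).sum()
    return penalty

# Usage: two modalities occupying input dimensions [0, 64) and [64, 128).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 32))  # (input dims, hidden units)
reg = multimodal_group_penalty(W, [slice(0, 64), slice(64, 128)])
# Training would minimize: task_loss + lam * reg, with lam a hyperparameter.
```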

Published In

International Journal of Robotics Research, Volume 34, Issue 4-5
Apr 2015
326 pages

Publisher

Sage Publications, Inc.

United States

Publication History

Published: 01 April 2015

Author Tags

  1. Robotic grasping
  2. Deep learning
  3. RGB-D multi-modal data
  4. Baxter
  5. PR2
  6. 3D feature learning

Qualifiers

  • Research-article
