Deep learning for detecting robotic grasps

Published: 01 April 2015

Abstract

We consider the problem of detecting robotic grasps in an RGB-D view of a scene containing objects. In this work, we apply a deep learning approach to this problem, which avoids the time-consuming hand-design of features. This presents two main challenges. First, we need to evaluate a huge number of candidate grasps. To make detection fast and robust, we present a two-step cascaded system with two deep networks, where the top detections from the first are re-evaluated by the second. The first network has fewer features, is faster to run, and can effectively prune out unlikely candidate grasps. The second, with more features, is slower but has to run only on the top few detections. Second, we need to handle multimodal inputs effectively, for which we present a structured regularization method for the network weights based on multimodal group regularization. We show that our method improves performance on an RGB-D robotic grasping dataset, and can be used to successfully execute grasps on two different robotic platforms.
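
To make the cascade concrete, here is a minimal sketch of its control flow: a small, fast scorer evaluates every candidate, and only the top-K survivors are re-scored by a larger, slower model. This is illustrative, not the paper's implementation: the feature dimension, network shapes, and names such as score_small, score_large, and TOP_K are assumptions, and trained networks are stood in for by random-weight scorers.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 128  # per-candidate feature dimension (illustrative)

# Stand-ins for the two trained networks: random weights here, since this
# sketch only illustrates the control flow of the cascade.
w_small = rng.standard_normal(D)          # small, fast first stage
w1_large = rng.standard_normal((D, 64))   # larger, slower second stage
w2_large = rng.standard_normal(64)

def score_small(x):
    # Cheap linear scorer standing in for the small first network.
    return x @ w_small

def score_large(x):
    # One ReLU hidden layer standing in for the larger second network.
    return np.maximum(x @ w1_large, 0.0) @ w2_large

# Exhaustively enumerated candidate grasps, faked as random feature vectors.
candidates = rng.standard_normal((10_000, D))

# Stage 1: score every candidate cheaply, keep only the top-K detections.
TOP_K = 100
keep = np.argsort(score_small(candidates))[-TOP_K:]

# Stage 2: re-evaluate only the survivors with the expensive network.
best = candidates[keep[np.argmax(score_large(candidates[keep]))]]
```

The expensive network thus runs on 100 candidates rather than 10,000, which is what makes exhaustive candidate enumeration tractable.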
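Similarly, a minimal sketch of a multimodal group penalty on first-layer weights, assuming the rows of a weight matrix W are partitioned by input modality (e.g. depth versus RGB channels). The L2-per-group form and the names multimodal_group_penalty and modality_slices are illustrative assumptions; the paper's exact group norm may differ.

```python
import numpy as np

def multimodal_group_penalty(W, modality_slices):
    # Sum, over modalities and hidden units, of the L2 norm of each
    # (modality, hidden-unit) weight group; penalizing this sum encourages
    # each hidden unit to draw on only a few modalities.
    penalty = 0.0
    for sl in modality_slices:  # rows of W belonging to one modality
        penalty += np.linalg.norm(W[sl], axis=0).sum()
    return penalty

# Usage: two modalities occupying input dimensions [0, 64) and [64, 128).
rng = np.random.default_rng(0)
W = rng.standard_normal((128, 32))  # (input dims, hidden units)
reg = multimodal_group_penalty(W, [slice(0, 64), slice(64, 128)])
# Training would minimize: task_loss + lam * reg, with lam a hyperparameter.
```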

Published In

International Journal of Robotics Research, Volume 34, Issue 4-5
Apr 2015
326 pages

Publisher

Sage Publications, Inc.

United States

Publication History

Published: 01 April 2015

Author Tags

  1. Robotic grasping
  2. Deep learning
  3. RGB-D multi-modal data
  4. Baxter
  5. PR2
  6. 3D feature learning

Qualifiers

  • Research-article
