
6-DoF grasp estimation method that fuses RGB-D data based on external attention

Published: 18 July 2024

Abstract

6-DoF grasp estimation from point clouds alone has long been a challenge in robotics: a single input modality limits the robot's perception of real-world scenes and therefore reduces robustness. In this work, we propose a 6-DoF grasp pose estimation method based on RGB-D data, which leverages ResNet to extract color-image features, uses the PointNet++ network to extract geometric features, and employs an external attention mechanism to fuse the two. Our method is designed end-to-end, and we validate its performance through benchmark tests on a large-scale dataset and evaluations in a simulated robot environment. It outperforms previous state-of-the-art methods on public datasets, achieving 47.75 mAP on seen objects and 40.08 mAP on unseen objects. We also test our grasp pose estimation method on multiple objects in a simulated robot environment, demonstrating higher grasp accuracy and robustness than previous methods.
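The abstract describes the fusion only at a high level: ResNet supplies color-image features, PointNet++ supplies per-point geometric features, and an external attention mechanism fuses them. As a minimal sketch of how such a fusion block might be wired up, the PyTorch snippet below concatenates per-point color and geometry features and refines the result with an external-attention layer in the two-linear-layer, double-normalization form of Guo et al.; the module names, feature widths, residual connection, and concatenation step are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExternalAttention(nn.Module):
    """External attention: two linear layers act as shared external
    memories M_k and M_v, with softmax + l1 double normalization."""

    def __init__(self, dim: int, mem_size: int = 64):
        super().__init__()
        self.mk = nn.Linear(dim, mem_size, bias=False)  # M_k
        self.mv = nn.Linear(mem_size, dim, bias=False)  # M_v

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim) per-point fused features
        attn = self.mk(x)                                      # (B, N, mem_size)
        attn = F.softmax(attn, dim=1)                          # normalize over the N points
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)   # l1-normalize over memory slots
        return self.mv(attn)                                   # (B, N, dim)


class RGBDFusion(nn.Module):
    """Hypothetical fusion head: concatenate per-point color features
    (e.g. sampled from a ResNet feature map) with PointNet++ geometry
    features, project to a common width, refine with external attention."""

    def __init__(self, rgb_dim: int = 256, geo_dim: int = 128, fused_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(rgb_dim + geo_dim, fused_dim)
        self.ea = ExternalAttention(fused_dim)

    def forward(self, rgb_feat: torch.Tensor, geo_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat: (B, N, rgb_dim), geo_feat: (B, N, geo_dim)
        fused = self.proj(torch.cat([rgb_feat, geo_feat], dim=-1))
        return fused + self.ea(fused)  # residual connection (assumed)


if __name__ == "__main__":
    fusion = RGBDFusion()
    out = fusion(torch.randn(2, 1024, 256), torch.randn(2, 1024, 128))
    print(out.shape)  # torch.Size([2, 1024, 256])
```

One appeal of external attention in this setting is that the two memory matrices are shared across all points, so the fusion step scales linearly with the number of points rather than quadratically as in standard self-attention.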



Published In

Journal of Visual Communication and Image Representation, Volume 101, Issue C
May 2024
313 pages

Publisher

Academic Press, Inc.

United States

Publication History

Published: 18 July 2024

Author Tags

  1. 6-DoF grasp
  2. External attention
  3. Data fusion
  4. Deep learning
  5. Pose estimation

Qualifiers

  • Research-article
