Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains

Kasaei, Hamidreza; Kasaei, Mohammadreza; Tziafas, Georgios; Luo, Sha; Sasso, Remo

doi:10.1007/s10846-024-02092-5

Regular paper
Open access
Published: 16 April 2024

Volume 110, article number 62 (2024)

Journal of Intelligent & Robotic Systems

Hamidreza Kasaei ORCID: orcid.org/0000-0001-9408-7730¹,
Mohammadreza Kasaei²,
Georgios Tziafas¹,
Sha Luo¹ &
Remo Sasso¹

Abstract

To aid humans in everyday tasks, robots need to know which objects exist in the scene, where they are, and how to grasp and manipulate them in different situations. Therefore, object recognition and grasping are two key functionalities for autonomous robots. Most state-of-the-art approaches treat object recognition and grasping as two separate problems, even though both use visual input. Furthermore, the knowledge of the robot is fixed after the training phase. In such cases, if the robot encounters new object categories, it must be retrained to incorporate new information without catastrophic forgetting. To resolve this problem, we propose a deep learning architecture with an augmented memory capacity to handle open-ended object recognition and grasping simultaneously. In particular, our approach takes multiple views of an object as input and jointly estimates a pixel-wise grasp configuration as well as a deep scale- and rotation-invariant representation as output. The obtained representation is then used for open-ended object recognition through a meta-active learning technique. We demonstrate the ability of our approach to grasp never-seen-before objects and to rapidly learn new object categories using very few examples on-site in both simulation and real-world settings. Our approach empowers a robot to acquire knowledge about new object categories using, on average, fewer than five instances per category and to achieve 95% object recognition accuracy and a grasp success rate above 91% in (highly) cluttered scenarios in both simulation and real-robot experiments. A video of these experiments is available online at: https://youtu.be/n9SMpuEkOgk

Acknowledgements

We thank the Center for Information Technology of the University of Groningen for their support and for providing access to the Peregrine high-performance computing cluster.

Author information

Authors and Affiliations

Department of Artificial Intelligence, University of Groningen, Groningen, The Netherlands
Hamidreza Kasaei, Georgios Tziafas, Sha Luo & Remo Sasso
School of Informatics, University of Edinburgh, Edinburgh, UK
Mohammadreza Kasaei

Contributions

Hamidreza Kasaei proposed the main idea and led the work. He also contributed to the development of the approach and performed experiments both in simulation and on real robots. Mohammadreza Kasaei also contributed to the development of the proposed approach and performed simulation experiments. Georgios Tziafas developed the Vision Transformer part, and Sha Luo contributed to developing the simulation environment. Remo Sasso was partly involved in the development of the grasp network. All authors reviewed the manuscript.

Corresponding author

Correspondence to Hamidreza Kasaei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kasaei, H., Kasaei, M., Tziafas, G. et al. Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains. J Intell Robot Syst 110, 62 (2024). https://doi.org/10.1007/s10846-024-02092-5

Received: 10 February 2023
Accepted: 11 March 2024
Published: 16 April 2024
DOI: https://doi.org/10.1007/s10846-024-02092-5
