Towards stratified model-based environmental visual perception for humanoid robots

D. Gonzalez-Aguirre, T. Asfour, R. Dillmann
Institute for Anthropomatics, Humanoids and Intelligence Systems Lab, Karlsruhe Institute of Technology, Adenauerring 2, Karlsruhe, Germany
Corresponding author: D. Gonzalez-Aguirre. E-mail addresses: gonzalez@ira.uka.de (D. Gonzalez-Aguirre), asfour@kit.edu (T. Asfour), ruediger.dillmann@kit.edu (R. Dillmann)

Pattern Recognition Letters 32 (2011) 2254–2260. Available online 13 October 2010. doi:10.1016/j.patrec.2010.09.028

Keywords: Model-based vision; Object recognition and detection; Cognitive vision; Humanoid robots

Abstract

An autonomous environmental visual perception approach for humanoid robots is presented. The proposed framework exploits the available model information and the context acquired during global localization by establishing a vision-model coupling in order to overcome the limitations of purely data-driven approaches in object recognition and surrounding status assertion. The exploitation of the model-vision coupling through the properceptive components is the key element to solve complex visual assertion queries with proficient performance. An experimental evaluation with the humanoid robot ARMAR-IIIa is presented.

1. Introduction

The emerging research field of humanoid robots for human daily environments is an exciting multidisciplinary challenge. It embodies multiple aspects and disciplines, from mechanical engineering up to artificial intelligence. The physical composition and appearance of humanoid robots differentiate them from other robots according to their application domain. This composition will ultimately allow the robots to operate noninvasively and effectively in human-centered environments. In order to interact properly and efficaciously in those environments, it is indispensable to equip humanoid robots with autonomous perception capabilities.

Recently, considerable results (Okada et al., 2006, 2007, 2008a,b) have been achieved in this field, and several humanoid robots expose various knowledge-driven capabilities. However, these approaches mainly concentrate on knowledge processing for grasping with fixed object-centered attention zones, e.g. a kettle's tip for pouring tea, a water faucet for washing a cup, etc. These approaches assume a fixed pose of the robot in order to perceive and manipulate unattached objects and environmental elements within a kitchen. In addition, the very narrow field of view with no objects in the background and the fully saturated colors of the auxiliary localization props constrain their applicability in real daily scenarios.

These perception limitations can be overcome through a properceptive¹ stratified sensing approach. It allows an enhanced exploitation of the available model information by including compact but concise cue extraction from the model and reasoning sublayers within the visual perception system. There are works on humanoid robots reasoning for task planning and situation interpretation, see (Okada et al., 2008a,b). These approaches focus on atomic operations and discrete transitions between states of the modeled scenario for behavior generation and verification.
This high-level scenario reasoning is not the focus of the present work; the focus is rather the inclusion of the essential properceptive and reasoning mechanisms while perception takes place, in order to robustly recognize and interpret complex patterns, i.e. to distinguish and track environmental objects in the presence of cluttered backgrounds, grasping occlusions and different poses of both the humanoid robot and the objects. This article focuses on rigid elements of the environment which can be transformed through rigid parametric transformations, e.g. furniture, kitchen appliances, etc.

In the following sections, the model-based stratified visual perception for humanoid robots and its implementation are introduced, together with the experimental results of a demonstration scenario where the concepts were evaluated, providing remarkable real-time results which purely data-driven algorithms would hardly provide.

¹ Properception is the counterpart of perception. Properception deals with the external world by internal means, through models and knowledge mechanisms, whereas perception captures the world through external sensory stimuli. In contrast to properception, proprioception deals with the senses related to limb position, self-posture, awareness of equilibrium, and other internal conditions. Properception and proprioception provide awareness of the outside (models) and the inside (interoceptive senses), respectively, see Fig. 1.

2. Stratified visual perception

Fig. 1 shows the strata, or spaces of abstraction, involved in this approach. By dividing the whole approach into these container spaces it is possible to establish the bridge (see Figs. 1e and f) between the reality and the models. The vision-model coupling is composed of the confluence of stimuli-novelty (percepts) and inference-prediction (symbols), provided by the perception and properception processes, respectively. In order to make this coupling mechanism tractable and its implementation plausible, it is necessary to profit from both the vision-to-model association acquired during global localization in our previous work (Gonzalez-Aguirre et al., 2006, 2008, 2009, 2010; Wieland et al., 2009) and the model-to-vision association resulting from the inference rules in the model-based approach.

Fig. 1. The model-based stratified visual perception for humanoid robots: a) the physical-space embraces the tangible reality, b) the visual-space embodies the image projection from the reality to percepts by means of sensor devices and active recognition components, c) the ego-space is the short term registration storage for percepts and self-localization, d) the model-space contains the geometrical and topological description of the entities of the physical-space, e) the signals-to-percepts transducer process from visual-space to ego-space converts incoming signals from the visual-space into outgoing percepts corresponding to abstracted entities of the model-space, and f) the symbols-to-percepts process fuses the percepts corresponding to an abstracted entity in the model-space.

3. Visual perception framework

3.1. Memory: model and ego spaces

The formal representation of real objects within the application domain and the relationships between them constitute the long term memory. In environmental perception, this nonvolatile memory is composed of the world-model and the defined transformations. In this approach, an appropriate description is obtained by separating the geometric composition from the pose. The attributes are the configuration state of the instances, e.g. name, identifier, type, size, parametric transformation, etc. This persistent graph structure, together with the implemented mechanism for pruning and inexact matching (see Sections 2.3 and 2.4 in Gonzalez-Aguirre et al. (2008)), constitutes the spatial query solver, see Fig. 2.

On the other hand, the mental imagery (see Section 3.4) and the acquired percepts are contained within an ego-space which corresponds to the short term memory. By attaching the base platform pose (the frame B in Fig. 3a) to the registration pose of the contained percepts, it is possible to have a short term registration frame for fusion and model exploitation. When the humanoid robot moves its platform, the temporal registration frame and the contained percepts of the ego-space are automatically discarded.
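To make the separation of geometric composition, pose and configuration attributes more concrete, the following minimal Python sketch (our illustration, not the authors' implementation; all class and attribute names are hypothetical) shows one way such a persistent world-model node and a simple attribute-based query over the scene-graph could be represented.

from dataclasses import dataclass, field

@dataclass
class ModelNode:
    """One entity of the world-model graph; geometry and attributes are kept apart from the pose."""
    name: str                      # e.g. "cupboard-door-3" (illustrative)
    identifier: int                # unique id within the scene-graph
    node_type: str                 # e.g. "door", "handle", "appliance"
    size: tuple                    # bounding dimensions in meters (w, h, d)
    pose: list                     # 4x4 homogeneous transform of the node w.r.t. the world frame
    parametric_dof: str = "none"   # e.g. "revolute" for a door, "prismatic" for a drawer
    children: list = field(default_factory=list)

def query(root: ModelNode, predicate) -> list:
    """Depth-first traversal collecting nodes that satisfy an attribute predicate."""
    hits = [root] if predicate(root) else []
    for child in root.children:
        hits.extend(query(child, predicate))
    return hits

# Example: all revolute (door-like) nodes of a kitchen model rooted at 'kitchen_root'
# doors = query(kitchen_root, lambda n: n.parametric_dof == "revolute")

A pruning and inexact-matching solver as in Gonzalez-Aguirre et al. (2008) would operate on such a graph; the traversal above only illustrates attribute-based access to the long term memory.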
Fig. 2. The states of cognition and cycles involved in the model-based visual perception approach, including the spatial query solver. The properception is implemented by the mental imagery module, which fuses information from the long-term with the short-term memory, i.e. the world- and ego-spaces, respectively. The mental imagery provides the properceptive cues for rule-based reasoning and attention planning. The integration of these cues takes place in the image and space domain reasoning submodules, see Section 4. The communication interface with the coordination cycle is realized through assertion queries, see an example in Section 5.

3.2. Sensing

The processing of low-level sensor data and higher-level world-model information for segmentation, recognition, and association constitutes the visual perception. It bridges the gap between the image processing and the object recognition components through a cognitive perception framework (Patnaik, 2007). This framework actively extracts valuable information from the real world through stereo color images and the kinematic configuration of the humanoid robot's active vision head (Asfour et al., 2008). The adequate representation, efficient unified storage, automatic recall, and task-driven processing of this information take place within different states of cognition. These cognition states are categorically organized according to their function as sensing, attention, reasoning, recognition, planning, coordination, and learning, see Fig. 2.

The noise-tolerant vision-to-model coupling (see Fig. 1e) arises from the full configuration of the active vision system, including the calibrated internal joint configuration (Welke et al., 2008), the external position and orientation of the camera centers, as well as all mechanisms (Azad and Dillmann, 2009) required to obtain Euclidean metric information from stereo images (Hartley and Zisserman, 2004), see Figs. 3a and b.

Fig. 3. a) Mapping of percepts from the physical-space to the ego-space by the composed transformation $X^i_{ego} = M^t_{ego}[M_{stereo}(X_i)]$, b) the partial transformation from visual-space to the ego-space $M^t_{ego} = [T(t)\,N(t)\,H\,C_L]^{-1}$, see also (Gonzalez-Aguirre et al., 2010), c) the restriction subspace W where the target-node can be robustly recognized, and d) an alternative view of the subspace W.
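As an illustration of the composed transformation in the Fig. 3 caption, the following numpy sketch (ours, not the authors' code) maps a stereo-triangulated point into the ego-space. It assumes that T(t), N(t), H and C_L are 4x4 homogeneous transforms of the platform, neck, head and left-camera links of the kinematic chain; the exact frame conventions are those of the paper.

import numpy as np

def homogeneous(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    M = np.eye(4)
    M[:3, :3], M[:3, 3] = R, t
    return M

def to_ego_space(X_cam: np.ndarray, T_t: np.ndarray, N_t: np.ndarray,
                 H: np.ndarray, C_L: np.ndarray) -> np.ndarray:
    """Map a stereo-triangulated 3D point X_cam (left-camera frame, an assumption of this
    sketch) into the ego-space via M_ego^t = [T(t) N(t) H C_L]^(-1), as written in the
    Fig. 3 caption."""
    M_ego = np.linalg.inv(T_t @ N_t @ H @ C_L)
    return (M_ego @ np.append(X_cam, 1.0))[:3]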
3.3. Planning

Visual planning determines two fundamental aspects for robust perception:

3.3.1. Target subspace

When the target-node has been established (usually by the coordination cycle), the visual-planner provides a frame and the definition of a subspace W where the robot has to be located so that the target-node can be robustly recognized, see Figs. 3c and d. Notice that the subspace W is not a single pose as in Okada et al. (2008a,b). It is rather a wide range of reachable poses, allowing more flexible applicability and more robustness through a wider tolerance for uncertainties in navigation and self-localization.

3.3.2. Appearance context

Once the robot has reached a valid position within W, the visual-planner uses the CAD geometric composition of the node to predict parametric transformations and appearance properties, i.e. how the modeled image content looks and how the spatial distribution of environmental elements is related to the current pose. Notice that this is not a set of stored image-node associations as in appearance graph approaches (Koenig et al., 2008), but a novel generative/extractive continuous technique implemented by the following properception mechanism.

3.4. Properception: towards visual mental imagery

The properception skills, cue extraction and prediction, allow the humanoid robot to capture the world by internal means by exploiting the world-model (scene-graph) through the hybrid cameras, see Fig. 4. These hybrid devices use the full stereoscopic calibration of the real stereo rig in order to set the projection volume and the projection matrix within the virtual visualization. This half virtual, half physical device is inspired by the inverted concept of augmented reality approaches (Koenig et al., 2008) for overlay image composition. In contrast to augmented reality, this hybrid stereo rig is used to predict and analyze the image content in the world-model, e.g. rigid parametric transformations and the extraction of position and orientation cues for either static or dynamic configurations, i.e. rotation or translation over time as in Fig. 4.

Fig. 4. The properceptive mental imagery for prediction of dynamic configuration trajectories. The blue lines in the left $I_p^L(t)$ and right $I_p^R(t)$ image planes of the hybrid cameras show the ideal trajectory of the interest point Ip (door handle end-point) during a door opening task. The predicted subspace reduces the size of the region of interest. In addition, the predicted orientation helps to reject complex outliers, see the example in Section 4.1. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
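As a sketch of the kind of prediction the hybrid cameras enable (Fig. 4), the following numpy example (our illustration, not the authors' code; the door is assumed to be modeled by a hinge line and a handle end-point) projects the handle interest point into both image planes of the calibrated stereo rig for a range of door opening angles, yielding the ideal pixel trajectory that bounds the region of interest.

import numpy as np

def rotation_about_axis(axis_point, axis_dir, angle):
    """4x4 homogeneous rotation by 'angle' about the line through axis_point with direction axis_dir."""
    d = axis_dir / np.linalg.norm(axis_dir)
    K = np.array([[0, -d[2], d[1]], [d[2], 0, -d[0]], [-d[1], d[0], 0]])
    R = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)   # Rodrigues formula
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = axis_point - R @ axis_point      # keep points on the hinge line fixed
    return M

def predict_trajectory(P_left, P_right, handle_point, hinge_point, hinge_dir, angles):
    """Project the handle end-point into both hybrid cameras for a range of door angles.
    P_left and P_right are the 3x4 projection matrices of the calibrated real stereo rig."""
    traj_L, traj_R = [], []
    for a in angles:
        X = rotation_about_axis(hinge_point, hinge_dir, a) @ np.append(handle_point, 1.0)
        for P, out in ((P_left, traj_L), (P_right, traj_R)):
            u = P @ X
            out.append(u[:2] / u[2])            # pixel coordinates after perspective division
    return np.array(traj_L), np.array(traj_R)

The resulting 2D trajectories can then be dilated by the expected uncertainty to obtain the reduced regions of interest mentioned in the Fig. 4 caption.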
4. Visual reasoning for recognition

The reasoning process for perception is decomposed into two interdependent domains:

Visual domain: The 2D signals-to-percepts process (see Fig. 1e) deals with the image content and includes all the chromatic-radiometric sensor calibration and signal processing components required for segmentation, saliency estimation, and geometric primitive extraction. Furthermore, these components are capable of incorporating additional information for purpose-driven extraction, i.e. model-based segmentation. For a detailed example see Section 4.1.

Spatial domain: The 3D percepts-to-symbols matching and reasoning process (see Fig. 1f) manages the geometric entities from both the ego-space and the model-space. This management includes the coupling and the inference through (until now) simple geometric rules; for a more detailed illustration see Section 4.2.

4.1. Signals-to-percepts

The pose estimation of a partially occluded door handle, once the robot has already grasped it, turns out to be difficult because of several perturbation factors:

- No size rejection criterion can be assumed, because the robot's hand partially occludes the handle surface and the hand slides during task execution, producing variations of the apparent size.
- No assumption about the color or edges in the background of the handle can be made. When the door is partially open, the perspective view overlaps handles from lower doors and similar chromatic distributions appear. This image content prevents edge tracking (Azad et al., 2009).
- In addition, the glittering of the metal surfaces on both the robot's hand and the door's handle produces very confusing phenomena when using standard segmentation techniques (Comaniciu et al., 2002; Hui Zhang and Fritts, 2008).

In this context, we propose an environment-dependent but very robust and fast technique (25–50 ms) to simultaneously segment the regions and erode the borders, producing non-connected regions. First, the raw RGB-color image $I_{rgb}(x) \in \mathbb{N}^3$, $x \in \mathbb{N}^2$, is split per channel and used to compute the power image $I_\phi \in \mathbb{R}$, namely

$I_\phi(x, p) = \left[ I_r(x) \cdot I_g(x) \cdot I_b(x) \right]^p$,

where $p \in \mathbb{R}$ and $p > 1$, see Fig. 5. After linear normalization and locally adaptive thresholding (Chang et al., 2006), a binary image $I_B(x)$ is produced. It is used to extract the pixel-connected components (blobs) $B_k := \{x_i\}_{i=1}^{n}$ and build the corresponding feature vectors $F(B_k)$ for rejection and/or matching purposes (see Fig. 5b), namely

$F(B_k) := [\, n,\; d(B_k),\; E_{r1}(B_k)/E_{r2}(B_k) \,]^T$.

This feature vector is formed by the blob's discrete area $|B_k| = n$, its power density

$d(B_k) := \frac{1}{n} \sum_{i=1}^{n} I_\phi(x_i, p)$,

and the elongation descriptor, i.e. the ratio of the blob's eigenvalues $E_{r1} : E_{r2}$ computed by the singular value decomposition $[E_{r1,2}, \vec{E}_{r1,2}] = \mathrm{SVD}(M_{B_k})$ of the $\bar{x}$-centered and $k$-weighted covariance matrix $M_{B_k}$, namely

$\bar{x} := \frac{1}{n\, d(B_k)} \sum_{i=1}^{n} I_\phi(x_i, p)\, x_i$,

$k(x_i) := \left[ I_\phi(x_i, p) - d(B_k) \right]^2$,

$M_{B_k} := \frac{\sum_{i=1}^{n} k(x_i)\, [x_i - \bar{x}][x_i - \bar{x}]^T}{\sum_{i=1}^{n} k(x_i)}$.

This characterization allows a powerful rejection of blobs when verifying the right-left cross matching by allowing only candidate pairs (B_k, B_m) which fulfill the coherence criterion K(B_k, B_m) > K_min, i.e. the orientation of their main axes shows an angular discrepancy of less than arccos(K_min) radians.

Up to this point, the image feature extraction methods proceed without any model information or knowledge rule, i.e. they are purely data-driven. In the next step, the properceptive cue selection and usage is introduced. The interest point Ip is selected in both images as the furthest pixel along the blob's main axis in the direction opposed to the vector C, i.e. the unit vector from the door center to the center of the line segment where the rotation axis is located, see Figs. 6 and 7. This vector is obtained from the mental imagery as stated in Section 3.4. Moreover, projecting the edges of a door within the kitchen model improves the segmentation results while extracting the door pose. It improves the overall precision by avoiding edge pixels close to the handle, see Fig. 6a. The key factor of this model-vision coupling relies on the fact that only very general information is applied. In other words, from the projected lines and blobs extracted by means of mental imagery, only their direction is used (e.g. injected through a noise-tolerant criterion K_min) and not the position itself, which normally differs from the real one. These deviations are produced by discretization, quantization, sensor noise, actuator deviations, and model uncertainties.
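The following numpy/scipy sketch summarizes the data-driven part of this pipeline under our own assumptions: the locally adaptive threshold (Chang et al., 2006) is replaced by a precomputed binary mask argument, and the eigenvalue ratio is obtained via eigendecomposition of the 2x2 weighted covariance matrix, which is equivalent to the SVD used in the text for this symmetric matrix. Function names (power_image, blob_features, coherent) are illustrative.

import numpy as np
from scipy import ndimage

def power_image(rgb: np.ndarray, p: float = 1.5) -> np.ndarray:
    """I_phi(x, p) = [I_r(x) * I_g(x) * I_b(x)]^p, linearly normalized to [0, 1]."""
    r, g, b = (rgb[..., c].astype(float) for c in range(3))
    I = (r * g * b) ** p
    return (I - I.min()) / (I.max() - I.min() + 1e-12)

def blob_features(I_phi: np.ndarray, binary: np.ndarray):
    """F(B_k) = [area n, power density d(B_k), elongation E_r1/E_r2] plus the main-axis
    direction used by the coherence test K(B_k, B_m) > K_min."""
    labels, count = ndimage.label(binary)
    feats = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        x = np.stack([xs, ys], axis=1).astype(float)        # pixel coordinates of blob B_k
        w = I_phi[ys, xs]                                    # per-pixel power values
        n, d = len(x), w.mean()                              # area and power density d(B_k)
        kw = (w - d) ** 2                                    # k-weights (deviation from d(B_k))
        mean = (w[:, None] * x).sum(0) / (n * d + 1e-12)     # power-weighted centroid x_bar
        diff = x - mean
        M = (kw[:, None, None] * np.einsum('ni,nj->nij', diff, diff)).sum(0) / (kw.sum() + 1e-12)
        evals, evecs = np.linalg.eigh(M)                     # eigenvalues in ascending order
        feats.append({'F': np.array([n, d, evals[1] / (evals[0] + 1e-12)]),
                      'axis': evecs[:, 1]})                  # direction of the largest eigenvalue
    return feats

def coherent(axis_a: np.ndarray, axis_b: np.ndarray, K_min: float = 0.95) -> bool:
    """A left-right candidate pair passes if the main axes differ by less than arccos(K_min)."""
    return abs(float(axis_a @ axis_b)) > K_min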
Fig. 5. a) Input image $I_{rgb}$. Notice the book (particularly the white paper side) in the background: it shows not only a similar color distribution, but it also has almost the same size as the door handle. b) The power image $I_\phi$ and the blobs $B_k$. Based only on the feature vector $F(B_k)$ (data-driven recognition), it would hardly be possible to reject the presence of a door handle at the location of the book.

Fig. 6. a) The input image $I_{rgb}$ with recognized edges, projected model edges and the properceptive cue (unit vector C). b) Segmentation results for blob and edge analysis.

4.2. Percepts-to-symbols

One interesting feature of this approach is the usage of the vision-model coupling to deal with limited visibility. For instance, because of the size of both the door and the 3D field-of-view (3D-FOV, see Figs. 3c and d and Fig. 7), it can easily be corroborated that the minimal distance at which the robot must be located for the complete door to be contained inside the robot's 3D-FOV lies outside of the reachable space. Therefore, triangulation techniques cannot be used directly. In this situation, the reasoning for perception uses a simple geometric rule for the recognition and pose estimation of the door.

The preconditions of the rule are: three partially visible edges of the door, the context (robot joint configuration and pose) and the model to assert. The postconditions are: the pose of the model and the derived geometric information, e.g. the door's normal vector and the door's angle of aperture. The rule is computed as follows. First, a 2D line $\omega_i$ on an image and the center of its capturing camera, $L_c$ or $R_c$, define a plane $\Phi$ in 3D space, see Fig. 7. Hence, two such planes $\Phi_L$ and $\Phi_{l(\omega_L, \omega_R)}$, resulting from the matching $l(\omega_L, \omega_R)$ of two lines in the left and right images of a stereo system, define an intersection subspace, i.e. a 3D line $\Lambda_k = \Phi_L \wedge \Phi_{l(\omega_L, \omega_R)}$. These 3D lines $\Lambda_k$ are subject to noise and calibration artifacts; thus, they are not suitable for the direct computation of 3D intersections. However, their direction is robust enough. Next, the 2D points $H_{(L,i)}$ in the left image, resulting from the intersections of the 2D lines $\omega_i$, are matched against those in the right image $H_{(R,j)}$, producing 3D points $X_{(L_i,R_j)}$ by means of least-squares triangulation. Finally, it is possible to acquire corners of the door and the directions of the lines connecting them, even when only partial edges are visible. The direction of the vector C is the long-term memory cue used to select the 3D edge line by its direction $D_{Axis}$ and the closest point $X_{(L_i,R_j)}$, namely $P_{Axis}$ in Fig. 7.

Fig. 7. Geometric elements involved in the spatial reasoning for perception. The 3D field-of-view is the subspace resulting from intersecting the left and right fields of view of the stereo rig (Gonzalez-Aguirre et al., 2006).
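A compact numpy sketch of the geometric ingredients of this rule (Section 4.2) is given below. It is our illustration under the standard projective relations in Hartley and Zisserman (2004); P_L and P_R denote the 3x4 projection matrices of the calibrated stereo rig, and 2D lines and points are homogeneous vectors.

import numpy as np

def backproject_line(P: np.ndarray, line_2d: np.ndarray) -> np.ndarray:
    """A 2D image line (homogeneous 3-vector) and the camera center define a 3D plane:
    pi = P^T l (Hartley and Zisserman, 2004). Returns the plane as a homogeneous 4-vector."""
    return P.T @ line_2d

def line_direction(plane_a: np.ndarray, plane_b: np.ndarray) -> np.ndarray:
    """Direction of the 3D line obtained by intersecting two back-projected planes."""
    d = np.cross(plane_a[:3], plane_b[:3])
    return d / np.linalg.norm(d)

def triangulate(P_L: np.ndarray, P_R: np.ndarray, h_L: np.ndarray, h_R: np.ndarray) -> np.ndarray:
    """Least-squares (DLT) triangulation of a matched 2D corner point pair H(L,i) <-> H(R,j),
    given in pixel coordinates (u, v) for each image."""
    A = np.stack([
        h_L[0] * P_L[2] - P_L[0],
        h_L[1] * P_L[2] - P_L[1],
        h_R[0] * P_R[2] - P_R[0],
        h_R[1] * P_R[2] - P_R[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

The direction returned by line_direction corresponds to D_Axis, which is compared against the properceptive cue C, while triangulate yields the corner points X_(Li,Rj) among which the closest one, P_Axis, is selected.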
5. Experimental evaluation

In order to demonstrate the advantages of the presented approach for visual perception and to verify these methods, we accomplished the task of door opening in a regular kitchen with the humanoid robot ARMAR-IIIa (Asfour et al., 2006).

In this scenario (see Fig. 8), the estimation of the relative pose of the furniture not only allows grasping the door's handle, it also helps to reduce the external forces on the hand during operation. This results from the adaptation of the task frame while the manipulation changes the handle's orientation.

Fig. 8. Experimental evaluation of the framework with the humanoid robot ARMAR-IIIa interacting with a cupboard in the kitchen environment.

In our previous approach (Prats et al., 2008), the results obtained using only one sensory channel (a force-torque sensor) were acceptable but not satisfactory, because the estimation of the task frame depends solely on the accuracy of the robot kinematics. In the present experimental evaluation, the framework estimates the interest point Ip and the normal vector Np of the door to build the task frame. During task execution this frame is estimated by the previously mentioned methods, and the impedance control balances the external forces and torques at the hand. For details on the sensor fusion strategy see Wieland et al. (2009). Robustness and reliability of the handle tracker are the key to reducing the force stress in the robot's wrist, as has been shown in Wieland et al. (2009). Combining stereo vision and force control provides the advantage of real-time task frame estimation by vision, which avoids the errors of the robot's kinematics, together with the adjustment of actions by force control.
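As an illustration of how a task frame can be assembled from the visually estimated quantities (our sketch, not the authors' controller code; the vertical hint vector and the axis assignment are assumptions of this example), the following function builds a 4x4 frame at the interest point Ip with its z-axis along the door normal Np.

import numpy as np

def task_frame(Ip: np.ndarray, Np: np.ndarray, up_hint=np.array([0.0, 0.0, 1.0])) -> np.ndarray:
    """Illustrative 4x4 task frame at the handle interest point Ip with z along the door
    normal Np; the remaining axes are completed by orthogonalization against a vertical
    hint vector (assumed not parallel to Np)."""
    z = Np / np.linalg.norm(Np)
    x = up_hint - (up_hint @ z) * z          # project the hint onto the plane orthogonal to z
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, Ip
    return T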
6. Conclusions

The world-model and the available context acquired during self-localization not only make it plausible to solve otherwise hardly tractable, complex visual assertion queries, they also help to dispatch them with proficient performance. This is achieved by the presented framework, which implements the basic reasoning skills by extracting simple but compelling geometric cues from the properception component. These cues are applied as clue-filters for the association of percepts, either for tracking (by optimizing the region of interest in terms of size) or for handling incomplete visual information. The novelty of our approach is the coupling of vision and model by means of the properceptive cues generated with the mental imagery and the visual extraction for spatial reasoning.

Acknowledgments

The work described in this article was partially conducted within the German Humanoid Research project SFB588 funded by the German Research Foundation (DFG: Deutsche Forschungsgemeinschaft) and the EU Cognitive Systems projects GRASP (FP7-215821) and PACO-PLUS (FP6-027657) funded by the European Commission. The authors also thank the support from the DAAD-Conacyt Scholarship Reg. 180868 funded by the German Academic Exchange Service (DAAD: Deutscher Akademischer Austausch Dienst) and the Mexican National Council of Science and Technology (Conacyt: Consejo Nacional de Ciencia y Tecnología).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.patrec.2010.09.028.

References

Asfour, T., Regenstein, K., Azad, P., Schröder, J., Bierbaum, A., Vahrenkamp, N., Dillmann, R., 2006. ARMAR-III: An integrated humanoid platform for sensory-motor control. In: 6th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2006), pp. 169–175.
Asfour, T., Welke, K., Azad, P., Ude, A., Dillmann, R., 2008. The Karlsruhe humanoid head. In: 8th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2008), pp. 447–453.
Azad, P., Asfour, T., Dillmann, R., 2009. Accurate shape-based 6-DoF pose estimation of single-colored objects. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009), pp. 2690–2695.
Azad, P., Dillmann, R., 2009. The Integrating Vision Toolkit. http://ivt.sourceforge.net/.
Chang, C.-I., Du, Y., Wang, J., Guo, S.-M., Thouin, P., 2006. Survey and comparative analysis of entropy and relative entropy thresholding techniques. IEEE Proceedings: Vision, Image and Signal Processing.
Comaniciu, D., Meer, P., 2002. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 603–619.
Gonzalez-Aguirre, D., Bayro-Corrochano, E., 2006. In: Arai, T., Pfeifer, R., Balch, T.R., Yokoi, H. (Eds.), Intelligent Autonomous Systems 9: Proceedings of the 9th International Conference on Intelligent Autonomous Systems, University of Tokyo. IOS Press. ISBN 1-58603-595-9.
Gonzalez-Aguirre, D., Asfour, T., Bayro-Corrochano, E., Dillmann, R., 2008. Model-based visual self-localization using geometry and graphs. In: 19th International Conference on Pattern Recognition (ICPR 2008), pp. 1–5.
Gonzalez-Aguirre, D., Wieland, S., Asfour, T., Dillmann, R., 2009. On environmental model-based visual perception for humanoids. In: The 15th Iberoamerican Congress on Pattern Recognition, CIARP 2010, vol. 5856. Springer. ISBN 978-3-642-10267-7.
Gonzalez-Aguirre, D., Asfour, T., Bayro-Corrochano, E., Dillmann, R., 2010. Improving model-based visual self-localization using Gaussian spheres. In: Bayro-Corrochano, E., Scheuermann, G. (Eds.), Geometric Algebra Computing. Springer. ISBN 978-1-84996-107-3.
Hartley, R., Zisserman, A., 2004. Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press. ISBN 0521540518.
Hui Zhang, Fritts, J.E., Goldman, S.A., 2008. Image segmentation evaluation: A survey of unsupervised methods. Computer Vision and Image Understanding 110 (2), 260–280.
Koenig, A., Kessler, J., Gross, H.-M., 2008. A graph matching technique for an appearance-based, visual SLAM-approach using Rao-Blackwellized particle filters. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pp. 1576–1581.
Okada, K., Kojima, M., Sagawa, Y., Ichino, T., Sato, K., Inaba, M., 2006. Vision based behavior verification system of humanoid robot for daily environment tasks. In: 6th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2006), pp. 7–12.
Okada, K., Kojima, M., Tokutsu, S., Maki, T., Mori, Y., Inaba, M., 2007. Multi-cue 3D object recognition in knowledge-based vision-guided humanoid robot system. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2007), pp. 3217–3222.
Okada, K., Tokutsu, S., Ogura, T., Kojima, M., Mori, Y., Maki, T., Inaba, M., 2008a. Scenario controller for daily assistive humanoid using visual verification. In: 10th International Conference on Intelligent Autonomous Systems (IAS-10), pp. 398–405.
Okada, K., Kojima, M., Tokutsu, S., Mori, Y., Maki, T., Inaba, M., 2008b. Task guided attention control and visual verification in tea serving by the daily assistive humanoid HRP2-JSK. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2008), pp. 1551–1557.
Patnaik, S., 2007. Robot Cognition and Navigation: An Experiment with Mobile Robots. Springer-Verlag, Berlin Heidelberg. ISBN 978-3-540-23446-3.
Prats, M., Wieland, S., Asfour, T., del Pobil, A., Dillmann, R., 2008. Compliant interaction in household environments by the ARMAR-III humanoid robot. In: 8th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2008), pp. 475–480.
Welke, K., Przybylski, M., Asfour, T., Dillmann, R., 2008. Kinematic calibration for saccadic eye movements. Technical report, Institute for Anthropomatics, University of Karlsruhe. http://digbib.ubka.uni-karlsruhe.de/volltexte/1000012977.
Wieland, S., Gonzalez-Aguirre, D., Vahrenkamp, N., Asfour, T., Dillmann, R., 2009. Combining force and visual feedback for physical interaction tasks in humanoid robots. In: 9th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2009), pp. 439–446.