1 Introduction

With rising labor costs and the growing demand for automated production-line equipment, industrial robots with stereo vision are increasingly applied to product processing, especially bin picking and workpiece loading, as exemplified by the picking competition held by Amazon for grasping items from boxes or shelves. However, there is also a growing demand for the automatic unloading and placement of objects, the "reversed picking process." Limited by the visual perception technology of industrial robots, the packing process is usually set up manually or based on prior human knowledge. For example, objects are approximated by simplified geometric shapes (cubes, cylinders, etc.) and then loaded into boxes with known shapes and fixed positions. For objects with complex shapes or elastic bodies, robot engineers often program the placement procedure by hand. The Wynright company of the USA (Criswell 2014) built a mobile robot system for the automatic loading of tires into containers, but unexpected changes in the container body still occur, leaving a remaining space (the accommodation space). The key problem of automatic loading in an irregularly changing space is to obtain the pose of a target object in the remaining accommodation space, which can hold at least one object.

Obtaining the effective pose matrix of the object in the accommodation space is the last crucial step in the intelligent robot's pick, move (with obstacle avoidance) and place process. Automatic loading with an industrial robot (Guo et al. 2019) needs the hand–eye matrix (1), the pose matrix (2) between the camera coordinate system and the workpiece coordinate system of the reference model, the pose matrix (3) between the reference model and the target model, and the pose matrix (4) between the target object (or the reference model (5)) and the accommodation space, as shown in Fig. 1a. In recent years, object recognition for robotic bin picking has been a hot topic in both scientific research and robot companies. Many algorithms have been proposed for six-degree-of-freedom (6-DOF) pose estimation, such as Fast Point Feature Histograms (FPFH), Signature of Histograms of Orientations (SHOT), the template-based LineMod algorithm (Hinterstoisser et al. 2012), the Point Pair Feature (PPF) with a voting mechanism (Drost et al. 2010), DenseFusion (Wang et al. 2019), the regression-based PoseCNN (Xiang et al. 2017) and SSD-6D (Kehl et al. 2017).

Fig. 1
figure 1

Structure diagram of the system and the applications. a Structure diagram of automatic loading using industrial robots in the remaining accommodation space (Guo et al. 2019). b The applications of alpha-shape algorithm (Wang and Chen 2019)

End-to-end robot trajectory planning is also a research hotspot, focusing on time-optimal or minimum-energy trajectories. These planning algorithms mostly establish the dynamic and kinematic constraints of the robot and use intelligent optimization algorithms to obtain the trajectory, or employ sensors to generate an Octomap or three-dimensional (3D) environment data from which an obstacle-avoiding trajectory is built. When the robot moves to the place where an object is to be set down, it usually relies on a pre-set pose matrix. Humans, in contrast, show great flexibility when placing objects: they choose a reasonable location according to the size of the object and the surrounding environment on a plane, and when packing in a limited space they adapt the placement to each object.

The alpha-shape algorithm is used to reconstruct an object surface from an unorganized point cloud. The method was proposed by Edelsbrunner for 2D points and was later extended to 3D points (Edelsbrunner et al. 2003; Edelsbrunner and Mücke 1994). Compared with the convex hull, the alpha shape can reconstruct the shape of a nonconvex body, as shown in Fig. 1b. Alpha shapes have been widely used for 3D object shapes. Zhu et al. (2008) proposed a novel approach for tree crown reconstruction based on an improved alpha-shape model, where the data are points distributed unevenly throughout a volume rather than only on a surface. Lou et al. (2013) used alpha shapes to extract topographical features from engineering surfaces and found that the alpha-shape method performs more efficiently for large structuring elements. Santos et al. (2019) proposed an adaptive method that estimates a local parameter for each edge based on local point spacing and used it to extract building roof boundaries from LiDAR data.

Based on our previous study (Guo et al. 2019), this paper still aims to solve the problem of object pose estimation in the accommodation space, assuming that the accommodation space can hold one target object. The alpha-shape algorithm and the improved fruit fly optimization algorithm (FOA) (Pan 2012) are used to determine the object's final state given a pose matrix. First, the alpha-shape algorithm is introduced; then, the variation of the alpha-shape volume of the object together with the measured space is set as the objective function, and the variation of the object pose is encoded as the six variables of the improved FOA. Next, in the simulation experiments we first present the pose estimation process in a cube space, and then obtain the convergence curves of four different spaces (cube, hemisphere, cylinder and triangular prism) and the object's final pose in those spaces. Finally, the reliability of the proposed method is verified experimentally, comparisons with previous work are given, and conclusions are drawn.

2 Proposed method

2.1 Alpha-shape algorithm

The alpha-shape algorithm can reconstruct geometry from a set of discrete points, as shown in Fig. 2. A circle with radius α is rolled around the point set S. After all points have been traversed, the inner and outer contours of S are obtained. When the radius α is large, the circle rolls outside S, and the trace of this external rolling is the boundary contour of the point set. When α is small, the circle rolls into the interior of S, and when α is small enough, every point in S becomes a boundary point. The value of the radius α is closely related to the fineness of the detected contour: a relatively small radius yields a fine contour, while a relatively large radius yields a slightly rougher one.

Fig. 2
figure 2

Alpha shape of the planar point set

2.2 Pose estimation algorithm

When an industrial robot picks up the target object, moves and places it into the accommodation space, the shape of the accommodation space can generally be regarded as convex. A swarm intelligence optimization algorithm usually requires an objective function to determine the direction of iterative optimization. Compared with previous work (Guo et al. 2019), we use only the alpha-shape volume to establish the objective function of the FOA.

In the process of placing the object into the accommodation space, there are two groups of point clouds (the object and the accommodation space), as shown in Fig. 3a–c. The two groups of points are combined into one group, and the change of relative position can be expressed by the whole volume value Vt. The whole volume Vt is minimal at the optimal position, where the object rests at the bottom of the accommodation space and there is no gap between the object and the bottom of the space.

Fig. 3
figure 3

The process of object placing into the measured space

When the object is placed at the bottom of the accommodation space, the combined point cloud is concave. The whole volume value Vt can be obtained by tuning the α value of the alpha-shape algorithm. The fitness function of the FOA is expressed by:

$$ S_{\min } = V_{t} . $$
(1)
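A minimal MATLAB sketch of this objective is given below, using MATLAB's alphaShape and volume functions; the toy point clouds are only placeholders for the measured object O and accommodation space P of Sect. 3.1, and the α value anticipates the setting of Sect. 3.2.

```matlab
% Toy stand-ins for the measured clouds; real O and P come from Sect. 3.1.
O = rand(500, 3) * 50;                    % object point cloud (m-by-3)
P = rand(2000, 3) * 300;                  % accommodation-space point cloud (n-by-3)

alphaRadius = 250;                        % alpha value used in Sect. 3.2
shp = alphaShape([O; P], alphaRadius);    % merge the two clouds (Fig. 3)
Vt  = volume(shp);                        % whole volume V_t; S_min = V_t, Eq. (1)
```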

Based on previous work (Guo et al. 2019), the target rotation is regarded as a separate rotation about each axis, and the posture adjustment of the object is simplified into three parameter variables, whose rotation matrices are expressed as:

$$ {\mathbf{R}}_{x} (\alpha ) = \left[ {\begin{array}{*{20}c} 1 & 0 & 0 \\ 0 & {\cos (\alpha )} & { - \sin (\alpha )} \\ 0 & {\sin (\alpha )} & {\cos (\alpha )} \\ \end{array} } \right], $$
(2)
$$ {\mathbf{R}}_{y} (\beta ) = \left[ {\begin{array}{*{20}c} {\cos (\beta )} & 0 & {\sin (\beta )} \\ 0 & 1 & 0 \\ { - \sin (\beta )} & 0 & {\cos (\beta )} \\ \end{array} } \right], $$
(3)
$$ {\mathbf{R}}_{z} (\gamma ) = \left[ {\begin{array}{*{20}c} {\cos (\gamma )} & { - \sin (\gamma )} & 0 \\ {\sin (\gamma )} & {\cos (\gamma )} & 0 \\ 0 & 0 & 1 \\ \end{array} } \right], $$
(4)

where α, β and γ are the rotation parameters about the x-, y- and z-axes, respectively, and \({\mathbf{R}}_{x} (\alpha )\), \({\mathbf{R}}_{y} (\beta )\) and \({\mathbf{R}}_{z} (\gamma )\) are the corresponding rotation matrices.
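For reference, the three rotation matrices and the composite rotation later used in Eq. (13) can be written in MATLAB as follows (the example angles are illustrative only):

```matlab
% Rotation matrices of Eqs. (2)-(4) as anonymous functions (angles in radians).
Rx = @(a) [1, 0, 0;  0, cos(a), -sin(a);  0, sin(a), cos(a)];
Ry = @(b) [cos(b), 0, sin(b);  0, 1, 0;  -sin(b), 0, cos(b)];
Rz = @(g) [cos(g), -sin(g), 0;  sin(g), cos(g), 0;  0, 0, 1];

% Composite rotation later used in Eq. (13): R = Rz(gamma)*Ry(beta)*Rx(alpha).
R = Rz(pi/6) * Ry(-pi/8) * Rx(pi/4);      % example angles, illustrative only
```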

To avoid local optima, we still use multiple individuals to search for the global optimum, as shown in Fig. 4. We start by defining the point cloud of the target object as O(x, y, z) and the point cloud of the measured accommodation space as P(x, y, z).

Fig. 4
figure 4

Flowchart of pose estimation in the accommodation space based on improved FOA

(1) Initialize the FOA parameters:

$$ O_{g} (\overline{{x_{m} }} ,\overline{{y_{m} }} ,\overline{{z_{m} }} ) = \sum\limits_{i = 1}^{m} {O_{i} (x_{i} ,y_{i} ,z_{i} )} /m, $$
(5)
$$ P_{g} (\overline{{x_{n} }} ,\overline{{y_{n} }} ,\overline{{z_{n} }} ) = \sum\limits_{i = 1}^{n} {P_{i} (x_{i} ,y_{i} ,z_{i} )} /n, $$
(6)
$$ [O^{1} ,1]^{T} = \left[ {\begin{array}{*{20}c} {\mathbf{E}} & {O_{g} - P_{g} } \\ 0 & 1 \\ \end{array} } \right] \cdot [O,1]^{T} , $$
(7)
$$ V^{t} = alphaShape([O^{1} ;P_{g} ]), $$
(8)

where \(m\) and \(n\) are the numbers of points in the corresponding point clouds, \({\mathbf{E}}\) is the 3 × 3 identity matrix, \(V^{t}\) is the volume of the alpha-shape result for the two point clouds, and alphaShape() represents the alpha-shape algorithm.
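A short MATLAB sketch of this initialization is given below. It assumes that the intent of Eqs. (7)–(8) is to move the object centroid onto the centroid of the accommodation space and to merge the translated object with the full space cloud P (as later in Eq. (16)); the toy clouds again stand in for the measured data.

```matlab
% Sketch of step (1). Toy data as before; real clouds come from Sect. 3.1.
O = rand(500, 3) * 50;  P = rand(2000, 3) * 300;  alphaRadius = 250;

Og = mean(O, 1);                                % object centroid, Eq. (5)
Pg = mean(P, 1);                                % space centroid,  Eq. (6)
O1 = O + repmat(Pg - Og, size(O, 1), 1);        % object moved onto the space centroid, cf. Eq. (7)
Vt0 = volume(alphaShape([O1; P], alphaRadius)); % initial whole volume, cf. Eq. (8)
```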

(2) Initialize the population position parameters of FOA:

$$ \left\{ {\begin{array}{*{20}c} {X_{i}^{o} = \overline{{x_{n} }} } \\ {Y_{i}^{o} = \overline{{y_{n} }} } \\ {Z_{i}^{o} = \overline{{z_{n} }} } \\ \end{array} } \right., $$
(9)
$$ \left\{ {\begin{array}{*{20}c} {\alpha_{i}^{o} = 2\pi \cdot rand - \pi } \\ {\beta_{i}^{o} = 2\pi \cdot rand - \pi } \\ {\gamma_{i}^{o} = 2\pi \cdot rand - \pi } \\ \end{array} } \right., $$
(10)

where \(rand\) denotes a random number within the range (0, 1), \(X_{i}^{o}\), \(Y_{i}^{o}\) and \(Z_{i}^{o}\) denote the coordinates of the ith initial given position, and \(\alpha_{i}^{o}\), \(\beta_{i}^{o}\) and \(\gamma_{i}^{o}\) denote the ith initial random angle parameters.
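The population initialization of Eqs. (9)–(10) may be sketched as follows, one individual per row; the toy space cloud is a placeholder for the measured data, and the population size anticipates Sect. 3.2.

```matlab
% Toy space cloud as before; real data come from Sect. 3.1.
P  = rand(2000, 3) * 300;
Pg = mean(P, 1);

N    = 100;                               % population size (Sect. 3.2)
pos0 = repmat(Pg, N, 1);                  % [X Y Z] per individual, Eq. (9)
ang0 = 2*pi*rand(N, 3) - pi;              % [alpha beta gamma] per individual, Eq. (10)
```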

(3) The fruit flies use olfactory cues to search for food in random directions and over random distances:

$$ \left\{ {\begin{array}{*{20}c} {X_{i} = X_{i}^{o} + a_{1} \cdot rand - b_{1} } \\ {Y_{i} = Y_{i}^{o} + a_{2} \cdot rand - b_{2} } \\ {Z_{i} = Z_{i}^{o} + a_{3} \cdot rand - b_{3} } \\ \end{array} } \right., $$
(11)
$$ \left\{ {\begin{array}{*{20}c} {\alpha_{i} = \alpha_{i}^{o} + a_{4} \cdot rand - b_{4} } \\ {\beta_{i} = \beta_{i}^{o} + a_{5} \cdot rand - b_{5} } \\ {\gamma_{i} = \gamma_{i}^{o} + a_{6} \cdot rand - b_{6} } \\ \end{array} } \right., $$
(12)

where \(a_{1} \sim a_{6}\) and \(b_{1} \sim b_{6}\) are constants used to constrain the range of random numbers.
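With the step ranges reported later in Sect. 3.2, the random search of Eqs. (11)–(12) can be sketched as follows; the initial values are toy stand-ins for the step (2) outputs.

```matlab
% Toy initial values from step (2).
N = 100;  pos0 = repmat([150 150 150], N, 1);  ang0 = 2*pi*rand(N, 3) - pi;

a_pos = 2;       b_pos = 1;                  % a1..a3 = 2, b1..b3 = 1 -> step in (-1, 1)
a_ang = 0.1*pi;  b_ang = 0.05*pi;            % a4..a6, b4..b6 -> step in (-0.05*pi, 0.05*pi)

pos = pos0 + a_pos*rand(N, 3) - b_pos;       % Eq. (11)
ang = ang0 + a_ang*rand(N, 3) - b_ang;       % Eq. (12)
```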

(4) The alpha-shape volume of the object together with the accommodation space is taken as the judgment value of the taste concentration:

$$ {\mathbf{R}}_{i} = {\mathbf{R}}_{z} (\gamma_{i} ) \cdot {\mathbf{R}}_{y} (\beta_{i} ) \cdot {\mathbf{R}}_{x} (\alpha_{i} ), $$
(13)
$$ {\mathbf{T}}_{i} = [X_{i} ,Y_{i} ,Z_{i} ]^{T} , $$
(14)
$$ [O_{i}^{2} ,1]^{T} = \left[ {\begin{array}{*{20}c} {{\mathbf{R}}_{i} } & {{\mathbf{T}}_{i} } \\ 0 & 1 \\ \end{array} } \right] \cdot [O_{i}^{1} ,1]^{T} , $$
(15)
$$ PO_{i} (x,y,z) = [O_{i}^{2} ;P], $$
(16)
$$ V_{i}^{t} = alphaShape(PO_{i} (x,y,z)), $$
(17)
$$ S_{i} = V_{i}^{t} . $$
(18)
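The evaluation of the taste concentration, following Eqs. (13)–(18) as written, may be sketched as below; the toy inputs stand in for the translated object O1, the space cloud P and the candidate poses of steps (2)–(3).

```matlab
% Toy inputs (see the earlier sketches): translated object O1, space P, candidate poses.
O1 = rand(500, 3) * 50 + 125;  P = rand(2000, 3) * 300;  alphaRadius = 250;
pos = repmat(mean(P, 1), 20, 1) + 2*rand(20, 3) - 1;
ang = 2*pi*rand(20, 3) - pi;

Rx = @(a) [1 0 0; 0 cos(a) -sin(a); 0 sin(a) cos(a)];
Ry = @(b) [cos(b) 0 sin(b); 0 1 0; -sin(b) 0 cos(b)];
Rz = @(g) [cos(g) -sin(g) 0; sin(g) cos(g) 0; 0 0 1];

S = zeros(size(pos, 1), 1);
for i = 1:size(pos, 1)
    Ri  = Rz(ang(i,3)) * Ry(ang(i,2)) * Rx(ang(i,1));       % Eq. (13)
    Ti  = pos(i, :)';                                        % Eq. (14)
    Oi2 = (Ri * O1' + repmat(Ti, 1, size(O1, 1)))';          % Eq. (15)
    S(i) = volume(alphaShape([Oi2; P], alphaRadius));        % Eqs. (16)-(18)
end
```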

(5) The groups are sorted in ascending order according to the concentration value:

$$ [S\_s,\;S\_index] = {\text{sort}}(S), $$
(19)

where S_s is the vector of volume values after sorting the groups and S_index is the index of the corresponding individuals.

(6) The half of the individuals with smaller concentration values is selected and relabeled according to the concentration values:

$$ \left\{ {\begin{array}{*{20}c} {X_{j} = X(S\_index)} \\ {Y_{j} = Y(S\_index)} \\ {Z_{j} = Z(S\_index)} \\ \end{array} } \right., $$
(20)
$$ \left\{ {\begin{array}{*{20}c} {\alpha_{j} = \alpha (S\_index)} \\ {\beta_{j} = \beta (S\_index)} \\ {\gamma_{j} = \gamma (S\_index)} \\ \end{array} } \right., $$
(21)

where the range of \(S\_index\) is determined by the number of individuals in the half population, (Xj, Yj, Zj) denotes the new position, (αj, βj, γj) denotes the new posture, and the subscript j is the index of the individual.

(7) After step (6), return to step (3) and then step (4); the random direction and location are updated to obtain the taste concentration of a new half of the individuals. The pre-selected half and the new half are combined, and all individuals are sorted by taste concentration according to step (5). The iterative optimization loop stops when the end condition is satisfied.
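A minimal sketch of one selection pass of steps (5)–(6) is given below; S, pos and ang are the concentration values and candidate poses of step (4) (toy values are used here), and the fixed iteration limit mentioned in the comment is the one reported in Sect. 3.2.

```matlab
% Toy values standing in for the step (4) outputs.
S = rand(20, 1);  pos = rand(20, 3) * 300;  ang = 2*pi*rand(20, 3) - pi;

[S_s, S_index] = sort(S);                   % Eq. (19), ascending whole volume
half = floor(numel(S) / 2);
keep = S_index(1:half);                     % better (smaller-volume) half
pos0 = pos(keep, :);                        % new positions, Eq. (20)
ang0 = ang(keep, :);                        % new postures,  Eq. (21)
% Step (7): repeat steps (3)-(4) around pos0/ang0 to create a new half, merge
% it with the kept half, re-sort by taste concentration, and stop once the
% iteration limit (100 iterations in Sect. 3.2) is reached.
```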

3 Experimental results

3.1 Obtaining the target object

To obtain a sparse point cloud of the target, the object was placed in the common field of view of the left and right cameras, and the object images from both cameras were captured, as shown in Fig. 5a. Following the point cloud acquisition process, Otsu threshold segmentation was performed on the left-camera image; the background was then removed and the object region was retained, as shown in Fig. 5b. Harris feature points were detected on the segmented left image, as shown in Fig. 5c. The corresponding feature points in the right image were tracked from the detected left-image feature points, and the positions of the feature point pairs were obtained using KLT optical flow (Shi and Tomasi 1994), as shown in Fig. 5d. From the corresponding feature point positions in the left and right images and the calibrated camera parameters, the top point cloud was computed in the left-camera coordinate system and finally rotated and translated into the world coordinate system, as shown in Fig. 5e. The bottom point cloud was obtained using the same method, and the top and bottom point clouds were combined into a complete point cloud of the target object in the world coordinate system.
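A hedged MATLAB sketch of this pipeline, using Computer Vision Toolbox functions (graythresh/im2bw, detectHarrisFeatures, vision.PointTracker, triangulate), is given below; imgL, imgR and stereoParams are placeholders for the captured image pair and the calibrated stereo parameters, and the polarity of the Otsu mask may need to be inverted depending on the scene.

```matlab
% Otsu threshold segmentation of the left image and background suppression.
grayL = rgb2gray(imgL);
bw    = im2bw(grayL, graythresh(grayL));           % Otsu threshold mask
grayL(~bw) = 0;                                     % suppress the background region

% Harris features on the left image, tracked into the right image by KLT.
cornersL = detectHarrisFeatures(grayL);
tracker  = vision.PointTracker;                     % KLT optical flow tracker
initialize(tracker, cornersL.Location, grayL);
[pointsR, valid] = step(tracker, rgb2gray(imgR));

% Triangulate matched point pairs into 3-D points in the left-camera frame.
ptsL = cornersL.Location(valid, :);
ptsR = pointsR(valid, :);
topCloud = triangulate(ptsL, ptsR, stereoParams);
```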

Fig. 5
figure 5

The process of point cloud derived from binocular vision

3.2 Parameters setting result analysis

The initial parameters of the improved FOA are the population size, the number of iterations and the random direction and position parameters (\(a_{1} \sim a_{6}\) and \(b_{1} \sim b_{6}\)). In the experiments, the number of iterations is set to 100 and the radius α is set to 250. The range of the random position variation is limited to (− 1, 1) and the range of the random posture variation to (− 0.05\(\pi\), 0.05\(\pi\)); thus, \(a_{1} \sim a_{3}\) and \(b_{1} \sim b_{3}\) are set to 2 and 1, respectively, and \(a_{4} \sim a_{6}\) and \(b_{4} \sim b_{6}\) are set to 0.1\(\pi\) and 0.05\(\pi\), respectively. The software environment is Windows 10 64-bit with MATLAB 2015b, and the hardware is an Intel Core i7-7700 CPU with 8 GB of RAM.
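These settings can be collected in one place as follows (the struct name is illustrative only; the population size is the one given below):

```matlab
% Parameter settings of this section gathered in a single struct.
foa.populationSize = 100;                 % number of fruit flies (see below)
foa.maxIterations  = 100;                 % iteration limit
foa.alphaRadius    = 250;                 % alpha of the alpha-shape algorithm
foa.posStep        = [2, 1];              % a1..a3 = 2, b1..b3 = 1   -> range (-1, 1)
foa.angStep        = [0.1*pi, 0.05*pi];   % a4..a6, b4..b6           -> range (-0.05*pi, 0.05*pi)
```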

Based on the previous study, the point cloud of the target object is obtained by Harris feature point detection and KLT optical flow, as shown in Fig. 6a. The 3D accommodation space is set as a cuboid of 550 mm × 100 mm × 300 mm, as shown in Fig. 6b. The fruit fly population size is 100. Figure 6c shows the initial state; Fig. 6d shows the 10th iteration; Fig. 6e shows the 20th iteration; and Fig. 6f shows that the pose adjustment in the accommodation space is completed at the 40th iteration.

Fig. 6
figure 6

Iteration process and its final results

The space types are designed only just to accommodate the object, and the target object's final posture in these spaces is shown in Fig. 8. The four space types, namely the cube, hemisphere, cylinder and triangular prism, represent different shapes of common accommodation spaces. The proposed method can be applied to different convex spaces. The iterative convergence curves of the four space types are plotted in Fig. 7. The four curves show that the object is essentially placed in all four spaces after about 10 iterations. To test the effect of different values of the parameter α on the estimated pose, α is set to 60, 80, 100 and 150. As shown in Fig. 9, different values of α lead to different results, so it is important to choose a proper α value.

Fig. 7
figure 7

The iterative convergence process

Fig. 8
figure 8

The designed types of accommodation space

Fig. 9
figure 9

The alpha shape with different parameter α

3.3 Method comparison

Our previous work used the convex hull to construct the point clouds of the object and the accommodation space. The final object pose in the hemisphere space computed with the previous method is shown in Fig. 10. Comparing Fig. 8e, f and Fig. 10d with Fig. 10a–c shows that the alpha-shape algorithm is better suited to the placement requirements of real robot scenes than the convex hull algorithm. The previous work only ensures that the accommodation space can accommodate the target object, but does not give the best final pose for practical applications; the new study using the alpha-shape algorithm not only keeps the object inside the accommodation space, but also places it at the bottom of the space, which meets actual robot placement requirements.

Fig. 10
figure 10

The object pose estimation using convex hull

3.4 Hardware test

We used a SIASUN industrial robot (model SR6C), binocular vision (a 3D vision sensor, model Astra Mini) and a vacuum sucker to build the hardware test platform shown in Fig. 11a. An object with a more complex shape and a randomly placed cube box were considered, and their point clouds were acquired with the 3D sensor, as shown in Fig. 11b, c. The final pose of the complex object in the box space was estimated using the proposed method, as shown in Fig. 11d.

Fig. 11
figure 11

Hardware platform test of the complex object

4 Conclusion

In this paper, a method combining the alpha-shape algorithm and an improved fruit fly optimization algorithm is proposed to estimate the object pose in a 3D accommodation space. The proposed method uses the alpha-shape algorithm to establish the objective function, which uses the change of the whole volume to adjust the object pose. To obtain the best individual with six degrees of freedom, the iteration strategy of the improved FOA selects the better half of the individuals to produce the next half. Experiments were performed with the chosen parameters of the improved FOA and the four space types, and the results show that the proposed method can obtain the object pose in common accommodation spaces. Compared with the previous work using the convex hull, the new study using the alpha shape keeps the object pose at the bottom of the accommodation space without a gravity constraint, which meets actual robot stacking requirements. For an actual object placed in an enclosed space, future work will add constraints (such as elasticity and collision) and adjust the parameters (or add other algorithms) to shorten the estimation time. In addition, the objective function depends on the change of the whole alpha-shape volume, and the parameter α is directly related to the point density and the level of boundary detail. Since the proposed method requires choosing an appropriate parameter α, segmenting the point cloud of the accommodation space and building the 3D object, our future work will focus on an adaptive alpha-shape algorithm, acquiring the whole scene point cloud and segmenting the space point cloud.