1. Introduction
Vision measurement is the application of computer vision [1,2] to accurately measure the spatial geometry and pose of objects [3]. Vision measurement technology is non-contact, highly accurate, fast in response, and resistant to disturbance, and it has been applied to many industrial fields in recent years [4,5]. According to the features selected, machine-vision pose measurement can be divided into cooperative-target-based and non-cooperative-target-based pose measurement [6,7]. At present, the cooperative-target-based method is the most widely used and mature [8,9]. However, in practical applications this method often requires high-precision target mounting or a precise optical system to achieve high measurement accuracy [10,11], which greatly limits the application scope of vision measurement [12].
To address the shortcomings of existing machine-vision pose measurement techniques, a monocular vision measurement method based on a cooperative target is proposed. A new directional planar target is designed. The target consists only of circles and rings, is simple to design and machine, and is easy to adapt and use [13,14]. The pixel coordinates of the feature points are further optimised based on the covariance rule of the circle centres [15,16], and self-calibration of the camera is achieved by combining this with Zhang's method [17]. The target can be installed arbitrarily on the object to be measured; the relationship between the camera and the target is re-established through the camera's self-calibration, after which high-precision measurement can be carried out [18,19,20]. By solving the PnP problem, the rotation matrices corresponding to the reference image and the image to be measured are obtained. The transformation matrix is calculated from these two rotation matrices, and finally the pose angle of the target is obtained by decomposing the transformation matrix.
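As a concrete sketch of this last step, the plain-Python fragment below forms the transformation matrix from the two rotation matrices and decomposes it into pose angles. The axis conventions (pitch about the x axis, roll about the z axis, yaw assumed negligible) are an illustrative assumption, not taken from the paper:

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def rot_x(a):  # rotation about the x axis (pitch, by assumption)
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_z(a):  # rotation about the z axis (roll, by assumption)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def relative_pose_angles(R_ref, R_meas):
    """Transformation matrix dR = R_meas * R_ref^T between the reference image
    and the image to be measured, decomposed into pitch and roll in degrees
    (Z-Y-X Euler decomposition with the yaw term ignored)."""
    dR = mat_mul(R_meas, transpose(R_ref))
    pitch = math.degrees(math.atan2(dR[2][1], dR[2][2]))
    roll = math.degrees(math.atan2(dR[1][0], dR[0][0]))
    return pitch, roll

# example: reference at zero pose, measured pose rolled 10 deg and pitched 5 deg
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
R_meas = mat_mul(rot_z(math.radians(10)), rot_x(math.radians(5)))
pitch, roll = relative_pose_angles(I3, R_meas)
```

With these conventions the example recovers a pitch of 5° and a roll of 10°.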
2. Materials
The imaging process of the new cooperative target under the pinhole imaging model is shown in Figure 1 below. Figure 1 defines four right-handed coordinate systems: the world coordinate system, whose origin is set at the centre of the target's central circle; the camera coordinate system, whose z axis coincides with the optical axis of the camera lens (the camera coordinate system can take any pose in the world coordinate system, and the camera's external parameters convert between the two); the image physical coordinate system, with the centre of the image as the origin and the position of a pixel expressed in physical length; and the pixel coordinate system, which is described in pixels.
Assume there exists a point p on the image whose x-axis value in the pixel coordinate system is denoted u and whose y-axis value is denoted v. Its pixel coordinates can therefore be represented as (u, v), the 3D coordinates of the corresponding spatial point P as (X_w, Y_w, Z_w), and the homogeneous expressions of the two as [u, v, 1]^T and [X_w, Y_w, Z_w, 1]^T. The projection relation between them is shown in Equation (1):

s [u, v, 1]^T = K [R | t] [X_w, Y_w, Z_w, 1]^T    (1)
where s is the scale factor and [R | t] is the external parameter matrix, which transforms between the world coordinate system and the camera coordinate system; R is the rotation matrix, t is the translation vector, and the pose of the target corresponds one-to-one with its external parameter matrix. The camera's internal parameter matrix K is shown in Equation (2):

K = [[f_x, γ, u_0], [0, f_y, v_0], [0, 0, 1]]    (2)

where (u_0, v_0) are the coordinates of the image's principal point, f_x and f_y are the focal length values in pixels along the u and v axes, and γ is the skew factor of the pixel axes.
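The projection of Equations (1) and (2) can be sketched in a few lines of plain Python; the intrinsic values and the pose below are invented for illustration, not calibration results from the paper:

```python
def project_point(K, R, t, Pw):
    """Eq. (1): s [u, v, 1]^T = K [R | t] [Xw, Yw, Zw, 1]^T."""
    # camera-frame coordinates: Pc = R * Pw + t
    Pc = [sum(R[i][j] * Pw[j] for j in range(3)) + t[i] for i in range(3)]
    # homogeneous image coordinates: m = K * Pc
    m = [sum(K[i][j] * Pc[j] for j in range(3)) for i in range(3)]
    s = m[2]  # the scale factor is the depth along the optical axis
    return m[0] / s, m[1] / s

# illustrative intrinsics: fx = fy = 1000 px, principal point (640, 480), zero skew
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]]
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # zero rotation for the sketch
t = [0.0, 0.0, 2.0]                    # target plane 2 m in front of the camera
u, v = project_point(K, R, t, (0.1, 0.05, 0.0))  # point on the target plane
# projects to approximately (690.0, 505.0)
```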
Since the world coordinate system lies on the target plane, its z axis is perpendicular to the target, so the z-coordinates of the feature points are all equal to 0. The three-dimensional coordinates of point P can therefore be written as (X_w, Y_w, 0), with corresponding homogeneous coordinates [X_w, Y_w, 1]^T. Denote the i-th column vector of the rotation matrix R by r_i; substituting into Equation (1) gives Equation (3):

s [u, v, 1]^T = K [r_1, r_2, t] [X_w, Y_w, 1]^T    (3)
Importantly, the feature points of the target all lie on the same spatial plane. When the camera captures the feature points, a homography relationship therefore exists between the spatial plane and the image plane, i.e., the spatial coordinates of the feature points can be transformed into pixel coordinates by a homography matrix. Define the homography matrix as H; then H satisfies Equation (4):

s [u, v, 1]^T = H [X_w, Y_w, 1]^T    (4)

Substituting into Equation (3), we obtain Equation (5):

H = λ K [r_1, r_2, t]    (5)

where the scale invariance of homogeneous coordinates yields the arbitrary scalar λ. H is thus a 3 × 3 matrix; defining its columns by H = [h_1, h_2, h_3] and substituting into Equation (5), we have Equation (6):

[h_1, h_2, h_3] = λ K [r_1, r_2, t]    (6)
The rotation matrix R is orthogonal: its columns r_1 and r_2 form part of a standard orthonormal basis, mutually perpendicular and of unit modulus, i.e., satisfying r_1^T r_2 = 0 and ||r_1|| = ||r_2|| = 1. Combining this with Equation (6) gives Equation (7):

h_1^T K^(-T) K^(-1) h_2 = 0
h_1^T K^(-T) K^(-1) h_1 = h_2^T K^(-T) K^(-1) h_2    (7)
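These two constraints can be checked numerically: for a homography built from a known K and pose, h_1^T K^(-T) K^(-1) h_2 vanishes and the two quadratic forms coincide. The K, R, and t below are illustrative values, not calibration results from the paper:

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def inv3(M):
    # inverse of a 3x3 matrix via the adjugate / determinant
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

# illustrative K, a rotation about the y axis, and a translation
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]]
th = math.radians(20)
R = [[math.cos(th), 0, math.sin(th)], [0, 1, 0], [-math.sin(th), 0, math.cos(th)]]
t = [0.1, 0.2, 2.0]
# planar target (Zw = 0) drops r3: H = K [r1 r2 t]
Rt = [[R[0][0], R[0][1], t[0]],
      [R[1][0], R[1][1], t[1]],
      [R[2][0], R[2][1], t[2]]]
H = mat_mul(K, Rt)
B = mat_mul(transpose(inv3(K)), inv3(K))   # B = K^-T K^-1
h1 = [[H[i][0]] for i in range(3)]         # column vectors of H
h2 = [[H[i][1]] for i in range(3)]
c12 = mat_mul(transpose(h1), mat_mul(B, h2))[0][0]  # h1^T B h2
c11 = mat_mul(transpose(h1), mat_mul(B, h1))[0][0]
c22 = mat_mul(transpose(h2), mat_mul(B, h2))[0][0]
# constraints of Eq. (7): c12 is ~0 and c11 equals c22
```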
Two basic constraints on the camera's internal parameters have thus been obtained. Let

B = K^(-T) K^(-1) = [[B_11, B_12, B_13], [B_12, B_22, B_23], [B_13, B_23, B_33]]    (8)

where B is a symmetric matrix that can be expressed as the six-dimensional vector

b = [B_11, B_12, B_22, B_13, B_23, B_33]^T    (9)

Let the i-th column vector of the homography matrix H be h_i = [h_i1, h_i2, h_i3]^T; then we have

h_i^T B h_j = v_ij^T b    (10)

where v_ij = [h_i1 h_j1, h_i1 h_j2 + h_i2 h_j1, h_i2 h_j2, h_i3 h_j1 + h_i1 h_j3, h_i3 h_j2 + h_i2 h_j3, h_i3 h_j3]^T. In this way, Equation (7) can be written as two homogeneous equations with b as the unknown:

[v_12^T; (v_11 - v_22)^T] b = 0    (11)
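The identity h_i^T B h_j = v_ij^T b can be sketched and verified numerically; the symmetric B and the H below are arbitrary illustrative values:

```python
def v_ij(H, i, j):
    """Row vector v_ij such that h_i^T B h_j = v_ij . b, with
    b = (B11, B12, B22, B13, B23, B33) and h_i the i-th column of H (0-based)."""
    hi = [H[r][i] for r in range(3)]
    hj = [H[r][j] for r in range(3)]
    return [hi[0]*hj[0],
            hi[0]*hj[1] + hi[1]*hj[0],
            hi[1]*hj[1],
            hi[2]*hj[0] + hi[0]*hj[2],
            hi[2]*hj[1] + hi[1]*hj[2],
            hi[2]*hj[2]]

# numerical check with an arbitrary symmetric B and an arbitrary H
B = [[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]]
b = [B[0][0], B[0][1], B[1][1], B[0][2], B[1][2], B[2][2]]
H = [[1.0, 0.5, 3.0], [0.2, 2.0, 1.0], [0.1, 0.3, 1.0]]
h1 = [H[r][0] for r in range(3)]
h2 = [H[r][1] for r in range(3)]
lhs = sum(h1[r] * sum(B[r][c] * h2[c] for c in range(3)) for r in range(3))
rhs = sum(x * y for x, y in zip(v_ij(H, 0, 1), b))
# lhs and rhs agree up to floating-point rounding
```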
However, the coefficient matrix of Equation (11) has only two rows, which is not enough to solve for the six-dimensional vector b. Therefore, we need to shoot the target n times to obtain n homography matrices and a coefficient matrix with 2n rows. When n ≥ 3, we can solve for the six-dimensional vector b. That is, at least 3 pictures of the target are needed to complete the calibration of the camera, and the total system of equations can then be expressed as:

V b = 0    (12)

where V is the 2n × 6 matrix of coefficients.
Equation (12) is scale-equivalent, i.e., b multiplied by any factor is still a correct solution, so the matrix B formed from b does not strictly satisfy B = K^(-T) K^(-1); rather, there exists a scale factor λ that satisfies Equation (13):

B = λ K^(-T) K^(-1)    (13)

By deduction, all camera parameters can be calculated from Equation (13). For each image, the external parameters R and t can then be calculated by Equation (14):

r_1 = λ K^(-1) h_1,  r_2 = λ K^(-1) h_2,  r_3 = r_1 × r_2,  t = λ K^(-1) h_3    (14)

where λ = 1/||K^(-1) h_1|| = 1/||K^(-1) h_2||.
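The recovery of the external parameters in Equation (14) can be sketched as a round-trip: build a homography from a known pose, then recover it. The helper inv3 and the chosen K, R, and t are assumptions for this sketch, not values from the paper:

```python
import math

def inv3(M):
    # inverse of a 3x3 matrix via the adjugate / determinant
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def extrinsics_from_H(K, H):
    """Recover R and t from one homography, per Eq. (14):
    lambda = 1/||K^-1 h1||, r1 = lambda K^-1 h1, r2 = lambda K^-1 h2,
    r3 = r1 x r2, t = lambda K^-1 h3."""
    Ki = inv3(K)
    a1, a2, a3 = (matvec(Ki, [H[r][c] for r in range(3)]) for c in range(3))
    lam = 1.0 / math.sqrt(sum(x * x for x in a1))
    r1 = [lam * x for x in a1]
    r2 = [lam * x for x in a2]
    r3 = [r1[1]*r2[2] - r1[2]*r2[1],
          r1[2]*r2[0] - r1[0]*r2[2],
          r1[0]*r2[1] - r1[1]*r2[0]]
    t = [lam * x for x in a3]
    R = [[r1[i], r2[i], r3[i]] for i in range(3)]
    return R, t

# round-trip check with an illustrative pose: H = K [r1 r2 t], then recover
th = math.radians(15)
R_true = [[math.cos(th), 0, math.sin(th)],
          [0, 1, 0],
          [-math.sin(th), 0, math.cos(th)]]
t_true = [0.1, 0.2, 2.0]
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 480.0], [0.0, 0.0, 1.0]]
cols = [[R_true[r][0] for r in range(3)],
        [R_true[r][1] for r in range(3)],
        t_true]
H = [[sum(K[r][k] * cols[c][k] for k in range(3)) for c in range(3)] for r in range(3)]
R_rec, t_rec = extrinsics_from_H(K, H)
```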
4. Results
To verify the feasibility of the algorithm and analyse its accuracy, the following experiments are carried out. The first group is a simulation-based camera calibration experiment, used to verify the advantage of the topology-based fitting-intersection algorithm, as well as the difference in camera-parameter calibration accuracy between the new directional target and the traditional checkerboard target. The second group is a pose-solving simulation experiment, used to verify the theoretical accuracy of the overall system in the absence of hardware errors. The third group is an actual shooting experiment for pose solving.
4.1. Camera Calibration Simulation Experiment
Based on the 3ds Max platform, a simulation experimental system is designed. Using the camera imaging principle, and without considering lens aberration, image noise, mechanical errors generated during installation, or mechanical errors in the plane motion to be measured, it transforms the zero-pose image into the image to be measured according to the input pose transformation values, replacing actual shooting. The camera is then self-calibrated according to the method of this paper, and 20 images are simulated; the feasibility of the method is demonstrated by calculating the reprojection error of each feature point in each image. In the simulation platform, the physical dimensions of the new target are set to 220 mm × 220 mm.
To verify the difference in camera-parameter calibration accuracy between the new directional target and the traditional checkerboard target, and considering the potential ambiguity of a checkerboard target after a 180-degree rotation, a simulated checkerboard target with a physical size of 230 mm × 220 mm is generated. Subsequently, 20 images are captured for camera calibration using the aforementioned simulation system, and the calibration results are calculated with the MATLAB Camera Calibration Toolbox.
The reprojection error of each feature point in the 20 images is expressed as a Euclidean distance in pixels. There are 462 feature points on the traditional checkerboard target and 441 on the new directional target. The experimental results are shown in Figure 5. The green line represents the reprojection error of each point in the 20 images of the traditional checkerboard target, and the dark green line is the average of the 20 error values; the blue line represents the reprojection error of each point in the 20 images of the new target without fitting the intersection points, and the dark blue line is the average of the 20 error values; the red line represents the reprojection error of each point in the 20 images of the new target after fitting the intersection points, and the dark red line is the average of the 20 error values.
From the results shown in Figure 5, it can be seen that the reprojection error of the traditional checkerboard target is larger than that of the new directional target even without fitting the intersection points; after the intersection points are fitted to the feature points, the reprojection error is further reduced, and so is the variation in reprojection error between individual points.
4.2. Pose-Solving Simulation Experiment
Based on the above simulation platform, after obtaining the calibration results, the corresponding pictures are generated by simulation according to pre-set angles. The simulation experiment is divided into two parts: the first detects the error of the system itself in four different pose situations; the second calculates the repeatability error of the system by subtracting the systematic error from the obtained results.
To cover more pose angles and demonstrate the measurement range of the system, four representative cases were selected: a smaller rotation angle in the positive direction, a smaller rotation angle in the negative direction, a larger rotation angle in the positive direction, and a larger rotation angle in the negative direction. Fifteen images were generated by simulation for each case. The simulation results are shown in Figure 6, Figure 7, Figure 8 and Figure 9, respectively. In each figure, the upper line chart represents the pitch angle error, the lower line chart represents the roll angle error, the green line represents the average error of the attitude angle, and the red line represents the maximum error of the attitude angle. The upper and lower line charts together give the complete attitude angle errors of the simulated images.
From the results shown in Figure 6, Figure 7, Figure 8 and Figure 9, it can be seen that with this method, the measurement error of the roll angle perpendicular to the optical axis direction is at most 0.001°, with an average error of at most 0.00044°; the measurement error of the pitch angle is at most 0.008°, with an average error of at most 0.00551°. The measurement error of the roll angle is generally much smaller than that of the pitch angle; the accuracy decreases as the pitch angle increases, and the angular error in the roll direction is more stable.
To further test the stability of the pose solving, two poses were selected: roll angle 10° with pitch angle 5°, and roll angle −20° with pitch angle −10°; 15 images were generated by simulation for each pose. Their average error was used as the systematic error, and the difference between each image's error and the systematic error was used as the repeatability error. The experimental results are shown in Figure 10 and Figure 11. The upper line chart represents the pitch angle error, the lower line chart represents the roll angle error, the green line represents the average error of the attitude angle, and the red line represents the maximum error of the attitude angle. The upper and lower line charts together give the complete attitude angle errors of the simulated images.
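The repeatability procedure described above can be sketched in a few lines; the measured values below are hypothetical, not data from the experiment:

```python
def repeatability_errors(measured, true_value):
    """Per the procedure above: the mean error over repeated shots is taken as
    the systematic error; subtracting it from each shot's error gives the
    repeatability errors."""
    errors = [m - true_value for m in measured]
    systematic = sum(errors) / len(errors)
    return systematic, [e - systematic for e in errors]

# hypothetical repeated roll-angle readings at a true angle of 10 degrees
measured = [10.0121, 10.0118, 10.0123, 10.0120]
sys_err, rep = repeatability_errors(measured, 10.0)
# sys_err is about 0.01205 deg; the repeatability errors centre on zero
```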
From the results shown in Figure 10 and Figure 11, it can be seen that with this method, the maximum repeated-measurement error of the roll angle perpendicular to the optical axis direction is 0.00008°, and the maximum average repeated-measurement error is 0.00003°; the maximum repeated-measurement error of the pitch angle is 0.0004°, and the maximum average repeated-measurement error is 0.00013°. The analysis shows that short-term repeated measurements exhibit very small error variation, and the errors are randomly distributed.
The above simulation experiments show that the method can theoretically achieve very high pose-solving accuracy, a large measurement range, and very high repeat-measurement accuracy. However, the algorithm still has a small error in the extraction of the ellipse centre coordinates, so the simulated camera calibration parameters also carry a certain error, which ultimately leads to a small residual difference between the simulation results and the actual pose.
4.3. Pose-Solving Actual Shooting Experiments
The experimental system for monocular vision pose measurement built in the actual scene is shown in Figure 12; the testing platform mainly consists of a two-axis rotary table and a camera. The two-axis rotary table is a PTU-E46-17P70T, with pitch-axis accuracy up to 0.003° and roll-axis accuracy up to 0.013°. The camera is a Basler ace2 a2A2840-48ucBAS, equipped with a Sony IMX546 CMOS chip, with a frame rate of 48 fps, a resolution of 8 megapixels, and horizontal and vertical pixel dimensions of 2.74 μm. The lens is a Ricoh FL-CC0616A-2M, with a focal length of 6.0 mm and an aperture of F1.4-F16.0.
Following the settings of the simulation experiment, pictures are taken at the same poses. The actual shooting pose-solving results are shown in Figure 13, Figure 14, Figure 15 and Figure 16. The upper line chart represents the pitch angle error, the lower line chart represents the roll angle error, the green line represents the average error of the attitude angle, and the red line represents the maximum error of the attitude angle. The upper and lower line charts together give the complete attitude angle errors of the actual shooting images.
From the results shown in Figure 13, Figure 14, Figure 15 and Figure 16, it can be seen that with this method, once camera imaging noise and the overall mechanical error are taken into account, the measurement error of the roll angle perpendicular to the optical axis direction is at most 0.01911°, with an average error of at most 0.01423°; the measurement error of the pitch angle is at most 0.03077°, with an average error of at most 0.02085°. The measurement error of the roll angle is generally much smaller than that of the pitch angle, the accuracy decreases as the pitch angle increases, and the angular error in the roll direction is more stable. Apart from the large difference in accuracy, the error trends are consistent with the simulation results.
As in the simulation experiments, the two poses of roll angle 10°, pitch angle 5° and roll angle −20°, pitch angle −10° were selected, and 15 pictures were taken for each pose. Their average error was taken as the systematic error, and the difference between each image's error and the systematic error was taken as the repeatability error. The experimental results are shown in Figure 17 and Figure 18. The upper line chart represents the pitch angle error, the lower line chart represents the roll angle error, the green line represents the average error of the attitude angle, and the red line represents the maximum error of the attitude angle. The upper and lower line charts together give the complete attitude angle errors of the actual shooting images.
The results in Figure 17 and Figure 18 show that the maximum repeated-measurement error of the roll angle perpendicular to the optical axis direction is 0.00039°, with a maximum average repeated-measurement error of 0.00014°; the maximum repeated-measurement error of the pitch angle is 0.0004°, with a maximum average repeated-measurement error of 0.0009°. The short-term repeated measurements show very small error variation, and the errors are randomly distributed.
In the real scene, due to factors such as ambient temperature, camera noise, and errors of the mechanical turntable, there are significant differences in accuracy compared with the simulation results. The pose variation range selected in Figure 13 and Figure 14 is extremely small, making the error distribution difficult to discern; Figure 15 and Figure 16 are influenced by external factors, so their error distributions are less obvious. Overall, however, the conclusions obtained from actual shooting are consistent with the simulation results: the angular error in the pitch direction grows with the pitch angle, while the angular error in the roll direction is relatively stable, and the repeated-measurement errors are randomly distributed.
5. Conclusions
Monocular vision measurement has been a hotspot in industry owing to its simple equipment and few operating steps. A pose measurement method based on machine vision and a new type of directional target is proposed to achieve high-precision measurement of object pose. The new directional target is designed and manufactured, and the matching algorithm for the target is implemented; high-precision calibration of the camera is realised based on the target; and several algorithms are introduced to improve the accuracy of the feature-point coordinates and further reduce the pose-calculation error.
Several experiments show that the camera calibration parameters obtained with this target are more accurate than those obtained with the traditional checkerboard target, and the object's pose can be calculated with high accuracy. The system has a simple structure, fast speed, high precision, and good stability.
The significance of the pose measurement method proposed in this paper is that self-calibration of the camera can be realised without external equipment, after which high-precision pose data of the object to be measured can be obtained. The method allows the target to be installed arbitrarily, which further reduces the application requirements of vision measurement, simplifies the measurement steps while ensuring measurement accuracy, and broadens the application scope of monocular vision measurement.