1. Introduction
3D urban building models play an important role in the association, convergence and integration of economic and social urban data and have been widely used in various fields, e.g., smart city construction, comprehensive social management, and emergency decision-making. Moreover, the rapid development of emerging industries, e.g., the self-driving industry, the building information modelling (BIM) industry and indoor autonomous navigation, has created the need for more detailed and more accurate 3D building models. Consequently, the investigation of 3D building models is a significant issue for both industry and researchers.
According to the international standard CityGML, a building model can be categorized into four levels of detail in the city model: LoD1, LoD2, LoD3 and LoD4 [1]. The LoD1 building model is a block model composed of prismatic buildings with flat roofs. A building model in LoD2 has differentiated roof structures and thematically differentiated boundary surfaces. A building model in LoD3 has detailed wall and roof structures, potentially including doors and windows. A LoD4 building model refines LoD3 by adding the interior structures of buildings.
Most current works concentrate on the reconstruction of building models at the LoD2 and LoD3 levels. The generation procedure for LoD2-level building models is relatively mature and reliable [2,3,4,5,6,7,8]. These models can be constructed fully automatically, where data-driven or model-driven methods are used to extract roof structures and flat facades from airborne laser scanning and aerial image data. In contrast, the automatic and accurate reconstruction of LoD3 building models is difficult and challenging because of the complexity of urban building geometry and topology, especially when only a single data source is used. The development of more remote sensing sensors and platforms, e.g., LiDAR scanning from terrestrial, mobile and UAV platforms and UAV oblique photogrammetry, provides a good opportunity to reconstruct more accurate LoD3 building models more efficiently. Elements such as windows, doors, eaves and balconies in LoD3 building models can be acquired by integrating the above multi-source remote sensing data.
Almost all contemporary LoD3 building model reconstruction works using point cloud data are based on knowledge-based approaches. Wang presented a semantic modelling framework-based approach for automatic building model reconstruction, which exploits semantic feature recognition from airborne point cloud data and XBML code to describe a LoD3 building model [9]. Pu and Vosselman [10] and Wang et al. [11] reconstructed detailed building elements using knowledge about the area, position, orientation and topology of segmented point cloud clusters. Lin et al. [12] proposed a complete system to semantically decompose and reconstruct 3D models from point clouds and built a three-level semantic tree structure to reconstruct the geometric model with basic decomposed and fitted blocks. Nan et al. [13] developed a smart and efficient interactive method to model building facades with assembled “SmartBoxes”. The above methods take full advantage of the semantic information derived from high-precision point clouds and benefit the reconstruction of the LoD3 building topology. Unfortunately, they mainly concentrate on model integrity and topological correctness and usually have low accuracy in the building elements. The accuracy of building elements, e.g., the position accuracy of the corners and edges of building roofs and windows, is significant in many actual urban applications such as building illumination analysis. However, point clouds are discretely sampled and easily affected by many factors, e.g., occlusion, field of view (FOV) and noise; their density is not homogeneous; and data gaps frequently occur [14]. This causes considerable difficulty in subsequent semantic recognition and edge extraction, particularly for buildings with complex local structures.
Photogrammetric sequential imagery is another important data source for building model reconstruction. In recent years, thanks to techniques for dense matching, triangulation and texture mapping, 3D scene models based on multi-view oblique images can be generated with a high level of automation and excellent performance in feature details [15,16,17,18]. However, such models are simply integrated as textured meshes and lack semantic knowledge. Some researchers have tried to use sequential images to model buildings as piecewise planar facades [19,20,21,22,23,24]. Unfortunately, shadows, occlusions and a lack of texture in the image data result in both local geometric distortion and low accuracy, making it difficult to satisfy the requirements of refined 3D building model reconstruction.
The integration of point clouds and imagery data for detailed building modelling can help to solve the above problems [14,25,26,27,28,29]. Researchers have made several attempts to integrate multi-source data for building model reconstruction. Kedzierski and Fryskowska [30] discussed and analysed different methods for integrating terrestrial and airborne laser scanning data, and evaluated their accuracies for building model reconstruction. Later, Kedzierski et al. [29] proposed an automated algorithm to integrate terrestrial point clouds with remote sensing images in the near-infrared and visible range, which achieves good accuracy for cultural heritage documentation. Pu [25] extracted feature lines from single-view close-range images to refine the edges of building models. Huang and Zhang [26] presented a building model reconstruction method that uses airborne point clouds for roof surface extraction and aerial images for junction line detection. Wang et al. [31] automatically decomposed compound buildings with symmetric roofs into semantic primitives for LoD2 building model reconstruction, by exploiting the local symmetry contained in the building structure. Gruen et al. [27] developed a city modelling workflow that adopts UAV images to reconstruct the roof surfaces and terrestrial mobile mapping point clouds for the facades. Yang et al. [28] combined terrestrial LiDAR data and image data to achieve a better representation of the building model. These reported methods contribute significantly towards integrated building model reconstruction, but they mainly focus on data registration and feature fusion, and the automation level of building model reconstruction could still be improved.
In this paper, we propose an automatic multi-source data integration method to reconstruct LoD3 building models and improve the accuracy and efficiency of building model reconstruction. The method takes full advantage of the accurate planar surfaces extracted from multi-source laser point clouds and uses them as objective constraints in the boundary extraction process from oblique images; the extracted boundaries are then projected onto the planes to improve the reconstruction performance of the building edges. The main contribution of our paper is extracting features from different data sources and using them as mutual constraints to reconstruct a detailed and accurate building model frame for further interactive operation.
The rest of this paper is arranged as follows. Section 2 presents the methodology of the proposed method. Section 3 describes the experimental data used to verify the performance of our method and presents the results. Section 4 discusses our experiments and Section 5 summarizes the conclusions.
2. Methodology
Figure 1 illustrates the main procedure of the proposed method. First, building roofs are extracted from UAV LiDAR scanning point clouds by normal vector clustering and segmentation. Using the building roofs as outline constraints, their corresponding facade areas in the terrestrial point clouds are efficiently identified to obtain the integrated UAV and terrestrial point clouds belonging to the same building. Second, building plane primitives are extracted from the integrated point clouds by the Random Sample Consensus (RANSAC) algorithm [32] and are further used as planar objective constraints during feature line extraction from the oblique images. Third, the feature lines are employed to improve the reconstruction accuracy of the building outlines during topology construction and boundary optimization. Finally, interactive topology editing and texture mapping are used to achieve the desired refined building models. It should be noted that the prerequisite is the fine spatial registration of the multi-source point clouds and remote sensing images.
2.1. Building Segmentation from UAV LiDAR Scanning Point Clouds
The building points, that is, the point clouds denoting the same building, are obtained by segmenting the UAV LiDAR scanning point clouds, which involves the following two steps.
(1) Ground point elimination by CSF. The Cloth Simulation Filtering (CSF) algorithm [33] is applied to filter out the ground points. It uses a piece of simulated cloth to generate a fitting surface via operations such as gravity displacement, intersection checking, and internal forces. The ground points are then filtered out by the height differences between the initial LiDAR points and the fitted ground surface. The parameters of CSF are easy to set, so it is adopted for filtering our point clouds; a usage sketch is given below.
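As a rough illustration, the following sketch runs CSF-style ground filtering, assuming the open-source Python bindings of the CSF algorithm (the cloth-simulation-filter package). The input file name and all parameter values are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np
import CSF  # pip install cloth-simulation-filter

xyz = np.loadtxt("uav_lidar.txt")    # hypothetical N x 3 point array

csf = CSF.CSF()
csf.params.bSloopSmooth = False      # no slope post-processing for urban scenes
csf.params.cloth_resolution = 0.5    # cloth grid size in metres (illustrative)
csf.params.class_threshold = 0.5     # height-difference threshold to the cloth
csf.setPointCloud(xyz)

ground_idx, non_ground_idx = CSF.VecInt(), CSF.VecInt()
csf.do_filtering(ground_idx, non_ground_idx)  # fills the two index lists

non_ground_xyz = xyz[np.array(non_ground_idx)]  # points kept for building extraction
```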
(2) Extraction of building roofs and facades. Building roofs and facades are mainly horizontal or vertical planes. Accordingly, by estimating the point cloud normal vectors and computing the angles between the normals and the vertical direction, the horizontal and vertical points among the non-ground points can be identified with a simple normal angle thresholding scheme. The k-means clustering algorithm with Euclidean distance is then adopted to group the labelled horizontal and vertical points. Owing to the irregular distribution of their normal vectors, vegetation regions form clusters with relatively few points and small areas, so by setting an appropriate threshold on the number of points in each cluster, the vegetation points can also be filtered out. The remaining clusters are identified as building roofs and facades; a sketch of this labelling and clustering step follows.
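A minimal sketch of the normal-angle labelling and k-means grouping, assuming Open3D for normal estimation and scikit-learn for clustering; non_ground_xyz comes from the previous step, and the angle thresholds, cluster count and minimum cluster size are illustrative assumptions.

```python
import numpy as np
import open3d as o3d
from sklearn.cluster import KMeans

# Estimate normals on the non-ground points (non_ground_xyz: N x 3 array).
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(non_ground_xyz)
pcd.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=1.0, max_nn=30))
normals = np.asarray(pcd.normals)

# Angle between each normal and the vertical direction, in degrees.
angle = np.degrees(np.arccos(np.clip(np.abs(normals[:, 2]), 0.0, 1.0)))
horizontal = angle < 15.0               # roof candidates
vertical = np.abs(angle - 90.0) < 15.0  # facade candidates

# Group the labelled points with k-means; drop small (vegetation-like) clusters.
def clusters(xyz, k=8, min_points=500):
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(xyz)
    return [xyz[labels == c] for c in range(k) if np.sum(labels == c) > min_points]

roof_clusters = clusters(non_ground_xyz[horizontal])
facade_clusters = clusters(non_ground_xyz[vertical])
```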
2.2. Plane Primitives and Feature Lines Extraction
Most building surface models can be divided into two parts: multi-scale planes and their edges. For an accurate building reconstruction with a high level of automation, two main steps are adopted to simultaneously improve the accuracy of the surface positions and the integrity of the edges. The multi-source point clouds ensure the surface accuracy, while the oblique images provide good edge performance. The details are explained as follows.
(1)
The extraction of building plane primitives from multi-source point clouds. The plane primitives are the main features of a building frame structure, and hence robust planar surface extraction is a significant step in building model reconstruction. The primitive extraction is implemented in three steps. First, an approximate building bounding box is calculated from the building roof and facade points extracted from the UAV LiDAR point clouds, and the building points within this bounding box are selected from the terrestrial scanning point clouds. Then, the RANSAC algorithm is used to extract plane primitives from the integrated point clouds belonging to the same building; the points are first clustered according to their normal vectors, and to promote iterative convergence, the mean normal direction and the centroid of each cluster are used as the initial plane parameters. Finally, the Alpha Shape [34] method is used to extract the boundary of each segmented plane because of its flexibility. A sketch of the RANSAC plane extraction step is given after this item.
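A minimal sketch of iterative RANSAC plane peeling, assuming Open3D's built-in segment_plane as a stand-in for our normal-cluster-seeded RANSAC; the thresholds are illustrative, and the alpha-shape boundary step is omitted.

```python
import open3d as o3d

def extract_plane_primitives(pcd, dist=0.05, min_inliers=2000, max_planes=20):
    """Iteratively peel plane primitives off the integrated building cloud.

    Returns a list of (plane_model, inlier_cloud); plane_model is (a, b, c, d)
    with a*x + b*y + c*z + d = 0.
    """
    primitives = []
    rest = pcd
    for _ in range(max_planes):
        if len(rest.points) < min_inliers:
            break
        model, inliers = rest.segment_plane(distance_threshold=dist,
                                            ransac_n=3, num_iterations=1000)
        if len(inliers) < min_inliers:
            break                      # remaining planes are too small to keep
        primitives.append((model, rest.select_by_index(inliers)))
        rest = rest.select_by_index(inliers, invert=True)
    return primitives
```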
(2)
Feature line extraction from oblique images. The surface accuracy of the building plane primitives extracted from the point clouds is high. However, owing to factors such as discretization, noise, and occlusion, the plane boundaries are jagged or even partially missing (e.g., the jagged boundary regions of the segmented planes shown in Figure 2), especially for windows and doors. In contrast, the resolution of oblique images is much higher and more uniform, and hence they can provide much more detailed and accurate edge information through the greyscale variations in the images. Accordingly, we combine the image edge information with the plane primitives to form a complete surface model.
The procedure of image feature line extraction includes two steps: image selection and line matching. Image selection chooses the most appropriate stereo image pair for feature line matching. Since the orientation of an extracted planar surface is arbitrary, the closer an image is to the orthographic viewing direction of that plane, the better the performance of its edge features. To select the stereo pair with the best imaging angle, each plane primitive is projected onto the candidate images, and the top two images with the largest projected polygon areas are selected for stereo line matching. For example, $P_1$, $P_2$ and $P_3$ in Figure 2 represent the projections of the segmented plane $S$ onto the images $I_1$, $I_2$ and $I_3$, where $\boldsymbol{n}$ is the normal vector of $S$. Since the projected areas satisfy $A(P_1) > A(P_2) > A(P_3)$, where $A(\cdot)$ denotes the area of a projected polygon, the images $I_1$ and $I_2$ are selected for feature line extraction. A sketch of this selection step is given below.
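A minimal sketch of the projected-area ranking, assuming pinhole cameras given as (K, R, t) with x_cam = R·X + t; visibility and image-boundary checks are omitted for brevity.

```python
import numpy as np

def project(K, R, t, pts3d):
    """Project N x 3 world points into pixel coordinates (pinhole model)."""
    cam = (R @ pts3d.T + t.reshape(3, 1)).T
    uv = (K @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def polygon_area(uv):
    """Shoelace formula for the area of a closed 2D polygon."""
    x, y = uv[:, 0], uv[:, 1]
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def select_stereo_pair(plane_boundary3d, cameras):
    """Rank images by the projected area of the plane boundary polygon and
    return the indices of the top two (the stereo pair for line matching)."""
    areas = [polygon_area(project(K, R, t, plane_boundary3d))
             for K, R, t in cameras]
    first, second = np.argsort(areas)[::-1][:2]
    return first, second
```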
During feature line extraction, the widely used Edge Drawing Lines (EDLines) [35] method is used to extract the feature lines, and the Line Band Descriptor (LBD) [35] is adopted to describe them. It should be noted that an initial matching method is used to construct an adjacency matrix, and the matching lines are then acquired through a consistency iteration computation. For building areas in oblique images, the repeating textures and the similarity of the local greyscale distributions make it difficult to obtain correct matching lines. For example, the white lines in Figure 3 represent lines that failed to be matched, whereas lines in the left and right images connected by a line of the same colour show the matching results. The bold yellow ellipses denote incorrect matching pairs; only a few lines are correctly matched.
Therefore, the planes extracted from the point clouds are again used as objective constraints to improve the matching results. The modification to the original matching algorithm is that the corresponding objective plane is employed to calculate an approximate but reliable initial matching position for each feature line on the reference image. As illustrated in Figure 4, $l_1$ and $l_2$ represent edge lines extracted from the projection area of the objective plane $S$ in the images $I_1$ and $I_2$, respectively. $L$ denotes the forward intersection of $l_1$ with the plane $S$, and $l_1'$ is the back projection of $L$ onto the image $I_2$, representing the initial matching position of $l_1$. If $l_2$ is the matching line of $l_1$, then $l_2$ should be close to $l_1'$ and their LBD vectors should be consistent with each other, which makes the matching result more reliable.
Moreover, Figure 5 shows the main steps of the improved feature line matching algorithm. The extracted planes are first employed to select the stereo pair with the best viewing angle. The EDLines method is then used to extract candidate feature lines from the projection areas. To obtain a better matching result, the planar surfaces obtained in the first step are again used as constraints providing the initial positions of potential matches, as sketched below.
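A minimal sketch of the plane-constrained prediction, assuming the same (K, R, t) camera convention as above and a plane given as (a, b, c, d) with n·X + d = 0; line endpoints are assumed to be pixel coordinates.

```python
import numpy as np

def ray_plane_intersection(K, R, t, uv, plane):
    """Forward-intersect the viewing ray of pixel uv with the objective plane."""
    n, d = np.asarray(plane[:3]), plane[3]
    centre = -R.T @ t                                  # camera centre, world frame
    ray = R.T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    s = -(n @ centre + d) / (n @ ray)                  # ray parameter at the plane
    return centre + s * ray

def predict_match_position(cam1, cam2, line_uv1, plane):
    """Back-project a line from image 1 via the plane into image 2, giving the
    initial matching position used to prune the LBD candidate set."""
    K1, R1, t1 = cam1
    K2, R2, t2 = cam2
    pts3d = np.array([ray_plane_intersection(K1, R1, t1, p, plane)
                      for p in line_uv1])              # 2 x 3 endpoints on the plane
    cam = (R2 @ pts3d.T + t2.reshape(3, 1)).T
    uv = (K2 @ cam.T).T
    return uv[:, :2] / uv[:, 2:3]                      # predicted endpoints, image 2
```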
2.3. Topology Graph Construction and Boundary Optimization
Through the abovementioned steps, the accurate planar surfaces and edges are acquired automatically. However, the boundaries of surfaces and extracted feature edges still need to be fused and optimized for a consistent boundary expression. To generate the initial building model frame with full automation, three steps including feature line projection, topology graph creation and boundary optimization are further employed in this section. They are explained in the following paragraphs.
(1)
Image feature line projection. The feature lines extracted from the images mainly include the outlines of walls, windows, and balconies. Because of the accuracy inconsistency between the stereo matching lines and the extracted planar surfaces, it is necessary to project the image feature lines onto the objective planes to establish the boundary constraints. As shown in Figure 6, during the projection of a feature line onto its corresponding segmented plane, $L$ represents the stereo forward intersection of the matching lines, and $L'$ represents the result of projecting $L$ onto the plane along the plane's normal vector $\boldsymbol{n}$. A threshold $d_{\max}$ on the projected distances of the line endpoints (shown as $d_1$ and $d_2$ in Figure 6) is used to eliminate possible incorrect matches, together with an angle threshold $\theta$ that checks whether the direction of the line is approximately perpendicular to the normal vector of the projected plane; a sketch of this test is given after this item.
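A minimal sketch of the projection-and-rejection test, assuming the plane is (n, d) with unit normal and that the d_max and theta values are illustrative.

```python
import numpy as np

def project_line_to_plane(p0, p1, plane, d_max=0.2, theta=80.0):
    """Project a forward-intersected 3D segment (p0, p1) onto its objective
    plane along the normal; return None when the segment looks like a wrong
    match (endpoints too far from the plane, or line not lying in the plane)."""
    n, d = np.asarray(plane[:3], dtype=float), plane[3]
    n /= np.linalg.norm(n)
    d0, d1 = n @ p0 + d, n @ p1 + d           # signed endpoint distances
    if abs(d0) > d_max or abs(d1) > d_max:
        return None                            # too far from the plane: reject
    u = (p1 - p0) / np.linalg.norm(p1 - p0)
    ang = np.degrees(np.arccos(np.clip(abs(u @ n), 0.0, 1.0)))
    if ang < theta:                            # not nearly perpendicular to n
        return None
    return p0 - d0 * n, p1 - d1 * n            # projected endpoints on the plane
```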
(2)
Topology graph creation using a RAG. Based on the plane segmentation in Figure 7a, the topology graph, a Region Adjacency Graph (RAG), is constructed according to the adjacency relationships of the plane primitives. Two steps are involved in the construction of the RAG. The first step is to compute the adjacency matrix in Figure 7b, which is a symmetric matrix with zero diagonal elements. All the non-zero elements in the matrix are then extracted and transformed into an undirected graph, as shown in Figure 7c, whose edges indicate that two planes are adjacent and intersect. The adjacency relationship between two planes $S_i$ and $S_j$ is determined as follows. First, the angle between the normal vectors of the planes is calculated; if the angle is smaller than a threshold (normally set to 15 degrees), the two planes are nearly parallel and considered not adjacent. Otherwise, the boundary of the segmented plane primitive $S_i$, represented as short connected line segments, is traversed. For each short boundary line of $S_i$, the projected distances $d_1$ and $d_2$ from its two endpoints to the plane $S_j$ are calculated, as well as the line length $l$. If $d_1$ and $d_2$ are both smaller than a distance threshold $d_t$, and the endpoints projected along the normal vector of plane $S_j$ fall within the extent of $S_j$, the short line is adjacent to plane $S_j$ and is added into the set of adjacent boundary lines $E_{ij}$. Finally, after all the short boundary lines of $S_i$ have been processed, the total length $l_{sum}$ of the lines in $E_{ij}$ is calculated. If $l_{sum} > l_t$, where $l_t$ is a length threshold, the planes $S_i$ and $S_j$ are adjacent. A sketch of this adjacency test is given after this item.
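A minimal sketch of the RAG construction, assuming each plane is (n, d) with unit normal and each boundary is a list of short segments as endpoint pairs; networkx stands in for the adjacency matrix plus undirected graph of Figure 7, and the footpoint-within-extent check is noted but omitted.

```python
import numpy as np
import networkx as nx

def build_rag(planes, boundaries, ang_t=15.0, d_t=0.1, l_t=1.0):
    """Region Adjacency Graph over plane primitives.

    planes:      list of (n, d), unit normal n, plane equation n.X + d = 0
    boundaries:  boundaries[i] is a list of (p0, p1) short segments of plane i
    """
    G = nx.Graph()
    G.add_nodes_from(range(len(planes)))
    for i, (ni, di) in enumerate(planes):
        for j in range(i + 1, len(planes)):
            nj, dj = planes[j]
            ang = np.degrees(np.arccos(np.clip(abs(ni @ nj), 0.0, 1.0)))
            if ang < ang_t:
                continue                      # nearly parallel: not adjacent
            l_sum = 0.0                       # total length of segments of i on j
            for p0, p1 in boundaries[i]:
                d0, d1 = abs(nj @ p0 + dj), abs(nj @ p1 + dj)
                # NOTE: the check that the projected endpoints fall within the
                # extent of plane j is omitted here for brevity.
                if d0 < d_t and d1 < d_t:
                    l_sum += np.linalg.norm(np.asarray(p1) - np.asarray(p0))
            if l_sum > l_t:
                G.add_edge(i, j)              # planes i and j are adjacent
    return G
```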
(3)
Intersection and boundary optimization. The intersection line of each pair of adjacent planes is calculated using the constructed RAG. Two cases of plane intersection are considered, as shown in Figure 8a, where $S_1$ and $S_2$ represent two adjacent planes. The endpoints of the intersection segment in each plane are determined by the distances from the original boundary points to the intersection line (Figure 8b). The boundaries of each plane are divided into three categories: intersecting boundaries, non-intersecting outer boundaries and non-intersecting inner boundaries. For example, in Figure 8c, $b_1$, $b_2$ and $b_3$ represent non-intersecting outer boundaries, $b_4$ represents the intersecting boundary, and $w_1$, $w_2$, $w_3$ and $w_4$ represent the non-intersecting inner boundaries. The optimization of intersecting boundaries is conducted when more than two planes intersect with each other. The endpoint of an intersecting boundary is used to search for adjacent endpoints of intersecting boundaries in the other planes. If the adjacent boundaries belong to planes within a minimum closed loop of the RAG, the boundaries are supposed to intersect at the same point, and this common intersection point is calculated using the least squares method (Figure 8d); a numeric sketch follows.
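A minimal sketch of the two geometric computations in this step: the intersection line of two adjacent planes, and the least-squares common point of several intersection lines around a building corner. Planes follow the (n, d) convention used above; these are the standard formulas, not code from our implementation.

```python
import numpy as np

def plane_plane_intersection(n1, d1, n2, d2):
    """Intersection line of two planes n.X + d = 0: returns (point, direction)."""
    u = np.cross(n1, n2)
    # Solve for the line point closest to the origin (third row pins it down).
    A = np.stack([n1, n2, u])
    b = np.array([-d1, -d2, 0.0])
    return np.linalg.solve(A, b), u / np.linalg.norm(u)

def least_squares_corner(lines):
    """Point minimising the summed squared distances to 3D lines (p, u),
    u a unit direction: solve sum(I - u u^T) X = sum(I - u u^T) p."""
    A, b = np.zeros((3, 3)), np.zeros(3)
    for p, u in lines:
        M = np.eye(3) - np.outer(u, u)        # projector orthogonal to u
        A += M
        b += M @ p
    return np.linalg.solve(A, b)
```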
Non-intersecting outer boundary optimization integrates the original boundaries from the point clouds with the projected image feature lines. If an original boundary line is close enough to an image feature line, it is replaced by the projected feature line as the new boundary segment. This process is also applied to non-intersecting inner boundaries such as window outlines. The boundaries are further regularized to ensure that the result is consistent with the actual shape of the building, using the main direction adjustment method proposed by Dai et al. [36], sketched below. It should be noted that the optimized intersecting boundaries are kept unchanged; only the non-intersecting outer and inner boundaries are processed.
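As a rough illustration of main direction adjustment, the sketch below snaps boundary segments, given in 2D in-plane coordinates, to the nearest dominant building direction; the main directions and the tolerance are assumed inputs, and the actual method of Dai et al. [36] is more elaborate.

```python
import numpy as np

def regularize_segments(segments, main_dirs_deg=(0.0, 90.0), tol_deg=10.0):
    """Rotate each 2D segment about its midpoint onto the nearest main
    direction when their angular difference is below tol_deg."""
    out = []
    for p0, p1 in segments:
        p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
        v = p1 - p0
        ang = np.degrees(np.arctan2(v[1], v[0])) % 180.0
        mid, half = 0.5 * (p0 + p1), 0.5 * np.linalg.norm(v)
        for m in main_dirs_deg:
            if min(abs(ang - m), 180.0 - abs(ang - m)) < tol_deg:
                u = np.array([np.cos(np.radians(m)), np.sin(np.radians(m))])
                p0, p1 = mid - half * u, mid + half * u
                break                          # snapped to this main direction
        out.append((p0, p1))
    return out
```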
2.4. Interactive Model Editing
The main building body, including the roofs, main facades, and window boundaries, can be extracted and reconstructed with the preceding pipeline. However, some small problems such as topological errors or integrity deficiencies remain because of local data gaps or topological complexity. Accordingly, interactive manual editing, e.g., further topological inspection and geometrical detail adjustment, is still indispensable for reconstructing a refined building model. In this study, we use the 3D Max software for further interactive model editing to obtain the final detailed building model.
4. Discussion
3D building reconstruction using multi-view oblique images can achieve fine scene restoration, but obvious geometric deformation exists in certain local details, especially in texture-less areas. LiDAR sensors sample the building surfaces discretely and the building edge data cannot always be captured, so the precision of a building model reconstructed manually from point clouds is also limited. Compared with the reconstruction results from multi-view oblique images and from point clouds alone, the horizontal and elevation precisions of the building models from the proposed method are significantly improved. These improvements mainly benefit from the boundary-feature constraints on the non-intersecting outer and inner boundaries, especially for walls and windows.
In regard to efficiency, the reconstruction method using multi-view oblique images is time-consuming because of its dense matching and mesh reconstruction processes. The manual 3D Max reconstruction method requires the interactive collection of boundary feature lines from complex point clouds, which is labour-intensive, especially when occlusions and ambiguities occur frequently. The presented method takes full advantage of the feature lines extracted from the combination of point clouds and images and reconstructs the main building frame automatically and accurately, providing a more reliable reference for the later interactive editing. Thanks to the automatically reconstructed building frame, the reconstruction efficiency of the proposed method is considerably improved.
The proposed method provides an alternative way to reconstruct a building's main outline automatically and rapidly, but the results still need interactive editing. Accordingly, more effort is required to develop advanced methods that achieve fully automatic reconstruction of LoD3 building models. In future work, we will also propose multi-scale segmentation schemes for smooth surface primitives and enrich the existing primitive topology processing rules to further improve the generalization and automation of the method for complicated building structures.