1. Introduction
Extracting relevant knowledge from the abundant information in big data remains very challenging. Data mining technologies have eased the extraction of useful information from jumbled data by providing big data processing models, such as pattern recognition and cluster analysis, that discover characteristic structure in the data [1]. Clustering is one of the most prominent data mining methods for mining spatial information. Spatial clustering [2] groups data by analyzing their spatial characteristics and has been shown to perform well in various disciplines [3,4,5,6,7]: detecting the distribution of crime hotspots in crime analysis, identifying disease outbreak patterns in public health, characterizing climate in meteorology, detecting the distribution of earthquakes in geological exploration, and determining ecological landscape patterns in ecology. Spatial clustering can also serve as a preprocessing step for other analyses. For example, it may be used to generate objects in high-resolution remote sensing image classification, to address small-sample problems for rare events, to reduce data redundancy in geographic data visualization, and to identify groups in cartographic synthesis. In addition, polygons make up a large proportion of spatial data, and polygon clustering can be used for map generalization at different scales [8], watershed analysis, drought analysis, and spatial epidemiology [9]. Hence, clustering is a vital technique for spatial data analysis and related applications.
Spatial clustering approaches are divided into six categories: partition-, hierarchy-, density-, grid-, graph-, and model-based [10,11,12,13,14,15,16]. Although these categories differ widely, all of them depend on a measure of similarity. Because similarity between spatial polygons is fundamental to clustering [17], many studies have examined how different indices influence similarity. Early research considered only single properties [18]; investigations of multiple properties arose later [19] and are now more widely recognized [15]. Approaches based on geographical configuration and spatial cognition theory [20] can handle polygons with relatively regular and simple shapes. The work in [17] proposes a multi-level graph partitioning approach for clustering polygons that can handle more generic polygons with irregular and complex shapes. By defining spatial similarities between polygons accurately, these studies achieve better spatial clustering, which indicates that similarity measures that express the relations between polygons well are vital to spatial clustering.
Various spatial properties (such as area, orientation, and shape) have been used as indices for measuring similarities between polygons. However, conventional similarity mechanisms, which compute similarity rigidly as a ratio or a difference, limit how well the relations between spatial objects can be depicted. Because similarities measured by such conventional approaches lose detail, they should instead be calculated "softly" [21], in a fuzzy set (FS) manner [22]. Intuitionistic fuzzy sets (IFS) [23], a generalization of FS, can describe objects more realistically and practically. IFS extends the concept of FS by defining non-membership and uncertainty in addition to the original membership [24], which improves its ability to express objects and makes it applicable across more disciplines [25]. Research on IFS has shown that similarity and distance values vary widely across the different measures that have been proposed. Moreover, existing IFS measures may generate unreasonable results in specific situations [26], which limits the scope of IFS applications. To avoid these drawbacks, it is essential to apply IFS measures that are appropriate for depicting real-world objects. Beyond the four common geometric model-based IFS measures [27], the Interpolative Boolean Algebra (IBA) approach [28], which has a solid mathematical foundation, has advantages in describing objects. Similarities measured with the IFS-IBA approach preserve the details of the relations between objects; this is consistent with selecting more appropriate indices to capture more crucial details, and it allows similarities between spatial objects to be measured more finely. Multiple studies have supported the descriptive power of the IFS-IBA approach [29], through which similarity detection between polygons can be further improved.
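To make the IFS and IBA notions above concrete, the following Python sketch represents an intuitionistic fuzzy value by its membership mu and non-membership nu, derives the hesitancy pi = 1 - mu - nu, and implements the IBA equivalence 1 - a - b + 2(a ⊗ b), where ⊗ is a generalized product (the ordinary product and min are common choices). How the two equivalences are combined into a single similarity (here, a plain average) is an assumption made for illustration; it is not the IFS-IBA measure of [29] or the EIFS-IBA formula of this paper, and the names `IFValue`, `iba_equivalence`, and `ifs_iba_similarity` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IFValue:
    """An intuitionistic fuzzy value with membership mu and non-membership nu."""

    mu: float
    nu: float

    def __post_init__(self) -> None:
        # Atanassov's condition: mu, nu >= 0 and mu + nu <= 1.
        if self.mu < 0 or self.nu < 0 or self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("an IFS value needs mu, nu >= 0 and mu + nu <= 1")

    @property
    def pi(self) -> float:
        """Hesitancy (uncertainty) degree: pi = 1 - mu - nu."""
        return 1.0 - self.mu - self.nu


def product(a: float, b: float) -> float:
    """Ordinary product, one admissible generalized product in IBA."""
    return a * b


def iba_equivalence(a: float, b: float,
                    gp: Callable[[float, float], float] = product) -> float:
    """IBA equivalence a <=> b = 1 - a - b + 2 * (a (x) b), (x) = generalized product gp."""
    return 1.0 - a - b + 2.0 * gp(a, b)


def ifs_iba_similarity(x: IFValue, y: IFValue,
                       gp: Callable[[float, float], float] = product) -> float:
    """Hypothetical rule: mean IBA equivalence of memberships and of non-memberships.

    The hesitancy pi is not used, mirroring the limitation discussed in Section 4.
    """
    return 0.5 * (iba_equivalence(x.mu, y.mu, gp) + iba_equivalence(x.nu, y.nu, gp))
```

For example, with `a = IFValue(0.6, 0.3)` and `b = IFValue(0.5, 0.4)`, `ifs_iba_similarity(a, b)` is approximately 0.52 with the product, and `ifs_iba_similarity(a, b, gp=min)` is approximately 0.9. The choice of generalized product matters: with min, the equivalence reduces to 1 - |a - b| and a value is always fully equivalent to itself, whereas with the product the self-equivalence 1 - 2a + 2a² drops below 1 for 0 < a < 1.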
In this paper, we propose an extended IFS-IBA similarity approach (Extended Intuitionistic Fuzzy Set-Interpolative Boolean Algebra, EIFS-IBA) to measure the similarities between polygons and discover their clustering patterns. In this model, we first fuzzified the extracted properties of each polygon (such as area, orientation, and length–width ratio) and used them as indices for measuring similarity. We then built adjacency graph models over adjacent polygons, which additionally encode distance and connectivity, and measured the corresponding similarities with the EIFS-IBA similarity approach. Finally, the obtained similarities were used to complete the clustering. Compared with conventional similarity approaches, EIFS-IBA exhibited stronger information expression capabilities when depicting the similarities between adjacent polygons, which helps produce more reasonable clustering results.
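The pipeline described above (fuzzify properties, score adjacent pairs, cluster) can be illustrated with a small, self-contained sketch. Every concrete choice in it is an assumption for illustration, not the EIFS-IBA formulation or the clustering procedure of this paper: property values are fuzzified by min–max scaling with a fixed hesitancy of 0.1, the IBA equivalence uses min as the generalized product, property similarities are averaged, a hypothetical connectivity score in [0, 1] stands in for the distance and connectivity terms of the adjacency graph, and clusters are formed by merging adjacent polygons whose combined score reaches a threshold (union–find). All function names and the weight `alpha` are hypothetical.

```python
from typing import Dict, List, Tuple

IF = Tuple[float, float]  # (membership mu, non-membership nu)


def fuzzify(values: List[float], hesitancy: float = 0.1) -> List[IF]:
    """Assumed fuzzification: min-max scale to mu in [0, 1 - hesitancy], nu = 1 - hesitancy - mu."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    pairs = []
    for v in values:
        mu = (1.0 - hesitancy) * (v - lo) / span
        pairs.append((mu, 1.0 - hesitancy - mu))
    return pairs


def iba_eq_min(a: float, b: float) -> float:
    """IBA equivalence 1 - a - b + 2*(a (x) b) with min as generalized product (= 1 - |a - b|)."""
    return 1.0 - a - b + 2.0 * min(a, b)


def ifs_similarity(x: IF, y: IF) -> float:
    """Assumed rule: average the equivalences of memberships and of non-memberships."""
    return 0.5 * (iba_eq_min(x[0], y[0]) + iba_eq_min(x[1], y[1]))


def edge_scores(props: Dict[str, List[float]],
                edges: List[Tuple[int, int, float]],
                alpha: float = 0.5) -> List[Tuple[int, int, float]]:
    """Score each adjacency edge (i, j, connectivity) as
    alpha * mean property similarity + (1 - alpha) * connectivity."""
    fuzzy = {name: fuzzify(vals) for name, vals in props.items()}
    scored = []
    for i, j, conn in edges:
        prop_sim = sum(ifs_similarity(f[i], f[j]) for f in fuzzy.values()) / len(fuzzy)
        scored.append((i, j, alpha * prop_sim + (1.0 - alpha) * conn))
    return scored


def cluster(n: int, scored: List[Tuple[int, int, float]], threshold: float) -> List[List[int]]:
    """Merge adjacent polygons whose edge score is >= threshold, using union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, s in scored:
        if s >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri
    groups: Dict[int, List[int]] = {}
    for k in range(n):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())
```

With made-up data for four polygons in a chain, `props = {"area": [100.0, 110.0, 400.0, 420.0], "lw_ratio": [1.0, 1.1, 3.0, 3.2]}` and `edges = [(0, 1, 0.8), (1, 2, 0.8), (2, 3, 0.8)]`, the edge scores are roughly 0.88, 0.50, and 0.87, so `cluster(4, edge_scores(props, edges), threshold=0.7)` returns `[[0, 1], [2, 3]]`: the two small, compact polygons and the two large, elongated ones end up in separate groups even though all four are connected.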
The remainder of the paper is organized as follows. Section 2 introduces the methodology, including the application of IBA theory to polygons, the EIFS-IBA similarity approach, and the evaluation approach. Section 3 describes the experimental data and presents the experimental results and analyses. Section 4 discusses the advantages of the proposed EIFS-IBA similarity approach. Finally, Section 5 presents the concluding remarks and an outlook on future work.
4. Discussion
Accurately describing polygon properties is essential for differentiating polygons during clustering. Because conventional similarity approaches treat properties simplistically, they have difficulty incorporating the detailed feature information that characterizes polygons. To resolve this issue, we have proposed the EIFS-IBA similarity approach, which is very flexible and has several notable advantages over conventional similarity (ConS) methods. First, it operates on the properties of individual polygons, which is consistent with the way humans or computers perceive city scenes. The fuzzy similarity incorporates geometric information on the orientation, shape, and size of each polygon, which enriches the available information. This information expresses polygon attributes better and establishes more accurate attribute relationships for clustering, leading to better clustering results. Second, the EIFS-IBA similarity approach has a rigorous mathematical foundation derived from the argument of equivalent substitution. Most importantly, the EIFS-IBA similarity approach is highly expressive and can accurately describe the relationships between polygon entities, an advantage over conventional approaches that compute similarity from ratios or differences.
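The orientation, shape, and size information mentioned above has to be computed from each polygon's geometry before it can be fuzzified. As an illustration only, the sketch below derives area with the shoelace formula and derives orientation and elongation from the second central area moments. The moment-based elongation sqrt(lambda_max / lambda_min) is a stand-in for a length–width ratio (for a rectangle it equals the ratio of its sides) and is not necessarily how the paper computes its indices; the function name `polygon_properties` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _signed_area2(pts: List[Point]) -> float:
    """Twice the signed shoelace area (positive for counter-clockwise rings)."""
    n = len(pts)
    return sum(pts[k][0] * pts[(k + 1) % n][1] - pts[(k + 1) % n][0] * pts[k][1]
               for k in range(n))


def polygon_properties(pts: List[Point]) -> Tuple[float, float, float]:
    """Return (area, orientation in degrees in [0, 180), elongation >= 1) of a simple polygon.

    Vertices are listed in order without repeating the first one.
    """
    if _signed_area2(pts) < 0:  # make the ring counter-clockwise so moments are positive
        pts = pts[::-1]
    n = len(pts)
    area = sx = sy = sxx = syy = sxy = 0.0
    for k in range(n):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        area += cross
        sx += (x0 + x1) * cross
        sy += (y0 + y1) * cross
        sxx += (x0 * x0 + x0 * x1 + x1 * x1) * cross
        syy += (y0 * y0 + y0 * y1 + y1 * y1) * cross
        sxy += (x0 * y1 + 2 * x0 * y0 + 2 * x1 * y1 + x1 * y0) * cross
    area *= 0.5
    if area == 0:
        raise ValueError("degenerate polygon with zero area")
    cx, cy = sx / (6 * area), sy / (6 * area)  # centroid
    # Second area moments about the origin, shifted to the centroid.
    mu20 = sxx / 12 - area * cx * cx
    mu02 = syy / 12 - area * cy * cy
    mu11 = sxy / 24 - area * cx * cy
    # Principal-axis angle (0 by convention when the shape has no dominant axis).
    angle = math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02)) % 180.0
    # Eigenvalues of the 2x2 moment matrix give the spread along the principal axes.
    half_sum = 0.5 * (mu20 + mu02)
    root = math.hypot(0.5 * (mu20 - mu02), mu11)
    lam_max, lam_min = half_sum + root, half_sum - root
    elongation = math.sqrt(lam_max / lam_min) if lam_min > 0 else math.inf
    return area, angle, elongation
```

For the 4 × 1 rectangle `[(0, 0), (4, 0), (4, 1), (0, 1)]` the function returns approximately `(4.0, 0.0, 4.0)`, and the same rectangle rotated by 45 degrees gives an orientation of about 45. Using area moments rather than vertex coordinates keeps the result independent of how densely the boundary is digitized.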
Fuzzy set theory is relatively mature and performs well in clustering, partitioning, and pattern recognition; however, its application to clustering in geographic information systems remains relatively rare. In this paper, we have proposed the EIFS-IBA similarity approach, which has strong capabilities for expressing and integrating information, to measure polygon similarities. The experiments showed that the similarities obtained with the EIFS-IBA similarity approach benefit clustering. However, the effectiveness of EIFS-IBA similarity clustering is still affected by the following factors. First, we adopt only the well-established degrees of membership and non-membership and do not use the third index, the degree of uncertainty (hesitancy), which limits the information-expressing power of the EIFS-IBA approach to a certain extent. Second, although the spatial attributes used in this paper are rich, other potentially more effective attribute indices may exist, such as points of interest (POI), and multiple attributes may interact with one another and further affect clustering.
5. Conclusions
Polygon clustering is one of the most important tasks in data mining. Most current similarity calculation approaches used in clustering remain at a basic level and do not fully explore the potential of similarity measures. On the one hand, the mechanisms of current similarity approaches are quite primitive, and the resulting similarities cannot reveal the details of the parts that are similar. On the other hand, mathematics has developed more advanced similarity measures that can express additional detail about similar parts, but few of these theories have been applied to geographic information systems. The major contribution of this work is the EIFS-IBA similarity approach, designed to measure similarities between polygons.
This paper overcomes a drawback of the conventional IFS-IBA approach, namely that it cannot measure the spatial relations between spatial objects. We first extracted spatial properties (such as area, shape, and orientation). We then applied IFS-IBA to the properties of the spatial objects and additionally measured similarities between spatial objects based on length and connectivity. Finally, we conducted spatial clustering with the weighted similarity between spatial objects. Both the visual results and the evaluation criteria demonstrate that the EIFS-IBA similarity approach can partition complex polygons consistently with visual recognition. In addition, the proposed EIFS-IBA similarity approach is expressive and can therefore be applied to many geographic information analyses that rely on similarity. In future work, we will explore how the uncertainty index, as well as membership and non-membership, affects the ability of the EIFS-IBA similarity approach to express similarities in geographic information, and we aim to develop cluster analysis tools that are more conducive to mining hidden information in spatial data.