5.2. Variables of Statistical Units
Using 25 km × 25 km grid cells as statistical units, we analyze the spatial patterns of three variables (Pw, Pd, and Dw) across a total of 227 units.
The river network density Dw serves as the natural index of the river network in each unit: the rivers within each unit are used to compute its Dw value following Formula 2. The choropleth map in Figure 3a shows the spatial pattern of the river network over the study area. Regions with dense river networks are concentrated mainly in the central and southeastern parts of the study area, with fewer in the northern part.
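Formula 2 is not reproduced in this section. As a minimal sketch, assuming the conventional definition of river network density (total river length inside a unit divided by the unit's area, in km/km²), Dw for one grid cell could be computed as below. The function names, and the assumption that river polylines are already clipped to the cell and projected to kilometres, are illustrative rather than part of the original workflow.

```python
import math

CELL_SIZE_KM = 25.0  # edge length of the statistical grid cells


def polyline_length(coords):
    """Length of one polyline given as [(x_km, y_km), ...]."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def river_density(river_polylines_km, cell_area_km2=CELL_SIZE_KM ** 2):
    """Hypothetical stand-in for Formula 2: river length per unit area (km/km^2).

    Assumes the polylines are already clipped to the cell. Pass a smaller
    cell_area_km2 for partial cells on the provincial boundary.
    """
    total_km = sum(polyline_length(p) for p in river_polylines_km)
    return total_km / cell_area_km2
```

For example, a full cell containing 30 km of rivers would get Dw = 30 / 625 = 0.048 km/km².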
Pw and Pd, which serve as regional indexes of water-related and direction-related toponyms, respectively, are calculated for each unit according to Formula 1. As shown in Figure 3b,c, the spatial patterns of Pw and Pd are generally consistent with the river distribution. Grids with high Pw and Pd values cluster in the southeastern part of the study area, and the distribution of Pw matches the river network density more closely than that of Pd. Thus, grids with dense river networks also appear to contain dense water-related and direction-related toponyms.
Accordingly, correlation analyses between Pw and Dw and between Pd and Dw can provide evidence of a strong association between the naming of toponyms and the river networks in Hubei Province.
5.3. Correlation Analysis
To reveal the statistical disparity between the two pairs of variables (i.e., Pw and Dw, and Pd and Dw), the mean, standard deviation, maximum, and minimum of each variable are calculated and listed in Table 5.
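The descriptive statistics in Table 5 are straightforward to compute; the sketch below uses Python's `statistics` module. Whether the table reports the population or the sample standard deviation is not stated, so using `pstdev` here is an assumption (`stdev` would give the sample version).

```python
import statistics


def describe(values):
    """Mean, standard deviation (population, assumed), maximum, and minimum."""
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "max": max(values),
        "min": min(values),
    }
```

Calling `describe` once per variable (Pw, Pd, Dw over the 227 units) would fill one row of such a table.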
To study the spatial disparity in the relationships between Pw and Dw and between Pd and Dw, and to validate the feasibility of our reconstruction, we apply the GWR method to analyze both global and local relationships. In the social sciences, correlation coefficients above 0.6 indicate a strong correlation between two variables, and correspondingly, R² values above 0.36 indicate a strong relationship [36]. With Pw as the dependent variable and Dw as the explanatory variable, the residual sum of squares of the model is 2.063 and the global adjusted R² is 0.620, indicating that the model fits the observed data well overall. For Pd and Dw, the model also reveals a strong correlation, with a residual sum of squares of 0.005 and a global adjusted R² of 0.576.
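The two thresholds are linked because, for a simple linear regression with one explanatory variable, R² equals the square of Pearson's r, so r = 0.6 corresponds to R² = 0.6² = 0.36. The sketch below fits an ordinary least squares model of a dependent variable (e.g., Pw) on one explanatory variable (e.g., Dw) and reports the residual sum of squares, R², and adjusted R². It is a global OLS analogue for illustration only, not the GWR software output quoted above.

```python
def ols_fit(x, y):
    """Ordinary least squares y = a + b*x with one explanatory variable.

    Returns (intercept, slope, residual sum of squares, R^2, adjusted R^2).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - rss / tss
    # adjusted R^2 penalizes for p = 1 explanatory variable
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)
    return a, b, rss, r2, adj_r2
```

With the 227 unit values, `ols_fit(dw_values, pw_values)` would give the global OLS quantities; GWR tools may compute their reported global figures differently.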
To examine the local correlation in each unit, Figure 4 presents choropleth maps of the local R² distribution, with the local model fit divided into several levels to highlight counties with high local R² values. Both maps show spatial variation in how well the predicted values fit the observed values. The central to southern part of the study area exhibits a good fit, suggesting that the dense river network there strongly influences the spatial distribution of water-related and direction-related toponyms; that is, the denser the river network in a region, the stronger its influence on the associated place names. We therefore conclude that toponyms are strongly affected by the presence of a rich river network and that the water environment is recorded in place names throughout Hubei Province, especially in its central part.
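The local R² values in Figure 4 come from fitting a separate weighted regression at each unit. A minimal GWR sketch is given below, assuming a fixed Gaussian kernel, unit centroids as coordinates, and a user-supplied bandwidth (bandwidth selection, e.g., by AICc, is omitted). Each unit's local R² is 1 minus the weighted residual sum of squares divided by the weighted total sum of squares.

```python
import math


def gwr_local_r2(coords, x, y, bandwidth):
    """Fit y = a + b*x at every unit using Gaussian distance weights.

    coords: unit centroids; bandwidth: fixed kernel bandwidth (assumed given).
    Returns a list of (intercept, slope, local_r2), one per unit.
    """
    results = []
    for ci in coords:
        # Gaussian kernel: nearby units get weights close to 1
        w = [math.exp(-0.5 * (math.dist(ci, cj) / bandwidth) ** 2) for cj in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx  # weighted least squares slope
        a = my - b * mx  # weighted least squares intercept
        rss = sum(wi * (yi - (a + b * xi)) ** 2 for wi, xi, yi in zip(w, x, y))
        tss = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
        results.append((a, b, 1 - rss / tss))
    return results
```

Mapping the third element of each tuple by unit produces a local R² surface of the kind classified in Figure 4.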
5.4. Reconstruction of the Historical River Network
Because each Thiessen polygon edge is equidistant from the two toponyms it separates, we can extract the shared edges between water-related toponyms that reference opposite directions and use them to reconstruct the historical river shape. Thiessen polygons are generated from the positions of the toponyms in the DRTD and then converted from polygon boundaries to lines. The lines within Hubei Province are clipped and retained; the results are shown in Figure 5a. Toponyms in the DRTD are more densely clustered in the eastern part of the study area and more dispersed in the west, and the lines extracted from the Thiessen polygons follow the same spatial pattern as the polygons. In the south-central and southeastern parts of the study area, dense clusters of lines come from densely distributed, relatively small Thiessen polygons with more edges, whereas the lines in the western parts show the opposite pattern.
Figure 5b shows the spatial distribution of lines extracted from the Thiessen polygon edges.
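The extraction relies on the fact that the Thiessen edge shared by two toponyms lies on their perpendicular bisector. The stdlib-only sketch below recovers that shared edge by clipping the bisector against the half-planes of all other toponyms and a bounding box, then keeps only edges between toponyms with opposite direction labels. The labels ("N" for a place north of the water, "S" for south, and so on) and the rectangular clip standing in for the Hubei boundary are hypothetical simplifications; the brute-force search is only suitable for small point sets, and a GIS tool or `scipy.spatial.Voronoi` would be used on real data.

```python
from itertools import combinations

# hypothetical direction labels: a place north of the water faces one south of it
OPPOSITE = {("N", "S"), ("S", "N"), ("E", "W"), ("W", "E")}


def thiessen_edge(sites, i, j, bbox):
    """Shared Thiessen edge of sites i and j clipped to bbox, or None if none."""
    (xi, yi), (xj, yj) = sites[i], sites[j]
    mx, my = (xi + xj) / 2, (yi + yj) / 2      # midpoint lies on the bisector
    dx, dy = -(yj - yi), xj - xi               # bisector direction
    lo, hi = float("-inf"), float("inf")       # parameter range t along m + t*d
    # each constraint a*x + b*y <= c must hold on the edge
    constraints = []
    for k, (xk, yk) in enumerate(sites):
        if k not in (i, j):
            # no closer to k than to i: 2(k - i).p <= |k|^2 - |i|^2
            constraints.append((2 * (xk - xi), 2 * (yk - yi),
                                xk ** 2 + yk ** 2 - xi ** 2 - yi ** 2))
    xmin, ymin, xmax, ymax = bbox
    constraints += [(-1, 0, -xmin), (1, 0, xmax), (0, -1, -ymin), (0, 1, ymax)]
    for a, b, c in constraints:
        slope = a * dx + b * dy
        rhs = c - (a * mx + b * my)
        if abs(slope) < 1e-12:
            if rhs < 0:
                return None                    # bisector entirely outside
            continue
        if slope > 0:
            hi = min(hi, rhs / slope)
        else:
            lo = max(lo, rhs / slope)
    if lo >= hi:
        return None                            # sites are not Thiessen neighbours
    return ((mx + lo * dx, my + lo * dy), (mx + hi * dx, my + hi * dy))


def opposite_direction_edges(sites, labels, bbox):
    """Edges shared by toponym pairs whose direction labels are opposite."""
    edges = []
    for i, j in combinations(range(len(sites)), 2):
        if (labels[i], labels[j]) in OPPOSITE:
            seg = thiessen_edge(sites, i, j, bbox)
            if seg is not None:
                edges.append(((i, j), seg))
    return edges
```

For two "north" places at (0, 1) and (3, 1) and two "south" places at (0, -1) and (3, -1), the sketch returns two collinear edges along y = 0, i.e., the line a river between them would most plausibly follow.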
The lines extracted from the Thiessen polygon edges (Figure 5b) have a spatial density similar to that of the river networks but remain disorderly across the whole area, which obscures the true river orientation. To reconstruct the specific historical river shape, an optimization approach is needed to eliminate redundancy in the results. We obtain an appropriate number of clusters by performing hierarchical clustering based on the Euclidean distances between water-related toponyms that refer to particular river levels. In our experiment, we attempt to reconstruct the shapes of the main rivers (i.e., the Yangtze River and the Han River) in Hubei Province using water-related toponyms containing the word “Jiang”, which in Chinese denotes a first- or second-level river. The toponyms of the WRDT containing this keyword are selected as inputs for the hierarchical clustering, and the resulting tree diagram is shown in Figure 6. Guided by the tree diagram, and to show the general trend, we first group the toponyms into 7 clusters through a clustering analysis with spatial Delaunay triangulation constraints. As shown in Figure 7a, these 7 clusters can be divided into several parts that reveal the river orientations within the clusters. Subsequently, the toponyms are grouped into 12 clusters to reveal more details of the river shape (Figure 7b).
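The clustering step can be sketched as plain agglomerative clustering on Euclidean distances, stopped at a chosen number of clusters (7 and then 12 in the text). The average-linkage criterion, the (name, x, y) tuple layout, and the keyword filter on the character 江 (Jiang) are assumptions, and the Delaunay triangulation constraint of the grouping analysis is not reproduced.

```python
import math


def select_keyword(toponyms, keyword="江"):
    """Keep toponyms (name, x, y) whose name contains the keyword."""
    return [t for t in toponyms if keyword in t[0]]


def agglomerative(points, k):
    """Average-linkage agglomerative clustering down to k clusters.

    Returns sorted lists of point indices; O(n^3), fine for small inputs.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # mean Euclidean distance between all cross-cluster pairs
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge the closest pair
        del clusters[y]
    return sorted(sorted(c) for c in clusters)
```

Because merges are deterministic, the 12-cluster result is a refinement of the 7-cluster one, matching the idea of cutting one tree diagram at two levels.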
By overlaying the layers from the grouping analysis for “Jiang” on the extracted lines in Figure 5b, we can draw continuous skeletons of the first- and second-level historical rivers. When generating continuous reconstruction results, we use only the extracted lines in Figure 5b that fall within two standard deviational ellipse polygons of the above clusters (standard deviational ellipses describe the central tendency, dispersion, and directional trend of geographic features), so that these lines reliably indicate the orientation of our target rivers. Because the terrain of China is high in the west and low in the east, rivers and streams mostly flow from west to east. Therefore, the toponyms containing “Jiang” and the extracted lines in Figure 5b are joined in order of their abscissa (x-coordinate) values while maintaining spatial continuity. Lines closer to toponyms containing the keyword “Jiang” are preferentially connected to form the reconstructed result. The reconstruction results are then refined through manual identification and qualitative knowledge to avoid fragmented rivers.
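The filtering and joining rules in the paragraph above can be sketched in two steps: keep only extracted segments lying inside a cluster's standard deviational ellipse, then chain the survivors from west to east, preferring segments nearer to “Jiang” toponyms. The covariance-based ellipse, the `n_std` parameter (2.0 reflects one reading of "two standard deviational"), the `max_gap` tolerance, and the greedy chaining rule are all hypothetical choices; GIS tools may scale the ellipse differently, and the manual editing step is not modeled.

```python
import math


def std_ellipse(points, n_std=2.0):
    """Covariance-based standard deviational ellipse.

    Returns (center, semi_major, semi_minor, angle_rad); n_std is an assumption.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # eigenvalues of the 2x2 covariance matrix
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = (sxx + syy) / 2 + root
    lam_minor = max((sxx + syy) / 2 - root, 0.0)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis direction
    return (cx, cy), n_std * math.sqrt(lam_major), n_std * math.sqrt(lam_minor), angle


def in_ellipse(pt, ellipse):
    (cx, cy), a, b, angle = ellipse
    if b == 0:  # degenerate ellipse from collinear points
        return False
    dx, dy = pt[0] - cx, pt[1] - cy
    # rotate into the ellipse's own axes
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def segments_in_ellipses(segments, ellipses):
    # an ellipse is convex, so a segment is inside it iff both endpoints are
    return [s for s in segments
            if any(in_ellipse(s[0], e) and in_ellipse(s[1], e) for e in ellipses)]


def chain_west_to_east(segments, anchors, max_gap):
    """Hypothetical greedy joining: start at the westernmost segment, then append
    a segment that starts near the current end without stepping westward,
    preferring segments closest to a keyword toponym (anchor)."""
    segs = [tuple(sorted(s)) for s in segments]  # orient each segment west -> east
    if not segs:
        return []

    def anchor_dist(seg):
        mid = ((seg[0][0] + seg[1][0]) / 2, (seg[0][1] + seg[1][1]) / 2)
        return min(math.dist(mid, a) for a in anchors)

    remaining = sorted(segs, key=lambda s: s[0][0])
    chain = [remaining.pop(0)]
    while remaining:
        end = chain[-1][1]
        candidates = [s for s in remaining
                      if s[0][0] >= end[0] and math.dist(end, s[0]) <= max_gap]
        if not candidates:
            break
        nxt = min(candidates, key=lambda s: (anchor_dist(s), math.dist(end, s[0])))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain
```

When two candidate segments leave the same junction, the tie is broken by proximity to a “Jiang” toponym, mirroring the preference rule described in the text.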
The final reconstruction of the historical river shape is obtained by applying a curve-smoothing algorithm to the lines produced by the above steps, which smooths the original curves and suppresses noise.
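The smoothing algorithm is not named here. As one common option, Chaikin's corner-cutting algorithm (swapped in for illustration, not necessarily the one used) replaces each segment with two points at 1/4 and 3/4 of its length, rounding off angular vertices while keeping the polyline's endpoints fixed.

```python
def chaikin(coords, iterations=2):
    """Chaikin corner cutting for an open polyline; endpoints are kept fixed."""
    pts = [tuple(map(float, p)) for p in coords]
    for _ in range(iterations):
        if len(pts) < 3:
            break  # a straight segment has no corners to cut
        new = [pts[0]]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            new.append((0.75 * x0 + 0.25 * x1, 0.75 * y0 + 0.25 * y1))  # 1/4 point
            new.append((0.25 * x0 + 0.75 * x1, 0.25 * y0 + 0.75 * y1))  # 3/4 point
        new.append(pts[-1])
        pts = new
    return pts
```

Each iteration roughly doubles the vertex count, so two or three iterations are usually enough; very sparse input lines (as in boxes A and B discussed below) keep visible angularities because few vertices are available to cut.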
Figure 8 shows the reconstructed first- and second-level rivers. Parts of the reconstructed Han River deviate from the river's known path. For instance, in boxes A and B, the reconstructed Han River shows pronounced zigzags, and in box C, two lines intersect the Yangtze River. In contrast, the reconstructed shape of the Yangtze River is almost identical to its present-day path, and even its geometrical characteristics at the county level are restored, despite slight differences in box D. In addition, some redundant lines cover areas that currently have no first- or second-level rivers. Overall, the reconstructed river shapes in Figure 8 are of good quality in the south-central and southeastern parts of the study area, where they coincide with present-day river pathways and where water-related and direction-related toponyms are numerous and densely concentrated. In the southwest and northeast, however, there are many redundant lines. Considering the correlation analysis results in Figure 4, the reconstructed results in the central and southern parts of the study area are more credible.
The misfits of the river shapes in boxes A, B, C, and D in Figure 8 may have two causes, summarized below.
1. Insufficient toponyms:
The lines in boxes A and B zigzag with many angular vertices despite the curve-smoothing process. These differences may result from too few, sparsely distributed toponyms to integrate the lines extracted from the Thiessen polygons with the grouping analysis results shown in Figure 7. These angular lines are therefore unlikely to reflect landscape changes; moreover, the local chronicles of these regions contain no records of such events. These differences should be regarded as low-credibility dynamics that indicate only possible overall trends. Additionally, topographic and geomorphic features are often prominent in regions with sparse toponyms, so DEM analysis or additional historical records could supplement the reconstruction with detailed information on historical changes in such places.
2. River dynamics:
Because toponyms are densely distributed in these areas, the misfits in boxes C and D are attributed to river dynamics, which is supported by historical maps and local chronicles. For the Han River, box C shows a change in the location where it joins the Yangtze River. Although this location is disputed, the map of changes in the Han River course in Figure 9 [37] shows the same river orientation as our reconstructed result in this region. Moreover, the misfit in box C can also be linked to river network changes recorded in an ancient book, the History of Ming. Box D shows some lines of the Yangtze River generated by natural meander cut-off, which forms oxbow lakes. Evidence of these dynamics can be observed in remote sensing images of Shishou and Jianli County, where many oxbow lakes are present.
To explore and verify the river dynamics around box D in Figure 8, we digitize ancient maps from different dynasties collected in The Historical Atlas of China [38] and map the main rivers within present-day Hubei Province. A comparison of the reconstructed rivers with the historical rivers in Figure 10 shows that river dynamics from different periods are integrated into the reconstructed results. Toponyms, through their spatial–temporal characteristics, act as recorders of landscapes: each describes an event of a certain period, and together they indicate historical dynamics.
The redundant lines and geometrical angular offsets may be caused by the lines extracted from the Thiessen polygons. The generated Thiessen polygons depend heavily on the distribution pattern of all toponyms in the DRTD; thus, an uneven distribution of place names referencing different locations leads to varying densities of Thiessen polygons. The more densely direction-related toponyms are concentrated in the study area, the more detailed the extracted lines and the fewer their angular offsets, and the more likely the result is to approach the real river shape. Because the extracted lines reveal possible river orientations around our target rivers, lines describing other rivers are also integrated into the reconstructed results, which inevitably produces redundant lines. Meanwhile, extracting lines from pairs of Thiessen polygons requires not only a balanced spatial distribution of place names representing opposite directions but also similar counts in each direction. Although toponyms representing every possible direction exist in our study area, this is not the case for some marginal units. For instance, some direction-related place names may be concentrated in one place, may not be distributed along the entire course of the river they describe, or may lie on only one side of it, resulting in discontinuous and scattered extraction results.
Our analysis is atemporal: the toponyms it is based on are current data that carry no temporal information linking them to a specific period, although they do preserve past changes. Consequently, the result reflects river network dynamics across all time periods. If this method were applied to toponyms from a single period, changes up to that period could be extracted.