2.3.1. Population Density Index
First, we calculated individual population density based on the three-dimensional activity space area. This is the core step, and the main difficulty of the algorithm is converting mobile signaling data, which are organized by cell phone base station, into user spatial distribution data organized by land use lot. In a population density study of southern Europe using mobile phone data, Deville et al. proposed an algorithm that calculates the dynamic population density of different administrative regions by taking land area as the distribution weight [45]. This algorithm overlays the boundaries of administrative regions on the boundaries of base station districts. Assuming that the population in each base station district is evenly distributed on a two-dimensional plane, the population of each overlaid block is calculated. The populations of the overlaid blocks belonging to the same administrative region are then summed to obtain the population of each administrative region. The algorithm proposed in this study improves on Deville's method by using the three-dimensional activity space area instead of the two-dimensional land area as the allocation weight. To distinguish the two, this algorithm is called the algorithm based on the three-dimensional activity space area, whereas the algorithm of Deville et al., which uses the plane area as the allocation weight, is called the algorithm based on the two-dimensional area.
The three-dimensional activity space defined in this study refers to the activity space of the main population in the city. This space is composed of architectural space and outdoor public space (but does not include public space that is difficult for people to use, such as open water or vegetation). For a specific urban area, the three-dimensional activity space area can be expressed as follows,
$$A = A_0' + A_0$$
or
$$A = (A' - A_g') + A_0$$
where
$A$ is the three-dimensional activity space area,
$A_0'$ is the outdoor activity area of the area,
$A_0$ is the total construction area,
$A'$ is the land area of the area, and
$A_g'$ is the total ground-floor area of the buildings.
This study defines individual behavior density as the number of individuals active on the land per unit area at a given time. The following mobile phone user behavior density algorithm based on the three-dimensional activity space area is proposed:
$$\rho_i = \frac{1}{S_i} \sum_{j} N_j \, \frac{A_{ij}}{A_j}$$
Here, $j$ is the base station cell number, $i$ is the land number, $\rho_i$ is the density of mobile phone users in land $i$, $S_i$ is the land area of land $i$, $N_j$ is the number of mobile phone users at a given time in base station cell $j$, $A_j$ is the three-dimensional activity space area of base station cell $j$, and $A_{ij}$ is the three-dimensional activity space area of the overlapping area formed by land $i$ and base station cell $j$.
This algorithm is applied to the signaling data in the n periods of the j mobile phone base stations within the scope of the study. In this way, the data format with the base station as the unit is transformed into a data format with the land as the unit. The format of the resulting behavior density data is shown in Table 2 below.
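The allocation step described above can be illustrated with a minimal sketch. It is not the study's implementation: the function name, the dictionary-based inputs, and the assumption that the three-dimensional activity space area of a cell equals the sum of its overlap areas with the land parcels are all introduced here for illustration only.

```python
from collections import defaultdict


def parcel_densities(users_per_cell, overlap_space, parcel_land_area):
    """Allocate base station user counts to land parcels.

    users_per_cell:   {cell_id: number of users at the chosen time}
    overlap_space:    {(parcel_id, cell_id): 3D activity space area of the overlap}
    parcel_land_area: {parcel_id: land area of the parcel}
    Returns {parcel_id: users per unit land area}.
    """
    # Assumption: the 3D activity space of a cell is the sum of its overlaps.
    cell_space = defaultdict(float)
    for (_, cell), area in overlap_space.items():
        cell_space[cell] += area

    # Each overlap receives a share of the cell's users proportional to
    # its 3D activity space area (the allocation weight).
    users_per_parcel = defaultdict(float)
    for (parcel, cell), area in overlap_space.items():
        if cell_space[cell] > 0:
            share = area / cell_space[cell]
            users_per_parcel[parcel] += users_per_cell.get(cell, 0) * share

    # Density = allocated users divided by the parcel's land area.
    return {p: users_per_parcel[p] / parcel_land_area[p] for p in parcel_land_area}


if __name__ == "__main__":
    # Hypothetical toy data: two cells, two parcels.
    users = {"c1": 100, "c2": 60}
    overlaps = {("p1", "c1"): 300.0, ("p2", "c1"): 100.0, ("p2", "c2"): 200.0}
    land = {"p1": 50.0, "p2": 100.0}
    print(parcel_densities(users, overlaps, land))
```

In the toy data, parcel p1 holds three quarters of the activity space of cell c1 and therefore receives 75 of its 100 users, while p2 receives the remaining 25 plus all 60 users of cell c2. Replacing the overlap activity space areas with plain overlap land areas would give the two-dimensional variant.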
2.3.2. Linear Regression Analysis
Linear regression was employed to examine the correlation between the distribution of public service facilities and population density. Linear regression based on a large amount of experimental data is widely used as a measurement method to determine the interactions between variables, the degree of influence of each variable, and the statistical rules underlying numerical distributions. Geographically weighted regression (GWR) is normally an appropriate method for processing spatially nonstationary data. However, in this study the correlation between the block plot ratio and dynamic population density in the different urban areas is caused not by their different spatial locations but by their different land use properties (land for public service facilities and land for non-public service facilities). Because the distribution of land for public service facilities and of land for non-public service facilities is not significantly related to spatial location, spatial nonstationarity does not need to be considered. Therefore, neither GWR nor other spatial analysis models are used in this study. Establishing a linear regression model requires the following conditions.
The independent variables are nonrandom variables that are not interrelated, namely, $\mathrm{Cov}(x_i, x_j) = 0$ for $i \neq j$;
The random error terms are independent of each other and follow a normal distribution with expectation zero and standard deviation σ, namely, $\varepsilon \sim N(0, \sigma^2)$; and
The number of samples is greater than the number of parameters, namely, n > p + 1.
Based on the theoretical hypotheses above, the model is specified as follows,
$$y = \mathrm{intercept} + a\,x + \varepsilon$$
where
$y$ is the dependent variable,
$x$ is the independent variable,
$\mathrm{intercept}$ is the regression intercept,
$a$ is the regression coefficient, and
$\varepsilon$ is the random error.
The linear regression model applies a significance test to each regression coefficient, which reflects whether including the corresponding independent variable is reasonable. A coefficient does not pass the significance test if the absolute value of its test statistic t is less than the critical value, and passes otherwise. Variables whose coefficients do not pass the test should be eliminated in light of actual conditions, which is a widely used approach to selecting independent variables.
In this study, the linear regression equations were assessed based on $R^2$ and the standardized coefficients. $R^2$ is the proportion of the variation in the sample data that is explained by the regression equation. The closer $R^2$ is to 1, the more of the sample variation the regression equation explains and the more accurate the model. In multiple regression, an $R^2$ between 0.8 and 1 indicates a relatively high goodness of fit. Given the large differences in complexity and accuracy between micro and macro data, the evaluation criteria should be adjusted accordingly: if $R^2$ is between 0.5 and 0.8, the goodness of fit of the model is considered reasonable. The standardized regression coefficient is the regression coefficient after the effects of the units of the dependent variable y and the independent variable x have been eliminated. Its value directly reflects the effect of x on y: the larger the absolute value of the standardized coefficient, the greater the influence of x on y. If the regression coefficient is positive, y increases with increasing x; if it is negative, y decreases with increasing x.
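As an illustration of the two assessment quantities discussed above, the following minimal sketch fits a single-variable least-squares line and computes $R^2$ and the standardized coefficient. The function name and the sample values are hypothetical and are not taken from the study's data.

```python
from statistics import mean, stdev


def fit_simple_ols(x, y):
    """Fit y = intercept + a * x by least squares.

    Returns (intercept, a, r2, standardized_a).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    a = sxy / sxx                      # regression coefficient
    intercept = y_bar - a * x_bar      # regression intercept

    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + a * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot

    # Standardized coefficient: rescale by the sample standard deviations
    # so that the units of x and y no longer matter.
    standardized_a = a * stdev(x) / stdev(y)
    return intercept, a, r2, standardized_a


if __name__ == "__main__":
    # Hypothetical values: x = population density, y = FAR.
    density = [1, 2, 3, 4]
    far = [2, 4, 5, 7]
    print(fit_simple_ols(density, far))
```

With a single independent variable, the standardized coefficient equals the correlation coefficient between x and y, so its square equals $R^2$. This identity does not carry over to regressions with several independent variables.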
The relationship between the urban population density distribution and the floor area ratio (FAR) of (non-)public service facilities is analyzed using an ordinary least squares (OLS) model. The overall FAR, the FAR of public service facilities, and the FAR of non-public service facilities were selected as the main dependent variables in our regression model. Daily population density, day-time (7 a.m.–6 p.m.) population density, and night-time (7 p.m.–6 a.m.) population density were chosen as the main independent variables. The day-time population density and night-time population density represent the population densities at 2 p.m. and 4 a.m., respectively.
Table 3 defines all variables involved in the regression analysis, including the meanings these values represent and their measurement units.
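As shown in Table 2, the overall FAR variable represents the FAR (i.e., the ratio of total construction area to land use area) of each area. For public service and management land, land for commercial and service facilities, and mixed-use land (commercial and residential mixed areas are not included here), the FAR of public service facilities is equal to the overall FAR, while the FAR of non-public service facilities is 0. For land for non-public service facilities (e.g., residential land), industrial land, and land for warehouses, the FAR of non-public service facilities is equal to the overall FAR, while the FAR of public service facilities is 0. For commercial and residential land, the FAR of public service facilities and the FAR of non-public service facilities represent, respectively, the commercial and residential portions according to the ratio of commercial land to residential land in the mixed-use land. Among the independent variables, the daily population density variable stands for the daily population density; the day-time population density is the population density from 10 a.m. to 11 a.m., and the night-time population density is the population density from 4 a.m. to 5 a.m.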
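The assignment rules for the two FAR variables can be summarized in a short sketch. The land use labels, the function name, and the `commercial_share` parameter are hypothetical. In particular, splitting the FAR of commercial and residential land in proportion to a commercial share is only one possible reading of the rule described above.

```python
# Hypothetical land use labels, grouped as in the text above.
PUBLIC_USES = {"public_service", "commercial_service", "mixed_use"}
NON_PUBLIC_USES = {"residential", "industrial", "warehouse"}
COMMERCIAL_RESIDENTIAL = "commercial_residential"


def split_far(far, land_use, commercial_share=None):
    """Return (FAR of public service facilities, FAR of non-public facilities)."""
    if land_use in PUBLIC_USES:
        return far, 0.0
    if land_use in NON_PUBLIC_USES:
        return 0.0, far
    if land_use == COMMERCIAL_RESIDENTIAL:
        # Assumption: the FAR is divided in proportion to the commercial share.
        if commercial_share is None or not 0.0 <= commercial_share <= 1.0:
            raise ValueError("commercial_share must be between 0 and 1")
        return far * commercial_share, far * (1.0 - commercial_share)
    raise ValueError(f"unknown land use: {land_use!r}")


if __name__ == "__main__":
    print(split_far(2.0, "residential"))
    print(split_far(1.5, "commercial_service"))
    print(split_far(3.0, "commercial_residential", commercial_share=0.25))
```

Under this reading, the two returned values always sum to the overall FAR of the parcel, which keeps the three dependent variables mutually consistent.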