Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Contents lists available at SciVerse ScienceDirect Spatial and Spatio-temporal Epidemiology journal homepage: www.elsevier.com/locate/sste Quantifying the magnitude of environmental exposure misclassification when using imprecise address proxies in public health research Martin A. Healy, Jason A. Gilliland ⇑ The University of Western Ontario, 1151 Richmond St. N, London, Ontario, Canada N6A 5C2 a r t i c l e i n f o Article history: Available online 11 February 2012 Keywords: Geographic information systems Geocoding Accessibility Environmental health Public health a b s t r a c t In spatial epidemiologic and public health research it is common to use spatially aggregated units such as centroids of postal/zip codes, census tracts, dissemination areas, blocks or block groups as proxies for sample unit locations. Few studies, however, address the potential problems associated with using these units as address proxies. The purpose of this study is to quantify the magnitude of distance errors and accessibility misclassification that result from using several commonly-used address proxies in public health research. The impact of these positional discrepancies for spatial epidemiology is illustrated by examining misclassification of accessibility to several health-related facilities, including hospitals, public recreation spaces, schools, grocery stores, and junk food retailers throughout the City of London and Middlesex County, Ontario, Canada. Positional errors are quantified by multiple neighborhood types, revealing that address proxies are most problematic when used to represent residential locations in small towns and rural areas compared to suburban and urban areas. Findings indicate that the shorter the threshold distance used to measure accessibility between subject population and health-related facility, the greater the proportion of misclassified addresses. Using address proxies based on large aggregated units such as centroids of census tracts or dissemination areas can result in very large positional discrepancies (median errors up to 343 and 2088 m in urban and rural areas, respectively), and therefore should be avoided in spatial epidemiologic research. Even smaller, commonly-used, proxies for residential address such as postal code centroids can have large positional discrepancies (median errors up to 109 and 1363 m in urban and rural areas, respectively), and are prone to misrepresenting accessibility in small towns and rural Canada; therefore, postal codes should only be used with caution in spatial epidemiologic research. Ó 2012 Elsevier Ltd. All rights reserved. 1. Introduction Recent advances in the analytical capacity of desktop geographic information system (GIS) software, combined with the increasing availability of spatially-referenced health and environmental data in digital format, have created new opportunities for making breakthroughs in spa⇑ Corresponding author. Tel.: +1 (519) 661 2111x81239; fax: +1 (519) 661 3750. E-mail address: jgillila@uwo.ca (J.A. Gilliland). 1877-5845/$ - see front matter Ó 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.sste.2012.02.006 tial epidemiology (Zandbergen, 2008). As digital mapping is an abstraction of reality, the spatial data used for visualizing and analyzing geographic phenomena will always be inaccurate to some degree. Such inaccuracies can be compounded when spatially aggregated units are used as locational proxies for mapping and analyzing spatial relationships, rather than more precise geographic locations. In environmental and public health research, it is common to use proxies for sample unit locations, such as centroids of postal/zip codes, census tracts, dissemination areas, blocks, or lots; however, it is very uncommon for 56 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 studies to address, or even mention, the potential problems ensuing from the positional discrepancies associated with using imprecise address proxies. It is the responsibility of the researcher to identify, quantify, interpret, and attempt to reduce any errors associated with using particular spatial data and locational proxies, so that they do not interfere with any conclusions and recommendations to be made from the findings (Fotheringham, 1989; Anselin, 2006). Researchers in spatial epidemiology have long been concerned about the absolute or relative spatial accuracy of the address points used to map sample populations or phenomena within a GIS (Goldberg, 2008). Numerous researchers have examined the ‘positional errors’ which occur when the address from a database is located on a digital map, but the point is not located at the true position of the address (Cayo and Talbot, 2003; Ward et al., 2005; Schootman et al., 2007; Strickland et al., 2007; Zandbergen and Green, 2007; Jacquez and Rommel, 2009). In many previous studies, positional errors are reported as Euclidian distance errors, or errors in the X and Y dimension. While much has been said about positional errors, much less has been said about how study results might be affected when researchers use spatially aggregated units (which themselves might be positionally accurate) as address proxies. Very few studies measure and compare the positional discrepancies between address proxies and the exact address they are used to represent (Bow et al., 2004). A major area of investigation in the fields of spatial epidemiology, health geography, and public health attempts to assess the levels of accessibility or ‘exposure’ of subject populations to elements in their local environments that are believed to be health-promoting or health-damaging, and are related to certain health-related behaviors or outcomes. Accessibility is typically measured in relation to the distance between subject populations and selected environmental features, and is often operationalized as a binary variable (i.e., accessible/inaccessible, exposed/not exposed) or a density variable (i.e., number of sites within, volume of contaminant within) in relation to an areal unit or ‘buffer’ of a certain threshold distance (radius) around the subject’s address. There is much variability, but unfortunately not much debate, regarding the particular threshold distances to be used in accessibility studies; however, most authors do attempt to justify their choice of threshold distances based on human behavior (e.g. ‘walking distance’) or perhaps some characteristic of contaminant source (e.g. 150 m from roadway). The chosen accessibility thresholds also typically vary by study population (e.g. children vs. adults), setting (e.g. urban vs. rural), and by health-related outcome (e.g. physical activity vs. asthma). In their study of the environmental influences on whether or not a child will walk or bike to school, for example, Larsen and colleagues (2009) justify the choice of a 1600 m neighborhood buffer based on the local school board cut-off distance for providing school bus service (see also Schlossberg et al., 2006; Muller et al., 2008; Brownson et al., 2009; Panter et al., 2009). Studies which have focussed on access to neighborhood resources such as public parks and recreation spaces have utilized a variety of threshold distances, typically between 400 and 1600 m (compare Lee et al., 2007; Bjork et al., 2008; Tucker et al., 2008; Maroko et al., 2009); however, we submit a threshold distance of 500 m is ideal, as it represents a short 5–7 min walk, therefore easily accessible for populations of all ages (see Tucker et al., 2008; Sarmiento et al., 2010; Wolch et al., 2010). The 5–7 min walk zone, as represented by the 500 m buffer around a home or public school, is also a common distance used in studies exploring the relationship between access to junk food and obesity (see Austin et al., 2005; Morland and Evenson, 2009; Gilliland, 2010). Studies of ‘food deserts’ (disadvantaged areas with poor access to retailers of healthy and affordable food) and the potential impact of poor access to grocery stores on dietary habits and obesity have tended to focus on longer distances (800 m or greater), and vary according to urban vs. rural setting (see Wang et al., 2007; Larsen and Gilliland, 2008; Pearce et al., 2008; Sharkey, 2009; Sadler et al., 2011). For the purpose of this analysis, we focus on 1000 m, or the 10–15 min walk zone around a grocery store, as has been identified in previous studies of food deserts in Canadian cities (Apparicio et al., 2007; Larsen and Gilliland, 2008). Explorations of how distance from a patient’s home to emergency services available at hospitals is associated with increased risk of mortality are more likely to use much larger threshold distances than standard ‘walk zones’ (e.g. greater than 5 km) (see Jones et al., 1997; Cudnick et al., 2010; Nicholl et al., 2007; Acharya et al., 2011). Nicholl and colleagues (2007), for example, discovered that a 10 km increase in straight-line distance to hospital is associated with a 1% increase in mortality. As hospitals tend to be a regional, rather than a neighborhood facility, we will use the threshold distance of 10 km for our analyses. Rushton and colleagues (2006) have argued that when short distances between subject population and environmental features are associated with health effects in epidemiologic studies, the geocoding result must have a positional accuracy that is sufficient to resolve whether such effects are truly present. The purpose of this study is to quantify the magnitude of the positional discrepancies in terms of distance errors and accessibility misclassification that result from using several commonly-used address proxies in public health research. Positional errors have been shown to vary greatly by setting (Bonner et al., 2003; Cayo and Talbot, 2003; Ward et al., 2005); therefore, we quantify errors by multiple neighborhood types: urban, suburban, small town, and rural. We also attempt to ascribe ‘meaning’ to these errors for spatial epidemiologic studies by examining errors in distance and accessibility misclassification with respect to several health-related features, including hospitals, public recreation facilities, schools, grocery stores, and junk food retailers. 2. Methods 2.1. Study area and data The City of London (population 350,200) and Middlesex County (population 69,024) in Southwestern Ontario, Canada are ideal study areas for examining the geocoding errors in accessibility studies as they encompass a mix of urban, suburban, small town, and rural agricultural areas M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 (Statistics Canada, 2011) (see Fig. 1). The study area was categorized into four neighborhood types as follows: (1) urban areas correspond to neighborhoods in the City of London built primarily before World War II; (2) suburban neighborhoods are areas built following WWII that fall within London’s contemporary urban growth boundary; (3) small towns are settlements outside London within Middlesex County, these settlements have fewer than 20,000 inhabitants; and (4) rural areas are defined as all areas of Middlesex County not identified as small town, as well as areas within the city limits of London which are outside its urban growth boundary. All of the areas combine for a total of 104,025 residential addresses, as well as 94 census tracts, 665 dissemination areas and population weighted dissemination areas, 1410 dissemination blocks, 14,256 postal codes, and 19,365 street segment center points. The spatial relationship between geographically aggregated units and a sample dwelling centroid are illustrated in Fig. 2. The dwelling centroid is located within hierarchical spatial structure starting with the census tract, moving down to dissemination area, and then to the dissemination block and finally the individual parcel of land or lot. The dwelling unit is also located within a postal code region, and on a street segment. Each of these larger geographic units can be operationalized as point locations according to their centroids, as seen in Fig. 2. Digital spatial layers to be used as our address proxies were prepared in ArcMap-ArcInfo10.0 (ESRI Inc., 2011). The census tract, dissemination area, and dissemination block boundary files, supplied by Statistics Canada (2006), were converted to centroids using the ‘Feature to Point’ tool. These three spatially aggregated units are commonly used in geographic analyses of population data in Canada; each having their own tradeoffs for researchers based on the size of the aggregated unit vs. the richness of data available. Dissemination blocks are the smallest of the three geographic units in terms of area; therefore their centroids provide a more spatially accurate proxy for exact address. However, most Canadian census data, 57 except population and dwelling counts, is suppressed at this level, and for this reason, the utility of dissemination blocks in studies of accessibility among population subgroups is more limited. Dissemination areas are made up of a small group of dissemination blocks. They are commonly-used in population health studies as they are the smallest aggregated geographic unit available for which Statistics Canada releases a number of key demographic variables (e.g. median household income, population by age, population by ethnicity); nevertheless, a considerable amount of data suppression still occurs at this scale. While census tracts are the most commonly-used proxy for ‘neighborhoods’ in sociological, geographical, and population health research in Canada, and they offer the most comprehensive census data for spatial epidemiologic analyses, they are also the largest geographic unit examined in this study. For this reason, they are hypothesized to result in the greatest positional discrepancy when used as address proxies. Additionally, census tracts are only available in metropolitan areas and therefore do not cover most rural areas. The weighted dissemination areas centroids were created using the ‘Median Center’ tool by leveraging the population distribution data stored within dissemination block centroids which were nested within the dissemination areas. The weighted dissemination areas centroid has been used in previous research (e.g. Apparicio et al., 2008; Henry and Boscoe, 2008) and was included in this study as a more representative measure for the probable location of population within the area. It is therefore expected to produce a closer approximation for an address proxy than the dissemination area centroid. The postal code boundaries and points were drawn from the Platinum Postal Code Suite (DMTI Spatial Inc., 2009). The typical postal code in a Canadian city is a much smaller geographic unit than the typical US zip code, and is commonly used as a proxy for residential address by Canadian researchers when full civic address is unavailable, or suppressed to maintain subject privacy (e.g. Larsen et al., 2009). The street segment centers were created using the tool ‘Feature Fig. 1. Study area: London and Middlesex County, Ontario. 58 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Fig. 2. Spatial relationships between various geographic aggregation levels and their corresponding centroid within a census tract. Vertices to Points’ with the CanMapstreet files (DMTI Spatial Inc., 2009). The geometric center of every street segment was generated as an aggregate address proxy for all the dwellings on that segment. The average street length for rural neighborhoods was 711 m, 187 m for small towns, 142 m for suburban neighborhoods, and only 127 m for urban neighborhoods. All 147,000 addresses points in the study area were supplied by the City and County for every parcel of land, dwelling, business, and institution (City of London, 2010; Middlesex County, 2011). A total of 104,025 address points were identified as residential, and each point was located within the centroid of the dwelling polygons provided by the City and County. A tabular list of each of the residential addresses was generated and these addresses were used to geocode against the CanMap street files (2009) using the ‘US Address – Dual Ranges’ address locator, thus generating interpolated address points with the default 10 m offset from the street center line. These interpolated addresses, referred to as ‘geocoded points’ in this paper, are undeniably the most commonly-used address proxies when full address information is available to the researcher. While most researchers use such geocoded points without question, we argue that even these address proxies could have positional discrepancies which might cause accessibility misclassification and therefore they must also be subjected to further scrutiny. Dwelling centroids are the ‘gold standard’ of address proxies in this study, to which all other address proxies will be measured. We submit that this is the best choice, as all journeys from the home begin somewhere within the home. In this paper, the issues of address validity and match rates for dwelling and lot centroid are controlled for, in that every one of the 104,025 residential addresses were matched at 100%. To calculate accessibility measures, the centroids for dwelling centroids and all the address proxies (except those located on the street segment or a fixed distance from the street segment) were linked with a connecting lateral line from the proxy address point to the nearest corresponding street segment using a custom algorithm. These lateral lines were included in the network distances reported in the study. The street segment center points already located on the street centerline did not require a lateral line to connect them to the network, while the geocoded points were all standardized to be 10 m from the street centerline and thus the 10 meters were added to the individual distances post process. GIS layers including the locations of all 6 hospitals, 138 elementary schools, and 512 public recreation spaces within the study area were provided by the geomatics divisions of the City and County (City of London, 2010; Middlesex County, 2011). Addresses for the 52 grocery stores and 1213 junk food retailers (including fast food restaurants M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 and convenience stores) in the study area were provided by the Middlesex-London Health Unit (MLHU, 2010) and geocoded using the master address files provided by the City and County. All data was verified and corrected using orthorectified air photos of London and Middlesex (15 and 30 cm resolution, respectively) (City of London, 2010; Middlesex County, 2011). For built structures, the centroid of the building polygon was used as the address ‘gold standard’; however, for recreational places without a defined built structure, such as parks, the access points were manually created using the air photos. The City, County, DMTI Spatial Inc., and Statistics Canada publish no metric regarding the absolute or relative spatial accuracy of their datasets. In this study, the City and County spatial data were accepted as the most spatially accurate of all the data sources. The City and County spatial data were used to create the building centroids for facilities, dwellings, and the centroid for dwelling lots. Spatial features found in the Statistics Canada and DMTI Spatial Inc. data are within 15 m of the same corresponding features in the City and County data for most of the study area. The Statistics Canada and DMTI Spatial Inc. data were used to generate the census tract, dissemination area, weighted dissemination area, dissemination block, postal code centroids, the street segment center, and the geocoded point address proxies, and to generate the shortest path network routes and polygons. 2.2. GIS methods Shortest path routes (by distance) along the street network from the address proxies to the health-related destination facilities were created using the ArcMap 10.0 Network Analyst ‘Closest Facility’ function (ESRI Inc., 2011). Starting from each dwelling centroid a network route was created to the nearest health-related facility (i.e., the nearest hospital, school, grocery store, junk food outlet, and public recreation facility). This procedure was repeated for every type of health facility until all 104,024 dwelling centroids were assigned a separate shortest path route to one of each of the facility types. The process was then repeated for each of the eight address proxies. The distance measures were stratified into rural, small town, suburban, and urban neighborhood types and exported from ArcMap 10.0 for analysis in Excel 2010 (Microsoft, 2011) and PASW 18 (IBM, 2011). A recent study of accessibility to multiple food retailer types in rural Middlesex County illustrated how accessibility can be misclassified if facilities outside the county boundary are not considered in distance calculations (Sadler et al., 2011). Sadler and colleagues (2011) demonstrated that when facilities in neighboring counties were included in the spatial analyses, distance to the nearest grocery store decreased for nearly one-third of households, and distance to nearest fast food outlet decreased for over one-half of households. The edge effect was taken into account in the present study by compiling the datasets for selected health-related facilities in neighboring counties (within 10 km from the border of Middlesex County) and then including these facilities in the distance calculations. 59 2.3. Misclassified address proxies When spatial aggregations of the subject populations or geographic features are used as proxies in a study of accessibility, the researcher risks misrepresenting the accessibility metric used in that study. Fig. 3 illustrates several potential problems of misclassification and miscounting of grocery stores by identifying three accessibility areas; the census tract boundary; a1000 m network service area buffer originating from the centroid of that same census tract; and a 1000 m network service area buffer originating from a dwelling centroid from within the same census tract. The figure shows that the census tract boundary and the 1000 m network service area buffer around the census tract centroid does not contain a grocery store, and thus would be coded as inaccessible; however, the dwelling centroid buffer does ‘contain’ at least one grocery store and would be coded as accessible. Fig. 3 also illustrates that the count and density metrics will be affected by the positional discrepancy of using imprecise address proxies. We see that the census tract boundary and the buffer around the census tract centroid do not contain any grocery stores, while the dwelling centroid buffer contains two grocery stores. A further look at the Fig. 3 reveals that the distance between the census tract centroid and the dwelling centroid is biased in the direction of the positional discrepancy. In this example, if the census tract centroid was used as the address proxy, the researcher would have coded all sample unit locations within the census tract as not having a grocery store within 1000 m, when in fact, there are two grocery stores within 1000 m for some of the sample units. Moreover, the researcher would have over-estimated the distance to the closest grocery store for numerous dwelling units, such as the one in our example. Following some commonly-used distances found in previous health-related studies of accessibility (as noted above), the thresholds distances used in this study were: 500 m for junk food and public recreation spaces, 1000 m for grocery stores, 1600 m for schools, and 10 km for hospitals. Shortest path route buffers had been created for each address proxy and each address proxy point was binary encoded, either the address proxy was inside the threshold (coded as 1) or outside the threshold (coded as 0). We then matched the binary variable to every dwelling centroid from every corresponding address proxy, and then reported the percentages of improperly coded addresses. 2.4. Statistical methods The distance discrepancies were generated by taking the shortest path distance from a dwelling centroid to a health-related facility and then subtracting the corresponding shortest distance from each corresponding address proxy to that same health facility type. The Phi correlation coefficient was generated in PASW 18 (IBM, 2011) and was used to measure the association between the binary threshold values (i.e., accessible/inaccessible) between the dwelling centroid threshold value (0,1) to each of its corresponding address proxy threshold values (0,1). Phi will return an association coefficient between 1 and +1. A positive value of +1 occurs when all the dwelling 60 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Fig. 3. Illustration of threshold distance miscoding errors. threshold values and all the address proxy threshold values are in concordance with one another. Conversely, if there is total discordance between all the dwelling threshold values and all the address proxy threshold the Phi coefficient will be 1. If a number of dwelling centroid threshold values differ from those of the corresponding address proxy, the coefficient will begin to move toward 0, thus suggesting a weaker association in terms of accessibility encoding for that address proxy. The significantly positive associations (sig. < 0.01) are between 0.7 and 1.0. 3. Results 3.1. Magnitude of positional discrepancies In almost every case, urban neighborhoods show the smallest median distance error for all address proxies, followed successively by suburban, small town, and rural areas (see Table 1). As expected, lot centroids were the most accurate proxy for precise residential dwelling location that we examined in relation to nearest distance to health related facilities, with the median positional discrepancy (50th percentile) between lot centroids and dwelling centroids equal to 6–9 m for locations in urban and suburban neighborhoods, 25–43 m for locations in small towns, and 43–50 m for locations in rural areas. The second most accu- rate proxy for residential location was the geocoded point, with median positional discrepancies between geocoded points and dwelling centroids between 38 and 84 m for residential locations in urban neighborhoods, 37–80 m for locations in suburban neighborhoods, 34–82 m for small town locations, and 77–100 m in rural locations. The third most accurate address proxy we examined was the street segment centroid, with median positional discrepancies in relation to dwelling centroids between 52 and 102 m for residential locations in urban neighborhoods, 75–106 m for locations in suburban neighborhoods, 52–100 m for small town locations, and 173–197 m in rural locations. In urban and suburban areas, the positional discrepancies between postal code centroids and dwelling centroids are very similar to the positional discrepancies between street segment centroids and dwelling centroids; however, the positional discrepancies are drastically worse when using postal codes in small towns (median distance errors between 373 and 1177 m) and rural areas (distance errors between 762 and 1363 m). In rural areas and small towns, the positional errors are always greater when using postal code centroids as address proxies compared to centroids of dissemination blocks, weighted dissemination areas, and dissemination areas. Conversely, postal codes show smaller positional errors than these same address proxies in urban and suburban areas. Census tract centroids are always the M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Table 1 Median positional discrepancy (meters) by facility type and neighborhood type. Neighborhood type Rural Small Suburban Urban (m) town (m) (m) (m) Junk food Lot centroids 49 Geocoded point 85 Street segment center 175 Postal code 762 Dissemination block 680 Weighted dissemination area 897 Dissemination area 1054 Census tract 930 29 48 65 373 147 279 509 1414 9 51 75 78 127 168 176 243 8 38 52 54 78 100 113 160 Public recreation places Lot centroids 43 Geocoded point 77 Street segment center 185 Postal code 896 Dissemination block 677 Weighted dissemination area 988 Dissemination area 1070 Census tract 1347 43 34 52 1177 156 296 599 1723 8 75 106 114 176 228 241 352 8 84 102 109 145 185 207 247 Grocery stores Lot centroids 43 Geocoded point 100 Street segment center 197 Postal code 1196 Dissemination block 810 Weighted dissemination area 1193 Dissemination area 1263 Census tract 1704 25 82 95 494 169 335 559 1870 6 80 100 98 141 198 201 373 9 59 76 79 112 145 158 343 Schools Lot centroids 50 Geocoded point 94 Street segment center 173 Postal code 913 Dissemination block 665 Weighted dissemination area 1017 Dissemination area 1140 Census tract 1268 32 51 66 711 148 361 573 1679 6 60 80 82 133 187 194 363 9 55 66 68 101 132 140 251 Hospitals Lot centroids 46 Geocoded point 85 Street segment center 187 Postal code 1363 Dissemination block 769 Weighted dissemination area 1350 Dissemination area 1255 Census tract 2088 27 65 100 537 349 415 538 2166 5 37 67 78 176 203 204 445 8 75 93 101 160 166 171 343 address proxy with the largest positional error for all neighborhoods and facility types, with median positional discrepancies ranging from the lowest distance error of 160 m (when calculating distance to junk food locations in urban areas) to a high of 2088 m (when calculating distance to hospital in rural areas). Tables A1–A5 (in the Appendix A) provide additional information on the positional discrepancies (including mean distance errors, as well as errors at 75th, 90th, 95th, and 99th percentiles) between the address proxies and the dwelling centroids they are meant to represent. The general pattern observable for the median (i.e., 50th percentile) positional discrepancies (reported in Table 1) tends to be similar in relative terms, but much less dramatic in terms of absolute distance errors, compared to 61 the mean positional discrepancies, as well as the 75th, 90th, 95th, and 99th percentile of discrepancies. 3.2. Positional discrepancy by facility type The positional discrepancies between the address proxy locations and the dwelling centroids they are to represent not only vary considerably by neighborhood type, but they also vary by health facility type. When lot centroids are used as address proxies, there is a very small variability between distance errors for all facility types, regardless of neighborhood type (rural = ±7 m; urban = ±1 m) (see Table 1). Of the 32 unique combinations of address proxies, neighborhood types, and facility types it is the junk food outlets (N = 1213) that have the minimum median distance errors 68.8% of the time (22/32), while public recreation facilities (N = 512), singularly, account for almost 50% (15/32) of the facilities with maximum median distance errors. The junk food outlets have minimum median distance errors for all the address proxies in the urban neighborhood type. Junk food outlets, also, account for all the minimum median distance errors in suburban and small town neighborhood types for postal codes, dissemination block, weighted dissemination area, dissemination area, and census tract proxies. For rural neighborhoods, the minimum median distance errors for junk food outlets are found when the postal code, weighted dissemination area, and census tract proxies are used. For the most part, public recreation facilities (N = 512) display larger median positional discrepancies than all other health related facilities in urban and suburban areas, while hospitals (N = 6) and grocery stores (N = 52) show the greatest positional discrepancies compared to the other health related facilities in rural and small towns. The postal code median distance error of 1177 m for small town and public recreation facilities is a larger error than rural neighborhood types and public recreation facilities (896). 3.3. How positional discrepancy impacts accessibility measures In addition to reporting the positional discrepancy errors it is instructive to look at how much of an effect these errors have on the classification of the population aggregated in each of the address proxies. In some health related accessibility studies continuous variables are used to measure the proximity of health related facilities to an address proxy. Some studies use binary variables to identify whether or not a health related facility exists within a set threshold distance (or buffer radius) around a proxy (Talen, 2003; Apparicio et al., 2008); still more studies use density and counts, however, as indicated in Fig. 3, this approach can also lead to serious misclassification errors. Table 2 considers the impact of positional discrepancy on accessibility, by reporting the percentage of cases that are incorrectly classified as accessible or not, by address proxy, neighborhood type and health related facility type. The general trend is that the smaller the distance threshold, the greater the percentage of addresses misclassified; also, the larger the geographic area of the unit of aggregation, the greater the percentage of addresses which are 62 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Table 2 Accessibility thresholds: percentage of misclassified observations by address proxy. Address proxy Neighborhood type Junk food (500 m) Recreation places (500 m) Grocery (1 km) Schools (1.6 km) Hospitals (10 km) Census tracts (N = 94) Rural* (n = 17) Small town (n = 3) Suburban (n = 54) Urban (n = 20) 13.5 36.7 31.2 16.9 8.0 33.7 47.4 49.5 4.7 21.0 16.8 37.1 18.3 35.7 15.9 0.1+ 26.4 10.2 5.1+ 0.0+ DA(N-665) Rural (n = 125) Small town (n = 43) Suburban (n = 367) Urban (n = 130) Rural (n = 110) Small town (n = 53) Suburban (n = 372) Urban (n = 130) 7.6 35.4 23.9 15.5 9.6 31.5 23.0 10.7 3.7 37.1 28.2 33.5 4.7 33.5 29.2 29.7 3.8 22.8 11.4 15.3+ 3.9 15.2 11.5 15.5+ 11.9 29.3 7.6 0.1+ 11.9 19.2 6.7 0.1+ 8.6 2.7 0.7+ 0.0+ 8.5 1.2+ 0.7+ 0.0+ Rural (n = 1499) Small town (n = 593) Suburban (n = 1409) Urban (n = 709) Rural (n = 2539) Small town (n = 1003) Suburban (n = 7792) Urban (n = 2922) Rural (n = 6310) Small town (n = 2227) Suburban (n = 8364) Urban (n = 2464) Rural (n = 16,686) Small town (n = 14,139) Suburban (n = 54,579) Urban (n = 18,620) Rural (n = 16,686) Small town (n = 14,139) Suburban (n = 54,579) Urban (n = 18,620) 6.9 18.2 18.4 12.0 9.2 29.9 2.9 22.2 25.6 24.5 6.8 33.2 2.5 11.2+ 9.1 13.1+ 3.0+ 27.8 8.5+ 15.3+ 5.9+ 1.1+ 8.1+ 37.0 5.2+ 1.0+ 0.9+ 0.0+ 6.9+ 3.5+ 11.3+ 6.5 4.3+ 9.0+ 21.0 22.8 2.3+ 8.7+ 6.4+ 10.5+ 1.0+ 4.3+ 2.4+ 0.1+ 3.6+ 2.7+ 0.3+ 0.0+ 1.2+ 0.6+ 12.4+ 6.2 2.9+ 7.1+ 21.6 23.1 1.9+ 6.7+ 6.8+ 10.1+ 1.1+ 4.0+ 2.3+ 0.1+ 2.5+ 2.2+ 0.3+ 0.0+ 0.5+ 0.4+ 8.9+ 18.3 5.3+ 1.5+ 0.2+ 5.6+ 0.8+ 2.0+ 21.1 0.4+ 1.8+ 9.4+ 0.2+ 0.8+ 0.1+ 0.6+ 0.6+ 0.0+ 0.5+ 0.1+ 1.7+ 1.5+ 0.6+ 0.4+ 0.1+ 1.5+ 1.7+ 1.3+ 0.0+ 0.0+ Weighted DA (N = 665) DB (N = 4210) Postal code (N = 14,256) Street segment (N = 19,365) Geocoded (N = 104,024) Lot (N = 104,024) Abbreviations: DB – dissemination block; DA – dissemination area; N – number of address proxies; n – number of address proxies by neighborhood type. * Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. + Phi coefficient strong positive association (+0.7 to +1.0) sig. < 0.01. misclassified. For example, using the centroid of a large aggregated unit such as a census tract as a proxy for precise residential address when calculating whether or not a park is located within 500 m from residential addresses in urban neighborhoods will result in nearly half (49.5) of all observations being misclassified. On the other hand, using a large threshold distance of 10 km to determine accessibility to a hospital results in no misclassification in urban areas, no matter what the address proxy used (as the threshold practically covers the entire urban area). The Phi coefficient shows a positive association between each of the dwelling centroids and each and every corresponding address proxy of the coding threshold (inside/ outside) across all the health related facility thresholds, except for one. There is a weak negative (U = 0.6, p < 0.01) association for the urban census tract proxy coding thresholds for public recreation facilities. For example, census tract centroids coded as ‘outside’ (those that do not have a public recreation facility within 500 m) will have many corresponding dwelling centroids coded as ‘inside’ (those that do have a public recreation facility within 500 m) resulting in this negative association. There is a strong positive association between dwelling centroid and lot centroid for threshold distances of 1 km to grocery stores. If a suburban dwelling centroid is coded as being within 1 km from a grocery store (code = 1) there is a strong probability (U = 0.996, p < 0.01) that the corresponding lot centroid will also be within 1 km of a grocery store and coded in the same way. Conversely, if a dwelling centroid is coded as being farther away than 1 km from a grocery store (code = 0) then there is the same probability (U = 0.996, p < 0.01) that the corresponding lot centroid will also be coded in the same way. The range of Phi values for dwellings and corresponding census tracts, dissemination areas, and weighted dissemination area proxies for junk food and recreation places (500 m thresholds) are weakly associated ( 0.6 < U < 0.47, p < 0.01). The fewest misclassification errors and strongest associations for the 500 m thresholds exist for lot centroids (U > 0.93, p < 0.01) followed by geocoded points (0.6 < U < 0.87, p < 0.01). Postal code M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 centroids showed very high errors in coding for small town (29.9%) and weak association (rural U = 0.26, small town U = 0.29, suburban U = 0.59, and urban U = 0.58, p < 0.01). 4. Discussion It is common in public health research to use spatially aggregated units as address proxies for the locations of subjects and facilities when more precise address information is unavailable. It is rare, however, for public health researchers to examine, or even mention, the potential distance and misclassification errors resulting from the positional discrepancies between the locations of imprecise address proxies and precise subject locations. It is inappropriate for researchers to ignore these inaccuracies or to merely accept them as an inevitable component of doing spatial research. It is important to identify and quantify any spatial errors so that we can critically examine research findings and properly advise those to whom policy recommendations are made regarding the potential correlations between subject populations and environmental exposures. One of the contributions of our study is to quantitatively describe the magnitude of distance errors that result when several of the most commonly-used address proxies are implemented in several different neighborhood types, including rural, suburban, small town, and urban areas. It is recognized that accessibility thresholds will vary by setting, as well as health outcome or health-related behavior. Therefore, by demonstrating how the magnitude of the distance errors can affect measures of accessibility (or exposure) to a variety of health-related spaces in different environments and at different distance thresholds, this study also makes a methodological contribution to the environmental and public health literature. The dwelling as represented by the centroid of the building in which the study participant resides is considered the gold standard for residential address location. If dwelling centroids are not available to the researcher, then the second most accurate address proxy is the centroid of the parcel of land(i.e., the lot) on which the dwelling unit is located; this finding is true regardless of neighborhood type. When the lot centroid is used as an address proxy, accessibility misclassification errors are virtually nonexistent in urban and suburban neighborhoods, and are very minor in rural areas and small towns. Where digital files for all residential buildings or residential lots are not available for a study region, but the researcher has access to the complete civic address (i.e., street name and number) for each subject, it is very common for researchers to geocode their tables of subject addresses using ‘address locator’ tools to interpolate residential addresses. While the median distance error for this address proxy is too high for researchers to simply ignore (ranging from a low of 34 m to a high of 100 m depending on facility and neighborhood types), for the most part, there are few instances of miscoded accessibility when this commonlyused address proxy is used: fewer than one-tenth (8.9%) of all observations are misclassified, except for recreation spaces within 500 m in suburban and urban neighbor- 63 hoods, where approximately one-fifth of observations are misclassified (18.3% and 21.1%, respectively). A variation on the interpolated address technique is to use the centroid of the closest street segment as address proxy. This method is useful for environmental equity studies, where researchers may want to map and visualize how access to certain environmental features varies at a fine scale across a study area, but they do not have (or cannot show for privacy reasons) specific address data for subject populations. The street segment centerline address proxy appeared to have fewer distance and misclassification errors than the more commonly-used postal code centroids, particularly for small town and rural areas. Postal codes are certainly the most commonly-used proxy for residential addresses of research subjects in Canadian public health studies. In Canada, the postal code centroid is often the best solution when exact addresses are unavailable, or inaccessible due to research ethics board policies and privacy concerns. Our results indicate that postal code centroids are reasonably accurate proxies for residential addresses in urban and suburban areas (median positional discrepancies between 54 and 109 m depending on facility type); however, we recommend that postal codes should be used only with extreme caution for studies based in small town and rural areas of Canada. Positional discrepancies between postal code centroid and dwelling centroid can be very high in rural areas: depending on facility type, median distance errors in rural areas ranged between 762 and 1363 m. Furthermore, we found that postal codes are reasonably accurate for accessibility studies when distance thresholds are 1000 m or greater; however, we advise that postal codes should not be used as proxies for residential addresses in accessibility studies where the threshold distances or density buffers are as short as 500 m. Postal code centroids are particularly prone to misrepresenting accessibility in small towns and rural Canada, and therefore should only be used with more caution in spatial epidemiologic research in Canada. Urban areas show the smallest distance error for all address proxies followed by suburban, small town, and rural neighborhoods. As expected, the magnitude of distance errors and threshold misclassification errors are larger, or most problematic, when the address proxy is the centroid of a large geographic aggregation such as the census tract. In general, the census tract performed poorly as an address proxy except in urban areas where threshold distances are 1600 m or greater. Similarly, we recommend that centroids of dissemination areas and weighted dissemination areas should only be used as residential address proxies in urban areas when threshold distances are set at greater than 1000 m and in suburban areas when threshold distances are set at greater than 1600 m. As for Canadian small towns, researchers should also avoid all spatially aggregated address proxies for threshold distances less than 1.6 km as the misclassification errors are consistently large, as are the distance errors. While these recommendation are based on the empirical findings related to the specific health-related facilities examined in this study, it is recognized that the positional accuracy required for spatial epidemiology research also depends on the specific exposure and health outcome under examination (e.g. spatial 64 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 accuracy is more critical for studies of exposure to air pollution than distance to nearest hospital). This study looked at the errors in the shortest path distances from each address proxy to the closest public recreation space, junk food outlet, grocery store, school, and hospital in a full range of neighborhood types. One way in which this study differed from previous studies of positional error is that street network distances were used in the error calculation, not the relative positional errors in terms of Euclidean distances. Since a subject must use the existing street network (or pathway network) to travel from their dwelling to access the nearest park, junk food outlet, grocery store, school, or hospital, it would be inaccurate to calculate positional errors and therefore accessibility misclassification as Euclidean or ‘crow fly’ distances between address proxies and dwelling centroids (except where distances are too small to require use of the network). As a necessary methodological step to create baseline distance measures for comparative purposes, this study assigned health-related accessibility scores to every residential address in the study area. These individual values are at the finest scale so that, in future, they can be aggregated in any geographic frame a researcher would see fit. By creating accessibility measures to individual dwelling centroids, researchers are no longer constrained by the (often arbitrary) boundaries of blocks, postal codes, dissemination areas, census tracts, or even counties. There is a growing trend in public health studies, particularly within the burgeoning field of ‘active living research’, toward the use of ‘ego-centric’ units (typically defined by buffers around a study participant’s residence) to characterize a participant’s neighborhood in order to examine the effect that local environmental factors (e.g. the mix of land uses and coverage of sidewalks) may have on health-related behaviors such as walking (e.g. Larsen et al., 2009) and outcomes such as physical activity levels (Tucker et al., 2008). The findings of this study have revealed that if commonly-used proxies such as centroids of census tracts, dissemination areas, and even postal codes, are used instead of exact addresses, distance errors can be significantly large. If distance errors are large, such ‘ego-centric’ neighborhood units will be significantly ‘off center’ and local environments can be mischaracterized. For example, the chances of misclassifying a health-promoting feature of the neighborhood such as a park (or a health-damaging feature such as a junk food outlet) as accessible (or not) can be unacceptably high, particularly when threshold distances are short, such as the commonly-used 500 m buffer (or 5-min walk zone). If positional discrepancies are too large, it will be impossible for the researcher to resolve whether any health effects of an environment are truly present. Improving the accuracy of our distance calculations increases the utility of our findings for making decisions and enacting policies aimed at improving a population’s spatial accessibility to environmental features that contribute to their overall health and well-being. Appendix A Table A1 Distance errors (m) from address proxy to closest junk food retailer. Neighborhood type % (N = 104,024) Lot (np = 104,024) Geocoded point (np = 104,024) Street segment (np = 19,365) Postal code (np = 14,265) DB (np = 4210) Weighted DA (np = 665) DA (np = 665) Census tract* (np = 94) Rural (n = 16,686) Mean Median 75th 90th 95th 99th 69 49 74 166 182 364 163 85 168 337 471 1683 274 175 370 597 772 1683 1344 762 2040 3742 4436 5832 984 678 1431 2312 2835 4053 1325 897 1930 3219 4097 5536 1415 1054 2033 3261 4060 5690 1427 930 2159 3473 4136 5383 Small town (n = 14,139) Mean Median 75th 90th 95th 99th 38 29 35 56 99 187 69 48 78 148 196 351 89 65 111 181 245 475 1241 373 1786 4467 5099 6483 455 146 458 1231 2515 3418 562 279 623 1207 2774 3729 979 509 1227 2528 3765 5926 1883 1414 3280 4190 4791 5448 Suburban (n = 54,579) Mean Median 75th 90th 95th 99th 12 9 11 17 35 167 83 51 76 147 331 551 111 75 125 238 380 625 107 78 133 224 331 547 186 126 255 430 558 881 226 168 312 501 637 975 250 176 334 564 730 1216 297 243 423 626 767 1037 Urban (n = 18620) Mean Median 75th 90th 95th 99th 13 8 12 17 30 61 51 38 51 77 137 366 66 52 81 120 166 377 71 54 90 139 187 413 108 77 146 230 309 530 126 100 176 260 322 527 139 113 194 284 351 550 195 160 281 405 492 651 Abbreviations: DB – dissemination block; DA – dissemination area; N – number of dwelling centroids; n – number of dwelling centroids by neighborhood type; np – number of address proxies. Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. * 65 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Table A2 Distance errors (m) from address proxy to closest public recreation place. Neighborhood type % (N = 104,024) Lot (np = 104,024) Geocoded point (np = 104,024) Street segment (np = 19,365) Postal code (np = 14,265) DB (np = 4210) Weighted DA (np = 665) DA (np = 665) Census tract* (np = 94) Rural (n = 16,686) Mean Median 75th 90th 95th 99th 63 43 72 158 185 346 156 77 161 386 606 1069 270 185 401 608 781 1118 1645 896 2393 4206 5458 8931 972 677 1427 2324 2879 4024 1491 988 2177 3612 4570 6495 1520 1070 2180 3561 4401 6097 1961 1347 2749 4629 6017 8579 Small town (n = 14,139) Mean Median 75th 90th 95th 99th 41 38 43 55 99 195 56 34 60 109 175 464 77 52 92 155 235 517 1779 1177 3109 4076 5095 9996 503 156 482 1590 2770 3327 645 296 712 1699 2971 4521 1105 599 1513 2882 4010 6495 2020 1723 3172 3768 6521 7828 Suburban (n = 54,579) Mean Median 75th 90th 95th 99th 11 8 9 16 33 161 191 75 238 557 745 1207 214 106 265 586 766 1243 211 114 257 558 761 1231 266 176 367 632 822 1242 319 228 443 732 920 1383 347 241 473 772 985 1674 525 352 645 1031 1420 4993 Urban (n = 18,620) Mean Median 75th 90th 95th 99th 11 8 12 18 24 60 182 84 257 513 632 937 193 102 275 527 639 921 195 109 279 518 639 953 208 145 290 483 608 938 242 185 347 523 638 1055 265 207 377 567 690 1084 293 247 419 593 714 943 Abbreviations: DB – dissemination block; DA – dissemination area; N – number of dwelling centroids; n – number of dwelling centroids by neighborhood type; np – number of address proxies. * Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. Table A3 Distance errors (m) from address proxy to closest grocery store. Neighborhood type % (N = 104024) Lot (np = 104,024) Geocoded point (np = 104,024) Street segment (np = 19,365) Postal code (np = 14,265) DB (np = 4210) Weighted DA (np = 665) DA (np = 665) Census tract* (np = 94) Rural (n = 16,686) Mean Median 75th 90th 95th 99th 64 43 74 168 191 380 281 100 212 568 1420 2740 377 197 450 805 1361 2762 2000 1196 2793 4798 6736 11412 1095 810 1599 2531 2976 4122 1707 1193 2476 4029 5102 7154 1721 1263 2463 3877 4773 6604 2581 1704 3707 6123 7361 9584 Small town (n = 14139) Mean Median 75th 90th 95th 99th 3 25 31 53 95 184 115 82 121 211 454 567 135 95 152 288 482 647 2000 494 3532 5529 8523 10709 471 169 493 1465 2234 3027 651 335 765 1623 2367 4722 1102 559 1501 2963 3683 6662 2730 1870 3653 6821 9253 10225 Suburban (n = 54579) Mean Median 75th 90th 95th 99th 12 6 9 16 34 164 168 80 116 171 609 2212 197 100 157 258 736 2405 190 98 162 257 629 2237 271 141 294 614 994 2190 327 198 394 727 1147 2094 345 201 404 762 1354 2358 573 373 697 1136 1817 3819 Urban (n = 18620) Mean Median 75th 90th 95th 99th 11 9 14 19 23 61 115 59 88 232 587 854 129 76 118 247 594 892 132 79 129 274 580 902 177 112 209 423 671 924 203 145 262 442 656 935 217 158 281 476 686 951 381 343 553 752 871 1089 Abbreviations: DB – dissemination block; DA – dissemination area; N – number of dwelling centroids; n – number of dwelling centroids by neighborhood type; np – number of address proxies. * Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. 66 M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 Table A4 Distance errors (m) from address proxy to closest school. Neighborhood type % (N = 104,024) Lot (np = 104,024) Geocoded point (np = 104,024) Street segment (np = 19,365) Postal code (np = 14265) DB (np = 4210) Weighted DA (np = 665) DA (np = 665) Census tract* (np = 94) Rural (n = 16,686) Mean Median 75th 90th 95th 99th 68 50 76 163 187 378 147 94 159 294 413 1074 254 173 367 590 743 1071 1547 913 2339 3957 5021 7693 974 665 1388 2284 2929 4060 1564 1017 2300 3852 4795 6308 1595 1140 2299 3784 4752 6303 1850 1268 2616 4441 5550 7493 Small town (n = 14,139) Mean Median 75th 90th 95th 99th 34 32 38 61 100 189 65 51 79 115 163 358 87 66 108 170 228 483 1522 711 2465 4048 5922 6990 445 148 477 1311 2271 3047 666 361 806 1517 2723 4187 1087 573 1423 2604 4322 6560 2155 1679 2954 5926 6875 7758 Suburban (n = 54,579) Mean Median 75th 90th 95th 99th 13 6 10 15 34 166 82 60 84 126 180 687 108 80 125 191 286 713 109 82 136 206 277 698 215 133 273 513 716 1200 272 187 357 609 838 1341 300 194 379 671 976 1597 510 363 667 1057 1567 2830 Urban (n = 18,620) Mean Median 75th 90th 95th 99th 13 9 14 19 23 61 68 55 74 111 170 381 81 66 100 147 190 417 84 68 110 164 210 409 139 101 186 295 387 654 162 132 227 331 405 641 171 140 241 349 426 651 296 251 442 624 724 869 Abbreviations: DB – dissemination block; DA – dissemination area; N – number of dwelling centroids; n – number of dwelling centroids by neighborhood type; np – number of address proxies. * Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. Table A5 Distance errors (m) from address proxy to closest hospital. Neighborhood type % (N = 104,024) Lot (np = 104,024) Geocoded point (np = 104,024) Street segment (np = 19,365) Postal code (np = 14,265) DB (np = 4210) Weighted DA (np = 665) DA (np = 665) Census tract* (np = 94) Rural (n = 16,686) Mean Median 75th 90th 95th 99th 66 46 72 156 180 359 176 85 284 458 553 859 278 187 426 655 817 1148 2382 1363 3683 6150 8116 11812 1082 769 1561 2508 3052 4375 1903 1350 2732 4496 5708 8419 1854 1255 2700 4400 5535 8292 3285 2088 5223 7815 9735 13483 Small town (n = 14,139) Mean Median 75th 90th 95th 99th 34 27 33 56 96 185 178 65 335 443 511 821 192 100 341 450 516 828 1296 537 1589 3664 4320 8095 546 349 645 1355 2203 3060 674 415 832 1580 2319 3690 998 538 1273 2373 3766 6689 2413 2166 3266 5281 6095 9435 Suburban (n = 54,579) Mean Median 75th 90th 95th 99th 12 5 9 16 33 164 68 37 75 178 188 367 93 67 127 189 231 503 102 78 143 214 267 441 255 176 326 556 777 1358 287 203 384 640 848 1389 301 204 390 647 885 1620 651 445 797 1256 1689 5312 Urban (n = 18,620) Mean Median 75th 90th 95th 99th 11 8 12 17 22 58 101 75 181 193 200 312 104 93 170 204 226 319 114 101 175 225 263 362 190 160 262 380 464 738 207 166 292 434 538 774 214 171 301 445 555 814 414 343 580 835 1078 1668 Abbreviations: DB – dissemination block; DA – dissemination area; N – number of dwelling centroids; n – number of dwelling centroids by neighborhood type; np – number of address proxies. * Census tracts only exist for rural areas within Census Metropolitan Areas and therefore coverage is biased toward more densely populated rural areas. M.A. Healy, J.A. Gilliland / Spatial and Spatio-temporal Epidemiology 3 (2012) 55–67 References Acharya A, Nyirenda J, Higgs G, Bloomfield M, Cruz-Flores S, Connor L, et al. Distance from home to hospital and thrombolytic utilization for acute ischemic stroke. J Stroke Cerebrovasc Dis 2011;20(4):295–301. Anselin L. How (not) to lie with spatial statistics. Am J Prev Med 2006;30(2):S3–6. Apparicio P, Cloutier M, Shearmur R. The case of Montréal’s missing food deserts: evaluation of accessibility to food supermarkets. Int J Health Geog 2007;6(4):12. Apparicio P, Abdelmajid M, Riva M, Shearmur R. Comparing alternative approaches to measuring the geographical accessibility of urban health services: distance types and aggregation-error issues. Int J Health Geogr 2008;7(7). Austin S, Melly S, Sanchez B, Patel A, Buka S, Gortmaker A. Clustering of fast-food restaurants around schools: a novel application of spatial statistics to the study of food environments. Am J Public Health 2005;95(9):1575–81. Bjork J, Albin M, Grahn P, Jacobsson H, Ardo J, Wadbro J, et al. Recreational values of the natural environment in relation to neighbourhood satisfaction, physical activity, obesity and wellbeing. J Epidemiol Community Health 2008;62(2). Bonner M, Daikwon H, Nie J, Rogerson P, Vena J, Freudenheim J. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology 2003;14:408–12. Bow C, Jennifer D, Waters N, Faris P, Seidel J, Galbraith D, et al. Accuracy of city postal code coordinates as a proxy for location of residence. Int J Health Geogr 2004;3(5). Brownson R, Hoehner C, Day K, Forsyth A, Sallis J. Measuring the built environment for physical activity: state of the science. Am J Prev Med 2009;36(S4):S99–S123. Cayo M, Talbot T. Positional error in automated geocoding of residential addresses. Int J Health Geogr 2003;2(10). City of London. Parcels, buildings, address points, and health facilities GIS files [DVD]. London (ON): Geomatics Division; 2010. Cudnick M, Schmicke R, Vaillancourt C, Newgard C, Christenson J, Davis, et al. A geospatial assessment of transport distance and survival to discharge in out of hospital cardiac arrest patients: Implications for resuscitation centers. Resuscitation 2010;81:518–23. DMTI Spatial Inc. Database of postal code centroids and street centerline GIS files [Internet], Ottawa(On);2009. Available from <http:// equinox.uwo.ca>. Fotheringham S. Scale-independent spatial analysis. In: Goodchild M, Gopal S, editors. Accuracy of spatial data. London: Taylor & Francis; 1989. p. 221–8. Gilliland J. The Built environment and obesity: trimming waistlines through neighbourhood design. In: Bunting, Filion, Walker, editors. Canadian cities in transition. 4th ed. Oxford Univ Press; 2010. p. 391– 410. Goldberg D. A Geocoding Best Practices Guide. Springfield, IL North Am Assoc Cent Cancer Registries;2008. Henry K, Boscoe F. Estimating the accuracy of geographical imputation. Int J Health Geogr 2008;7(3). Jacquez G, Rommel R. Local indicators of geocoding accuracy (LIGA): theory and application. Int J Health Geogr 2009;8(60). Jones A, Bentham G, Horwell C. Health service accessibility and deaths from asthma in 401 local authority districts in England and Wales, 1988–92. Thorax 1997;52:218–22. Larsen K, Gilliland J. Mapping the evolution of ‘food deserts’ in a Canadian city: supermarket accessibility in London, Ontario, 1961–2005. Int J Health Geogr 2008;7(16). Larsen K, Gilliland J, Hess P, Tucker P, Irwin J, He M. The influence of the physical environment and sociodemographic characteristics on children’s mode of travel to and from school. Am J Public Health 2009;99(3):520–6. Lee R, Cubbin C, Winkleby M. Contribution of neighbourhood socioeconomic status and physical activity resources to physical activity among women. J Epidemiol Community Health 2007;61: 882–90. Maroko A, Maantay J, Sohler N, Grady K, Arno P. The complexities of measuring access to parks and physical activity sites in New York 67 city: a quantitative and qualitative approach. Int J Health Geogr 2009;8(34). Middlesex County. Database of parcels, address point, aerial photos, and health facilities GIS files [DVD]. London (ON): Middlesex County Planning Dept.;2011. Middlesex-London Health Unit. Database of food retailers [DVD].London (ON): Middlesex County Food Inspection Dept.;2010. Morland K, Evenson K. Obesity prevalence and the local food environment. Health Place 2009;15:491–5. Muller S, Tscharaktschiew S, Haase K. Travel-to-school mode choice modelling and patterns of school choice in urban areas. J Transport Geog 2008;16:342–57. Nicholl J, West J, Goodacre S, Turner J. The relationship between distance to hospital and patient mortality in emergencies: an observational study. Emerg Med J 2007;24:665–8. Panter J, Jones A, van Sluijs E, Griffin S. Attitudes, social support and environmental perceptions as predictors of active commuting behaviour in school children. J Epidemiol Community Health 2009;61:389–95. Pearce J, Hiscok R, Blakely T, Witten K. The contextual effects of neighbourhood access to supermarkets and convenience stores on individual fruit and vegetable consumption. J Epidemiol Community Health 2008;62:198–201. Rushton G, Armstrong M, Gittler J, Greene B, Pavlik C, West M, Zimmerman D. Geocoding in Cancer Research. Am J Prev Med 2006;30(2):S16–24. Sadler R, Gilliland J, Arku G. An application of the edge effect in measuring accessibility to multiple food retailer types in Southwestern Ontario, Canada. Int J Health Geogr 2011;10:34. Sarmiento OL, Schmid TL, Parra DC, Diaz-del-Castillo A, Gomez LF, Pratt M, Jacoby E, Pinzon JD, Duperly J. Quality of life, physical activity, and built environment characteristics among Columbian adults. J Phys Act Health 2010;2010 7(S2):S181–95. Schlossberg M, Greene J, Phillips P, Johnson B, Barker B. School trips: effects of urban form and distance on travel mode. Am Plann Assoc: J Am Plann Assoc 2006;72(3):337–46. Schootman M, Sterling D, Struthersa J, Yan Y, Laboubea T, Emo B, et al. Positional accuracy and geographic bias of four methods of geocoding in epidemiologic research. Ann Epidemiol 2007;17(6):464–70. Sharkey J. Measuring potential access to food stores and food-service places in rural areas in the US. Am J Prev Med 2009;36(4):S151–5. Statistics Canada. Census boundary files [Internet]. Ottawa (On); Data Liberation Initiative;c2006. Available from <http://equinox.uwo.ca>. Statistics Canada. Rural and Small Town Canada Analysis Bulletin 2011. Available from <http://www.statcan.gc.ca/pub/21-006-x/21-006x2001003-eng.pdf>. Strickland M, Siffel C, Gardner B, Berzen A, Correa A. Quantifying geocode location error using GIS methods. Environ Health 2007;6:10. Talen E. Neighborhoods as service providers: a methodology for evaluating pedestrian access. Environ Plann B Plann Des 2003;30(2):181–200. Tucker P, Irwin J, Gillliland J, Larsen K, He M, Hess P. Environmental influences on physical activity levels in youth. Health Place 2008;15(1):357–63. Wang M, Kim S, Gonzalez A, MacLeod K, Winkleby M. Socioeconomic and food-related physical characteristics of the neighbourhood environment are associated with body mass index. J Epidemiol Community Health 2007;61:491–8. Ward M, Nuckols J, Giglierano J, Bonner M, Wolter C, Airola M, et al. Positional accuracy of two methods of geocoding. Epidemiology 2005;16(4):542–7. Wolch J, Jerrett M, Reynolds K, McConnell R, Chang R, Dahmann N, Brady K, Gilliland F, Su JG, Berhane K. Childhood obesity and proximity to parks and recreational resources: a longitudinal cohort study. Health Place 2010;17(1):207–14. Zandbergen P, Green J. Error and bias in determining exposure potential of children at school locations using proximity-based GIS techniques. Environ Health Perspect 2007;115(9):1363–70. Zandbergen P. A comparison of address point, parcel and street geocoding techniques. Comput Environ Urban Syst 2008;32(3):214–32.