Housing Deficit - Methodology and Guide
Prepared by
Jordan Fischer and Camilo Pecha
October 2019
Acknowledgment
This report details theoretical methodology and practical guidelines to replicate the estimation
of updated housing deficit using census data and satellite imagery. We want to thank Patricio
Zambrano-Barragán, Edgar Lemus Pablo, and Muchen Zhu for their support.
TABLE OF CONTENTS
I. INTRODUCTION
II. DEFINITIONS
Quantitative housing deficit
Qualitative housing deficit
III. HOW TO MEASURE HOUSING DEFICIT
Estimation process
IV. INDICATORS’ ESTIMATION BASED ON DECISION RULES
V. USING PUBLIC GIS AND SATELLITE INFORMATION TO ESTIMATE HOUSING DEFICIT
VI. Bibliography
VII. Annex
I. INTRODUCTION
Understanding housing deficit is crucial in creating housing policy. Its exact definition
varies somewhat depending on the source1, in part because definitions may need to account for
cultural and climatological factors, but the standard method for estimating housing deficit
relies on census data. Census data is ideal for its thorough data collection, but it is only available
once a decade. Housing and infrastructure projects are bound to take place in between
censuses, and housing deficit is an essential input to inform such projects.
Using open source software and the input of context experts, the georeferenced estimation of
housing deficit can be automated to produce standardized maps of housing deficit within and
across countries, and even, where conditions allow, to now-cast updated estimations using
satellite imagery for contexts where recent census data is not available.
This document presents a methodology for the estimation of qualitative and quantitative
housing deficits at a highly granular geospatial level and now-casting to estimate housing
deficit at the time of the most recently available satellite images. The methodology was tested
on data from three countries: Guyana, Trinidad and Tobago, and Peru with encouraging
results.
The first section lays out data requirements and important definitions; the second explains the
default methodology to be used in cases where a country does not have a nationally prescribed
housing deficit estimation methodology; the third describes in detail how to access and
process satellite imagery using QGIS (the same process can be completed using other
software, but QGIS is open-source, free of charge, and more accessible than most alternatives);
the fourth explains the now-casting element of the exercise, in which a regression is
used to ‘predict’ updated deficit; and the last section covers mapping and comparing
the results of all these calculations.
1 The CEPAL (Comisión Económica para América Latina y el Caribe) explores definitions and variations in the
1996 publication “Déficit habitacional y datos censales sociodemográficos: una metodología”:
https://repositorio.cepal.org/bitstream/handle/11362/9781/S9600043_es.pdf?sequence=1&isAllowed=y
II. DATA REQUIREMENTS, DEFINITIONS, AND DEFICIT MEASUREMENT
This section will describe the tabular and geospatial data required to calculate georeferenced
quantitative and qualitative housing deficit indicators. The conditions necessary to now-cast
updated estimates will be explained in section IV (Using public GIS and satellite information to
estimate housing deficit). Assuming access to the necessary data, this section will also outline
fundamental concepts of housing deficit estimation and provide reference to prevailing
literature on the topic.
Data requirements
The feasibility of this exercise will depend on the availability of georeferenced microdata
from a given country’s most recent Census, the availability of granular administrative division
shapefiles2, the ability to associate these two datasets, and finally, satellite images of the
same resolution, correction, and metrics available for the land mass in question from
approximately the year of the census and the most recent year available. It is of crucial
importance that the analyst be able to associate the administrative divisions used in the
census to the administrative divisions present in the shapefiles. Furthermore, the census data
must include georeferenced deficit indicator variables necessary to construct
methodologically sound deficit indices. While the exact variables may vary from country to country (for
example, inclusion of data on flooring but not roofing, or vice versa), the census should cover
construction materials, access to services, and density of habitation.
The R script “Data_prep.R” includes basic steps for data cleaning and standardizing that
should be carried out on the census microdata. The exact code used to process the data for
Guyana, Trinidad and Tobago, and Peru is contained within the scripts “GUY_data_prep.R”,
“TT_data_prep.R”, and “PER_data_prep.R”. These scripts are meant to serve as guidelines
only, as data from a new area of study will require case-by-case scrutiny to determine exactly
what steps must be taken to clean and standardize it.
The cleaning steps included in “Data_prep.R” address common issues found in census and
census-style data, but steps that suit one dataset will not necessarily suit another. For
example, one tricky but very common problem found in large datasets is null values. In R,
these values show up as “NA” and are excluded from some functions by default. For this
reason, the code uses the function:
table(Census$variable, useNA = "always")
instead of:
table(Census$variable)
2 To download shapefiles for the administrative divisions of most countries, visit http://www.diva-gis.org/gdata where you can
select the country and download various types of GIS data.
Null values represent unanswered questions. Especially in cases where the census includes a
code for the respondent declining to answer, it is worth exploring why no answer was
recorded for the question. This can happen because the question was not asked, in which
case, why was it not asked? Do census-takers consider certain types of households
unqualified to answer this question? Why? By its nature, the data dictionary will not explain
null values, which means it is up to the analyst to delve deeper into
the situation. Where possible, the best course of action is to follow up with the statistical
institute responsible for producing the data in question.
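The difference between the two calls can be seen on a small example. The sketch below uses a made-up vector in place of a real census variable; the name `wall_material` and its values are assumptions for illustration only.

```r
# Toy stand-in for a census variable; the name and values are hypothetical
# and only illustrate how table() treats NA values.
wall_material <- c("Concrete", "Wood", NA, "Concrete", NA, "Makeshift")

# Default behaviour: NA values are silently dropped from the counts.
counts_default <- table(wall_material)

# useNA = "always" adds an explicit <NA> category, even when its count is zero.
counts_with_na <- table(wall_material, useNA = "always")

print(counts_default)
print(counts_with_na)

# Share of records with no answer, a quick way to judge how much follow-up
# with the statistical institute may be warranted.
share_missing <- mean(is.na(wall_material))
print(share_missing)
```

Comparing the totals of the two tables shows how many records would otherwise drop out of view.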
3 Instituto Nacional de Estadística e Informática de Perú
(https://www.inei.gob.pe/media/MenuRecursivo/publicaciones_digitales/Est/Lib1442/cap13.pdf)
4 DANE 2009
5 General Assembly of Ministers and High Authorities of Housing and Urban Development of Latin America and the Caribbean
6 MINURVI cited by UN HABITAT (2015)
7 MINURVI cited by UN HABITAT (2015)
More specifically, ECLAC8 has defined three aspects to take into consideration to measure
qualitative deficit: 1. Materials of walls, roof, and floor, 2. Overcrowding (number of people
per room at home), and 3. Access to utilities (potable water, sewerage, and electricity)9.
Deficit indicators
In the absence of a nationally prescribed housing deficit estimation, a methodology can be
derived using examples from other countries and guidance from the studies of relevant
international organizations. Total housing deficit is constructed based on total dwelling needs
in terms of quality and quantity. To calculate this, it is important to determine the number of
households living in suboptimal conditions in terms of building quality and overcrowding.
Based on questions commonly found in household Census data, a series of indicators were
chosen to best represent housing deficit based on the information available, with certain
categories considered ‘inadequate’, i.e., indicating a household in housing deficit.
The categories considered adequate or inadequate will vary in different climatological and
cultural contexts. For example, indigenous communities may use traditional building materials
for primarily cultural reasons; a colder climate will necessitate more insulating building materials
than a warmer one; and cultural considerations of ‘crowding’ vs. ‘overcrowding’ may change
over decades and across regions. It is therefore imperative that this methodology be reviewed
carefully and modified according to specific contexts and available information before any
replication.
Housing deficit for both Guyana and Trinidad and Tobago was calculated using the following
indicators, which provide valuable insight into households’ shelter needs. A table detailing
variables, categories, and values assigned to create the housing deficit indices can be found in
the Annex.
Wall material quality: The Census questionnaire includes the variable ‘Main wall material’
wherein materials like “Makeshift”, “Galvanized”, “Troolie palm” are considered inadequate.
Roof material quality: The Census variable ‘Main roofing material’ indicates insufficient quality
by materials like “Makeshift”, “Sheet metal”, “Troolie palm”, and others.
Cohabitation: The Census also identifies cases of cohabitation, defined as two or more
households sharing the same dwelling.
Overcrowding: This indicator is determined by the number of habitants per room. Where it is
possible to differentiate between urban and rural households, it may be pertinent to apply
Acute overcrowding: More than 5 people per bedroom is considered acute overcrowding for
all parts of the country.
Availability of utilities: The Census contains information regarding households’ main source of
lighting (where not having access to electricity is considered deficit), water (deficit is
defined by the absence of piped water or by reliance on water from a spring/river/pond), sewerage
(deficit is defined by not having a WC connected to a sewerage system), and garbage disposal (dumping
on land, burning, or dumping/throwing into a river/sea/pond is considered deficit).10
Estimation process.
Housing deficit is treated as a binary variable for this estimation – a household is either in
deficit (indicators aggregate to a number greater than zero) or it is not (indicators aggregate
to zero). Once the deficit situation for each household has been defined, the scores of all
indicators can be aggregated as follows in order to determine the size of quantitative,
qualitative, and total housing deficits.
10 In particular, since there is no way to identify rural/urban areas from census data, it was important to
incorporate potential indication of deficits for water access using an aggregation of an urban indicator (lack of
access to piped water) and a rural indicator (main access to water from spring/river/pond).
c. electricity,
d. garbage disposal.
The estimation of Total Housing Deficit is calculated by summing up the quantitative and
qualitative deficit. The share of population in housing deficit is equal to the number of
households in deficit divided by the total number of households in the country. The indicators
will then be estimated by a relatively granular geographical location (often second
administrative division) and shown in a map.
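The binary treatment and the regional share described above can be sketched in a few lines of R. This is a minimal illustration on made-up data, not the code in the authors' scripts; the data frame, its column names, and the region labels are all assumptions.

```r
# Hypothetical household-level data: one row per household, 0/1 indicator
# columns, and a region identifier. All names and values are illustrative.
households <- data.frame(
  region       = c("A", "A", "A", "B", "B"),
  cohabitation = c(0, 1, 0, 0, 0),
  walls        = c(0, 0, 1, 0, 0),
  lighting     = c(0, 0, 1, 0, 1)
)

indicator_cols <- c("cohabitation", "walls", "lighting")

# A household is in deficit when its indicators aggregate to more than zero.
households$in_deficit <- as.integer(rowSums(households[, indicator_cols]) > 0)

# Share of households in deficit per region: households in deficit divided
# by total households in that region (the mean of a 0/1 variable).
deficit_share <- aggregate(in_deficit ~ region, data = households, FUN = mean)
print(deficit_share)
```

Because the household flag is binary, a household with several inadequate indicators is still counted once in the regional share.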
Figure 4: 2nd level administrative division maps of housing deficit drivers, overall deficit in Peru, 2017
Top left: sewerage access by province; Top right: electricity access by province; Bottom: total housing deficit
Throughout the region, qualitative deficit is widely considered to affect more households than
quantitative deficit11. This was found to be the case in Guyana, Trinidad and Tobago, and Peru
for the most recent census year available (2012, 2011, and 2017, respectively). In Trinidad and
Tobago, overcrowding was also a major driver. This suggests that electric light is tied closely to
overall housing deficit in these three diverse countries, an important assumption that precedes
the nocturnal luminosity regression in section V.
11 For details, see the IDB’s and the World Bank’s “The Quality of Life in Latin American Cities”:
http://documents.worldbank.org/curated/en/325571468045577440/pdf/544310PUB0EPI01OX0349415B01Public10.pdf
III. INDICATORS’ ESTIMATION BASED ON DECISION RULES
To estimate the housing deficit indicators, a set of decision rules is defined. Each rule
identifies whether a household is in deficit through a condition on the variables
described above.12
The rules for cohabitation and acute overcrowding are used to measure the size of quantitative
deficit. In principle, a household will be in quantitative housing deficit if at least one of the
following rules is satisfied:
That is, in the dataset, a household that shares its dwelling with one or more other households
is assigned a 1, and 0 otherwise.
Thus, taking the ratio between the household size and the number of bedrooms, a household
whose ratio exceeds 5 people per bedroom is assigned a 1, and 0 otherwise.
The size of the quantitative housing deficit in region j will be determined by the fraction of
households that comply with any of the mentioned rules relative to the total number of
households in region j:
Where the numerator is defined by the sum of all of households i in region j that are in
cohabitation or in acute overcrowding.13 The denominator is the total number of households
in region j.
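The rule displays referred to in this passage are not reproduced in this copy. A reconstruction from the surrounding text, written to mirror the notation of the qualitative deficit formula, could read as follows. The symbols $C_{ij}$ (cohabitation), $AO_{ij}$ (acute overcrowding), and $QuantHD_j$ are assumed names, not taken from the original.

```latex
\[
C_{ij} =
\begin{cases}
1 & \text{if household } i \text{ shares its dwelling with one or more other households} \\
0 & \text{otherwise}
\end{cases}
\]
\[
AO_{ij} =
\begin{cases}
1 & \text{if } \dfrac{\text{household size}_{ij}}{\text{bedrooms}_{ij}} > 5 \\
0 & \text{otherwise}
\end{cases}
\]
\[
QuantHD_j = \frac{\sum_i \left( C_{ij}=1 \ \text{OR}\ AO_{ij}=1 \right)}{\text{total households}_j}
\]
```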
12 Table A1 in the annex presents all the variables used in the estimations, the corresponding categories, and values assigned
in terms of deficit.
13 In the case a household presents cohabitation AND acute overcrowding, the indicator will take a value of one for
quantitative deficit.
As explained in the previous section, qualitative deficit is measured using the quality of wall and
roof materials, overcrowding, and access to public utilities. The rules to measure the size of qualitative
housing deficit are the following.
In terms of dwellings’ structure, measures of the quality of construction materials will give an
idea of dwellings’ durability. In particular, Guyana’s census incorporates questions regarding
wall and roofing materials; however, no information regarding floors is captured.14 The rules
are defined so that those households living in dwellings with non-durable materials are
considered to be in housing deficit and therefore are valued as a 1 in the data, and 0 otherwise.
With respect to the space available in a dwelling for social and biological activities, a
less stringent measure than acute overcrowding is used, defined by the ratio between the
household size and the number of rooms.15 If the ratio exceeds 3 people per room, the
household is considered to be in a situation of overcrowding and is therefore assigned a 1 in
the data, and 0 otherwise.
With respect to access to basic services, the availability of electricity, piped potable water,
sewerage, and/or garbage collection help to determine if a dwelling has sufficiently habitable
conditions.
\[
Lighting =
\begin{cases}
1 & \text{if no access to electricity} \\
0 & \text{otherwise}
\end{cases}
\]
The most common way to identify the availability of electricity is through information related
to main sources of lighting in the dwelling, collected by the census. A household is considered
in deficit if it lacks access to electricity by private or public means, so it is assigned a 1 for
lighting; those that have access to electricity are assigned a 0.16
14 Internationally speaking, some methodologies incorporate the quality of outer wall materials as part of the indicators or
rules for quantitative deficit calculations. If a weak dwelling structure is considered important for quantifying the
replacement need of units, the quality of wall materials should be considered as part of quantitative deficit. In the case of
Guyana, due to the absence of information on floor materials, the quality of walls and roofs was used as the measure of housing
material quality as recommended by CEDLAC (UN-HABITAT 2015).
15 Note that this rule takes into consideration number of rooms not including bedrooms. In particular, Guyana’s census gathers
information on number of rooms other than bedrooms, and number of bedrooms separately. If it is not the case, the use of
number of rooms for both acute and not acute overcrowding could be acceptable.
16 Currently, Guyana’s census incorporates information related to solar or inverter access to lighting which was considered as
Access to a quality source of water is considered in deficit when the dwelling has no access to piped
water or when its primary water source is identified as a spring, river, or similar, sources that
would not guarantee sufficient water quality for human consumption (DANE 2009).
Access to waste water management is considered in deficit when the type of toilet
facility does not have a proper connection to sewerage (either a sewerage system or a septic tank).
In those cases where toilet facilities are not connected to sewerage, the household is assigned
a 1, and 0 otherwise.
\[
Garbage =
\begin{cases}
1 & \text{if no garbage collection} \\
0 & \text{otherwise}
\end{cases}
\]
Finally, access to proper collection of waste or garbage could be associated with adequate
environmental conditions around the dwelling.17 A deficit in garbage collection could be associated with
a lack of proper housing environment, affecting suitable living conditions. Households
reporting no garbage collection (including burning, burying, dumping in rivers, etc.) are
assigned a 1 in the dataset, and 0 otherwise.
Following these rules, the share of households in qualitative deficit,
relative to the total number of households at the regional level, is defined as:
\[
QualiHD_j = \frac{\sum_i \left( W_{ij}=1 \ \text{OR}\ R_{ij}=1 \ \text{OR}\ O_{ij}=1 \ \text{OR}\ L_{ij}=1 \ \text{OR}\ U_{ij}=1 \right)}{\text{total households}_j}
\]
Where the numerator is the number of households i in region j that
have walls OR roofs made of non-durable materials, OR are living in overcrowding, OR lack
access to electricity, OR lack adequate access to utilities (for the sake of equation space, $U_{ij}=1$
indicates a lack of adequate access to water, OR sanitation, OR garbage collection). A household
assigned a 1 for any of these indicators is considered in qualitative housing deficit. The
denominator is the total number of households in region j.18
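As an illustration, the qualitative rules and the regional share can be written in base R as follows. This is a minimal sketch on made-up data and is not the content of “indicators.R”; the column names, the category labels, and the lists of inadequate materials are assumptions that must be adapted to the actual census questionnaire.

```r
# Hypothetical census extract; column names, category labels, and values are
# illustrative only.
census <- data.frame(
  region    = c("R1", "R1", "R2", "R2"),
  wall      = c("Concrete", "Makeshift", "Wood", "Concrete"),
  roof      = c("Concrete", "Sheet metal", "Concrete", "Concrete"),
  hh_size   = c(4, 7, 3, 8),
  rooms     = c(2, 2, 3, 2),
  lighting  = c("Electricity", "Kerosene", "Electricity", "Electricity"),
  piped     = c(TRUE, FALSE, TRUE, TRUE),   # piped water available
  sewered   = c(TRUE, FALSE, TRUE, TRUE),   # toilet connected to sewerage
  collected = c(TRUE, FALSE, TRUE, TRUE)    # garbage is collected
)

# Assumed lists of inadequate materials.
bad_walls <- c("Makeshift", "Galvanized", "Troolie palm")
bad_roofs <- c("Makeshift", "Sheet metal", "Troolie palm")

W <- as.integer(census$wall %in% bad_walls)         # non-durable walls
R <- as.integer(census$roof %in% bad_roofs)         # non-durable roof
O <- as.integer(census$hh_size / census$rooms > 3)  # overcrowding
L <- as.integer(census$lighting != "Electricity")   # no electricity
U <- as.integer(!census$piped | !census$sewered | !census$collected)

# A household is in qualitative deficit if any single rule is satisfied.
census$quali_deficit <- as.integer(W == 1 | R == 1 | O == 1 | L == 1 | U == 1)

# Share of households in qualitative deficit per region.
quali_hd <- aggregate(quali_deficit ~ region, data = census, FUN = mean)
print(quali_hd)
```

Keeping each rule as its own 0/1 vector makes it straightforward to tabulate individual drivers before they are combined with OR.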
17 This particular information is collected by Guyana’s census as one of the questions on the facilities available for use.
Other potential variables related to conditions outside the dwelling could be distances to work, nearest school, hospital or health
center, and availability of public street lighting.
18 Note that, in this case, neither qualitative nor quantitative deficit is divided by a regional characteristic like urban or rural, coastal
or interior, given that there is no specific variable available in Guyana’s census to discriminate on these criteria. In the case
there is a way to determine regional characteristics like cultural division, or rural or coastal classifications, and the set of rules is
available, what is called here “region j” can be adjusted to the selected identification.
These rules are put into code form in the “indicators.R” script. The same script includes code
that can be used to plot the results directly within R by merging them with the administrative
division shapefiles. If the user prefers the graphics of QGIS, the merged object, a
SpatialPolygonsDataFrame (“GUY_map” for Guyana, “PER_map” for Peru, or “TT_map” for
Trinidad and Tobago), can be saved as a shapefile using the writeOGR function.
IV. USING PUBLIC GIS AND SATELLITE INFORMATION TO ESTIMATE HOUSING DEFICIT
Access to proper public amenities like public lighting, adequate roads, schools, and hospitals
provides additional insight into the housing deficit. Apart from a household’s own access to
electricity, piped water, and adequate sewerage systems, a household’s neighborhood also
comprises an important part of its conditions. Lack of public lighting could indicate risks related
to crime, lack of schools would imply less access to education, and poorly paved roads would
cause an increase in travel times from home to any destination, all of which reduce the quality
of living for residents. These neighborhood indicators can provide a map across which to
project data from contexts where conditions are known, to contexts where conditions are yet
unknown. This section explores using public lighting based on satellite imagery to predict
housing conditions for years when census data is unavailable.
Recently, satellite imagery has been used as a rich source of information to study social issues
related to poverty and inequality. A strong relationship has been found between night lights
and economic activity as measured by GDP (NOAA, 2010) and poverty (Engstrom et al.,
2017); satellite information has been found useful in the detection of slums (Divyani et al.,
2016) and prediction of housing prices (Bency et al., 2017).
To estimate an ‘update’ of the housing deficit in 2019 based on the most recent available
census data, this note will demonstrate how the methodology designed by Henderson et al.
(2012), using night lights to estimate economic growth, can be applied to estimate
qualitative housing deficit in Guyana and Peru. The process incorporates the estimations of
qualitative housing deficit based on the most recently available Census data (from section III)
and satellite data on nighttime lights.
The reasoning behind the use of night light data is that the level of luminosity detected by
sensors could be associated with the level of urban development (Muzzini et al., 2016) in each
region. Qualitative housing deficit, found to be the primary driver of total housing deficit, has
been measured with rules derived from the availability of public services or utilities, variables
that correlate with the presence or absence of public infrastructure like street lighting.
Exploiting this relationship between dwellings’ access to services and luminosity would help to
understand the behavior of qualitative housing deficit in areas or periods of time where and
when no statistical microdata is available. In this study, since the most recent census in Guyana
was implemented in 2012, the idea is to use nightlight data to estimate the size and distribution
of qualitative deficit in 2019 using data from 2012.19
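The idea can be sketched with a simple regression in R. The specification below (regional qualitative deficit share regressed on the log of mean radiance) is an assumption chosen for illustration and is not necessarily the specification used by the authors or by Henderson et al.; the data frame, its column names, and all values are made up.

```r
# Hypothetical regional data: base-year qualitative deficit share from the
# census, and mean nighttime radiance for the base and most recent years.
regions <- data.frame(
  region        = c("R1", "R2", "R3", "R4", "R5"),
  quali_hd_base = c(0.80, 0.65, 0.50, 0.35, 0.20),
  lights_base   = c(0.5, 1.0, 2.0, 4.0, 8.0),
  lights_recent = c(0.7, 1.5, 2.5, 6.0, 9.0)
)

# Fit the base-year relationship between deficit and (log) luminosity.
fit <- lm(quali_hd_base ~ log(lights_base), data = regions)

# Apply the fitted relationship to the most recent luminosity values.
# The predictor column must carry the same name used when fitting.
new_data <- data.frame(lights_base = regions$lights_recent)
regions$quali_hd_nowcast <- predict(fit, newdata = new_data)

# A share must stay within [0, 1]; clamp any out-of-range predictions.
regions$quali_hd_nowcast <- pmin(pmax(regions$quali_hd_nowcast, 0), 1)

print(regions[, c("region", "quali_hd_base", "quali_hd_nowcast")])
```

In this toy data brighter regions have lower deficit, so the fitted slope is negative and regions whose luminosity increased receive a lower now-cast deficit.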
19 The nighttime lights data is not the only data that could be used for such a prediction. Depending on availability, it is possible
to also include data on availability of schools and hospitals (from OpenStreetMap) to determine access to services that
correlate with health and education levels. In this case, reliable OpenStreetMap data was not available from 2012 in Guyana.
Checking if the methodology will be appropriate
While some countries’ housing deficit follows the patterns laid out above, other countries’
deficit may be driven by things like cohabitation of multiple families in a single cramped – but
well-lit – dwelling. Some areas may be too densely populated and well-lit for the model to
differentiate between a dwelling providing electric lighting and the unlit dwelling next to it.
Similarly, more densely populated areas also facilitate things like households illegally tapping
electric grids that are meant to feed only their paying neighbors. This means that the model
may not be appropriate for use in areas with higher population density and urbanization
rates or that its accuracy may suffer under such circumstances (this phenomenon is well
documented in nocturnal luminosity methodologies, including the World Bank’s much-cited
Poverty from Space).
Association Analysis
An association analysis is applied to indicators causing deficit to highlight patterns among
sub-optimal conditions. This rules-based machine learning technique reveals relationships
between variables, i.e., how likely it is that a household exhibiting one characteristic will also
exhibit another. The main metric to look at in an association analysis is called Lift. Lift compares
how often the left-hand side (LHS) and the right-hand side (RHS) occur together with how often
they would be expected to occur together if they were independent, i.e., how much more (or less)
likely the right-hand side is in a case where the left-hand side is already occurring.
- Lift > 1 indicates that LHS and RHS are positively associated (dependent)
- Lift = 1 indicates that LHS and RHS are independent
- Lift < 1 indicates that LHS and RHS are negatively associated (they tend to replace each other)
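For a single pair of indicators, Lift can be computed directly from its standard definition, P(LHS and RHS) / (P(LHS) × P(RHS)). The sketch below does this in base R on made-up 0/1 indicators; the variable names and values are assumptions, and a full association analysis over many item sets would normally rely on a dedicated package.

```r
# Hypothetical 0/1 deficit indicators for ten households.
no_electricity <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 0)
no_piped_water <- c(1, 1, 0, 0, 0, 0, 0, 1, 1, 0)

# Lift of the rule {LHS} -> {RHS}:
# P(LHS and RHS) / (P(LHS) * P(RHS)).
lift <- function(lhs, rhs) {
  support_both <- mean(lhs == 1 & rhs == 1)
  support_both / (mean(lhs == 1) * mean(rhs == 1))
}

lift_value <- lift(no_electricity, no_piped_water)
print(lift_value)
```

In this toy data the two conditions co-occur more often than independence would imply, so the result is greater than 1. Lift is symmetric, so swapping LHS and RHS gives the same value.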
Also, information from satellite imagery that contains data on the reflectance of roads could provide information about road
characteristics like pavement status such that the density of unpaved/paved roads at the region level would provide insight
on the level of public infrastructure and how this correlates with qualitative deficit. Again, in the case of Guyana adequate
information was not identified for this study, but could be incorporated in future iterations, or in other contexts.
This consistent association supports the idea that nocturnal luminosity (which is
overwhelmingly driven by access to electric light) will be a strong indicator of qualitative and
overall housing deficit.
Coefficient of Variation
In order to determine the statistical accuracy of the indicators, a coefficient of variation (CV) is
calculated. The CV will capture the indicators’ dispersion so that the researcher can define how
informative the indicator is in statistical terms. The CV is defined as the standard deviation of
the indicator divided by the indicator’s mean, multiplied by 100. The lower the CV, the more precise
the estimation is. DANE, in Colombia, has defined a scale to determine how informative or
precise a housing indicator is:
The following table presents the value of coefficient of variation for housing deficit drivers in
the three test countries. Noticeably, the CV for access to electric light is higher (less precise) in
denser Trinidad and Tobago than it is for Guyana and Peru, where populations are more spread
out.
Less variation in access to electric light will negatively affect the confidence of the regression
results.
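The CV as defined above can be computed in one line of R. The following is a minimal sketch; the function name `cv` and the input vector (regional shares of households without electric light) are assumptions for illustration.

```r
# Coefficient of variation: standard deviation divided by the mean, times 100.
cv <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100
}

# Hypothetical regional shares of households without access to electric light.
no_electricity_share <- c(0.10, 0.12, 0.08, 0.11, 0.09)

cv_value <- cv(no_electricity_share)
print(cv_value)
```

Note that `sd()` in R uses the sample (n - 1) denominator, and `na.rm = TRUE` drops missing regions rather than returning `NA`.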
This section will explain the procedure to obtain, extract, and process satellite imagery
information to produce the statistical data used in the estimation of housing deficit. At the end
of this section the user should be able to obtain information in numerical format to be used in
statistical packages in combination with other information, like census microdata, to do
estimations.
20 https://en.wikipedia.org/wiki/Visible_Infrared_Imaging_Radiometer_Suite
21 For more information, please visit: https://ngdc.noaa.gov/eog/viirs/download_dnb_composites.html
The tiles selected should correspond to the part of the world where the area of study is found.
Trinidad and Tobago is contained within Tile 1; Guyana is captured in both Tile 1 and Tile 2;
and Peru is contained within Tile 4. From the menu located in the lower-left corner, select the
folder from the base year (year of the last census)22 by clicking on it. Select “Monthly”, and
then the earliest available month from the base year. For the tile(s) of interest, select the files
named “VCMCFG” containing zipped folders with the tiff images, as shown in figure 2. The
same process is followed for the most recent month available.
Each zipped file is greater than 0.5GB; when unzipped, each folder’s contents will occupy over
2GB of disc space, so it is important to have in mind the amount of free hard disc space needed
for processing.
The following sections (‘Opening data in QGIS’, ‘Merging geotiffs’, ‘Clipping the merged
geotiff’, and ‘Converting the raster data to numerical data’), should be carried out first on the
file(s) from the base year (year of the last census), and then on the file(s) from the most recent
year. The following steps will show how to convert raster data into information compatible
with other datasets like census microdata: the result for all of Section IV should be two .csv
files – one for base year luminosity data and one for the most recent year’s luminosity data.
22 The most recent census taken in Trinidad and Tobago was in 2011; for this exercise, 2012 data was used.
Opening data in QGIS
The first step is to open the data in QGIS. Each downloaded folder should contain 2 .tif files;
select the file ending in ‘…avg_rade9h.tif’ for each tile, since this is the file containing the
average radiance values. Open the files in the main window as shown in figure 3 by selecting
“Layer -> add new -> Add raster layer” from the menu, by pressing the corresponding icon on the left side
of the main window, or by dragging the file from the list of files on the left panel of the window into
the primary panel to the right.23 Second, load the shapefile data to verify that the full
area of study is contained within the loaded raster data. To do this, open the shapefile
for level 0 of the administrative division by selecting “Layer -> add new -> Add vector layer” from
the menu, or simply press the corresponding icon.
Figure 4 shows how to open Guyana’s level 0 administrative division shapefile (adm0,
country boundaries).
23 This tutorial is based on the QGIS 2.18.14, the most stable version of the software for Mac-OS users.
Figure 4: Opening vector files (shapefiles) in QGIS
If there are errors, check the box at the bottom of the prompt window. This box will show the
python code that is used to run this function within QGIS. The code used in this case is shown
in the box below:
gdal_merge.py -of GTiff -o /Users/…/Downloads/merged.tif /Users/…/Downloads/SVDNB_npp_20121101-20121130_75N180W_vcmcfg_v10_c201601270845\SVDNB_npp_20121101-20121130_75N060W_vcmcfg_v10_c201601270845.avg_rade9h.tif /Users/…/Downloads/transaccion-7.pdf
To “cut” or select only the information inside the area of study, select “Raster -> Extractions ->
Clipper…” from the menu. In the prompted window, select the merged file in the “Input file
(raster)” field and define an “Output file” that will contain the information for only
Guyana. In the “Clipping mode” section, select “Mask layer” and select the area of study’s full
administrative shapefile as “Mask layer”; for an entire country, this will be the Adm0 shapefile,
or the full territory of the country with no subdivisions. Finally, check “Crop the extent of
the target dataset to the extent of the cutline” and click OK. The process is shown in figure 6.
Figure 6: Extracting nighttime light information from raster file and shape file
If there are errors, check the box at the bottom of the prompt window. This box will show the
python code that is used to run this function within QGIS. The code used in this case is shown
in the box below:
gdalwarp -q -cutline "/Users/…/Data/GIS/Shape/GUY_adm/GUY_adm2.shp" -crop_to_cutline -tr 0.0041666667 0.0041666667 -of GTiff "/Users/…/Test.tif" "/Users/…/Test_GY.tif"
Figure 7: Converting nighttime light information from raster file to shape file 24
24 To obtain the shaded version of the map, double-click the shapefile and the “Properties” window will appear. In the left
panel, select “Style”, and in the first dropdown menu select “Graduated”. In the next dropdown menu, select the variable you
want to show, and in “Color ramp” choose the ramp you prefer. Select the number of classes you want to plot from the
selector at the right-hand side under the “Classes” window. Click on “Classify” under the “Mode” dropdown menu. Finally,
click “OK” at the bottom left corner. You should obtain a graduated version of the map. This process will be explained in the
following section.
Spatial join
The next step is to combine the new nighttime light shapefile with the administrative divisions
to be used in the analysis (those that correspond with the divisions in the census data), often
second-level administrative divisions (Adm2). To do this, first open the shapefile. To combine the
datasets, select “Vector -> Data Management Tools -> Join attributes by location” from the menu25. In
the prompted window, select the administrative division as “Target vector layer”, select
the nighttime light shapefile as “Join vector layer”, check all the “Geographic predicate”
options, choose “Takes summary of intersecting features” under “Attribute summary”, define
the output “Joined layer” and click “Run”, as shown in figure 8.
To check that this step was completed successfully, right click on the joined layer, select “open
attribute table” and verify that new columns have been added as seen in the figure below.
25
For QGIS versions >3.0, follow this tutorial: https://www.qgistutorials.com/en/docs/3/performing_spatial_joins.html
Figure 9: Example of shapefile’s “attribute table”
Tabularization
The final step is to export the information stored into the shapefile to a tabular format. This
can be done using a plugin called “MMQGIS”. To install the MMQGIS plugin:
1. Go to menu “Plugins” > “Manage and Install Plugins…”, and a window will appear
2. Using the search field in the window, type mmqgis and press install as shown in the
following figures:
Figure 10. Installing MMQGIS plugin
Once MMQGIS is installed, it will appear as one of the options on the toolbar. Select “MMQGIS
-> Import/Export -> Attributes Export to CSV file” in the menu. In the prompted window, select
the desired attributes, define the file name and path, and click “OK”. Figure 9 presents the process
and an excerpt of the CSV file opened in Excel, which contains the administrative division ID (in
this case the second-level administrative division of Guyana, NDC) and the summary statistics
of the nighttime light data. This CSV file can now be loaded directly into the “Prediction.R”
script for processing26.
26
The “Prediction.R” script imports these files as "night_base_year.csv" and "night_base_year.csv" in lines 20-
24 of the code.
Figure 10: Example of final shapefile and exported attribute table
As explained above, once all luminosity data processing steps have been completed for the
base year, the same steps should be repeated for the most recent year of available satellite
data, producing one CSV file for each.
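Before running the regression, it can help to check that the two exported CSV files describe the same administrative divisions. The following is a minimal Python sketch of that check and is not taken from the “Prediction.R” script. The column name "ID_2" is an assumption, "light_mean" is assumed to be the exported summary column, and the inline values are made-up stand-ins for the two exported files.

```python
import csv
import io

def read_light_means(handle, id_col="ID_2", value_col="light_mean"):
    """Map each administrative division ID to its mean nighttime light value."""
    return {row[id_col]: float(row[value_col]) for row in csv.DictReader(handle)}

def pair_years(base, target):
    """Keep only divisions present in both years; return {id: (base, target)}."""
    return {key: (base[key], target[key]) for key in sorted(base.keys() & target.keys())}

# Toy stand-ins for the base-year and target-year CSV files (values are made up).
base_csv = io.StringIO("ID_2,light_mean\n1,0.50\n2,1.25\n3,0.10\n")
target_csv = io.StringIO("ID_2,light_mean\n1,0.75\n2,1.50\n4,0.20\n")

base = read_light_means(base_csv)
target = read_light_means(target_csv)
paired = pair_years(base, target)

# Divisions found in only one of the two years are worth inspecting by hand.
unmatched = sorted(base.keys() ^ target.keys())
print(paired)
print("unmatched IDs:", unmatched)
```

With real exports, the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.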
Functional code
Throughout this process, the prompt windows include a box at the bottom showing the Python
code used to run the various functions within QGIS. For the most part, users do not need to
work with this box; however, if there are errors, the code it contains can provide insight
into what went wrong. The code used in this section is shown below:
Merging Geotiffs
gdal_merge.py -of GTiff -o /Users/…/ SVDNB_npp_20121101-
20121130_75N180W_vcmcfg_v10_c201601270845\SVDNB_npp_20121101-
20121130_75N060W_vcmcfg_v10_c201601270845.avg_rade9h.tif /Users/…/ SVDNB_npp_20121101-
20121130_75N180W_vcmcfg_v10_c201601270845\SVDNB_npp_20121101-
20121130_75N180W_vcmcfg_v10_c201601270845.avg_rade9h.tif
Polygonizing
gdal_polygonize.py "/Users/…/Test_GY.tif" -f "ESRI Shapefile" "/Users/…/Test_GY_shape_light.shp"
Test_GY_shape_light Light
V. REGRESSION TO PREDICT HOUSING DEFICIT FROM NOCTURNAL LUMINOSITY
Three sets of data are needed to run this luminosity regression: qualitative deficit estimations
for the base year, luminosity data from the base year, and luminosity data from the target year
(the year of most recently available satellite imagery).
Using the baseline year figures for qualitative deficit estimated in Section III, the regression can
be run in two steps.
First, the level and significance of the relationship between the base year qualitative deficit
figures and the base year nighttime lights data must be estimated. This estimation will validate
(through a test of statistical significance) that the reasoning explained at the beginning of
Section IV is applicable to the area of study. The econometric model used is the following:
\[ QualD_{b,i} = \beta_1 + \beta_2 L_{b,i} + \varepsilon \qquad (1) \]
Where $QualD_{b,i}$ is the variable that contains the qualitative deficit for the base year $b$ in
region $i$. $\beta_1$ is the intercept and the parameter of interest is $\beta_2$, which accompanies
$L_{b,i}$, the mean value of observed nighttime light for the base year $b$ in region $i$. Finally,
$\varepsilon$ is an error term. The estimation process of this model produces the estimated values for
the parameters and the error term, namely $\hat{\beta}_{1,b}$, $\hat{\beta}_{2,b}$, and
$\hat{\varepsilon}_b$, respectively. In particular, and consistent with the assumption, it is expected
that $\hat{\beta}_{2,b}$ will be negative and statistically significant, since an
increase in luminosity should be correlated with a decrease in qualitative housing deficit.
Second, the levels of qualitative deficit are predicted for the target year. At this point, the only
way to obtain figures for qualitative deficit in the target year is to predict those values
using the nighttime light information in the target year with the parameters obtained in equation
1. The process is summarized by the following equation:
\[ PQD_{t,i} = \hat{\beta}_{1,b} + \hat{\beta}_{2,b} L_{t,i} + \hat{\varepsilon}_b \qquad (2) \]
Where $PQD_{t,i}$ is the predicted qualitative deficit value in the target year $t$ for region $i$,
obtained using the estimated parameters $\hat{\beta}_{1,b}$, $\hat{\beta}_{2,b}$, and
$\hat{\varepsilon}_b$ in addition to the mean value of nighttime light in region $i$ for target year
$t$, $L_{t,i}$. The result of the exercise is summarized in the following
tables.
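The two steps described above can be illustrated with a short Python sketch. It is not the “Prediction.R” script: it fits the base-year line by ordinary least squares, reports the t-statistic of the slope as a rough check of significance, and then evaluates the fitted line at target-year luminosity. The function names and the numbers are made up for illustration. How the residual term of equation 2 enters is an assumption here: the sketch optionally adds each region's base-year residual and otherwise leaves it out.

```python
from math import sqrt
from statistics import mean

def fit_line(light, deficit):
    """OLS fit of deficit = b1 + b2 * light.

    Returns (b1, b2, t-statistic of b2, residuals)."""
    n = len(light)
    x_bar, y_bar = mean(light), mean(deficit)
    sxx = sum((x - x_bar) ** 2 for x in light)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(light, deficit))
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    residuals = [y - (b1 + b2 * x) for x, y in zip(light, deficit)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    t_b2 = b2 / sqrt(sigma2 / sxx)                     # slope / its standard error
    return b1, b2, t_b2, residuals

def predict(b1, b2, light_target, residuals=None):
    """Evaluate the fitted line at target-year light values.

    If residuals are given (one per region), they are added region by region."""
    resid = residuals if residuals is not None else [0.0] * len(light_target)
    return [b1 + b2 * x + e for x, e in zip(light_target, resid)]

# Made-up base-year data: mean light and qualitative deficit share per region.
light_base = [0.0, 1.0, 2.0, 3.0]
deficit_base = [0.9, 0.7, 0.6, 0.2]

b1, b2, t_b2, residuals = fit_line(light_base, deficit_base)
print("intercept:", b1, "slope:", b2, "t-statistic of slope:", t_b2)

# Made-up target-year light values for two regions.
print(predict(b1, b2, [1.0, 4.0]))
```

A negative slope with a large absolute t-statistic corresponds to the expected result described above. Nothing in this sketch constrains predictions to a valid range, so predicted values should be checked before use.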
Tables 4 and 5 present the results of the estimation of equation 1 for Peru. In both regressions,
the parameter of interest is “light_mean” as a predictor of deficit. As expected, the parameter is
negative and statistically significant, which supports the logic explained in Section IV: more
nighttime light indicates lower levels of qualitative and total housing deficit.
The results of equations 1 and 2 can be found in Annexes 2 and 3, respectively. However, it is
important to note that the results of equation 1 for Trinidad and Tobago were only significant
with 75% confidence for qualitative deficit and 90% confidence for total deficit. This outcome
was anticipated in Section IV, where the CV for access to electric light indicated low
precision for the island nation. In effect, the country is too densely populated and too well lit
for this methodology to be applied with good precision. This can be seen in a side-by-side
comparison of the night lights imagery for 2019 in each country.
In all three countries, both qualitative and total housing deficits were estimated, because
qualitative deficit was the overwhelming driver of overall deficit.
VI. RESULTS AND NEXT STEPS
The results of the predictions can be mapped using the tmap-based code in the
“Predictions.R” script, by merging them with the administrative division shapefiles. If the user
prefers the graphics of QGIS, the merged file, now called a SpatialPolygonsDataFrame
(“GUY_map” for Guyana, “PER_map” for Peru, or “TT_map” for Trinidad and Tobago), can be
saved as a shapefile using the writeOGR function.
Table 5 shows the 2012 total housing deficit estimates obtained from Guyana’s 2012 census
data for the capital area, compared to the 2019 ‘now-cast’ total housing deficit obtained in
Section V for the same area.
Metrics on qualitative housing deficit by administrative division are shown in the table below
for all three studied countries:
This methodology should be applied to as many countries as allowed by available data, and,
ideally, tested on study areas for which the target year also has census data available for use
in validation27.
27
This was not possible with the countries studied because the source of satellite data, NOAA, changed the
methodology it used to obtain and process satellite imagery in 2012; data from prior years are therefore not
comparable, and this methodology cannot be used with them. Using this satellite imagery, both the base and test years must be
approximately 2012 or later.
VII. Bibliography
A. J. Bency, S. Rallapalli, R. K. Ganti, M. Srivatsa and B. S. Manjunath, "Beyond Spatial Auto-
Regressive Models: Predicting Housing Prices with Satellite Imagery," 2017 IEEE Winter
Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, 2017, pp. 320-329.
https://vision.ece.ucsb.edu/sites/default/files/publications/bency_wacv_17.pdf
Divyani Kohli, Richard Sliuzas & Alfred Stein (2016) Urban slum detection using texture and
spatial metrics derived from satellite imagery, Journal of Spatial Science, 61:2, 405-426, DOI:
10.1080/14498596.2016.1138247
Engstrom, Ryan; Hersh, Jonathan Samuel; Newhouse, David Locke. 2017. Poverty from space:
using high-resolution satellite imagery for estimating economic well-being (English). Policy
Research working paper; no. WPS 8284; Paper is funded by the Strategic Research Program
(SRP). Washington, D.C. : World Bank Group.
Henderson, J. Vernon, Adam Storeygard, and David N. Weil. 2012. "Measuring Economic
Growth from Outer Space." American Economic Review, 102 (2): 994-1028.
Muzzini, Elisa; Eraso Puig, Beatriz; Anapolsky, Sebastian; Lonnberg, Tara; Mora, Viviana. 2016.
Leveraging the Potential of Argentine Cities : A Framework for Policy Action. Directions in
Development--Countries and Regions;. Washington, DC: World Bank. © World Bank.
NOAA, Ghosh, T., Powell, R., Elvidge, C. D., Baugh, K. E., Sutton, P. C., & Anderson, S. (2010)
Roof material | H1.3 What is the main material used for roofing? | 1 Sheet metal (zinc, aluminium, galvanize) | 0
 | | 2 Shingle (asphalt) | 0
 | | 3 Shingle (wood) | 0
 | | 4 Shingle (other) | 0
 | | 5 Tile | 0
 | | 6 Concrete | 0
 | | 7 Thatched/Troolie Palm | 1
 | | 8 Makeshift | 1
 | | 9 Other (specify) | 1
Acute overcrowding | H4.8 How many bedrooms are there in this dwelling unit? | Less than 5 people per room | 0
 | | 5 or more people per room | 1
Access to services (piped water) | H4.3 What is the main source of water supply for this household? | 1 Private, piped into dwelling | 0
 | | 2 Private catchments/rain water | 0
 | | 3 Private, piped into yard/plot | 0
 | | 4 Public, piped into dwelling | 0
 | | 5 Public, piped into yard/plot | 0
 | | 6 Public standpipe or hand pump | 0
 | | 7 Public well | 0
 | | 8 Spring/river/pond | 1
 | | 9 Truck borne | 0
 | | 10 Dug well/borehole | 0
 | | 11 Other (specify) | 1
Sewerage | H4.5 What type of toilet facility does this household have? | 1 W.C. (Flush toilet) linked to sewer | 0
 | | 2 W.C. (Flush toilet) linked to septic tank/soak-away | 0
 | | 3 Ventilated Pit Latrine (VIP) | 1
 | | 4 Trad. Pit Latrine with slab | 1
 | | 5 Trad. Pit Latrine w/out slab | 1
 | | 6 None | 1
 | | 7 Other (specify) | 1
28
Predictions for Trinidad and Tobago are only significant with 75% confidence for qualitative deficit and 90%
confidence for total deficit.