Fang, 2008
Received 29 May 2007; received in revised form 24 December 2007; accepted 11 January 2008
Abstract
This paper develops a new method to solve multivariate discrete–continuous problems and applies the model to mea-
sure the influence of residential density on households’ vehicle fuel efficiency and usage choices. Traditional discrete–con-
tinuous modelling of vehicle holding choice and vehicle usage becomes unwieldy with large numbers of vehicles and vehicle
categories. I propose a more flexible method of modelling vehicle holdings in terms of number of vehicles in each category,
using a Bayesian multivariate ordinal response system. I also combine the multivariate ordered equations with Tobit equa-
tions to jointly estimate vehicle type/usage demand in a reduced form, offering a simpler alternative to the traditional dis-
crete/continuous analysis. Using the 2001 National Household Travel Survey data, I find that increasing residential density
reduces households’ truck holdings and utilization in a statistically significant but economically insignificant way. The
results are broadly consistent with those from a model derived from random utility maximization. The method developed
above can be applied to other discrete–continuous problems.
© 2008 Elsevier Ltd. All rights reserved.
Keywords: Multivariate ordered probit; Multivariate Tobit; Discrete/continuous; Residential density; Vehicle choice; Fuel economy
1. Introduction
Policies aimed at reducing gasoline consumption of automobiles can target either vehicle usage, measured
by total miles driven, or vehicle fuel efficiency. Empirical studies have found that elements of urban spatial
structure, particularly higher residential density, are associated with lower private vehicle utilization (Cervero
and Kockelman, 1997; Dunphy and Fisher, 1996; Golob and Brownstone, 2005, etc.). However, whether or
not density induces a compositional shift of households’ automobile holdings towards more fuel efficient vehi-
cles has not been widely studied. This paper develops a Bayesian Multivariate Ordered Probit & Tobit
(BMOPT) model to estimate a joint system of vehicle fuel efficiency choice and vehicle utilization in response
to varying residential density.
0191-2615/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.trb.2008.01.004
H.A. Fang / Transportation Research Part B 42 (2008) 736–758 737
The motivation for a possible causal relationship between density and vehicle type choice comes from the
observation that high density areas tend to have smaller parking spaces, narrower streets and more severe traf-
fic. These conditions all work in favor of choosing smaller, easier-to-maneuver and more fuel efficient vehicles.
Since a driver’s cost of manoeuvering in the streets and searching for a parking space in dense areas is increas-
ing in vehicle size, an incentive to choose smaller vehicles exists. Therefore, we would expect urban sprawl (see
Brueckner (2001) for a comprehensive discussion), which leads to low densities, to deter consumers from
choosing fuel efficient vehicles. Indeed, results from Golob and Brownstone (2005) reveal that annual fuel con-
sumption per vehicle declines a bit more sharply with increasing housing density than does annual miles dri-
ven, generating a positive relationship between fuel efficiency and density. The relationship is further explored
in this paper.
The BMOPT model combines a multivariate ordered probit model describing vehicle type choice with a
multivariate Tobit model describing vehicle usage, both at a disaggregate level. All the equations are linked
by an unrestricted covariance matrix. Several features of the framework merit discussion.
Vehicle choice decisions are modelled as ordered for each vehicle type that I specify (for example, the
ordered choice for trucks would be zero trucks, one truck, or two or more trucks). The traditional method
of studying vehicle holdings decision utilizes a nested logit framework with the following decision tree: total
number of vehicles a household owns in the upper level, and possible combinations of vehicle types, given the
total number, in the lower level (Train, 1986; Berkovec and Rust, 1985; Mannering and Winston, 1985; Gold-
berg, 1998; Feng et al., 2005; West, 2004). When more vehicles are owned by the households, or vehicles are
classified into finer categories, the possible vehicle combinations proliferate. For example, if vehicles are clas-
sified into five categories there will be 15 vehicle combinations in a two-vehicle household, 35 combinations in
a three-vehicle household, 70 combinations in a four-vehicle household and so on. This proliferation in vehicle
compositions as vehicle number increases makes nested logit models hard to implement for households with
more than two vehicles. As a result, most of these studies only concentrate on one-vehicle and two-vehicle
households. This restriction results in a loss of useful data.
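The counts quoted above follow from the formula for combinations with repetition: choosing n vehicles from k categories, without regard to order, gives C(n + k - 1, n) possible holdings. The short Python sketch below reproduces the figures for five categories; the function name `holding_combinations` is introduced here purely for illustration.

```python
from math import comb

def holding_combinations(n_vehicles: int, n_categories: int) -> int:
    """Number of unordered vehicle holdings, i.e. multisets of size
    n_vehicles drawn from n_categories vehicle categories."""
    return comb(n_vehicles + n_categories - 1, n_vehicles)

# Five vehicle categories, households with one to four vehicles.
counts = {n: holding_combinations(n, 5) for n in (1, 2, 3, 4)}
print(counts)  # {1: 5, 2: 15, 3: 35, 4: 70}
```

The count grows polynomially in the number of vehicles and quickly becomes impractical as categories are added, which is the proliferation the nested logit approach has to enumerate.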
Since the composition of vehicle holdings is fully represented by the number of vehicles of each type, I propose to use
the ‘number of each type of vehicle’, instead of the ‘total number’ of all vehicles, to model households’ vehicle holdings
decisions. Because the number of each vehicle type is ordered and the choices of each vehicle type within
one household are interrelated, I utilize a multivariate ordered probit model with an unrestricted covariance
matrix. This structure frees vehicle type choice models from restrictions on the number of vehicles held by
a household. In the 2001 National Household Travel Survey (NHTS) California data set, one-vehicle and two-vehicle
households comprise 67.5% of the total households surveyed; households with no vehicles comprise
5%. Adding the remaining 27.5% of households, which own 48% of the total vehicles in the sample,
makes the estimation results reflect the whole vehicle stock and allows policy analysis for the
entire population.
I adopt a reduced-form approach for joint discrete–continuous estimation. Two types of discrete–continu-
ous models, deriving from random utility maximization, are currently implemented in the literature. The first
one follows the methodology developed in Dubin and McFadden (1984) and Hanemann (1984), where a con-
ditional indirect utility function, giving the maximum utility achievable, provides the basis for deriving the
continuous demand and the discrete choice. Because the indirect utility function derived from Roy’s identity
is often in a non-linear form, the procedure involves an approximation in estimation: researchers either make
additional assumptions to produce a linear equation (Dubin and McFadden, 1984) or use a linear approxima-
tion (Train, 1986; Goldberg, 1998; West, 2004).
The second type of utility-based econometric model, a multiple discrete–continuous extreme value
(MDCEV) model, is proposed by Bhat (2005) and extended in Bhat (2008). In the model’s application to vehi-
cle choice and usage (Bhat and Sen, 2006a,b), ‘miles driven’ by each type of vehicle is a choice variable in the
upper level and make/model choices are captured by a multinomial logit component in the lower level. By
embedding vehicle usage into type choice, modelling discrete–continuous choices becomes much simpler.
The model offers an elegant and practical method for handling multiple choices of a large number of discrete
consumption alternatives. Two drawbacks of the model are as follows. First, subject to the utility function
specification, total vehicle utilizations in terms of miles travelled (including non-motorized modes of transpor-
tation) are fixed for each household. Considering that the walking/biking miles are negligible compared to the
vehicle miles in California, fixing the total travel miles is de facto fixing the total vehicle utilization. While the
model might capture substitution effects between different types of vehicle utilization, this restriction rules out
the potential vehicle utilization reduction which we would expect to occur, or at least test, in response to par-
ticular policies. Second, vehicles must be finely classified such that households own no more than one vehicle
of each type. Multiple choices of one particular vehicle type, say two subcompacts, are not allowed. Such fine
classification requires additional data to describe vehicle characteristics and is often not necessary for some
research purposes.
The model I propose plays a complementary role to the models derived from utility maximization mentioned
above. A reduced-form analysis escapes the complexity of behavioral models and their functional and
parametric assumptions, and instead adopts another set of functional form assumptions for the estimation equations.
Though less attached to economic theory, it has advantages in terms of estimation and data fitting. A similar
reduced-form exercise is carried out in Srinivasan and Bhat (2006), analyzing daily time-investment activity
decisions of couples. This paper differs from the aforementioned study in three main aspects, in addition to
the different application. First, correlations across the continuous equations are accommodated in this paper.
This allows me to model automobile usages that are interrelated within a household, and gain efficiency in the
model estimation. Second, the discrete choices in this paper are ordered, instead of binary, adding an addi-
tional layer of complexity to the computation. Third, a Bayesian method is used for the model estimation.
The method is free from direct evaluation of multiple integrals and the use of asymptotic approximations.
It produces exact finite sample inferences.
The BMOPT model is estimated using data augmentation and Bayesian Markov Chain Monte Carlo meth-
ods. To compare the results from the proposed reduced-form model to those from a utility-based model,
Bhat’s MDCEV model is also applied using maximum likelihood estimation. The results from the two models
are broadly consistent in terms of truck usage. Higher density is found to discourage additional truck choice
and also to discourage truck usage.
Since the research goal here is to find out households’ preference for vehicle types in terms of fuel efficiency,
I classify vehicles into two types – cars and trucks, in the BMOPT model. Car is defined as automobile, car, or
station wagon; truck refers to van, sports utility vehicle, or pickup truck. Due to Corporate Average Fuel
Economy (CAFE) standards enacted in 1975, passenger cars and light trucks are regulated under two different
fuel economy requirements. For example, model year 1996–2004 cars must meet the 27.5 miles/gallon CAFE
standard and light trucks must meet the 20.7 miles/gallon standard. Under the standards, the actual average
fuel economy for model year 2000 passenger cars is 28.5 mpg and that for light trucks is 21.3 mpg. These large
differences in fuel economy and weight between passenger cars and trucks lead me to believe that by categorizing
vehicles into cars and trucks, I can capture the fuel efficiency choice a household actually makes. To demonstrate
that the model can handle more classifications, and to check whether the choice pattern with respect
to residential density persists under a finer classification of vehicles, a subclassification of vehicles into small-size cars,
large-size cars, small-size trucks and large-size trucks is also implemented in Section 2.5.
The MDCEV model requires the vehicles to be finely classified, but the estimation results can be grouped
into the categories of cars and trucks to be comparable to the results from the BMOPT model.
The data analysis is complicated in two aspects. First, vehicle choices within one household are interdepen-
dent. Instead of estimating each choice equation independently and using the univariate ordered probit model,
a bivariate ordered probit model is adopted with an unrestricted correlation. In addition, vehicle usages are
interdependent among themselves and with vehicle choices. I therefore add equations for annual miles driven by cars and trucks
to the two ordered probit equations. Second, the observations on average annual miles driven by cars
and trucks are censored. About 56.7% of the households in the sample do not hold trucks, and 10.3% do not
hold cars; for these households we observe zero miles driven by trucks and cars, respectively. In a censored regression
model where a large proportion of dependent variables are zero, the OLS estimates fail to account for the
qualitative difference between zero observations and continuous observations, and are biased (Greene,
2000). Tobit models (Tobin, 1958; Amemiya, 1984) have been widely used to model censored or truncated
data, and will be adopted in the analysis here.
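The bias of OLS under censoring can be illustrated with a small simulation. The sketch below is not based on the NHTS data: the slope of 2.0, the sample size and the uniform regressor are hypothetical choices made only to show that regressing a zero-censored outcome on a covariate attenuates the estimated slope relative to the latent relationship.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
true_slope = 2.0  # hypothetical value, not taken from the paper
x = [rng.uniform(-2.0, 2.0) for _ in range(5000)]
latent = [true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]  # uncensored y*
observed = [max(v, 0.0) for v in latent]                       # y = max(y*, 0)

slope_latent = ols_slope(x, latent)      # close to the true slope
slope_censored = ols_slope(x, observed)  # attenuated toward zero
print(round(slope_latent, 2), round(slope_censored, 2))
```

Roughly half of the simulated outcomes are censored at zero, a share of the same order as the proportion of households without trucks, and the slope estimated from the censored outcome falls well below the latent one.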
Let two latent continuous variables $y^*_1$ and $y^*_2$ represent the preference levels for holding cars and trucks, and let
latent variables $y^*_3$ and $y^*_4$ represent uncensored average annual miles driven by cars and trucks. Indexing
households by $i$, $i = 1, \ldots, N$, the system for discrete–continuous choice of the vehicles can be written as:
\begin{align}
y^*_{1i} &= w_i' \beta_{11} + \ln(d_i)\, \beta_{12} + \varepsilon_{1i} \tag{1} \\
y^*_{2i} &= w_i' \beta_{21} + \ln(d_i)\, \beta_{22} + \varepsilon_{2i} \tag{2} \\
y^*_{3i} &= w_i' \beta_{31} + \ln(d_i)\, \beta_{32} + \varepsilon_{3i} \tag{3} \\
y^*_{4i} &= w_i' \beta_{41} + \ln(d_i)\, \beta_{42} + \varepsilon_{4i} \tag{4}
\end{align}
where $w_i$ is a vector of characteristics for household $i$; $d_i$ is an indicator of residential density. The number of
cars, $y_{1i}$, and trucks, $y_{2i}$, held by household $i$ are determined by the value of the corresponding latent utility $y^*_{1i}$
and $y^*_{2i}$; specifically, $y_j = 0$ if $y^*_j \le \alpha_1$, $y_j = 1$ if $\alpha_1 < y^*_j \le \alpha_2$, and $y_j = 2$ or more if $y^*_j > \alpha_2$, for $j = 1, 2$. Average
annual miles driven by cars $y_3$ is observed only when a household holds at least one car; that is,
\begin{align}
y_3 &= y^*_3, \quad \text{if } y_1 = 1 \text{ or } 2 \tag{5} \\
y_3 &= 0, \quad \text{if } y_1 = 0 \tag{6}
\end{align}
It is worth noting that in this paper the lowest and highest cut points of the ordered probit equations, $\alpha_1$ and
$\alpha_2$, are set to $\alpha_1 = \Phi^{-1}(1/3)$ and $\alpha_2 = -\Phi^{-1}(1/3)$ (where $\Phi$ stands for the standard normal cumulative distribution function),
while the variances of the ordered equations are no longer restricted to 1. This differs from the common practice
of estimating ordered probit models with the variance constrained to be 1 and with only one of the cut
points constrained (usually at zero). As Nandram and Chen (1996) have proved using a re-parameterization,
constraining the lowest and highest thresholds is equivalent to constraining one cut point and the variance
for identification purposes in estimating an ordered probit model. The advantage of the former is that the
covariance matrix can be totally unrestricted in a multivariate case, making sampling much easier. Such an
approach has also been adopted in Webber and Forster (2006). Since there are only two cut points in my case,
the only parameters to be estimated are the coefficients and the covariance matrix. I have experimented with fixing
the two cut points at other values. Since the covariance matrix changes accordingly, the inference
remains the same after standardization. The whole system can then be written in a SUR (seemingly unrelated
regression) form:
\[ y^*_i = x_i \beta + \varepsilon_i \tag{9} \]
The error structure is multivariate normal with zero means and an unrestricted covariance matrix:
\[ \varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \Sigma) \tag{10} \]
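The mapping from a latent preference level to an observed holding category, with both cut points fixed, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the upper cut point is taken to be the mirror image of the lower one, and the names `holding_category` and `category_probabilities` are hypothetical helpers, not part of the paper.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

# Fixed cut points; ALPHA_2 is assumed to be the mirror image of ALPHA_1.
ALPHA_1 = STD_NORMAL.inv_cdf(1 / 3)
ALPHA_2 = -ALPHA_1

def holding_category(latent: float) -> int:
    """Map a latent preference level y* to 0, 1, or 2 (meaning two or more)."""
    if latent <= ALPHA_1:
        return 0
    if latent <= ALPHA_2:
        return 1
    return 2

def category_probabilities(index: float, sigma: float) -> tuple:
    """P(y = 0), P(y = 1), P(y >= 2) when y* ~ N(index, sigma^2)."""
    dist = NormalDist(mu=index, sigma=sigma)
    p0 = dist.cdf(ALPHA_1)
    p1 = dist.cdf(ALPHA_2) - p0
    return p0, p1, 1.0 - p0 - p1

print(round(ALPHA_1, 4), round(ALPHA_2, 4))
print(category_probabilities(0.0, 1.0))
```

With a zero index and unit variance these cut points make the three categories equally likely. Because the cut points are fixed, the scale of each ordered equation is carried by `sigma`, which is why the variances can be left unrestricted.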
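The likelihood function is given as follows:
\begin{align*}
L(\beta, \Sigma; y_1, y_2, y_3, y_4) \propto {}& \prod_{i \ni y_{1i}=0,\, y_{2i}=0} f\left(y^*_{1i} < \alpha_1,\; y^*_{2i} < \alpha_1 \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=0,\, y_{2i}=1} f\left(y^*_{1i} < \alpha_1,\; \alpha_1 < y^*_{2i} < \alpha_2,\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=0,\, y_{2i}=2} f\left(y^*_{1i} < \alpha_1,\; y^*_{2i} > \alpha_2,\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=1,\, y_{2i}=0} f\left(\alpha_1 < y^*_{1i} < \alpha_2,\; y^*_{2i} < \alpha_1,\; y^*_{3i} = y_{3i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=1,\, y_{2i}=1} f\left(\alpha_1 < y^*_{1i} < \alpha_2,\; \alpha_1 < y^*_{2i} < \alpha_2,\; y^*_{3i} = y_{3i},\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=1,\, y_{2i}=2} f\left(\alpha_1 < y^*_{1i} < \alpha_2,\; y^*_{2i} > \alpha_2,\; y^*_{3i} = y_{3i},\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=2,\, y_{2i}=0} f\left(y^*_{1i} > \alpha_2,\; y^*_{2i} < \alpha_1,\; y^*_{3i} = y_{3i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=2,\, y_{2i}=1} f\left(y^*_{1i} > \alpha_2,\; \alpha_1 < y^*_{2i} < \alpha_2,\; y^*_{3i} = y_{3i},\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right) \\
\times {}& \prod_{i \ni y_{1i}=2,\, y_{2i}=2} f\left(y^*_{1i} > \alpha_2,\; y^*_{2i} > \alpha_2,\; y^*_{3i} = y_{3i},\; y^*_{4i} = y_{4i} \mid \beta, \Sigma\right)
\end{align*}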
Due to the discrete nature of the system, the likelihood function involves integrals of multivariate normal
densities, which can be approximated by the GHK algorithm (see the discussion of the algorithm in Train (2003)).
The simulated likelihood function can then be maximized to obtain parameter estimates. This simulated maximum
likelihood approach, however, incurs high computational cost and ignores the simulation error. The
computational cost of directly evaluating the multiple integrals can be avoided in a Bayesian approach with data
augmentation (Albert and Chib, 1993; Chib and Greenberg, 1998). By "augmentation", we mean that the unobservable
latent dependent variables $y^*_j$ are added as additional parameters, which in turn aids estimation.
As Rossi and Allenby (2003) point out, "To a Bayesian, all unobservable quantities can be considered the
object of inference regardless of whether they are called parameters or latent variables". A similar data augmentation
scheme is used in Cowles et al. (1996) for Tobit modelling of longitudinal ordinal clinical trial compliance
data with missingness. An additional benefit of using a Bayesian method is that, unlike the classical
method, it does not require a large sample to ensure the adequacy of asymptotic approximations. Because
Bayesian methods use the rules of probability and adhere to the likelihood principle, the estimators have
good finite sample properties.
I use the Gibbs sampler (Gelfand and Smith, 1990; Geman and Geman, 1984) to simulate draws
from the conditional distributions for the unknown parameters and latent variables $y^*_j$. These draws from the
Markov chain can later be used to run policy simulations, enabling finite sample inferences. Each iteration of
the Gibbs sampler cycles through $\beta$, $\Sigma$ and $y^*_i$. At iteration $t$, each vector of parameters is sampled from the
conditional distribution given all the other parameters:
\begin{align}
&\text{draw } \beta \mid \Sigma, y^*_i \text{ from } p\left(\beta \mid \Sigma^{(t-1)}, y_i^{*(t-1)}\right) \tag{11} \\
&\text{draw } \Sigma \mid \beta, y^*_i \text{ from } p\left(\Sigma \mid \beta^{(t)}, y_i^{*(t-1)}\right) \tag{12} \\
&\text{draw } y^*_i \mid \beta, \Sigma, y_i \text{ from } p\left(y^*_i \mid \beta^{(t)}, \Sigma^{(t)}, y_i\right) \tag{13}
\end{align}
It can be shown that the sequence of iterations $\left(\beta^{(t)}, \Sigma^{(t)}, y_i^{*(t)}\right)$ converges in distribution to the joint posterior distribution of
$(\beta, \Sigma, y^*_i)$.
Assume a normal prior for $\beta$, $\beta \sim N(\beta_0, V_0)$, and an inverse-Wishart prior for $\Sigma$, $\Sigma \sim IW(\nu, Q)$. To make the priors relatively
noninformative, I set the variance of the normal prior to be large and the prior degrees of freedom of the
inverse-Wishart to be small. Specifically, I set $\beta_0$ to be a vector of zeros, $V_0$ to be a diagonal matrix with 100
on the diagonal, $\nu$ to be 10, and $Q$ to be an identity matrix. I check the effect of the prior by further increasing the prior
variance of $\beta$, which makes the prior even less informative. Since the results obtained from these noninformative
priors are virtually the same as those from the relatively noninformative prior mentioned above, I conclude that the information
in the data is predominant. Figs. 1 and 2 show the prior and posterior densities of the two density coefficients ($\beta_{density}$)
in the truck choice and utilization equations. Note that the prior distribution is so flat that almost all the information
in the posterior comes from the data.
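The logic of Gibbs sampling with data augmentation can be seen in a deliberately stripped-down setting. The sketch below is not the BMOPT sampler: it assumes a single censored outcome with known unit variance, $y^* \sim N(\mu, 1)$ and $y = \max(y^*, 0)$, and a $N(0, 100)$ prior on $\mu$ that mirrors the diffuse prior variance described above. The function names, the true value of 1.0 and the chain length are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def draw_below_zero(mean: float, rng: random.Random) -> float:
    """Draw from N(mean, 1) truncated to (-inf, 0) by inverting the CDF."""
    p_zero = STD_NORMAL.cdf(-mean)            # probability of a draw below zero
    u = max(rng.random() * p_zero, 1e-12)     # uniform on (0, p_zero)
    return mean + STD_NORMAL.inv_cdf(u)

def gibbs_censored_mean(y, n_iter=2000, burn_in=500, prior_var=100.0, seed=1):
    """Posterior draws of mu when y* ~ N(mu, 1) and y = max(y*, 0)."""
    rng = random.Random(seed)
    latent = list(y)                          # start latent values at the data
    mu, draws = 0.0, []
    post_var = 1.0 / (len(y) + 1.0 / prior_var)   # N(0, prior_var) prior on mu
    for it in range(n_iter):
        # Step 1: augment, i.e. redraw latent values for censored observations.
        for i, yi in enumerate(y):
            if yi == 0.0:
                latent[i] = draw_below_zero(mu, rng)
        # Step 2: draw mu from its normal conditional posterior.
        mu = rng.gauss(post_var * sum(latent), post_var ** 0.5)
        if it >= burn_in:
            draws.append(mu)
    return draws

data_rng = random.Random(0)
true_mu = 1.0                                 # hypothetical value
y_obs = [max(data_rng.gauss(true_mu, 1.0), 0.0) for _ in range(500)]
posterior_mean = fmean(gibbs_censored_mean(y_obs))
print(round(posterior_mean, 2))
```

Each iteration alternates between filling in the unobserved latent values and updating the parameter given the completed data, so no integral over the censored region is ever evaluated directly. The full model adds the regression coefficients, the covariance matrix and the ordered equations to the same cycle.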
Fig. 1. Prior and posterior density of $\beta_{density}$ in the truck choice equation. (Axes: $\beta_{density}$ in the truck choice equation against density; curves: posterior and prior.)
Fig. 2. Prior and posterior density of $\beta_{density}$ in the truck mileage equation. The miles driven by trucks are in units of 1000 miles. (Axes: $\beta_{density}$ in the truck mileage equation against density; curves: posterior and prior.)
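\begin{align*}
y^*_{1i} \mid y^*_{2i}, y^*_{3i}, y^*_{4i}, \beta, \Sigma &\sim N\left(\mu_{1|\neg 1}, \Sigma_{1|\neg 1}\right) I\left(\alpha_{y_{1i}} < y^*_{1i} < \alpha_{y_{1i}+1}\right) \\
y^*_{2i} \mid y^*_{1i}, y^*_{3i}, y^*_{4i}, \beta, \Sigma &\sim N\left(\mu_{2|\neg 2}, \Sigma_{2|\neg 2}\right) I\left(\alpha_{y_{2i}} < y^*_{2i} < \alpha_{y_{2i}+1}\right) \\
y^*_{3i} \mid y_{3i} = 0, y^*_{1i}, y^*_{2i}, y^*_{4i}, \beta, \Sigma &\sim N\left(\mu_{3|\neg 3}, \Sigma_{3|\neg 3}\right) I\left(y^*_{3i} < 0\right) \\
y^*_{4i} \mid y_{4i} = 0, y^*_{1i}, y^*_{2i}, y^*_{3i}, \beta, \Sigma &\sim N\left(\mu_{4|\neg 4}, \Sigma_{4|\neg 4}\right) I\left(y^*_{4i} < 0\right)
\end{align*}
$y_{1i}$ and $y_{2i}$ take on values of 0, 1 or 2, $\alpha_0 = -\infty$, $\alpha_1 = \Phi^{-1}(1/3)$, $\alpha_2 = -\Phi^{-1}(1/3)$, $\alpha_3 = \infty$, and $\mu_{j|\neg j}$ stands for the
mean of equation $j$ conditional on the joint distribution of the equations other than $j$ ($\neg j$). The conditional
means ($\mu_{j|\neg j}$) and the covariance matrix of the distributions above are calculated according to Poirier
(1995, p. 122) (a detailed derivation of the conditional posterior distributions is given in Appendix A).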
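The two ingredients of these draws, a conditional normal distribution and truncation to an interval, can be sketched for a bivariate case. The paper's system has four equations and uses the general multivariate formulas; the version below is a simplified two-equation illustration with made-up means and covariances, and the helper names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def conditional_normal(mu1, mu2, s11, s12, s22, y2):
    """Mean and variance of y1 given y2 for a bivariate normal (y1, y2)."""
    cond_mean = mu1 + s12 / s22 * (y2 - mu2)
    cond_var = s11 - s12 ** 2 / s22
    return cond_mean, cond_var

def truncated_normal_draw(mean, var, lower, upper, rng):
    """Inverse-CDF draw from N(mean, var) restricted to (lower, upper)."""
    sd = var ** 0.5
    p_lo = STD_NORMAL.cdf((lower - mean) / sd)
    p_hi = STD_NORMAL.cdf((upper - mean) / sd)
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf away from 0 and 1
    return mean + sd * STD_NORMAL.inv_cdf(u)

# Hypothetical inputs: a household observed with one vehicle of the first type,
# so its latent value must lie between the two cut points.
alpha_1 = STD_NORMAL.inv_cdf(1 / 3)
alpha_2 = -alpha_1
cond_mean, cond_var = conditional_normal(0.2, 0.0, 1.0, 0.5, 2.0, y2=1.0)
draw = truncated_normal_draw(cond_mean, cond_var, alpha_1, alpha_2, random.Random(7))
print(cond_mean, cond_var, alpha_1 <= draw <= alpha_2)
```

Inverting the CDF on the restricted probability range returns a value inside the interval on every call, which avoids the wasted draws of a rejection sampler when the interval has little probability mass.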
Latent miles driven are used for households without cars or trucks to draw marginal inference about how
much a household would have driven had it held a car or truck. Regarding $y^*_3$ and $y^*_4$ as latent willingness to drive
cars and trucks, the two variables can be negative, indicating how far households are from taking
up driving. Because their levels of disutility (or costs) of driving differ, some households may purchase new
cars and drive additional miles whilst others retain the same vehicle holdings and the same annual miles
driven, when density decreases or income increases to a certain level. I hence draw latent miles driven from a
normal distribution truncated above at zero to reflect the heterogeneity of those households that do not hold
a given type of vehicle.
2.3. Data
The study uses data from 2001 National Household Travel Survey (NHTS), a cross-section survey of a total
of 69,817 households nationwide. Those data contain detailed information on households’ demographics, var-
ious measures of land use density and vehicle properties including year, make, model and estimates of annual
miles travelled. The California sub-sample is used in this paper, so as to be comparable to the results from
previous studies, which have mostly focused on households’ travel behavior in the same region. Moreover,
it provides a starting point to compare behavioral patterns across different regions of the U.S. The original
California sample includes 2583 households. I eliminate observations missing important information such
as income, highest education within the family and vehicle characteristics. The sample retained and used in
the analysis contains 2299 households.
Explanatory variables include density, other neighborhood characteristics and household demographic
characteristics. Density is measured by housing units per square mile at the census block level, which is highly
correlated with population per square mile and jobs per square mile. To capture local transit networks and
non-motorized facilities, an indicator of whether or not the MSA has rail, dummies for selected MSAs and
the number of bicycles in the household are considered. The MSA rail dummy is later excluded from the
regression because it is collinear with the major MSA dummies. Demographic
variables include total household annual income, the highest education level achieved within a household,
number of adults, number of children, children’s ages, home ownership and zone type of the residence
area. The definitions and sample statistics of the explanatory variables used in the analysis are presented in
Table 1. The average density in California is 2566 housing units per square mile at the census block level.
The minimum density value assigned is 25 and the maximum is 6000 housing units per square mile. Among the
four largest MSAs in California, Los Angeles–Riverside–Orange County has the highest density
with 3016 housing units per square mile on average, followed closely by San Francisco–Oakland–San
Jose with 2905 housing units. The density drops to 2563 in San Diego and further to 2113 in
Sacramento.

Table 1
Descriptive statistics
Variable (observations: 2299)                                       Mean (SD)
Residential density                                                 2566 (1886)
Number of bicycles                                                  0.96 (1.26)
Total number of people                                              2.69 (1.45)
Number of adults                                                    1.99 (0.80)
If the highest education achieved is high school                    31%
If the highest education achieved is bachelor’s degree or higher    46.2%
If the youngest child is under 6                                    16.9%
If the youngest child is between 6 and 15                           18.1%
If the youngest child is between 15 and 21                          5.8%
If no children or youngest child more than 21 years old             59.2%
If MSA has rail                                                     65.1%
If household resides in urban area at census tract level            93%
If household resides in rural area at census tract level            7%
If household resides in Los Angeles MSA                             42%
If household resides in San Francisco MSA                           23.1%
If household resides in San Diego MSA                               8.7%
If household resides in Sacramento MSA                              7.9%
If household resides in other MSAs                                  18.3%
If annual household income is less than 20k                         15%
If annual household income is between 20k and 30k                   10.7%
If annual household income is between 30k and 50k                   21%
If annual household income is between 50k and 75k                   18.8%
If annual household income is between 75k and 100k                  12.8%
If annual household income is greater than 100k                     21.7%
If the household owns home                                          69.1%
Note: Residential density is measured in housing units per square mile, coded into six ranges using midpoints.
Sample statistics of the dependent variables are presented in Table 2. The average number of cars held
by a California household is 1.1 and the average number of trucks is 0.72. The variation in vehicle holdings is
high, with some households holding as many as 6 cars and some none. The average annual miles driven by cars is
11,541, somewhat lower than the 13,198 average annual miles driven by trucks. Utilization also shows
high variance. Examining the two-way tabulation of vehicle counts in Table 2, we can see that every
combination of car and truck holdings has a good number of observations. The cell with
the largest number of observations has 547 households owning one car and zero trucks, while the cell with the
fewest observations has 64 households holding two or more cars and two or more trucks.
In the Gibbs sampler, I run 10,000 iterations, discard the first 1000 as burn-in to mitigate start-up effects, and use
the remaining draws for posterior inference. MCMC convergence diagnostics, such as the autocorrelation
within the parameter chains and numerical standard errors (Raftery–Lewis diagnostic, Geweke diagnostic), all
indicate a high degree of accuracy with this number of iterations.
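Discarding the burn-in and checking within-chain autocorrelation are both simple to express in code. The sketch below uses an AR(1) series as a stand-in for correlated MCMC output, since the actual parameter draws are not available here; the persistence of 0.5, the starting value and the helper name `lag_autocorrelation` are assumptions made for illustration.

```python
import random
from statistics import fmean

def lag_autocorrelation(chain, lag):
    """Sample autocorrelation of a chain of draws at the given lag."""
    mean = fmean(chain)
    centered = [v - mean for v in chain]
    denom = sum(c * c for c in centered)
    numer = sum(centered[t] * centered[t + lag] for t in range(len(chain) - lag))
    return numer / denom

# Stand-in chain: an AR(1) process mimicking correlated MCMC output.
rng = random.Random(42)
chain, value = [], 5.0            # deliberately poor starting value
for _ in range(10_000):
    value = 0.5 * value + rng.gauss(0.0, 1.0)
    chain.append(value)

kept = chain[1000:]               # discard the first 1000 draws as burn-in
print(len(kept), round(lag_autocorrelation(kept, 1), 2))
```

Autocorrelations that die out quickly across lags suggest that the retained draws carry close to as much information as an independent sample of the same size, which is one of the signals the diagnostics above summarize.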
The coefficient estimates are reported in Table 3. The density coefficient in the car choice equation is positive
but has a large standard deviation, indicating insignificance, while the density coefficient in the truck
choice equation is significantly negative. Since the coefficients of the ordered probit equations have no direct
interpretation in themselves, I calculate the changes in the probabilities of vehicle holdings when residential density
increases by 10%, 25% and 50%, respectively, for each household in the sample. The probabilities of holding
zero, one and two or more cars and trucks are calculated for each household and for each set of parameter
draws in the MCMC chain, excluding the first 1000 draws. Since the parameter draws in the
MCMC chain can be regarded as draws from the posterior distribution, the distribution of the probability
changes can be obtained by averaging the effect over all households for each set of draws.

Table 2
Dependent variables
Variable                          Mean (SD)          Min             Max
Number of cars held               1.1 (0.82)         0               6
Number of trucks held             0.72 (0.79)        0               4
                                                     25% quantile    75% quantile
Average miles driven by cars      11,541 (9949)      5927            14,609
Average miles driven by trucks    13,198 (11,945)    6803            16,395
Table 4 lists the mean and standard deviation of the probability changes in number of cars and trucks in
response to density changes. When density increases by 50%, the probability of not holding trucks increases by
approximately 1.2 percentage points, and the probabilities of holding one truck and two or more trucks decrease by around 0.75 and
0.46 percentage points, respectively. Clearly, when density increases, only a limited share of households opt out of holding trucks.
Residential density affects households’ choice of cars on a much smaller scale and in a less significant way.
When density increases by 50%, the probability of holding zero cars increases by 0.13 percentage points, that of holding one
car increases by 0.006 percentage points, while the probability of holding two or more cars decreases by 0.136 percentage points. The weak effect and
the positive sign on holding one car can be explained as a substitution effect between cars and trucks, when
households move from low density areas to high density areas. When trucks are too costly, in terms of parking
search and maneuvering, to hold in high density areas, households might substitute cars for trucks.
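The calculation behind these probability changes can be sketched for a single household. Because density enters the latent equation in logs, a proportional increase of p shifts the latent index by the density coefficient times ln(1 + p). In the sketch below, the baseline index of 0 and unit variance are hypothetical, the upper cut point is assumed to mirror the lower one, and -0.0943 is the truck-choice density coefficient with its sign taken from the text; the paper averages such changes over all households and posterior draws, which this single-household illustration does not do.

```python
from math import log
from statistics import NormalDist

ALPHA_1 = NormalDist().inv_cdf(1 / 3)   # lower cut point
ALPHA_2 = -ALPHA_1                      # upper cut point (assumed mirror image)

def holding_probabilities(index, sigma):
    """P(0), P(1), P(2 or more) for latent y* ~ N(index, sigma^2)."""
    dist = NormalDist(mu=index, sigma=sigma)
    p0 = dist.cdf(ALPHA_1)
    p1 = dist.cdf(ALPHA_2) - p0
    return (p0, p1, 1.0 - p0 - p1)

def probability_changes(index, sigma, beta_density, pct_increase):
    """Change in each holding probability when density rises by pct_increase."""
    shifted = index + beta_density * log(1.0 + pct_increase)
    before = holding_probabilities(index, sigma)
    after = holding_probabilities(shifted, sigma)
    return tuple(a - b for a, b in zip(after, before))

# Hypothetical household with baseline index 0 and unit variance.
changes = probability_changes(0.0, 1.0, -0.0943, 0.50)
print([round(c, 4) for c in changes])
```

The three changes always sum to zero, so a rise in the probability of holding no trucks must be offset by falls in the other categories, which matches the pattern of signs reported for trucks.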
In terms of vehicle utilization (see Table 5), miles driven by cars are less responsive to density changes than
miles driven by trucks. For example, the average utilization of trucks per household will decrease by approximately
309.8 miles annually when density increases by 25%, while the average annual miles travelled by cars
per household will decrease by approximately 64.6 miles.
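These mileage figures appear consistent with a simple calculation on the latent mean: since density enters in logs and mileage is measured in thousands of miles, a 25% density increase shifts the latent mean by the density coefficient times ln(1.25), times 1000. The sketch below uses the magnitudes of the log(density) coefficients from Table 3; the negative signs are an assumption based on the text's statement that mileage falls as density rises, and whether the paper computed its figures exactly this way is not confirmed here.

```python
from math import log

# log(density) coefficients in the mileage equations, in 1000 miles.
# Negative signs are an assumption based on the text, which reports
# that mileage decreases as density rises.
BETA_CAR_MILES = -0.2896
BETA_TRUCK_MILES = -1.3882

def latent_mileage_change(beta, pct_increase):
    """Change in the latent mean of annual miles for a given density increase."""
    return 1000.0 * beta * log(1.0 + pct_increase)

car_change = latent_mileage_change(BETA_CAR_MILES, 0.25)
truck_change = latent_mileage_change(BETA_TRUCK_MILES, 0.25)
print(round(car_change, 1), round(truck_change, 1))  # -64.6 -309.8
```

The rounded results match the 64.6 and 309.8 mile reductions quoted above, which suggests the reported changes refer to the latent mean rather than to the censored expected mileage.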
The small effect of density on vehicle type choice and miles travelled described above is in line with the find-
ings in the study by Bento et al. (2005), where various measures of urban form are found to have a small
impact on the number of vehicles owned and vehicle miles travelled individually (with elasticities less than
0.11). However, Bento et al. also note that if urban form and transit availability change simultaneously, the
vehicle miles travelled will be changed substantially. As mentioned in Section 2.3, the estimation in this paper
is done under the scenario that transit availability and neighborhood characteristics are controlled for by including
dummies for the major MSAs in California. The remaining question is whether the result will change
substantially if the estimation is done unconditional on transit availability and other neighborhood
1 More specifically, they find that a 10% increase in population centrality reduces the probability of owning two vehicles by 1.5% and the
probability of owning three or more vehicles by approximately 2.1%. They also find a 1% increase in population centrality reduces annual
average vehicle miles travelled by approximately 1.5%. In Bento et al. (2003), a 10% increase in population centrality is found to reduce
annual miles travelled by approximately 300 miles.
H.A. Fang / Transportation Research Part B 42 (2008) 736–758 745
Table 3
Estimation results of the BMOPT model (rail excluded)
Variable Coefficient
Number Number Annual average car Annual average truck
of cars of trucks miles (in 1000 miles) miles (in 1000 miles)
log(density) 0.0136 0.0943 0.2896 1.3882
(0.0234) (0.0236) (0.2725) (0.3884)
Number of bikes 0.0165 0.1188 0.0243 1.1565
(0.0213) (0.0208) (0.2407) (0.3556)
Household size 0.1044 0.0969 0.2747 1.3818
(0.0401) (0.0406) (0.4762) (0.6535)
Number of adults 0.5603 0.1900 2.1774 1.0476
(0.0517) (0.0517) (0.6065) (0.8437)
Urban 0.2071 0.4325 1.7901 1.9542
(0.1197) (0.1183) (1.3866) (1.9056)
Income between 20k and 30k 0.2078 0.1088 0.2749 5.2804
(0.0957) (0.0964) (1.1295) (1.6918)
Income between 30k and 50k 0.2162 0.4881 2.0909 6.9117
(0.0831) (0.0853) (0.9685) (1.4724)
Income between 50k and 75k 0.4757 0.5813 4.0084 9.2357
(0.0881) (0.0894) (1.0182) (1.5464)
Income between 75k and 100k 0.4921 0.7537 2.9076 10.2581
(0.1006) (0.1005) (1.1672) (1.7153)
Income greater than 100k 0.5984 0.7921 4.1652 11.9082
(0.0951) (0.0968) (1.1005) (1.648)
Owns home 0.2302 0.2562 0.049 3.5008
(0.0572) (0.0569) (0.6658) (1.0056)
Highest education: high school 0.2244 0.1043 2.1343 0.2801
(0.0676) (0.0675) (0.7974) (1.1606)
Highest education: bachelor 0.3313 0.2472 3.0401 4.0957
(0.0713) (0.0709) (0.8218) (1.2075)
Youngest child under 6 0.1269 0.0656 2.3385 4.4909
(0.1089) (0.1097) (1.2456) (1.7723)
Youngest child between 6 and 15 0.0834 0.0449 1.0392 2.4871
(0.0972) (0.0975) (1.1128) (1.5984)
Youngest child between 15 and 21 0.0735 0.0103 0.1141 0.0987
(0.1111) (0.1088) (1.2631) (1.8056)
LA 0.0417 0.1959 1.9529 0.3116
(0.0731) (0.073) (0.8515) (1.2203)
Sacramento 0.0986 0.2568 2.9610 0.4090
(0.1045) (0.1052) (1.2068) (1.7264)
San Diego 0.1823 0.2039 0.7071 1.7565
(0.1012) (0.1032) (1.1663) (1.7172)
San Francisco 0.216 0.3254 1.2381 3.5607
(0.0815) (0.0824) (0.9388) (1.3822)
Notes: The base group consists of households that have income below 20k, do not own a home, are high school dropouts, have no children, live in a rural area, and reside outside the major MSAs. Posterior standard deviations are reported in parentheses.
characteristics. To this end, I re-estimate the model including only the density variable and household
characteristics. The parameter estimates are presented in Table 6, and the simulation results are presented
Table 4
Changes in vehicle choice when density increases (BMOPT)
% changes in density Probability changes for truck choice
ΔP (tnum = 0) ΔP (tnum = 1) ΔP (tnum ≥ 2)
10% 0.0028 0.0017 0.0011
(0.0007) (0.0004) (0.0003)
25% 0.0066 0.0041 0.0025
(0.0017) (0.001) (0.0006)
50% 0.012 0.0075 0.0046
(0.003) (0.0019) (0.0011)
% changes in density Probability changes for car choice (in 10^-3 units)
ΔP (cnum = 0) ΔP (cnum = 1) ΔP (cnum ≥ 2)
10% 0.30 0.02 0.32
(0.52) (0.04) (0.55)
25% 0.71 0.04 0.75
(1.23) (0.09) (1.29)
50% 1.30 0.06 1.36
(2.23) (0.15) (2.34)
Notes: Posterior standard deviations are reported in parentheses.
This table documents the changes in the probabilities of holding zero (P (tnum = 0)), one (P (tnum = 1)), and two or more (P (tnum ≥ 2)) trucks, and zero (P (cnum = 0)), one (P (cnum = 1)), and two or more (P (cnum ≥ 2)) cars, averaged across all households in the sample, when density increases by 10%, 25% and 50%, respectively.
Table 5
Changes in vehicle miles when density increases (BMOPT)
Δcar miles % Δcar miles Δtruck miles % Δtruck miles
10% 27.60 0.31 132.31 1.85
(25.97) (0.29) (37.02) (0.52)
in Table 7. From these tables, we see that the changes in the probabilities of car choices are sensitive to the specification, and that the changes for the truck choice from the new model are approximately twice as large as those from the original model. For example, when residential density increases by 50%, the probability of not holding trucks increases by 2.2 percentage points in the new specification, as opposed to 1.2 percentage points in the original model. Even under the new specification the effect remains small, so we conclude that the effect of residential density is economically insignificant.
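The mechanics behind probability changes of this kind can be sketched for a single household and a single ordered probit equation. The sketch below is a simplification under stated assumptions: the latent index, the density coefficient and the cutpoints are hypothetical values chosen for illustration, and the paper's own figures additionally average over all households within the full multivariate system.

```python
import math
from statistics import NormalDist


def ordered_probit_probs(index, cutpoints):
    """Return P(y = 0), ..., P(y = J) for a latent index and ordered cutpoints."""
    cdf = NormalDist().cdf
    # Cumulative probabilities at each cutpoint, padded with 0 and 1.
    cum = [0.0] + [cdf(c - index) for c in cutpoints] + [1.0]
    return [cum[j + 1] - cum[j] for j in range(len(cum) - 1)]


def density_effect(index, beta_log_density, pct_increase, cutpoints):
    """Change in each category probability when density rises by pct_increase."""
    # A proportional density increase shifts log(density) by log(1 + pct).
    shift = beta_log_density * math.log(1.0 + pct_increase)
    before = ordered_probit_probs(index, cutpoints)
    after = ordered_probit_probs(index + shift, cutpoints)
    return [a - b for a, b in zip(after, before)]


# Hypothetical household: index 0.2, coefficient -0.09, cutpoints 0 and 1.
example = density_effect(0.2, -0.09, 0.10, [0.0, 1.0])
```

With a negative coefficient, the probability of holding zero vehicles of that type rises and the probability of holding two or more falls, while the three changes sum to zero by construction.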
Table 8 presents the error correlation matrix of the four equations. These correlations allow me to gauge whether the associations between the errors are important enough to be taken into consideration. The errors from number of cars held and number of trucks held exhibit a strongly negative correlation of -0.44. The correlation between miles driven by cars and miles driven by trucks is also large at -0.32. This indicates a substitution effect between cars and trucks, not only type-wise but also usage-wise. Across vehicle choice and vehicle usage, I find that the error of choice of cars is positively associated with utilization of cars and
Table 6
Estimation results of the BMOPT model (urban, rail and MSA dummies excluded)
Variable Coefficient
Number of cars Number of trucks Annual average car miles (in 1000 miles) Annual average truck miles (in 1000 miles)
log(density) 0.0209 0.1704 0.1036 1.7375
(0.0173) (0.0176) (0.2068) (0.2842)
Number of bicycles 0.0165 0.1177 0.0275 1.1696
(0.0211) (0.0212) (0.2407) (0.3457)
Household size 0.1021 0.0927 0.2724 1.3635
(0.0395) (0.0401) (0.4802) (0.6482)
Table 7
Changes in vehicle choice when density increases (BMOPT, pure demographics)
% changes in density Probability changes for truck choice
ΔP (tnum = 0) ΔP (tnum = 1) ΔP (tnum ≥ 2)
10% 0.0051 0.0031 0.002
(0.0005) (0.0003) (0.0002)
25% 0.012 0.0074 0.0046
(0.0012) (0.0008) (0.0005)
50% 0.022 0.0138 0.0082
(0.0022) (0.0015) (0.0009)
% changes in density Probability changes for car choice (in 10^-3 units)
ΔP (cnum = 0) ΔP (cnum = 1) ΔP (cnum ≥ 2)
10% 0.47 0.03 0.50
(0.39) (0.03) (0.41)
25% 1.09 0.07 1.16
(0.90) (0.08) (0.97)
50% 1.97 0.14 2.12
(1.64) (0.16) (1.8)
Notes: Posterior standard deviations are reported in parentheses.
As in Table 4, this table documents the changes in the probabilities of holding zero (P (tnum = 0)), one (P (tnum = 1)), and two or more (P (tnum ≥ 2)) trucks, and zero (P (cnum = 0)), one (P (cnum = 1)), and two or more (P (cnum ≥ 2)) cars, averaged across all households in the sample, when density increases by 10%, 25% and 50%, respectively, based on estimation results from Table 6.
Table 8
Error correlation matrix (BMOPT)
Number of cars Number of trucks Average car mile Average truck mile
Number of cars 1.00 – – –
Number of trucks -0.44 1.00 – –
[-0.48, -0.40]
Average car mile 0.48 -0.32 1.00 –
[0.45, 0.52] [-0.36, -0.28]
Average truck mile -0.32 0.59 -0.19 1.00
[-0.36, -0.27] [0.56, 0.62] [-0.24, -0.15]
Notes: Highest posterior density intervals are reported below each correlation.
negatively associated with utilization of trucks, and the inverse applies to choice of trucks. Hence I conclude that joint estimation of the whole system is expected to yield substantial efficiency gains.
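As a small illustration of how an error correlation matrix of this kind relates to an estimated covariance matrix, the sketch below rescales a covariance matrix by the standard deviations on its diagonal. The function name and the numerical values are hypothetical; in a Bayesian setting the conversion would typically be applied to each posterior draw of the covariance matrix before summarizing.

```python
import math


def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    # Each correlation is the covariance divided by both standard deviations.
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical 2 x 2 covariance matrix for two error terms.
corr = cov_to_corr([[1.0, -0.88], [-0.88, 4.0]])
```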
To demonstrate that finer vehicle type classifications can be accommodated by the proposed model, I subclassify cars and trucks into two types each according to their size: small-size cars, large-size cars, small-size trucks and large-size trucks. The enhanced system consists of eight equations, four for the ordered choice of the four types of vehicles and four for the vehicle usages. The estimation procedure remains the same. Since the size information is not available for all the vehicles in the sample, subclassification reduces the sample size to only 2001 households with vehicles. The reduction in observations does not occur for zero-vehicle households. Including zero-vehicle households here would require re-weighting and is
Table 9
Changes in vehicle choice and usage when density is increased by 25% (subclassification, BMOPT)
Type choice Probability changes
ΔPr (n = 0) ΔPr (n = 1) ΔPr (n ≥ 2)
Equations
Car Small-car 0.0024 0.0016 0.0008
(0.0026) (0.0017) (0.0009)
Large-car 0.0005 0.0003 0.0002
(0.0024) (0.0015) (0.0009)
Truck Small-truck 0.0016 0.0012 0.0004
(0.0024) (0.0018) (0.0006)
Large-truck 0.0073 0.0054 0.0019
(0.0022) (0.0017) (0.0006)
Miles driven Usage changes
Car
Small-car 24.4
(129.5)
Large-car 88.9
(120.4)
Truck
Small-truck 201.2
(145.6)
Large-truck 646.9
(179.9)
Notes: Posterior standard deviations are reported in parentheses.
Table 10
Error correlation matrix (subclassification, BMOPT)
Number of small cars 1.00 – – – – – – –
Number of large cars 0.35 1.00 – – – – – –
Number of small trucks 0.16 0.20 1.00 – – – – –
Number of large trucks 0.21 0.14 0.18 1.00 – – – –
Miles driven by small cars 0.74 0.33 0.12 0.17 1.00 – – –
Miles driven by large cars 0.26 0.75 0.18 0.15 0.25 1.00 – –
Miles driven by small trucks 0.15 0.16 0.82 0.22 0.11 0.13 1.00 –
Miles driven by large trucks 0.17 0.12 0.14 0.78 0.15 0.14 0.19 1.00
not pursued. Hence the results for the subclassification are based on the sample that includes only households with vehicles.2
Table 9 displays marginal effects of increasing density by 25% on the probabilities of choosing different types of vehicles and on their usage. The results are largely consistent with what we have obtained previously. When density increases by 25%, people tend to switch from trucks to cars, and from large-size trucks to small-size trucks within the truck category. More specifically, the probability of holding zero large-size trucks increases by 0.73 percentage points, of which 0.54 percentage points are due to the reduced choice of one large-size truck and 0.19 percentage points are due to the reduced choice of two or more large-size trucks. Meanwhile, the annual usage of large-size cars and of all trucks decreases, while the utilization of small-size cars increases slightly. As with the car/truck classification, the marginal effects of density are more precisely estimated for trucks than for cars. The error correlation matrix is presented in Table 10. Again, the errors exhibit substitution effects among vehicles of different sizes.
2
As pointed out by one of the anonymous reviewers, exclusion of zero-vehicle households might distort the estimation results. From the
parameter results of car/truck classification based on observations with and without zero-vehicle households, I find that exclusion of zero-
vehicle households tends to slightly over-estimate the substitution effect from trucks to cars when density increases, but the distortion is
negligible for truck choice and usage and does not change the main conclusion.
Classification can be made finer still, but only up to a manageable number of categories that depends on the size of the data set. This is a limitation of the exercise: the model cannot handle the hundreds of vehicle make/model/body type/vintage combinations that models such as the MDCEV developed by Bhat (2005) can estimate. Where fine vehicle classification is essential to the research focus, models such as MDCEV have to be used; for other applications, the model structure proposed here is a simple alternative.
In the MDCEV model as proposed by Bhat (2005), a household’s utility stems from choosing make/model $l$ of class $k$, which contains a total of $N_k$ make/models. There are altogether $K$ classes of vehicles to choose from, and the actual number of classes of vehicles a household owns is denoted as $Q$. The utility function of a typical household is then formed as

$$U = \sum_{k=1}^{K} \exp\Big(\max_{l \in N_k}\{W_k + Y_{kl} + \eta_{kl}\}\Big)\,(m_k + 1)^{s_k} \tag{16}$$

where $W_k$ depends on household characteristics $x$ that relate to the choice of class $k$ and equals $x'\beta_k$, $Y_{kl}$ depends on the vehicle properties $z_{kl}$ of a certain make/model $l$ within class $k$ and equals $z_{kl}'\gamma$, and $m_k$ denotes miles driven by a vehicle of class $k$. $s_k$ is considered as a non-satiation factor, and is constrained between 0 and 1. For simplicity, household subscript $i$ is omitted.
Households maximize their utility by choosing miles driven $m_k$ for each class and one make/model $l$ from each class $k$ for which $m_k > 0$, under the constraint that total miles driven is fixed at $M$, that is, $\sum_{k=1}^{K} m_k = M$. The stochastic term $\eta_{kl}$ is assumed to be generalized extreme value distributed. Applying the Kuhn–Tucker conditions, we have

$$H_k = H_1 \quad \text{if } m_k > 0$$
$$H_k < H_1 \quad \text{if } m_k = 0$$

where

$$H_k = x'\beta_k + \theta_k \ln \sum_{l \in N_k} \exp\Big(\frac{z_{kl}'\gamma}{\theta}\Big) + \ln s_k + (s_k - 1)\ln(m_k + 1) + \varepsilon_k \tag{17}$$

The probability that the first $Q$ of the $K$ vehicles are chosen, $P(m_1, m_2, \ldots, m_Q, 0, \ldots, 0)$, is then derived from the above Kuhn–Tucker conditions (cf. Bhat, 2005).
(18)
(19)
where

$$r_k = \frac{1 - s_k}{m_k + 1} \tag{20}$$

$$V_k = x'\beta_k + \theta_k \ln \sum_{l \in N_k} \exp\Big(\frac{z_{kl}'\gamma}{\theta}\Big) + \ln s_k + (s_k - 1)\ln(m_k + 1) \tag{21}$$
The log likelihood function is then $L = \sum_i \log P_i$. For identification purposes, we need to pick a baseline class $K_1$ so that its coefficients $\beta_{K_1}$ are set to zero. Compact cars are used here as the baseline choice. The log likelihood is maximized using the quasi-Newton (BFGS) algorithm.
Besides the California sub-sample of the 2001 National Household Travel Survey (NHTS) for household characteristics, Wards Automobile Yearbook is also used to get vehicle properties, such as price, length, weight and miles per gallon (mpg), down to the make/model level. Vehicles are classified, according to the classifications in Consumer Report, into 10 types: compact, compact luxury, sedan midsize, sedan fullsize, sedan luxury, SUV small, SUV midsize, SUV large, mini-van and pickup trucks. Such fine classification is necessary to ensure that each household chooses no more than one vehicle of the same type, as required by the model. Table 11 presents the vehicle classifications in the sample used. The second column lists how many make/models there are within each class and the third column counts how many vehicles within the sample belong to that particular class.
Table 12 presents the coefficient estimates for vehicle properties at the make/model level. Table 13 presents the coefficient estimates for density, household demographics and neighborhood characteristics at the class level. The smallest vehicle type, compact car, is chosen as the baseline type. A negative density coefficient for one of the remaining classes indicates that the attractiveness of that class diminishes relative to compact cars when density increases. As the estimation results show, compact cars gain an edge over all classes but midsize SUVs in dense areas.
To draw policy implications of an increase in density, I substitute the estimated parameters into the utility function (Eq. (16)), and maximize the utility function, subject to $\sum_{k=1}^{K} m_k = M$ and $m_k \ge 0$ for $k = 1, \ldots, K$, with respect to $m_k$ for each household. The matrix of optimal $m_k$’s for all the households is obtained through numerical optimization with constraints. The optimization is done before and after a policy change to calculate the policy effects.
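The constrained maximization can be sketched for one household as follows. This is a minimal illustration under stated assumptions, not the routine used in the paper: it takes the realized baseline utilities $\psi_k = \exp(\max_l\{W_k + Y_{kl} + \eta_{kl}\})$ as given (the values of `psi` and `s` below are hypothetical), and instead of a general-purpose optimizer it exploits the Kuhn–Tucker conditions directly, searching by bisection for the Lagrange multiplier at which the implied miles sum to $M$.

```python
def optimal_miles(psi, s, total_miles):
    """Maximize sum_k psi_k * (m_k + 1)**s_k subject to sum_k m_k = M, m_k >= 0.

    Assumes psi_k > 0, 0 < s_k < 1 and total_miles >= 0.
    """
    def miles_at(lam):
        # Kuhn-Tucker: psi_k * s_k * (m_k + 1)**(s_k - 1) = lam when m_k > 0,
        # and m_k = 0 when the marginal utility at zero is below lam.
        return [max(0.0, (p * a / lam) ** (1.0 / (1.0 - a)) - 1.0)
                for p, a in zip(psi, s)]

    hi = max(p * a for p, a in zip(psi, s))  # here every m_k equals zero
    lo = hi
    while sum(miles_at(lo)) < total_miles:   # lower the multiplier until M is covered
        lo /= 2.0
    for _ in range(200):                     # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if sum(miles_at(mid)) > total_miles:
            lo = mid
        else:
            hi = mid
    return miles_at(0.5 * (lo + hi))


# Hypothetical household with three vehicle classes and M = 10 (thousand miles).
before = optimal_miles([2.0, 1.0, 0.05], [0.5, 0.5, 0.5], 10.0)
# A policy change is mimicked by lowering the baseline utility of class 1.
after = optimal_miles([1.8, 1.0, 0.05], [0.5, 0.5, 0.5], 10.0)
```

Because total miles are held at $M$ in both runs, any miles lost by one class reappear in the others, which is the fixed-total property of the MDCEV simulation.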
To compare the results to those from the BMOPT model of the car/truck classification, I group the vehicles into cars and trucks from the original 10 classifications, and obtain the changes in vehicle type choice and utilization when density increases by 10%, 25% and 50%, respectively. The changes are presented in Table 14. The comparison of changes in truck miles with respect to residential density between the MDCEV and the BMOPT model is presented in Table 15. Two observations are in order. First, the changes in truck miles from the MDCEV model are consistent with those in the BMOPT model. For all density changes, the changes in truck miles from the MDCEV model remain within one standard deviation of the changes in truck miles
Table 11
Classification of vehicles (MDCEV)
Types Number of make/model Observations
Compact 39 461
Compact: luxury 46 275
Sedan: midsize 28 532
Sedan: fullsize 9 84
Sedan: luxury 34 195
SUV: small 7 52
SUV: midsize 13 247
SUV: large 10 131
Minivan 18 266
Pickup trucks 23 429
Total 227 2672
Note: Vehicle types are derived according to the classifications in Consumer Report.
Table 12
Coefficient estimates for vehicle properties (MDCEV)
Variables Coefficient Standard error
M.P.G. 0.039*** (0.011)
Price 0.020*** (0.005)
Chevrolet 0.071 (0.067)
Ford 1.406*** (0.223)
Honda 1.938*** (0.321)
Toyota 1.175*** (0.195)
Dodge 0.856*** (0.157)
Nissan 0.254*** (0.072)
Notes: Purchase price of the vehicle (price) is in the unit of $1000. Six major makes of the vehicles are included as the explanatory variables
to examine the brand effect. There are altogether 39 makes. The left out makes are Acura, Audi, BMW, Buick, Cadillac, Chrysler, GMC,
Hyundai, Infiniti, Isuzu, Jaguar, Jeep, Kia, Land Rover, Lexus, Lincoln, Lotus, Mazda, Mercedes Benz, Mercury, Mini cooper,
Mitsubish, Oldsmobile, Plymouth, Pontiac, Porsche, SAAB, Saturn, Scion, Subaru, Suzuki, Volkswagen and Volvo.
from the BMOPT model. Therefore, the association between higher density and lower truck utilization is well established. Second, under the requirement of the MDCEV model, total miles driven has to remain fixed. Therefore, the decrease in truck utilization necessarily equals the increase in car utilization.
This estimation of the MDCEV model shows that the results from the utility-derived MDCEV are comparable to those from the reduced-form BMOPT model proposed. The advantages of the BMOPT model in solving this type of problem, in which only a broad classification of vehicles is necessary, are as follows. First, the model is conceptually and computationally easy to solve. Second, no additional information is required for finer classification of vehicles. Third, no constraint is imposed on the total miles travelled by a household. Moreover, policy simulations are less costly.
4. Endogeneity
The estimates from the model are useful for policy implications if, beyond correlation, they also indicate
causality. If the decision process for households is to first choose where to live and then choose what kind
of vehicles to own, reverse causality is unlikely. However, it is also plausible that the residential location choice
and vehicle ownership decisions are made jointly. The potential problem that might bias my estimates is the
existence of unobserved factors that affect both vehicle choice and density choice. In other words, a person
drives a truck not because she lives in a suburb, but because she enjoys larger spaces, which in turn influences
her decision to live in the suburb and drive a truck. I control for part of this by using disaggregate data and
detailed household characteristics. With this approach I hope to capture some of the factors that affect both
residential density choice and vehicle type and usage choice, but not all.
In testing and controlling for endogeneity, the BMOPT model and the MDCEV model face the same challenge. We can either use appropriate instrumental variables, or estimate a simultaneous residential location and vehicle ownership and usage model system with choice of residential density itself as a dependent variable and an endogenous component. In the latter case, we need additional exogenous covariates, other than the explanatory variables used in the vehicle ownership and usage equations, to identify the system, and these covariates are themselves valid instrumental variables. Therefore, finding appropriate exogenous variables to act as IVs is the key to the problem.
However, it is difficult to find instrumental variables that are correlated with density but not with vehicle choice. In the extreme case, such variables may not be readily available. A possible solution is using the average density of a tract’s MSA as an instrumental variable for the tract population density. This IV was used by Brueckner and Largey (2008) in their study. However, the data set I use has only six MSAs, a number too small to provide enough variability in density to capture its influence. School quality might be another feasible instrument, since schools of good quality are usually located in low density suburban areas, while those of lower quality are located in high density downtown areas. However, school quality is itself difficult to measure and hence hard to obtain.
Table 13
Coefficient estimates for household characteristics (MDCEV)
Variables Compact: luxury Sedan midsize Sedan fullsize Sedan luxury SUV small SUV midsize SUV large Minivan Truck
Residential density 0.022 0.073 0.100 0.108 0.090 0.077 0.016 0.124 0.187
(0.000) (0.022) (0.100) (0.048) (0.041) (0.025) (0.124) (0.035) (0.037)
Number of bicycles 0.052 0.061 0.042 0.014 0.002 0.048 0.087 0.114 0.142
(0.000) (0.076) (0.258) (0.131) (0.084) (0.075) (0.298) (0.148) (0.191)
Total number of people 0.107 0.293 0.047 0.199 0.503 0.209 0.189 0.415 0.056
Table 13 (continued)
Variables Compact: Sedan Sedan Sedan SUV SUV SUV Minivan Truck
luxury midsize fullsize luxury small midsize large
Youngest child between 6 and 15 0.437 0.653 0.576 0.240 0.151 0.225 0.450 0.395 0.286
(0.033) (0.078) (0.228) (0.147) (0.141) (0.120) (0.437) (0.123) (0.262)
Youngest child between 15 and 21 0.370 0.047 1.101 0.138 0.231 0.450 0.234 0.540 0.503
(0.046) (0.084) (0.276) (0.173) (0.101) (0.107) (0.325) (0.147) (0.291)
Los Angeles 0.296 0.498 4.110 0.014 0.826 0.056 0.571 0.712 0.076
(0.011) (0.346) (0.389) (0.361) (0.352) (0.353) (0.412) (0.358) (0.377)
Sacramento 0.242 0.888 0.316 1.043 0.981 1.131 0.338 0.601 1.126
(0.002) (0.005) (0.003) (0.002) (0.003) (0.008) (0.003) (0.004) (0.002)
San Diego 0.049 1.031 0.505 0.171 0.351 0.319 0.495 0.394 0.498
(0.003) (0.012) (0.006) (0.005) (0.005) (0.013) (0.005) (0.006) (0.004)
San Francisco 0.077 0.425 3.653 0.214 0.557 0.183 1.224 1.711 0.344
(0.014) (0.067) (0.033) (0.031) (0.015) (0.089) (0.023) (0.022) (0.018)
Notes: Standard errors are reported in parentheses.
Table 14
Changes in vehicle choice and usage when density increases (MDCEV)
When density increases by Change in % change in
Car miles Truck mile Car miles Truck mile Car holdings Truck holdings
10% 136.7 136.7 1.47 2.01 0.17 1.81
25% 343.7 343.7 3.52 4.81 0.56 3.72
50% 603.1 603.1 6.52 8.91 1.47 8.2
Table 15
Comparison of truck usage changes between the MDCEV and BMOPT model
When density increases by MDCEV Δtruck miles BMOPT Δtruck miles
10% 136.7 132.2
(70.1) (37)
25% 343.7 309.8
(156.6) (86.7)
50% 603.1 562.9
(366.0) (157.5)
More importantly, Golob and Brownstone (2005) dealt with the sample selection problem with a simultaneous equation system and did not find any evidence for endogeneity of density choice. This does not, of course, rule out the endogeneity issue in this paper. If the estimation results can be driven either by a causal relation between density and fuel efficiency or by unobserved factors that determine both density and fuel efficiency choice, the results from this analysis provide an upper bound on the possible reductions in choice and usage of fuel inefficient vehicles. Since the reduction is negligibly small even if causality is assumed, we can conclude that changing density hardly affects vehicle choice and usage.
5. Conclusion
Two models, a reduced-form Bayesian Multivariate Ordered Probit and Tobit (BMOPT) model and the Multiple Discrete–Continuous Extreme Value (MDCEV) model derived from utility maximization, are applied to model households’ vehicle holdings and usage decisions in California.
The system of BMOPT is composed of a multivariate ordered probit model and a multivariate Tobit model.
The ordered probit is used to capture household decisions on number of vehicles in each category. Within this
framework, vehicles are categorized into fuel efficient (cars) and fuel inefficient vehicles (trucks), which permits
me to capture possible environmental and energy saving policy implications. Note that this model can be
extended to incorporate a finer classification of vehicles, thereby suiting the needs of particular studies. By
using ‘number of vehicles’ in each category instead of ‘total number of vehicles’, the analysis circumvents a
usual difficulty faced by traditional modelling of vehicle holdings. In traditional modelling, with an increase
in total number of vehicles, possible combinations of vehicle holdings proliferate. Hence the estimation
becomes cumbersome when there are households with more than two vehicles. With the method employed
in this paper, however, handling multiple-vehicle households becomes simple and flexible.
The multivariate Tobit captures household decisions on miles driven, conditional on each category. Tradi-
tional discrete–continuous models were built upon utility maximization theory, but approximations (in esti-
mation) dampen the elegance of the theoretical derivation. By combining the multivariate ordered probit
and Tobit model and assuming an unrestricted covariance matrix, I can more ‘cleanly’ estimate a reduced-
form discrete/continuous system. Using data augmentation and Bayesian Markov Chain Monte Carlo meth-
ods, the estimation is straightforward.
The BMOPT model and the MDCEV model both have their pros and cons. The BMOPT model is easy to
implement, convenient to get inferences and hence draw policy implications, able to handle a large total
number of vehicles, but it becomes computationally intensive as the number of vehicle categories grows, because the number of equations to be estimated increases proportionally with the number of categories. The MDCEV is consistent with random utility maximization, and can accommodate hundreds of vehicle classifications, but one restriction is that the total utilization of vehicles is assumed to be fixed no matter how the policy changes. This assumption rules out the potential reduction in vehicle utilization which we would expect to occur, or at least want to test for, in response to particular policies. In addition, the model requires a classification of vehicles fine enough that no household holds two vehicles of the same type.
In sum, an efficient estimation technique along with the recent and detailed 2001 NHTS data enables me to
obtain a small but statistically significant effect of density on households’ vehicle choice. I conclude that
increasing residential density within feasible ranges will have a very small impact on household vehicle hold-
ings and vehicle fuel usage.
Acknowledgements
The author is highly indebted to David Brownstone for his guidance. She is grateful to Jan Brueckner, Ivan
Jeliazkov, David Neumark, Ken Small, and Kurt Van Dender for their helpful comments and encourage-
ments. She appreciates Busik Choi for providing part of the data. She is also thankful to the Department
of Economics and School of Social Science at UCI for their generous financial support. Associate editor Chan-
dra Bhat and four anonymous referees provided invaluable comments.
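$$\propto \Bigg[\prod_{i=1}^{T} |\Sigma^{-1}|^{1/2} \exp\Big\{-\tfrac{1}{2}(y_i^* - x_i\beta)'\Sigma^{-1}(y_i^* - x_i\beta)\Big\}\, I_{j=1,2}\big(a_{y_{ij}} < y_{ij}^* \le a_{y_{ij}+1}\big)\Bigg] \exp\Big\{-\tfrac{1}{2}(\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\Big\}\, |\Sigma^{-1}|^{(\nu+k+1)/2} \exp\Big\{-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}Q\Big\}$$

Given the joint posterior distribution, the conditional posterior distributions are:

$$f(\beta \mid y_i, y_i^*, \Sigma) \propto \exp\Bigg\{-\tfrac{1}{2}\Bigg[\sum_{i=1}^{T}(y_i^* - x_i\beta)'\Sigma^{-1}(y_i^* - x_i\beta) + (\beta - \beta_0)'V_0^{-1}(\beta - \beta_0)\Bigg]\Bigg\} \propto \exp\Big\{-\tfrac{1}{2}(\beta - \hat{\beta})'\hat{V}^{-1}(\beta - \hat{\beta})\Big\}$$

where

$$\hat{V} = \Bigg(V_0^{-1} + \sum_{i=1}^{T} x_i'\Sigma^{-1}x_i\Bigg)^{-1}$$

$$\hat{\beta} = \hat{V}\Bigg(V_0^{-1}\beta_0 + \sum_{i=1}^{T} x_i'\Sigma^{-1}y_i^*\Bigg)$$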
Hence,

$$\beta \mid y_i, y_i^*, \Sigma \sim N(\hat{\beta}, \hat{V})$$
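The update for $\beta$ can be illustrated in the simplest possible case: one equation, one regressor, a known error variance and a normal prior. The sketch below is a scalar special case of the matrix formulas for the posterior variance and mean, with hypothetical function names and data; the full sampler works with the stacked system and the matrix $\Sigma$.

```python
import random


def beta_posterior(x, y_star, sigma2, b0, v0):
    """Posterior mean and variance of a scalar beta given latent data y_star.

    Scalar analogue of V = (V0^-1 + sum x' S^-1 x)^-1 and
    b = V (V0^-1 b0 + sum x' S^-1 y*), with S = sigma2.
    """
    precision = 1.0 / v0 + sum(xi * xi for xi in x) / sigma2
    v_hat = 1.0 / precision
    b_hat = v_hat * (b0 / v0 + sum(xi * yi for xi, yi in zip(x, y_star)) / sigma2)
    return b_hat, v_hat


def draw_beta(x, y_star, sigma2, b0, v0, rng=random):
    """One Gibbs draw of beta from its normal conditional posterior."""
    b_hat, v_hat = beta_posterior(x, y_star, sigma2, b0, v0)
    return rng.gauss(b_hat, v_hat ** 0.5)


# Hypothetical data: two observations, prior N(0, 1), unit error variance.
b_hat, v_hat = beta_posterior([1.0, 2.0], [2.0, 4.0], 1.0, 0.0, 1.0)
```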
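$$f(\Sigma \mid y_i, y_i^*, \beta) \propto |\Sigma^{-1}|^{T/2} \exp\Bigg\{-\tfrac{1}{2}\operatorname{tr}\Bigg(\Sigma^{-1}\sum_{i=1}^{T}(y_i^* - x_i\beta)(y_i^* - x_i\beta)'\Bigg)\Bigg\}\, |\Sigma^{-1}|^{(\nu+k+1)/2} \exp\Big\{-\tfrac{1}{2}\operatorname{tr}\Sigma^{-1}Q\Big\}$$

$$\propto |\Sigma^{-1}|^{(\nu+T+k+1)/2} \exp\Bigg\{-\tfrac{1}{2}\operatorname{tr}\Bigg(\Sigma^{-1}\Bigg[\sum_{i=1}^{T}(y_i^* - x_i\beta)(y_i^* - x_i\beta)' + Q\Bigg]\Bigg)\Bigg\}$$

Therefore,

$$\Sigma \mid y_i, y_i^*, \beta \sim IW\Bigg(\nu + T,\ \sum_{i=1}^{T}(y_i^* - x_i\beta)(y_i^* - x_i\beta)' + Q\Bigg)$$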
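The conditional posterior distribution of the latent utilities, $f(y_i^* \mid y_i, \beta, \Sigma)$, is drawn recursively from

$$f(y_{1i}^* \mid y_{2i}^*, y_{3i}^*, y_{4i}^*, \beta, \Sigma),\quad f(y_{2i}^* \mid y_{1i}^*, y_{3i}^*, y_{4i}^*, \beta, \Sigma),\quad f(y_{3i}^* \mid y_{1i}^*, y_{2i}^*, y_{4i}^*, \beta, \Sigma),\quad \text{and}\quad f(y_{4i}^* \mid y_{1i}^*, y_{2i}^*, y_{3i}^*, \beta, \Sigma)$$

where

$$y_{1i}^* \mid y_{2i}^*, y_{3i}^*, y_{4i}^*, \beta, \Sigma \sim N(\mu_{1|\neg 1}, \Sigma_{1|\neg 1})\, I\big(a_{y_{1i}} < y_{1i}^* < a_{y_{1i}+1}\big)$$
$$y_{2i}^* \mid y_{1i}^*, y_{3i}^*, y_{4i}^*, \beta, \Sigma \sim N(\mu_{2|\neg 2}, \Sigma_{2|\neg 2})\, I\big(a_{y_{2i}} < y_{2i}^* < a_{y_{2i}+1}\big)$$
$$y_{3i}^* \mid y_{3i} = 0, y_{1i}^*, y_{2i}^*, y_{4i}^*, \beta, \Sigma \sim N(\mu_{3|\neg 3}, \Sigma_{3|\neg 3})\, I\big(y_{3i}^* < 0\big)$$
$$y_{4i}^* \mid y_{4i} = 0, y_{1i}^*, y_{2i}^*, y_{3i}^*, \beta, \Sigma \sim N(\mu_{4|\neg 4}, \Sigma_{4|\neg 4})\, I\big(y_{4i}^* < 0\big)$$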
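Each of these draws is a univariate normal restricted to an interval. A minimal sketch of one such draw by the inverse-CDF method follows; the function name is hypothetical, and the inverse-CDF approach is a simple stand-in that loses accuracy far in the tails, where dedicated truncated-normal samplers such as the one in Geweke (1991) are preferable.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, lower, upper, rng=random):
    """Draw from N(mean, sd^2) restricted to the interval (lower, upper)."""
    dist = NormalDist(mean, sd)
    p_lo = 0.0 if lower == float("-inf") else dist.cdf(lower)
    p_hi = 1.0 if upper == float("inf") else dist.cdf(upper)
    # Uniform draw on the CDF scale, kept strictly inside (0, 1) for inv_cdf.
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


rng = random.Random(0)
# Ordered probit step: latent utility between two cutpoints (hypothetical 0 and 1).
y_ordered = draw_truncated_normal(0.5, 1.0, 0.0, 1.0, rng)
# Tobit step for a censored observation: latent utility restricted to be negative.
y_censored = draw_truncated_normal(0.5, 1.0, float("-inf"), 0.0, rng)
```

In the sampler, `mean` and `sd` would be the conditional mean and standard deviation of the equation being updated given the other three latent utilities.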
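I follow Poirier (1995) to obtain the conditional mean and variance for a partitioned matrix. Generally, for $Z = [Z_1'\ Z_2']' \sim N(\mu, \Sigma)$, where $Z$ is an $N \times 1$ random vector, $Z_1$ is an $m \times 1$ vector, and $Z_2$ is $(N - m) \times 1$, with

$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{pmatrix}$$

The conditional distribution of $Z_1$ given $Z_2$ is $Z_1 \mid Z_2 \sim N(\mu_{1|2}, \Sigma_{1|2})$, where

$$\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(Z_2 - \mu_2)$$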
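For intuition, the bivariate case of this result can be written out in a few lines. The sketch below uses the conditional mean given above together with the standard companion result for the conditional variance, $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{12}'$, which is stated here as a textbook fact because the expression is not shown in this appendix. The function name and numbers are hypothetical.

```python
def conditional_normal(mu1, mu2, s11, s12, s22, z2):
    """Mean and variance of Z1 given Z2 = z2 for a bivariate normal.

    s11 and s22 are variances, s12 is the covariance between Z1 and Z2.
    """
    mean = mu1 + s12 / s22 * (z2 - mu2)
    var = s11 - s12 * s12 / s22
    return mean, var


# Hypothetical unit-variance errors with correlation -0.44, conditioning on z2 = 1.
mean, var = conditional_normal(0.0, 0.0, 1.0, -0.44, 1.0, 1.0)
```

A negative covariance pulls the conditional mean of one latent utility down when the other is above its mean, which is how error correlations feed into each conditional draw.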
References
Albert, J., Chib, S., 1993. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association
88, 669–679.
Amemiya, T., 1984. Tobit models: a survey. Journal of Econometrics 24, 3–61.
Bento, A.M., Cropper, M.L., Mobarak, A.M., Vinha, K., 2003. The Impact of Urban Spatial Structure on Travel Demand in the U.S.
World Bank Policy Research Working Paper No. 3007.
Bento, A.M., Cropper, M.L., Mobarak, A.M., Vinha, K., 2005. The effect of urban spatial structure on travel demand in the United States.
Review of Economics and Statistics 87 (3), 466–478.
Berkovec, J., Rust, J., 1985. A nested logit model of automobile holdings for one vehicle households. Transportation Research Part B 19
(4), 275–285.
Bhat, C.R., 2005. A multiple discrete–continuous extreme value model: formulation and application to discretionary time-use decisions.
Transportation Research Part B 39 (8), 679–707.
Bhat, C.R., 2008. The multiple discrete–continuous extreme value (MDCEV) model: role of utility function parameters, identification
considerations, and model extensions. Transportation Research Part B 42 (3), 274–303.
Bhat, C.R., Sen, S., 2006a. Household vehicle type holdings and usage: an application of the multiple discrete–continuous extreme value
(MDCEV) model. Transportation Research Part B 40 (1), 35–53.
Bhat, C.R., Sen, S., 2006b. The impact of demographics, built environment attributes, vehicle characteristics, and gasoline prices on
household vehicle holdings and use. Working Paper, The University of Texas at Austin.
Brueckner, J., 2001. Urban sprawl: lessons from urban economics. In: Gale, W., Pack, J. (Eds.), Brookings-Wharton Papers on Urban
Affairs.
Brueckner, J., Largey, A., 2008. Social interaction and urban sprawl. Journal of Urban Economics.
Cervero, R., Kockelman, K., 1997. Travel demand and the 3Ds: density, diversity and design. Transportation Research Part D 3, 199–219.
Chib, S., Greenberg, E., 1998. Bayesian analysis of the multivariate probit model. Biometrika 85, 347–361.
Cowles, M., Carlin, B., Connett, E., 1996. Bayesian Tobit modelling of longitudinal ordinal clinical trial compliance data with
nonignorable missingness. Journal of the American Statistical Association 91 (433), 86–98.
Dubin, J., McFadden, D., 1984. An econometric analysis of residential electric appliance holdings and consumption. Econometrica 52,
345–362.
Dunphy, R., Fisher, K., 1996. Transportation, congestion, and density: new insights. Transportation Research Record 1552, 89–96.
Feng, Y., Fullerton, D., Gan, L., 2005. Vehicle choices, miles driven and pollution policies. NBER Working Paper Series, Working Paper
11553.
Gelfand, A.E., Smith, A.F., 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical
Association 85, 398–409.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on
Pattern Analysis and Machine Intelligence 6, 721–741.
Geweke, J., 1991. Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints and the
evaluation of constraint probabilities. In: Proceedings of 23rd Symposium on the Interface between Computing Science and Statistics,
pp. 571–578.
Goldberg, P., 1998. The effects of the corporate average fuel efficiency standards in the US. Journal of Industrial Economics 46 (1), 1–33.
Golob, T., Brownstone, D., 2005. The impact of residential density on vehicle usage and energy consumption. Working Paper, University
of California, Irvine.
Greene, W.H., 2000. Econometric Analysis. fourth ed. Prentice Hall.
Hanemann, M., 1984. Discrete/continuous models of consumer demand. Econometrica 52 (3), 541–562.
McCulloch, R., Rossi, P., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64, 217–228.
Mannering, F., Winston, C., 1985. A dynamic empirical analysis of household vehicle ownership and utilization. The RAND Journal of
Economics 16 (2), 215–236.
Nandram, B., Chen, M., 1996. Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence. Journal of
Statistical Computation and Simulation 54, 129–144.
Poirier, D., 1995. Intermediate Statistics and Econometrics: A Comparative Approach. MIT Press, Cambridge, MA.
Rossi, P., Allenby, G., 2003. Bayesian statistics and marketing. Marketing Science 22 (3), 304–328.
Srinivasan, S., Bhat, C.R., 2006. A multiple discrete–continuous model for independent- and joint- discretionary-activity participation
decisions. Transportation 33 (5), 497–515 (2006 TRB Special Issue).
Tobin, J., 1958. Estimation of relationships for limited dependent variables. Econometrica 26, 24–36.
Train, K.E., 1986. Qualitative Choice Analysis: Theory, Econometrics, and Application to Automobile Demand. MIT Press, Cambridge,
MA.
Train, K.E., 2003. Discrete Choice Methods with Simulation. Cambridge University Press, Cambridge, UK.
Webber, E.L., Forster, J.J., 2006. Bayesian model determination for multivariate ordinal and binary data. Working Paper, School of
Mathematics, University of Southampton, United Kingdom.
West, S., 2004. Distributional effects of alternative vehicle pollution control policies. Journal of Public Economics 88, 735–757.