DEA With Stata
1 Introduction
In this article, we introduce a new application in Stata for performance measurement
of decision-making units (DMUs) using data envelopment analysis (DEA) techniques.
DEA is a nonparametric linear programming method for assessing the efficiency and
productivity of DMUs. DEA application areas have grown since it was first introduced
as a managerial and performance measurement tool in the late 1970s. Since then,
new applications with more variables and complicated models have been and are being
introduced.
Stata equipped with the dea command gives the user a new nonparametric
tool for analyzing productivity data. From within Stata, users can
produce DEA scores and analyze them. Because second-stage DEA analysis and DEA
efficiency estimates involve statistical inference, DEA users need a software package that
can handle the whole process in one system. The alternative is to juggle a separate DEA
program and a statistical package that takes the DEA scores as the dependent variable to
find the influential variables in second-stage DEA analysis.
The main purpose of this article is to implement the dea command in Stata. The
article unfolds as follows. The next section describes the DEA models and calculations
in DEA. The remainder of this article illustrates the features and options of the dea
command.
© 2010 StataCorp LP st0193
268 Data envelopment analysis
[Figure 1: plot of output (Y) against input (X) showing DMUs A through E, the VRS frontier, and the labeled points A1, B0, B1, B2, and B3.]
Modified from Coelli et al. (2005, 174) and Cooper et al. (2006, 128)
It is possible to decompose the CRS technical efficiency into scale efficiency and
“pure” technical efficiency. In figure 1, the distance B2B reflects the technical inefficiency of
point B under the VRS model, and the distance B1B reflects the technical inefficiency of
point B under the CRS model. The difference, B1B2, is attributable to scale inefficiency.
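The decomposition can be illustrated numerically. The Python sketch below is not part of the dea command, and its data are hypothetical values invented for illustration. It relies on two simplifications that hold only with one input and one output: the CRS score is the DMU's productivity divided by the best observed productivity, and the input-oriented VRS frontier can be found by checking single DMUs and mixes of two DMUs. Scale efficiency is then computed as TE_CRS / TE_VRS, so that TE_CRS = TE_VRS × SE.

```python
# Hypothetical data (not from the article): DMU -> (input x, output y)
dmus = {"A": (2.0, 1.0), "B": (4.0, 2.0), "C": (4.0, 3.0), "D": (6.0, 4.0)}


def te_crs(name):
    """Input-oriented CRS score: own productivity y/x over the best productivity."""
    x0, y0 = dmus[name]
    best = max(y / x for x, y in dmus.values())
    return (y0 / x0) / best


def te_vrs(name):
    """Input-oriented VRS score: smallest input on the convex frontier that
    still yields at least y0, divided by the observed input."""
    x0, y0 = dmus[name]
    # A single DMU that already produces at least y0.
    candidates = [x for x, y in dmus.values() if y >= y0]
    # A mix of a smaller DMU (ya < y0) and a larger one (yb >= y0)
    # that produces exactly y0.
    for xa, ya in dmus.values():
        for xb, yb in dmus.values():
            if ya < y0 <= yb:
                t = (y0 - ya) / (yb - ya)
                candidates.append((1 - t) * xa + t * xb)
    return min(candidates) / x0


scores = {}
for name in dmus:
    crs, vrs = te_crs(name), te_vrs(name)
    scores[name] = (crs, vrs, crs / vrs)  # last entry is scale efficiency
    print(f"{name}: TE_CRS={crs:.3f}  TE_VRS={vrs:.3f}  SE={crs / vrs:.3f}")
```

In this invented dataset, DMU B is inefficient under both models, and the gap between its two scores is what the scale-efficiency term captures.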
[Figure 2: plot of X2/Y against X1/Y showing DMUs A through H and the projected points G1 and H1.]
Modified from Coelli et al. (2005, 197) and Cooper et al. (2006, 57)
distance to the frontier. For example, firms that are technically inefficient operate at
points in the interior of the frontier, while those that are technically efficient operate
somewhere along the technology defined by the frontier. The DMU is called efficient when
the DEA score equals 1 and all slacks are 0 (Cooper, Seiford, and Tone 2006). If only the
first condition is satisfied, the DMU is called efficient in terms of “radial”, “technical”,
and “weak” efficiency. If both conditions are satisfied, the DMU is called efficient in
terms of “Pareto–Koopmans” or “strong” efficiency. The technical efficiencies of DMUs
G and H are defined as OG1/OG and OH1/OH, respectively.
Inefficiency can be measured by how far the inputs must contract along a ray from
the origin until the ray crosses the frontier. For example, for firm G, the measure of
technical efficiency is OG1/OG. Point G1 is the Farrell-efficient point; however, input
X2 could be reduced further while still producing the same output. In this case, firm G
has input slack CG1. If we disregard the slack and calculate it residually, the DEA model
becomes the single-stage DEA model. There are several ways to reduce the slack and find
the Pareto-optimal reference set; two-stage and multistage DEA
models are available in the literature (Cooper, Seiford, and Tone 2006; Coelli et al. 2005).
Cherchye and Puyenbroeck (2001) showed that “most representative efficient points”
can be found using a direct approach and may differ from those obtained by multistage
DEA. The DEA model in this article provides stage options for the single-stage and two-stage
models, which are still the most prevalent approaches in the DEA literature. A reference, or peer, is the
point toward which an inefficient DMU, such as point G, moves from its Farrell-efficient
point, such as point G1, to a Pareto–Koopmans efficient point, such as point C in
figure 2 (based on the two-stage DEA solution or Pareto-optimal solution). However,
the slack issues in DEA models tend to disappear as the number of DMUs increases because the
DEA piecewise linear frontier becomes smoother and the Farrell-efficient point is less
likely to fall on a frontier segment that runs parallel to an input or output axis.
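The radial contraction and the leftover slack described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm used by the dea command: the coordinates are hypothetical (they are not read off figure 2), and the function farrell_score is a name introduced here. With two inputs per unit of output under CRS, an optimal reference set needs at most two peers, so the sketch checks every DMU and every pair of DMUs instead of calling a linear programming solver.

```python
from itertools import combinations

# Hypothetical input mixes (x1/y, x2/y); invented for illustration only.
points = {"C": (2.0, 2.0), "D": (4.0, 1.0), "G": (4.0, 6.0), "H": (4.0, 2.0)}


def farrell_score(name):
    """Smallest theta such that theta * target still uses at least as much of
    each input as some DMU or some convex combination of two DMUs."""
    g1, g2 = points[name]
    candidates = list(points.values())
    for (a1, a2), (b1, b2) in combinations(points.values(), 2):
        # Along the segment a -> b both input ratios are linear in t, so the
        # best point is an endpoint or the point where the two ratios cross.
        slope_gap = (b1 - a1) / g1 - (b2 - a2) / g2
        if slope_gap != 0:
            t = (a2 / g2 - a1 / g1) / slope_gap
            if 0 < t < 1:
                candidates.append(((1 - t) * a1 + t * b1, (1 - t) * a2 + t * b2))
    return min(max(c1 / g1, c2 / g2) for c1, c2 in candidates)


theta_g = farrell_score("G")
g1_point = tuple(theta_g * v for v in points["G"])  # Farrell-efficient point G1
# Excess of G1 over peer C in each input (the slack in this small example).
gap_to_c = tuple(p - c for p, c in zip(g1_point, points["C"]))
theta_h = farrell_score("H")
print(theta_g, g1_point, gap_to_c, theta_h)
```

In this invented example, G contracts radially to a point that still uses more of the second input than peer C, which is the situation the text describes as input slack. H contracts onto the segment between C and D and has no such excess.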
Free disposability means that one can produce the same output by wasting resources
or increase the output without increasing resources. Strong disposability assumes that
it is costless for firms to dispose of inputs or outputs or the isoquant does not bend
backwards. In figure 2, the line that links A and F represents the frontier imposed
by weak disposability. The line that links B and E represents the frontier imposed by
strong disposability.
Assuming convexity, strong disposability, and CRS for the production activities,
we can formulate the linear program as a piecewise linear frontier. Input-oriented
CRS efficiency is defined in (1) by applying the piecewise linear frontier to the
input requirement set (Cooper, Seiford, and Tone 2006). This enables us to evaluate
efficiency relative to the frontier.
$$\max_{\nu,u} \; z = u y_j \tag{1}$$
the virtual input, relative to a given virtual output, subject to the constraint that no
DMU can operate beyond the production possibility set and the constraint relating to
nonnegative weights. In practice, most of the available DEA programs use the dual form
expressed in (2), which lowers the computational burden and is equivalent to
(1).
$$\min_{\theta,\lambda} \; \theta \tag{2}$$
$$\min_{\theta} \; \theta \tag{3}$$
$$\min_{\lambda,s^+,s^-} \; -s^+ - s^- \tag{4}$$
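For orientation, the standard input-oriented CCR programs from the DEA literature (for example, Cooper, Seiford, and Tone 2006), which (1) through (4) appear to abbreviate, can be written out with their constraints as follows. The notation is an assumption made for this sketch: X and Y are the input and output matrices of all DMUs, x_j and y_j are the input and output vectors of the DMU under evaluation, e is a row vector of ones, and θ* is the optimal value from the first stage.

```latex
\begin{align*}
\text{Multiplier form:}\quad
  & \max_{\nu,u}\; z = u y_j
  && \text{s.t. } \nu x_j = 1,\; -\nu X + u Y \le 0,\; \nu \ge 0,\; u \ge 0 \\
\text{Envelopment (dual) form:}\quad
  & \min_{\theta,\lambda}\; \theta
  && \text{s.t. } \theta x_j - X\lambda \ge 0,\; Y\lambda \ge y_j,\; \lambda \ge 0 \\
\text{Second stage (slacks):}\quad
  & \min_{\lambda,s^+,s^-}\; -e s^+ - e s^-
  && \text{s.t. } X\lambda + s^- = \theta^* x_j,\; Y\lambda - s^+ = y_j,\;
     \lambda, s^+, s^- \ge 0
\end{align*}
```

Under variable returns to scale, the convexity constraint $\sum_{j=1}^{n} \lambda_j = 1$ is added to the envelopment and second-stage programs.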
In the dea syntax (dea ivars = ovars), ivars and ovars are the input and output variable lists, respectively.
3.2 Description
dea takes the input and output variables from a user-designated
data file or from the dataset currently in memory and solves the DEA model defined by
the specified options. Several options are available to refine the model; the
user can select them according to the particular model required.
The dea command requires an initial dataset that contains the input and output
variables for the observed DMUs. Variables must be listed in ivars for input variables
and in ovars for output variables so that the dea command can identify and handle
the multiple input–output dataset. In the output of the dea command, the prefix dmu:
precedes DMU names.
The command can accommodate any number of inputs, outputs, and DMUs;
the only limitation is the available computer memory.
The resulting file reports information including reference points and
slacks in the DEA model. This information can be used to analyze an inefficient DMU,
for example, to identify the source of the inefficiency and how the unit could be improved
to the desired level.
saving(filename) creates filename.dta, which contains the results of dea, including
information about the DMUs, the input and output data used, ranks of DMUs, efficiency
scores, reference sets, and slacks. The log file dea.log is created in the working
directory.
Based on the data and the options specified, the dea command conducts matrix
operations and linear programming to produce a results dataset that is available to
print or can be used for further analysis.
3.3 Options
rts(crs | vrs | drs | nirs) specifies the returns to scale. The default, rts(crs), specifies
constant returns to scale. rts(vrs), rts(drs), and rts(nirs) specify variable
returns to scale, decreasing returns to scale, and nonincreasing returns to scale,
respectively.
ort(in | out) specifies the orientation. The default is ort(in), meaning input-oriented
DEA. ort(out) is output-oriented DEA.
stage(1 | 2) specifies the way to identify all efficiency slacks. The default is stage(2),
meaning two-stage DEA. stage(1) is single-stage DEA.
trace saves all the sequences displayed in the Results window in the dea.log
file. The default is to save only the final results in the dea.log file.
saving(filename) specifies that the results be saved in filename.dta. If filename.dta
already exists, the existing data will be moved to the file filename_bak_DMYhms.dta
before the new data are saved in filename.dta.
4 Applications of dea
4.1 Data
This section provides examples using data from Cooper, Seiford, and Tone (2006, 75,
table 3.7) and Coelli et al. (2005, 175, table 6.4) to illustrate the dea command.
The data of Cooper, Seiford, and Tone (2006) consist of five stores that use two inputs,
i_employees (the number of employees) and i_area (the floor area),
to produce two outputs: o_sales (the volume of sales)
and o_profits (the volume of profits). The
data of Coelli et al. (2005) consist of five firms that use one input, i_1, to produce one
output, o_1.
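. * Store data from Cooper, Seiford, and Tone (2006, table 3.7): default CRS, input-oriented, two-stage
. use cooper_table3.7.dta
. dea i_employee i_area = o_sales o_profits
. * Firm data from Coelli et al. (2005, table 6.4): CRS, output-oriented, single-stage
. use coelli_table6.4.dta, clear
. dea i_x = o_q, rts(crs) ort(out) stage(1)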
The rank of the DMUs and the efficiency score (theta), as well as the residually given
reference set (ref:) and the slacks (islack: or oslack:), are listed in the above results.
Store C is the only efficient DMU and seems to be the referent for all other stores. The
Results window will display the above result, and a dea.log file that contains the same
result will be created in your working directory. A “.” in the results table represents
a small number less than $10^{-12}$, which can mostly be ignored. However,
when you analyze financial data, the distinction between zero
and “.” values may sometimes be required to maintain accuracy.
Note that the efficiency score (theta) of DMU B is 0.625, and DMUs A and C are the
reference DMUs for DMU B. The sum of the reference weights should equal 1 because
rts(vrs) specifies that $\sum_{j=1}^{n} \lambda_j = 1$. The sum of the reference weights for DMU B equals
1 from $(\lambda_A, \lambda_B, \lambda_C, \lambda_D, \lambda_E) = (0.5, 0, 0.5, 0, 0)$. Note also that the efficiency scores of
all the DMUs are unchanged from the single-stage model shown in the previous section
to the current two-stage analysis because there is no slack in this case. This means
that the slack level of profits has no effect on the efficiency evaluation. Stores A, C,
and E are the efficient points that the inefficient DMUs (B and D) can target in the
input-oriented DEA calculation.
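As a quick check, the reported weights for DMU B can be summed directly, and the input target they imply follows from the envelopment constraints. The symbols x_A, x_B, and x_C (input vectors of DMUs A, B, and C) and s^- (input slack of DMU B) are introduced here for illustration; their numerical values are not reproduced in this section.

```latex
\begin{align*}
\sum_{j} \lambda_j &= 0.5 + 0 + 0.5 + 0 + 0 = 1 \\
\theta x_B - s^- &= \sum_{j} \lambda_j x_j = 0.5\,x_A + 0.5\,x_C,
  \qquad \theta = 0.625
\end{align*}
```

The second line says that the input target for DMU B is the midpoint of the inputs of its two peers, A and C.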
Here we analyze data from Coelli et al. (2005, 175) using the saving() option:
Specifying saving(coelli_6.4_results) will save the VRS frontier results, shown
above, as coelli_6.4_results.dta. The results match the Pareto–Koopmans solution
of Coelli et al. (2005, 176, table 6.5) because slack has no role in this case.
The results show that the rnd (R&D) level is positively related to the CRS
efficiency scores of DMUs at the 1% level of significance.
5 Conclusion
Today, many academic researchers recognize Stata as one of the leading packages for
statistical analysis; however, there are still areas of interest to managerial
organizations that Stata does not cover. In particular, optimization procedures in Stata can be further
developed to fill in the gaps between parametric and nonparametric analysis. The dea
command introduced in this article is a new application in Stata and is a powerful
managerial tool for measuring the efficiency and productivity of DMUs.
Y. Ji and C. Lee 279
The dea command application has several advantages, including the following:
• It can be used by Stata users at no extra cost for DEA software.
• It provides Stata with managerial tools for reports and statistical analysis, as well
as optimization procedures.
• The dea command report files can feed directly into other Stata routines for further
analysis.
6 Acknowledgments
We thank H. Joseph Newton and an anonymous reviewer for comments.
7 References
Banker, R. D., A. Charnes, and W. W. Cooper. 1984. Some models for estimating
technical and scale inefficiencies in data envelopment analysis. Management Science
30: 1078–1092.
Cooper, W. W., L. M. Seiford, and K. Tone. 2000. Data Envelopment Analysis: A
Comprehensive Text with Models, Applications, References and DEA-Solver Software. 2nd
ed. New York: Springer.
———. 2006. Introduction to Data Envelopment Analysis and Its Uses. New York:
Springer.
Lee, C., J. Lee, and T. Kim. 2009. Innovation policy for defense acquisition and dynamics
of productive efficiency: A DEA application to the Korean defense industry. Asian
Journal of Technology Innovation 17: 151–171.