Quantile Regression Through Linear Programming PDF
Quantile Regression Through Linear Programming PDF
programming
Anton Antonov
Mathematica for Prediction blog
Mathematica for Prediction project at GitHub
December 2013
Introduction
We can say that least squares linear regression corresponds to finding the mean of a
single distribution. Similarly, quantile regression corresponds to finding quantiles of a
single distribution. With quantile regression we obtain curves -- called “regression quan-
tiles” -- that together with the least squares regression curve would give a more complete
picture of the distributions (the y’s) corresponding to a set of x’s.
For a complete, interesting, and colorful introduction and justification to quantile regression
see [2]. An introduction and description of the major properties of quantile regression is
given in the Wikipedia entry [3].
In order to have a fast enough for practical purposes implementation of quantile regression
we need to re-cast the quantile regression problem as linear programming problem. (Such
a formulation is also discussed in [2].)
This document is mostly a guide for usage of the Mathematica package for quantile regres-
sion that is provided by the MathematicaForPrediction project at GitHub, see [1].
The second section provides theoretical background of the linear programming formulation
of the quantile regression problem. The third section shows examples of finding regression
quantiles using the function QuantileRegressionFit provided by [1]. The last section
describes profiling experiments and their results.
The motivational examples in the theoretical section, formulas (1) and (2), can be com-
pleted with more expansions and proofs. (Which will be done in the next version of the
document.)
Theory
We can formulate the quantile regression problem in way analogous to the formulation of
least squares (conditional mean) regression.
Consider a random variable Y having some distribution function F and a sample 8yi <ni=1 of
Y . The median of the set of samples 8yi <ni=1 can be defined as the solution of the minimiza-
tion problem
2 Quantile regression through linear programming.nb
Consider a random variable Y having some distribution function F and a sample 8yi <ni=1 of
Y . The median of the set of samples 8yi <ni=1 can be defined as the solution of the minimiza-
tion problem
n
min ‚ †yi - b§, b œ R . (1)
b
i=1
To see that the b which minimizes (1) is the median of 8yi <ni=1 , consider two points y1 < y2 .
Then †y1 - m§ + †m - y2 § = †y1 - y2 §, " m œ @y1 , y2 D, hence any m œ @y1 , y2 D minimizes (1). Using
the observation for two points we see that for three points y1 < y2 < y3 , m = y2 minimizes
(1). For four points y1 < y2 < y3 < y4 any m œ @y2 , y3 D minimizes (1). We can generalize
these observations and show that (1) gives the median for any set of points.
If we want to find q-th sample quantile of 8yi <ni=1 then we need to change (1) into
Consider a set of random variables Yi , i œ @1, nD, n œ N that are paired with a set of x-
coordinates X = 8xi <ni=1 . We have data of pairs 8xi , yi <ni=1 , where yi is a realization of Yi .
The linear regression problem can be formulated as
n
min ‚ Hyi - Hb0 + b1 xi LL2 . (3)
b0 ,b1
i=1
In order to convert (5) into a linear programming problem, let us introduce the non-negative
variables ui and vi for which the following equations are true:
yi - Hb0 + b1 xi L + ui = 0, i œ 8i : yi ¥ b0 + b1 xi <,
(6)
ui = 0, i – 8i : yi ¥ b0 + b1 xi <,
Hb0 + b1 xi L - yi + vi = 0, i œ 8i : yi < b0 + b1 xi <,
(7)
vi = 0, i – 8i : yi < b0 + b1 xi <.
Since ui and vi are greater than 0 on complementary sets, we can re-write (6) and (7)
simply as
yi - Hb0 + b1 xi L + ui - vi = 0, ui ¥ 0, vi ¥ 0, i œ @1, nD. (8)
Quantile regression through linear programming.nb 3
min ‚ q ui + ‚ H1 - qL vi . (9)
ui ,vi ,b0 ,b1
iœ8i:yi ¥b0 +b1 xi < iœ8i:yi <b0 +b1 xi <
The equations (8) and formula (10) are the linear programming formulation of the quantile
regression problem (5).
Note that ui vi = 0, " i œ @1, nD.
The quantile regression formulations (5), and (8) and (10) can be done for any model of Yi
that is a linear combination of functions over X not just for the linear model b0 + b1 X .
4 Quantile regression through linear programming.nb
Examples of usage
Package load
Load the package [1]:
In[1]:= Get@"~êMathFilesêMathematicaForPredictionêQuantileRegression.m"D
13
12
11
Out[3]=
10
8
x
50 100 150 200
y = b0 + b1 x + b2 x + b3 logHxL. (11)
Let us put the model functions for the regression fit in the variable funcs:
Quantile regression through linear programming.nb 5
We also apply Fit to the data and the model functions in order to compare the regression
quantiles with the least-squares regression fit:
In[8]:= fFunc = Fit@data, funcs, xD
Out[8]= 5.53539 + 0.349617 x - 0.0111451 x + 0.53445 Log@xD
Here is a plot that combines the found regression quantiles and least squares fit:
13
12 ·ϱH0.05,xL
·ϱH0.25,xL
11
·ϱH0.5,xL
Out[10]=
10 ·ϱH0.75,xL
·ϱH0.95,xL
9
Least squares
8
Let us check how good the regression quantiles are for separating the data according to
the quantiles they were computed for:
6 Quantile regression through linear programming.nb
quantile fraction
above
0.05 0.950833
0.25 0.749167
0.5 0.499167
0.75 0.249167
0.95 0.0491667
Robustness
Let us demonstrate the robustness of the regression quantiles with the data of the previous
example. Suppose that for some reason all the data y-values greater than 11.25 are
altered by multiplying them with a some greater than 1 factor, say, a = 1.2 . Then the
altered data looks like this:
In[14]:= a = 1.2;
dataAlt = Map@If@ÒP2T > 11.25, 8ÒP1T, a ÒP2T<, ÒD &, dataD;
ListPlot@dataAlt, AxesLabel Ø 8"x", "y"<,
PlotRange Ø All, ImageSize Ø 400D
y
16
14
Out[16]=
12
10
8
x
50 100 150 200
and let us also compute the least squares fit of the model (11):
In[19]:= fFuncAlt = Fit@dataAlt, funcs, xD
Out[19]= 6.36794 + 0.529118 x - 0.00904845 x - 0.0328413 Log@xD
Here is a plot that combines the functions found over the altered data:
16
14 ·ϱH0.05,xL
·ϱH0.25,xL
12
·ϱH0.5,xL
Out[21]=
·ϱH0.75,xL
·ϱH0.95,xL
10
Least squares
We can see that the new regression quantiles computed for 0.05, 0.25, and 0.5 have not
changed significantly:
In[23]:= qrFuncsP1 ;; 3T
Out[23]= 95.03476 + 0.00913189 x + 2.15707 µ 10-15 x + 0.98281 Log@xD,
4.98702 + 4.08128 µ 10-11 x + 1.69792 µ 10-13 x + 1.0654 Log@xD,
5.07619 + 2.84134 µ 10-12 x + 2.09895 µ 10-11 x + 1.11739 Log@xD=
In[24]:= qrFuncsAltP1 ;; 3T
Out[24]= 95.03476 + 0.00913189 x + 3.87407 µ 10-15 x + 0.98281 Log@xD,
4.98702 + 8.04435 µ 10-10 x + 5.84834 µ 10-12 x + 1.0654 Log@xD,
5.07619 + 2.49087 µ 10-13 x + 1.25704 µ 10-12 x + 1.11739 Log@xD=
ant that they are still good for separating the un-altered data:
8 Quantile regression through linear programming.nb
quantile fraction
above
0.05 0.950833
0.25 0.750833
0.5 0.499167
0.75 0.204167
0.95 0.0116667
Also we can see that the least squares fit of (11) has significantly changed:
In[26]:= fFunc
Out[26]= 5.53539 + 0.349617 x - 0.0111451 x + 0.53445 Log@xD
In[27]:= fFuncAlt
Out[27]= 6.36794 + 0.529118 x - 0.00904845 x - 0.0328413 Log@xD
Out[95]=
4
In[53]:= p ê 15 êê N
Out[53]= 0.20944
Next we need to find a guess for the phase. Again we use the second solution provided by
Solve:
In[54]:= Solve@8f > 0, Sin@f + 50 p ê 15D ã 1<, fD
Out[54]= ::f Ø ConditionalExpressionB
1
H-5 p - 12 p C@1DL, C@1D œ Integers && C@1D § -1F>,
6
1
:f Ø ConditionalExpressionB H7 p - 12 p C@1DL,
6
C@1D œ Integers && C@1D § 0F>>
1
In[55]:= 7 p êê N
6
Out[55]= 3.66519
Alternatively, we can simply use Manipulate and plot the data together with a model
function subject to different parameters change.
10 Quantile regression through linear programming.nb
In[96]:= Manipulate@
DynamicModule@8gr1, gr2<,
gr1 = ListPlot@dataSN, PlotRange Ø AllD;
gr2 = Plot@a Sin@f + b xD, 8x, 0, 140<, PlotStyle Ø Darker@RedDD;
Show@8gr1, gr2<D
D, 88a, 1<, 0.5, 10, 1<, 8b, 0, 2, 0.01<, 8f, 0, 30, 0.25<D
0.21
3.75
10
Out[96]=
8
-2
From the calculations we did so far we assume that the model for the data is
y = b0 + b1 x + b2 Sin@3.7 + x p ê 15D
Let us put the model functions for the regression fit in the variable funcs:
In[57]:= funcs = 81, x, Sin@3.7 + x p ê 15D<;
We find the regression quantiles:
Quantile regression through linear programming.nb 11
As in the previous example we also apply Fit to the data and the model functions in order
to compare the regression quantiles with the least-squares regression fit:
In[99]:= fFunc = Fit@dataSN, funcs, xD
px
Out[99]= 2.94185 + 0.0167552 x + 1.01842 SinB3.7 + F
15
Here is a plot that combines the functions found:
10
8
·ϱH0.05,xL
6
·ϱH0.25,xL
·ϱH0.5,xL
Out[106]=
4 ·ϱH0.75,xL
·ϱH0.95,xL
2
Least squares
quantile fraction
above
0.05 0.95
0.25 0.7505
0.5 0.5005
0.75 0.2495
0.95 0.0505
Quantile regression through linear programming.nb 13
Profiling
It is interesting the see timing profile of the computations with QuantileRegressionFit
across two axes: (i) data size and (ii) number of functions to be fit.
First we need to choose a family or several families of test data. Also, since Mathematica’s
function LinearProgramming has several methods it is a good idea to test with all of
them. Here I am going to show results only with one family of data and two
LinearProgramming methods. The data family is the skewed noise over a logarithmic
curve used as an example above. The first LinearProgramming method is Mathemati-
ca’s (default) “InteriorPoint”, the second method is “CLP” that uses the built-in COIN-OR
CLP optimizer. I run the profiling tests using one quantile 80.5< and five quantiles
80.05, 0.25, 0.5, 0.75, 0.95<, which are shown in blue and red respectively. I also run tests
with different number of model functions 91, x, x , Log@xD= and 81, x, Log@xD< but there
was no significant difference in the timings (less than 2%).
Clear@SinWithParabolaTrendD
SinWithParabolaTrend@
nPoints_Integer, start_?NumberQ, end_?NumberQD :=
BlockB8data<,
data = TableB
timingsLogarithmicCurveWithNoiseF4Q1 = Map@
8Length@ÒD, AbsoluteTiming@QuantileRegressionFit@Ò, modelFuncs,
x, 80.5<, Method Ø LinearProgrammingD;DP1T< &, dataSetsD
88500, 0.098310<, 81000, 0.298962<,
81500, 0.588580<, 82000, 0.987977<, 82500, 1.412103<,
83000, 1.961045<, 83500, 2.658844<, 84000, 3.302903<,
84500, 3.824936<, 85000, 4.812682<, 85500, 5.670178<,
86000, 7.059653<, 86500, 8.519677<, 87000, 9.950919<,
87500, 10.886583<, 88000, 12.807758<, 88500, 14.694095<,
89000, 16.205932<, 89500, 17.815996<, 810 000, 21.027593<<
Quantile regression through linear programming.nb 15
timingsLogarithmicCurveWithNoiseF4Q5 = Map@
8Length@ÒD, AbsoluteTiming@QuantileRegressionFit@Ò, modelFuncs,
x, qs, Method Ø LinearProgrammingD;DP1T< &, dataSetsD
88500, 0.549529<, 81000, 1.494088<,
81500, 2.591068<, 82000, 3.493207<, 82500, 5.204162<,
83000, 6.551883<, 83500, 7.290538<, 84000, 7.828214<,
84500, 11.299679<, 85000, 12.629508<, 85500, 13.832353<,
86000, 18.640458<, 86500, 19.818611<, 87000, 21.795818<,
87500, 23.999628<, 88000, 27.048659<, 88500, 30.410464<,
89000, 33.484113<, 89500, 38.420541<, 810 000, 38.569162<<
ListLogPlot@8timingsLogarithmicCurveWithNoiseF4Q1,
timingsLogarithmicCurveWithNoiseF4Q5<,
PlotStyle Ø 8PointSize@0.012D<,
PlotLegends Ø SwatchLegend@8Darker@BlueD, Darker@RedD<,
8"one quantile", "five quantiles"<D,
AxesLabel Ø Map@Style@Ò, LargerD &, 8"data size", "time,s"<D,
PlotLabel Ø Style@
"QuantileRegressionFit@__,Method->LinearProgrammingD\ntimings
per data size with four model functions\nfor
skewed noise over a logarithmic curve",
LargerD, PlotRange Ø All, ImageSize Ø 600D
QuantileRegressionFit@__,Method->LinearProgrammingD
timings per data size with four model functions
for skewed noise over a logarithmic curve
time,s
10.0
1.0
0.5
data size
2000 4000 6000 8000 10 000
Mean@timingsLogarithmicCurveWithNoiseF4Q1PAll, 2T ê
timingsLogarithmicCurveWithNoiseF4Q5PAll, 2TD
0.380121
0.50
0.45
0.40
0.35
0.30
0.25
0.20
timingsLogarithmicCurveWithNoiseCLPF4Q5 =
Map@8Length@ÒD, AbsoluteTiming@
QuantileRegressionFit@Ò, modelFuncs, x, qs, Method Ø
8LinearProgramming, Method Ø "CLP"<D;DP1T< &, dataSetsD
88500, 0.120797<, 81000, 0.411864<,
81500, 0.913235<, 82000, 1.595715<, 82500, 2.369918<,
83000, 3.433828<, 83500, 4.666453<, 84000, 6.038684<,
84500, 7.456060<, 85000, 9.125450<, 85500, 10.848903<,
86000, 13.103446<, 86500, 15.463412<, 87000, 17.807215<,
87500, 20.780654<, 88000, 23.596392<, 88500, 26.951688<,
89000, 31.034202<, 89500, 34.052749<, 810 000, 38.549676<<
ListLogPlot@8timingsLogarithmicCurveWithNoiseCLPF4Q1,
timingsLogarithmicCurveWithNoiseCLPF4Q5<,
PlotStyle Ø 8PointSize@0.012D<,
PlotLegends Ø SwatchLegend@8Darker@BlueD, Darker@RedD<,
8"one quantile", "five quantiles"<D,
AxesLabel Ø Map@Style@Ò, LargerD &, 8"data size", "time,s"<D,
PlotLabel Ø Style@
"QuantileRegressionFit@__,Method->8LinearProgramming,Method->\"
CLP\"<D\ntimings per data size with four model
functions\nfor skewed noise over a logarithmic curve",
LargerD, PlotRange Ø All, ImageSize Ø 600D
QuantileRegressionFit@__,Method->8LinearProgramming,Method->"CLP"<D
timings per data size with four model functions
for skewed noise over a logarithmic curve
time,s
10.0
5.0
one quantile
five quantiles
1.0
0.5
0.1
data size
2000 4000 6000 8000 10 000
Mean@timingsLogarithmicCurveWithNoiseCLPF4Q1PAll, 2T ê
timingsLogarithmicCurveWithNoiseCLPF4Q5PAll, 2TD
0.50362
0.54
0.52
0.50
0.48
Notes
It is interesting to note that the average ratio of the timings with 1 vs. 5 quantiles is 0.38 for
"InteriorPoint" and 0.5 for "CLP".
Out[71]=
20 40 60 80 100
-2
In[72]:= AbsoluteTiming@
qrFuncs = QuantileRegressionFit@dataSet, modelFuncs, x, qs, Method Ø
8LinearProgramming, Method Ø "CLP", Tolerance Ø 10^-14.0<D
D
px px
Out[72]= :0.852718, :-2.6634 + 2.62856 SinB F + 0.998518 SinB F,
200 10
px px
2.71231 - 2.98137 SinB F + 0.898953 SinB F,
200 10
px px
-2.6634 + 5.42374 SinB F + 0.597093 SinB F,
200 10
px px
-2.6634 + 7.41221 SinB F - 0.0484257 SinB F,
200 10
px px
-2.6634 + 21.3234 SinB F - 1.80728 SinB F>>
200 10
Quantile regression through linear programming.nb 21
10
Out[74]= 5
20 40 60 80 100
References
[1] Anton Antonov, Quantile regression Mathematica package, source code at GitHub,
https://github.com/antononcube/MathematicaForPrediction, package QuantileRegres-
sion.m, (2013).
[2] Roger Koenker, Gilbert Bassett Jr., “Regression Quantiles”, Econometrica, 46(1), 1978,
pp. 33-50.
JSTOR URL: http://links.jstor.org/sici?sici=0012-9682%28197801 %2946 %3 A1 %3 C33
%3 ARQ %3 E2 .0.CO%3 B2-J .
[3] Wikipedia, Quantile regression, http://en.wikipedia.org/wiki/Quantile_regression .
[4] Brian Cade, Barry Noon, “A gentle introduction to quantile regression for ecologists”,
Front. Ecol. Environ. 1(8), 2003, pp. 412–420.