An O(n^2) Algorithm for Isotonic Regression

Oleg Burdakov, Oleg Sysoev, Anders Grimvall and Mohamed Hussian

Department of Mathematics, Linköping University, SE-58183 Linköping, Sweden
({olbur,olsys,angri,mohus}@mai.liu.se)

Summary. We consider the problem of minimizing the distance from a given n-dimensional vector to a set defined by constraints of the form x_i ≤ x_j. Such constraints induce a partial order of the components x_i, which can be illustrated by an acyclic directed graph. This problem is also known as the isotonic regression (IR) problem. IR has important applications in statistics, operations research and signal processing, most of which are characterized by a very large value of n. For such large-scale problems, it is of great practical importance to develop algorithms whose complexity does not grow too rapidly with n. The existing optimization-based algorithms and statistical IR algorithms either have too high a computational complexity or produce approximations of too low an accuracy. We introduce a new IR algorithm, which can be viewed as a generalization of the Pool-Adjacent-Violators (PAV) algorithm from completely to partially ordered data. Our algorithm combines a low computational complexity, O(n^2), with high accuracy. This allows us to obtain sufficiently accurate solutions to IR problems with thousands of observations.

Key words: Quadratic programming, large scale optimization, least distance problem, isotonic regression, pool-adjacent-violators algorithm.

1 Introduction

We consider the isotonic regression (IR) problem in the following least distance setting. Given a vector a ∈ R^n, a strictly positive vector of weights w ∈ R^n and a directed acyclic graph G(N, E) with the set of nodes N = {1, 2, ..., n}, find x* ∈ R^n that solves the problem

\[
\min \; \sum_{i=1}^{n} w_i (x_i - a_i)^2 \quad \text{s.t.} \quad x_i \le x_j \quad \forall (i, j) \in E. \tag{1}
\]

Since this is a strictly convex quadratic programming problem, its solution x* is unique. The optimality conditions and an analysis of the typical block (or cluster) structure of x* can be found in [BC90, Lee83, PX99]. The monotonicity constraints defined by the acyclic graph G(N, E) imply a partial order of the components x_i, i = 1, ..., n. A special case of the IR problem arises when there is a complete order of the components. This problem, referred to as the IRC problem, is defined by a directed path G(N, E) and is formulated as follows:

\[
\min \; \sum_{i=1}^{n} w_i (x_i - a_i)^2 \quad \text{s.t.} \quad x_1 \le x_2 \le \cdots \le x_n. \tag{2}
\]

The IR problem has numerous important applications, for instance in statistics [BBBB72, DR82, Lee83], operations research [MM85, R86] and signal processing [AB98, RB93].
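As a point of reference only, and not the method proposed in this paper, a small instance of problem (1) can be handed to a general-purpose constrained solver. The sketch below uses SciPy's SLSQP method on a four-node toy example with unit weights; the data, the zero-based indices and the variable names are this sketch's own.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of problem (1): edges mean x[i] <= x[j]; indices are 0-based here.
a = np.array([1.0, 3.0, 2.0, 2.5])
w = np.ones(4)
E = [(0, 1), (0, 2), (1, 3), (2, 3)]

constraints = [{"type": "ineq", "fun": lambda x, i=i, j=j: x[j] - x[i]}
               for (i, j) in E]
res = minimize(lambda x: np.sum(w * (x - a) ** 2), x0=a,
               method="SLSQP", constraints=constraints)
print(np.round(res.x, 3))   # approximately [1.  2.75  2.  2.75]
```

Such a generic approach is only practical for very small n, which is precisely the motivation for the specialized algorithms discussed next.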
These applications are typically characterized by a very large value of n. For such large-scale problems, it is of great practical importance to develop algorithms whose complexity does not grow too rapidly with n. The existing optimization-based algorithms [DMT01] and statistical IR algorithms either have a high computational complexity or produce approximations of low accuracy. In this connection, let us quote a recent paper by Strand [Str03]:

    Unfortunately, in the case where m > 1 and at least some of the explanatory variables are continuous (i.e. typical multiple regression data), there is no practical algorithm to determine LSIR ... estimates

where m stands for the number of explanatory variables. The case m > 1 in least squares isotonic regression (LSIR) corresponds to problem (1), while the case m = 1 corresponds to (2).

The most widely used method for solving the IRC problem (2) is the so-called pool-adjacent-violators (PAV) algorithm [ABERS55, BBBB72, HPW73]. This algorithm is of computational complexity O(n) (see [BC90, PX99]); a short reference sketch of PAV is given at the end of this section. The PAV algorithm has been extended by Pardalos and Xue [PX99] to the special case of the IR problem (1) in which the partial order of the components is represented by a directed tree. Several other special cases, in which the PAV algorithm is applied repeatedly to different subsets of the components, are considered in [BDPR84, DR82, SS97, Str03]. In [BC90], it was shown that some algorithms for problems (1) and (2) may be viewed as active set quadratic programming methods.

The minimum lower set algorithm [Br55] is known to be the first algorithm proposed for the general IR problem. It was shown in [BC90] that this algorithm, when applied to the IRC problem, is of complexity O(n^2). Perhaps the most widely used approaches for solving applied IR problems are based on simple averaging techniques [M88, MS94, Str03]. They are easy to implement and enjoy a relatively low computational burden, but the quality of their approximations to x* is very case-dependent and can be far from optimal. The best known computational complexity for the general case of the IR problem is O(n^4); it is attained by an algorithm introduced in [MM85, R86], which provides the exact solution to problem (1) by solving O(n) minimal flow problems.

The aim of this paper is to present our generalization of the PAV algorithm to the case of the IR problem (1). It is referred to as the GPAV algorithm. Our numerical experiments show that it enjoys both a low computational burden and high accuracy. The computational complexity of our algorithm is O(n^2). It must be emphasized here that, for graphs of certain special structures, the number of non-redundant monotonicity constraints in (1) itself grows in proportion to n^2. In the case of a partial order represented by a directed tree, the GPAV algorithm is closely related to the algorithm of Pardalos and Xue [PX99]. These two algorithms provide the exact optimal solution, and they are of complexity O(n log n).

The article is organized as follows. Section 2 describes the GPAV algorithm. Its O(n^2) complexity is discussed in Section 3. Section 4 presents the results of our numerical experiments, which indicate that the GPAV algorithm provides much higher accuracy and has a lower computational burden than the simple averaging algorithm. The numerical results complement those we reported in [BGH04, HGBS05] for some other test and applied problems.
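For the completely ordered problem (2), the PAV algorithm can be stated in a few lines. The sketch below is a generic weighted implementation given for reference; it is not the code used in the experiments of this paper.

```python
# A minimal sketch of the classical weighted PAV algorithm for the completely
# ordered problem (2); a generic reference implementation, not the paper's code.
def pav(a, w):
    """Return x minimizing sum w_i (x_i - a_i)^2 subject to x_1 <= ... <= x_n."""
    # Each block is stored as [pooled value, total weight, number of members].
    blocks = []
    for ai, wi in zip(a, w):
        blocks.append([float(ai), float(wi), 1])
        # Pool adjacent blocks while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, c1 + c2])
    # Expand the pooled block values back to the n components.
    x = []
    for v, _, c in blocks:
        x.extend([v] * c)
    return x

print(pav([1.0, 3.0, 2.0, 4.0, 3.5], [1.0] * 5))
# -> [1.0, 2.5, 2.5, 3.75, 3.75]
```

Each observation enters and leaves the stack of blocks at most once, which is where the O(n) bound quoted above comes from.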
2 Generalization of the PAV algorithm

We say that the component x_i is adjacent to the component x_j if (i, j) ∈ E. According to [BC90, Lee83, PX99], the optimal solution to the IR problem (1) is characterized by groups (blocks) of adjacent components x_i^* that share one and the same value, equal to the weighted average of the a-values over the block they belong to.

The GPAV algorithm generates a splitting of the set of nodes N into disjoint blocks of nodes. Since the values of the components within one block are the same, each block is represented in the subsequent computations by one element called the head element. The block with head element i ∈ N is denoted by B_i. The set of head elements is denoted by H. Obviously, H ⊆ N, and moreover,

• B_i ⊆ N for all i ∈ H,
• ∪_{i∈H} B_i = N,
• B_i ∩ B_j = ∅ for all i, j ∈ H, i ≠ j.

The components of x associated with each block B_i, i ∈ H, have the common value

\[
x_j = \frac{\sum_{k \in B_i} w_k a_k}{W_i}, \quad \forall j \in B_i,
\qquad \text{where} \qquad
W_i = \sum_{k \in B_i} w_k
\]

is the weight of block B_i.

For a node i ∈ N, denote the set of its immediate predecessors {j ∈ N : (j, i) ∈ E} by i^-. The block B_i is said to be adjacent to block B_j, or to be an immediate predecessor of B_j, if there exist k ∈ B_i and l ∈ B_j such that k ∈ l^-. Let B_i^- denote the set of all blocks adjacent to B_i. The GPAV algorithm initially sets B_i = {i} and B_i^- = i^- for all nodes i ∈ N, and subsequently operates with the blocks only.

The GPAV algorithm deals with a reduced acyclic graph of blocks, which is initially identical to G(N, E) and which can subsequently shrink from step to step. This is because one block can, under certain conditions, absorb an adjacent block. This operation reduces the set of blocks and the set of edges connecting them. The operation of absorbing B_i ∈ B_j^- by B_j can be presented as follows.

Algorithm ABSORB(i, j).
1. Set H = H \ {i}.
2. Set B_j^- = (B_i^- ∪ B_j^-) \ {i}.
3. Set x_j = (W_j x_j + W_i x_i) / (W_j + W_i).
4. Set B_j = B_j ∪ B_i and W_j = W_j + W_i.
5. For all k ∈ H such that i ∈ B_k^-, set B_k^- = (B_k^- \ {i}) ∪ {j}.

The GPAV algorithm returns a feasible approximation x̂ to the optimal solution x*. Our numerical experiments show that the accuracy of this approximation is sufficiently high. The GPAV algorithm begins with the set of untreated blocks U = N and then treats the blocks in U one by one. This takes n steps. After each treatment step, the values assigned to the components x_i, i ∈ H \ U, ensure the fulfillment of those original monotonicity constraints that involve only the components composing the blocks B_i, i ∈ H \ U. It is typical in practice that these values are optimal for problem (1) reduced to the variables x_i, i ∈ N \ U; if not optimal, they are reasonably close to the optimal ones.

We call B_j a lower block of U if the set {i ∈ U : i ∈ B_j^-} is empty, i.e., if U contains no blocks adjacent to B_j ∈ U. The block to be treated, say B_j, is chosen by the GPAV algorithm among the lower elements of the partially ordered set U. Then B_j can be enlarged by absorbing, one by one, some of its adjacent blocks in H \ U. If the common value of the components of x in the new block violates any of the monotonicity constraints involving its adjacent blocks, the new block absorbs the block corresponding to the most violated constraint. This operation is repeated until all the constraints involving the new block and its adjacent blocks are satisfied. A small sketch illustrating the absorb operation on explicit data structures is given below; our generalization of the PAV algorithm is then stated as Algorithm GPAV.
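The following sketch shows one ABSORB(i, j) update on explicitly stored block data. The container names (blocks, weight, value, pred, heads) are this sketch's own, standing in for B_i, W_i, the common x-value, B_i^- and H; it is an illustration of the listing above, not the authors' implementation.

```python
# Illustrative bookkeeping for one ABSORB(i, j) step; the dictionaries stand in
# for B_i (blocks), W_i (weight), the common x-value (value), B_i^- (pred) and
# H (heads). The names are this sketch's own, not the paper's notation.
def absorb(i, j, blocks, weight, value, pred, heads):
    heads.discard(i)                                    # step 1: H = H \ {i}
    pred[j] = (pred[i] | pred[j]) - {i}                 # step 2
    value[j] = (weight[j] * value[j]
                + weight[i] * value[i]) / (weight[j] + weight[i])   # step 3
    blocks[j] += blocks[i]                              # step 4
    weight[j] += weight[i]
    for k in heads:                                     # step 5: redirect references to i
        if i in pred[k]:
            pred[k] = (pred[k] - {i}) | {j}

# Two singleton blocks with an edge 0 -> 1 and a violation x_0 = 3 > x_1 = 2:
blocks = {0: [0], 1: [1]}
weight = {0: 1.0, 1: 1.0}
value = {0: 3.0, 1: 2.0}
pred = {0: set(), 1: {0}}
heads = {0, 1}
absorb(0, 1, blocks, weight, value, pred, heads)
print(blocks[1], weight[1], value[1])   # [1, 0] 2.0 2.5
```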
Algorithm GPAV.
1. Set H = N and U = N.
2. For all i ∈ N, set B_i = {i}, B_i^- = i^-, x_i = a_i and W_i = w_i.
3. While U ≠ ∅, do:
4.   Find any of the lower elements of U, say j, and set U = U \ {j}.
5.   While there exists k ∈ B_j^- such that x_k ≥ x_j, do:
6.     Find i ∈ B_j^- such that x_i = max{x_k : k ∈ B_j^-}.
7.     ABSORB(i, j).
8. For all k ∈ H and i ∈ B_k, set x̂_i = x_k.

This algorithm can be viewed as a generalization of the PAV algorithm, because the two coincide on the IRC problem (2). Moreover, the GPAV algorithm is closely related to the algorithm of Pardalos and Xue [PX99] when the graph G(N, E) in the IR problem (1) is a directed tree.

3 Complexity

A detailed proof of the fact that the GPAV algorithm is of computational complexity O(n^2) will be presented in a separate paper. Here we only outline the basic properties of our algorithm that will be used in the proof.

Obviously, Steps 1, 2 and 8 of Algorithm GPAV take O(n) time. Step 4 is repeated n times, and each time the cost of finding a lower element of U does not exceed O(|U|). Thus, all the operations related to Step 4 are of complexity O(n^2). Algorithm ABSORB can be implemented in O(n) time, and the number of times it is called at Step 7 of Algorithm GPAV does not exceed n, the total number of nodes. The while-loop formed by Steps 5-7 cannot be repeated more than 2n times, because each evaluation of the while-condition either terminates the treatment of the new block (at most n times in total) or allows the new block to absorb one of its adjacent blocks (also at most n times in total). At Steps 5 and 6, each operation of finding the maximal value among the head components of the blocks adjacent to the new one takes O(n) time. Therefore, the total number of operations associated with the while-loop can be estimated as O(n^2).

The reasoning above explains why the GPAV algorithm is of computational complexity O(n^2). Notice that this complexity is of the same order as the maximal possible number of non-redundant constraints in (1), which is also estimated as O(n^2).
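To make the block bookkeeping and the complexity discussion concrete, here is a minimal self-contained sketch of the GPAV procedure. It assumes the nodes are already numbered in a topological order (as in the experiments of Section 4), so the blocks can simply be treated in the order 0, 1, ..., n-1; the absorb update is repeated inline so that the block runs on its own, the container names are this sketch's own, and it is not the authors' MATLAB implementation.

```python
# Minimal self-contained sketch of the GPAV procedure (illustrative names,
# not the authors' MATLAB code). Nodes are assumed to be numbered so that
# (i, j) in edges implies i < j, i.e. they are already topologically sorted;
# block j is then always a lower block of U when it is treated.
def gpav(a, w, edges):
    n = len(a)
    blocks = {i: [i] for i in range(n)}             # B_i: members of block i
    weight = {i: float(w[i]) for i in range(n)}     # W_i
    value = {i: float(a[i]) for i in range(n)}      # common x-value of block i
    pred = {i: set() for i in range(n)}             # B_i^-: adjacent predecessor blocks
    for i, j in edges:
        pred[j].add(i)
    heads = set(range(n))                           # H

    for j in range(n):                              # treat the blocks in natural order
        while True:
            violating = [k for k in pred[j] if value[k] >= value[j]]
            if not violating:
                break
            i = max(violating, key=lambda k: value[k])   # most violated constraint
            # ABSORB(i, j), inlined:
            heads.discard(i)
            pred[j] = (pred[i] | pred[j]) - {i, j}       # drop i and any self-reference
            value[j] = (weight[j] * value[j]
                        + weight[i] * value[i]) / (weight[j] + weight[i])
            blocks[j] += blocks[i]
            weight[j] += weight[i]
            for k in heads:
                if i in pred[k]:
                    pred[k] = (pred[k] - {i}) | {j}
    x = [0.0] * n
    for h in heads:                                 # expand block values to components
        for i in blocks[h]:
            x[i] = value[h]
    return x

# Four-node example: edges mean x[i] <= x[j]; the exact solution of (1) is
# [1, 2.75, 2, 2.75], and on this instance GPAV recovers it.
print(gpav([1.0, 3.0, 2.0, 2.5], [1.0] * 4, [(0, 1), (0, 2), (1, 3), (2, 3)]))
```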
4 Numerical results

We used MATLAB to implement the GPAV algorithm, the simple averaging algorithm described in [M88], and a modification of the GPAV algorithm. They are referred to as GPAV, SA and GPAV+, respectively.

In the modification mentioned above, we run GPAV twice. First, we run it on the original problem (1), which returns an approximation x̂^I to the optimal solution. Second, we run it on the problem

\[
\min \; \sum_{i=1}^{n} w_i (\bar{x}_i - \bar{a}_i)^2 \quad \text{s.t.} \quad \bar{x}_i \le \bar{x}_j \quad \forall (i, j) \in \bar{E},
\]

which is equivalent to (1) for \bar{a}_i = -a_i, \bar{x}_i = -x_i and the same edges in \bar{E} as in E, but with inverted orientation. Let x̂^II denote the resulting approximation to the optimal solution of (1). Denote the convex combination of these two approximations by

\[
x(\lambda) = \lambda \hat{x}^{I} + (1 - \lambda) \hat{x}^{II},
\]

where the scalar λ ∈ [0, 1]. Since the feasible region of (1) is a convex set, x(λ) does not violate the monotonicity constraints. GPAV+ then returns the approximation to the optimal solution of problem (1) that corresponds to the optimal solution of the simple one-dimensional problem

\[
\min_{0 \le \lambda \le 1} \; \sum_{i=1}^{n} w_i \bigl( x_i(\lambda) - a_i \bigr)^2 .
\]

In our test IR problems we set w_i = 1, i = 1, ..., n. The test problems were based on the model a = U_1 + U_2 + ε, where U_1, U_2 are two explanatory variables (m = 2) and ε is an error term. Samples of n = 80, n = 400 and n = 2000 observations

\[
U^i = (U_1^i, U_2^i)^T, \qquad a_i = U_1^i + U_2^i + \epsilon_i, \qquad i = 1, \ldots, n,
\]

were generated for normally distributed explanatory variables and for normally, exponentially or double-exponentially distributed errors. Thereafter, for each i, j ∈ N such that U^i ≤ U^j component-wise, we generated the monotonicity constraint x_i ≤ x_j. In this way we constructed the set of edges

\[
E = \{(i, j) : U^i \le U^j, \ i, j \in N\}, \tag{3}
\]

which, along with a_1, ..., a_n, defines problem (1). We then reduced E by eliminating all the redundant constraints, i.e., we eliminated the edge (i, j) if there existed k ∈ N such that (i, k) ∈ E and (k, j) ∈ E. We also performed a topological sort [CLRS01] of our directed acyclic graph to ensure that (i, j) ∈ E implies i < j. The topological sort allowed us to skip the search for a lower element of U at Step 4 of Algorithm GPAV; instead, the blocks were treated in the natural order B_1, B_2, ..., B_n. In future numerical experiments, we plan to generate the reduced set of edges directly from the data in an efficient way, without generating the complete set of edges (3); this will simultaneously produce a topological sort of our graph.

The numerical results presented here were obtained on a PC running Windows XP with a Pentium 4 processor (2.8 GHz). We compare the objective function values of problem (1) obtained by GPAV, GPAV+ and SA with the optimal objective function value obtained by the MATLAB solver lsqlin. Tables 1, 2 and 3 summarize the obtained performance data, where we use the notation

  N/A    = lsqlin failed to solve the problem within the given time limits,
  sum    = objective function value (sum of squares),
  cpu    = CPU time (seconds) until termination,
  r.e.%  = relative error (%), calculated as

\[
\frac{\mathrm{sum}_A - \mathrm{sum}_{\mathrm{lsqlin}}}{\mathrm{sum}_{\mathrm{lsqlin}}} \cdot 100\%,
\]

where A stands for GPAV, GPAV+ or SA.

Table 1. Normally distributed errors.

                  n = 80                     n = 400                     n = 2000
            sum     cpu   r.e.%        sum      cpu    r.e.%         sum       cpu
  GPAV    41.966   0.016    4.14     279.819    0.172     1.16    1670.699     6.532
  GPAV+   41.315   0.048    2.52     279.496    0.562     1.05    1657.814    19.579
  SA      52.722   0.047   30.83     356.515    0.875    28.89    2132.418    18.875
  lsqlin  40.297   0.657    0.00     276.589  263.969     0.00         N/A       N/A

Table 2. Exponentially distributed errors.

                  n = 80                     n = 400                     n = 2000
            sum     cpu   r.e.%        sum      cpu    r.e.%         sum       cpu
  GPAV    19.475   0.032    2.52     161.177    0.171     0.74    1445.026     6.375
  GPAV+   19.365   0.062    1.94     160.631    0.655     0.40    1439.493    19.359
  SA      23.601   0.047   24.24     222.578    0.891    39.12    1726.989    18.750
  lsqlin  18.995   0.390    0.00     159.983  233.766     0.00         N/A       N/A

Table 3. Double-exponentially distributed errors.

                  n = 80                     n = 400                     n = 2000
            sum     cpu   r.e.%        sum      cpu    r.e.%         sum       cpu
  GPAV    51.190   0.015    0.93     604.085    0.266     1.56    3344.406     6.266
  GPAV+   51.105   0.047    0.76     598.177    0.702     0.56    3317.172    18.219
  SA      70.043   0.031   38.10    1406.101    0.844   136.40    7488.342    18.250
  lsqlin  50.716   0.484    0.00     594.787  365.000     0.00         N/A       N/A

The three tables show that GPAV required less CPU time than SA while producing a much more accurate approximation to the optimal solution. Moreover, GPAV+ always improved the relative error produced by GPAV, at the cost of about three times more CPU time. A linear combination of the approximations generated by GPAV+ and SA did not result in any appreciable improvement of the accuracy. All these observations are in good agreement with the numerical results reported in [BGH04, HGBS05] for some other test and applied problems.
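The test-problem construction described above (edge set (3), elimination of edges implied by a two-edge path, and relabelling of the nodes in a topological order) can be sketched as follows for small n. The random error is taken to be normal here, the function name and the seed are this sketch's own, and the quadratic-time edge construction with cubic-time reduction is only meant for small instances, not for the efficient direct construction planned above.

```python
import random

# Rough sketch of the test-problem generation of Section 4 (normal errors only):
# two standard-normal explanatory variables, the edge set (3), removal of the
# constraints implied by a two-edge path, and node labels in a topological order.
def make_problem(n, seed=0):
    rng = random.Random(seed)
    U = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    a = [u1 + u2 + rng.gauss(0, 1) for (u1, u2) in U]

    # Sorting by U1 + U2 gives a topological order of the componentwise
    # dominance relation (ties have probability zero for continuous data).
    order = sorted(range(n), key=lambda i: U[i][0] + U[i][1])
    U = [U[i] for i in order]
    a = [a[i] for i in order]

    # Edge set (3): all componentwise dominations U^i <= U^j.
    E = {(i, j) for i in range(n) for j in range(n)
         if i != j and U[i][0] <= U[j][0] and U[i][1] <= U[j][1]}

    # Drop redundant edges: (i, j) goes if some k gives (i, k) and (k, j) in E.
    reduced = [(i, j) for (i, j) in E
               if not any((i, k) in E and (k, j) in E for k in range(n))]
    return a, [1.0] * n, sorted(reduced)

a, w, edges = make_problem(80)
print(len(edges), "non-redundant monotonicity constraints for n = 80")
```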
5 Conclusions

The presented generalization of the PAV algorithm allows us to obtain sufficiently accurate solutions to isotonic regression problems with thousands of observations. The lack of a complete order in the data is no longer as limiting as it was previously.

In our future work, we plan to improve the presented version of the GPAV algorithm in the following way. While the algorithm is running, it is not difficult to check whether the x-components of the new block produced at Steps 5-7 are suspected to be non-optimal. Such a block can then be split into smaller ones in order to bring the corresponding values of the x-components closer, or even equal, to the optimal values.

6 Acknowledgements

The authors are grateful for financial support from the Swedish Research Council.

References

[AB98] Acton, S.A., Bovik, A.C.: Nonlinear image estimation using piecewise and local image models. IEEE Transactions on Image Processing, 7, 979–991 (1998)
[ABERS55] Ayer, M., Brunk, H.D., Ewing, G.M., Reid, W.T., Silverman, E.: An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics, 26, 641–647 (1955)
[BBBB72] Barlow, R.E., Bartholomew, D.J., Bremner, J.M., Brunk, H.D.: Statistical Inference under Order Restrictions. Wiley, New York (1972)
[BC90] Best, M.J., Chakravarti, N.: Active set algorithms for isotonic regression: a unifying framework. Mathematical Programming, 47, 425–439 (1990)
[BDPR84] Bril, G., Dykstra, R., Pillers, C., Robertson, T.: Algorithm AS 206: Isotonic regression in two independent variables. Applied Statistics, 33, 352–357 (1984)
[Br55] Brunk, H.D.: Maximum likelihood estimates of monotone parameters. The Annals of Mathematical Statistics, 26, 607–616 (1955)
[BGH04] Burdakov, O., Grimvall, A., Hussian, M.: A generalised PAV algorithm for monotonic regression in several variables. In: Antoch, J. (ed.) COMPSTAT, Proceedings in Computational Statistics, 16th Symposium Held in Prague, Czech Republic. Physica-Verlag, Heidelberg New York, 761–767 (2004)
[CLRS01] Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms. The MIT Press, Cambridge, MA (2001)
[DMT01] De Simone, V., Marino, M., Toraldo, G.: Isotonic regression problems. In: Floudas, C.A., Pardalos, P.M. (eds.) Encyclopedia of Optimization. Kluwer Academic Publishers, Dordrecht London Boston (2001)
[DR82] Dykstra, R., Robertson, T.: An algorithm for isotonic regression for two or more independent variables. The Annals of Statistics, 10, 708–716 (1982)
[HPW73] Hanson, D.L., Pledger, G., Wright, F.T.: On consistency in monotonic regression. The Annals of Statistics, 1, 401–421 (1973)
[HGBS05] Hussian, M., Grimvall, A., Burdakov, O., Sysoev, O.: Monotonic regression for the detection of temporal trends in environmental quality data. MATCH Commun. Math. Comput. Chem., 54, 535–550 (2005)
[Lee83] Lee, C.I.C.: The min-max algorithm and isotonic regression. The Annals of Statistics, 11, 467–477 (1983)
[MM85] Maxwell, W.L., Muchstadt, J.A.: Establishing consistent and realistic reorder intervals in production-distribution systems. Operations Research, 33, 1316–1341 (1985)
[M88] Mukarjee, H.: Monotone nonparametric regression. The Annals of Statistics, 16, 741–750 (1988)
[MS94] Mukarjee, H., Stern, H.: Feasible nonparametric estimation of multiargument monotone functions. Journal of the American Statistical Association, 425, 77–80 (1994)
[PX99] Pardalos, P.M., Xue, G.: Algorithms for a class of isotonic regression problems. Algorithmica, 23, 211–222 (1999)
[RB93] Restrepo, A., Bovik, A.C.: Locally monotonic regression. IEEE Transactions on Signal Processing, 41, 2796–2810 (1993)
[R86] Roundy, R.: A 98% effective lot-sizing rule for a multiproduct multistage production/inventory system. Mathematics of Operations Research, 11, 699–727 (1986)
[SS97] Schell, M.J., Singh, B.: The reduced monotonic regression method. Journal of the American Statistical Association, 92, 128–135 (1997)
[Str03] Strand, M.: Comparison of methods for monotone nonparametric multiple regression. Communications in Statistics - Simulation and Computation, 32, 165–178 (2003)