
5.2 Regularized Decomposition
for some value of Δ^ν (Exercise 4), which then suggests the general form of a trust-region method (see, e.g., Conn, Gould, and Toint [2000]). The norm as well as the centering point can also be varied in this approach. Linderoth and Wright [2003] use the ∞-norm (maximum component deviation) to obtain a trust-region algorithm for stochastic programs that also allows for significant parallelization and can achieve substantial computational efficiency.
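Since the ∞-norm ball is a box, such a trust region can be imposed purely through variable bounds, which keeps the master problem a linear program. A minimal sketch of this device (the function name and the data are illustrative, not from the text):

```python
# An ∞-norm trust region ||x - a||_inf <= delta is a box, so it can be
# enforced by tightening the variable bounds of the master LP.
def box_bounds(center, delta, lower, upper):
    """Intersect the trust-region box with the original bounds."""
    lo = [max(l, c - delta) for c, l in zip(center, lower)]
    hi = [min(u, c + delta) for c, u in zip(center, upper)]
    return lo, hi

lo, hi = box_bounds(center=[0.0, 2.0], delta=1.5,
                    lower=[-1.0, 0.0], upper=[4.0, 4.0])
print(lo, hi)  # [-1.0, 0.5] [1.5, 3.5]
```

The master stays an LP under this bound tightening, which is what makes the ∞-norm choice attractive for parallel implementations.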
Exercises
1. Check that, with the same starting point, both the L-shaped and the multicut methods require five iterations in Example 1.
2. The regularized decomposition only makes sense with a reasonable starting point. To illustrate this, consider the same example taking as starting point a highly negative value, e.g., a^1 = −20. At Iteration 1, the cuts θ₁ ≥ −(x − 1)/2 and θ₂ ≥ −(3/4)x are created. Observe that, for many subsequent iterations, no new cuts are generated as the sequence of trial points a^ν moves from −20 to −75/4, then −70/4, −65/4, ..., each time by a change of 5/4, until reaching 0, where new cuts will be generated. Thus a long sequence of approximate serious steps is taken.
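The 5/4 step length can be checked directly: with only the two cuts above, the cut model is θ₁ + θ₂ = 1/2 − (5/4)x, so the regularized master (with unit proximal weight) minimizes 1/2 − (5/4)x + (1/2)(x − a^ν)², whose minimizer is x = a^ν + 5/4. A small sketch under that assumption (variable names are illustrative):

```python
from fractions import Fraction

def master_step(a):
    # Model: 1/2 - (5/4)x + (1/2)(x - a)^2; its derivative
    # -5/4 + (x - a) vanishes at x = a + 5/4.
    return a + Fraction(5, 4)

a = Fraction(-20)
trail = [a]
while a < 0:          # new cuts only appear once x reaches 0
    a = master_step(a)
    trail.append(a)

print(trail[1])        # -75/4
print(len(trail) - 1)  # 16 serious steps of length 5/4
```

Exact rational arithmetic makes it easy to see that the sequence −20, −75/4, −70/4, ... takes sixteen steps to reach 0.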
3. As we mentioned in the introduction of this section, the regularized decomposition algorithm works with a more general regularizing term of the form (α/2)‖x − a^ν‖².
(a) Observe that the proof of convergence relies on strict convexity of the objective function (Lemma 5), thus α > 0 is needed. It also relies on ∇(α/2)‖x^ν − a^ν‖² → 0 as x^ν → a^ν, which is simply obtained by taking a finite α. The algorithm can thus be tuned for any positive α, and α can vary within the algorithm.
(b) Taking the same starting point and data as in Exercise 2, show that by selecting different values of α, any point in ]−20, 20] can be obtained as a solution of the regularized master at the second iteration (where 20 is the upper bound on x and the first iteration only consists of adding cuts on θ₁ and θ₂).
(c) Again taking the same starting point and data as in Exercise 2, how would you take α to reduce the number of iterations? Discuss some alternatives.
(d) Let α = 1 for Iterations 1 and 2. As of Iteration 2, consider the following rule for changing α dynamically: for each null step, α is doubled; at each exact step, α is halved. Show why this would improve the performance of the regularized decomposition in the case of Exercise 2. Consider the starting point x^1 = −0.5 as in Example 1 and observe that the same path as before is followed.
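A useful observation for parts (b) and (c): with the same two cuts as in Exercise 2, the regularized master minimizes 1/2 − (5/4)x + (α/2)(x − a^ν)², so its solution is x = a^ν + 5/(4α), capped at the upper bound 20. A hedged sketch of this dependence on α (illustrative names; note, e.g., that α = 1/32 already reaches the bound from a^1 = −20):

```python
from fractions import Fraction

UPPER = 20  # upper bound on x in the example

def regularized_master(a, alpha):
    # Cut model from Exercise 2: theta1 + theta2 = 1/2 - (5/4)x.
    # Adding (alpha/2)(x - a)^2 and setting the derivative to zero
    # gives x = a + 5/(4*alpha); the bound x <= 20 then caps it.
    return min(a + Fraction(5, 4) / alpha, UPPER)

print(regularized_master(Fraction(-20), Fraction(1)))      # -75/4
print(regularized_master(Fraction(-20), Fraction(1, 32)))  # 20
```

Larger α shortens the step toward the cut-model minimizer, while smaller α lengthens it, which is exactly the lever the dynamic rule of part (d) exploits.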
4. Show the equivalence of (2.1) and (2.11).