Birge J.R., Louveaux F. Introduction to Stochastic Programming


434 10 Multistage Approximations

tree. The essential difference between AND and SDDP is that, in AND, instead of considering the full sample-path tree, a set of branching solutions, $B^t$, is used to generate new cuts in the backward pass. The branching solutions are quite flexible under the assumption of serial independence since the cuts generated for any values of $x^t$ are valid. These solutions may correspond to solutions along the previous sample paths, combinations of solutions, or some other set of possible state values. In the backward pass, all child scenarios of each branching solution are solved to ensure that the solutions of each subproblem (6.1.1–1.5) yield a valid lower bound on $\mathcal{Q}^{t+1}(x^t)$ for each $x^t$ in $B^t$.

The backward pass progresses for periods $t = H-1, \ldots, 1$, generating a new optimality cut for each branching solution in $B^t$. Once a new optimality cut has been added to the first-stage subproblem, the backward pass is complete. It is followed by the generation of a new set of sample paths and a forward pass to construct an upper bound estimate.

Finite convergence of this algorithm follows from the finite convergence of the nested decomposition algorithm, since the scenarios from which the optimality cuts are generated are re-sampled each iteration (see Donohue [1996] and the detailed proof in Philpott and Guan [2008]). Since the accuracy of the optimal solution depends on the accuracy of the estimated upper bound, the performance of the algorithm depends on the number of scenarios sampled in each iteration.

The Abridged Nested Decomposition Algorithm

Step 0. For $t = 1, \ldots, H-1$, set $s^t = 0$ and add the constraint $\theta^t = 0$ to the stage $t$ subproblem. Choose initial values for $|F^t|$ (forward branching values) and $|B^t|$ for $t = 2, \ldots, H-1$. Go to Step 1.

Step 1. Solve the first-stage problem. Let $\tilde{x}^1$ be the current optimal solution and $\tilde{\theta}^1$ be the current expected recourse approximation value. Let $\tilde{z}^1$ be the current optimal objective value. Let $\tilde{x}^1$ be the first-stage branching value. Go to Step 2.

Step 2. Forward Pass.
For $t = 2, \ldots, H-1$,
  For $j = 1, \ldots, |B^{t-1}|$,
    For $k = 1, \ldots, |F^t|$,
      Solve the stage $t$ subproblem (6.1.1–1.5) with input value $x_j^{t-1} \in B^{t-1}$ and sample realization $\xi_k^t \in F^t$.
  Select $|B^t|$ branching values $x^t$ from the subproblem solutions.
Go to Step 3.

Step 3. Backward Pass.
For $t = H, \ldots, 2$,
  For $j = 1, \ldots, |B^{t-1}|$,
    For $i = 1, \ldots, M^t$,
      Solve the stage $t$ subproblem (6.1.1–1.5) with input value $x_j^{t-1} \in B^{t-1}$ for scenario $\xi_i^t$. Let $(\pi_i^t, \sigma_{i,m}^t)$ denote the optimal dual vector values.

10.4 Multistage Sampling and Decomposition Methods 435

    Compute
    \[ E^{t-1} = \sum_{i=1}^{M^t} p_i^t (\pi_i^t)^T T_i^{t-1}, \qquad e^{t-1} = \sum_{i=1}^{M^t} p_i^t \Big[ (\pi_i^t)^T h_i^t + \sum_{m=1}^{s^t} \sigma_{i,m}^t e_m^t \Big]. \]
    The new cut is then: $E^{t-1} x^{t-1} + \theta^{t-1} \ge e^{t-1}$.
    If the constraint $\theta^{t-1} = 0$ appears in the stage $t-1$ subproblem, then remove it. Increment $s^{t-1}$ by one and add the new cut to the stage $t-1$ subproblem. If $t = 2$, then the updated first-stage expected recourse function upper bound is $\hat{\theta}^1 = e^1 - E^1 \tilde{x}^1$. If $\tilde{\theta}^1$ is within a relative tolerance of $\hat{\theta}^1$, then go to Step 4. Otherwise, go to Step 1.

Step 4. Sampling Step.
Let $x_k^1 = \tilde{x}^1$, for $k = 1, \ldots, K$.
For $k = 1, \ldots, K$,
  Generate an $H$-stage sample scenario, $(\xi_k^2, \ldots, \xi_k^H)$.
  For $t = 2, \ldots, H$,
    Given the stage $t-1$ solution $x_k^{t-1}$ and realization $\xi_k^t$, solve the stage $t$ subproblem (6.1.1–1.5). Let $x_k^t$ denote the optimal solution.
Using Equations (4.1), (4.2), and (4.3), obtain a confidence interval on the expected objective value of the current first-stage solution. If $c^1 \tilde{x}^1 + \tilde{\theta}^1$ is in the confidence interval, stop with $\tilde{x}^1$ as the optimal solution. Else, increase $|F^t|$ and $|B^t|$ for stages $t = 2, \ldots, H$ and go to Step 1.
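The cut computation in the backward pass can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `aggregate_cut`, the list-based data layout, and the toy numbers are hypothetical, and the formula used is the standard probability-weighted optimality cut of nested decomposition (dual values times technology matrix for the gradient, dual values times right-hand side for the intercept, with no earlier cuts in the child subproblems).

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def row_times_matrix(u: Sequence[float], m: Matrix) -> Vector:
    """Return the row vector u multiplied by the matrix m."""
    return [sum(u[r] * m[r][c] for r in range(len(m))) for c in range(len(m[0]))]


def aggregate_cut(
    probs: Sequence[float],
    duals: Sequence[Vector],
    rhs: Sequence[Vector],
    tech: Sequence[Matrix],
) -> Tuple[Vector, float]:
    """Build one expected optimality cut E x + theta >= e.

    probs[i]  : probability of child scenario i
    duals[i]  : optimal dual vector of child subproblem i
    rhs[i]    : right-hand side h_i of child subproblem i
    tech[i]   : technology matrix T_i linking the parent decision to child i
    (Assumption: the child subproblems carry no earlier cuts, so the
    cut-dual term of the intercept is omitted.)
    """
    n = len(tech[0][0])
    gradient = [0.0] * n
    intercept = 0.0
    for p, pi, h, t_mat in zip(probs, duals, rhs, tech):
        pi_t = row_times_matrix(pi, t_mat)
        gradient = [g + p * v for g, v in zip(gradient, pi_t)]
        intercept += p * sum(a * b for a, b in zip(pi, h))
    return gradient, intercept


def cut_lower_bound(gradient: Vector, intercept: float, x: Vector) -> float:
    """Value e - E x, a lower bound on the expected recourse at x."""
    return intercept - sum(g * v for g, v in zip(gradient, x))


# Toy data: two equally likely child scenarios, one constraint row each.
E, e = aggregate_cut(
    probs=[0.5, 0.5],
    duals=[[1.0], [3.0]],
    rhs=[[4.0], [2.0]],
    tech=[[[1.0, 0.0]], [[1.0, 2.0]]],
)
```

Because the cut is built from dual feasible solutions, it remains valid at any parent solution, which is what lets AND reuse it across branching values.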

To ensure that the algorithm terminates with a valid confidence interval on $z^*$, a procedure such as the sequential sampling method in Section 8.5 should be used. For this algorithm to be effective, the branching values in $B^t$ also must be chosen carefully. As shown in Donohue and Birge [2006], however, any convex combination of feasible values at time $t$ has a feasible completion in period $t+1$. This observation allows for consolidation in the branching step. Various fixed rules can be used for selecting branches, or branching solution values can be chosen randomly. Random selection gives an unbiased sample of stage $t$ solution values, which may have advantages. We note that this general approach can also be extended to problems with infinite horizons (see Exercise 2).
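The stopping test of the sampling step can be illustrated with a short sketch. Equations (4.1)–(4.3) are not reproduced in this section, so the code below assumes the usual sample mean, sample standard deviation, and normal-approximation interval; the function names and the multiplier `z = 1.96` are illustrative assumptions rather than the book's definitions.

```python
import math
import statistics
from typing import Sequence, Tuple


def confidence_interval(path_costs: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation interval around the mean cost of K sampled paths."""
    k = len(path_costs)
    mean = statistics.fmean(path_costs)
    std_error = statistics.stdev(path_costs) / math.sqrt(k)
    return mean - z * std_error, mean + z * std_error


def lower_bound_is_consistent(lower_bound: float, path_costs: Sequence[float], z: float = 1.96) -> bool:
    """True when the current lower bound falls inside the sampled interval."""
    low, high = confidence_interval(path_costs, z)
    return low <= lower_bound <= high


# Hypothetical total costs observed on K = 5 forward sample paths.
costs = [10.0, 12.0, 11.0, 13.0, 9.0]
interval = confidence_interval(costs)
```

When the test fails, the algorithm enlarges the forward and backward branching sets and returns to Step 1, so a loose interval from too few paths tends to cost extra iterations rather than correctness.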

Exercises

1. Generate 50 random samples from the distribution given in Section 10.3 for the three-period financial planning example from Section 1.2. Implement AND on this problem using the following strategies starting with $|B^t| = 3$ and $|F^t| = 6$, increasing each by one whenever required, and terminating whenever $\hat{z} \le c^1 \tilde{x}^1 + \tilde{\theta}^1 + 2\hat{\sigma}(\hat{z})$.

(a) Choose $B^t$ randomly from the set of period $t$ solutions.

(b) Choose $B^t$ initially corresponding to solutions with the maximum, median, and minimum wealth in each period. If $|B^t|$ increases, choose additional branching solutions randomly from the set of solutions.

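2. For an infinite-horizon problem with stationary data (i.e., $\xi^t$ has the same distribution $\xi$ for all $t$), the goal is to find a function $\mathcal{Q}^\infty$ such that $\mathcal{Q}^\infty = T(\mathcal{Q}^\infty)$, where $T$ is the dynamic programming operator defined by
\[ T(\mathcal{Q}^\infty)(\hat{h} - \hat{T}\hat{x}, \hat{c}) = \min_{x \mid Wx = \hat{h} - \hat{T}\hat{x}} \left[ \hat{c}^T x + \delta\, E\, \mathcal{Q}^\infty(h - Tx, c) \right], \tag{4.4} \]
where $0 < \delta < 1$ is a fixed discount factor. Given a linear lower bound $\mathcal{Q}^0(y,z) \le \mathcal{Q}^\infty(y,z)$, for any $y$ and $z$, describe a sampling-based outer-linearization method to find $\mathcal{Q}^\infty$. (Birge and Zhao [2007]).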

10.5 Approximate Dynamic Programming and Special Cases

The approaches discussed in the previous sections have focused on sampling and state or tree aggregation to obtain tractable formulations. Another alternative is to use approximations of the value function $\mathcal{Q}$ constructed in other ways. The outer linearization approach in the AND method is one possible value function approximation. In this section, we discuss other value-function approximations that collectively are often called approximate dynamic programming (ADP) or neuro-dynamic programming (see, e.g., Bertsekas [2007], Bertsekas and Tsitsiklis [1995], and Powell [2007]). As noted earlier, other approximations may include approximations of the actions (or policy) (as, for example, a parameterized function of the state variables), but the discussion here focuses on value-function approximations.

The general approach in ADP is to replace the value function $\mathcal{Q}^{t+1}(x^t)$, or the subproblem (scenario-conditional) value functions, $Q^{t+1}(x^t, \xi^t)$, with an approximation that does not require full optimization of the sub-tree corresponding to given $x^t$. In general, the functions are constructed recursively over time, possibly with some iteration to update the approximations.

A common approach is to construct an approximation $\hat{Q}^{t+1}(x^t, \xi^t)$ as a linear combination of known basis functions $\phi(\cdot,\cdot) = (\phi_1(\cdot,\cdot), \ldots, \phi_L(\cdot,\cdot))$ that are fitted with weights, $r^t$, so that
\[ \hat{Q}^{t+1}(x^t, \xi^t) = \phi(x^t, \xi^t)^T r^t. \tag{5.1} \]
The $\phi$ functions can be chosen quite generally to provide close approximation for a wide range of possible value functions. The $r^t$ values can be chosen with a backward recursion to simulate $x^t$ and $\xi^t$ values at samples $(x_k^t, \xi_k^t)$ for $k = 1, \ldots, K$ and then to choose $r^t$ to fit (e.g., using regression) $\phi(x_k^t, \xi_k^t)^T r^t$ to the values (for a multistage stochastic linear program):
\[
\begin{aligned}
\hat{Q}^{t+1}(x_k^t, \xi_k^t) = \min\ & c^{t+1} x^{t+1} + E[\phi(x^{t+1}, \xi^{t+1})^T r^{t+1}] \\
\text{s.t. } & W x^{t+1} = h_k^t - T_k^t x_k^t, \quad x^{t+1} \ge 0.
\end{aligned}
\tag{5.2}
\]
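The regression step can be illustrated with a deliberately small sketch. It assumes a scalar state, the affine basis $\phi(x) = (1, x)$, and a toy value function standing in for the sampled subproblem values; all names and numbers are hypothetical, and ordinary least squares is used as one possible fitting rule.

```python
from typing import List, Sequence, Tuple


def fit_affine_weights(states: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (r0, r1) so that r0 + r1 * x fits the sampled values."""
    k = len(states)
    mean_x = sum(states) / k
    mean_v = sum(values) / k
    s_xx = sum((x - mean_x) ** 2 for x in states)
    s_xv = sum((x - mean_x) * (v - mean_v) for x, v in zip(states, values))
    slope = s_xv / s_xx
    intercept = mean_v - slope * mean_x
    return intercept, slope


def toy_stage_value(x: float) -> float:
    """Stand-in for solving the next-stage subproblem at state x (assumed shortage cost)."""
    return 2.0 * max(0.0, 5.0 - x)


def approx_value(weights: Tuple[float, float], x: float) -> float:
    """Evaluate the fitted approximation phi(x)^T r with phi(x) = (1, x)."""
    return weights[0] + weights[1] * x


# Sampled states and the values obtained at those states.
sample_states: List[float] = [0.0, 1.0, 2.0, 3.0, 4.0]
sample_values = [toy_stage_value(x) for x in sample_states]
weights = fit_affine_weights(sample_states, sample_values)
```

In a backward recursion the fitted weights for one period would be inserted into the objective of the preceding period's sampled subproblems before the next fit is made.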

For the integration of $\hat{Q}^{t+1}$, if the integral is easily calculated (as in the separable approximations below), then this can be evaluated directly; otherwise, additional samples of $\xi^{t+1}$ can be used to find an approximate value. For specific forms of the $\phi$ functions, independent samples of paths can be used without requiring that the tree structure be maintained in each period, with effort just increasing in the number $K$ of paths instead of in the product of the per-period branching counts over $t = 1, \ldots, H$ as in tree-generation methods. Suppose, for example, a multistage stochastic linear program such that each $\hat{Q}^{t+1}$ is an affine function of $h^t - T^t x^t$ (which is most applicable when only $h^t$ and $T^t$ are random). We consider a set of $K$ sample paths, $\xi_1, \ldots, \xi_K$. The approximate value at period $t$ of sample $k$ in (5.2) can then be written with explicit dependence on the $r$ values as:
\[
\begin{aligned}
\hat{Q}^{t+1}(x_k^t, \xi_k^t) = \min\ & c^{t+1} x^{t+1} + (r^{t+1})^T (\bar{h}^{t+1} - \bar{T}^{t+1} x^{t+1}) + r_0^{t+1} \\
\text{s.t. } & W x^{t+1} = h_k^t - T_k^t x_k^t, \quad x^{t+1} \ge 0,
\end{aligned}
\tag{5.3}
\]
where $\bar{h}^{t+1}$ and $\bar{T}^{t+1}$ are understood as conditional expectations of $h^{t+1}$ and $T^{t+1}$ given $\xi_k^t$ and $x_k^t$ respectively and $r_0^{t+1}$ is the scalar value in the affine approximation. For a dual solution to (5.3), $\pi_k^{t+1}$, $\hat{Q}^{t+1}(x_k^t, \xi_k^t) = (h_k^t - T_k^t x_k^t)^T (\pi_k^{t+1}) + (r^{t+1})^T \bar{h}^{t+1} + r_0^{t+1}$. We can then define the linear approximation with $r^t$ to be consistent with these dual values in each period $t$:

(

)

(

−

t+1

∑

k=1

−T

)

(

)+(

t+1

)

t+1

which then yields a dual bounding problem with additional constraints to ensure consistent future period values in (5.2) and (5.3) to find $\tilde{z}^L =$

max

∑

t=1

∑

k=1

(5.4)

s. t. (W

)

∑

l=1

t+1

)

t+1

≤ q

,t = 1,...,H −1;k = 1,...,K;

)

≤ q

;k = 1,...,K;

with optimal values $\hat{\pi}$. Since $\hat{\pi}$ is a dual feasible solution of (3.4.1), this process produces a lower bound estimate on the optimal value $z^*$ of (3.4.1) such that $E[\tilde{z}^L] \le z^*$ (Exercise 1). In fact, any feasible solution of (5.4) provides a lower bound on $z^*$. The approximation comes on any path $k$ from restricting the subsequent period multipliers to be the same across all paths instead of depending explicitly on each path (or, in the primal view, on each solution $x_k^t$). Relaxations of this restriction are possible by, for example, allowing some conditioning in the subsequent period values used in the constraints for each path. In general, the method can also be viewed as a version of nested decomposition in which only a single cut is added in each period.

Upper bound estimates are available directly using
\[ \tilde{z}^U = c^1 x^1 + \sum_{t=2}^{H} \sum_{k=1}^{K} \frac{1}{K} c^t x_k^t, \tag{5.5} \]
such that $E[\tilde{z}^U] \ge z^*$. Increasing the number of samples does not necessarily bring the lower and upper bound estimates together, but the ability to improve the lower bounding estimate through some use of conditional information in the multipliers suggests a possible approach to convergence. In any event, this method for estimates has substantially reduced complexity from full-tree generation methods and can be quite effective in practice, as we discuss below for problems in network revenue management.

a. Network revenue management

A typical application where ADP can be applied is in network revenue management, which represents decisions on allocating capacity to different products (e.g., fare classes and itineraries) that use common resources (e.g., seats on a flight, rooms in a hotel on a given night, or cars of a given class on a given day). The decision vector includes $x^t$ and $y^t$ at time $t$, where $x^t$ is an $(n+m)$-vector of $n$ product reservation acceptances in the current period and $m$ cumulative resource commitments and $y^t$ is an $n$-vector of penalized acceptances (due to insufficient demand), which is used to allow for relatively complete recourse. The demand is given by $d^t$, an $n$-vector of current period demand. The full problem (where $y$ variables are included for completeness only) is to find $z^* =$

minc

+ E[

∑

t=1

−c

] (5.6)

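\[
\begin{aligned}
\text{s.t. } & W^1 x^1_{1,\ldots,n} + x^1_{n+1,\ldots,n+m} = h; \\
& W^t x^t_{1,\ldots,n} + x^t_{n+1,\ldots,n+m} = x^{t-1}_{n+1,\ldots,n+m}, \quad t = 2, \ldots, T; \\
& x^t_{1,\ldots,n} - y^t \le d^t, \quad t = 1, \ldots, T; \\
& x^t, y^t \ge 0, \quad t = 1, \ldots, T, \text{ a.s.}; \\
& x^t, y^t \text{ nonanticipative}, \quad t = 1, \ldots, T, \text{ a.s.};
\end{aligned}
\]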

where we can assume for simplicity that $W = W^t$, $t = 1, \ldots, H$, that is, the resource-usage matrix is the same in each period. A common approximation to (5.6) is the bid-price linear program (see Williamson [1992] and Talluri and van Ryzin [2004]), which solves the aggregated expected value problem as in (2.5) as: $\hat{z} =$
\[
\begin{aligned}
\min\ & C X \\
\text{s.t. } & A(H X_{1,\ldots,n}) + X_{n+1,\ldots,n+m} = h, \\
& H X_{1,\ldots,n} \le \sum_{t=1}^{H} \bar{d}^t, \\
& X_{1,\ldots,n} \ge 0,
\end{aligned}
\tag{5.7}
\]

where note that $HX$ can be replaced by a different variable $X'$, as is commonly given. In comparison to (2.5), we have collapsed everything into the first period (or have an empty initial period). We omitted the $Y$ variables, which would be zero in an optimal solution. From (5.7), we obtain a feasible dual solution to (5.6) so that the correction term in Theorem 3 equals zero and $z^* \ge \hat{z}$ (Exercise 2). For an upper bound, we could use the solution $x^t = X$ in each period $t$ (and then compute penalties in $y^t$ whenever $x^t > d^t$), or we can define $x^t$ recursively as $x^1 = \min\{d^1, HX\}$ and then $x^{t+1} = \min\{HX - \sum_{s=1}^{t} x^s,\ d^{t+1}\}$, so that acceptances in each period are limited to what remains of $HX$, to obtain a sharper bound. This amounts to using $HX$ as a static booking limit vector (Exercise 3).
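A static booking limit can be sketched for a single product as follows. This is one reading of the recursive definition above, under the assumption that the total limit is consumed first come, first served; the function name and the numbers are hypothetical.

```python
from typing import List, Sequence


def static_booking_limit(limit: float, demands: Sequence[float]) -> List[float]:
    """Accept demand period by period until the total limit is exhausted.

    limit   : total acceptances allowed over the horizon (the role played by H X)
    demands : realized demand in each period, in time order
    """
    remaining = limit
    accepted: List[float] = []
    for demand in demands:
        take = min(demand, remaining)
        accepted.append(take)
        remaining -= take
    return accepted


# Hypothetical limit of 10 units against realized demands in three periods.
acceptances = static_booking_limit(10.0, [4.0, 5.0, 3.0])
```

Unlike using the same acceptance level in every period, this rule never accepts more than the realized demand, so no penalized acceptances are incurred.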

An upper bound can also be obtained (as done in practice) by using the optimal dual multipliers $\pi$ on the resource constraints of (5.7) to determine whether to accept a reservation or not. In this process, if $c_i - \pi^T A_{\cdot i} \le 0$, then a reservation for product $i$ is accepted if there is sufficient demand and available capacity. This is the notion of bid prices, in which the $-\pi$ values are prices on the resources bid against the revenue of each product. Generally, new versions of (5.7) are re-solved in each period with updated information to obtain new prices to determine acceptance.
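The acceptance rule can be written as a small function. The sketch assumes the minimization convention of this section (product costs are negative revenues and the resource multipliers are nonpositive, so their negatives act as prices); the function name, argument layout, and numbers are illustrative assumptions.

```python
from typing import Sequence


def accept_reservation(
    cost: float,
    usage: Sequence[float],
    multipliers: Sequence[float],
    remaining_capacity: Sequence[float],
) -> bool:
    """Bid-price control for one request.

    cost               : objective coefficient of the product (negative revenue)
    usage              : resource usage column of the product
    multipliers        : dual multipliers on the resource constraints
    remaining_capacity : capacity still available on each resource
    """
    reduced_cost = cost - sum(p * a for p, a in zip(multipliers, usage))
    has_capacity = all(r >= a for r, a in zip(remaining_capacity, usage))
    return reduced_cost <= 0 and has_capacity


# A two-leg itinerary with bid prices of 30 and 50 on its legs.
duals = [-30.0, -50.0]
high_fare = accept_reservation(-100.0, [1.0, 1.0], duals, [5.0, 2.0])
low_fare = accept_reservation(-60.0, [1.0, 1.0], duals, [5.0, 2.0])
sold_out = accept_reservation(-100.0, [1.0, 1.0], duals, [5.0, 0.0])
```

Here the 100 fare covers the combined bid price of 80 and is accepted, while the 60 fare is rejected even though seats remain.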

Still another possible disaggregation is to use $H X_i / \sum_{t=1}^{H} \bar{d}_i^t$ as the probability of accepting a reservation for product $i$ and again to define the values sequentially in time with repeated solution of the updated version of (5.7). This approach is described in Jasin and Kumar [2010], who obtain an a priori bound on the loss in value from this approximate policy and then show how to choose re-solving times such that, asymptotically as the system size grows, the relative loss in performance from using the approximation goes to zero.

Another interpretation of (5.7) is in its dual, in which case it represents an aggregation of the linear ADP formulation in (5.4), which then implies that the lower bound in (5.7) is not as sharp as would be obtained using (5.4) (Exercise 4). This is the observation in Adelman [2007], which also presents a method to obtain an approximate solution with bounded accuracy for a linearization of the full problem.

b. Vehicle allocation problems

Vehicle allocation problems provide a different structure that allows specific bound construction. These problems can be represented as multistage network problems with only arc capacities random. A formulation would then be the same as (1.1). The matrices $W^t$ correspond to flows leaving nodes in period $t$, while $T^t$ corresponds to flow entering nodes in period $t+1$. The only exception is in the last period, for which $W^H$ just gathers flow into ending nodes. For simplicity, this model assumes that all flow requires one period to move between nodes.

The $x^t(ij)$ decisions are then flows from $i$ in period $t$ to $j$ in period $t+1$. The randomness involves the demand from $i$ to $j$ in period $t$. We assume that $x^t(ij) = x^{t,f}(ij) + x^{t,e}(ij)$, where $x^{t,f}(ij)$ represents full loads (or vehicles) and $x^{t,e}(ij)$ represents empty vehicles (assuming that fractional vehicle loads are feasible). For demand of $\xi^t(ij)$, we would have $x^{t,f}(ij) \le \xi^t(ij)$. The costs $c^{t,f}(ij)$ and $c^{t,e}(ij)$ then correspond to the unit values of moving full and empty vehicles from $i$ to $j$ at $t$. The result is that vehicles are conserved in (5.8). The decisions generally depend on the locations of vehicles at any point in time.

Frantzeskakis and Powell [1993] consider several alternative approximations of (5.8). First, one could solve the expected value problem to obtain $\bar{x}^t$ values. These corresponding decisions can be used regardless of realized demand (as, e.g., in Bitran and Yanasse [1984]). Then the $x^t$ values could be split into full and empty parts, $x^t = \bar{x}^t$, $x^{t,f}(ij) = \min\{\bar{x}^t(ij), \xi^t(ij)\}$, according to realized demand to produce both upper and lower bounds. This could be viewed as a generalization of a simple recourse strategy; hence Frantzeskakis and Powell refer to it as the simple recourse strategy.

Another approach is simply to solve the mean value problem, but only actually to send a vehicle from $i$ to $j$ at $t$ if there is sufficient demand. In this way, $x^{t,f}(ij) = \min\{\bar{x}^t(ij), \xi^t(ij)\}$, but $x^t(ij) = x^{t,f}(ij)$ whenever $i \ne j$. This strategy is called null recourse.
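The difference between the simple and null recourse strategies can be seen in a short sketch for one origin node. It assumes planned flows from the mean value problem are given, that full loads are the smaller of planned flow and realized demand, and that under null recourse any vehicle not sent loaded stays at its origin; the function names and data layout are hypothetical.

```python
from typing import Dict, Tuple


def simple_recourse_split(planned: float, demand: float) -> Tuple[float, float]:
    """Keep the planned flow and split it into (full, empty) parts."""
    full = min(planned, demand)
    return full, planned - full


def null_recourse_flows(
    planned: Dict[str, float], demand: Dict[str, float], origin: str
) -> Dict[str, float]:
    """Send only loaded vehicles to other nodes; hold the rest at the origin.

    planned : planned flow from `origin` to each destination (may include origin)
    demand  : realized demand from `origin` to each destination
    """
    flows: Dict[str, float] = {}
    held = 0.0
    for destination, planned_flow in planned.items():
        if destination == origin:
            held += planned_flow
            continue
        full = min(planned_flow, demand.get(destination, 0.0))
        flows[destination] = full
        held += planned_flow - full
    flows[origin] = held
    return flows


# Hypothetical plan from node "A" with realized demands to "B" and "C".
plan = {"A": 1.0, "B": 4.0, "C": 2.0}
realized = {"B": 3.0, "C": 5.0}
null_flows = null_recourse_flows(plan, realized, "A")
```

Both rules conserve the vehicles available at the origin; they differ only in whether unneeded vehicles still travel empty along the planned arcs.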

A further strategy is called nodal recourse, in which a set of decisions or a policy is defined for each node $i$ at all times $t$. This policy would be a list of options for flow from $i$ at $t$. The list would be a ranking of full loads (i.e., preferred nodes, $j_1(i), j_2(i), \ldots$) if capacity is available, followed by an alternative for any remaining empty vehicles.

This preference structure can be constructed using a separable approximation from period $t+1$ to $H$. In period $H$, we can begin by assigning some salvage/final value $-c^H(i)$ to vehicles on the arcs corresponding to travel from one node to itself. At period $H-1$, the value of sending a full load from $i$ to $j$ is simply $-c^{H-1,f}(ij) - c^H(j)$. Including empty loads in the obvious way and ordering in decreasing order for each $i$ determines the strategy at $H-1$. Now, given the distributions of $\xi^{H-1}$, these values yield an expected value function for vehicles at $i$ at $t$. The argument of this function is a new (state) variable, $y^{H-1}(i)$. With the function defined, similar decisions on expected values of loads from $i$ to $j$ can be made in period $H-2$. A dynamic programming recursion would be to find $\mathcal{Q}^t(y^t) = E_{\xi^t}[Q^t(y^t, \xi^t)]$ where:
\[
\begin{aligned}
Q^t(y^t, \xi^t) = \min\ & c^t x^t + \mathcal{Q}^{t+1}(y^{t+1}) \\
\text{s.t. } & W^t x^t = y^t, \\
& T^t x^t - y^{t+1} = 0, \\
& \xi^t \ge x^{t,f} \ge 0, \quad x^{t,e} \ge 0.
\end{aligned}
\tag{5.8}
\]

If $\mathcal{Q}^{t+1}(y^{t+1})$ is linear with coefficients $q^{t+1}(i)$ in each component $i$ of $y^{t+1}$ (as it is for $t = H-1$), then the optimal solution to (5.8) is given by the increasing ordering of $c^{t,f}(ij) + q^{t+1}(j)$, with each successive $x^{t,f}(ij)$ used up to the minimum of $y^t(i)$ and $\xi^t(ij)$ according to this realization of $\xi^t$. The key is then to construct a linear approximation to $\mathcal{Q}^{t+1}(y^{t+1})$.
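The ordering rule for one node can be sketched as a greedy fill. The code assumes a linear future value so that each destination has a single coefficient (arc cost plus downstream value), and it tracks the vehicles still unassigned at the node as destinations are filled; the function name, tuple layout, and numbers are illustrative only.

```python
from typing import Dict, List, Tuple


def greedy_full_loads(
    supply: float, options: List[Tuple[str, float, float]]
) -> Tuple[Dict[str, float], float]:
    """Assign vehicles at one node to loaded moves in increasing coefficient order.

    supply  : vehicles available at the node in this period
    options : (destination, cost_plus_future_value, realized_demand) for each arc
    Returns the full-load flows and the vehicles left for the empty alternative.
    """
    remaining = supply
    flows: Dict[str, float] = {}
    for destination, _coefficient, demand in sorted(options, key=lambda o: o[1]):
        if remaining <= 0:
            break
        amount = min(remaining, demand)
        flows[destination] = amount
        remaining -= amount
    return flows, remaining


# Five vehicles at a node with three candidate loaded destinations.
loads, leftover = greedy_full_loads(
    5.0, [("B", -4.0, 2.0), ("C", -7.0, 2.0), ("D", -1.0, 3.0)]
)
```

Any vehicles left over after the ranked loads are exhausted would follow the fallback option for empties in the nodal recourse policy.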

With a linearization, the entire strategy can be simply carried back to the first period. As in other ADP methods, this represents a feasible but not necessarily optimal strategy because it avoids calculating the full nonlinear value function. One way to compute the linearization is to assume an input value $\hat{y}^t(i)$ and to find the probability of each option multiplied by the expected linearized value of that option. Using this to determine the recourse value at each stage can lead to a lower bound at each stage and overall when the first-period problem is solved (see Exercise 4). An upper bounding linearization is also possible. This is analogous to the Edmundson-Madansky approach (Exercise 5).

Frantzeskakis and Powell [1993] mention that extensions of nodal recourse can apply to general network problems. These procedures are similar to the separable bounding procedures presented next. They again rely on building responses to random variation that depend separately on the random components and that are also feasible.

c. Piecewise-linear separable bounds

Another approach to ADP is to extend the basic separable bounds presented in Section 8.5b. to multistage problems. The main idea is to use the two-stage method repeatedly to approximate the objective function by separable functions (and not just single affine functions as in (5.2)). For linear problems, this leads to sublinear or piecewise linear functions as in Section 8.5b. Functions without recession directions (e.g., quadratic functions) would require some type of nonlinear (e.g., quadratic) function that should again be easily integrable, requiring, for example, limited moment information (second moments for quadratic functions). We consider the linear case (following Birge [1989]).

The goal is to construct a problem that is separable in the components of the random vector. In each period $t$, a decision, $x^t$, is made subject to the constraints, $W^t x^t = \xi^t - T^{t-1} x^{t-1}$, $x^t \ge 0$, where $\xi^t$ is the realization of random constraints and $x^{t-1}$ was the decision in period $t-1$. The objective contribution from this decision is $c^t x^t$. We can view this decision as a response to the input, $\eta^t = \xi^t - T^{t-1} x^{t-1}$. The period $t$ decision, $x^t$, then becomes a function of this input, so $x^t(\omega)$ becomes $x^t(\eta^t)$. Problem (2.2) becomes
\[
\begin{aligned}
\min\ & c^1 x^1 + E[c^2 x^2(\eta^2) + \cdots + c^H x^H(\eta^H)] \\
\text{s.t. } & W^1 x^1 = h^1, \\
& W^t x^t(\eta^t) = \eta^t, \quad t = 2, \ldots, H, \text{ a.s.}, \\
& \eta^t = \xi^t - T^{t-1} x^{t-1}(\eta^{t-1}), \quad t = 2, \ldots, H, \text{ a.s.}, \\
& x^t(\eta) \ge 0, \quad t = 1, \ldots, H.
\end{aligned}
\]

The optimization problem is to determine the correct response to $\eta^t$. The two-stage method given in Section 8.5b. gives a response that is separable in the components of $\xi = \eta^2$. In multiple stages, $\xi$ is replaced by $\eta^t$ for period $t$. The response must consider future actions and costs, so it is no longer simply optimization of the second-period problem.

The dimension of $\eta = (\eta^2, \ldots, \eta^H)$ makes direct solution difficult in general. An upper bound is, however, obtained for any feasible response, i.e., decision vectors, $x^t(\eta^t)$, that satisfy $W^t x^t(\eta^t) = \eta^t$, $x^t(\eta^t) \ge 0$, a.s., where $\eta^t = \xi^t - T^{t-1} x^{t-1}(\eta^{t-1})$ for all $t$. The two-stage method can be used to obtain feasible responses that are separable in the components of $\eta^t$, i.e., where $x^t(\eta^t) = \sum_{i=1}^{m_t} x_i^t(\eta_i^t)$, with $m_t$ the number of components of $\eta^t$.

One choice is to let $x_i^t(\zeta)$ solve
\[ \min\ c^t x \quad \text{s.t. } W^t x = \zeta e_i, \quad x \ge \ell_i^t, \tag{5.9} \]
where $e_i$ is the $i$th unit vector and $\ell_i^t$ depends on choices for the other $x_j^t$. Program (5.9) is a parametric linear program in $\zeta$. It is particularly easy to solve if $\ell_i^t = 0$. In this case, $x_i^t(\zeta)$ is linear for positive and negative $\zeta$. We suppose this case and let the optimal solutions be $x_i^{t,\pm}$ when $\zeta = \pm 1$.

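A solution can be obtained if we can find the distribution of the $\eta_i^t$ given responses determined by solutions of (5.9). The resulting problem to solve is
\[
\text{(SL)} \quad \min\ c^1 x^1 + \sum_{t=2}^{H} \sum_{i=1}^{m_t} \int \psi_i^t(\eta_i^t)\, P(d\eta_i^t) \quad \text{s.t. } W^1 x^1 = h^1, \quad x^1 \ge 0,
\]
where $\psi_i^t(\zeta) = c^t x_i^{t,+} \zeta$ if $\zeta \ge 0$, and $\psi_i^t(\zeta) = c^t x_i^{t,-}(-\zeta)$ if $\zeta \le 0$. Assuming that the distribution of $\eta^t$ is known in this approximation, we can find $\eta^{t+1}$. Initially, $\eta^2 = \xi^2 - T^1 x^1$, which has the same distributional form as $\xi^2$. In general, $\eta_j^{t+1}$ is given by:
\[
\eta_j^{t+1} = \xi_j^{t+1} - T_{j,\cdot}^t \left[ \sum_{i=1}^{m_t} \left( x_i^{t,+} \mathbf{1}_{\{\eta_i^t \ge 0\}} + x_i^{t,-} \mathbf{1}_{\{\eta_i^t < 0\}} \right) |\eta_i^t| \right]. \tag{5.10}
\]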
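The separable response and the propagation of inputs from one period to the next can be sketched as follows. The code assumes the unit responses to positive and negative inputs have already been obtained from the parametric program, stores them as plain lists, and uses small made-up data; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def separable_response(eta: Sequence[float], x_plus: Matrix, x_minus: Matrix) -> Vector:
    """Sum the unit responses scaled by the size of each input component.

    x_plus[i]  : response to a unit positive input in component i
    x_minus[i] : response to a unit negative input in component i
    """
    n = len(x_plus[0])
    x = [0.0] * n
    for i, value in enumerate(eta):
        direction = x_plus[i] if value >= 0 else x_minus[i]
        for c in range(n):
            x[c] += direction[c] * abs(value)
    return x


def next_input(xi_next: Sequence[float], tech: Matrix, x: Sequence[float]) -> Vector:
    """Input to the next period: the new realization minus the carried-over term T x."""
    return [
        xi_next[j] - sum(tech[j][c] * x[c] for c in range(len(x)))
        for j in range(len(xi_next))
    ]


# Two input components, three decision variables (illustrative data).
unit_plus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
unit_minus = [[0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]
decision = separable_response([2.0, -1.0], unit_plus, unit_minus)
eta_next = next_input([5.0, 5.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], decision)
```

Because the response switches between two linear pieces per component, the next input is linear in the current one only on regions of constant sign, which is the source of the growth in cases discussed in the text.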

Note that the values in (5.10) are linear functions of $\eta^t$ on the regions where $\eta^t$ has constant sign. We can, therefore, construct $\eta^{t+1}$ as a function of $\eta^t$ by overlaying these linear transformations of random variables. For normally distributed data, this may be possible because the transformation does not affect the distribution class. For other distributions, it is more difficult. Even in the normal case, however, we have different distribution parameters for all possible sign combinations of all random variables in previous period inputs. Exponential growth of the calculations in the number of periods is not avoided.

Because the approximation given earlier may be difficult to compute even with normal distributions, it may be necessary to approximate the distribution of $\eta^{t+1}$. We can use bounds on $P\{\eta_i^t \ge 0\}$ and on the moments conditional on $\eta_i^t \ge 0$ or $\eta_i^t < 0$. Given these values, moment problems can be solved to calculate corresponding values for $\eta^{t+1}$ and to bound the expected recourse terms (see Birge and Wets [1989]). Any other bounds on the input ($T^t x^t$) from period $t$ to period $t+1$ can also be used to obtain crude bounds on these values. Also, note that certain problems, such as networks, may have few nonzeros in the $T^t$ terms and close-to-simple recourse structure. The random input vector $\eta^{t+1}$ may be easily calculable for these problems.

Another looser but more implementable bound can be obtained by forcing a feasible and separable response in all future periods depending on a single random variable in the current period. This eliminates the problem of characterizing the distribution of inputs to all periods. It does, however, force a dependency in future periods that may increase the bound.

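To develop this response function, let $X^t(\pm i)$ be an optimal solution, $(x^t, \ldots, x^H)$, $(t > 1)$, to:
\[
\begin{aligned}
\min\ & c^t x^t + \cdots + c^H x^H \\
\text{s.t. } & W^t x^t = \pm e_i, \\
& T^t x^t + W^{t+1} x^{t+1} = 0, \\
& \cdots \\
& T^{H-1} x^{H-1} + W^H x^H = 0, \\
& x^\tau \ge 0, \quad \tau = t, \ldots, H.
\end{aligned}
\tag{5.11}
\]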

Now deﬁne

(



−

(ξ

−

)P(dξ



−

≤0

t−

(−ξ

)P(dξ), (5.12)

where $C^t = (c^t, \ldots, c^H)$. An upper bound on the objective value of (5.9) is obtained by solving the separable nonlinear program:

min c

+ ···+ c

∑

t=2

∑

i=1

(

)

s. t. W

= h

t+1

−

t+1

= 0 , t = 1,...,H −1 ,

≥ 0 , t = 1,...,H ,

∈

(5.13)