Thomas M. Cover, Joy A. Thomas. Elements of information theory


16.7 UNIVERSAL PORTFOLIOS 635

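find by differentiating with respect to $b$ that the maximum value is

$$w^*(j^n) = \max_{0 \le b \le 1} b^k (1 - b)^{n-k} \tag{16.111}$$

$$= \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}, \tag{16.112}$$

which is achieved by

$$b^* = \left(\frac{k}{n}, \frac{n-k}{n}\right). \tag{16.113}$$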

Note that $\sum_{j^n} w^*(j^n) > 1$, reflecting the fact that the amount “bet” on $j^n$ is chosen in hindsight, thus relieving the hindsight investor of the responsibility of allocating his investments $w^*(j^n)$ to sum to 1. The causal investor has no such luxury. How can the causal investor choose initial investments $\hat{w}(j^n)$, with $\sum_{j^n} \hat{w}(j^n) = 1$, to protect himself from all possible $j^n$ and hindsight-determined $w^*(j^n)$? The answer will be to choose $\hat{w}(j^n)$ proportional to $w^*(j^n)$. Then the worst-case ratio $\hat{w}(j^n)/w^*(j^n)$ will be maximized. To proceed, we define $V_n$ by

$$\frac{1}{V_n} = \sum_{j^n} \left(\frac{k(j^n)}{n}\right)^{k(j^n)} \left(\frac{n - k(j^n)}{n}\right)^{n-k(j^n)} \tag{16.114}$$

$$= \sum_{k=0}^{n} \binom{n}{k} \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} \tag{16.115}$$

and let

$$\hat{w}(j^n) = V_n \left(\frac{k(j^n)}{n}\right)^{k(j^n)} \left(\frac{n - k(j^n)}{n}\right)^{n-k(j^n)}. \tag{16.116}$$

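It is clear that $\hat{w}(j^n)$ is a legitimate distribution of wealth over the $2^n$ stock sequences (i.e., $\hat{w}(j^n) \ge 0$ and $\sum_{j^n} \hat{w}(j^n) = 1$). Here $V_n$ is the normalization factor that makes $\hat{w}(j^n)$ a probability mass function. Also, from (16.109) and (16.113), for all sequences $x^n$,

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \min_{j^n} \frac{\hat{w}(j^n)}{w^*(j^n)} \tag{16.117}$$

$$= \min_{k} \frac{V_n \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k}}{b^{*k} (1 - b^*)^{n-k}} \tag{16.118}$$

$$\ge V_n, \tag{16.119}$$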

636 INFORMATION THEORY AND PORTFOLIO THEORY

where (16.117) follows from (16.109) and (16.119) follows from (16.112). Consequently, we have

$$\max_{\hat{b}} \min_{x^n} \frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge V_n. \tag{16.120}$$
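The construction in (16.114) to (16.120) can be checked numerically for small $n$. The following is a minimal sketch for $m = 2$, assuming sequences $j^n$ are represented as tuples of 1s and 2s; the function names `w_star`, `V`, and `w_hat` are illustrative and do not come from the text.

```python
from itertools import product
from math import comb, isclose

def w_star(k, n):
    """Hindsight weight (k/n)^k ((n-k)/n)^(n-k) from (16.112).
    Python evaluates 0.0 ** 0 as 1.0, which matches the convention needed here."""
    return (k / n) ** k * ((n - k) / n) ** (n - k)

def V(n):
    """V_n for m = 2 stocks, using the sum over k in (16.115)."""
    return 1.0 / sum(comb(n, k) * w_star(k, n) for k in range(n + 1))

def w_hat(seq, n):
    """Causal weight (16.116) on a sequence j^n given as a tuple of 1s and 2s."""
    return V(n) * w_star(seq.count(1), n)

n = 6
sequences = list(product((1, 2), repeat=n))
total = sum(w_hat(s, n) for s in sequences)                      # should be 1
worst_ratio = min(w_hat(s, n) / w_star(s.count(1), n) for s in sequences)
print(total, worst_ratio, V(n))
```

Because $\hat{w}$ is exactly proportional to $w^*$, the ratio is the same for every sequence, so the minimum equals $V_n$.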

We have thus demonstrated a portfolio on the $2^n$ possible sequences of length $n$ that achieves wealth $\hat{S}_n(x^n)$ within a factor $V_n$ of the wealth $S_n^*(x^n)$ achieved by the best constant rebalanced portfolio in hindsight. To complete the proof of the theorem, we show that this is the best possible, that is, that any nonanticipating portfolio $b_i(x^{i-1})$ cannot do better than a factor $V_n$ in the worst case (i.e., for the worst choice of $x^n$). To prove this, we construct a set of extremal stock market sequences and show that the performance of any nonanticipating portfolio strategy is bounded by $V_n$ for at least one of these sequences, proving the worst-case bound.

For each $j^n \in \{1, 2\}^n$, we define the corresponding extremal stock market vector $x^n(j^n)$ as

$$x_i(j_i) = \begin{cases} (1, 0)^t & \text{if } j_i = 1, \\ (0, 1)^t & \text{if } j_i = 2. \end{cases} \tag{16.121}$$

Let $e_1 = (1, 0)^t$, $e_2 = (0, 1)^t$ be standard basis vectors. Let

$$K = \{x(j^n) : j^n \in \{1, 2\}^n,\ x_i = e_{j_i}\} \tag{16.122}$$

be the set of extremal sequences. There are $2^n$ such extremal sequences, and for each sequence at each time, there is only one stock that yields a nonzero return. The wealth invested in the other stock is lost. Therefore, the wealth at the end of $n$ periods for extremal sequence $x^n(j^n)$ is the product of the amounts invested in the stocks $j_1, \ldots, j_n$ [i.e., $S_n(x^n(j^n)) = \prod_i b_{j_i} = w(j^n)$]. Again, we can view this as an investment on sequences of length $n$, and given the 0–1 nature of the return, it is easy to see for $x^n \in K$ that

$$\sum_{j^n} S_n(x^n(j^n)) = 1. \tag{16.123}$$

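For any extremal sequence $x^n(j^n) \in K$, the best constant rebalanced portfolio is

$$b^*(x^n(j^n)) = \left(\frac{n_1(j^n)}{n}, \frac{n_2(j^n)}{n}\right)^t, \tag{16.124}$$

where $n_1(j^n)$ is the number of occurrences of 1 in the sequence $j^n$ (and $n_2(j^n) = n - n_1(j^n)$ is the number of occurrences of 2). The corresponding wealth at the end of $n$ periods is

$$S_n^*(x^n(j^n)) = \left(\frac{n_1(j^n)}{n}\right)^{n_1(j^n)} \left(\frac{n_2(j^n)}{n}\right)^{n_2(j^n)} = \frac{\hat{w}(j^n)}{V_n}, \tag{16.125}$$

from (16.116), and it therefore follows that

$$\sum_{x^n \in K} S_n^*(x^n) = \frac{1}{V_n} \sum_{j^n} \hat{w}(j^n) = \frac{1}{V_n}. \tag{16.126}$$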

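We then have the following inequality for any portfolio sequence $\{b_i\}_{i=1}^{n}$, with $S_n(x^n)$ defined as in (16.104):

$$\min_{x^n \in K} \frac{S_n(x^n)}{S_n^*(x^n)} \le \sum_{\tilde{x}^n \in K} \frac{S_n^*(\tilde{x}^n)}{\sum_{x^n \in K} S_n^*(x^n)} \cdot \frac{S_n(\tilde{x}^n)}{S_n^*(\tilde{x}^n)} \tag{16.127}$$

$$= \sum_{\tilde{x}^n \in K} \frac{S_n(\tilde{x}^n)}{\sum_{x^n \in K} S_n^*(x^n)} \tag{16.128}$$

$$= \frac{1}{\sum_{x^n \in K} S_n^*(x^n)} \tag{16.129}$$

$$= V_n, \tag{16.130}$$

where the inequality follows from the fact that the minimum is less than the average. Thus,

$$\max_{b} \min_{x^n \in K} \frac{S_n(x^n)}{S_n^*(x^n)} \le V_n. \qquad \square \tag{16.131}$$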
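The converse argument can be illustrated on a small example: for any causal rule, the wealths over the $2^n$ extremal sequences sum to 1, so the worst-case ratio cannot exceed $V_n$. The sketch below assumes $m = 2$; `example_strategy` is a hypothetical causal rule chosen only for illustration and is not the strategy of the theorem.

```python
from itertools import product
from math import comb, isclose

def best_crp_wealth(seq):
    """S_n^* on the extremal sequence x^n(j^n), as in (16.125)."""
    n, k = len(seq), seq.count(1)
    return (k / n) ** k * ((n - k) / n) ** (n - k)

def V(n):
    """V_n for m = 2 stocks, from (16.115)."""
    return 1.0 / sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                     for k in range(n + 1))

def example_strategy(past):
    """Hypothetical causal rule: smoothed frequency of stock 1 among past winners."""
    return (past.count(1) + 1) / (len(past) + 2)

def wealth_on_extremal(strategy, seq):
    """Wealth on x^n(j^n): at time i only stock j_i pays, everything else is lost."""
    wealth = 1.0
    for i, j in enumerate(seq):
        b1 = strategy(seq[:i])            # fraction in stock 1, uses the past only
        wealth *= b1 if j == 1 else 1.0 - b1
    return wealth

n = 5
sequences = list(product((1, 2), repeat=n))
total = sum(wealth_on_extremal(example_strategy, s) for s in sequences)
worst = min(wealth_on_extremal(example_strategy, s) / best_crp_wealth(s)
            for s in sequences)
print(total, worst, V(n))
```

Swapping in any other function of the past for `example_strategy` leaves `total` equal to 1, which is the content of (16.123).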

The strategy described in the theorem puts mass on all sequences of length $n$ and is clearly dependent on $n$. We can recast the strategy in incremental terms, that is, in terms of the amount bet on stock 1 and stock 2 at time 1, then, conditional on the outcome at time 1, the amount bet on each of the two stocks at time 2, and so on. Consider the weight $\hat{b}_{i,1}$ assigned by the algorithm to stock 1 at time $i$ given the previous sequence of stock vectors $x^{i-1}$. We can calculate this by summing over all sequences $j^n$ that have a 1 in position $i$, giving

$$\hat{b}_{i,1}(x^{i-1}) = \frac{\sum_{j^{i-1} \in M^{i-1}} \hat{w}(j^{i-1} 1)\, x(j^{i-1})}{\sum_{j^{i} \in M^{i}} \hat{w}(j^{i})\, x(j^{i-1})}, \tag{16.132}$$

where

$$\hat{w}(j^i) = \sum_{j^n : j^i \subseteq j^n} w(j^n) \tag{16.133}$$

is the weight put on all sequences $j^n$ that start with $j^i$, and

$$x(j^{i-1}) = \prod_{k=1}^{i-1} x_{k j_k} \tag{16.134}$$

is the return on those sequences as defined in (16.106).
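The incremental form can be sketched directly from the prefix weights. The code below is an illustrative brute-force version for $m = 2$ and small $n$; it assumes the weights being summed in the prefix weight are the $\hat{w}(j^n)$ of (16.116), and the price relatives in `xs` are made-up numbers. It checks that the per-period bets reproduce the wealth $\sum_{j^n} \hat{w}(j^n) x(j^n)$.

```python
from itertools import product
from math import comb, isclose

def w_hat(seq, n):
    """Weight (16.116) on a full sequence j^n (tuple of 1s and 2s)."""
    ws = lambda k: (k / n) ** k * ((n - k) / n) ** (n - k)
    v_n = 1.0 / sum(comb(n, k) * ws(k) for k in range(n + 1))
    return v_n * ws(seq.count(1))

def prefix_weight(prefix, n):
    """Total weight of all length-n sequences that start with `prefix`."""
    rest = n - len(prefix)
    return sum(w_hat(prefix + tail, n) for tail in product((1, 2), repeat=rest))

def seq_return(prefix, xs):
    """x(j^i) = prod_k x_{k, j_k}, with xs[k] = (x_{k,1}, x_{k,2})."""
    r = 1.0
    for k, j in enumerate(prefix):
        r *= xs[k][j - 1]
    return r

def bet_on_stock_1(past, n):
    """Fraction of wealth placed on stock 1 at time i = len(past) + 1."""
    i = len(past) + 1
    num = sum(prefix_weight(p + (1,), n) * seq_return(p, past)
              for p in product((1, 2), repeat=i - 1))
    den = sum(prefix_weight(p, n) * seq_return(p[:-1], past)
              for p in product((1, 2), repeat=i))
    return num / den

xs = [(1.2, 0.9), (0.7, 1.1), (1.05, 1.0), (0.9, 1.3)]  # made-up price relatives
n = len(xs)
wealth = 1.0
for i, (x1, x2) in enumerate(xs):
    b1 = bet_on_stock_1(xs[:i], n)
    wealth *= b1 * x1 + (1 - b1) * x2
direct = sum(w_hat(s, n) * seq_return(s, xs) for s in product((1, 2), repeat=n))
print(wealth, direct)
```

The enumeration costs on the order of $2^n$ operations per step, so this is only a check of the formula, not a practical algorithm.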

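Investigation of the asymptotics of $V_n$ reveals [401, 496] that

$$V_n \sim \left(\sqrt{\frac{2}{n}}\right)^{m-1} \frac{\Gamma(m/2)}{\sqrt{\pi}} \tag{16.135}$$

for $m$ assets. In particular, for $m = 2$ assets,

$$V_n \sim \sqrt{\frac{2}{\pi n}} \tag{16.136}$$

and

$$\frac{1}{2\sqrt{n + 1}} \le V_n \le \frac{2}{\sqrt{n + 1}} \tag{16.137}$$

for all $n$ [400]. Consequently, for $m = 2$ stocks, the causal portfolio strategy $\hat{b}_i(x^{i-1})$ given in (16.132) achieves wealth $\hat{S}_n(x^n)$ such that

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge V_n \ge \frac{1}{2\sqrt{n + 1}} \tag{16.138}$$

for all market sequences $x^n$.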
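The bounds and the asymptotic form for $m = 2$ are easy to examine numerically. This is a small sketch that evaluates $V_n$ from the sum in (16.115) and compares it with (16.136) and (16.137); the helper name `V` is illustrative.

```python
from math import comb, pi, sqrt

def V(n):
    """V_n for m = 2 stocks, from the sum over k in (16.115)."""
    return 1.0 / sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                     for k in range(n + 1))

for n in (1, 2, 5, 10, 50, 200):
    v = V(n)
    lower, upper = 1 / (2 * sqrt(n + 1)), 2 / sqrt(n + 1)
    asymptotic = sqrt(2 / (pi * n))
    # Columns: n, whether the bounds (16.137) hold, ratio of V_n to (16.136)
    print(n, lower <= v <= upper, v / asymptotic)
```

The ratio in the last column approaches 1 slowly, since the relative correction to the asymptotic form decays only like $1/\sqrt{n}$.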

16.7.2 Horizon-Free Universal Portfolios

We describe the horizon-free strategy in terms of a weighting of different portfolio strategies. As described earlier, each constantly rebalanced portfolio $b$ can be viewed as corresponding to a mutual fund that rebalances the $m$ assets according to $b$. Initially, we distribute the wealth among these funds according to a distribution $\mu(b)$, where $d\mu(b)$ is the amount of wealth invested in portfolios in the neighborhood $db$ of the constantly rebalanced portfolio $b$.

Let

$$S_n(b, x^n) = \prod_{i=1}^{n} b^t x_i \tag{16.139}$$

be the wealth generated by a constant rebalanced portfolio $b$ on the stock sequence $x^n$. Recall that

$$S_n^*(x^n) = \max_{b \in B} S_n(b, x^n) \tag{16.140}$$

is the wealth of the best constant rebalanced portfolio in hindsight. We investigate the causal portfolio defined by

$$\hat{b}_{i+1}(x^i) = \frac{\int_B b\, S_i(b, x^i)\, d\mu(b)}{\int_B S_i(b, x^i)\, d\mu(b)}. \tag{16.141}$$

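We note that

$$\hat{b}_{i+1}^t(x^i)\, x_{i+1} = \frac{\int_B b^t x_{i+1}\, S_i(b, x^i)\, d\mu(b)}{\int_B S_i(b, x^i)\, d\mu(b)} \tag{16.142}$$

$$= \frac{\int_B S_{i+1}(b, x^{i+1})\, d\mu(b)}{\int_B S_i(b, x^i)\, d\mu(b)}. \tag{16.143}$$

Thus, the product $\prod_i \hat{b}_i^t x_i$ telescopes and we see that the wealth $\hat{S}_n(x^n)$ resulting from this portfolio is given by

$$\hat{S}_n(x^n) = \prod_{i=1}^{n} \hat{b}_i^t(x^{i-1})\, x_i \tag{16.144}$$

$$= \int_{b \in B} S_n(b, x^n)\, d\mu(b). \tag{16.145}$$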

There is another way to interpret (16.145). The amount given to portfolio manager $b$ is $d\mu(b)$, the resulting growth factor for the manager rebalancing to $b$ is $S_n(b, x^n)$, and the total wealth of this batch of investments is

$$\hat{S}_n(x^n) = \int_B S_n(b, x^n)\, d\mu(b). \tag{16.146}$$

Then $\hat{b}_{i+1}$, defined in (16.141), is the performance-weighted total “buy order” of the individual portfolio managers $b$.
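The telescoping identity can be seen concretely by replacing $\mu$ with a finite set of funds. The sketch below assumes $m = 2$, a hypothetical uniform weighting over a grid of 21 values of $b$ (an approximation chosen for illustration, not the measure used later in the text), and made-up price relatives. For a finitely supported $\mu$, the causal wealth equals the weighted average of the fund wealths exactly.

```python
from math import isclose

def universal_portfolio(xs, grid, mu):
    """Run the mixture rule (16.141) for two stocks with mu supported on `grid`.
    Returns the causal wealth and the final wealth of each constant rebalanced fund."""
    fund = [1.0] * len(grid)          # S_i(b, x^i) for every grid point b
    wealth = 1.0
    for x1, x2 in xs:
        den = sum(s * w for s, w in zip(fund, mu))
        b_hat = sum(b * s * w for b, s, w in zip(grid, fund, mu)) / den
        wealth *= b_hat * x1 + (1 - b_hat) * x2       # one factor of (16.144)
        fund = [s * (b * x1 + (1 - b) * x2) for b, s in zip(grid, fund)]
    return wealth, fund

xs = [(1.1, 0.9), (0.8, 1.2), (1.3, 1.0), (0.95, 1.05)]   # made-up price relatives
grid = [i / 20 for i in range(21)]                         # b = fraction in stock 1
mu = [1 / len(grid)] * len(grid)                           # assumed uniform weights
wealth, fund = universal_portfolio(xs, grid, mu)
mixture = sum(s * w for s, w in zip(fund, mu))             # discrete form of (16.145)
print(wealth, mixture, max(fund))
```

Since the mixture is an average of the fund wealths, it can never exceed the best fund on the grid; the interest lies in how close it stays.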

So far, we have not specified what distribution $\mu(b)$ we use to apportion the initial wealth. We now use a distribution $\mu$ that puts mass on all possible portfolios, so that we approximate the performance of the best portfolio for the actual distribution of stock price vectors.

In the next lemma, we bound $\hat{S}_n / S_n^*$ as a function of the initial wealth distribution $\mu(b)$.

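Lemma 16.7.2 Let $S_n^*(x^n)$ in (16.140) be the wealth achieved by the best constant rebalanced portfolio and let $\hat{S}_n(x^n)$ in (16.144) be the wealth achieved by the universal mixed portfolio $\hat{b}(\cdot)$, given by

$$\hat{b}_{i+1}(x^i) = \frac{\int b\, S_i(b, x^i)\, d\mu(b)}{\int S_i(b, x^i)\, d\mu(b)}. \tag{16.147}$$

Then

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \min_{j^n} \frac{\int_B \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}}. \tag{16.148}$$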

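Proof: As before, we can write

$$S_n^*(x^n) = \sum_{j^n} w^*(j^n)\, x(j^n), \tag{16.149}$$

where $w^*(j^n) = \prod_{i=1}^{n} b^*_{j_i}$ is the amount invested on the sequence $j^n$ and $x(j^n) = \prod_{i=1}^{n} x_{i j_i}$ is the corresponding return. Similarly, we can write

$$\hat{S}_n(x^n) = \int \prod_{i=1}^{n} b^t x_i\, d\mu(b) \tag{16.150}$$

$$= \sum_{j^n} \int \prod_{i=1}^{n} b_{j_i} x_{i j_i}\, d\mu(b) \tag{16.151}$$

$$= \sum_{j^n} \hat{w}(j^n)\, x(j^n), \tag{16.152}$$

where $\hat{w}(j^n) = \int \prod_{i=1}^{n} b_{j_i}\, d\mu(b)$. Now applying Lemma 16.7.1, we have

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} = \frac{\sum_{j^n} \hat{w}(j^n)\, x(j^n)}{\sum_{j^n} w^*(j^n)\, x(j^n)} \tag{16.153}$$

$$\ge \min_{j^n} \frac{\hat{w}(j^n)\, x(j^n)}{w^*(j^n)\, x(j^n)} \tag{16.154}$$

$$= \min_{j^n} \frac{\int_B \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}}. \qquad \square \tag{16.155}$$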

We now apply this lemma when $\mu(b)$ is the Dirichlet($\tfrac{1}{2}$) distribution.

Theorem 16.7.2 For the causal universal portfolio $\hat{b}_i(\cdot)$, $i = 1, 2, \ldots$, given in (16.141), with $m = 2$ stocks and $d\mu(b)$ the Dirichlet($\tfrac{1}{2}, \tfrac{1}{2}$) distribution, we have

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \frac{1}{2\sqrt{n + 1}}$$

for all $n$ and all stock sequences $x^n$.

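Proof: As in the discussion preceding (16.112), we can show that the weight put by the best constant portfolio $b^*$ on the sequence $j^n$ is

$$\prod_{i=1}^{n} b^*_{j_i} = \left(\frac{k}{n}\right)^{k} \left(\frac{n-k}{n}\right)^{n-k} = 2^{-nH(k/n)}, \tag{16.156}$$

where $k$ is the number of indices where $j_i = 1$. We can also explicitly calculate the integral in the numerator of (16.148) in Lemma 16.7.2 for the Dirichlet($\tfrac{1}{2}$) density, defined for $m$ variables as

$$d\mu(b) = \frac{\Gamma\left(\frac{m}{2}\right)}{\left[\Gamma\left(\frac{1}{2}\right)\right]^{m}} \prod_{j=1}^{m} b_j^{-\frac{1}{2}}\, db, \tag{16.157}$$

where $\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\, dt$ denotes the gamma function. For simplicity, we consider the case of two stocks, in which case

$$d\mu(b) = \frac{1}{\pi} \frac{1}{\sqrt{b(1 - b)}}\, db, \qquad 0 \le b \le 1, \tag{16.158}$$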

where $b$ is the fraction of wealth invested in stock 1. Now consider any sequence $j^n \in \{1, 2\}^n$, and consider the amount invested in that sequence,

$$b(j^n) = \prod_{i=1}^{n} b_{j_i} = b^{l} (1 - b)^{n-l}, \tag{16.159}$$

where $l$ is the number of indices where $j_i = 1$. Then

$$\int b(j^n)\, d\mu(b) = \int b^{l} (1 - b)^{n-l}\, \frac{1}{\pi} \frac{1}{\sqrt{b(1 - b)}}\, db \tag{16.160}$$

$$= \frac{1}{\pi} \int b^{l - \frac{1}{2}} (1 - b)^{n - l - \frac{1}{2}}\, db \tag{16.161}$$

$$= \frac{1}{\pi} B\left(l + \tfrac{1}{2},\, n - l + \tfrac{1}{2}\right), \tag{16.162}$$

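where $B(\lambda_1, \lambda_2)$ is the beta function, defined as

$$B(\lambda_1, \lambda_2) = \int_0^1 x^{\lambda_1 - 1} (1 - x)^{\lambda_2 - 1}\, dx \tag{16.163}$$

$$= \frac{\Gamma(\lambda_1)\, \Gamma(\lambda_2)}{\Gamma(\lambda_1 + \lambda_2)} \tag{16.164}$$

and

$$\Gamma(\lambda) = \int_0^{\infty} x^{\lambda - 1} e^{-x}\, dx. \tag{16.165}$$

Note that for any integer $n$, $\Gamma(n + 1) = n!$ and $\Gamma\left(n + \tfrac{1}{2}\right) = \frac{1 \cdot 3 \cdot 5 \cdots (2n - 1)}{2^n} \sqrt{\pi}$.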

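We can calculate $B\left(l + \tfrac{1}{2}, n - l + \tfrac{1}{2}\right)$ by means of simple recursion using integration by parts. Alternatively, using (16.164), we obtain

$$B\left(l + \tfrac{1}{2},\, n - l + \tfrac{1}{2}\right) = \frac{\pi}{2^{2n}} \frac{\binom{2n}{n} \binom{n}{l}}{\binom{2n}{2l}}. \tag{16.166}$$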
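Both the half-integer gamma formula and the closed form (16.166) can be checked against the standard library's `math.gamma`. The sketch below is only a numerical sanity check; the function names are illustrative.

```python
from math import comb, gamma, isclose, pi, sqrt

def beta_via_gamma(l, n):
    """B(l + 1/2, n - l + 1/2) computed from (16.164)."""
    return gamma(l + 0.5) * gamma(n - l + 0.5) / gamma(n + 1)

def beta_closed_form(l, n):
    """Right-hand side of (16.166)."""
    return pi / 4**n * comb(2 * n, n) * comb(n, l) / comb(2 * n, 2 * l)

def gamma_half_integer(n):
    """Gamma(n + 1/2) = 1*3*5*...*(2n-1) / 2^n * sqrt(pi)."""
    odd_product = 1
    for j in range(1, 2 * n, 2):
        odd_product *= j
    return odd_product / 2**n * sqrt(pi)

for n in (1, 4, 10):
    for l in (0, n // 2, n):
        print(n, l, beta_via_gamma(l, n), beta_closed_form(l, n))
```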

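Combining all the results with Lemma 16.7.2, we have

$$\frac{\hat{S}_n(x^n)}{S_n^*(x^n)} \ge \min_{j^n} \frac{\int_B \prod_{i=1}^{n} b_{j_i}\, d\mu(b)}{\prod_{i=1}^{n} b^*_{j_i}} \tag{16.167}$$

$$\ge \min_{l} \frac{\frac{1}{\pi} B\left(l + \tfrac{1}{2},\, n - l + \tfrac{1}{2}\right)}{2^{-nH(l/n)}} \tag{16.168}$$

$$\ge \frac{1}{2\sqrt{n + 1}}, \tag{16.169}$$

using the results in [135, Theorem 2]. $\square$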
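The final inequality is quoted from the literature, but it can be spot-checked numerically. This sketch evaluates the ratio in (16.168) for every $l$ and compares the minimum with $1/(2\sqrt{n+1})$; it uses `math.lgamma` to avoid overflow for larger $n$, and the function names are illustrative.

```python
from math import exp, isclose, lgamma, pi, sqrt

def mixture_weight(l, n):
    """(1/pi) B(l + 1/2, n - l + 1/2): the Dirichlet(1/2, 1/2) weight from (16.162)."""
    return exp(lgamma(l + 0.5) + lgamma(n - l + 0.5) - lgamma(n + 1)) / pi

def hindsight_weight(l, n):
    """2^(-n H(l/n)) = (l/n)^l ((n-l)/n)^(n-l), as in (16.156)."""
    return (l / n) ** l * ((n - l) / n) ** (n - l)

def worst_ratio(n):
    """Minimum over l of the ratio appearing in (16.168)."""
    return min(mixture_weight(l, n) / hindsight_weight(l, n) for l in range(n + 1))

for n in (1, 10, 100):
    print(n, worst_ratio(n), 1 / (2 * sqrt(n + 1)))
```

In these runs the minimum is attained at the endpoints $l = 0$ and $l = n$, where the hindsight weight equals 1.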



It follows for $m = 2$ stocks that

$$\frac{\hat{S}_n}{S_n^*} \ge \frac{1}{\sqrt{2\pi}} V_n \tag{16.170}$$

for all $n$ and all market sequences $x_1, \ldots, x_n$. Thus, good minimax performance for all $n$ costs at most an extra factor $\sqrt{2\pi}$ over the fixed horizon minimax portfolio. The cost of universality is $V_n$, which is asymptotically negligible in the growth rate in the sense that

$$\frac{1}{n} \ln \hat{S}_n(x^n) - \frac{1}{n} \ln S_n^*(x^n) \ge \frac{1}{n} \ln \frac{V_n}{\sqrt{2\pi}} \to 0. \tag{16.171}$$

Thus, the universal causal portfolio achieves the same asymptotic growth rate of wealth as the best hindsight portfolio.

Let's now consider how this portfolio algorithm performs on two real stocks. We consider a 14-year period (ending in 2004) and two stocks, Hewlett-Packard and Altria (formerly Philip Morris), which are both components of the Dow Jones Index. Over these 14 years, HP went up by a factor of 11.8, while Altria went up by a factor of 11.5. The performance of the different constantly rebalanced portfolios that contain HP and Altria is shown in Figure 16.2. The best constantly rebalanced portfolio (which can be computed only in hindsight) achieves a growth factor of 18.7 using a mixture of about 51% HP and 49% Altria. The universal portfolio strategy described in this section achieves a growth factor of 15.7 without foreknowledge.

[Figure: value of the initial investment (vertical axis) plotted against the proportion b of wealth in HPQ (horizontal axis, 0 to 1).]

FIGURE 16.2. Performance of different constant rebalanced portfolios b for HP and Altria.


16.8 SHANNON–MCMILLAN–BREIMAN THEOREM (GENERAL AEP)

The AEP for ergodic processes has come to be known as the Shannon–McMillan–Breiman theorem. In Chapter 3 we proved the AEP for i.i.d. processes. In this section we offer a proof of the theorem for a general ergodic process. We prove the convergence of $\frac{1}{n} \log p(X^n)$ by sandwiching it between two ergodic sequences.

In a sense, an ergodic process is the most general dependent process for which the strong law of large numbers holds. For finite alphabet processes, ergodicity is equivalent to the convergence of the $k$th-order empirical distributions to their marginals for all $k$.

The technical definition requires some ideas from probability theory. To be precise, an ergodic source is defined on a probability space $(\Omega, \mathcal{B}, P)$, where $\mathcal{B}$ is a $\sigma$-algebra of subsets of $\Omega$ and $P$ is a probability measure. A random variable $X$ is defined as a function $X(\omega)$, $\omega \in \Omega$, on the probability space. We also have a transformation $T : \Omega \to \Omega$, which plays the role of a time shift. We will say that the transformation is stationary if $P(TA) = P(A)$ for all $A \in \mathcal{B}$. The transformation is called ergodic if every set $A$ such that $TA = A$, a.e., satisfies $P(A) = 0$ or 1. If $T$ is stationary and ergodic, we say that the process defined by $X_n(\omega) = X(T^n \omega)$ is stationary and ergodic. For a stationary ergodic source, Birkhoff's ergodic theorem states that

$$\frac{1}{n} \sum_{i=1}^{n} X_i(\omega) \to EX = \int X\, dP \quad \text{with probability 1.} \tag{16.172}$$

Thus, the law of large numbers holds for ergodic processes.

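We wish to use the ergodic theorem to conclude that

$$-\frac{1}{n} \log p(X_0, X_1, \ldots, X_{n-1}) = -\frac{1}{n} \sum_{i=0}^{n-1} \log p(X_i \mid X_0^{i-1}) \to \lim_{n \to \infty} E\left[-\log p(X_n \mid X_0^{n-1})\right]. \tag{16.173}$$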

But the stochastic sequence $p(X_i \mid X_0^{i-1})$ is not ergodic. However, the closely related quantities $p(X_i \mid X_{i-k}^{i-1})$ and $p(X_i \mid X_{-\infty}^{i-1})$ are ergodic and have expectations easily identified as entropy rates. We plan to sandwich $p(X_i \mid X_0^{i-1})$ between these two more tractable processes.
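The convergence being sought can be observed in the simplest dependent case. The sketch below simulates a stationary two-state Markov chain, for which $p(X_i \mid X_0^{i-1})$ reduces to the one-step transition probability, and compares $-\frac{1}{n} \log_2 p(X_0, \ldots, X_{n-1})$ with the entropy rate. The transition probabilities in `P`, the sample size, and the seed are assumptions made for illustration only.

```python
import random
from math import log2

# Hypothetical transition probabilities: P[a][b] = Pr(next = b | current = a)
P = {0: (0.9, 0.1), 1: (0.4, 0.6)}
STATIONARY = (0.8, 0.2)   # solves pi = pi P for the chain above

def entropy_rate():
    """Entropy rate H(X_1 | X_0) of the stationary chain, in bits."""
    return -sum(STATIONARY[a] * P[a][b] * log2(P[a][b])
                for a in (0, 1) for b in (0, 1))

def sample_neg_log_prob_rate(n, seed=0):
    """Draw X_0, ..., X_{n-1} and return -(1/n) log2 p(X_0, ..., X_{n-1})."""
    rng = random.Random(seed)
    x = 0 if rng.random() < STATIONARY[0] else 1
    total = -log2(STATIONARY[x])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[x][0] else 1
        total -= log2(P[x][nxt])      # -log2 p(X_i | X_{i-1})
        x = nxt
    return total / n

print(sample_neg_log_prob_rate(100_000), entropy_rate())
```

A Markov chain is a special case in which no sandwich is needed; the general proof has to handle processes whose conditional probabilities depend on the entire past.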