Kao M.-Y. (ed.) Encyclopedia of Algorithms


378 G Greedy Approximation Algorithms

where c is a nonnegative cost function defined on $2^{X}$ and $\Omega_f = \{C \mid f(C \cup \{x\}) - f(C) = 0 \text{ for all } x \in X\}$. The following is a greedy algorithm that produces an approximate solution for this problem. Here $\Delta_x f(A)$ denotes the marginal value $f(A \cup \{x\}) - f(A)$.

Greedy Algorithm B

input submodular function f and cost function c;
$A \leftarrow \emptyset$;
while there exists $x \in E$ such that $\Delta_x f(A) > 0$
    do select a vertex x that maximizes $\Delta_x f(A)/c(x)$ and set $A \leftarrow A \cup \{x\}$;
return A.
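The loop of Greedy Algorithm B can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_b`, the per-element cost callable `c`, the treatment of zero-cost elements as having infinite ratio, and the toy coverage instance are all assumptions made for the example.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def greedy_b(ground: Iterable[Hashable],
             f: Callable[[FrozenSet], float],
             c: Callable[[Hashable], float]) -> Set:
    """Grow A while some element x has positive marginal value Delta_x f(A)."""
    elements = list(ground)
    A: Set = set()
    while True:
        base = f(frozenset(A))
        best, best_ratio = None, 0.0
        for x in elements:
            if x in A:
                continue
            gain = f(frozenset(A | {x})) - base      # Delta_x f(A)
            if gain <= 0:
                continue
            cost = c(x)
            # Assumption: a zero-cost element with positive gain is always preferred.
            ratio = float("inf") if cost == 0 else gain / cost
            if best is None or ratio > best_ratio:
                best, best_ratio = x, ratio
        if best is None:                             # no x with Delta_x f(A) > 0
            return A
        A.add(best)


# Hypothetical toy instance: f counts the points covered by the chosen sets.
SETS = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1, 5}}
coverage = lambda A: len(set().union(*(SETS[s] for s in A))) if A else 0
chosen = greedy_b(SETS, coverage, lambda s: 1.0)
```

With unit costs the sketch first picks `a` (three new points) and then `c` (two new points), after which every marginal value is zero and the loop stops.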

The following two results are well-known.

Theorem 1 If f is a normalized, monotone increasing, submodular integer function, then Greedy Algorithm B produces an approximation solution within a factor of $H(\gamma)$ from optimal, where $\gamma = \max_{x \in E} f(\{x\})$.

Theorem 2 Let f be a normalized, monotone increasing, submodular function and c a nonnegative cost function. If in Greedy Algorithm B, selected x always satisfies $\Delta_x f(A_{i-1})/c(x) \ge 1$, then it produces an approximation solution within a factor of $1 + \ln(f^*/\mathrm{opt})$ from optimal for the above minimization problem, where $f^* = f(A^*)$ and $\mathrm{opt} = c(A^*)$ for an optimal solution $A^*$.

Now, come back to the analysis of Greedy Algorithm A for the MCDS. It looks as if the submodularity of $-f$ is not used. Actually, the submodularity was implicitly used in the following statement:

“Since adding $C^*$ to $C_i$ will reduce the potential function value from $f(C_i)$ to 2, the value of f reduced by a vertex in $C^*$ would be $(f(C_i) - 2)/\mathrm{opt}$ on average. By the greedy rule for choosing $x_{i+1}$, one has

$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}}.$$”

To see this, write this argument more carefully. Let $C^* = \{y_1, \ldots, y_{\mathrm{opt}}\}$ and denote $C^*_j = \{y_1, \ldots, y_j\}$. Then

$$f(C_i) - 2 = f(C_i) - f(C_i \cup C^*) = \sum_{j=1}^{\mathrm{opt}} \bigl[f(C_i \cup C^*_{j-1}) - f(C_i \cup C^*_j)\bigr]$$

where $C^*_0 = \emptyset$. By the greedy rule for choosing $x_{i+1}$, one has

$$f(C_i) - f(C_{i+1}) \ge f(C_i) - f(C_i \cup \{y_j\})$$

for $j = 1, \ldots, \mathrm{opt}$. Therefore, it needs to have

$$-\Delta_{y_j} f(C_i) = f(C_i) - f(C_i \cup \{y_j\}) \ge f(C_i \cup C^*_{j-1}) - f(C_i \cup C^*_j) = -\Delta_{y_j} f(C_i \cup C^*_{j-1}) \qquad (2)$$

in order to have

$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}}.$$

(2) asks for the submodularity of $-f$. Unfortunately, $-f$ is not submodular. A counterexample can be found in [3]. This is why the analysis of Greedy Algorithm A in Sect. “Problem Definition” is incorrect.

Giving up Submodularity

Giving up submodularity is a challenging task, since the problem has been open for a long time. But it is possible, based on the following observation on (2) by Du et al. [1]: the submodularity of $-f$ is applied to the increment of a vertex $y_j$ belonging to the optimal solution $C^*$. Since the ordering of the $y_j$'s is flexible, one may arrange it to keep $\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1})$ under control. This is a successful idea for the MCDS.

Lemma 3 Let the $y_j$'s be ordered in such a way that for any $j = 1, \ldots, \mathrm{opt}$, $\{y_1, \ldots, y_j\}$ induces a connected subgraph. Then

$$\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1}) \le 1.$$

Proof Since all $y_1, \ldots, y_{j-1}$ are connected, $y_j$ can dominate at most one additional connected component in the subgraph induced by $C_i \cup C^*_{j-1}$ than in the subgraph induced by $C_i$. Hence

$$\Delta_{y_j} p(C_i) - \Delta_{y_j} p(C_i \cup C^*_{j-1}) \le 1.$$

Moreover, since $-q$ is submodular,

$$\Delta_{y_j} q(C_i) - \Delta_{y_j} q(C_i \cup C^*_{j-1}) \le 0.$$

Therefore,

$$\Delta_{y_j} f(C_i) - \Delta_{y_j} f(C_i \cup C^*_{j-1}) \le 1.$$

Now, one can give a correct analysis for the greedy algo-

rithm for the MCDS [4].

By Lemma 3,

$$f(C_i) - f(C_{i+1}) \ge \frac{f(C_i) - 2}{\mathrm{opt}} - 1.$$


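Hence,

$$f(C_{i+1}) - 2 - \mathrm{opt} \le (f(C_i) - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right) \le (f(\emptyset) - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1} = (n - 2 - \mathrm{opt})\left(1 - \frac{1}{\mathrm{opt}}\right)^{i+1},$$

where $n = |V|$. Note that $1 - 1/\mathrm{opt} \le e^{-1/\mathrm{opt}}$. Hence,

$$f(C_i) - 2 - \mathrm{opt} \le (n - 2)e^{-i/\mathrm{opt}}.$$

Choose i such that $f(C_i) \ge 2 \cdot \mathrm{opt} + 2 > f(C_{i+1})$. Then

$$\mathrm{opt} \le (n - 2)e^{-i/\mathrm{opt}}$$

and

$$g - i \le 2 \cdot \mathrm{opt}.$$

Therefore,

$$g \le 2 \cdot \mathrm{opt} + i \le \mathrm{opt}\left(2 + \ln\frac{n - 2}{\mathrm{opt}}\right) \le \mathrm{opt}(2 + \ln\delta)$$

where $\delta$ is the maximum degree of input graph G.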
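The last two steps are stated tersely, so the following sketch spells them out. The bound $n \le (\delta - 1)\,\mathrm{opt} + 2$ is not stated in the text; it is supplied here as an assumption, justified by the observation that a connected dominating set with opt vertices spans a tree with opt − 1 edges, so its vertices have at most $\delta \cdot \mathrm{opt} - 2(\mathrm{opt} - 1)$ neighbors outside the set.

```latex
\begin{align*}
f(C_i) \ge 2\,\mathrm{opt} + 2
  &\;\Longrightarrow\; \mathrm{opt} \le f(C_i) - 2 - \mathrm{opt} \le (n-2)\,e^{-i/\mathrm{opt}}
  \;\Longrightarrow\; i \le \mathrm{opt}\cdot\ln\frac{n-2}{\mathrm{opt}}, \\
g \le 2\,\mathrm{opt} + i
  &\le \mathrm{opt}\Bigl(2 + \ln\frac{n-2}{\mathrm{opt}}\Bigr), \\
n \le (\delta - 1)\,\mathrm{opt} + 2
  &\;\Longrightarrow\; \frac{n-2}{\mathrm{opt}} \le \delta - 1 \le \delta .
\end{align*}
```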

Applications

The technique introduced in previous section has many

applications, including analysis of iterated 1-Steiner trees

for minimum Steiner tree problem and analysis of greedy

approximations for optimization problems in optical net-

works [4] and wireless networks [3].

Open Problems

Can one show the performance ratio $1 + H(\delta)$ for Greedy Algorithm B for the MCDS? The answer is unknown.

More generally, it is unknown how to get a clean generalization of Theorem 1.

Cross References

 Connected Dominating Set

 Local Search Algorithms for kSAT

 Steiner Trees

Acknowledgments

Weili Wu is partially supported by NSF grant ACI-0305567.

Recommended Reading

1. Du, D.-Z., Graham, R.L., Pardalos, P.M., Wan, P.-J., Wu, W., Zhao, W.: Analysis of greedy approximations with nonsubmodular potential functions. ACM-SIAM Symposium on Discrete Algorithms (SODA), 2008

3. Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Optimization. Wiley, Hoboken (1999)

4. Ruan, L., Du, H., Jia, X., Wu, W., Li, Y., Ko, K.-I.: A greedy approximation for minimum connected dominating set. Theor. Comput. Sci. 329, 325–330 (2004)

5. Ruan, L., Wu, W.: Broadcast routing with minimum wavelength conversion in WDM optical networks. J. Comb. Optim. 9, 223–235 (2005)

Greedy Set-Cover Algorithms

1974–1979; Chvátal, Johnson, Lovász, Stein

NEAL E. YOUNG

Department of Computer Science, University

of California at Riverside, Riverside, CA, USA

Keywords and Synonyms

Dominating set; Greedy algorithm; Hitting set; Set cover;

Minimizing a linear function subject to a submodular con-

straint

Problem Definition

Given a collection $\mathcal{S}$ of sets over a universe U, a set cover $C \subseteq \mathcal{S}$ is a subcollection of the sets whose union is U. The set-cover problem is, given $\mathcal{S}$, to find a minimum-cardinality set cover. In the weighted set-cover problem, for each set $s \in \mathcal{S}$ a weight $w_s \ge 0$ is also specified, and the goal is to find a set cover C of minimum total weight $\sum_{s \in C} w_s$.

Weighted set cover is a special case of minimizing a linear function subject to a submodular constraint, defined as follows. Given a collection $\mathcal{S}$ of objects, for each object s a non-negative weight $w_s$, and a non-decreasing submodular function $f : 2^{\mathcal{S}} \to \mathbb{R}$, the goal is to find a subcollection $C \subseteq \mathcal{S}$ such that $f(C) = f(\mathcal{S})$ minimizing $\sum_{s \in C} w_s$. (Taking $f(C) = |\bigcup_{s \in C} s|$ gives weighted set cover.)

Key Results

The greedy algorithm for weighted set cover builds a cover by repeatedly choosing a set s that minimizes the weight $w_s$ divided by the number of elements in s not yet covered by chosen sets. It stops and returns the chosen sets when they form a cover:


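greedy-set-cover($\mathcal{S}$, w)
1. Initialize $C \leftarrow \emptyset$. Define $f(C) \doteq |\bigcup_{s \in C} s|$.
2. Repeat until $f(C) = f(\mathcal{S})$:
3.     Choose $s \in \mathcal{S}$ minimizing the price per element $w_s/[f(C \cup \{s\}) - f(C)]$.
4.     Let $C \leftarrow C \cup \{s\}$.
5. Return C.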
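The pseudocode above translates almost line for line into Python. The following is a minimal sketch: the dictionary-based input format, the function name `greedy_set_cover`, and the small weighted instance are assumptions made for illustration, and ties are broken by dictionary order.

```python
from typing import Dict, FrozenSet, Hashable, List


def greedy_set_cover(sets: Dict[Hashable, FrozenSet],
                     w: Dict[Hashable, float]) -> List[Hashable]:
    """Return the names of the chosen sets, in the order greedy picks them."""
    universe = frozenset().union(*sets.values())
    covered: set = set()
    cover: List[Hashable] = []
    while covered != universe:
        # Price per element: weight divided by the number of newly covered
        # elements. Sets that cover nothing new are skipped.
        name = min((s for s in sets if sets[s] - covered),
                   key=lambda s: w[s] / len(sets[s] - covered))
        cover.append(name)
        covered |= sets[name]
    return cover


# Hypothetical instance over the universe {1, ..., 6}.
SETS = {"A": frozenset({1, 2, 3, 4}), "B": frozenset({1, 2, 5}),
        "C": frozenset({3, 4, 6}), "D": frozenset({5, 6})}
W = {"A": 3.0, "B": 1.0, "C": 1.0, "D": 1.0}
cover = greedy_set_cover(SETS, W)
```

On this instance the prices in the first round are 3/4, 1/3, 1/3 and 1/2, so `B` is chosen first and `C` second, giving a cover of weight 2.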

Let $H_k$ denote $\sum_{i=1}^{k} 1/i \approx \ln k$, where k is the largest set size.

Theorem 1 The greedy algorithm returns a set cover of weight at most $H_k$ times the minimum weight of any cover.

Proof When the greedy algorithm chooses a set s, imagine that it charges the price per element for that iteration to each element newly covered by s. Then the total weight of the sets chosen by the algorithm equals the total amount charged, and each element is charged once.

Consider any set $s = \{x_k, x_{k-1}, \ldots, x_1\}$ in the optimal set cover $C^*$. Without loss of generality, suppose that the greedy algorithm covers the elements of s in the order given: $x_k, x_{k-1}, \ldots, x_1$. At the start of the iteration in which the algorithm covers element $x_i$ of s, at least i elements of s remain uncovered. Thus, if the greedy algorithm were to choose s in that iteration, it would pay a cost per element of at most $w_s/i$. Thus, in this iteration, the greedy algorithm pays at most $w_s/i$ per element covered. Thus, it charges element $x_i$ at most $w_s/i$ to be covered. Summing over i, the total amount charged to elements in s is at most $w_s H_k$. Summing over $s \in C^*$ and noting that every element is in some set in $C^*$, the total amount charged to elements overall is at most $\sum_{s \in C^*} w_s H_k = H_k\,\mathrm{OPT}$. □

The theorem was shown first for the unweighted case (each $w_s = 1$) by Johnson [6], Lovász [9], and Stein [14], then extended to the weighted case by Chvátal [2].

Since then a few reﬁnements and improvements have

been shown, including the following:

Theorem 2 Let $\mathcal{S}$ be a set system over a universe with n elements and weights $w_s \le 1$. The total weight of the cover C returned by the greedy algorithm is at most $[1 + \ln(n/\mathrm{OPT})]\,\mathrm{OPT} + 1$ (compare to [13]).

Proof Assume without loss of generality that the algorithm covers the elements in order $x_n, x_{n-1}, \ldots, x_1$. At the start of the iteration in which the algorithm covers $x_i$, there are at least i elements left to cover, and all of them could be covered using multiple sets of total cost OPT. Thus, there is some set that covers not-yet-covered elements at a cost of at most OPT/i per element.

Recall the charging scheme from the previous proof. By the preceding observation, element $x_i$ is charged at most OPT/i. Thus, the total charge to elements $x_n, \ldots, x_i$ is at most $(H_n - H_{i-1})\,\mathrm{OPT}$. Using the assumption that each $w_s \le 1$, the charge to each of the remaining elements is at most 1 per element. Thus, the total charge to all elements is at most $i - 1 + (H_n - H_{i-1})\,\mathrm{OPT}$. Taking $i = 1 + \lceil \mathrm{OPT} \rceil$, the total charge is at most $\lceil \mathrm{OPT} \rceil + (H_n - H_{\lceil \mathrm{OPT} \rceil})\,\mathrm{OPT} \le 1 + \mathrm{OPT}(1 + \ln(n/\mathrm{OPT}))$. □

Each of the above proofs implicitly constructs a linear-programming primal-dual pair to show the approximation ratio. The same approximation ratios can be shown with respect to any fractional optimum (solution to the fractional set-cover linear program).

Other Results

The greedy algorithm has been shown to have an approximation ratio of $\ln n - \ln\ln n + O(1)$ [12]. For the special case of set systems whose duals have finite Vapnik-Chervonenkis (VC) dimension, other algorithms have substantially better approximation ratio [1]. Constant-factor approximation algorithms are known for geometric variants of the closely related k-median and facility location problems.

The greedy algorithm generalizes naturally to many problems. For example, for minimizing a linear function subject to a submodular constraint (defined above), the natural extension of the greedy algorithm gives an $H_k$-approximate solution, where $k = \max_{s \in \mathcal{S}} f(\{s\}) - f(\emptyset)$, assuming f is integer-valued [10].

The set-cover problem generalizes to allow each element x to require an arbitrary number $r_x$ of sets containing it to be in the cover. This generalization admits a polynomial-time $O(\log n)$-approximation algorithm [8].

The special case when each element belongs to at most r sets has a simple r-approximation algorithm ([15] §15.2). When the sets have uniform weights ($w_s = 1$), the algorithm reduces to the following: select any maximal collection of elements, no two of which are contained in the same set; return all sets that contain a selected element.
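The uniform-weight rule just described can be sketched as follows. The function name, the input format and the four-set instance are assumptions made for illustration; elements are scanned in sorted order only to make the result deterministic, since any maximal collection works.

```python
def frequency_bounded_cover(sets):
    """Pick a maximal collection of elements, no two in a common set,
    and return it together with all sets containing a picked element."""
    universe = set().union(*sets.values())
    cover = set()        # names of sets that contain an already selected element
    selected = []
    for x in sorted(universe):
        containing = {name for name, members in sets.items() if x in members}
        if containing.isdisjoint(cover):   # x shares no set with earlier picks
            selected.append(x)
            cover |= containing
    return selected, cover


# Hypothetical instance in which every element lies in exactly r = 2 sets.
SETS = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}, "D": {4, 1}}
selected, cover = frequency_bounded_cover(SETS)
```

Any cover needs a distinct set for each selected element, and the returned cover uses at most r sets per selected element, which is where the factor r comes from. Here elements 1 and 3 are selected and all four sets are returned, against an optimum of two sets.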

The variant “Max k-coverage” asks for a set collection of total weight at most k covering as many of the elements as possible. This variant has a $(1 - 1/e)$-approximation algorithm ([15] Problem 2.18) (see [7] for sets with non-uniform weights).
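For unit weights, "total weight at most k" means choosing at most k sets, and the natural greedy rule is to take the set covering the most new elements each round. The sketch below assumes that unit-weight setting; the function name and the instance are hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Choose at most k sets, each time taking the one that covers the most
    not-yet-covered elements (unit weights assumed)."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda s: len(sets[s] - covered), default=None)
        if best is None or not sets[best] - covered:
            break                      # nothing new can be covered
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical instance.
SETS = {"A": {1, 2, 3}, "B": {3, 4, 5, 6}, "C": {6, 7}}
chosen2, covered2 = greedy_max_coverage(SETS, 2)
chosen5, covered5 = greedy_max_coverage(SETS, 5)
```

With k = 2 the sketch picks `B` and then `A`, covering six of the seven elements; with a larger budget it stops as soon as no set adds anything new.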

For a general discussion of greedy methods for approximate combinatorial optimization, see ([5] Ch. 4).

Finally, under likely complexity-theoretic assump-

tions, the ln n approximation ratio is essentially the best

possible for any polynomial-time algorithm [3,4].


Applications

Set Cover and its generalizations and variants are funda-

mental problems with numerous applications. Examples

include:

 selecting a small number of nodes in a network to store

a ﬁle so that all nodes have a nearby copy,

 selecting a small number of sentences to be uttered to

tune all features in a speech-recognition model [11],

 selecting a small number of telescope snapshots to be

taken to capture light from all galaxies in the night sky,

 ﬁnding a short string having each string in a given set

as a contiguous sub-string.

Cross References

 Local Search for K-medians and Facility Location

Recommended Reading

1. Brönnimann, H., Goodrich, M.T.: Almost optimal set covers in

finite VC-dimension. Discret. Comput. Geom. 14(4), 463–479

(1995)

2. Chvátal, V.: A greedy heuristic for the set-covering problem.

Math. Oper. Res. 4(3), 233–235 (1979)

3. Lund, C., Yannakakis, M.: On the hardness of approximating

minimization problems. J. ACM 41(5), 960–981 (1994)

4. Feige, U.: A threshold of ln n for approximating set cover.

J. ACM 45(4), 634–652 (1998)

5. Gonzalez, T.F.: Handbook of Approximation Algorithms and

Metaheuristics. Chapman & Hall/CRC Computer & Information

Science Series (2007)

6. Johnson, D.S.: Approximation algorithms for combinatorial

problems. J. Comput. Syst. Sci. 9, 256–278 (1974)

7. Khuller, S., Moss, A., Naor, J.: The budgeted maximum coverage

problem. Inform. Process. Lett. 70(1), 39–45 (1999)

8. Kolliopoulos, S.G., Young, N.E.: Tight approximation results

for general covering integer programs. In: Proceedings of the

forty-second annual IEEE Symposium on Foundations of Com-

puter Science, pp. 522–528 (2001)

9. Lovász, L.: On the ratio of optimal integral and fractional cov-

ers. Discret. Math. 13, 383–390 (1975)

10. Nemhauser, G.L., Wolsey, L.A.: Integer and Combinatorial Opti-

mization. Wiley, New York (1988)

11. van Santen, J.P.H., Buchsbaum, A.L.: Methods for optimal text

selection. In: Proceedings of the European Conference on

Speech Communication and Technology (Rhodos, Greece) 2,

553–556 (1997)

12. Slavik, P.: A tight analysis of the greedy algorithm for set cover.

J. Algorithms 25(2), 237–254 (1997)

13. Srinivasan, A.: Improved approximations of packing and cov-

ering problems. In: Proceedings of the twenty-seventh annual

ACM Symposium on Theory of Computing, pp. 268–276 (1995)

14. Stein, S.K.: Two combinatorial covering theorems. J. Comb.

Theor. A 16, 391–397 (1974)

15. Vazirani, V.V.: Approximation Algorithms. Springer, Berlin Hei-

delberg (2001)


Hamilton Cycles in Random

Intersection Graphs

2005; Efthymiou, Spirakis

CHARILAOS EFTHYMIOU, PAUL SPIRAKIS

Department of Computer Engineering and Informatics,

University of Patras, Patras, Greece

Computer Engineering and Informatics, Research

and Academic Computer Technology Institute,

Patras University, Patras, Greece

Keywords and Synonyms

Threshold for appearance of Hamilton cycles in random

intersection graphs; Stochastic order relations between

Erdös–Rényi random graph model and random intersec-

tion graphs

Problem Definition

E. Marczewski proved that every graph can be represented

by a list of sets where each vertex corresponds to a set and

the edges to nonempty intersections of sets. It is natural to

ask what sort of graphs would be most likely to arise if the

list of sets is generated randomly.

Consider the model of random graphs where each vertex chooses randomly from a universal set the members of its corresponding set, each independently of the others. The probability space that is created is the space of random intersection graphs, $G_{n,m,p}$, where n is the number of vertices, m is the cardinality of a universal set of elements and p is the probability for each vertex to choose an element of the universal set. The model of random intersection graphs was first introduced by M. Karoński, E. Scheinerman, and K. Singer-Cohen in [4]. A rigorous definition of the model of random intersection graphs follows:

Definition 1 Let n, m be positive integers and $0 \le p \le 1$. The random intersection graph $G_{n,m,p}$ is a probability space over the set of graphs on the vertex set $\{1, \ldots, n\}$ where each vertex is assigned a random subset from a fixed set of m elements. An edge arises between two vertices when their sets have at least one common element. Each random subset assigned to a vertex is determined by

$$\Pr[\text{vertex } i \text{ chooses element } j] = p$$

with these events mutually independent.
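Definition 1 is directly a sampling procedure, which the following sketch makes explicit. The function name, the use of Python's `random.Random` with an optional seed, and the labeling of elements as 0, ..., m − 1 are assumptions made for illustration.

```python
import random
from itertools import combinations


def sample_random_intersection_graph(n, m, p, seed=None):
    """Draw one graph from G_{n,m,p}: vertex i keeps element j with probability p,
    independently for all pairs (i, j); two vertices are adjacent when their
    element sets intersect."""
    rng = random.Random(seed)
    chosen = {v: {j for j in range(m) if rng.random() < p}
              for v in range(1, n + 1)}
    edges = {(u, v) for u, v in combinations(range(1, n + 1), 2)
             if chosen[u] & chosen[v]}
    return chosen, edges


# Two degenerate cases: p = 0 gives the empty graph, p = 1 the complete graph.
_, no_edges = sample_random_intersection_graph(5, 4, 0.0, seed=1)
_, all_edges = sample_random_intersection_graph(5, 4, 1.0, seed=1)
```

Unlike in the Erdös–Rényi model, the edges produced this way are not independent: two vertices that share an element with a third vertex are more likely to share one with each other.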

A common question for a graph is whether it has a cycle (a set of edges that form a path whose first and last vertex are the same) that visits every vertex of the graph exactly once. We call this kind of cycle a Hamilton cycle, and a graph that contains such a cycle is called a Hamiltonian graph.

Deﬁnition 2 Consider an undirected graph G =(V ; E)

where V is the set of vertices and E the set of edges. This

graph contains a Hamilton cycle if and only if there is

a simple cycle that contains each vertex in V.

Consider an instance of $G_{n,m,p}$ for specific values of its parameters n, m, and p: what is the probability that the instance is Hamiltonian? Taking the parameter p of the model to be a function of n and m, in [2] a threshold function P(n, m) has been found for the graph property “Contains a Hamilton cycle”; i.e. a function P(n, m) is derived such that

if $p(n, m) \ll P(n, m)$

$$\lim_{n,m \to \infty} \Pr\bigl[G_{n,m,p} \text{ contains Hamilton cycle}\bigr] = 0$$

if $p(n, m) \gg P(n, m)$

$$\lim_{n,m \to \infty} \Pr\bigl[G_{n,m,p} \text{ contains Hamilton cycle}\bigr] = 1$$

When a graph property, such as “Contains a Hamilton

cycle,” holds with probability that tends to 1 (or 0) as n,

m tend to inﬁnity, then it is said that this property holds

(does not hold), “almost surely” or “almost certainly.”

If in $G_{n,m,p}$ the parameter m is very small compared to n, the model is not particularly interesting and when m is exceedingly large (compared to n) the behavior of $G_{n,m,p}$ is essentially the same as the Erdös–Rényi model


of random graphs (see [3]). If someone takes $m = \lceil n^{\alpha} \rceil$ for fixed real $\alpha > 0$, then there is some deviation from the standard models, while allowing for a natural progression from sparse to dense graphs. Thus, the parameter m is assumed to be of the form $m = \lceil n^{\alpha} \rceil$ for some fixed positive real $\alpha$.

The proof of existence of a Hamilton cycle in $G_{n,m,p}$ is mainly based on the establishment of a stochastic order relation between the model $G_{n,m,p}$ and the Erdös–Rényi random graph model $G_{n,\hat{p}}$.

Definition 3 Let n be a positive integer, $0 \le \hat{p} \le 1$. The random graph $G(n, \hat{p})$ is a probability space over the set of graphs on the vertex set $\{1, \ldots, n\}$ determined by

$$\Pr[\{i, j\}] = \hat{p}$$

with these events mutually independent.

The stochastic order relation between the two models of random graphs is established in the sense that if $\mathcal{A}$ is an increasing graph property, then it holds that

$$\Pr\bigl[G_{n,\hat{p}} \in \mathcal{A}\bigr] \le \Pr\bigl[G_{n,m,p} \in \mathcal{A}\bigr]$$

where $\hat{p} = f(p)$. A graph property $\mathcal{A}$ is increasing if and only if, given that $\mathcal{A}$ holds for a graph $G(V, E)$, $\mathcal{A}$ also holds for any $G(V, E')$ with $E' \supseteq E$.

Key Results

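Theorem 1 Let $m = \lceil n^{\alpha} \rceil$, where $\alpha$ is a fixed positive real, and let $C_1$, $C_2$ be sufficiently large constants. If

$$p \ge C_1 \frac{\log n}{m} \quad \text{for } 0 < \alpha < 1$$

or

$$p \ge C_2 \sqrt{\frac{\log n}{nm}} \quad \text{for } \alpha > 1$$

then almost all $G_{n,m,p}$ are Hamiltonian. Our bounds are asymptotically tight.

Note that the theorem above says nothing when m = n, i.e. $\alpha = 1$.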

Applications

The Erdös–Rényi model of random graphs, $G_{n,p}$, is exhaustively studied in computer science because it provides a framework for studying practical problems such as “reliable network computing” or it provides a “typical instance” of a graph and thus it is used for average case analysis of graph algorithms. However, the simplicity of $G_{n,p}$ means it is not able to capture satisfactorily many practical problems in computer science. Basically, this is because in many problems independent edge-events are not well justified. For example, consider a graph whose vertices represent a set of objects that either are placed or move in a specific geographical region, and the edges are radio communication links. In such a graph, we expect that any two vertices u, w are more likely to be adjacent to each other than any other arbitrary pair of vertices, if both are adjacent to a third vertex v. Even epidemiological phenomena (like the spread of disease) tend to be more accurately captured by this proximity-sensitive random intersection graph model. Other applications may include oblivious resource sharing in a distributive setting, interaction of mobile agents traversing the web etc.

The model of random intersection graphs $G_{n,m,p}$ was first introduced by M. Karoński, E. Scheinerman, and K. Singer-Cohen in [4] where they explored the evolution of random intersection graphs by studying the thresholds for the appearance and disappearance of small induced subgraphs. Also, J.A. Fill, E.R. Scheinerman, and K. Singer-Cohen in [3] proved an equivalence theorem relating the evolution of $G_{n,m,p}$ and $G_{n,p}$; in particular they proved that when $m = n^{\alpha}$ where $\alpha > 6$, the total variation distance between the graph random variables has limit 0. S. Nikoletseas, C. Raptopoulos, and P. Spirakis in [8] studied the existence and the efficient algorithmic construction of close to optimal independent sets in random intersection graphs. D. Stark in [12] studied the degree of the vertices of the random intersection graphs. After [2], Spirakis and Raptopoulos, in [11], provide algorithms that construct Hamilton cycles in instances of $G_{n,m,p}$, for p above the Hamiltonicity threshold. Finally, Nikoletseas et al. in [7] study the mixing time and cover time as the parameter p of the model varies.

Open Problems

As in many other random structures, e.g. $G_{n,p}$ and random formulae, properties of random intersection graphs also appear to have threshold behavior. So far threshold behavior has been studied for the induced subgraph appearance and hamiltonicity.

Other fields of research for random intersection graphs may include the study of the connectivity behavior of the model, i.e. the path formation and the formation of giant components. Additionally, a very interesting research question is how cover and mixing times vary with the parameter p of the model.

Cross References

 Independent Sets in Random Intersection Graphs


Recommended Reading

1. Alon, N., Spencer, J.H.: The Probabilistic Method. 2nd edn. Wi-

ley, New York (2000)

2. Efthymiou, C., Spirakis, P.G.: On the Existence of Hamilton Cy-

cles in Random Intersection Graphs. In: Proc. of the 32nd ICALP.

LNCS, vol. 3580, pp. 690–701. Springer, Berlin/Heidelberg

(2005)

3. Fill, J.A., Scheinerman, E.R., Singer-Cohen, K.B.: Random intersection graphs when m = ω(n): an equivalence theorem relating the evolution of the G(n, m, p) and G(n, p) models. Random Struct. Algorithms 16, 156–176 (2000)

4. Karoński, M., Scheinerman, E.R., Singer-Cohen, K.: On Random Intersection Graphs: The Subgraph Problem. Comb. Probab. Comput. 8, 131–159 (1999)

5. Komlós, J., Szemerédi, E.: Limit Distributions for the existence

of Hamilton cycles in a random graph. Discret. Math. 43, 55–63

(1983)

6. Korshunov, A.D.: Solution of a problem of P. Erdös and A. Rényi on Hamilton Cycles in non-oriented graphs. Metody Diskr. Anal. Teoriy Upr. Syst. Sb. Trudov Novosibirsk 31, 17–56 (1977)

7. Nikoletseas, S., Raptopoulos, C., Spirakis, P.: Expander Proper-

ties and the Cover Time of Random Intersection Graphs. In:

Proc of the 32nd MFCS, pp. 44–55. Springer, Berlin/Heidelberg

(2007)

8. Nikoletseas, S., Raptopoulos, C., Spirakis, P.: The existence and

Efficient construction of Large Independent Sets in General

Random Intersection Graphs. In: Proc. of the 31st ICALP. LNCS,

vol. 3142, pp. 1029–1040. Springer, Berlin/Heidelberg (2004)

10. Singer, K.: Random Intersection Graphs. Ph.D. thesis, The Johns

Hopkins University, Baltimore (1995)

11. Spirakis, P.G., Raptopoulos, C.: Simple and Efficient Greedy Algorithms for Hamilton Cycles in Random Intersection Graphs. In: Proc. of the 16th ISAAC. LNCS, vol. 3827, pp. 493–504. Springer, Berlin/Heidelberg (2005)

12. Stark, D.: The Vertex Degree Distribution of Random Intersec-

tion Graphs. Random Struct. Algorithms 24, 249–258 (2004)

Hardness of Proper Learning

1988; Pitt, Valiant

VITALY FELDMAN

Department of Engineering and Applied Sciences,

Harvard University, Cambridge, MA, USA

Keywords and Synonyms

Representation-based hardness of learning

Problem Definition

The work of Pitt and Valiant [16] deals with learning Boolean functions in the Probably Approximately Correct (PAC) learning model introduced by Valiant [17]. A learning algorithm in Valiant's original model is given random examples of a function $f : \{0,1\}^n \to \{0,1\}$ from a representation class $\mathcal{F}$ and produces a hypothesis $h \in \mathcal{F}$ that closely approximates f. Here a representation class is a set of functions and a language for describing the functions in the set. The authors give examples of natural representation classes that are NP-hard to learn in this model whereas they can be learned if the learning algorithm is allowed to produce hypotheses from a richer representation class $\mathcal{H}$. Such an algorithm is said to learn $\mathcal{F}$ by $\mathcal{H}$; learning $\mathcal{F}$ by $\mathcal{F}$ is called proper learning.

The results of Pitt and Valiant were the ﬁrst to demon-

strate that the choice of representation of hypotheses can

have a dramatic impact on the computational complex-

ity of a learning problem. Their speciﬁc reductions from

NP-hard problems are the basis of several other follow-up

works on the hardness of proper learning [1,3,6].

Notation

Learning in the PAC model is based on the assumption that the unknown function (or concept) belongs to a certain class of concepts $\mathcal{C}$. In order to discuss algorithms that learn and output functions one needs to define how these functions are represented. Informally, a representation for a concept class $\mathcal{C}$ is a way to describe concepts from $\mathcal{C}$ that defines a procedure to evaluate a concept in $\mathcal{C}$ on any input. For example, one can represent a conjunction of input variables by listing the variables in the conjunction. More formally, a representation class can be defined as follows.

Definition 1 A representation class $\mathcal{F}$ is a pair $(L, \mathcal{R})$ where
• L is a language over some fixed finite alphabet (e.g. $\{0,1\}$);
• $\mathcal{R}$ is an algorithm that for $\sigma \in L$, on input $(\sigma, 1^n)$, returns a Boolean circuit over $\{0,1\}^n$.

In the context of efficient learning, only efficient representations are considered, that is, representations for which $\mathcal{R}$ is a polynomial-time algorithm. The concept class represented by $\mathcal{F}$ is the set of functions over $\{0,1\}^n$ defined by the circuits in $\{\mathcal{R}(\sigma, 1^n) \mid \sigma \in L\}$. For most of the representations discussed in the context of learning it is straightforward to construct a language L and the corresponding translating function $\mathcal{R}$, and therefore they are not specified explicitly.

Associated with each representation is the complexity of describing a Boolean function using this representation. More formally, for a Boolean function $f \in \mathcal{C}$, $\mathcal{F}$-size(f) is the length of the shortest way to represent f using $\mathcal{F}$, or $\min\{|\sigma| \mid \sigma \in L,\ \mathcal{R}(\sigma, 1^n) \equiv f\}$.

In Valiant's PAC model of learning, for a function f and a distribution $\mathcal{D}$ over X, an example oracle $\mathrm{EX}(f, \mathcal{D})$ is an oracle that, when invoked, returns an example


$\langle x, f(x) \rangle$, where x is chosen randomly with respect to $\mathcal{D}$, independently of any previous examples. For $\epsilon \ge 0$, a function g $\epsilon$-approximates a function f with respect to distribution $\mathcal{D}$ if $\Pr_{\mathcal{D}}[f(x) \ne g(x)] \le \epsilon$.
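The example oracle and the notion of ε-approximation can be made concrete with a small simulation. This is only a sketch: the names `make_example_oracle` and `empirical_error`, the uniform distribution over {0,1}^3, and the particular functions f and g are assumptions chosen for illustration, and the error is estimated by sampling rather than computed exactly.

```python
import random


def make_example_oracle(f, sample_x, seed=0):
    """Return EX, a zero-argument function that yields one labeled example
    <x, f(x)> per call, with x drawn independently by sample_x."""
    rng = random.Random(seed)

    def EX():
        x = sample_x(rng)
        return x, f(x)

    return EX


def empirical_error(g, EX, trials=2000):
    """Estimate Pr_D[f(x) != g(x)] from fresh examples drawn through EX."""
    return sum(g(x) != y for x, y in (EX() for _ in range(trials))) / trials


# Hypothetical setting: D is uniform over {0,1}^3, f = x0 AND x1, g = x0.
uniform3 = lambda rng: tuple(rng.randint(0, 1) for _ in range(3))
f = lambda x: x[0] & x[1]
g = lambda x: x[0]
err = empirical_error(g, make_example_oracle(f, uniform3, seed=1))
```

Here f and g disagree exactly when x0 = 1 and x1 = 0, which has probability 1/4 under the uniform distribution, so g ε-approximates f for any ε ≥ 1/4 and the estimate `err` should be close to 0.25.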

Definition 2 A representation class $\mathcal{F}$ is PAC learnable by representation class $\mathcal{H}$ if there exists an algorithm that for every $\epsilon > 0$, $\delta > 0$, n, $f \in \mathcal{F}$, and distribution $\mathcal{D}$ over X, given $\epsilon$, $\delta$, and access to $\mathrm{EX}(f, \mathcal{D})$, runs in time polynomial in n, s = $\mathcal{F}$-size(f), $1/\epsilon$ and $1/\delta$, and outputs, with probability at least $1 - \delta$, a hypothesis $h \in \mathcal{H}$ that $\epsilon$-approximates f.

A DNF expression is defined as an OR of ANDs of literals, where a literal is a possibly negated input variable. The ANDs of a DNF formula are referred to as its terms. Let DNF(k) denote the representation class of k-term DNF expressions. Similarly a CNF expression is an AND of ORs of literals. Let k-CNF denote the representation class of CNF expressions with each OR having at most k literals.
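The two representations can be illustrated with small evaluators. The sketch below takes a DNF to be an OR of ANDs and a CNF to be an AND of ORs (clauses), which is the standard definition; the encoding of a literal as a pair (variable index, negated flag), the function names and the example formulas are assumptions made for illustration.

```python
from itertools import product
from typing import List, Sequence, Tuple

Literal = Tuple[int, bool]          # (variable index, True if the literal is negated)


def literal_value(lit: Literal, x: Sequence[int]) -> bool:
    index, negated = lit
    return (x[index] == 1) != negated


def eval_dnf(terms: List[List[Literal]], x: Sequence[int]) -> int:
    """OR over terms, each term an AND of literals."""
    return int(any(all(literal_value(l, x) for l in term) for term in terms))


def eval_cnf(clauses: List[List[Literal]], x: Sequence[int]) -> int:
    """AND over clauses, each clause an OR of literals."""
    return int(all(any(literal_value(l, x) for l in clause) for clause in clauses))


# A 2-term DNF, (x0 AND x1) OR (NOT x2), and the 2-CNF obtained by
# distributing OR over AND: (x0 OR NOT x2) AND (x1 OR NOT x2).
DNF2 = [[(0, False), (1, False)], [(2, True)]]
CNF2 = [[(0, False), (2, True)], [(1, False), (2, True)]]
agree = all(eval_dnf(DNF2, x) == eval_cnf(CNF2, x)
            for x in product((0, 1), repeat=3))
```

The check `agree` confirms on all eight inputs that this 2-term DNF and the 2-CNF compute the same function, a small instance of rewriting a k-term DNF as a k-CNF by picking one literal from each term.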

For a real-valued vector $c \in \mathbb{R}^n$ and $\theta \in \mathbb{R}$, a linear threshold function (also called a halfspace) $T_{c,\theta}(x)$ is the function that equals 1 if and only if $\sum_{i \le n} c_i x_i \ge \theta$. The representation class of Boolean threshold functions consists of all linear threshold functions with $c \in \{0,1\}^n$ and $\theta$ an integer.

Key Results

Theorem 3 ([16]) For every $k \ge 2$, the representation class of DNF(k) is not properly learnable unless RP = NP.

More specifically, Pitt and Valiant show that learning DNF(k) by DNF($\ell$) is at least as hard as coloring a k-colorable graph using $\ell$ colors. For the case k = 2 they obtain the result by reducing from Set Splitting (see [8] for details on the problems). Theorem 3 is in sharp contrast with the fact that DNF(k) is learnable by k-CNF [17].

Theorem 4 ([16]) The representation class of Boolean

threshold functions is not properly learnable unless

RP = NP.

This result is obtained via a reduction from the NP-complete Zero-One Integer Programming problem (see [8], p. 245, for details on the problem). The result is contrasted by the fact that general linear thresholds are properly learnable [4].

These results show that using a speciﬁc representation

of hypotheses forces the learning algorithm to solve a com-

binatorial problem that can be NP-hard. In most machine

learning applications it is not important which represen-

tation of hypotheses is used as long as the value of the un-

known function is predicted correctly. Therefore learning

in the PAC model is now deﬁned without any restrictions

on the output hypothesis (other than it being eﬃciently

evaluatable). Hardness results in this setting are usually

based on cryptographic assumptions (cf. [14]).

Hardness results for proper learning based on the assumption NP ≠ RP are now known for several other representation classes and for other variants and extensions of the PAC learning model. Blum and Rivest show that for any $k \ge 3$, unions of k halfspaces are not properly learnable [3]. Hancock et al. prove that decision trees (cf. [15] for the definition of this representation) are not learnable by decision trees of somewhat larger size [10]. This result was strengthened by Alekhnovich et al. who also prove that intersections of two halfspaces are not learnable by intersections of k halfspaces for any constant k, general DNF expressions are not learnable by unions of halfspaces (and in particular are not properly learnable), and k-juntas are not properly learnable [1]. Feldman shows that DNF expressions are NP-hard to learn properly even if membership queries, or the ability to query the unknown function at any point, are allowed [6]. No efficient algorithms or hardness results are known for any of the above learning problems if no restriction is placed on the representation of hypotheses.

The choice of representation is very important even in powerful learning models. Feldman proved that $n^c$-term DNF are not properly learnable for any constant c even when the distribution of examples is assumed to be uniform and membership queries are available [6]. This contrasts with Jackson's celebrated algorithm for learning DNF in this setting [12], which is not proper.

In the agnostic learning model of Haussler [11] and Kearns et al. [13] even the representation classes of conjunctions, halfspaces, and parity functions are NP-hard to learn properly (cf. [2,7,9] and references therein). Here again the status of these problems in the representation-independent setting is largely unknown.

Applications

A large number of practical algorithms use representations

for which hardness results are known (most notably deci-

sion trees, halfspaces, and neural networks). Hardness of

learning F by H implies that an algorithm that uses H

to represent its hypotheses will not be able to learn F in

the PAC sense. Therefore such hardness results elucidate

the limitations of algorithms used in practice. In particu-

lar, the reduction from an NP-hard problem used to prove

the hardness of learning F by H can be used to generate

hard instances of the learning problem.


Open Problems

A number of problems related to proper learning in the

PAC model and its extensions are open. Almost all hard-

ness of proper learning results are for learning with respect

to unrestricted distributions. For most of the problems

mentioned in Sect. “Key Results” it is unknown whether

the result is true if the distribution is restricted to belong

to some natural class of distributions (e.g., product distributions). It is unknown whether decision trees are learnable properly in the PAC model or in the PAC model with

membership queries. This question is open even in the

PAC model restricted to the uniform distribution only.

Note that decision trees are learnable (non-properly) if

membership queries are available [5] and are learnable

properly in time $O(n^{\log s})$, where s is the number of leaves

in the decision tree [1].

An even more interesting direction of research would

be to obtain hardness results for learning by richer representation classes, such as $AC^0$ circuits, classes of neural networks and, ultimately, unrestricted circuits.

Cross References

 Cryptographic Hardness of Learning

 Graph Coloring

 Learning DNF Formulas

 PAC Learning

Recommended Reading

1. Alekhnovich, M., Braverman, M., Feldman, V., Klivans, A., Pitassi,

T.: Learnability and automizability. In: Proceedings of FOCS, pp.

621–630 (2004)

2. Ben-David, S., Eiron, N., Long, P.M.: On the difficulty of approximately maximizing agreements. In: Proceedings of COLT, pp.

266–274 (2000)

3. Blum, A.L., Rivest, R.L.: Training a 3-node neural network is NP-

complete. Neural Netw. 5(1), 117–127 (1992)

4. Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.: Learn-

ability and the Vapnik-Chervonenkis dimension. J. ACM 36(4),

929–965 (1989)

5. Bshouty, N.: Exact learning via the monotone theory. Inf. Com-

put. 123(1), 146–153 (1995)

6. Feldman, V.: Hardness of Approximate Two-level Logic Mini-

mization and PAC Learning with Membership Queries. In: Pro-

ceedings of STOC, pp. 363–372 (2006)

7. Feldman, V.: Optimal hardness results for maximizing agree-

ments with monomials. In: Proceedings of Conference on

Computational Complexity (CCC), pp. 226–236 (2006)

8. Garey, M., Johnson, D.S.: Computers and Intractability. W. H.

Freeman, San Francisco (1979)

9. Guruswami, V., Raghavendra, P.: Hardness of Learning Halfs-

paces with Noise. In: Proceedings of FOCS, pp. 543–552 (2006)

10. Hancock, T., Jiang, T., Li, M., Tromp, J.: Lower bounds on learning decision lists and trees. In: 12th Annual Symposium on Theoretical Aspects of Computer Science, pp. 527–538 (1995)

11. Haussler, D.: Decision theoretic generalizations of the PAC

model for neural net and other learning applications. Inf. Com-

put. 100(1), 78–150 (1992)

12. Jackson, J.: An efficient membership-query algorithm for learn-

ing DNF with respect to the uniform distribution. J. Comput.

Syst. Sci. 55, 414–440 (1997)

13. Kearns, M., Schapire, R., Sellie, L.: Toward efficient agnostic

learning. Mach. Learn. 17(2–3), 115–141 (1994)

14. Kearns, M., Valiant, L.: Cryptographic limitations on learning

boolean formulae and finite automata. J. ACM 41(1), 67–95

(1994)

15. Kearns, M., Vazirani, U.: An introduction to computational

learning theory. MIT Press, Cambridge, MA (1994)

16. Pitt, L., Valiant, L.: Computational limitations on learning from

examples. J. ACM 35(4), 965–984 (1988)

17. Valiant, L.: A theory of the learnable. Commun. ACM 27(11),

1134–1142 (1984)

High Performance Algorithm

Engineering for Large-scale Problems

2005; Bader

DAVID A. BADER

College of Computing, Georgia Institute of Technology,

Atlanta, GA, USA

Keywords and Synonyms

Experimental algorithmics

Problem Definition

Algorithm engineering refers to the process required to

transform a pencil-and-paper algorithm into a robust, efficient, well-tested, and easily usable implementation. Thus

it encompasses a number of topics, from modeling cache

behavior to the principles of good software engineering;

its main focus, however, is experimentation. In that sense,

it may be viewed as a recent outgrowth of Experimen-

tal Algorithmics [14], which is speciﬁcally devoted to the

development of methods, tools, and practices for assess-

ing and reﬁning algorithms through experimentation. The

ACM Journal of Experimental Algorithmics (JEA), at URL www.jea.acm.org, is devoted to this area.

High-performance algorithm engineering [2] focuses

on one of the many facets of algorithm engineering: speed.

The high-performance aspect does not immediately imply

parallelism; in fact, in any highly parallel task, most of the

impact of high-performance algorithm engineering tends

to come from reﬁning the serial part of the code.
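One way to see why the serial part matters so much is Amdahl's law, which this entry does not name and which is brought in here only as an illustrative assumption. If a fraction s of the running time cannot be parallelized, the speedup on p processors is at most 1 / (s + (1 - s) / p), so it never exceeds 1 / s however large p becomes. The minimal sketch below, with a hypothetical function name, evaluates that bound.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Upper bound on speedup under Amdahl's law.

    serial_fraction: share of the single-processor running time that
                     cannot be parallelized (between 0 and 1).
    processors:      number of processors (at least 1).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Compare adding processors with shrinking the serial share.
baseline = amdahl_speedup(0.10, 1024)      # 10% serial work
more_cores = amdahl_speedup(0.10, 4096)    # four times the processors
tuned_serial = amdahl_speedup(0.05, 1024)  # serial share halved by tuning
```

Under these assumed numbers, quadrupling the processor count leaves the bound just under 10, while halving the serial share raises it to nearly 20, which is consistent with the observation above about refining the serial code.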

The term algorithm engineering was first used with specificity in 1997, with the organization of the first Workshop on Algorithm Engineering (WAE 97). Since then, this