CHAPTER 6. MATRIX APPROXIMATION VIA RANDOM SAMPLING
6.7 Discussion
Sampling from the length-squared distribution was introduced by Frieze, Kannan
and Vempala [FKV98, FKV04] in the context of a constant-time algorithm for
low-rank approximation. It has been used many times subsequently. Sampling-based
algorithms for matrix approximation have several advantages. The first is
efficiency. The second is the nature of the approximation: it is often
interpolative, i.e., it uses actual rows/columns of the original matrix. Finally,
the methods can be used in the streaming model, where memory is limited and
entries of the matrix arrive in arbitrary order.
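As a concrete illustration, here is a minimal sketch of length-squared row sampling (the function name and rescaling convention are ours, not from the chapter): each row is picked with probability proportional to its squared norm, and rescaled so that the sample gives an unbiased estimate of AᵀA.

```python
import numpy as np

def length_squared_sample(A, s, seed=None):
    """Draw s rows of A i.i.d. from the length-squared distribution
    (p_i proportional to the squared norm of row i), rescaling each
    picked row by 1/sqrt(s * p_i) so that S.T @ S is an unbiased
    estimator of A.T @ A."""
    rng = np.random.default_rng(seed)
    p = np.einsum("ij,ij->i", A, A)   # squared row norms
    p = p / p.sum()                   # the length-squared distribution
    idx = rng.choice(A.shape[0], size=s, p=p)
    return A[idx] / np.sqrt(s * p[idx])[:, None]

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
S = length_squared_sample(A, s=300, seed=1)
rel_err = np.linalg.norm(S.T @ S - A.T @ A) / np.linalg.norm(A.T @ A)
```

With 300 of 1000 rows, S.T @ S already tracks A.T @ A to within a modest relative Frobenius error; the guarantees in the chapter quantify this in terms of the Frobenius norm of A.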
The analysis for matrix multiplication is originally due to Drineas and Kan-
nan [DK01]. The linear-time low-rank approximation was given by Drineas et
al. [DFK+04]. The CUR decomposition first appeared in [DK03]. The best-
known sample complexity for the constant-time algorithm is O(k²/ε⁴), and
further refinements are given in [DKM06a, DKM06b, DKM06c]. An alternative
sampling method, which sparsifies a given matrix and uses a low-rank
approximation of the sparse matrix, was given in [AM07].
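The matrix multiplication result mentioned above can be sketched as follows (a minimal illustration with our own naming): A @ B is a sum of outer products of columns of A with rows of B, and sampling s of these with probabilities proportional to the product of the column and row norms, suitably rescaled, gives an unbiased estimate.

```python
import numpy as np

def sampled_matmul(A, B, s, seed=None):
    """Estimate A @ B = sum_k outer(A[:, k], B[k, :]) by sampling s
    indices k with probability p_k proportional to the product
    ||A[:, k]|| * ||B[k, :]||, and averaging the sampled outer
    products rescaled by 1 / (s * p_k)."""
    rng = np.random.default_rng(seed)
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(A.shape[1], size=s, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for k in idx:
        est += np.outer(A[:, k], B[k, :]) / (s * p[k])
    return est

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
B = rng.standard_normal((200, 30))
est = sampled_matmul(A, B, s=2000, seed=1)
rel_err = np.linalg.norm(est - A @ B) / np.linalg.norm(A @ B)
```

This particular choice of p_k minimizes the variance of the estimator; sampling from the length-squared distribution of A's columns alone gives a slightly weaker but still useful guarantee.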
We conclude this section with a description of some typical applications. A
recommendation system is a marketing tool with wide use. Central to it is
the consumer-product matrix A, where A_ij is the “utility” or “preference” of
consumer i for product j. If the entire matrix were available, the task of the
system would be simple: whenever a user arrives, it recommends to the user the
product(s) of maximum utility to that user. But this assumption is unrealistic;
market surveys are costly, especially if one wants to ask each consumer. So
the essential problem in recommendation systems is matrix reconstruction:
given only a sampled part of A, reconstruct (implicitly, because writing down
the whole of A requires too much space) an approximation A′ to A, and make
recommendations based on A′. A natural assumption is that we have a set of
sampled rows (we know the utilities of some consumers, at least their top
choices) and a set of sampled columns (we know the top buyers of some
products). This model very directly suggests the use of the CUR decomposition,
which says that for any matrix A, given a set of sampled rows and columns,
we can construct an approximation A′ to A from them. Some well-known
recommendation systems in practical use relate to on-line book sellers,
movie renters, etc.
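To make this reconstruction concrete, here is a minimal sketch (our own code; it chooses the middle matrix U as the pseudoinverse of the row/column intersection block, one common variant — the chapter's linear-time construction picks U differently): given only sampled rows R and sampled columns C of A, it forms A′ = C U R.

```python
import numpy as np

def cur_reconstruct(A, row_idx, col_idx):
    """Reconstruct A from sampled rows R = A[row_idx, :] and sampled
    columns C = A[:, col_idx] as A' = C @ pinv(W) @ R, where
    W = A[row_idx][:, col_idx] is the intersection block."""
    C = A[:, col_idx]
    R = A[row_idx, :]
    W = A[np.ix_(row_idx, col_idx)]
    return C @ np.linalg.pinv(W) @ R

# A synthetic rank-3 "consumer-product" matrix: a handful of sampled
# rows and columns then suffices to recover it (almost surely) up to
# floating-point roundoff.
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 100))
row_idx = rng.choice(200, size=10, replace=False)
col_idx = rng.choice(100, size=10, replace=False)
A_prime = cur_reconstruct(A, row_idx, col_idx)
rel_err = np.linalg.norm(A_prime - A) / np.linalg.norm(A)
```

The exact recovery here relies on A being exactly low-rank; for a matrix that is only close to low-rank, the chapter's analysis bounds the reconstruction error in terms of the sample sizes.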
In the first mathematical model for recommendation systems, Azar et al.
[AFKM01] assumed a generative model with k types of consumers, where each
consumer is a draw from a probability distribution (a mixture model). It is easy
to see that A is then close to a low-rank matrix. The CUR-type model and the
analysis using the CUR decomposition are due to [DKR02].
We note an important philosophical difference in the use of sampling here
from the previous topics discussed. Earlier, we assumed that a huge matrix A
was explicitly written down somewhere, and since it was too expensive to
compute with all of it, one used sampling to extract a part of it and computed
with that part. Here, the point is that it is expensive to obtain the whole of A,
so we have to make do with a sample, from which we implicitly “reconstruct” the
whole.