![](https://cv01.studmed.ru/view/b2c6759ffad/bg137.png)
294
Convolution and Correlation
overlapped by Q
— 1
samples to eliminate errors due to circular convolution
and saving N
—
Q + 1 output values, this method is called overlap-save
method of indirect convolution.
Implementation
The implementation consists of reading blocks of data, two at a time, com-
puting the DFT, multiplying it with the DFT of the impulse response,
computing the IDFT of the product, and storing the valid output. These
operations are repeated until the data is exhausted. As we are finding the
output of two blocks of length N at a time, the computational complexity
for one block is half and it is of computing a complex DFT. The computa-
tion of the DFT of the impulse response is carried out once and is ignored
in the analysis. A single DFT algorithm is sufficient as it can be used for
computing both DFT and IDFT. A table of twiddle factors can be used
repeatedly. The multiplication of two DFTs requires 3iV
—
4 operations for
each block. A block produces N
—
Q +1 valid output points. For example,
let N = 64. The 2x2 PM algorithm for complex data requires 1184 real
multiplications and additions. Multiplying the DFTs requires 188 opera-
tions.
For an impulse response of length 12, 64
—
11 = 53 valid output
points are produced. Therefore, the number of operations per output point
is
U84+188 _ 25.9 (approximately 41og
2
TV).
As for the choice for block length N, the following considerations must
be taken into account. If the block length is long there are two disadvan-
tages:
(i) more memory is required and (ii) the number of DFT operations
per point increases. On the other hand, if the block length is short then
the overlap is more and the number of valid outputs decreases. In general,
for efficient implementation, the following condition is required.
Input length >> Block length, N » Impulse response, Q
Table 14.1 shows the number of operations for various block and impulse
response lengths. The minimum operation count is shown in boldface. The
other major factor is the data transfers which is about
2
log
2
N per output
point. With the complexity order N log
2
N for both arithmetic operations
and data transfers, the indirect convolution using fast DFT algorithms is
efficient for most cases than direct convolution. As the block sizes are much
smaller for any data size, the storage requirements are moderate.