8. Implementation Aspects Flashcards
What is a real-time system?
Consider a processing scheme where the sampling rate at the input and at the output is the same. Such a processing unit is called a real-time system if it fulfills the timeliness property. This means that the system is sufficiently fast to ensure that always fs^(out) valid output samples y[k] are produced per second, whatever signal is applied to the input and whatever operation mode the processing scheme is in.
(p. 229)
What is a batch processing system?
Real-time systems contrast with batch processing systems, i.e. with systems that process a finite set of (prerecorded) data samples (typically stored in the memory of a microprocessor or on hard disk) in an off-line way. Batch processing systems are usually not dictated by time constraints: the user simply has to wait till all the samples have been processed.
(p. 230)
Comment on the correctness of the code.
When porting a signal processing solution onto dedicated hardware, it does not suffice to simply debug the code until all the warnings and (allocation) errors have disappeared: a thorough validation of the correctness of the code is mandatory. This is typically done by processing representative, random test signals both with the original (reference) code in the higher programming language and with the code written in the lower programming language. Only if the difference is small, i.e. around machine precision, can correctness generally be assumed.
(p. 230)
Make sure you can calculate the number of arithmetic operations per sample and the number of arithmetic operations per second that are needed to implement an algorithmic scheme (for example the FIR filtering operation of equation 8.1).
Observe that the number of (arithmetic) operations required to implement an (N + 1)-tap FIR filter amounts to N + 1 multiplications (assuming that all bn ≠ 0 and bn ≠ ±1) and N additions (assuming that all terms bnx[k − n] ≠ 0) per output sample y[k]. As, typically, both a multiplication and an addition of two real-valued numbers can be performed in one machine cycle, the number of (arithmetic) operations equals 2N + 1 op. per output sample. If fs is the rate at which the filter input x[k], and hence the output y[k], is sampled, the required number of (arithmetic) operations per second is given by (2N + 1)fs ops.
(p. 231)
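As a quick illustration of this cost estimate, a small Python sketch; the values of N and fs below are assumed example values, not figures from the text:

```python
# Minimal sketch of the cost estimate for the (N + 1)-tap FIR filter of
# equation 8.1; N and fs are assumed example values.
N = 99                              # filter order, i.e. N + 1 = 100 taps
fs = 48_000                         # sampling rate in Hz

mults_per_sample = N + 1            # one multiplication per tap (all bn != 0, +-1)
adds_per_sample = N                 # N additions to sum the N + 1 products
ops_per_sample = mults_per_sample + adds_per_sample     # = 2N + 1
ops_per_second = ops_per_sample * fs                    # = (2N + 1) * fs

print(ops_per_sample)               # 199 op. per output sample
print(ops_per_second)               # 9552000, i.e. roughly 9.6 Mops
```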
What are MIPS and MOPS?
Processing power is typically expressed in MIPS (Millions of Instructions per Second) or MOPS (Millions of Operations per Second), and can be found in the data sheet of the processor.
Keep in mind, though, that as each chip manufacturer uses its own definition of MIPS and MOPS, these numbers need to be interpreted with care.
(p. 231)
What is the disadvantage of cost estimates?
Recall that a cost estimate that merely counts the number of multiplications and additions is only a rough approximation of, and an underestimate for, the total execution time. Nevertheless, it is a useful instrument to highlight the dependence of the total cost on each of the algorithm parameters and to compare and benchmark different algorithmic schemes.
(p. 232)
Make sure you can calculate the required amount of memory that is needed to implement an algorithmic scheme in real time (for example the FIR filtering operation of equation 8.1).
For example, it is easily verified that for a real-time implementation of the FIR filter of equation 8.1 in total 2N + 1 data words need to be stored: N + 1 memory locations are required for the filter coefficients bn (assuming that all bn ≠ 0 and bn ≠ ±1) and N locations for delayed input samples, i.e. for x[k−1], . . . , x[k−N]. Depending on the specific implementation, a few more storage registers (memory cells) are needed to store additional intermediate variables and the current input x[k] and current output y[k]. The amount of memory required for the program code depends on the software encoding and the programming style.
(p. 232)
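A corresponding sketch of the data memory count (N is again an assumed example value):

```python
# Minimal sketch of the data word count for a real-time FIR filter (equation 8.1).
N = 99
coefficient_words = N + 1           # b0 ... bN
delay_line_words = N                # x[k-1] ... x[k-N]
total_words = coefficient_words + delay_line_words      # = 2N + 1 data words
print(total_words)                  # 199 (plus a few registers for x[k], y[k], temporaries)
```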
Which parts does the input/output delay of a real-time implementation scheme consist of?
- a delay due to the A/D and the D/A converter
- a latency caused by the operating system
- a delay inserted by the block processing
- possibly, an algorithmic delay introduced (on purpose) inside the signal processing scheme
(p. 232 - 234)
What is block processing?
What is sample-based processing?
Block processing is a popular data handling technique used inside many signal processing schemes. The idea is to transfer data and process this data in frames of P samples rather than on a sample-by-sample basis. Sample-based processing is in fact a special case of block processing with P = 1.
(p. 232 - 233)
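A minimal Python sketch of this frame-based data flow; the frame length P and the placeholder routine process_frame are assumptions for illustration, with P = 1 reducing to sample-based processing:

```python
import numpy as np

# Minimal sketch of block processing: data is transferred and processed in
# frames of P samples rather than sample by sample.
P = 64                              # frame length (assumed example value)

def process_frame(frame):
    return 2.0 * frame              # placeholder for the actual per-frame processing

x = np.random.randn(10 * P)         # test input whose length is a multiple of P
y = np.empty_like(x)
for start in range(0, len(x), P):   # one iteration per frame of P samples
    y[start:start + P] = process_frame(x[start:start + P])
```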
How long is the delay introduced by a block processing system with frame length P?
Make sure you understand figure 8.1.
Unfortunately, block processing inevitably introduces a delay, which is at least as large as the block length, also called the frame length P.
(p. 234)
What is in trade-off with the processing delay?
Typically, there is a trade-off between processing delay on the one hand, and computational complexity and/or algorithm performance on the other hand.
(p. 234)
Make sure that you understand figures 8.2 and 8.3, and that you can give the corresponding difference equation, transfer function, filter order, impulse response, frequency response and the poles and zeros.
You may furthermore be asked to discuss the stability and to calculate the number of arithmetic operations, the expected block processing delay and the amount of memory required for a real-time implementation.
(p. 235 - 237)
What is a multiply-accumulate operation? What is it used for?
Many DSP processors provide hardware-encoded so-called multiply-accumulate (MAC) instructions to efficiently perform such a combined multiplication and addition.
Hence, if MAC instructions can be used, the number of operations per second needed for the FIR filtering operation of equation 8.1 reduces to (N + 1)fs ops.
(p. 235 - 236)
What is a circular buffer?
What is it used for?
If a linear buffer is used, the first cell contains the current input x[k] and the last cell stores the least recent input sample x[k − N]. A disadvantage of this approach is that each time a new input sample arrives the entire delay line has to be updated (mass shift of data). This explains why often a circular buffer is preferred, where at each sample instant only the oldest sample x[k − N], which is not needed any further, is replaced with the newest input sample x[k + 1] and all other data in memory are left unchanged.
(p. 236)
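A minimal Python sketch of such a circular delay line for the FIR filter of equation 8.1; the coefficients and the helper name fir_step are assumed for illustration only:

```python
import numpy as np

# FIR delay line with a circular buffer: each new sample overwrites the oldest
# one, so no mass shift of data is needed.
b = np.array([0.25, 0.5, 0.25])          # filter coefficients b0 ... bN (example values)
N = len(b) - 1
buf = np.zeros(N + 1)                    # holds x[k], x[k-1], ..., x[k-N]
pos = 0                                  # index of the cell holding the newest sample

def fir_step(x_k):
    """Insert one input sample and return one output sample y[k]."""
    global pos
    pos = (pos + 1) % (N + 1)            # advance: the oldest cell becomes the newest
    buf[pos] = x_k                       # overwrite x[k-N] with the new input
    # y[k] = sum_n b[n] * x[k-n], reading the buffer backwards from 'pos'
    return sum(b[n] * buf[(pos - n) % (N + 1)] for n in range(N + 1))

y = [fir_step(x) for x in [1.0, 0.0, 0.0, 0.0]]   # impulse -> y == [0.25, 0.5, 0.25, 0.0]
```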
When are fast convolution techniques needed?
Be aware that FIR filtering operations often require a large number of calculations, and that hence cost-efficient implementation schemes based on fast convolution techniques are called for.
(p. 237)
What is a fast convolution scheme?
A fast convolution scheme is a block processing approach that implements an FIR filtering operation in the frequency domain by relying on the convolution theorem.
(p. 237)
Explain in words why only the last M −N circular convolution results are meaningful, and hence, why the inequality P ≤ M − N must hold.
It is in fact typical for a linear convolution operation that each time the index n of filter coefficient w[n] is increased by 1 (see equation 8.11), the data signal index k − n is decremented by 1. Observe from equations 8.6–8.10 that this is the case for ỹ^(r)[N] up to ỹ^(r)[M − 1], but not for ỹ^(r)[0] up to ỹ^(r)[N − 1], where a sudden increment of the data signal index by M − 1 occurs owing to the periodic repetition of the signal. Hence, although ỹ^(r)[0] up to ỹ^(r)[N − 1] are valid circular convolution results, they do not correspond to a linear convolution operation. They are therefore useless for the intended application, which is linear FIR filtering. In other words, the circular convolution w̃ ⊛ x̃^(r) results in a sequence ỹ^(r) of length M, of which only the last M − N entries are meaningful and correspond to a real FIR filtering operation.
During each iteration through the different processing steps P new data samples are input into the scheme, and hence P validly filtered data samples have to be delivered at the output. Only in this way does the input data rate match the output data rate, so that real-time operation can be ensured. As, of the M samples that equation 8.5 produces, only M − N are useful, i.e. correspond to a valid linear convolution, the mentioned inequality P ≤ M − N follows.
(p. 240)
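A quick numerical sanity check of this statement (the sizes are assumed example values), comparing an M-point circular convolution against the corresponding linear convolution:

```python
import numpy as np

# The M-point circular convolution of a length-(N + 1) filter w with a length-M
# block x reproduces the linear convolution only in its last M - N entries.
N, M = 3, 16
rng = np.random.default_rng(0)
w = rng.standard_normal(N + 1)
x = rng.standard_normal(M)

w_padded = np.r_[w, np.zeros(M - N - 1)]                 # zero-pad w to length M
y_circ = np.fft.ifft(np.fft.fft(w_padded) * np.fft.fft(x)).real
y_lin = np.convolve(w, x)                                # linear convolution

print(np.allclose(y_circ[N:], y_lin[N:M]))               # True: entries N ... M-1 are valid
print(np.allclose(y_circ[:N], y_lin[:N]))                # False: wrap-around corrupts the first N
```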
The 6 algorithmic steps that are shown on page 240 implement the overlap-save fast convolution scheme, and hence perform an FIR filtering operation in the frequency domain. Make sure you understand each of the 6 steps.
How many calculations are needed?
What are the memory requirements?
Give a practical application where the scheme can be used.
?
(p. 240 - 241)
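For reference, a minimal Python sketch of the overlap-save idea (the filter spectrum is computed once, each block reuses the last N input samples and keeps only the last M − N circular convolution results); the sizes and the helper name overlap_save are assumptions of this sketch, not the course's reference code:

```python
import numpy as np

def overlap_save(w, x, M=256):
    """Sketch of overlap-save FIR filtering; assumes len(x) is a multiple of P = M - N."""
    N = len(w) - 1
    P = M - N                                    # new input / valid output samples per block
    W = np.fft.fft(w, M)                         # filter spectrum, computed once
    buf = np.zeros(M)                            # last N old samples followed by P new ones
    y = np.zeros(len(x))
    for start in range(0, len(x), P):
        buf = np.concatenate([buf[-N:], x[start:start + P]])   # "save" the last N samples
        y_block = np.fft.ifft(W * np.fft.fft(buf)).real        # circular convolution
        y[start:start + P] = y_block[N:]         # keep only the last M - N (valid) results
    return y

# Check against a direct time-domain FIR filtering (example values)
rng = np.random.default_rng(1)
w = rng.standard_normal(32)                      # N = 31, so P = 256 - 31 = 225
x = rng.standard_normal(225 * 10)                # input length chosen as a multiple of P
assert np.allclose(overlap_save(w, x), np.convolve(w, x)[:len(x)])
```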
What is special about calculating the circular convolution in the frequency domain when the w and x are real valued?
Recall that only ⌊M/2⌋ + 1 out of the M products W̃[m] · X̃^(r)[m] need to be computed in case w and x are real-valued.
(p. 241)
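This is exactly what np.fft.rfft exploits; a two-line check with an assumed example length M:

```python
import numpy as np

# For real-valued input the spectrum is conjugate-symmetric, so only the first
# M//2 + 1 bins need to be computed and multiplied.
M = 16
x = np.random.randn(M)
print(len(np.fft.fft(x)))    # 16 complex bins
print(len(np.fft.rfft(x)))   # 9 = M//2 + 1 bins; the rest follow by symmetry
```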
Make sure you can make a cost estimate for the overlap-save fast convolution scheme and compare it with a standard FIR filtering operation. You are not supposed to prove or memorize the cost formulas for the M-point real-input FFT (2M log2 M − 4M + 6) and the real-output IFFT (2M log2 M) that you find on page 242. These formulas come from a book and were merely used to generate figure 8.5. They correspond to one particular implementation scheme for the real-input FFT and the real-output inverse FFT. There exist many other realizations with a different (yet comparable) computational complexity.
At the exam you will either be given the cost formulas (2M log2 M − 4M + 6 and 2M log2 M) or you are supposed to give a more simplified complexity estimate yourself. Recall from section 4.3 that a simplified yet realistic complexity estimate, both for the M-point (real-input) FFT and the M-point (real-output) IFFT, is M log2 M.
(p. 242)
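As an illustration of such a cost comparison, a small Python calculation using the simplified M log2 M estimate per (I)FFT; the filter length, FFT size and the count of roughly 6 real operations per complex spectral product are assumptions of this sketch, not figures from the text:

```python
from math import log2

# Rough cost comparison: direct (N+1)-tap FIR versus overlap-save with M-point FFTs.
# Per block of P = M - N outputs: one FFT + one IFFT (~ M log2 M each, filter FFT
# precomputed) plus about 6*(M//2 + 1) real operations for the complex products.
N = 1023                        # long filter: N + 1 = 1024 taps (assumed example)
M = 4096                        # FFT size (assumed example)
P = M - N                       # valid output samples per block

direct_per_sample = 2 * N + 1
fast_per_block = 2 * M * log2(M) + 6 * (M // 2 + 1)
fast_per_sample = fast_per_block / P

print(direct_per_sample)        # 2047 op. per output sample (time-domain FIR)
print(round(fast_per_sample))   # about 36 op. per output sample (overlap-save)
```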
What is the price to pay for the considerable cost reduction that is obtained with fast convolution?
Unfortunately, there is no such thing as a free lunch. Despite the remarkably low implementation cost for large N̄, the biggest disadvantage of the fast convolution approach is the increased processing delay compared to the standard implementation of figure 8.2.
In fact, as the fast convolution scheme is a block processing approach with frame length P, a delay of (up to) 2P samples is introduced (see section 8.1.5). For long filters w, the block processing delay 2P ≈ 2N̄ may be too large to qualify for a real-life application. Recall that the standard implementation of figure 8.2 typically introduces a block processing delay of 1 sample only.
(p. 243)
Comment on the different realization structures of IIR filters.
Keep in mind that there exist different realization structures for IIR filters. They all produce the same filter output y[k] (when disregarding possible round-off errors and finite-precision effects), but differ in terms of computational complexity, memory requirements, block processing delay and numerical stability (robustness to finite word length effects).
(p. 244)
Make sure that if you are given a block diagram of an IIR realization structure such as figures 8.6–8.11, you can understand and interpret the scheme, you recognize the type of realization structure (direct form I, direct form II, cascade or parallel) and you can give the corresponding difference equation, transfer function, filter order, frequency response and the poles and zeros.
Furthermore, you may be asked to discuss the stability and to calculate the number of arithmetic operations, the expected block processing delay and the amount of memory that is required for a real-time implementation. You are not supposed to draw block diagrams yourself.
(p. 244 - 249)
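To make such block diagrams concrete, a minimal Python sketch of a single direct form II second-order section (biquad) with assumed example coefficients; cascading such sections is one way to realize a higher-order IIR filter (a practical implementation could use scipy.signal.sosfilt instead):

```python
import numpy as np

# Direct form II biquad. Difference equations:
#   w[k] = x[k] - a1*w[k-1] - a2*w[k-2]        (recursive / feedback part)
#   y[k] = b0*w[k] + b1*w[k-1] + b2*w[k-2]     (non-recursive / feedforward part)
def biquad_df2(x, b0, b1, b2, a1, a2):
    w1 = w2 = 0.0                        # only two delay states are needed
    y = np.empty(len(x))
    for k, xk in enumerate(x):
        w0 = xk - a1 * w1 - a2 * w2
        y[k] = b0 * w0 + b1 * w1 + b2 * w2
        w2, w1 = w1, w0                  # shift the delay line
    return y

# Example coefficient values (hypothetical)
y = biquad_df2(np.random.randn(8), b0=0.2, b1=0.4, b2=0.2, a1=-0.5, a2=0.25)
```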
Why do finite word length effects show up in digital signal processing?
Finite word length effects tend to show up in digital signal processing schemes such as digital filters that are implemented on finite-precision hardware. They occur because data samples and algorithm parameters (e.g. filter coefficients) are represented by a finite number of bits and calculations are done using (fixed-point) finite-precision arithmetic.
(p. 248)
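As a small illustration (with assumed example values), rounding IIR denominator coefficients to a B-bit fixed-point grid already moves the realized poles:

```python
import numpy as np

# Coefficient quantization as one finite word length effect: the quantized
# denominator no longer has its poles exactly where they were designed.
B = 8                                        # number of fractional bits (assumed)
a = np.array([1.0, -1.8, 0.81])              # denominator with a double pole at z = 0.9
a_q = np.round(a * 2**B) / 2**B              # round to B fractional bits
print(np.roots(a))                           # designed poles: 0.9, 0.9
print(np.roots(a_q))                         # quantized poles have moved
```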