Writing a Basic FIR Filter in C
Introduction and Theory
A FIR (Finite Impulse Response) filter is mathematically identical to a convolution between the input array and a set of taps which represent the impulse response of the filter. This can be implemented using a series of dot products between the tap vector and a vector consisting of the current input sample and those which came immediately before it.
where is the output array, is the input array, is the array of taps and is the number of taps.
When implementing this in code we need to decide how to treat the first samples as we have no indication of what data should come before them. We can take one of two methods, either we can assume all previous samples are 0 or we can start our calculation at the sample. For this example we will choose the zero assumption method as this gives us the same number of output samples as input.
Basic FIR Filter
We will first create an unoptimised FIR filter which cab be called multiple times as data comes in, with memory of the previous samples. This will require 4 functions, one to initialize the required memory, one to apply the FIR filter, one to reset the memory in the case of a discontinuity in the received data, and one to free all the filter memory. To pass pass around this memory we will first declare the following structure:
typedef struct FIR_Stream_Mem {
size_t tap_len;
double *taps;
double *buffer;
} FIR_Stream_Mem;The tap_len field contains the number of taps stored in
the taps array. buffer will contain the
previous samples and will also have size tap_len. We will
allocate and populate this structure using the following function:
FIR_Stream_Mem *init_fir_stream(double* taps, size_t tap_len){
/* Create the output object in memory */
FIR_Stream_Mem *output = (FIR_Stream_Mem*)(malloc(sizeof(FIR_Stream_Mem)));
/* Copy the taps across */
output->tap_len = tap_len;
output->taps = (double*)(malloc(sizeof(double) * tap_len));
memcpy(output->taps, taps, tap_len * sizeof(double));
/* Set the buffer to all zeros */
output->buffer = (double*)(malloc(sizeof(double) * tap_len));
memset(output->buffer, 0x00, tap_len * sizeof(double));
return output;
}This first allocates the memory to store the
FIR_Stream_Mem structure itself, before allocating memory
for the buffer and taps. The taps are copied over from the input to
allow for the structure to be passed around more easily, and the buffer
is set to all zeros as discussed above.
We are now ready to start using the FIR filter. To do this, we need to implement the algorithm so it can be run on our data. We will use the following function:
void apply_fir(FIR_Stream_Mem *mem,
const double *input,
double *output,
size_t size){
/* Loop through all input samples */
for(size_t i = 0; i < size; i++){
/* Calculate first output sample */
double sample_output = input[i] * mem->taps[0];
/* Loop through all the taps */
for(size_t j = 1; j < mem->tap_len; j++){
sample_output += mem->buffer[j] * mem->taps[j];
}
/* Update buffer */
for(size_t j = mem->tap_len - 2; j > 0; j--){
mem->buffer[j + 1] = mem->buffer[j];
}
mem->buffer[1] = input[i];
output[i] = sample_output;
}
}This has the following arguments:
mem: A pointer to theFIR_Stream_Memstruct for the filter to be applied.input: A pointer to an array of input samples.output: A pointer to an array of output samples. This must be pre-allocated with a size greater than or equal to the size of the input array.size: The number of samples to process. Note this must be less than or equal to the size of the input array
It loops through all the input samples and calculates the dot product as discussed above. It then shifts all the samples in the buffer to the right and adds the current sample to the front before calculating the next sample.
This function can now be applied on chunks of our input data as they
come in, with the memory from the last chunk being kept in the
FIR_Stream_Mem struct.
However, if a discontinuity is detected in the input stream of
samples the history stored in buffer will be incorrect and
we will have no way of determining what it should be. In this case it is
best to reset the filter by setting the filter’s buffer to all zeros,
which can be done with the following function:
void reset_fir(FIR_Stream_Mem *mem){
memset(mem->buffer, 0x00, mem->tap_len * sizeof(double));
}Finally, when we are done with the filter we should free all associated memory. This can be done with the following function:
void deinit_fir(FIR_Stream_Mem *mem){
free(mem->buffer);
free(mem->taps);
free(mem);
}Speed Optimizations
The biggest flaw with the previous implementation is the shifting of
the buffer array in the FIR_Stream_Mem struct
after every sample is processed. This is done by re-writing every
element of the buffer, as shown in the following table:
| Sample | Buffer Content |
|---|---|
| N | 7 6 5 4 3 2 1 0 |
| N + 1 | 8 7 6 5 4 3 2 1 |
| N + 2 | 9 8 7 6 5 4 3 2 |
This means that we are doing FIR_Stream_Mem->tap_len
writes to memory in order to update the buffer. However, only one of the
values in buffer has actually changed, the others have just
shifted position. By changing the start position of the buffer, we can
reduce the number of writes to only one per sample. The table below
shows this in action, where the bolded positions are the start of the
array.
| Sample | Buffer Content |
|---|---|
| N | 7 6 5 4 3 2 1 0 |
| N + 1 | 7 6 5 4 3 2 1 8 |
| N + 2 | 7 6 5 4 3 2 9 8 |
To implement this method we first need to modify the
FIR_Stream_Mem struct to record the starting position of
buffer :
typedef struct FIR_Stream_Mem {
size_t tap_len;
double *taps;
double *buffer;
size_t buffer_start;
} FIR_Stream_Mem;The buffer_start value should also be initialized to 0
in init_fir_stream:
FIR_Stream_Mem *init_fir_stream(double* taps, size_t tap_len){
/* Create the output object in memory */
FIR_Stream_Mem *output = (FIR_Stream_Mem*)(malloc(sizeof(FIR_Stream_Mem)));
/* Copy the taps across */
output->tap_len = tap_len;
output->taps = (double*)(malloc(sizeof(double) * tap_len));
memcpy(output->taps, taps, tap_len * sizeof(double));
/* Set the buffer to all zeros */
output->buffer = (double*)(malloc(sizeof(double) * tap_len));
memset(output->buffer, 0x00, tap_len * sizeof(double));
/* Set the buffer start position to 0 */
output->buffer_start = 0;
return output;
}Finally, we need to modify apply_fft to loop index and
update buffer correctly. We loop though buffer
when applying the filter taps, so we make out first mod here. Rather
than using the for-loop index, we calculate our own index using the
following algorithm:
size_t idx;
if(mem->buffer_start + j < mem->tap_len){
idx = mem->buffer_start + j;
} else {
idx = mem->buffer_start + j - mem->tap_len;
}This ensures we don’t get an out-of-bounds error by looping the index back to the start if it will otherwise overrun.
We also need to ensure that buffer gets updated
correctly. This is done by setting the value at
buffer_start to be our current input sample, then
subtracting one from buffer_start. We again avoid an
overrun by checking if buffer_start is 0, in which case we set
buffer_start to be the last element in the array. This is
done using the following code:
mem->buffer[mem->buffer_start] = input[i];
if(mem->buffer_start == 0){
mem->buffer_start = mem->tap_len - 1;
} else {
mem->buffer_start--;
}This gives a final apply_fft function of:
void apply_fir(FIR_Stream_Mem *mem, const double *input, double *output, size_t size){
/* Loop through all input samples */
for(size_t i = 0; i < size; i++){
/* Calculate first output sample */
double sample_output = input[i] * mem->taps[0];
/* Loop through all the taps */
for(size_t j = 1; j < mem->tap_len; j++){
size_t idx;
if(mem->buffer_start + j < mem->tap_len){
idx = mem->buffer_start + j;
} else {
idx = mem->buffer_start + j - mem->tap_len;
}
sample_output += mem->buffer[idx] * mem->taps[j];
}
/* Update buffer */
mem->buffer[mem->buffer_start] = input[i];
if(mem->buffer_start == 0){
mem->buffer_start = mem->tap_len - 1;
} else {
mem->buffer_start--;
}
output[i] = sample_output;
}
}Speed Boost
Modern compilers are pretty good at optimizing the code given to
them, so does this modification actually give a noticeable speed
increase? To test this we can use the following function which calls
apply_fft 1,000,000 times and compare the execution times
between the two implementations:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include "fir_filter.h"
static double randfrom(double min, double max){
double range = max - min;
double div = RAND_MAX / range;
return min + (rand() / div);
}
int main(){
/* Seed random number generator */
srand(time(NULL));
/* Create some random input data */
double data[1024];
for(size_t i = 0; i < 1024; i++){
data[i] = randfrom(-1.0, 1.0);
}
/* Create random taps */
double taps[8] = {1, 2, 3, 4, 4, 3, 2, 1};
/* Init fir filter */
FIR_Stream_Mem *fir_filter = init_fir_stream(taps, 8);
/* Filter data */
double output[1024];
for(size_t i = 0; i < 1000000; i++){
apply_fir(fir_filter, data, output, 1024);
}
/* Deallocate the FIR memory */
deinit_fir(fir_filter);
}Running the code with the unoptimized apply_fft took
25.936 seconds, whereas the optimized version took only 22.511
seconds.
Memory Optimization
As modern computers have a lot of memory I have not experimented with
the possible memory optimizations which could be made to this code.
However, there are two changes which could be made. The first is that
the init_fir_stream function could store the pointer its
given to taps rather than making a copy. This would reduce the memory
footprint of the FIR filter at the expense of thread safety in
multithreaded programs. Secondly, the buffer element in
FIR_Stream_Mem is currently one byte longer than it needs
to be - the final element is never used as part of the filter. This
could be fixed, but it would make the indexing in apply_fft
a bit more complicated.