Design and Implementation of FIR Filter Using VHDL

Abstract: This paper presents a study of the current state of the art in implementing FIR filters on FPGAs. First, an overview of FIR filters is given, followed by the FIR filter design methods and structures. The paper then discusses FPGAs and the implementation of FIR filters on them, and analyses hardware implementations of FIR filters using different algorithms, i.e. Distributed Arithmetic (DA), Common Subexpression Elimination (CSE), the Wallace Tree method and Sum-of-power-of-two (SOPOT), based on published research and comparative studies.

Keywords: FIR, FPGA, DA, CSE, SOPOT, MCM, MAC

1.   Introduction

Applications in areas such as software-defined radio modules, biomedical systems, and image and video processing systems [1] almost all require digital signal processing features to some degree. In signal processing, a Finite Impulse Response (FIR) filter is one of the most common modules for noise reduction; its impulse response is of finite duration because it settles to zero in finite time. The paper briefly introduces the general aspects of FIR filters and then discusses their design methods and structures. In recent years, there has been a growing trend to implement digital signal processing functions in the Field-Programmable Gate Array (FPGA), an integrated circuit that can be programmed after manufacturing to function as any digital circuit determined by the designer. FPGAs are used to implement FIR filters because of their low cost, high performance and low power consumption. By allowing designers to create circuit architectures developed for specific applications, high levels of performance can be achieved for many DSP applications, providing considerable improvements over conventional microprocessors and dedicated DSP processor solutions [14]. Power consumption has become an important design consideration as recent trends move towards wireless communication systems and portable computing devices, where FPGAs play an important role as an alternative solution for the realisation of digital signal processing tasks.

2.   FIR Filter Overview

A FIR filter is designed by finding the coefficients and filter order that meet certain specifications, which can be given in the time or frequency domain. One of the main reasons for using FIR filters is their inherent properties: they are non-recursive, linear time-invariant systems that are stable, finite and causal [2]. The general FIR filter is:

Equation 1: FIR Transfer Function
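
For reference, the standard textbook form of an FIR filter is sketched below. It assumes an order of M (so M+1 coefficients, length N = M+1) and reuses the symbols h(n), x[n] and y that appear in this section; the exact indexing convention is an assumption.

```latex
H(z) = \sum_{n=0}^{M} h(n)\, z^{-n},
\qquad
y[n] = \sum_{k=0}^{M} h(k)\, x[n-k]
```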

FIR filters do not include feedback in their equation, and all poles are located at the origin of the z-plane; such a filter is also known as an all-zero filter because it has zeros but no poles other than those at the origin [6]. A second advantage is that they can easily be designed to have exactly linear phase by making h(n) symmetric or antisymmetric [3]. In other words, the group delay of the filter can be constant. A last advantage is that they are causal: the output y at each sample n is a weighted sum of the present input, x[n], and past inputs, x[n-1], x[n-2], ..., x[n-M]. The delay of a causal linear-phase FIR filter of length N is exactly (N-1)/2 samples. The required filter length N increases when sharp transitions between frequency bands are specified or large attenuations are required in the stopbands. Thus, high-performance linear-phase filters with sharp cutoffs and large attenuations are necessarily long, and have large (though constant) delays and a large number of coefficients to be stored [6]. The design of the FIR filter is based on identifying the pulse transfer function G(z) that satisfies the requirements of the filter specification [4].
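
The weighted-sum and linear-phase properties described above can be illustrated with a minimal Python sketch. The function names and the 5-tap coefficient set are hypothetical, chosen only for illustration; feeding a unit impulse through the filter returns the coefficients themselves, and the symmetric set has a constant delay of (N-1)/2 samples.

```python
def fir_filter(h, x):
    """Evaluate y[n] = sum_k h[k] * x[n-k], taking x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:          # causal: only present and past inputs
                acc += hk * x[n - k]
        y.append(acc)
    return y


def is_linear_phase(h, tol=1e-12):
    """True if h is symmetric or antisymmetric about its midpoint."""
    n = len(h)
    sym = all(abs(h[i] - h[n - 1 - i]) <= tol for i in range(n))
    anti = all(abs(h[i] + h[n - 1 - i]) <= tol for i in range(n))
    return sym or anti


# Hypothetical symmetric coefficients (assumption, not from the paper).
h = [0.1, 0.2, 0.4, 0.2, 0.1]
impulse = [1.0] + [0.0] * 7

y = fir_filter(h, impulse)           # impulse response: h followed by zeros
group_delay = (len(h) - 1) / 2       # (N-1)/2 samples for length N
```

Because there is no feedback, the response to the impulse is exactly zero once the last coefficient has been passed, which is the finite-duration property the section refers to.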

3.   FIR Filter Design Method

The FIR filter is designed by finding the coefficients and filter order that meet certain specifications, which can be in the frequency domain or the time domain. The process of selecting the filter's length and coefficients is known as filter design. The design methods can be classified into three main groups: filtering optimisation, frequency sampling and windowing.

Filtering optimisation: the least mean square error (MSE) method and the Parks-McClellan method are used to find an optimal set of coefficients, the latter giving an equiripple solution. In the optimal design methods, the weighted approximation error between the actual frequency response and the desired filter response is spread across the passband and stopband as ripples.

Frequency sampling: sampling converts a continuous-time signal into a discrete-time signal.

Equation 2

These are samples taken at equally spaced time instants; the sampling frequency is given in Equation 3.

Equation 3: sampling frequency [samples/sec]
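
In commonly used notation, the two sampling relations referred to above can be written as follows. The symbols x_c(t) for the continuous-time signal and T for the sampling period are assumptions introduced here for illustration.

```latex
x[n] = x_c(nT), \quad n = 0, 1, 2, \dots
\qquad
f_s = \frac{1}{T} \ \ [\text{samples/sec}]
```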

A number of the frequency samples can be specified as unconstrained variables to improve the quality of the approximation.

Windowing: the time-domain response of the ideal impulse response can be used as coefficients for the filter. However, sharp transitions in the frequency domain make the time-domain response infinitely long, so it is truncated by a window. A simple extension to the window design method has been proposed that provides the flexibility to specify the passband ripple and stopband ripple independently. The modification is based on creating an additive component which, when applied to the filter, redistributes the concentration of z-plane zeros between the passband and stopband regions of the z-plane [5].

Figure 1: Frequency Response Plot

The goal of the FIR filter is to achieve frequency selectivity on the spectrum of the input signal. Figure 1 (Frequency Response Plot) shows an example of a practical lowpass filter: frequency components in the passband, from DC up to the passband edge frequency, pass through the filter with essentially no attenuation, whereas components in the stopband, above the stopband edge frequency, experience significant attenuation. The output of the design function is the set of coefficients for the filter.

Software tools such as MATLAB and GNU Octave, or programs written in C, are used for the calculation of coefficients, filter design and frequency response, as they provide convenient ways to apply these different methods.
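
As a rough illustration of the window method, the sketch below designs a lowpass filter by sampling the ideal (sinc) impulse response and tapering it with a Hamming window. The function names, the choice of the Hamming window, the 31-tap length and the normalised cutoff of 0.1 cycles/sample are all assumptions made for this example, not values from the paper.

```python
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Lowpass FIR coefficients by the window method.

    cutoff is the normalised cutoff in cycles/sample (0 < cutoff < 0.5).
    """
    mid = (num_taps - 1) / 2
    h = []
    for n in range(num_taps):
        t = n - mid
        # Ideal lowpass impulse response (infinitely long), sampled at t.
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        # Hamming window truncates and tapers the ideal response.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        h.append(ideal * w)
    total = sum(h)
    return [c / total for c in h]    # normalise to unity gain at DC


def magnitude(h, f):
    """Magnitude of the frequency response at f cycles/sample."""
    re = sum(c * math.cos(2 * math.pi * f * n) for n, c in enumerate(h))
    im = -sum(c * math.sin(2 * math.pi * f * n) for n, c in enumerate(h))
    return math.hypot(re, im)


h = windowed_sinc_lowpass(num_taps=31, cutoff=0.1)
passband_gain = magnitude(h, 0.02)   # well inside the passband
stopband_gain = magnitude(h, 0.40)   # well inside the stopband
```

The coefficients come out symmetric, so the resulting filter has linear phase; a longer filter or a different window trades transition width against stopband attenuation.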

4.   FIR Filter Structures

As defined by Coşkun [3], FIR filters can be implemented with three fundamental structures, known as direct form, transposed direct form and symmetric direct form. A FIR filter structure can be used to implement almost any sort of frequency response digitally. As shown in Figure 2 (Direct Form), a digital filter can be implemented using only three elements: delay blocks, multiplications by a constant and additions. The direct form is one of the most straightforward structures from the perspective of the filter transfer function, and it can be readily developed from the convolution sum. It is a direct reproduction of the FIR equation: the input signal x[n] is delayed (registered) by a chain of cells, each delayed sample is multiplied by the corresponding coefficient, and the output at time n is the sum of all the delayed samples multiplied by the appropriate coefficients.

Figure 2: Direct Form

The direct and transpose forms require the same number of delays, additions and multiplications, but as seen in Figure 2 and Figure 3 the order of the filter coefficients is reversed. The transpose form has a shorter critical path because a delay register follows each multiply-accumulate (MAC) operation, so it naturally emulates a pipelined implementation in digital design.

Figure 3: Transpose Form

The transpose filter is naturally a pipelined structure that supports the multiple constant multiplication (MCM) technique under fixed coefficients, whereas the direct form filter does not. MCM is more effective in the transpose form because a common operand, the current input sample, is multiplied by the whole set of constant coefficients, which reduces the computational delay [7]. The structure also benefits from the linear-phase (symmetry) property of the FIR filter, which reduces the number of distinct coefficients to N/2 or N/2+1 (for even and odd N respectively, using integer division) based on h(0) = h(N-1), h(1) = h(N-2), h(2) = h(N-3).
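
The three structures can be compared with a small behavioural sketch in Python. This is a software model under stated assumptions, not a hardware description: the function names, the integer coefficients and the input sequence are hypothetical. It shows that the direct, transposed and symmetric forms produce the same output, and that the symmetric form needs only about half the multiplications.

```python
def direct_form(h, x):
    """Direct form: a delay line of past inputs feeds the multipliers."""
    delay = [0] * len(h)                 # delay[k] holds x[n-k]
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hk * dk for hk, dk in zip(h, delay)))
    return y


def transposed_form(h, x):
    """Transposed form: products are accumulated through a register chain."""
    n = len(h)
    regs = [0] * n                       # regs[n-1] is a constant 0 ending the chain
    y = []
    for sample in x:
        # One input sample times every constant coefficient (the MCM block).
        products = [hk * sample for hk in h]
        y.append(products[0] + regs[0])
        # Register k stores product k+1 plus the old value of register k+1.
        regs = [products[k + 1] + regs[k + 1] for k in range(n - 1)] + [0]
    return y


def symmetric_form(h, x):
    """Symmetric form: add mirrored samples first, then multiply once."""
    n = len(h)
    delay = [0] * n
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0
        for k in range(n // 2):
            acc += h[k] * (delay[k] + delay[n - 1 - k])
        if n % 2:                        # middle tap of an odd-length filter
            acc += h[n // 2] * delay[n // 2]
        y.append(acc)
    return y


# Hypothetical symmetric coefficients and input samples (assumptions).
h = [1, 3, 5, 3, 1]
x = [2, -1, 4, 0, 7, 3, -2]
multipliers_needed = (len(h) + 1) // 2   # distinct coefficients for symmetric h
```

In the transposed model every adder output goes straight into a register, which mirrors the shorter critical path discussed above.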

5.   FIR Filters with FPGA

5.1. FPGA Background

The field-programmable gate array (FPGA) is an integrated circuit that consists of internal hardware blocks with user-programmable interconnects to customise operation for a specific application. The FPGA architecture consists of thousands of fundamental elements called configurable logic blocks (CLBs), surrounded by a system of programmable interconnects that route signals between CLBs. Input/output blocks interface between the FPGA and external devices. Modern FPGAs have millions of logic gates and have become much more complex than the first devices launched in 1985. An FPGA is useful when designing a prototype, because test and evaluation can be performed on parts of the design even if the design is not fully complete. An FPGA device is low cost in comparison to an ASIC [8]; together with other known advantages such as architectural flexibility, a high degree of data parallelisation and high throughput [9], this has boosted the use of FPGAs in the DSP field.

Nowadays, technology scaling has driven FPGA architectural resources beyond the limits predicted by Moore's law, so resources should not be a limitation when evaluating algorithms. A 2018 review [10] comparing FPGAs with DSPs and GPUs as hardware accelerators suggests that FPGAs have gained ground because of the limited number of MAC units and the limited data bus width of DSPs. The high non-recurring engineering (NRE) cost and long development time of application-specific integrated circuits (ASICs) make FPGAs attractive for application-specific DSP solutions [11]. Among the hardware accelerators compared, FPGAs are reported to have the highest processing speed, which allows them to serve high-performance applications using efficient and flexible algorithms. As described in [10], some low-power, high-end devices would be useful in mobile applications; the downside is longer development time in comparison to other devices, due to the challenges of producing robust code and the required knowledge of development tools and device architectures.

5.2. Distributed Arithmetic (DA)

A common approach among these algorithms is to simplify multiplication by means of additions, which is known as Multiple Constant Multiplication (MCM). Most signal processing and communication applications, including FIR filters and video, image and audio processing, use some sort of constant multiplication. The main concept is to find the optimal solution, namely the one with the fewest additions and subtractions. These techniques compute constant multiplications using lookup tables (LUTs) and additions, which helps to reduce power and speed up the application. Distributed Arithmetic (DA) is a popular method that is used effectively to implement FIR filters. It converts the MAC calculation into a series of lookup table accesses and summations. Moreover, it allows parallelisation, which helps to contain resource usage when the number of coefficients increases [12].
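
The idea of replacing MAC operations with lookup table accesses and summations can be sketched in Python as follows. This is a bit-serial software model under stated assumptions: integer coefficients, inputs that fit in 8-bit two's complement, and hypothetical function names. It is not the specific architecture of [12].

```python
def build_lut(h):
    """lut[a] = sum of h[k] for every k whose bit is set in address a."""
    return [sum(hk for k, hk in enumerate(h) if (a >> k) & 1)
            for a in range(1 << len(h))]


def da_dot(lut, samples, bits):
    """Compute sum_k h[k] * samples[k] with LUT reads, shifts and adds only.

    samples are signed integers that fit in `bits`-bit two's complement.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]      # two's complement bit patterns
    acc = 0
    for b in range(bits):
        # Bit b of every sample forms the LUT address.
        addr = 0
        for k, w in enumerate(words):
            addr |= ((w >> b) & 1) << k
        term = lut[addr] << b
        # The sign bit carries weight -2^(bits-1), so its term is subtracted.
        acc += -term if b == bits - 1 else term
    return acc


def da_fir(h, x, bits=8):
    """FIR filter whose inner product is evaluated by distributed arithmetic."""
    lut = build_lut(h)
    window = [0] * len(h)                    # window[k] holds x[n-k]
    y = []
    for sample in x:
        window = [sample] + window[:-1]
        y.append(da_dot(lut, window, bits))
    return y


def mac_fir(h, x):
    """Reference multiply-accumulate FIR used only for comparison."""
    return [sum(hk * x[n - k] for k, hk in enumerate(h) if n - k >= 0)
            for n in range(len(x))]


# Hypothetical coefficients and 8-bit input samples (assumptions).
h = [3, -1, 4, 2]
x = [5, -7, 127, -128, 0, 33]
y_da = da_fir(h, x)
y_mac = mac_fir(h, x)
```

The table has 2^K entries for K taps, which is the storage cost of removing the multipliers; splitting the taps into several smaller tables is a common way to limit that growth.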

Equation 4: FIR filter of order M, with x(n) represented for DA

Taking the input signal as a two's complement (C2) signed number, where each bit of the input sample takes a value in {0,1}, the FIR equation can also be written as:

Equation 5: FIR equation with DA

Which could be further expanded and re-arranged as:

Equation 6: FIR optimised equation
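
The standard DA derivation behind Equations 4 to 6 can be sketched as below. The notation is an assumption: h_k are the coefficients of an order-M filter, x_k = x[n-k] are the input samples, B is the input word length and x_{k,b} is bit b of x_k in two's complement.

```latex
\begin{aligned}
y &= \sum_{k=0}^{M} h_k\, x_k,
\qquad
x_k = -x_{k,B-1}\, 2^{B-1} + \sum_{b=0}^{B-2} x_{k,b}\, 2^{b},
\quad x_{k,b} \in \{0,1\} \\
y &= \sum_{k=0}^{M} h_k \left[ -x_{k,B-1}\, 2^{B-1} + \sum_{b=0}^{B-2} x_{k,b}\, 2^{b} \right] \\
y &= -2^{B-1} \sum_{k=0}^{M} h_k\, x_{k,B-1}
   + \sum_{b=0}^{B-2} 2^{b} \left[ \sum_{k=0}^{M} h_k\, x_{k,b} \right]
\end{aligned}
```

Each bracketed sum over k depends only on one bit from every input sample, so it can take only 2^(M+1) distinct values and can be read from a precomputed lookup table.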

5.3. Common Subexpression Elimination (CSE)

In the International Journal of Research in Computer Science, Dhobi et al. [12] discuss the common subexpression elimination (CSE) algorithm, which helps to reduce complexity when the coefficients are represented in canonical signed digit (CSD) form. In comparison to the conventional implementation of FIR filters, CSE aims to identify multiple occurrences of identical bit patterns in the coefficients and to remove the redundant multiplications, which reduces the number of adders as well as the complexity of the FIR filter. The binary variant of the algorithm uses the binary representation of the coefficients and, for high-order FIR filters, needs fewer adders than the CSD-based CSE algorithm. It focuses on reusing the most common binary common subexpressions (BCSs) present in the coefficients, reducing the redundant computation in the coefficient multiplier.

Equation 7: 3 bit binary representation forming four BCSs

Where x is the input signal. The other bit patterns, [0 0 1], [0 1 0] and [1 0 0], do not require any adder because they have only one nonzero bit. Of the four BCSs above, one can be obtained from another by a right shift operation without any extra adder, and one can be obtained from another using a single adder.
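
The sharing of 3-bit subexpressions can be illustrated with a simplified Python sketch. It is not the full CSE algorithm of [12]: it uses integer bit weights (4, 2, 1) rather than fractional ones, so the shift that relates two patterns is a left shift here, and the function names are hypothetical. The four patterns with more than one nonzero bit are built with three adders in total and then reused for every 3-bit group of a coefficient.

```python
def bcs_products(x):
    """Products of x with the patterns 011, 101, 110 and 111.

    Returns the table of products and the number of adders used to build it.
    """
    adders = 0
    x011 = (x << 1) + x      # [0 1 1] = 2x + x
    adders += 1
    x101 = (x << 2) + x      # [1 0 1] = 4x + x
    adders += 1
    x110 = x011 << 1         # [1 1 0] is a shifted copy of [0 1 1]: no adder
    x111 = x110 + x          # [1 1 1] = [1 1 0] + x
    adders += 1
    return {0b011: x011, 0b101: x101, 0b110: x110, 0b111: x111}, adders


def multiply_by_constant(x, coeff):
    """Multiply x by a non-negative integer constant, 3 bits at a time."""
    table, _ = bcs_products(x)
    # Patterns with at most one nonzero bit need shifts only.
    table.update({0b000: 0, 0b001: x, 0b010: x << 1, 0b100: x << 2})
    result, shift = 0, 0
    while coeff:
        result += table[coeff & 0b111] << shift   # reuse the shared product
        coeff >>= 3
        shift += 3
    return result


_, adders_for_table = bcs_products(1)
example = multiply_by_constant(9, 0b110111)       # 55 = [1 1 0][1 1 1]
```

In a filter, the same table of shared products would serve every coefficient multiplied by a given input sample, which is where the adder savings come from.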

5.4. Wallace Tree Algorithm

The Wallace tree algorithm is known for its fast computation and latency reduction in the overall circuit. The method parallelises operations by using carry-save adder (CSA) cells in a tree structure. In the Wallace tree architecture, all the bits of the partial products in each column are added together by a set of counters in parallel without propagating any carries [13].

Figure 4: Wallace Tree Method

A CSA cell takes three numbers (x, y, z) and adds them together to produce two output numbers, a carry (c) and a sum (s), in one unit of time. In a CSA cell, the carry (c) is not propagated immediately; it is brought forward to the last step, and the ordinary carry-propagate addition is deferred until the very last step. An example of an 8-bit x 8-bit multiplication is shown in Figure 4. The architecture is reduced for columns with fewer partial products.
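
The carry-save step and the tree reduction can be modelled in Python as below. This is a word-level sketch in the style of a Wallace tree for unsigned operands, with hypothetical function names; a hardware Wallace tree works on individual bit columns with full and half adders, but the reduction principle is the same.

```python
def carry_save_add(x, y, z):
    """3:2 compression: returns (s, c) such that x + y + z == s + c."""
    s = x ^ y ^ z                                # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit carry, moved one place left
    return s, c


def wallace_multiply(a, b, bits=8):
    """Unsigned multiply: CSA layers reduce the rows, then one ordinary add.

    Returns the product and the number of CSA layers used.
    """
    # One partial product row per bit of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(bits)]
    layers = 0
    while len(rows) > 2:
        reduced = []
        # Every group of three rows is compressed to two; in hardware all
        # groups of a layer operate in parallel.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # leftover rows pass through
        rows = reduced
        layers += 1
    return sum(rows), layers                     # final carry-propagate addition


s, c = carry_save_add(5, 6, 7)
product, layers = wallace_multiply(200, 123)
```

For eight partial product rows the row count shrinks as 8, 6, 4, 3, 2, so only one slow carry-propagate addition is needed at the end.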

5.5. Sum-of-power-of-two (SOPOT)

SOPOT is a high-speed, low-area architecture for FIR filter implementation without a multiplier block, as shown in Figure 5 [12].

Figure 5: Direct FIR Filter

The coefficient values are integer powers of two or sums of powers of two with two or three terms, so the multipliers can be replaced by shifters.

Figure 6: Architecture Unit of FIR Filter with SOPOT Type Coefficient

Equation 8: SOPOT
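
A commonly used form of the SOPOT coefficient representation is sketched below. The symbol names a_k and b_k and the number of terms L are assumptions; the value sets match those given in the text.

```latex
h(n) = \sum_{k=1}^{L} a_k\, 2^{b_k},
\qquad
a_k \in \{-1, 1\},
\qquad
b_k \in \{-t, \dots, 0, \dots, u\}
```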

Where the sign terms take values in {-1, 1} and the exponents take values in {-t, …, 0, …, u}. The values t and u determine the word length and the dynamic range of each filter coefficient; larger values of t and u give a closer approximation to the original number [12].
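
A minimal Python sketch of the SOPOT idea follows. The greedy term selection, the function names, the exponent range of -8 to 0 and the example coefficient 0.40625 are all assumptions made for illustration; the paper does not state how the terms are chosen. The second function multiplies by the coefficient using shifts and additions only.

```python
def to_sopot(value, terms=3, min_exp=-8, max_exp=0):
    """Greedy approximation of value by up to `terms` signed powers of two.

    Returns a list of (sign, exponent) pairs with sign in {-1, 1}.
    """
    pairs = []
    residual = value
    for _ in range(terms):
        if residual == 0:
            break
        sign = 1 if residual > 0 else -1
        # Allowed exponent whose power of two is closest to |residual|.
        exp = min(range(min_exp, max_exp + 1),
                  key=lambda e: abs(abs(residual) - 2.0 ** e))
        pairs.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return pairs


def sopot_value(pairs):
    """Numeric value represented by a list of (sign, exponent) pairs."""
    return sum(sign * 2.0 ** exp for sign, exp in pairs)


def shift_add_multiply(x, pairs, frac_bits=8):
    """Multiply integer x by a SOPOT coefficient without a multiplier.

    The result is scaled by 2**frac_bits so negative exponents stay integral;
    frac_bits must be at least as large as the most negative exponent.
    """
    acc = 0
    for sign, exp in pairs:
        term = x << (exp + frac_bits)            # a shift replaces the multiply
        acc += term if sign > 0 else -term
    return acc


pairs = to_sopot(0.40625)                        # hypothetical coefficient
scaled = shift_add_multiply(100, pairs)          # 100 * 0.40625 * 2**8
```

Allowing more terms or a wider exponent range gives a closer approximation of each coefficient, at the cost of more shifters and adders.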

6.   Conclusion

In the initial stages, FPGA resources and technology were not really prepared for signal processing. However, in the past 15 years, the availability of high-end, low-cost DSP-optimised FPGAs with high-level design entry methods, together with the inherent robustness of the design and verification process, has made FPGAs a preferred choice for implementing DSP. The DA algorithm completely eliminates the multiplier block by converting complicated MAC operations into lookup table (LUT) accesses and summations, at the cost of increased storage resources. Based on the research papers reviewed, the key focus areas for new research appear to be the implementation of algorithms such as LUT models, multiplier methods based on addition algorithms, and coefficient representation with DA. FPGAs have been increasingly used in DSP applications because they perform better than conventional DSP processors for filter implementations. The studies reviewed indicate that the use of FPGAs is growing rapidly, as many applications are outstripping the processing capability of digital signal processors. Finally, as the technology evolves, FPGAs have become a heterogeneous platform involving multiple software and hardware components. FPGAs provide enough MAC power to support a wide variety of DSP applications; however, power consumption is becoming a limiting factor in many cases, and it can be improved by using vendor-specific devices and their tailored design flows.

7.   References

[1] Kumm, Martin "FIR filter optimization for video processing on FPGAs", EURASIP Journal on Advances in Signal Processing, 1(1), pp. 1, 2013.

[2] J. G. Proakis, Digital Signal Processing: Principles, Algorithms and Applications, 3rd ed., Prentice Hall, 1996.

[3] O. Coşkun, "FPGA Schematic Implementations and Comparison of FIR Digital Filter Structures," Bajece, Bokul, 2018.

[4] George Ellis, Control System Design Guide, 4th edn., United States of America: Elsevier, 2012.

[5] Smith, M.J.T. "A novel FIR filter design method based on windowing", IEEE International Symposium on Circuits and Systems, 1(1), pp. 347-50, 1989.

[6] T.W. Parks and C.S. Burrus, Digital Filter Design. New York: Wiley, 1987.

[7] Ariprasath S. and Dr. C. Santhi "Transpose Form FIR Filter Design for Fixed and Reconfigurable Coefficients", International Research Journal of Engineering and Technology (IRJET), 4(3), pp. 1859, 2017.

[8] Andreas Ehliar, "Performance driven FPGA design with an ASIC perspective", PhD thesis, Linköping University Electronic Press, 2009.

[9] S. Banerjee, "Performance Analysis of Different DSP Algorithms on Advanced Microcontroller and FPGA," IEEE, Zouk Mosbeh, 2009.

[10] A. HajiRassouliha, "Suitability of recent hardware-accelerators (DSPs, FPGAs and GPUs) for computer vision and image processing algorithms," Elsevier, Auckland, 2018.

[11] Shahnam Mirzaei, Ryan Kastner, and Anup Hosangadi "Layout Aware Optimization of High Speed Fixed Coefficient FIR Filters for FPGAs", International Journal of Reconfigurable Computing, 1(1), pp. 1-8, 2010.

[12] Jinalkumari K. Dhobi, Dr. Y. B. Shukla, Dr. K. R. Bhatt "FPGA Implementation of FIR Filter using various Algorithms: A Retrospective", International Journal of Research in Computer Science, 4(2), pp. 19-24, 2014.

[13] Prince Pandey, Madhuraj, Mayank Kumar "FPGA Implementation of Convolution using Wallace Tree Multiplier", International Journal of Engineering Research & Technology, 3(6), pp. 1904, 2014.

[14] Roger Woods, John McAllister, Gaye Lightbody, Ying Yi, FPGA-based Implementation of Signal Processing Systems, 1st edn., United Kingdom: John Wiley & Sons Ltd, 2008.

Source: https://www.linkedin.com/pulse/fir-filter-design-implementation-using-fpgas-gaurav-pahuja
