# DESIGN OF PROCESSING ELEMENT (PE3) FOR IMPLEMENTING PIPELINE FFT PROCESSOR

Mary Roseline Thota, Mounika Dandamudi and R. Ramana Reddy

Department of ECE, MVGR College of Engineering (A), Vizianagaram.

#### ABSTRACT

Multiplexing is a method by which multiple analog message signals or digital data streams are combined into one signal over a shared medium. Different multiplexing schemes are used in communication. To achieve higher data rates, Orthogonal Frequency Division Multiplexing (OFDM) is used because of its high spectral efficiency. OFDM became a serious alternative for modern digital signal processing methods based on the Fast Fourier Transform (FFT). The problems with orthogonal subcarriers can be addressed with the FFT in communication applications. An 8-bit processing element (PE3), used in the execution of a pipeline FFT processor, is designed and presented in this paper. Simulations are carried out using Mentor Graphics tools in 130 nm technology.

## **KEYWORDS:**

Multiplexing, OFDM, FFT processor, Mentor Graphics tools.

## 1. Introduction

The Discrete Fourier Transform (DFT) is essential in discrete signal processing and telecommunications. Cooley and Tukey [1] proposed the FFT to overcome the intensive computation of the DFT. It efficiently reduces the time complexity from $O(N^2)$ to $O(N\log_2 N)$, where $N$ denotes the FFT size, and it has applications in OFDM-based systems such as WiMAX, LTE, DSL, and DAB/DVB. FFT processors developed for hardware implementation are classified into memory-based and pipeline-based architectures [2-4]. The memory-based architecture (the single Processing Element (PE) approach) consists of a principal processing element and multiple memory units. It consumes less power and needs less hardware than the pipeline architecture, but it has disadvantages such as low throughput, long latency, and the inability to be parallelized.
In contrast, the pipeline architecture overcomes the disadvantages of the memory-based architecture with an acceptable hardware overhead. Single-path Delay Feedback (SDF) and Multiple-path Delay Commutator (MDC) pipeline architectures are the two widely used design styles in pipeline FFT processors. The SDF pipeline FFT [2-5] requires less memory, is easy to design, has a multiplier utilization of less than 50%, and has a control unit suited to portable devices. In view of these advantages, the radix-2 SDF pipeline architecture is considered for implementing the FFT processor. Three processing elements are used in the architecture of the proposed FFT processor [1]. In this paper, the design of an 8-bit processing element (PE3) is implemented.

DOI: 10.5121/ijci.2016.5435

## 2. FFT ALGORITHM

The DFT $X_k$ of an N-point discrete-time signal $x_n$ is defined by:

$$X_{k} = \sum_{n=0}^{N-1} x_{n} W_{N}^{nk}, \qquad 0 \le k \le N-1$$ (1)

where $W_N^{nk} = e^{-j2\pi nk/N}$ is the twiddle factor. The direct implementation of the DFT is difficult to realize because it requires more hardware. The FFT was therefore developed to reduce the hardware cost and speed up the computation. Using either Decimation-in-Time (DIT) or Decimation-in-Frequency (DIF) decomposition, the FFT breaks down an input signal sequence to construct a Signal-Flow Graph (SFG) that can be computed efficiently. DIF decomposition is employed here because it matches the operation of the SDF pipeline architecture. A radix-2 DIF FFT SFG for N=8 is presented in Figure 1.

Figure 1. Radix-2 Decimation-In-Frequency Fast Fourier Transform Signal Flow Graph for N=8.

FFT computation uses a complex multiplication scheme [6-11]; as a result, the hardware cost increases because of the ROM and the complex multipliers. The DIF FFT is suitable for hardware implementation as it has a regular SFG and requires fewer complex multipliers, resulting in a smaller chip area.
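The relation between Eq. (1) and the DIF decomposition can be illustrated in software. The following is a minimal Python sketch, not the hardware design of this paper: `dft` evaluates Eq. (1) directly, and `fft_dif` applies a recursive radix-2 DIF decomposition in which each stage forms butterfly sums and twiddled differences. The function names and the recursive formulation are illustrative assumptions.

```python
import cmath


def dft(x):
    """Direct evaluation of Eq. (1); needs on the order of N^2 multiplications."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT.

    len(x) must be a power of two. Results are returned in natural order.
    """
    n_points = len(x)
    if n_points == 1:
        return list(x)
    half = n_points // 2
    # Butterfly: sums feed the even-indexed outputs,
    # twiddled differences feed the odd-indexed outputs.
    sums = [x[n] + x[n + half] for n in range(half)]
    diffs = [
        (x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / n_points)
        for n in range(half)
    ]
    out = [0j] * n_points
    out[0::2] = fft_dif(sums)
    out[1::2] = fft_dif(diffs)
    return out
```

For an 8-point input, both functions agree to within floating-point rounding.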
As an example of a hardware-friendly multiplication, an input signal multiplied by $W_8^1$ in Figure 1 can be expressed as:

$$(x+jy)W_8^1 = \sqrt{2}[(x+y)+j(y-x)]/2$$ (2)

where $(x+jy)$ denotes a complex discrete-time signal. Similarly, the complex multiplication by $W_8^3$ is given by:

$$(x+jy)W_8^3 = \sqrt{2}[(y-x)-j(x+y)]/2$$ (3)

Equations (2) and (3) both ease hardware implementation, since each needs only additions, subtractions, and one scaling by $\sqrt{2}/2$. From the symmetry property of the twiddle factors, the complex multiplications can be one of the following three operation types:

Type 1: $$W_N^k(x+jy) = W_N^{k-(N/4)}(y-jx), \qquad \frac{N}{4} < k < \frac{N}{2}$$ (4)

Type 2: $$W_N^k(x+jy) = -W_N^{k-(N/2)}(x+jy), \qquad \frac{N}{2} < k < \frac{3N}{4}$$ (5)

Type 3: $$W_N^k(x+jy) = -W_N^{k-(3N/4)}(y-jx), \qquad \frac{3N}{4} < k < N$$ (6)

Any twiddle factor can be obtained by combining the twiddle-factor primary elements (equations (4)-(6)). The three operation types are used to find the required twiddle factor while reducing the size of the ROM. Additional operation types are given below:

Type 4: $$W_N^k(x+jy) = \left[W_N^{(N/4)-k}(y+jx)\right]^*, \qquad 1 \le k < \frac{N}{4}$$ (7)

Type 5: $$W_N^k(x+jy) = -j \left[ W_N^{(N/2)-k}(y+jx) \right]^*, \qquad \frac{N}{4} < k < \frac{N}{2}$$ (8)

where $*$ indicates the complex conjugate. Using the five operation types reduces the complex multiplications, so the twiddle-factor ROM table shrinks significantly after the third butterfly stage.

## **3. ARCHITECTURE OF FFT:**

A radix-2 8-point pipeline FFT processor is presented in Figure 2. The architecture of the pipeline FFT processor contains three processing elements, namely PE3, PE2 and PE1, a complex constant multiplier, and delay-line buffers. To remove the twiddle-factor ROM, a reconfigurable complex constant multiplier is used, which reduces the chip area and the power consumption of the FFT processor.

Figure 2. Radix-2 8-point pipeline FFT processor.
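The operation types of equations (4)-(8) can be checked numerically. The sketch below is a Python illustration under stated assumptions, not part of the proposed hardware: `multiply_reduced` evaluates $W_N^k(x+jy)$ using only twiddle factors with exponent below $N/4$ (Types 1 to 3), and `type4` and `type5` evaluate the conjugate forms. All function names are hypothetical, and the range boundaries $k = N/4$, $N/2$ and $3N/4$ are folded into the neighbouring type as an assumption of this sketch.

```python
import cmath


def twiddle(n_points, k):
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * cmath.pi * k / n_points)


def multiply_reduced(n_points, k, z):
    """Compute W_N^k * z with twiddle exponents restricted to [0, N/4)."""
    quarter = n_points // 4
    rotated = complex(z.imag, -z.real)  # (y - jx), which equals -j * z
    if k < quarter:
        return twiddle(n_points, k) * z                     # no reduction needed
    if k < 2 * quarter:
        return twiddle(n_points, k - quarter) * rotated     # Type 1, Eq. (4)
    if k < 3 * quarter:
        return -twiddle(n_points, k - 2 * quarter) * z      # Type 2, Eq. (5)
    return -twiddle(n_points, k - 3 * quarter) * rotated    # Type 3, Eq. (6)


def type4(n_points, k, z):
    """Eq. (7): conjugate form using the exponent N/4 - k."""
    swapped = complex(z.imag, z.real)  # (y + jx)
    return (twiddle(n_points, n_points // 4 - k) * swapped).conjugate()


def type5(n_points, k, z):
    """Eq. (8): conjugate form using the exponent N/2 - k."""
    swapped = complex(z.imag, z.real)  # (y + jx)
    return -1j * (twiddle(n_points, n_points // 2 - k) * swapped).conjugate()
```

Comparing each result with the direct product `twiddle(N, k) * z` for every `k` in the stated range confirms the identities to within rounding error.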
# **PROCESSING ELEMENTS**

The three processing elements PE3, PE2, and PE1 of the radix-2 pipeline FFT processor are presented in Figures 3 to 5, respectively. The processing elements process the stages of the butterfly structure presented in Figure 1. The PE3 stage implements a simple radix-2 butterfly and functions as the sub-module for the PE2 and PE1 stages. In Figure 3, Iin and Iout denote the real parts, and Qin and Qout the imaginary parts, of the input and output data, respectively. Similarly, DL\_Iin and DL\_Iout stand for the real parts, and DL\_Qin and DL\_Qout for the imaginary parts, of the input and output of the DL buffers, respectively.

Figure 3. Architecture of PE3.

The PE2 stage requires multiplication by $-j$ or 1. The multiplication by $-1$ in Figure 4 is realized in practice by taking the 2's complement of the input value. Compared to the PE2 stage, the calculations in the PE1 stage are more complex, as it computes the multiplications by $-j$, $W_N^{N/8}$ and $W_N^{3N/8}$. Since $W_N^{3N/8} = -j\,W_N^{N/8}$, the latter product can be obtained either by multiplying by $W_N^{N/8}$ and then by $-j$, or in the reverse order. This yields low-cost hardware by saving a bit-parallel multiplier for computing $W_N^{3N/8}$.

Figure 4. Architecture of PE2.

Figure 5. Architecture of PE1.

# 4. PROCESSING ELEMENT (PE3)

PE3 is the main component of the FFT processor as it serves as the sub-module for the PE2 and PE1 stages. It processes the stage P=3 of the radix-2 8-point DIF FFT butterfly structure in Figure 1. The hardware implementation of PE3 employs a ten-transistor adder and a multiplexer. The 1-bit and 8-bit PE3 elements are presented in Figures 6 and 7, respectively.

Figure 6. Schematic of 1-bit PE3.

Figure 7. Schematic of 8-bit PE3.

## 5. RESULTS

The PE3 element is simulated with the ELDO software in Mentor Graphics.
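The function of the PE3 real-data path can also be modelled behaviourally. The sketch below is a Python illustration under stated assumptions, not the transistor-level design: with the multiplexer select line at 0, the input is passed into the delay line and the delay-line output is passed to the output; with the select line at 1, the butterfly difference and sum are formed. The function name `pe3`, the `width` parameter, and the wrap-around of results to `width` bits are assumptions made for illustration.

```python
def pe3(s0, i_in, dl_i_out, width=8):
    """Behavioural model of the PE3 real-data path.

    s0       -- multiplexer select line (0 or 1)
    i_in     -- input data Iin
    dl_i_out -- output of the delay-line buffer DL_Iout
    width    -- assumed word length; results wrap to this many bits

    Returns the pair (DL_Iin, Iout).
    """
    mask = (1 << width) - 1
    if s0 == 0:
        # Pass-through: fill the delay line and drain its previous content.
        dl_i_in = i_in
        i_out = dl_i_out
    else:
        # Butterfly: difference goes back to the delay line, sum to the output.
        dl_i_in = dl_i_out - i_in
        i_out = dl_i_out + i_in
    return dl_i_in & mask, i_out & mask
```

For example, `pe3(1, 0b1000, 0b1011, width=4)` returns `(0b0011, 0b0011)`: the difference is 0011, and the sum 10011 wraps to its lower four bits.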
The simulated waveforms of the 1-bit and 8-bit PE3 are shown in Figure 8 and Figures 9-10, respectively.

Figure 8. 1-bit PE3 simulated waveforms.

The PE3 element processes the stage P=3 of the radix-2 DIF FFT. It takes the input data (Iin) and the delay output (DL\_Iout) as inputs, and produces the output data (Iout) and the delay input to the next buffer (DL\_Iin) according to the selection line $S_0$ of the multiplexer.

When $S_0 = 0$:

$$DL\_I_{in} = I_{in}$$ (9)

$$I_{out} = DL\_I_{out}$$ (10)

When $S_0 = 1$:

$$DL\_I_{in} = DL\_I_{out} - I_{in}$$ (11)

$$I_{out} = DL\_I_{out} + I_{in}$$ (12)

From Figure 8:

- When $S_0 = 0$, the inputs are Iin = 1010 and DL\_Iout = 0001, and the outputs are DL\_Iin = 1010 and Iout = 0001.
- When $S_0 = 1$, the inputs are Iin = 1000 and DL\_Iout = 1011, and the outputs are DL\_Iin = 0011 and Iout = 0011, the lower four bits of the sum 1011 + 1000 = 10011.

Figure 9. Input waveforms of 8-bit PE3.

Figure 10. Output waveforms of 8-bit PE3.

The power dissipation (from EZwave) of the 1-bit PE3 is 0.5517 mW and that of the 8-bit PE3 is 0.9237 mW.

## 6. CONCLUSIONS

The pipelined FFT architecture contains three processing elements: PE1, PE2, and PE3. PE3 is the most important element as it serves as a sub-module of the other two processing elements, PE2 and PE1. PE3 (1-bit and 8-bit) is implemented using Mentor Graphics tools and its power dissipation is observed. To implement the proposed pipelined FFT architecture, PE2 and PE1 remain to be designed.

# REFERENCES

- [1] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., Vol. 19, pp. 297-301, Apr. 1965.
- [2] S.-Y. Peng, K.-T. Shr, C.-M. Chen, and Y.-H. Huang, "Energy-efficient 128~2048/1536-point FFT processor with resource block mapping for 3GPP-LTE system," in Proc. Int. Conf. Green Circuits Syst., Jun. 2010.
- [3] Nilesh Chide, Shreyas Deshmukh, and P. B. Borole, "Implementation of OFDM System using IFFT and FFT," International Journal of Engineering Research and Applications (IJERA), Vol.
3, Issue 1, January-February 2013, pp. 2009-2014.
- [4] Taewon Hwang, Chenyang Yang, Gang Wu, Shaoqian Li, and Geoffrey Ye Li, "OFDM and Its Wireless Applications: A Survey," IEEE Transactions on Vehicular Technology, Vol. 58, no. 4, May 2009.
- [5] Lokesh C and Nataraj K., "Implementation of an OFDM FFT Kernel for WiMAX," International Journal of Computational Engineering Research, Vol. 2, Issue 8, Dec. 2012.
- [6] Chua-Chin Wang, Jian-Ming Huang, and Hsian-Chang Cheng, "A 2K/8K mode small-area FFT processor for OFDM demodulation of DVB-T receivers," IEEE Transactions on Consumer Electronics, Vol. 51, no. 1, pp. 28-32, Feb. 2005.
- [7] C. P. Hung, S. G. Chen, and K. L. Chen, "Design of an efficient variable-length FFT processor," Proceedings of the 2004 International Symposium on Circuits and Systems, Vol. 2, pp. 23-26, May 2004.
- [8] Koushik Maharatna, Eckhard Grass, and Ulrich Jagdhold, "A 64-Point Fourier transform chip for high-speed wireless LAN application using OFDM," IEEE Journal of Solid-State Circuits, Vol. 39, no. 3, pp. 484-493, Mar. 2004.
- [9] Yu-Wei Lin and Chen-Yi Lee, "Design of an FFT/IFFT Processor for MIMO OFDM Systems," IEEE Transactions on Circuits and Systems I, Vol. 54, no. 4, Apr. 2007.
- [10] Hsin-Fu Lo, Ming-Der Shieh, and Chien-Ming Wu, "Design of an efficient FFT processor for DAB system," IEEE International Symposium on Circuits and Systems, Vol. 4, May 2001.
- [11] P. Divakara Varma and R. Ramana Reddy, "A novel 1-bit full adder design using DCVSL XOR/XNOR gate and Pass transistor Multiplexers," International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol. 2, Issue 4, pp. 142-146, Mar. 2013.

## **AUTHORS**

Mary Roseline Thota received the B.Tech. degree in ECE from GVP College of Engineering for Women in 2014. She is pursuing an M.Tech (VLSI) in MVGR College of Engineering. Her research interests include VLSI design methodologies and low power VLSI design.

Mounika Dandamudi received the B.Tech.
degree in ECE from Chirala Engineering College in 2014. She is pursuing an M.Tech (VLSI) in MVGR College of Engineering. Her research interests include VLSI design methodologies and low power VLSI design.

Dr. R. Ramana Reddy completed AMIE in ECE from The Institution of Engineers (India) in 2000, M.Tech (I&CS) from JNTU College of Engineering, Kakinada in 2002, MBA (HRM & Marketing) from Andhra University in 2007 and Ph.D in Antennas in 2008 from Andhra University. He is presently working as Professor & Head, Dept. of ECE in MVGR College of Engineering, Vizianagaram. He is Coordinator of the Center of Excellence - Embedded Systems and Head of the National Instruments LabVIEW academy established in the Department of ECE, MVGR College of Engineering. He has been convener of several national level conferences and workshops and has published about 70 technical papers in national and international journals and conferences. He is a member of IETE, IEEE, ISTE, SEMCE (I), IE, and ISOI. His research interests include Phased Array Antennas, Slotted Waveguide Junctions, EMI/EMC, VLSI and Embedded Systems.