Page 191 - 2024-Vol20-Issue2

187 | Hussein & AL-Assfor

TABLE II.
PERFORMANCE COMPARISON OF UNPIPELINED FLOATING-POINT (24*24)-BIT MULTIPLIERS

   Ref.       FPGA family            Delay (ns)   No. of LUTs
   [26]       Virtex-7 (Design1)     47.33        164
   [26]       Virtex-7 (Design2)     28.02        2928
   [26]       Virtex-7 (Design3)     27.76        1121
   [3]        Virtex-6               21.823       -
   [28]       Virtex-7               17.33        1763
   Proposed   Virtex-5               12.74        1260
   Proposed   Virtex-6               12.395       1018
   Proposed   Virtex-7               11.583       1015
   Proposed   Zynq                   11.583       1014

TABLE III.
PERFORMANCE COMPARISON OF PIPELINED FLOATING-POINT (24*24)-BIT MULTIPLIERS

   Ref.       FPGA family   Delay (ns)   No. of LUTs
   [31]       Virtex-5      6.61         -
   Proposed   Virtex-5      3.65         568
   Proposed   Virtex-6      3.452        564
   Proposed   Virtex-7      3.117        563
   Proposed   Zynq          2.58         558

    Fig. 9 shows the internal RTL organization of the synthesized
(24*24)-bit AVM in more detail.

Fig. 9. Internal organization of the (24*24)-bit AVM scheme in RTL.

    The (24*24)-bit AVM is simulated to validate its functionality
in multiplying the mantissa parts of two floating-point operands.
Several input cases (given in decimal representation) are applied
and the corresponding outputs are checked, for example,
case 1: 100*24 = 2400, case 2: 570*320 = 182400, and
case 3: 1320*23450 = 30954000, as illustrated in Fig. 10.

Fig. 10. Simulation input/output waveforms of the (24*24)-bit AVM.

    Tables II and III compare the proposed (24*24)-bit multiplier,
without and with pipelining respectively, with several existing
multipliers. Table II shows that, on the same FPGA family
(Virtex-7), the unpipelined (24*24)-bit AVM reduces delay by
33.16% (11.583 ns versus 17.33 ns) and area utilization by 42.42%
(1015 versus 1763 LUTs) compared with the multiplier in [28].

    For the pipelined design, Table III shows that the proposed
(24*24)-bit AVM has 44.78% less delay than the design in [31] on
the same FPGA family (Virtex-5), 3.65 ns versus 6.61 ns, and that
the pipelined AVM reaches its lowest delay and area (2.58 ns and
558 LUTs) on the Zynq family. Overall, Tables II and III show that
the proposed (24*24)-bit AVM has a lower delay than every listed
multiplier and uses fewer LUTs than most of the designs for which
area is reported. The reduction in delay comes from using the
EBK-CSLA together with a single CSA, which eliminates carry
propagation during the partial-product reduction step and the
final addition step that generates the final product. Thus, the
proposed (24*24)-bit AVM can be used to build an efficient
floating-point MAC module that meets the requirements of
cutting-edge DSP applications.
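To illustrate why carry-save reduction defers carry propagation to a single final addition, the Python sketch below models an unsigned 24*24-bit mantissa multiplier at the bit level and checks it against the three simulation cases quoted for Fig. 10. It rests on stated assumptions: the helper names (`csa`, `ripple_add`, `carry_select_add`, `multiply_24x24`) are hypothetical, the reduction uses a generic tree of 3:2 carry-save adders rather than the single CSA of the proposed AVM, and a generic carry-select adder with 8-bit blocks stands in for the EBK-CSLA, whose internal structure is not described in this section.

```python
# Behavioral sketch only (hypothetical helper names): a generic carry-save
# reduction tree plus a generic carry-select adder, NOT the authors' AVM or
# EBK-CSLA circuit. Python integers stand in for bit vectors.

WIDTH_IN = 24                  # mantissa width (23 fraction bits + hidden 1)
WIDTH_OUT = 2 * WIDTH_IN       # a 24x24 product needs at most 48 bits
MASK24 = (1 << WIDTH_IN) - 1


def csa(a, b, c):
    """3:2 carry-save adder: returns (sum, carry) with a + b + c == sum + carry.
    No carry ripples between bit positions inside this step."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def ripple_add(a, b, width, cin):
    """Ripple-carry addition of the low `width` bits with carry-in `cin`."""
    total, carry = 0, cin
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (x & carry) | (y & carry)
    return total, carry


def carry_select_add(a, b, width=WIDTH_OUT, block=8):
    """Carry-select adder: every block is pre-computed for carry-in 0 and 1,
    and the carry from the previous block only selects one of the results."""
    mask = (1 << block) - 1
    result, carry = 0, 0
    for lo in range(0, width, block):
        x, y = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple_add(x, y, block, 0)
        s1, c1 = ripple_add(x, y, block, 1)
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result  # the final carry-out is always 0 for a 24x24 product


def multiply_24x24(a, b):
    """Multiply two unsigned 24-bit mantissas via carry-save reduction of the
    partial products followed by one carry-select addition."""
    if not (0 <= a <= MASK24 and 0 <= b <= MASK24):
        raise ValueError("operands must be unsigned 24-bit integers")
    # Partial product i is `a` shifted left by i when bit i of `b` is set.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(WIDTH_IN)]
    # Compress three rows into two until only a sum row and a carry row remain.
    while len(rows) > 2:
        reduced = []
        while len(rows) >= 3:
            reduced.extend(csa(rows.pop(), rows.pop(), rows.pop()))
        reduced.extend(rows)  # zero, one, or two rows pass through unchanged
        rows = reduced
    return carry_select_add(rows[0], rows[1])


if __name__ == "__main__":
    # The three decimal test cases quoted for the Fig. 10 simulation.
    for a, b in [(100, 24), (570, 320), (1320, 23450)]:
        print(f"{a} * {b} = {multiply_24x24(a, b)}")
```

Because every 3:2 step preserves the total (a + b + c == sum + carry), the only carry-propagating operation in this model is the final two-operand addition, which is the part a carry-select adder is meant to speed up.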
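The delay and area reductions quoted for Tables II and III can be reproduced from the table entries. The short Python sketch below, with hypothetical helper names, computes the relative reduction (reference - proposed) / reference * 100 for the three comparisons discussed in the text: [28] versus the proposed unpipelined AVM on Virtex-7, and [31] versus the proposed pipelined AVM on Virtex-5.

```python
import math

# Delay (ns) and area (LUTs) values taken from Tables II and III.
# Each entry: (description, reference design value, proposed AVM value).
COMPARISONS = [
    ("Unpipelined delay, [28] vs proposed, Virtex-7", 17.33, 11.583),
    ("Unpipelined LUTs, [28] vs proposed, Virtex-7", 1763, 1015),
    ("Pipelined delay, [31] vs proposed, Virtex-5", 6.61, 3.65),
]


def reduction_percent(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return (reference - proposed) / reference * 100.0


def truncate_2dp(value):
    """Cut (not round) a non-negative value to two decimal places."""
    return math.floor(value * 100) / 100


if __name__ == "__main__":
    for label, ref, prop in COMPARISONS:
        pct = reduction_percent(ref, prop)
        print(f"{label}: rounded {pct:.2f}%, truncated {truncate_2dp(pct):.2f}%")
```

Rounded to two decimals, the LUT reduction comes out as 42.43%, while truncation gives the 42.42% reported in the text; the two delay figures (33.16% and 44.78%) agree under either convention.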