Page 187 - 2024-Vol20-Issue2

183 |                                                              Hussein & AL-Assfor

Fig. 1. The format of BSP floating-point number [8]

consists of three fields: the sign bit ($S_X$), the biased exponent ($E_X$), and the mantissa (or significand, $M_X = 1.F_X$) [9], where $F_X$ denotes the fraction bits of the mantissa. These three fields are packed into a word such that

$$X = (-1)^{S_X} \cdot M_X \cdot 2^{E_X} \tag{1}$$

A floating-point multiplier is used to carry out the mantissa multiplication of two floating-point operands [10]. Multiplication of two n-bit numbers X and Y can be carried out in the following three steps [4]:
Step 1: generation of the partial products.
Step 2: reduction of these partial products using a set of adders, such as (3:2) carry-save adders (CSAs), to produce the intermediate product in sum-and-carry vector form. The (3:2) denotes the number of inputs and outputs of the adder.
Step 3: generation of the final product (the final multiplication result) using a fast adder. The inputs of this adder are the sum and carry vectors produced in Step 2.
Generally, the multiplier speed depends essentially on the multiplication algorithm utilized and on the accumulation of the sum and carry vectors to produce the final multiplication result.
The goal of this work is to design a high-speed, low-area VM based on the Urdhva-Tiryakbhyam-Sutra (UT-Sutra) approach for BSP floating-point MAC modules to achieve high-performance digital signal processing. This goal is accomplished through the steps below:
- Design an efficient adder to add the intermediate sum and carry vectors generated by the CSA to produce the final multiplication result, since the speed of the multiplier relies heavily on the speed of that adder.
- Use the improved XOR gate in [11] to design all parts of the proposed multiplier.
- Improve the speed of the proposed multiplier further using the pipelining concept.
Based on the above steps, this work presents a distinctive design for a (6*6)-bit VM, called here the adjusted VM (AVM). The design utilizes the conventional (3*3)-bit VM along with an enhanced design of the Brent-Kung carry-select adder (EBK-CSLA) in [11] to produce the final product from the sum and carry vectors. The (6*6)-bit AVM circuit is in turn utilized to design a (12*12)-bit and then a (24*24)-bit AVM, which can be utilized to perform the mantissa multiplication for BSP floating-point numbers. The proposed (24*24)-bit AVM is then optimized using the pipelining approach. The proposed AVM architectures are built using the improved XOR gate to substantially improve the multiplier performance. The proposed AVMs are coded in VHDL, simulated in the Xilinx ISE 14.7 software tool, and synthesized for different FPGA families, namely Virtex-5, Virtex-6, Virtex-7, and Zynq, and a complete analysis of their performance is then provided.

This work is arranged as follows: Sec. II reviews some of the previous works related to floating-point multipliers and MAC modules. Sec. III explains the general design of a BSP floating-point MAC module. Sec. IV gives details of the proposed AVM using the EBK-CSLA architecture. The implementation results, the simulations, and a comparison of the proposed multiplier with existing multiplier designs are presented in Sec. V, and the conclusion is given in Sec. VI.

II. LITERATURE REVIEW

In 2015, N. Jithendra et al. [12] presented two approaches to designing MAC modules, one for fixed-point signed numbers and the other for floating-point numbers. Their architectures were designed using a Wallace-tree multiplier and a ripple-carry adder (R-CA). Their multiplier and MAC designs showed an improvement in terms of power, but incurred higher delay and area consumption due to the Wallace-tree structure and to the R-CA, which leads to a high carry-propagation delay during addition.

In 2016, the authors in [13] proposed a VM for floating-point operands. Their design was based on three cascaded carry-lookahead adders (CLA-As) that perform the partial-product reduction and the final addition to generate the final product. Nevertheless, their design consumed more area and had a considerable delay due to the carry propagation among the three adders.

In [14, 15], the authors designed (24*24)-bit Vedic-based multipliers to perform mantissa multiplication for floating-point inputs. Their designs comprised three cascaded levels of R-CAs to add the generated partial products and produce the final product. Their designs achieved low speed due to the use of the R-CA, which is considered the slowest adder.
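The carry-propagation cost noted for the reviewed designs can be illustrated with a small behavioural model of the three multiplication steps listed in the introduction. The Python sketch below is only an illustration under stated assumptions, not the architecture of any cited design: the function names (`partial_products`, `csa`, `reduce_carry_save`, `multiply`) are hypothetical, the grouping of vectors into CSAs is a simple greedy scheme, and Python's built-in integer addition stands in for the fast final adder.

```python
def partial_products(x: int, y: int, n: int) -> list[int]:
    """Step 1: one shifted copy of x for every bit of y (zero if the bit is 0)."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(n)]


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """(3:2) carry-save adder: three input vectors, one sum and one carry vector.

    No carry ripples between bit positions; each position is handled
    independently, and a + b + c == sum_vec + carry_vec always holds.
    """
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def reduce_carry_save(vectors: list[int]) -> tuple[int, int, int]:
    """Step 2: apply levels of CSAs until only two vectors remain.

    Returns (sum_vec, carry_vec, number_of_csa_levels).
    """
    vectors = list(vectors)
    levels = 0
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3   # vectors that form complete triples
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(vectors[i], vectors[i + 1], vectors[i + 2]))
        nxt.extend(vectors[full:])               # leftovers pass to the next level
        vectors = nxt
        levels += 1
    while len(vectors) < 2:                      # pad for very small operand widths
        vectors.append(0)
    return vectors[0], vectors[1], levels


def multiply(x: int, y: int, n: int = 24) -> int:
    """Steps 1 to 3 for two unsigned n-bit operands."""
    sum_vec, carry_vec, _ = reduce_carry_save(partial_products(x, y, n))
    return sum_vec + carry_vec                   # Step 3: the only carry-propagating add


if __name__ == "__main__":
    a, b = 0xC00000, 0xABCDEF                    # two 24-bit mantissa-sized operands
    print(hex(multiply(a, b)), multiply(a, b) == a * b)
```

In this model carries propagate only once, in the final addition, which is why the speed of that last adder dominates. A design that adds the partial products with cascaded ripple-carry adders instead pays a full carry-propagation delay at every level.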

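The hierarchical composition described in the introduction, in which a (3*3)-bit VM is used to build (6*6)-, (12*12)-, and (24*24)-bit multipliers, can be sketched behaviourally as follows. This is a minimal sketch under assumptions: it supposes the usual split of each operand into high and low halves with four sub-multipliers per level, it uses plain integer addition where the proposed design uses the EBK-CSLA, and the names `ut_3x3`, `vm`, and `mantissa24` are hypothetical helpers introduced only for this illustration.

```python
import struct


def ut_3x3(x: int, y: int) -> int:
    """Base (3*3)-bit multiplier in the vertical-and-crosswise (UT) style.

    Column k collects every bit product x_i * y_j with i + j == k;
    the column sums are then weighted by 2**k and accumulated.
    """
    xb = [(x >> i) & 1 for i in range(3)]
    yb = [(y >> i) & 1 for i in range(3)]
    product = 0
    for k in range(5):                           # columns 0..4 of a 3x3 product
        column = sum(xb[i] & yb[k - i] for i in range(3) if 0 <= k - i < 3)
        product += column << k
    return product


def vm(x: int, y: int, n: int) -> int:
    """(n*n)-bit multiplier built from four (n/2 * n/2)-bit multipliers.

    n is assumed to be 3, 6, 12, or 24.
    """
    if n == 3:
        return ut_3x3(x, y)
    h = n // 2
    mask = (1 << h) - 1
    xl, xh = x & mask, x >> h
    yl, yh = y & mask, y >> h
    ll = vm(xl, yl, h)                           # low  * low
    lh = vm(xl, yh, h)                           # low  * high
    hl = vm(xh, yl, h)                           # high * low
    hh = vm(xh, yh, h)                           # high * high
    # Plain addition here; the final adder is where a fast adder would sit.
    return ll + ((lh + hl) << h) + (hh << (2 * h))


def mantissa24(value: float) -> int:
    """24-bit mantissa 1.F of a normalized single-precision value (hidden 1 restored)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return (1 << 23) | (bits & 0x7FFFFF)


if __name__ == "__main__":
    mx, my = mantissa24(1.5), mantissa24(1.5)
    p = vm(mx, my, 24)                           # 48-bit mantissa product
    print(hex(p), p / 2**46)                     # product scaled back to a real value
```

The usage lines multiply the mantissas of two single-precision operands; the 48-bit result carries 46 fraction bits, so dividing by 2**46 recovers the mantissa product as a real number. Sign and exponent handling, normalization, and rounding are outside the scope of this sketch.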
G. Jha et al. [16] designed four kinds of multipliers to be used in a MAC module, namely modified-Booth, Wallace tree-reduction, add-shift, and combinational array multipliers, and analyzed the performance of the MAC when using each of them. However, none of these multipliers delivered good performance in terms of power, delay, and area occupation on the designed MAC. For example, the Wallace tree-reduction