Page 188 - 2024-Vol20-Issue2
184 | Hussein & AL-Assfor
based MAC achieved better speed than the others, while the modified Booth-encoding based MAC had the lowest area among the multipliers they used.

A. S. K. Vamsi et al. [6] proposed an (8*8)-bit VM using the UT-Sutra approach for an (8*8)-bit MAC module (simply, a 16-bit MAC). Their design utilized two cascaded CSAs to reduce the partial products generated by the four (4*4)-bit VMs into two vectors, namely the sum and carry vectors. However, their design is incomplete and inaccurate, as it lacks the most important part: the adder that must generate the final multiplication result (the final product). The inputs of that adder should be the sum and carry vectors produced by the cascaded CSAs.

K. Thiruvenkadam and S. Saravanan [17] used a modified full-adder (FA) to design an array multiplier for BSP floating-point numbers. Their design relied on the Divide and Conquer (D-C) algorithm and was implemented using the pipelining concept. Although their design showed some improvements in terms of power and area, it incurred more delay.

R. Sravani et al. [18] presented a BSP floating-point VM based on the Karatsuba algorithm. The multiplier is called carry-save VM and consists of a top level of 23 half-adders (HAs) followed by multiple levels of FAs, each level consisting of 23 FAs. The delay of their carry-save VM design is relatively high due to the carry propagation through the multilevel FAs.

The authors in [19, 20] proposed multipliers using the Karatsuba algorithm for fixed-point/floating-point MAC modules. Their multipliers involved three cascaded CSAs to reduce the generated partial products into sum and carry vectors. To generate the final product, the authors used the Kogge-Stone adder. However, their multiplier incurred additional delay and area due to the three cascaded CSAs placed before the Kogge-Stone adder.

III. FLOATING-POINT MAC MODULE

A floating-point MAC module is desirable for computations that demand higher accuracy and performance. It performs the operation F as shown:

F = Σ X * Y    (2)

where X and Y are the input operands. The module incorporates a floating-point adder, a floating-point multiplier, and an accumulator register. The performance of a DSP system depends substantially on the performance of its MAC module, and specifically on the speed of the multiplication within the MAC module [21, 22].

The backbone of all digital signal computations lies in the field of floating-point arithmetic [23–26]. The BSP format of the IEEE-754 standard is used to design the (32*32)-bit floating-point MAC (64-bit MAC) module [7], as depicted in Fig. 2. Fig. 3 illustrates the internal organization of the floating-point MAC module.

Fig. 2. Block scheme of floating-point MAC module

To perform a (32*32) floating-point multiplication, two 32-bit floating-point operands X and Y are used. Each consists of three fields: a mantissa (M = 1.F, where F denotes the 23 fraction bits and the integer bit 1 is hidden), an 8-bit exponent (E), and a sign bit (S) [27–32].

The multiplication of X and Y is accomplished using three operations performed in parallel as follows:
-Generate the sign of the product: The most significant bits (MSBs) of the two inputs X and Y are XORed together to produce the sign of the product.
-Compute the exponent (E) of the product: The exponents of X and Y (EX, EY) are added using any adder and the bias is subtracted from the result, namely

E = EX + EY - bias    (3)

(bias = 127 for single precision) [23].
-Multiply the mantissas (M): A (24*24)-bit multiplier such as a VM is utilized to perform the mantissa multiplication as shown:

M = MX * MY    (4)

where MX = 1.FX and MY = 1.FY
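
The sign, exponent, and mantissa operations of Eqs. (3) and (4) can be illustrated with a short software model. The following is a minimal sketch, not the hardware design itself: the function names (float_to_bits, bits_to_float, split_fields, multiply_single) are hypothetical, a plain integer product stands in for the (24*24)-bit VM, and the final normalization by truncation is an assumption added here so that the result can be packed back into 32 bits. Zero, subnormal, infinity, NaN, overflow, and rounding are not handled.

```python
import struct

BIAS = 127  # exponent bias for single precision


def float_to_bits(x: float) -> int:
    """Return the 32-bit single-precision pattern of x as an integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit integer pattern as a single-precision value."""
    return struct.unpack(">f", struct.pack(">I", b))[0]


def split_fields(bits: int):
    """Split a 32-bit pattern into sign (1 bit), exponent (8), fraction (23)."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def multiply_single(x_bits: int, y_bits: int) -> int:
    """Multiply two normalized single-precision patterns (sketch only)."""
    sx, ex, fx = split_fields(x_bits)
    sy, ey, fy = split_fields(y_bits)

    # Sign of the product: XOR of the two MSBs.
    sign = sx ^ sy

    # Eq. (3): E = EX + EY - bias.
    exponent = ex + ey - BIAS

    # Eq. (4): M = MX * MY with the hidden integer bit restored (M = 1.F).
    mx = (1 << 23) | fx
    my = (1 << 23) | fy
    product = mx * my  # 48-bit result; integer product stands in for the VM

    # Assumed normalization: if the product is >= 2.0, shift right once
    # and increment the exponent. Low-order bits are truncated, not rounded.
    if product & (1 << 47):
        product >>= 1
        exponent += 1
    fraction = (product >> 23) & 0x7FFFFF

    return (sign << 31) | (exponent << 23) | fraction


if __name__ == "__main__":
    result = multiply_single(float_to_bits(1.5), float_to_bits(2.5))
    print(bits_to_float(result))
```

For operands whose product is exactly representable, such as 1.5 and 2.5, the truncation step loses nothing and the model agrees with ordinary floating-point multiplication.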