# A FAST MULTIPLIER HARDWARE DESIGN FOR INTERVAL ARITHMETIC 

${ }^{1}$ Ahmet SERTBAȘ $\quad{ }^{2}$ Hani EL-ABDALLAH ${ }^{3}$ Fethullah KARABIBER<br>${ }^{1,2,3}$ Istanbul University, Engineering Faculty, Computer Engineering Department 34320, Avcilar, Istanbul, Turkey<br>${ }^{1}$ E-mail: asertbas@istanbul.edu.tr ${ }^{2}$ E-mail: hani956@yahoo.com<br>${ }^{3}$ E-mail: fetullah@istanbul.edu.tr


#### Abstract

In this paper, a new parallel hardware unit for interval multiplication is presented. Using the VHDL synthesis results, the area and delay estimates for the new design are given. Compared to previous hardware interval multipliers, our design is faster, but, requires more area.


Keywords: Interval arithmetic, multipliers, VHDL synthesis, performance analysis

## 1. INTRODUCTION

As computer applications have led to rapid increase in computing power, reliable computation requires results to be highly accurate. In most cases, computations that include real-valued numbers contain inaccuracy and results are almost unreliable due to catastrophic cancellations and rounding off. On the other hand, arithmetic errors in embedded systems can lead to disaster. For example, a plane may crash, a rocket may explode, or an engine may fail to operate.

The fundamental problem with most real-number computations is that their accuracy is not guaranteed. Increasing precision does not prevent this. Small errors can accumulate rapidly and limitations in the representation of numbers may lead to totally wrong results. For example, equation (1) shows an equation causing numerical error:
$f=333.75 b^{6}+a^{2}\left(11 a^{2} b^{2}-b^{6}-121 b^{4}-2.0\right)$
$+5.5 b^{8}+a /(2 b)$
For $\mathrm{a}=77617.0$ and $\mathrm{b}=33096.0$, this equation yields $\mathrm{f}=1.17260$ when solved using single precision, double precision, and extended precision arithmetic. Increasing the precision seems to validate the results. However, the correct answer is actually $\mathrm{f}=-0.827396 \times 10-17$.

As conventional real-valued computations contain inaccuracy that makes results unreliable due to catastrophic cancellation and rounding off results, for the reliable and accurate computations, interval arithmetic can produce good results. Interval arithmetic which deals with sets of intervals provides reliability and accuracy needed by computing the lower and upper bounds xl , xu in which the true result x-true relies. So, the interval $\mathrm{X}=[\mathrm{xl}, \mathrm{xu}]$ bounds the true result such that: $\quad \mathrm{xl}<=\mathrm{x}$-true $<=\mathrm{xu}$. When one or both end point could not be represented during computation, outward
rounding towards -infinity, +infinity for each xl, xu respectively, guaranties that the resulting interval includes the true result. For example, the interval $[1.56,2.34]$ is outward rounded to two decimal digits resulting the interval $[1.5,2.4]$.
In the literature, interval arithmetic (and interval analysis) was firstly defined by Ramon E. Moore[1]. Some software packages are developed to support interval arithmetic and can provide the method of bounding the true result like C-XSC [1], PROFL [2], INTLIB [3] and recently, a built-in support is added for interval arithmetic to FORTRAN [4], but yet, enough performance is not achieved yet. The performance is considered acceptable if it does not exceed a factor of five to conventional realworld arithmetic [5]. Unfortunately software implementations achieve a factor of 20 to 100 of a conventional floating point arithmetic.

Regarding to inefficiency in achieving performance in determining the suitable case and rounding process by software implementations for interval arithmetic, it became necessary to search for hardware structures that can automatically select the interval endpoints and serve the rounding process correctly.

In recent years, to improve the performance of interval arithmetic, some hardware designs are proposed [6-10]. In [2], the serial and parallel interval multipliers that lead to a considerable increase in performance were presented. Many hardware structures, that improve interval multiplication and handle efficiently rounding of endpoints are implemented. An optimization of delay is achieved but with a slight overhead in area either by improving existing multiplication structures by adding some registers, multiplexers and some sort of comparing units, with some change in the previous logic or implementing new ones with their own hardware structure and logic. Selecting the right case by examining the sign bits in hardware units is based on the same method used in software.

In this paper, a fast interval multiplier is designed. It is the improved design of the parallel interval multiplier given in[2]. Section 2 describes an interval multiplication analytically. In Section 3, the new interval multiplier is presented. In Section 4, we give area and delay estimates for the new design and compare it to the previous interval multipliers in [2]. Section 5 presents the conclusions.

## 2. INTERVAL MULTIPLICATION

Multiplication could be performed on computers that support either single or double precision. The result have to be outward rounded towards infinity, +infinity for both bounds. If double length is supported, the interval product can be computed as:
$\mathrm{Z}=\left[\boldsymbol{\nabla}\left(\min \left(\mathrm{xl}^{*} \mathrm{yl}, \mathrm{xl}^{*} \mathrm{yu}, \mathrm{xu}^{*} \mathrm{yl}, \mathrm{xu}{ }^{*} \mathrm{yu}\right)\right)\right.$, $\left.\mathbf{\Delta}\left(\max \left(\mathrm{xl} * \mathrm{yl}, \mathrm{xl}{ }^{*} \mathrm{yu}, \mathrm{xu} * \mathrm{yl}, \mathrm{xu}{ }^{*} \mathrm{yu}\right)\right)\right]$

This process needs four multiplications(for the endpoint products) and four comparisons (to get $\min$ and max values) to obtain the result.
If double length is not supported, the interval product $\mathrm{Z}=\mathrm{X}$ * Y can be computed as:
$\mathrm{Z}=\left[\min \left(\boldsymbol{\nabla}\left(\mathrm{xl}^{*} \mathrm{yl}\right), \operatorname{rd}\left(\mathrm{x} \mathrm{l}^{*} \mathrm{yu}\right), \boldsymbol{\nabla}\left(\mathrm{xu}^{*} \mathrm{yl}\right)\right.\right.$, $\left.\operatorname{rd}\left(\mathrm{xu}^{*} \mathrm{yu}\right)\right), \max \left(\boldsymbol{\Delta}\left(\mathrm{xl}{ }^{*} \mathrm{yl}\right), \boldsymbol{\Delta}\left(\mathrm{xl}{ }^{*} \mathrm{yu}\right)\right.$,
© ( $\left.\mathrm{xu}^{*} \mathrm{yl}\right), \boldsymbol{\Delta}(\mathrm{xu}$ * yu$\left.)\right)$ ).
where $\boldsymbol{\nabla}$ and $\boldsymbol{\Delta}$ represent rounding downward toward negative infinity and upward toward positive infinity, respectively.

This also needs eight multiplications for the rounded products and six comparisons to obtain the min and max values [11]. In order to reduce the number of multiplications, the sign bit of the endpoint xl, xu, yl, yu can be examined in advance to determine the product of the right result of $\mathrm{zl}, \mathrm{zu}$.

The sign bits indicate weather multiplied intervals $\mathrm{X}, \mathrm{Y}$ are greater than, less than or contain zero, so, 9 cases for multiplication can be classified. The first eight cases need only two rounded multiplications to determine $\mathrm{zl}, \mathrm{zu}$, where the last case when both intervals X, Y contain zero needs four rounded multiplications and two comparisons to determine $\mathrm{zl}, \mathrm{zu}$.

All these procedures suffer from conditional statements for achieving the right choice of endpoints to be multiplied. The algorithm above declares the conditional branches needed to perform multiplication. which greatly increases the time. Rounding the results downward towards -infinity and upward towards +infinity in computation of results for each case decrease performance of multiplication.

## 3. THE INTERVAL MULTIPLIER DESIGN

In this paper, the method in [2], which provides a considerable increase in performance, is employed for deciding to the interval endpoints and rounding the results. To produce the multiplication results $\mathrm{z}_{l}$ and $\mathrm{z}_{u}$ are determined the interval endpoints $\left\{\mathrm{x}_{l}, \mathrm{x}_{u}, \mathrm{y}_{l}, \mathrm{y}_{u}\right\}$ to be multiplied together by examining their sign bits (Sxl, Sxu, Syl, Syu), as shown in Table 1.

Shown in Table 1, the control bits Zc , tx , tx2, ty 1 , ty 2 select the endpoints to be multiplied at each multiplier. Fig. 1 shows the block diagram of the new design for the parallel interval multiplier that uses 4 IEEE standart multipliers, 2 $\mathrm{min} / \mathrm{max}$ units(fig.2), six registers, 6 multiplexers for choosing the inputs to be multiplied and 2 multiplexers for selecting the results from the multipliers or the min/max units.

```
let \(\mathrm{mn}=\min (\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yu}), \quad \nabla(\mathrm{xu} * \mathrm{yl})), \mathrm{mx}=\max (\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yl}), \quad \nabla(\mathrm{xu} * \mathrm{yu}))\),
if \((\mathrm{xl}>=0)\{\) if \((\mathrm{yl}>=0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yl}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xu} * \mathrm{yu}) ;\}\)
else if \((\mathrm{yu}<0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xu} * \mathrm{yl}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xl} * \mathrm{yu})\)
else \(\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xu} * \mathrm{yl}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xu} * \mathrm{yu}) ;\}\)
else if \((\mathrm{xu}<0)\{\) if \((\mathrm{yl}>=0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yu}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xu} * \mathrm{yl}) ;\}\)
else if \((\mathrm{yu}<0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xu} * \mathrm{yu}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xl} * \mathrm{yl}) ;\}\)
else \(\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yu}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xl} * \mathrm{yl}) ;\}\)
else \(\{\) if \((\mathrm{yl}>=0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xl} * \mathrm{yu}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xu} * \mathrm{yu}) ;\}\)
else if \((\mathrm{yu}<0)\{\mathrm{zl}=\boldsymbol{\nabla}(\mathrm{xu} * \mathrm{yl}) ; \mathrm{zu}=\boldsymbol{\Delta}(\mathrm{xl} * \mathrm{yl}) ;\}\)
else \(\{a=x l * y u ; b=x u * y l ; z l=\min (a, b) ; c=x l * y l ; d=x u * y u ; z u=\max (c, d) ;\)
```

Table 1: All Cases for Interval Multiplication

| Case | Interval X | Interval Y | Sxl | Sxu | Syl | Syu | $\mathrm{Z}=\mathrm{X} * \mathrm{Y}$ | zc | tx1 | tx2 | ty 1 | ty2 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 1 | $\mathrm{X}>[0,0]$ | $\mathrm{Y}>[0,0]$ | 0 | 0 | 0 | 0 | [ $\left.\mathrm{xl}{ }^{*} \mathrm{yl}, \mathrm{xu}^{*} \mathrm{yu}\right]$ | 0 | 1 | 1 | 1 | 1 |
| 2 | $\mathrm{X}>[0,0]$ | $\mathrm{Y}<[0,0]$ | 0 | 0 | 1 | 1 | [ xu * $\mathrm{yl}, \mathrm{xl}{ }^{*} \mathrm{yu}$ ] | 0 | 0 | 0 | 1 | 1 |
| 3 | $\mathrm{X}<[0,0]$ | $\mathrm{Y}>[0,0]$ | 1 | 1 | 0 | 0 | [ $\mathrm{xl}{ }^{*} \mathrm{yu}, \mathrm{xu}^{*} \mathrm{yl}$ ] | 0 | 1 | 1 | 0 | 0 |
| 4 | $\mathrm{X}<[0,0]$ | $\mathrm{Y}<.0,0]$ | 1 | 1 | 1 | 1 | [ $\mathrm{xu*}$ \%u, $\mathrm{xl}^{*} \mathrm{yl}$ ] | 0 | 0 | 0 | 0 | 0 |
| 5 | $0 € \mathrm{X}$ | $\mathrm{Y}>[0,0]$ | 1 | 0 | 0 | 0 | [ $\left.\mathrm{xl}{ }^{*} \mathrm{yu}, \mathrm{xu}^{*} \mathrm{yu}\right]$ | 0 | 1 | 1 | 0 | 1 |
| 6 | $0 € X$ | $\mathrm{Y}<[0,0]$ | 1 | 0 | 1 | 1 | [ $\mathrm{xu*}{ }^{*} \mathrm{yl}, \mathrm{xl} * \mathrm{yl}$ ] | 0 | 0 | 0 | 1 | 0 |
| 7 | $\mathrm{X}>[0,0]$ | $0 € \mathrm{Y}$ | 0 | 0 | 1 | 0 | [ $\mathrm{xu}{ }^{*} \mathrm{yl}, \mathrm{xu*} \mathrm{yu}$ ] | 0 | 0 | 1 | 1 | 1 |
| 8 | $\mathrm{X}<[0,0]$ | $0 € \mathrm{Y}$ | 1 | 1 | 1 | 0 | [ $\left.\mathrm{xl}{ }^{*} \mathrm{yu}, \mathrm{xl}^{*} \mathrm{yl}\right]$ | 0 | 1 | 0 | 0 | 0 |
| 9 | $0 € X$ | $0 € \mathrm{Y}$ | 1 | 0 | 1 | 0 | * $\mathrm{mn}, \mathrm{mx}$ ] | 1 | 1 | 1 | 1 | 1 |

$*_{\mathrm{mn}}=\min \left(\boldsymbol{\nabla}\left(\mathrm{xl}^{*} \mathrm{yu}\right), \boldsymbol{\nabla}\left(\mathrm{xu}^{*} \mathrm{yl}\right)\right), \mathrm{mx}=\max \left(\mathbf{\Delta}\left(\mathrm{xl}^{*} \mathrm{yl}\right), \boldsymbol{\Delta}\left(\mathrm{xu}^{*} \mathrm{yu}\right)\right)$


Fig.1. Proposed structure for interval parallel multiplier

Table 2. Execution steps for Case 9.

| Cycle | C | Action |
| :---: | :---: | :---: |
| 1 | 0 | $\mathrm{rl}=\mathbf{\nabla}(\mathrm{xl} * \mathrm{yu}), \mathrm{r} 2=\mathbf{\nabla}(\mathrm{xu} * \mathrm{yl}), \mathrm{r} 3=\mathbf{\Lambda}\left(\mathrm{xl}{ }^{*} \mathrm{yl}\right), \mathrm{r} 4=\mathbf{\Delta}(\mathrm{xu} * \mathrm{yu})$ |
| 2 | 1 | Set $\mathrm{zl}=\min (\mathrm{r} 1, \mathrm{r} 2), \quad$ Set $\mathrm{zu}=\max (\mathrm{r} 3, \mathrm{r} 4)$ |

Table 3. Performance comparisons of interval multipliers

| Performance Metrics | Serial (Schulte) | Parallel (Schulte) | Proposed Interval Multiplier |
| :--- | :---: | :---: | :---: |
| Cycle_count1 (Zc=0) | 2 | 1 | 1 |
| Cycle_count2 (Zc=1) | 5 | 3 | 2 |
| Total Logic Cells | 866 | 1373 | 2516 |
| Chip Area (Estm.)/mm ${ }^{2}$ | 60 | 96 | 173 |
| Clock Frequency/MHz. | 75.75 | 80.65 | 77.65 |
| Estm. Total Delay-ns | 30.8 | 15.15 | 14.31 |



Fig. 2. Min/Max unit
The Boolean equations for the control and rounding mode bits are given as follows:

$$
\begin{align*}
& t x 1=\overline{S y l}+S x l * \overline{S y u}+Z c, \\
& t \times 2=\overline{S y l}+\overline{S x l} * \overline{S y u}+Z c, \\
& t y 1=\overline{S x l}+\frac{S y l}{\overline{S x u}}+Z c,  \tag{4}\\
& t y 2=\overline{S x l}+\overline{S y l} * \overline{S x u}+Z c, \\
& Z c=\overline{S x l} * \overline{S x u} * S y l * \overline{S y u}, \\
& r m 1=r m 2=c, \quad r m 3=r m 4=\bar{c}
\end{align*}
$$

Where, the rm1, rm2 mode bits are for rounding towards - infinity and the $\mathrm{rm} 3, \mathrm{rm} 4$ are for rounding towards + infinity, also, c is for the clock cycle.

If X and Y do not both contain zero (the first 8 cases), a single cycle is required to compute the lower and upper interval endpoints of the product. On the other hand, if X and Y both contain zero (Case 9), only two cycles are sufficient to perform the interval computation, as shown in Table 2. On Case 9, at the first cycle, productions of the endpoints are done in parallel with the 4 multipliers. At the second cycle, the $1^{\text {st }} \mathrm{min} / \mathrm{max}$ unit determine the minimum value of the $1^{\text {st }}$ and $2^{\text {nd }}$ multiplier's output according to the $\min 1$ control signal, while the $2^{\text {nd }} \min / \max$ unit determines the maximum value of the $3^{\text {rd }}$ and $4^{\text {th }}$ multipliers output due to $\min 2$, the output registers are loaded with products required for the lower and upper endpoints comprising the result. Table 2 shows the steps for the parallel interval multiplier for Case 9 .

The lower bound zl is selected from the multipliers' outputs $\mathrm{r} 1, \mathrm{r} 2$, according to the control bit min 1 . When $\min 1=1$, if $\mathrm{r} 1<\mathrm{r} 2$, then $\mathrm{zl}=\mathrm{r} 1$ else $\mathrm{zl}=\mathrm{r} 2$. The upper bound zu is selected from the multipliers' outputs r 3 , r 4 , according to
the control bit min2. When $\min 2=1$, if $\mathrm{r} 3<\mathrm{r} 4$, then $\mathrm{z} u=\mathrm{r} 3$ else $\mathrm{zl}=\mathrm{r} 4$.

## 4. COMPARISON

The all architectures and behaviors of serial, parallel and proposed interval multipliers are simulated for functionality at the logic level using using Model-Sim and synthesized for obtaining total logic cells, estimated chip areas, clock cycle frequency and total operation delays using the Quartus VHDL (Version II). Here, the total operation delay is computed as follows:

Estimated Total Delay= [Cycle_count1*(8/9)+Cycle_count2*(1/9)] (5) /Clock_frequency

As shown in Fig.3, compared to the serial and parallel interval multipliers our design requires roughly 190 and 80 percent more area, respectively. Also, the new interval multiplier has cycle time approximately 3 percent shorter than the serial, but 3 percent longer than the parallel multiplier. Fortunately, with respect to the estimated total delay, our interval multiplier is 6 and 105 percent faster than the parallel and the serial multipliers, respectively. The total results obtained by VHDL simulations are given in Table 3.


Fig.3. Relative compares of the interval multipliers with respect to the number of total logic gates

## 5. CONCLUSIONS

This paper proposes a hardware design for the interval multiplier. Serial interval multiplier consumes 2 cycles for executing cases $1-8$ and 5 cycles for case 9 to handle interval multiplication, where parallel interval multiplier needs just
one cycle to execute the cases $1-8$ and 3 cycles for case 9 . The proposed parallel multiplier needs just 2 cycles for case 9 , with a cycle not to be lost and only 1 cycle for cases 1-8.

According to total logic elements, the serial interval multiplier is more than conventional floating point multiplier of 1.35 where the parallel is of 2.16 and the proposed interval multiplier is, the higher, of 3.95 . but improved parallel is nearly like conventional usage with only 1.083 , where parallel is 1.16 and the serial is 2.25 .

These estimations show that the new design gives a little more speedup (roughly 6 percent), but, 80 percent more area than the parallel interval multiplier referenced in [2]. Therefore, the proposed interval multiplier can be used for the need of fast interval computing. By adjusting the selection bits of the multiplexers $\left(\mathrm{t}_{\mathrm{x} 1}, \mathrm{t}_{\mathrm{x} 2}, \mathrm{t}_{\mathrm{y} 1}\right.$, $\mathrm{t}_{\mathrm{y} 2}$ and $\mathrm{z}_{\mathrm{c}}$ ), the new interval multiplier can also perform the interval structures chosen to compare in this work.

The next step, besides the search for efficient software implementations able to support interval arithmetic within permitable ranges of delay, is to work for monitoring other arithmetic functions like the exponent.

## REFERENCES

[1] Moore R.E., 1966, Interval Analysis, Englewood Cliffs: Prentice Hall.
[2] Schulte M.J., Bickersatff K.C., Schwartzlander E.Jr., 1996, "Hardware Units for Interval Multiplication" Proceedings of the 2nd Workshop of Computer Arithmetic, Interval, and Symbolic Computations, pp. 85-87.
[3] Williams G. S., "Processor Support For Interval Arithmetic" thesis for master degree, Lehigh University, May 1998.
[4] Goldberg D., 1991 "What Every Computer Scientist Should Know About Floating Point Arithmetic" ACM Computing Surveys, vol. 23, pp. 5-48.
[5] Arnold D.N., 1997 "Disasters Caused by Computer Arithmetic Errors" available from Internet URL http://www.ima.umn.edu/~arnold/ 455.f96/disasters.html February, 1997.
[6] Rumph S.M, 1988 "Algorithms for Verified Inclusions: Theory and Practice" Reliability in Computing, San Diego: Academic Press, 1988.
[7] Kearfott R.B. et al.,1996, "A Specific Proposal for Interval Arithmetic in FORTRAN" http://interval.louisiana.edu/F90/f96-pro.asc, March 96.
[8] Walster G.W., 1996, "Stimulating Hardware and Software Support for Interval Arithmetic" Applications of Interval Computations, pp. 405416, 1996.
[9] Alefeld G. and Herzberger R.,1983 , Introduction to Interval Computations, New York: Academic Press.
[10] Walster G.W., "Interval Arithmetic: The New Floating-Point Arithmetic Paradigm," http://www.mscs.mu.edu/~globsol/readings.html, March, 1998
[11] Coriliss G.F.,1993 "Comparing Software Packages for Interval Arithmetic," in Abstracts of the International Symposium on Scientific Computing, Computer Arithmetic, and Validated Numerics.

