

## 4:2 COMPRESSOR DESIGN BASED ON DOMINO LOGIC

Supervisor: Dr. M. Ahmadi
Peng Chang
Department of Electrical and Computer Engineering
University of Windsor
2008.08.01



## Outline

- 4:2 Compressors
- Domino logic
- Logical decompositions of 4:2 compressors
- Circuit-level optimization and Split Domino Logic
- Simulation results and conclusion



## 4:2 Compressors



4:2 Compressor

4:2 Compressor Array

- The 4:2 compressor takes five equally weighted inputs $(C_{IN}, X_1, X_2, X_3, X_4)$ and generates a sum bit (S), a carry bit (C) and a carry-propagate bit $(C_{OUT})$.
- The 4:2 compressor array is formed by cascading a series of 4:2 compressors; it performs column-wise compression of the partial products.



## Analysis of 3:2 and 4:2 Reduction Scheme (12×12 Dadda Tree)





## Analysis of 3:2 and 4:2 Reduction Scheme

Max column height per stage of a 3:2 scheme (carry save array)

| h    | 0 | 1 | 2 | 3 | 4 | 5  | 6  | 7  | 8  | 9  | 10 |
|------|---|---|---|---|---|----|----|----|----|----|----|
| n(h) | 2 | 3 | 4 | 6 | 9 | 13 | 19 | 28 | 42 | 63 | 94 |

Max column height per stage of a 4:2 scheme (4:2 compressor)

| h    | 0 | 1 | 2 | 3  | 4  | 5  | 6   | 7   |
|------|---|---|---|----|----|----|-----|-----|
| n(h) | 3 | 4 | 8 | 16 | 32 | 64 | 128 | 256 |

Here n(h) is the maximum column height and h is the number of stages.

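The 3:2 row of the table follows the recurrence n(h+1) = floor(3·n(h)/2) starting from n(0) = 2. The sketch below regenerates that row and looks up the number of stages for a given number of partial-product rows. The 4:2 row is copied from the slide as data rather than derived, and reading "stages needed" as the smallest h with n(h) ≥ rows is an assumption made for this illustration; the function names are hypothetical.

```python
def max_heights_3_2(stages):
    """Maximum column height per stage count for a 3:2 (carry-save) scheme."""
    heights = [2]                               # h = 0: two rows need no reduction
    for _ in range(stages):
        heights.append(heights[-1] * 3 // 2)    # one 3:2 stage maps 3 rows to 2
    return heights


# 4:2 row copied verbatim from the slide (not derived here).
TABLE_4_2 = [3, 4, 8, 16, 32, 64, 128, 256]


def stages_for(rows, heights):
    """Smallest h whose maximum column height covers `rows` (assumed reading)."""
    for h, n in enumerate(heights):
        if n >= rows:
            return h
    raise ValueError("table too short for this many rows")


if __name__ == "__main__":
    print(max_heights_3_2(10))
    print(stages_for(12, max_heights_3_2(10)), stages_for(12, TABLE_4_2))
```

Under this reading, a 12-row partial-product array needs fewer stages with the 4:2 scheme than with the 3:2 scheme.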




An example of domino XOR gate

- It consists of a pull-down network and clocked PMOS and NMOS transistors.
- Its operation is divided into two phases: precharge (CLK=0) and evaluation (CLK=1).
- Advantages: lower transistor count, faster switching speed, no short-circuit current.
- Disadvantages: charge leakage, charge sharing, etc.
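The two-phase operation can be illustrated with a small behavioral model. This is only a logic-level sketch: it ignores the keeper, leakage and charge sharing, and it assumes dual-rail inputs (each signal together with its complement), since a domino stage is non-inverting and an XOR pull-down network needs both polarities. The function names are hypothetical.

```python
def domino_gate(pull_down, inputs, clk, node):
    """Advance a domino gate by one clock phase.

    node is the current dynamic-node value (1 = charged).
    Returns (new_node, output).
    """
    if clk == 0:
        node = 1                  # precharge: clocked PMOS charges the dynamic node
    elif pull_down(*inputs):
        node = 0                  # evaluation: a conducting pull-down network discharges it
    return node, 1 - node         # static inverter drives the output


def xor_pdn(a, a_n, b, b_n):
    """Dual-rail XOR pull-down network: conducts when a·b̄ + ā·b is true."""
    return bool((a and b_n) or (a_n and b))


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            rails = (a, 1 - a, b, 1 - b)
            node, out = domino_gate(xor_pdn, rails, clk=0, node=0)   # precharge
            node, out = domino_gate(xor_pdn, rails, clk=1, node=node)  # evaluate
            print(a, b, out)
```

During precharge the output is always low; it can only rise during evaluation, which is what allows domino stages to be cascaded.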



## Logical Level Decomposition of 4:2 Compressors



Configuration of 4:2 compressor

$$X_0 + X_1 + X_2 + X_3 + C_{IN} = Sum + 2 \cdot (Carry + Cout)$$

$$Sum = S \oplus X_3 \oplus C_{IN} = X_0 \oplus X_1 \oplus X_2 \oplus X_3 \oplus C_{IN}, \quad S = X_0 \oplus X_1 \oplus X_2$$

$$C_{out} = (X_0 \oplus X_1) \cdot X_2 + X_0 \cdot X_1 = (X_0 \oplus X_1) \cdot X_2 + \overline{(X_0 \oplus X_1)} \cdot X_0$$

A 4:2 compressor can be realized with different combinations of XOR gates, AND gates and MUXes.
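The arithmetic identity above can be checked exhaustively over all 32 input combinations. The sketch below assumes the usual construction from two cascaded full adders, with the intermediate sum taken as X0 ⊕ X1 ⊕ X2; the expression for Carry is not given on the slide and is the majority function of the second full adder under that assumption.

```python
from itertools import product


def compressor_4_2(x0, x1, x2, x3, cin):
    """4:2 compressor modeled as two cascaded full adders (assumed structure)."""
    s = x0 ^ x1 ^ x2                        # intermediate sum of the first full adder
    cout = ((x0 ^ x1) & x2) | (x0 & x1)     # Cout does not depend on cin
    total = s ^ x3 ^ cin                    # Sum
    carry = ((s ^ x3) & cin) | (s & x3)     # Carry: majority of (s, x3, cin)
    return total, carry, cout


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        total, carry, cout = compressor_4_2(*bits)
        # Five equally weighted inputs equal Sum + 2 * (Carry + Cout).
        assert sum(bits) == total + 2 * (carry + cout)
    print("identity holds for all 32 input vectors")
```

Because Cout is independent of the carry input, a row of compressors does not ripple: each cell's Cout only feeds the neighboring cell's carry input.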



## Logical Level Decomposition of 4:2 Compressors



Primitive decomposition of 4:2 compressor (Com\_and)

- It is formed from 3-input XOR gates and 3-input AND gates.
- Its regularity lends itself to gains at the architecture level of the multiplier.
- The critical path of the compressor is 4 XOR gate delays.



## Logical Level Decomposition of 4:2 Compressors



Alternative decomposition of 4:2 compressor (Com\_mux)

- It is composed of six modules: four 2-input XOR gates and two 2:1 MUX gates.
- A 2:1 MUX gate is used instead of an AND gate to generate the two carry signals "Carry" and "Cout".
- The critical path of the compressor is 3 XOR gate delays.
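A logic-level sketch of an XOR/MUX decomposition with four 2-input XORs and two 2:1 MUXes is shown below. The exact wiring of Com_mux is on the missing figure, so the connections here follow the commonly used arrangement and should be read as an assumption; the function names are hypothetical.

```python
from itertools import product


def mux(sel, when_1, when_0):
    """2:1 multiplexer: returns when_1 if sel is 1, otherwise when_0."""
    return when_1 if sel else when_0


def com_mux(x0, x1, x2, x3, cin):
    p01 = x0 ^ x1                 # XOR 1
    p23 = x2 ^ x3                 # XOR 2 (in parallel with XOR 1)
    p = p01 ^ p23                 # XOR 3
    total = p ^ cin               # XOR 4: Sum
    cout = mux(p01, x2, x0)       # MUX 1: Cout
    carry = mux(p, cin, x3)       # MUX 2: Carry
    return total, carry, cout


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        total, carry, cout = com_mux(*bits)
        assert sum(bits) == total + 2 * (carry + cout)
    print("Com_mux sketch matches the 4:2 compressor identity")
```

The longest path runs through XOR 1 (or XOR 2), XOR 3 and XOR 4, which matches the stated critical path of three XOR delays.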



## Logical Level Decomposition of 4:2 Compressors



Alternative decomposition of 4:2 compressor (Com\_pur\_mux)

- It consists of six 2:1 MUX gates.
- All three outputs, "Sum", "Carry" and "Cout", are generated by 2:1 MUX gates.
- The critical path delay of the compressor is 3 XOR gate delays.
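One way to arrive at six MUXes is to replace each 2-input XOR with a 2:1 MUX that selects between an input and its complement. The sketch below takes that route; it assumes complemented inputs are available (for example from inverters or dual-rail signals), and the wiring is an illustration rather than the circuit on the missing figure.

```python
from itertools import product


def mux(sel, when_1, when_0):
    """2:1 multiplexer: returns when_1 if sel is 1, otherwise when_0."""
    return when_1 if sel else when_0


def xor_mux(a, b):
    """a XOR b built from one MUX: pass NOT b when a = 1, else b."""
    return mux(a, 1 - b, b)       # assumes NOT b is available


def com_pur_mux(x0, x1, x2, x3, cin):
    p01 = xor_mux(x0, x1)         # MUX 1
    p23 = xor_mux(x2, x3)         # MUX 2
    p = xor_mux(p01, p23)         # MUX 3
    total = xor_mux(p, cin)       # MUX 4: Sum
    cout = mux(p01, x2, x0)       # MUX 5: Cout
    carry = mux(p, cin, x3)       # MUX 6: Carry
    return total, carry, cout


if __name__ == "__main__":
    for bits in product((0, 1), repeat=5):
        total, carry, cout = com_pur_mux(*bits)
        assert sum(bits) == total + 2 * (carry + cout)
    print("six-MUX sketch matches the 4:2 compressor identity")
```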



## Optimization of 4:2 Compressors



Configuration of full adder

$$Sum = A \oplus B \oplus C = ABC + A\overline{B}\,\overline{C} + \overline{A}B\overline{C} + \overline{A}\,\overline{B}C$$

$$Carry = AB + BC + AC$$

$$\overline{Carry} = \overline{A}\,\overline{B} + \overline{B}\,\overline{C} + \overline{A}\,\overline{C}$$

By working with the complement of Carry, part of the circuit that generates the Sum signal can be reused to generate the Carry signal. This lowers the transistor count and improves the performance of the full adder.
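The complement identity can be checked by brute force. The sketch also checks one well-known way of sharing logic between the two outputs, Sum = ABC + (A + B + C)·NOT(Carry); whether the proposed full adder uses exactly this form is not stated on the slides, so treat it as an assumption. The function name is hypothetical.

```python
from itertools import product


def full_adder_shared(a, b, c):
    """Full adder in which Sum reuses the complemented Carry (assumed form)."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    carry_n = (na & nb) | (nb & nc) | (na & nc)      # complement of Carry
    total = (a & b & c) | ((a | b | c) & carry_n)    # Sum built on top of carry_n
    return total, 1 - carry_n


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        total, carry = full_adder_shared(a, b, c)
        assert carry == (a & b) | (b & c) | (a & c)
        assert total == a ^ b ^ c
    print("shared-logic full adder matches the reference equations")
```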



## Optimization of 4:2 Compressors





Conventional full adder using Domino Logic

**Proposed** full adder using Domino Logic



## Split Domino Logic



N-input Split Domino OR gate

- The pull-down network is divided equally into two sub-networks, and a 2-input NAND gate generates the output. The large keeper transistor is also replaced by two smaller transistors.
- The main advantage of Split Domino is the reduced dynamic-node capacitance, which leads to faster evaluation.
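A behavioral sketch of the split OR gate: the inputs are divided into two halves, each half discharges its own dynamic node, and a NAND combines the two nodes. Keepers and capacitance are not modeled, so this only shows that the split structure computes the same OR function; the function name is hypothetical.

```python
from itertools import product


def split_domino_or(inputs, clk, nodes=(1, 1)):
    """Advance an N-input split domino OR gate by one clock phase.

    nodes holds the two dynamic-node values (1 = charged).
    Returns (new_nodes, output).
    """
    half = len(inputs) // 2
    groups = (inputs[:half], inputs[half:])      # two equal pull-down sub-networks
    new_nodes = []
    for node, group in zip(nodes, groups):
        if clk == 0:
            node = 1                             # precharge both dynamic nodes
        elif any(group):
            node = 0                             # any high input discharges its node
        new_nodes.append(node)
    out = 1 - (new_nodes[0] & new_nodes[1])      # 2-input NAND replaces the inverter
    return tuple(new_nodes), out


if __name__ == "__main__":
    for bits in product((0, 1), repeat=4):
        nodes, out = split_domino_or(bits, clk=0)
        nodes, out = split_domino_or(bits, clk=1, nodes=nodes)
        assert out == int(any(bits))
    print("split domino OR matches a plain OR for all 4-input vectors")
```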



## Split Domino Logic – XOR Gate



2-input XOR Gate using Domino Logic (denoted as "2\_xor\_D")





2-input XOR Gate using Split Domino Logic (denoted as "2\_xor\_SD")



## Split Domino Logic – XOR Gate



3-input XOR Gate using Domino Logic (denoted as "3\_xor\_D")



3-input XOR Gate using Split Domino Logic (denoted as "3\_xor\_SD")



## Split Domino Logic – Full Adder



Proposed full adder using Domino Logic (denoted as "FA\_new")



Proposed full adder using Split Domino Logic (denoted as "FA\_SD")



- The 2-input XOR gate, 3-input XOR gate, full adder and 4:2 compressors are each designed in both Domino Logic and Split Domino Logic. Simulations are performed with HSPICE in the Cadence design environment. All circuits target TSMC 0.18 µm technology.
- In the test bench, each input is driven by a buffered signal and each output is loaded with buffers, which gives a realistic simulation environment reflecting operation in actual applications.
- The delay is measured from the time the input signal reaches 50% of its full swing to the time the output signal reaches 50% of its full swing. The average delay is the mean of the delays over all input vectors. The worst-case delay is the largest delay over all input vectors.
- Circuits are tested with all possible input vector combinations at a 1.8 V supply.
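The delay definition above can be sketched on sampled waveforms. The code below is a minimal illustration, assuming waveforms are available as lists of time and voltage samples and using linear interpolation between samples; it is not the measurement script used for the reported results, and the function names are hypothetical.

```python
def crossing_time(times, volts, level):
    """First time the sampled waveform crosses `level` (linear interpolation)."""
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the level")


def propagation_delay(times, v_in, v_out, vdd=1.8):
    """Delay from the 50% point of the input to the 50% point of the output."""
    half = vdd / 2
    return crossing_time(times, v_out, half) - crossing_time(times, v_in, half)


def summarize(delays):
    """Return (average delay, worst-case delay) over all input vectors."""
    return sum(delays) / len(delays), max(delays)


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    v_in = [0.0, 1.8, 1.8, 1.8, 1.8]     # made-up input edge
    v_out = [0.0, 0.0, 0.0, 1.8, 1.8]    # made-up output edge
    print(propagation_delay(t, v_in, v_out))
```

The waveform values in the usage example are made up purely to exercise the functions.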



#### Simulation Results for logical decompositions of 4:2 Compressors

| Cell Name   | Power Dissipation (W) | Average Delay (ns) | Worst-Case Delay (ns) | Average PDP (J) | Worst-Case PDP (J) | Operation Frequency (GHz) |
|-------------|-----------------------|--------------------|-----------------------|-----------------|--------------------|---------------------------|
| Com_and     | 2.48E-04              | 0.47               | 0.59                  | 1.17E-13        | 1.46E-13           | 1                         |
| Com_mux     | 3.12E-04              | 0.57               | 0.89                  | 1.78E-13        | 2.78E-13           | 0.41                      |
| Com_pur_mux | 2.81E-04              | 0.51               | 0.80                  | 1.43E-13        | 2.25E-13           | 0.63                      |
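The PDP columns are the product of power and delay. The sketch below recomputes them from the table, assuming the power column is in watts and the delays in nanoseconds, and compares against the tabulated values with a 1% tolerance to allow for rounding.

```python
# (power, average delay in ns, worst-case delay in ns, average PDP, worst-case PDP)
RESULTS = {
    "Com_and":     (2.48e-4, 0.47, 0.59, 1.17e-13, 1.46e-13),
    "Com_mux":     (3.12e-4, 0.57, 0.89, 1.78e-13, 2.78e-13),
    "Com_pur_mux": (2.81e-4, 0.51, 0.80, 1.43e-13, 2.25e-13),
}


def pdp(power_w, delay_ns):
    """Power-delay product in joules (power assumed in W, delay in ns)."""
    return power_w * delay_ns * 1e-9


def matches(computed, tabulated, rel_tol=0.01):
    """True if the recomputed value agrees with the table within rel_tol."""
    return abs(computed - tabulated) <= rel_tol * tabulated


if __name__ == "__main__":
    for name, (p, d_avg, d_worst, pdp_avg, pdp_worst) in RESULTS.items():
        print(name,
              matches(pdp(p, d_avg), pdp_avg),
              matches(pdp(p, d_worst), pdp_worst))
```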

#### Comparison of different logical decompositions of 4:2 Compressors

| Cell Name   | Power Dissipation | Average Delay | Worst-Case Delay | Average PDP | Worst-Case PDP | Operation Frequency |
|-------------|-------------------|---------------|------------------|-------------|----------------|---------------------|
| Com_and     | 100%              | 100%          | 100%             | 100%        | 100%           | 100%                |
| Com_mux     | 126%              | 121%          | 151%             | 154%        | 190%           | 41%                 |
| Com_pur_mux | 113%              | 109%          | 136%             | 122%        | 154%           | 63%                 |



#### Simulation Results for 2-input XOR Gates

| Cell Name        | Power Dissipation (W) | Average Delay (ns) | Worst-Case Delay (ns) | Average PDP (J) | Worst-Case PDP (J) | Operation Frequency (GHz) |
|------------------|-----------------------|--------------------|-----------------------|-----------------|--------------------|---------------------------|
| 2_xor_D          | 1.01E-04              | 0.17               | 0.24                  | 1.72E-14        | 2.42E-14           | 2.63                      |
| 2_xor_SD         | 2.26E-04              | 0.22               | 0.39                  | 4.97E-14        | 8.81E-14           | 2.17                      |
| SD relative to D | 224%                  | 129%               | 165%                  | 288%            | 364%               | 82.5%                     |

#### Simulation Results for 3-input XOR Gates

| Cell Name        | Power Dissipation (W) | Average Delay (ns) | Worst-Case Delay (ns) | Average PDP (J) | Worst-Case PDP (J) | Operation Frequency (GHz) |
|------------------|-----------------------|--------------------|-----------------------|-----------------|--------------------|---------------------------|
| 3_xor_D          | 1.06E-04              | 0.21               | 0.24                  | 2.23E-14        | 2.54E-14           | 2.17                      |
| 3_xor_SD         | 1.19E-04              | 0.15               | 0.28                  | 1.79E-14        | 3.33E-14           | 2.38                      |
| SD relative to D | 112%                  | 71.4%              | 116%                  | 80.3%           | 131%               | 109%                      |



#### Simulation Results for Full Adders

| Cell Name | Power Dissipation (W) | Average Delay (ns) | Worst-Case Delay (ns) | Average PDP (J) | Worst-Case PDP (J) | Operation Frequency (GHz) |
|-----------|-----------------------|--------------------|-----------------------|-----------------|--------------------|---------------------------|
| FA_con    | 1.78E-04              | 0.28               | 0.41                  | 4.98E-14        | 7.29E-14           | 1.92                      |
| FA_new    | 1.20E-04              | 0.29               | 0.51                  | 3.48E-14        | 6.12E-14           | 1.67                      |
| FA_SD     | 1.32E-04              | 0.22               | 0.39                  | 2.90E-14        | 5.15E-14           | 2.17                      |

#### Comparison of different Full Adders

| Cell Name | Power Dissipation | Average Delay | Worst-Case Delay | Average PDP | Worst-Case PDP | Operation Frequency |
|-----------|-------------------|---------------|------------------|-------------|----------------|---------------------|
| FA_con    | 100%              | 100%          | 100%             | 100%        | 100%           | 100%                |
| FA_new    | 67%               | 104%          | 124%             | 70%         | 84%            | 87%                 |
| FA_SD     | 74%               | 79%           | 95%              | 58%         | 71%            | 113%                |



#### Simulation Results for 4:2 Compressors

| Cell Name | Power Dissipation (W) | Average Delay (ns) | Worst-Case Delay (ns) | Average PDP (J) | Worst-Case PDP (J) | Operation Frequency (GHz) |
|-----------|-----------------------|--------------------|-----------------------|-----------------|--------------------|---------------------------|
| Com_con   | 2.48E-04              | 0.47               | 0.60                  | 1.17E-13        | 1.49E-13           | 1                         |
| Com_new   | 2.29E-04              | 0.42               | 0.53                  | 0.96E-13        | 1.21E-13           | 1.25                      |
| Com_SD    | 2.27E-04              | 0.32               | 0.48                  | 0.73E-13        | 1.09E-13           | 1.67                      |

#### Comparison of different 4:2 Compressors

| Cell Name | Power Dissipation | Average Delay | Worst-Case Delay | Average PDP | Worst-Case PDP | Operation Frequency |
|-----------|-------------------|---------------|------------------|-------------|----------------|---------------------|
| Com_con   | 100%              | 100%          | 100%             | 100%        | 100%           | 100%                |
| Com_new   | 92%               | 89%           | 88%              | 82%         | 81%            | 125%                |
| Com_SD    | 91%               | 68%           | 80%              | 62%         | 73%            | 167%                |
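The comparison table expresses each metric as a percentage of the conventional design. The sketch below recomputes those percentages from the absolute 4:2 compressor results; a tolerance of one percentage point is used because the tabulated percentages may have been derived from unrounded data.

```python
# Absolute results copied from the 4:2 compressor table:
# power, average delay, worst-case delay, average PDP, worst-case PDP, frequency
ABSOLUTE = {
    "Com_con": (2.48e-4, 0.47, 0.60, 1.17e-13, 1.49e-13, 1.00),
    "Com_new": (2.29e-4, 0.42, 0.53, 0.96e-13, 1.21e-13, 1.25),
    "Com_SD":  (2.27e-4, 0.32, 0.48, 0.73e-13, 1.09e-13, 1.67),
}

# Percentages copied from the comparison table.
TABULATED = {
    "Com_new": (92, 89, 88, 82, 81, 125),
    "Com_SD":  (91, 68, 80, 62, 73, 167),
}


def relative_percent(cell, baseline="Com_con"):
    """Each metric of `cell` as a percentage of the baseline design."""
    return [100.0 * value / base
            for value, base in zip(ABSOLUTE[cell], ABSOLUTE[baseline])]


if __name__ == "__main__":
    for cell, expected in TABULATED.items():
        computed = relative_percent(cell)
        print(cell, [round(x, 1) for x in computed], expected)
```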



## Conclusion

- Three different logical-level decompositions of the 4:2 compressor are implemented in Domino Logic and compared through simulation.
- A new full adder architecture is proposed and used to implement a 4:2 compressor in Domino Logic. Its performance is confirmed by the simulation results.
- The 2-input XOR gate, 3-input XOR gate, full adder and 4:2 compressors are each implemented in both Domino Logic and Split Domino Logic. Simulation results confirm that Split Domino Logic outperforms Domino Logic in terms of delay, power and operating speed.






# Thank You