## DESIGN AND PROTOTYPING OF TEMPERATURE RESILIENT CLOCK DISTRIBUTION NETWORKS

A Thesis Presented to The Academic Faculty

by

Nitish Umesh Natu

In Partial Fulfillment of the Requirements for the Degree Masters of Science in the School of Electrical and Computer Engineering

Georgia Institute of Technology, December 2013

Approved by:

Dr. Madhavan Swaminathan, Advisor, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Dr. David Keezer, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Dr. Abhijit Chatterjee, School of Electrical and Computer Engineering, *Georgia Institute of Technology*

Date Approved: December 2013

To my Parents

#### ACKNOWLEDGEMENTS

This dissertation would not have been possible without the support of the people who helped and inspired me during my thesis. I would like to express my deepest gratitude to my academic advisor, Prof. Madhavan Swaminathan, who gave me the invaluable opportunity to work in his esteemed research group. His excellent guidance, encouragement and patience are the primary reasons for the completion of my thesis. I would also like to thank my committee members, Dr. David Keezer and Dr. Abhijit Chatterjee, for their time and insightful comments, and Dr. Keezer in particular for allowing me to use his lab and resources.

I would like to take this opportunity to thank the current and past members of the Mixed Signal Design Group (EPSILON Lab). I sincerely thank Sung Joo Park for helping and guiding me throughout the assignment, as well as Dr. Jianyong Xie and Rishik Bazaz for their support when I started my thesis. I would also like to thank my fellow labmates Dr. Junki Min, Dr. Sang-Min Han, Kyu Hwan Han, Satyan Telikepalli, Biancun Xie, Stephen Dumas, David Zhang, Ming Yi, Sang Kyu Kim, Munmun Islam and Colin Pardue for their support. I would also like to thank David Stonecyhper, who helped me with the lab setup and measurements, and my roommates Ajay Janardanan, Sumit Joshi, Harshal Chaudhari, Varun Thakkar and Siddhartha Gupta, as well as all the other people who helped and supported me during my stay here at Georgia Tech.

I would like to express my deepest gratitude toward my family. I sincerely thank my parents, Umesh Natu and Neha Natu, and my brother, Nihit Natu, for their love and unconditional support throughout my life.


# TABLE OF CONTENTS

- ACKNOWLEDGEMENTS
- LIST OF TABLES
- LIST OF FIGURES
- SUMMARY
- CHAPTERS
  - 1 Introduction
    - 1.1 The Three Dimensional IC Technology
    - 1.2 Clock Distribution Network Design
    - 1.3 Need for Temperature Resilient CDNs
    - 1.4 Thesis Outline
  - 2 Thermal and Electrical Analysis
    - 2.1 The 3D Stack and Assumptions
    - 2.2 Solver used for Thermal Analysis
    - 2.3 Thermal Maps for the CDN
    - 2.4 Effect of Temperature on the CDN
    - 2.5 Methods to Compensate for Heat Related Problems
    - 2.6 Summary
  - 3 Test Vehicle
    - 3.1 The Concept
    - 3.2 Test Vehicle Architecture
    - 3.3 Correlation with Electrical and Thermal Simulations
    - 3.4 Simulating the Conditions observed in the 3D Stack
    - 3.5 Simulations
    - 3.6 Demonstration of the Problem
    - 3.7 Implementation Scheme for Compensation Methods
    - 3.8 Validation of Compensation Methods
    - 3.9 Summary
  - 4 Buffer Design for ASICs
    - 4.1 Buffer Circuitry Assumptions
    - 4.2 Implementation of Compensation Techniques
    - 4.3 Simulations
    - 4.4 Summary
  - 5 Comparison of Results
    - 5.1 Power and Area Overheads
    - 5.2 Correlation of Results
    - 5.3 Summary
  - 6 Conclusion and Future Work
    - 6.1 Conclusion
    - 6.2 Future Work
- APPENDIX A: Electrical and Thermal Simulation Tool: Power ET
- APPENDIX B: Input File Format
- REFERENCES

## LIST OF TABLES

- Table 1: Assumption of System Parameters
- Table 2: Geometrical Parameters for the CDN
- Table 3: Geometrical Parameters for the Electrical Parasitics
- Table 4: Comparison of Compensation Methods
- Table 5: Demonstration of Problem in different areas
- Table 6: Comparison of Compensation Techniques in Test Vehicle and Simulations

# LIST OF FIGURES

- Figure 1.1 Moore's Law
- Figure 1.2 Long Term Logic Requirements of Technology Scaling
- Figure 1.3 Different methods of SiP Design
- Figure 1.4 (a) Clock Skew and Jitter (b) Spatial Variation of Clock Skew
- Figure 1.5 Uncertainties in the CDN
- Figure 1.6 Skew across the Alpha Processor by DEC
- Figure 1.6 Temperature Distribution in a TSV-based 3D System (a) Dies (b) Interposer (c) PCB
- Figure 1.7 (a) Dependence of Delay on V<sub>th</sub> (b) Relationship between Temperature and Sub-threshold leakage in a MOSFET
- Figure 2.1 The 3D Stack with PCB, Interposer, Dies and Heatsink
- Figure 2.2 CDN Configurations for 3D Stacks (a) CDN on an Interposer (b) CDN with a TSV-based Tree Structure
- Figure 2.4 Operation of the Solver used in generation of Temperature Maps for the CDN Layer
- Figure 2.5 Summary of the Operation of the Solver with procedure to generate Temperature Maps
- Figure 2.6 Sample Temperature Profile of a CDN for a certain Power Map
- Figure 2.7 Temperature Profile of a CDN with large Gradient
- Figure 2.8 Division of the Die in order for allocation of different Power Densities such that the total power remains constant
- Figure 2.9 Random Power Distribution across all the Dies
- Figure 2.10 Fixed Power Distribution across the CDN with H-Tree architecture
- Figure 2.11 Comparison of Temperature Profiles generated using fully random power configuration and a constant CDN power configuration in terms of maximum temperature and gradients
- Figure 2.12 (a) Low Gradient (b) Medium Gradient (c) High Gradient
- Figure 2.13 Schematic of Simulation Model (a) CDN (b) TSV (c) PDN
- Figure 2.14 Simulated Skew (a) Ideal PDN without temperature effects (b) With PDN effects without temperature effects (c) Ideal PDN with temperature effects (d) With PDN and temperature effects
- Figure 2.15 Temperature Dependency of Delay
- Figure 2.16 (a) Temperature Gradient (b) Temperature Profile used for the Delay Calculations
- Figure 2.17 Block diagram and schematic of delay compensation (a) Variable reference voltages for linear regulators (b) Controllable delay for interconnect
- Figure 3.1 Block Diagram of the Test Vehicle
- Figure 3.2 The CDN Architecture – H-Tree built on the Center Die
- Figure 3.3 Photo of the board used as the Test Vehicle
- Figure 3.4 Port Configurations for the FPGA-based Test Vehicle
- Figure 3.5 Summary of Electrical Simulation (a) Skew with Ideal CDN (b) Skew with Thermal Variations (c) Delay vs Temperature Plot
- Figure 3.6 Thermal Profiles sorted by Gradients
- Figure 3.7 Micro PTC Heaters
- Figure 3.8 Placement of Heaters across the Spartan 6 FPGA
- Figure 3.9 Port modifications due to IO and floorplan constraints on the FPGA
- Figure 3.10 Skew observed across the ports due to the temperature variations
- Figure 3.11 (a) Correction in Skew by Adaptive Voltage Technique (b) Correction in Skew by the Controllable Delay Technique
- Figure 3.12 (a) Variations in delay due to temperature depicting linear dependency (b) Floorplan of the FPGA with placement of heaters
- Figure 3.13 Temperature Vs Delay plot for a Single Buffer
- Figure 3.14 Variation of delay with respect to temperature observed at various distribution points
- Figure 3.15 Block Diagram of the Adaptive Voltage technique
- Figure 3.16 Implementation Scheme for Adaptive Voltage Technique
- Figure 3.17 Block Diagram of the Controllable Delay Technique
- Figure 3.18 Implementation Scheme of the Controllable Delay Technique
- Figure 3.19 Algorithm to Implement and Regulate the Compensation Techniques
- Figure 3.20 Heater and IO setup across the FPGA Floorplan
- Figure 3.21 (a) Variation of delay with respect to Temperature (b) Flattening of delay variation due to the Adaptive Voltage technique
- Figure 3.22 Effectiveness of Adaptive Voltage technique observed irrespective of the IO bank used to source the clock signal or distribute it
- Figure 3.23 Adaptive Voltage technique compensating in real time
- Figure 3.24 More examples of compensation using the adaptive voltage technique in real time
- Figure 3.20 (a) Variation in Skew by Temperature (b) Compensation using Controllable Delay
- Figure 3.26 Real time compensation test for the Controllable Delay technique
- Figure 4.1 Schematic of the Buffer
- Figure 4.2 Simulations of the Buffer Schematic
- Figure 4.3 Schematic of the CDN in the H-Tree Architecture
- Figure 4.4 Layout of the CDN in form of H-Tree
- Figure 4.5 Schematic of first implementation scheme of the Adaptive Voltage technique
- Figure 4.6 Schematic of the voltage-divider based implementation scheme of the Adaptive Voltage technique with one resistor having a NTC response
- Figure 4.7 Effect of series connection of MOSFETS on its strength
- Figure 4.7 (a) Control Unit for the Controllable Delay method (b) Second inverter kept intact
- Figure 4.8 Basic setup and color codes for the Simulations
- Figure 4.9 Simulation of the CDN in the ideal condition
- Figure 4.10 Creating of skew among the outputs due to introduction of a temperature profile
- Figure 4.11 Compensation for the first path using Adaptive Voltage technique
- Figure 4.12 Complete compensation using the adaptive voltage technique for all the paths in the CDN
- Figure 4.13 Compensation for the first path using the Controllable Delay technique
- Figure 4.13 Complete compensation for all the paths using the Controllable Delay technique
- Figure 5.1 Distribution of Area across the Test Vehicle
- Figure 5.2 Distribution of Power across the Test Vehicle

#### SUMMARY

This thesis focuses on the undesired effects of thermal gradients on the clock distribution network (CDN) in a three-dimensional (3D) IC and on techniques to compensate for them. State-of-the-art integrated circuits boast more than a billion transistors on a single die. This advancement is achieved through technologies like System-on-Chip and System-in-Package, which feature heterogeneous integration, improved power consumption, a small form factor and reduced production cost. However, heat management remains a concern and leads to hotspots and thermal gradients that affect the performance of the CDN.

This thesis assumes a 3D structure with three dies and the CDN built in the center. The problem with heat management is established using temperature maps and electrical analysis is then used to show the effects of varying temperature on the CDN. Two methods of compensating for the skew degradation are then presented.

The compensation methods are then validated using a test vehicle and verified using simulations. The test vehicle first demonstrates the problems as the environment observed in the 3D stack is artificially simulated on it. It then displays the effectiveness of the compensation methods by correcting the problems related to skew. The methods are also verified using simulations. Buffers are designed and integrated with control units for the compensation techniques. The system is then simulated to verify the functionality of the proposed methods.

#### **CHAPTER 1**

### **INTRODUCTION**

The semiconductor industry has been one of the primary contributors to the boom of information technology throughout the 21st century. Continuous reduction in the size of transistors and interconnects has ensured steady growth. The miniaturization of transistors has led to an increase in functionality per unit area by a factor of two every 3 years [1]. State-of-the-art integrated circuits boast more than a billion transistors on a single die. Lower power consumption and faster operation through reduced capacitances have been added advantages in the process. However, recent studies have shown that technological research is finding it increasingly difficult to keep up with Moore's law, and many companies and organizations have announced that the law is near its end.



Figure 1.1 Moore's Law [3]

Figure 1.1 illustrates Moore's law, which states that the number of transistors on a given chip doubles approximately every two years. Technology scaling has been the primary contributor in the effort to keep up with Moore's law, but it comes with its share of drawbacks. Figure 1.2 shows the long-term requirements for high-performance logic. The highlighted fields are the ones deemed impossible to achieve.

| Year of Production                                                                 | 2010    | 2013    | 2016    |
|------------------------------------------------------------------------------------|---------|---------|---------|
| DRAM ½ PITCH (nm)                                                                  | 45      | 32      | 22      |
| MPU/ASIC ½ PITCH (nm)                                                              | 50      | 35      | 25      |
| MPU PRINTED GATE LENGTH (nm)                                                       | 25      | 18      | 13      |
| MPU Physical Gate Length (nm)                                                      | 18      | 13      | 9       |
| Physical gate length high-performance (HP) (nm) [1]                                | 18      | 13      | 9       |
| Equivalent physical oxide thickness for high-performance $T_{ox}$ (EOT) (nm) [2]   | 0.5-0.8 | 0.4-0.6 | 0.4-0.5 |
| Gate depletion and quantum effects electrical thickness adjustment factor (nm) [3] | 0.5     | 0.5     | 0.5     |
| $T_{ox}$ electrical equivalent (nm) [4]                                            | 1.2     | 1.0     | 0.9     |
| Nominal power supply voltage $(V_{dd})$ $(V)$ [5]                                  | 0.6     | 0.5     | 0.4     |

Figure 1.2 Long Term Logic Requirements of Technology Scaling [14]

The primary obstacle to technology scaling is process technology, which has been running into problems such as process variability, increased leakage currents and lithography limitations. Circuit and system engineers have been embedding worst-case margins to work around these issues, but this tends to increase power consumption and thus works against the purpose of miniaturization [1]. It is highly unlikely that scaling alone will be able to maintain the rate of improvement observed in the past few decades.

#### 1.1 The Three Dimensional IC Technology

Advances in chip packaging are regarded as a complement to technology scaling in the attempt to keep up with Moore's law. State-of-the-art packaging techniques like System-in-Package (SiP) and stacked ICs are an improvement over System-on-Chip technology, which integrates multiple chips providing similar functionalities across the package. SiP combines more than one active component, each providing different functionality, into a single package. Figure 1.3 shows various methods used in SiP.



Figure 1.3 Different methods of SiP Design [18]

Vertical stacking of ICs has many advantages, such as heterogeneous integration, improved power consumption, a small form factor and reduced production cost. Designers and manufacturers in turn face problems like mechanical stability, additional process steps and difficulty in testing. However, the largest concern has been heat management in these 3D structures, especially those developed using Through-Silicon Vias [2]. The architecture of these chips creates thermal gradients that vary over the course of operation of the system.

#### **1.2 Clock Distribution Network Design**

Clock distribution networks (CDNs) are responsible for the synchronization of signals flowing through even the most complex digital and mixed-signal systems. The design of the CDN has become increasingly difficult in today's synchronous systems. The reliability and performance of a system depend directly on the CDN, making it a critical design step. Its importance grows with the complexity of the chip and the tightness of the timing budget. Designers have been pushing the performance limits by creating timing constraints that are very difficult to meet. As such, even the slightest variation in the CDN is likely to degrade the system.

Non-idealities in the clock signal are unacceptable in the face of tight timing constraints and budgets, yet it is impossible to deliver a perfect clock signal to each and every part of the chip. Two non-idealities in the clock signal have the largest effect on the system's performance: clock skew and clock jitter. Clock skew is the more hazardous of the two because it affects both the performance and the reliability of the system. It is defined as the spatial variation in temporally equivalent clock edges, and it can be deterministic as well as random in nature. Figure 1.4(a) shows clock skew (t<sub>sk</sub>) and clock jitter (t<sub>js</sub>), while Figure 1.4(b) shows the spatial variation characteristics of the clock signal.



Figure 1.4 (a) Clock Skew and Jitter (b) Spatial Variation of Clock Skew [19]

Clock skew can be defined as positive or negative depending on the direction of the clock signal relative to the data. If data is travelling in the direction of the clock, the skew is termed 'positive'; if data travels in the opposite direction, the skew is 'negative'. Positive skew can increase system performance by allowing the system to function at a higher frequency, but it decreases the reliability of the system considerably. Negative skew in turn increases the reliability of the system but severely degrades its performance. Thus, clock skew needs to be handled carefully in order to guarantee that the system meets its specifications.
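The trade-off between performance and reliability can be made concrete with the standard setup and hold constraints for a register-to-register path. The following is a minimal textbook-style sketch rather than a model used in this thesis: the function names and all timing values are hypothetical, and skew is taken as positive when the clock reaches the capturing register later than the launching register.

```python
def min_clock_period(t_clk_q, t_logic_max, t_setup, t_skew):
    """Setup constraint: T >= t_clk_q + t_logic_max + t_setup - t_skew."""
    return t_clk_q + t_logic_max + t_setup - t_skew


def hold_margin(t_clk_q, t_logic_min, t_hold, t_skew):
    """Hold constraint: t_clk_q + t_logic_min >= t_hold + t_skew.

    A negative return value indicates a hold violation (race condition).
    """
    return t_clk_q + t_logic_min - t_hold - t_skew


# Hypothetical timing values in picoseconds, chosen only for illustration.
T_CLK_Q, T_LOGIC_MAX, T_LOGIC_MIN, T_SETUP, T_HOLD = 50, 800, 60, 40, 30

for skew in (-100, 0, 100):
    period = min_clock_period(T_CLK_Q, T_LOGIC_MAX, T_SETUP, skew)
    margin = hold_margin(T_CLK_Q, T_LOGIC_MIN, T_HOLD, skew)
    print(f"skew={skew:+d} ps  min period={period} ps  hold margin={margin:+d} ps")
```

With these assumed numbers, positive skew shortens the minimum clock period but drives the hold margin negative, while negative skew lengthens the period and widens the hold margin.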

Figure 1.5 shows various sources of uncertainty in the clock signal that lead to deterministic skew and jitter. A large amount of research has been dedicated to making CDNs free of these effects.



Figure 1.5 Uncertainties in the CDN [19]

Modifications in the CDN architecture have successfully eliminated the effects of factors like clock generation (by use of stable sources like crystal oscillators), devices (by appropriate sizing), interconnects (by careful routing), capacitive load (by logic and synthesis optimizations) and coupling to adjacent lines, also known as crosstalk (by shielding and routing techniques). However, temperature remains a concern for CDNs.



Figure 1.6 Skew across the Alpha Processor by DEC [20]

For instance, Figure 1.6 shows the skew generated in the Alpha Processor, a 64-bit RISC architecture based system by DEC, due to temperature variations [20]. The magnitude of the skew is large enough to affect the performance and reliability of the system.

#### **1.3 Need for Temperature Resilient CDNs**

The semiconductor industry has accepted 3D integration as a possible solution to address speed and power management problems. Through Silicon Vias (TSVs) are popular in such 3D structures due to their small lengths and high densities. These 3D stacking techniques have proved to dramatically increase the density of transistors in digital and mixed-signal systems.

However, heat management has proved to be a concern with TSV-based systems, as they are prone to temperature gradients of as much as 50°C [1]. For instance, consider the 3D system shown in Figure 1.7, comprising two stacked dies with different power maps. The dies are stacked on top of a silicon interposer, which is in turn mounted on a printed circuit board [2]. The hot spots in various parts of the system shown in Figure 1.7 (a, b, c) can affect active as well as passive circuits through changes in resistivity, mobility and threshold voltage.



Figure 1.6 Temperature Distribution in a TSV-based 3D System (a) Dies (b) Interposer (c) PCB

As described in the previous section, the CDN is crucial to the performance of the system, and such temperature hot spots can affect it significantly. Temperature variations can affect the sub-threshold leakage of the device and also alter the mobility, slowing down the buffers by exponentially increasing the propagation delay. The effect of temperature on the MOSFETs is shown in Figure 1.7 (b), while its indirect effect on threshold voltage and delay is shown in Figure 1.7 (a).



Figure 1.7 (a) Dependence of Delay on V<sub>th</sub> (b) Relationship between Temperature and Sub-threshold leakage in a MOSFET [19]
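One common textbook way to express how buffer delay depends on temperature is the alpha-power-law delay model. It is given here only as an illustrative sketch; the symbols and parameter ranges are generic assumptions and are not taken from the measurements or simulations in this thesis.

$$
t_{pd} \propto \frac{C_L V_{dd}}{I_{D,sat}}, \qquad I_{D,sat} \propto \mu(T)\, C_{ox} \frac{W}{L} \left(V_{dd} - V_{th}(T)\right)^{\alpha}
$$

$$
\mu(T) = \mu(T_0)\left(\frac{T}{T_0}\right)^{-m}, \qquad V_{th}(T) = V_{th}(T_0) - \kappa\,(T - T_0)
$$

Here $C_L$ is the load capacitance, $\alpha$ lies between 1 and 2, $m$ is typically quoted between about 1.5 and 2, and $\kappa$ is a few mV/K. A rise in temperature lowers the mobility, which increases the delay, and lowers $V_{th}$, which decreases it. At nominal supply voltages the mobility term usually dominates, so buffers slow down as they heat up.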

#### 1.4 Thesis Outline

This thesis is structured as follows. Chapter 2 describes the thermal and electrical analysis of a 3D stacked structure and the CDN that it contains. It also lists the assumptions and details of the parameters used in the stack. This chapter derives the temperature maps for the CDN that would be used throughout the thesis. Lastly, it gives the effects of the temperature maps on the CDN and describes compensation methods that can be used to counter them.

Chapter 3 describes the test vehicle that has been used in order to validate the results. It gives the architecture of the test vehicle and its correlation with the analysis presented in Chapter 2. It also contains implementation schemes for the compensation methods. The chapter gives a detailed description and explanation of the measurements performed using the test vehicle and presents the results showing effectiveness of the compensation methods.

Chapter 4 describes circuitry modifications that allow the proposed compensation methods to be implemented in Application Specific Integrated Circuits (ASICs), along with the corresponding simulations. Chapter 5 compares the effect of the compensation methods on overheads like area and power, and correlates the results obtained through analysis in Chapter 2, test vehicle measurements in Chapter 3 and simulations in Chapter 4.

Lastly, Chapter 6 summarizes and concludes the thesis. It also provides potential future work and possible improvements based on the current research.

#### CHAPTER 2

### THERMAL AND ELECTRICAL ANALYSIS

This chapter provides an overview of the assumptions made in the thesis regarding the three-dimensional IC stack and details of the various parameters related to it. The first section lists the dimensions of the various layers, the properties of the materials used for their construction and details of the environment surrounding the structure. The remainder of the chapter gives a detailed description of the thermal and electrical analyses that were performed on the 3D stack.

The primary objective of the thermal analysis was to generate temperature maps that can be used for understanding the effects of thermal gradients on the CDN and for creating test scenarios for the test vehicle, which is described in the next chapter. The electrical analysis used the results of the thermal analysis to create delay profiles for buffers. These profiles were then extrapolated to perform electrical analysis on the H-Tree CDN. The results of the electrical analysis demonstrate the fundamental problem of variation in propagation delay and skew across the CDN due to thermal gradients. The last part of the chapter describes solutions to this problem in the form of compensation methods. A comparison of various methods is provided to justify the selection of the two compensation techniques that were chosen for implementation.
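The flow from temperature map to delay profile to skew can be sketched in a few lines. This is a simplified illustration and not the solver or simulation setup used in the thesis: it assumes a linear delay-versus-temperature model for each buffer, and the nominal delay, slope, path names and temperatures are all hypothetical.

```python
def buffer_delay_ps(temp_c, d0_ps=100.0, t0_c=25.0, slope_ps_per_c=0.25):
    """Assumed linear model: delay rises by slope_ps_per_c for each degree C above t0_c."""
    return d0_ps + slope_ps_per_c * (temp_c - t0_c)


def path_delay_ps(buffer_temps_c):
    """Delay of one clock path, taken as the sum of its buffer delays."""
    return sum(buffer_delay_ps(t) for t in buffer_temps_c)


def skew_ps(paths):
    """Skew as the largest minus the smallest source-to-sink delay over all paths."""
    delays = [path_delay_ps(temps) for temps in paths.values()]
    return max(delays) - min(delays)


# Hypothetical temperatures (deg C) at the three buffers along each path.
uniform_map = {"sink_a": [25, 25, 25], "sink_b": [25, 25, 25]}
gradient_map = {"sink_a": [25, 30, 35], "sink_b": [25, 60, 75]}

print("skew with uniform temperature:", skew_ps(uniform_map), "ps")
print("skew with thermal gradient:  ", skew_ps(gradient_map), "ps")
```

A balanced tree shows no skew under a uniform temperature, while the same tree develops skew as soon as its paths pass through regions at different temperatures.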

#### 2.1 The 3D Stack and Assumptions

A large number of integration techniques are available for SiPs today. One such technique uses Through-Silicon Vias (TSVs) for integration. Figure 2.1 shows the TSV-based 3D stack that will be used in this thesis.



Figure 2.1 The 3D Stack with PCB, Interposer, Dies and Heatsink

The 3D system is comprised of three dies mounted on an interposer. Die 1 and Die 2 represent any digital or mixed signal synchronous logic. The center die contains the Clock Distribution Network (CDN) that supplies clock signals to Die 1 and Die 2.



Figure 2.2 CDN Configurations for 3D Stacks (a) CDN on an Interposer (b) CDN with a TSV-based Tree Structure



The main objective of the CDN is to provide a clock that has minimum amount of skew across the floorplan of the die. This stack of three dies is mounted on an interposer. The interposer is in turn connected to a Printed Circuit Board. Through-silicon Vias (TSV) are used for connection and integration of the interposers, logic dies and the CDN.

There are various approaches to the configuration of the CDN architecture in 3D systems. Some of these techniques have been presented in [4,5]. Figure 2.2(a) shows the design of the CDN wherein the clock signal was routed from the interposer. The source of the clock in this case lies in the interposer. The higher layers receive the clock through a distribution network formed using the TSVs [4]. Another approach to CDN architecture can be seen in Figure 2.2(b). This configuration uses multiple symmetrical TSVs to form a tree structure across the height of the stack. The clock originates in the interposer and is then fed to the TSVs. The structure resembles a very common clock distribution technique known as H-Tree in two dimensional ICs. In this thesis, the system contains the CDN on the center die.

|                  | Unit                 | Value           | Note      |
|------------------|----------------------|-----------------|-----------|
| Die size         | mm <sup>3</sup>      | 10 x 10 x 0.2   | HxLxW     |
| Interposer size  | mm <sup>3</sup>      | 30 x 30 x 0.2   | HxLxW     |
| PCB size         | mm <sup>3</sup>      | 100 x 100 x TBD | HxLxW     |
| Air convection   | W/(m <sup>2</sup> K) | 20              | Fans      |
| TIM conductivity | W/(m·K)              | 2               |           |
| T <sub>A</sub>   | °C                   | 25              | Heat Sink |
| Underfill        | W/(m·K)              | 4.3             |           |
| TSV (interposer) | um                   | 30/100/100      | d/h/p     |
| TSV (die)        | um                   | 5/100/-         | d/h/p     |
| Microbumps       | um                   | 30/100          | d/p       |

Table 1: Assumption of System Parameters

Sizes of the chip, the interposer and the PCB are 10 mm x 10 mm, 30 mm x 30 mm, and 100 mm x 100 mm, respectively. Ratios of TSV diameters and heights are 30 $\mu$m / 100 $\mu$m for interposer and 5 $\mu$m / 50 $\mu$m for the die. The thermal environment and related parameters such as convection, ambient temperature, and thermal conductivities are shown in Table 1.

Figure 2.3 shows the detailed architecture of the 3D stack. The top and the bottom die represent synchronous logic while the center die is the CDN [17].



Figure 2.3 Configurations of the Stacked Dies

(a) Bottom Die (b) CDN Die (c) Top Die

The CDN has H-Tree architecture as can be seen in Figure 2.3(b). The circles represent the TSVs. Figure 2.3(a) shows the map with TSVs for supply  $(V_{DD})$ , ground  $(V_{SS})$  and CDN connections. The solid lines represent interconnects. Figure 2.3(c) shows the mapping of repeaters and buffers using triangular symbols in the top die.

#### 2.2 Solver used for Thermal Analysis

A finite volume formulation presented in [6] has been used for the thermal simulations. The solver can accurately capture voltage and current distributions with temperature distribution across any layer in the 3D stack with Joule heating. It takes the material parameters shown in Table 1 as input and considers them in correspondence to the 3D structure. Figure 2.4 shows the flowchart depicting the operation of the solver.



Figure 2.4 Operation of the Solver used in generation of Temperature Maps for the CDN Layer [6]

The solver considers the electrical excitation and power maps of all the dies in the stack as well as the boundary conditions. It also factors in the layout of the system and the materials used for its construction. These are fed into the solver as a text input file. Once the solver has the essential information about the system being analyzed, it starts the operation by assigning arbitrary values to the temperature sensitive material parameters and running the voltage drop solver. The data generated is used for Joule heating calculations and checked for convergence. A positive result yields the temperature maps, while a negative result calls the thermal solver in order to update the temperature sensitive material parameters and continues the loop. Figure 2.5 summarizes the operation of the solver.



Figure 2.5 Summary of the Operation of the Solver with procedures to generate Temperature Maps
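The loop described above can be illustrated with a deliberately small sketch. This is not the finite volume solver of [6]; it is a toy model in which a single resistor stands in for the whole stack, and every numeric value (supply voltage, thermal resistance, tolerance) is an assumption chosen only to show the alternation between the electrical step, the Joule heating calculation and the thermal update.

```python
# Toy electro-thermal loop: one resistor stands in for the whole stack.
# All numeric values below are illustrative assumptions, not thesis data.

def resistance(temp_c, r0=1.0, alpha=0.0039, t_ref=25.0):
    """Temperature-dependent resistance using a linear model."""
    return r0 * (1.0 + alpha * (temp_c - t_ref))


def solve_electrothermal(v_supply=3.0, r_thermal=5.0, t_ambient=25.0,
                         tol=1e-6, max_iter=100):
    """Alternate electrical and thermal updates until temperature converges."""
    temp = t_ambient  # initial guess for the temperature-sensitive state
    for iteration in range(1, max_iter + 1):
        r = resistance(temp)                      # update material parameter
        power = v_supply ** 2 / r                 # electrical step: Joule heating
        new_temp = t_ambient + r_thermal * power  # thermal step
        if abs(new_temp - temp) < tol:            # convergence check
            return new_temp, iteration
        temp = new_temp
    raise RuntimeError("electro-thermal loop did not converge")


final_temp, iterations = solve_electrothermal()
print(f"converged to {final_temp:.2f} degC after {iterations} iterations")
```

The real solver performs the same kind of alternation on a full 3D mesh, with the material parameters of Table 1 and the power maps of each die as inputs.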

The 3D system defined earlier is modeled using a text file. The details of the tool that contains the solver can be found in Appendix A and the details of the input file can be found in Appendix B. This file is fed to the solver in order to generate temperature maps. The thermal profiles of all the modeled layers are available. The profiles for the center die containing the CDN were extracted from the solver for further analysis.

#### 2.3 Thermal Maps for the CDN

The solver described in the previous section was used to generate temperature maps of the CDN layer in the 3D stack. The temperature maps change in accordance with the power allocation in the system and the thermally sensitive parameters like air convection and TIM conductivity. Figure 2.6 shows an example of the CDN temperature profile.



Figure 2.6 Sample Temperature Profile of a CDN for a certain Power Map

Two factors were considered while selecting the thermal profiles for further analysis. Firstly, the maximum temperature across the die was noted. The temperature map shown above is a good example of this. The maximum temperature is over 120°C and is a good approximation of a corner condition in the CDN. Gradient was the other factor taken into consideration. The change in temperature across the die also has a significant effect on the CDN and represents the other corner condition for the CDN.

Figure 2.7 shows a temperature map with a large gradient across the die. The maximum temperature observed here is lower than the values in Figure 2.6, but the rate of change of temperature along the length and breadth of the die is substantial. The temperature rises from 95°C to over 115°C in just over 5 mm. This sudden change presents a challenge to the buffers in this region, which is why this temperature profile is important.



Figure 2.7 Temperature Profile of a CDN with large Gradient

The factors in consideration – maximum temperature and gradient – are a direct function of power maps across the dies in the system. A significant change can be observed when the power across the die is varied. Figure 2.8 shows an example of how the power maps were developed for the dies.

| P1  | P2  | P3  | P4  |
|-----|-----|-----|-----|
| P5  | P6  | P7  | P8  |
| P9  | P10 | P11 | P12 |
| P13 | P14 | P15 | P16 |

Die 1 and Die 2: P1 + P2 + … + P15 + P16 = 20 W

CDN: P1 + P2 + … + P15 + P16 = 10 W

Figure 2.8 Division of the Die in order for allocation of different Power Densities such that the total power remains constant

The die was divided equally into 16 partitions. The total power of the die is given by the summation of the product of the power density and area of each of the individual blocks. In the example above, Die 1 and Die 2 have a total power of 20W which means that the summation of products of power densities and areas of the blocks P1 to P16 must equal 20. Similarly the sum should be 10 in case of the CDN which has 10W of power.

Two different power configurations were used in order to derive the worst case conditions for the thermal profiles. The first configuration distributed power randomly across the dies. The total power across the die was fixed. Various power distributions were tried in order to gauge the worst case condition in terms of maximum temperature as well as gradient. An evenly distributed power yielded small thermal gradients and represented a highly impractical scenario. Power concentrated in certain parts of the die gave very high temperatures for the areas in question. However, current IC design techniques are competent enough to avoid such scenarios. Thus, distributing power randomly across the die was the most practical assumption given the ways in which power can be spread across the die. It gave a good approximation of the power distribution when today's ICs are put to use in a particular application. Figure 2.9 shows this configuration.

| Die 2 – 20W<br>Randomly Distributed |
|-------------------------------------|
| CDN – 10W<br>Randomly Distributed   |
| Die 1 – 20W<br>Randomly Distributed |

Figure 2.9 Random Power Distribution across all the Dies
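A minimal sketch of how such a random power map with a fixed total can be built is given below. The 10 mm die side comes from Table 1 and the 16 equal partitions from Figure 2.8; the use of uniformly drawn weights and the function names are assumptions made for illustration, since the exact random distribution used in the analysis is not specified here.

```python
import random


def random_power_map(total_power_w, die_side_mm=10.0, n=4, seed=0):
    """Return (power densities in W/mm^2, block area in mm^2) for an n x n grid
    whose block powers add up to total_power_w."""
    rng = random.Random(seed)
    block_area = (die_side_mm / n) ** 2
    # Assumption: weights are drawn uniformly; any positive weights would do.
    weights = [[rng.random() for _ in range(n)] for _ in range(n)]
    weight_sum = sum(sum(row) for row in weights)
    # Scale the weights so that sum(density * area) equals the total power.
    densities = [[w / weight_sum * total_power_w / block_area for w in row]
                 for row in weights]
    return densities, block_area


def total_power(densities, block_area):
    """Sum of (power density x area) over all blocks."""
    return sum(d * block_area for row in densities for d in row)


logic_map, area = random_power_map(20.0, seed=1)  # Die 1 or Die 2
cdn_map, _ = random_power_map(10.0, seed=2)       # CDN die
print(total_power(logic_map, area), total_power(cdn_map, area))
```

Rescaling after drawing the weights keeps the total power fixed while letting the individual block densities vary, which is the constraint stated for Figure 2.8.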

The other power configuration used a fixed power distribution for the CDN. The upper and the bottom dies still had random power distributions. Power in the center die was approximated by the H-tree architecture that was used to construct the CDN. Figure 2.10 shows this configuration.

| 5.5% | 5.5% | 5.5% | 5.5% |
|------|------|------|------|
| 5.5% | 8%   | 8%   | 5.5% |
| 5.5% | 8%   | 8%   | 5.5% |
| 5.5% | 5.5% | 5.5% | 5.5% |

Figure 2.10 Fixed Power Distribution across the CDN with H-Tree architecture

Power was slightly concentrated in the center of the die where the clock source resides. The edges, denoting the distribution points, were assumed to consume less power in accordance with the buffer density in these areas.

The thermal analysis was carried out on an array of power profiles based on the two combinations mentioned above. Maximum temperature and gradients were considered while analyzing the temperature maps. Figure 2.11 shows the graphs that compare temperature and gradients for both the power configurations. The circles on the graphs highlight the points and corresponding configurations which yield the worst case conditions in the temperature maps. Using this information, three distinct thermal profiles were generated to be used for further analysis. The selection of the final thermal profiles considered both the factors mentioned earlier – maximum temperature and gradient.



Figure 2.11 Comparison of Temperature Profiles generated using fully random power configuration and a constant CDN power configuration in terms of maximum temperature and gradients

Figure 2.12 shows the thermal profiles. Figure 2.12(a) has the least gradient and the lowest temperatures while 2.12(b) has slightly steeper contours than (a). Figure 2.12(c) has the highest temperature and the largest thermal gradient.



Figure 2.12 (a) Low Gradient (b) Medium Gradient (c) High Gradient

#### 2.4 Effect of Temperature on the CDN

The electrical analysis in [17] presents the effects of temperature on the CDN. A BSIM4 CMOS model of 45nm technology from [7] was chosen, with clock repeaters that depict a buffer sizing profile described in [8]. The evaluation of the CDN was done with lumped RC values for interconnects from [9]. The unit buffer sizes (W/L) for PMOS and NMOS are 630 nm and 195 nm respectively. Note that the TSVs used for connecting the ends of the CDN to adjacent dies are only a subset of TSVs used for the complete 3D integration. A lumped TSV model from [10] was selected to complete the electrical model of the system. Figure 2.13 shows the schematic of the simulation model. The buffer can be seen in (a) while the TSVs and PDN are shown in (b) and (c) respectively [17].



Figure 2.13 Schematic of Simulation Model (a) CDN (b) TSV (c) PDN [17]

A meshed on-chip power distribution network model in [11] has been added to estimate the Power Distribution Network (PDN) effect. Data buffers for PDN noise sources were added along with their model for on-chip decoupling capacitors [12]. The resistance of the lumped resistor used for modeling the CDN and PDN interconnects as well as the TSVs in the BSIM4 model is directly dependent on temperature [17].

Table 2 shows the geometric parameters for the CDN, the TSVs, and the PDN models and Table 3 lists the parasitics from the models with respect to the geometric parameters.

| Component | Width/Diameter            | Thickness/Height          | Pitch/Space               |
|-----------|---------------------------|---------------------------|---------------------------|
| CDN       | 1 um (w <sub>CDN</sub> )  | 1 um (t <sub>CDN</sub> )  | N/A                       |
| TSV       | 5 um (d <sub>TSV</sub> )  | 50 um (h <sub>TSV</sub> ) | N/A (p <sub>tsv</sub> )   |
| PDN       | 10 um (w <sub>PDN</sub> ) | 50 um (t <sub>PDN</sub> ) | 50 um (s <sub>PDN</sub> ) |

Table 2: Geometric Parameters for the CDN [17]

Table 3: Geometric Parameters for the Electrical Parasitics [17]

| Component | R         | L       | C       | Note                |
|-----------|-----------|---------|---------|---------------------|
| CDN       | 30 ohm    | N/A     | 200 fF  | per mm              |
| TSV       | 61.4 mohm | 29.4 pH | 4.0 fF  | per TSV             |
| PDN       | 430 mohm  | 22.3 pH | 1740 fF | per mm <sup>2</sup> |

Electrical transient simulations were done in [17] using Agilent's ADS 2009 with the aforementioned BSIM4 model. The clock signal has an amplitude of 1.1 V and a frequency of 500 MHz. A supply voltage ( $V_{DD}$ ) of 1.1 V is fed to the clock buffers by the voltage regulator through the PDN model.

Figure 2.14 shows the simulation results for the electrical model presented above. The base condition of an ideal PDN in Figure 2.14 (a) has a skew of 30.7 ps. The addition of PDN anomalies in Figure 2.14 (b) increases the skew by 19.2 ps (62.5%). The addition of temperature effects to the ideal PDN in Figure 2.14 (c) gives an increase of 143.6 ps (467.8%). A further rise of 20.3 ps (68.2%) can then be seen when the temperature gradient is superimposed on the PDN in Figure 2.14 (d).



Figure 2.14 Simulated Skew

(a) Ideal PDN without temperature effects (b) With PDN effects without temperature effects (c) Ideal PDN with temperature effects (d) With PDN and temperature effects [17]

The values for the delay are dependent on temperature. Figure 2.15 shows thermal dependency of each of the 4 parts of the RC Delay that include the inverter driving the wire capacitance, the inverter itself, the wire and the wire driving the load inverter.



Figure 2.15 Temperature Dependency of Delay [17]

The resistance of copper is known to have a linear dependency on temperature with a coefficient of 0.0039 per °C, while the capacitance constructed on a silicon substrate or $\mathrm{SiO_2}$ is considerably stable with a negligible temperature coefficient. It can be concluded from the figure that the RC delay has a linear relationship with temperature [17].
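The linear relationship for the wire component can be reproduced with a short calculation. The sketch below uses the CDN interconnect values of Table 3 (30 ohm and 200 fF per mm) and the copper coefficient quoted above; the 25°C reference temperature and the use of a plain lumped RC product as the delay measure are assumptions made for illustration.

```python
ALPHA_CU = 0.0039    # per degC, temperature coefficient quoted in the text
R_PER_MM = 30.0      # ohm per mm, CDN row of Table 3
C_PER_MM = 200e-15   # farad per mm, CDN row of Table 3
T_REF_C = 25.0       # assumed temperature at which R_PER_MM applies


def wire_rc_ps(temp_c, length_mm=1.0):
    """Lumped RC product of a CDN wire segment in picoseconds."""
    r = R_PER_MM * length_mm * (1.0 + ALPHA_CU * (temp_c - T_REF_C))
    c = C_PER_MM * length_mm  # capacitance treated as temperature independent
    return r * c * 1e12


for temp in (25, 85, 125):
    print(f"{temp:>3} degC: {wire_rc_ps(temp):.3f} ps per mm")
```

Because only the resistance term depends on temperature, and does so linearly, the RC product computed this way is a straight line in temperature.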

A delay profile simulated for all the four ends of the CDN, with the temperature profile superimposed on the electrical model is shown in Figure 2.16.



Figure 2.16 (a) Temperature Gradient

(b) Temperature Profile used for the Delay Calculations

#### 2.5 Methods to Compensate for Heat Related Problems

Delay variations caused by thermal variations can be compensated by adjusting buffer parameters. A combination of two methods to ensure compensation against delay variations is presented here. It includes using adaptive voltage scaling and controlling the interconnect delay. The first approach makes use of the fact that the temperature gradient affects threshold voltage and mobility, which can also be controlled through bias voltages and V<sub>DD</sub> [15]. This would require temperature sensors and level converters. The other approach compensates by delaying faster signals using adjustable loads [16]. However, the values of additional tunable loading capacitors tend to cause problems as they are delay dependent.

The methods can, however, be modified in order to increase stability and usability. The modifications include adding an error amplifier and a feedback network to the method in [15] and adding control switches to the method in [16]. The modified methods and sample circuitry are shown in Figure 2.17.



Figure 2.17 Block diagram and schematic of delay compensation

(a) Variable reference voltages for linear regulators (b) Controllable delay for interconnect [15]

A combination of these two methods is used in order to compensate for the variations in CDN due to temperature. Both the methods have their share of advantages and disadvantages, a comparison of which can be seen in Table 4.

| Items                                                          | Adaptive<br>Voltage                  | Controllable<br>Delay           |
|----------------------------------------------------------------|--------------------------------------|---------------------------------|
| <b>Compensation Performance</b><br>(Range/Resolution/Accuracy) | Precise<br>(Small range)             | Coarse<br>(Wide range)          |
| <b>Power Consumption</b><br>(Static/Dynamic)                   | Static<br>(varying with R)           | Dynamic<br>(negligible)         |
| <b>Die Size Overhead</b><br>(Additional chip size)             | Small overhead<br>& regulators       | Large overhead (Interconnect)   |
| <b>Controllability</b><br>(Latency/Compensation time)          | Not required TS<br>(but, inbuilt TS) | Required TS<br>Need calibration |
| <b>Signal Integrity</b><br>(Jitter, Duty, Cross-point)         | Impact on Duty/Cross-point           | No impact                       |
| <b>Stability/Reliability</b><br>(eg. Thermal runaway)          | Stable                               | Stable                          |

 Table 4: Comparison of Compensation Methods

## 2.6 Summary

The temperature profiles generated in the thermal analysis are used in the next two chapters for formulating test cases and scenarios. They were also used to help perform the electrical analysis. The problem of variation in propagation delay and skew across the CDN was demonstrated using the results of the electrical analysis. The linear dependency of propagation delay on temperature was also established. This linear relationship was used to construct the algorithm for the control unit in the test vehicle.

Two compensation techniques called Adaptive Voltage and Controllable Delay were selected as solutions to the problem. A detailed description and comparison of these techniques was provided. It can be concluded from Table 4 that the Controllable Delay technique acts as a coarse control while the Adaptive Voltage method acts as a fine control in the process to adjust and reduce the skew across the CDN.

#### CHAPTER 3

# **TEST VEHICLE**

This chapter describes the details of the test vehicle. The first section gives the idea behind the test vehicle by explaining the concept and listing the structure's objectives. This is followed by the hardware architecture of the test vehicle and its specifications. The next section describes the correlation of the test vehicle with the electrical and thermal analysis performed in the previous chapter. Key electrical characteristics were replicated by using specific hardware components, and Verilog coding was used to make sure that the models used in the analysis were similar to the ones being synthesized on the test vehicle. This section also provides the procedure used to create the thermal environment observed during the analysis.

The test vehicle is first verified using simulations. Waveform analysis tools were used to check the functionality of the algorithms and the control unit that houses them. The remainder of the chapter is dedicated to elaborating the three objectives of the test vehicle. It starts with demonstrating the problem by measuring propagation delays and skews across the CDN in the presence of external heat. This constitutes the first set of results. The implementation schemes for the compensation methods are then provided. Extensive test scenarios were used to completely validate both the compensation techniques. The last part of the chapter compares the measurements with the initial results to gauge the improvement in performance.

## 3.1 The Concept

An FPGA-based Test Vehicle was designed in order to validate the results of the thermal analysis and prove the effectiveness of the compensation algorithms. Figure 3.1 shows the basic building blocks of the test vehicle.



Figure 3.1 Block Diagram of the Test Vehicle

There are three primary objectives that the test vehicle needs to satisfy. Firstly, it should replicate the conditions observed in the 3D stack. This has been done by inducing temperature gradients externally as can be seen above. The temperature sensors provide the feedback necessary to ensure that the correct temperature gradients are in place. Secondly, the test vehicle needs to demonstrate the problems observed during the electrical analysis. Lastly, it needs to validate the compensation techniques. This has been achieved by means of measurements and manual observations for different scenarios presented by the first two objectives.

#### **3.2 Test Vehicle Architecture**

The test vehicle was built using the Spartan 6 series of FPGA. Figure 3.2 shows the central die which houses the CDN in the 3D stack defined in Chapter 2. The CDN has been constructed using H-Tree architecture and was synthesized on the FPGA.



Figure 3.2 The CDN Architecture – H-Tree built on the Center Die

The H-tree architecture uses buffers and repeaters to deliver the clock signal to the output ports. The sizing of the buffers was not altered and was kept at the default values available for the FPGA. The buffers and repeaters were constructed using the switch-level modeling technique of Verilog and then synthesized onto the FPGA. Xilinx ISE Design Suite was used to code and simulate the initial design. The PlanAhead tool, also by Xilinx, was then used to add floorplanning and pin constraints before implementing the design on the FPGA. The CDN was constructed to replicate the architecture used in the electrical analysis presented in the previous chapter.

The Spartan 6 Evaluation Board by Xilinx was used for initial measurements. Once the feasibility of the test vehicle had been established, the remainder of the tests was then completed using a board that included all the essential circuitry from the Evaluation Board but did not have unnecessary interfaces. Figure 3.3 shows the board used for interfacing the test vehicle with testers and oscilloscopes.



Figure 3.3 Photo of the board used as the Test Vehicle

The specification and features of the FPGA board, relevant to the test vehicle, are as follows:

- 1. The XC6SLX45T FGG484-3C FPGA of the Spartan 6 family has been used as the central device that contains the test vehicle.
- 2. SMA Connectors are provided in order to supply external clock to the device.
- 3. The device is compatible with supply voltages ranging from 2.5V to 1V. The lowest possible supply voltage, 1V, was used.
- 4. On-board JTAG for programming was available for burning the design to the FPGA.
- 5. An 80-pin connector provided interface with the IO ports.

Figure 3.4 shows the port configuration that would be used for measurements throughout this chapter. The CDN has one input source and distributes the clock signal through 16 distinct output ports. Ports 1, 4, 7 and 10 display the corner cases in terms of interconnect lengths and were thus selected for further observation under an array of scenarios that the test vehicle was subjected to.



Figure 3.4 Port Configurations for the FPGA-based Test Vehicle

## 3.3 Correlation with Electrical and Thermal Simulations

The first objective of the test vehicle was to simulate the conditions witnessed in the 3D stack and thus, the test cases were generated in conjunction with the electrical and thermal simulations. Figure 3.5 shows the effect of thermal variations on the CDN.



Figure 3.5 Summary of Electrical Simulation

(a) Skew with Ideal CDN (b) Skew with Thermal Variations (c) Delay vs Temperature Plot

When the feasibility of the test vehicle was established, the skew profiles from the initial measurements were compared to those seen above in Figure 3.5 (a) and Figure 3.5 (b). There is a linear relationship between temperature and delay of the buffer, seen in Figure 3.5 (c). This was also considered while developing the test vehicle. The fact that the technology node in the simulations as well as Spartan 6 FPGA was 45nm helped the cause. Thus, the test vehicle not only displayed similar values of skew as observed earlier in the electrical analysis, but it also had the same linear dependency of propagation delay on the temperature.

The other factor of correlation was the thermal profiles. Figure 3.6 shows the final temperature maps of the CDN that were selected for further analysis. The temperatures in the figure vary between 90°C and 120°C. Thus, the temperature range selected for generating thermal test cases was restricted to 85°C to 125°C. Additionally, gradients of the thermal profiles were also considered. The three profiles show varying gradients, and Profile 3, with the highest gradient, was selected for the majority of the test scenarios since it represents the corner case for the parameter in question.



Figure 3.6 Thermal Profiles sorted by Gradients

Thus, it was ensured that the test vehicle completed its primary objective by setting constraints that replicate the conditions observed during the electrical and thermal analysis.

## **3.4 Simulating the Conditions observed in the 3D Stack**

The previous section highlighted the environment that needs to be created in order to simulate the conditions in the 3D stack which was defined in the previous chapter. This section explains the procedure to create the conditions to be enforced on the test vehicle.

The electrical part of the environment takes care of itself due to the fact that the simulations and the test vehicle share the same technology node, 45 nm. However, the temperature profiles need to be created manually since the FPGA regulates the internal thermal parameters when operating at room temperature. The temperature variations were generated using micro PTC heaters shown in Figure 3.7.

The heaters make use of the positive temperature coefficient (PTC) of resistors in order to emit heat. They have a wide temperature range from  $40^{\circ}$ C to  $135^{\circ}$ C and thus cover the  $85^{\circ}$ C- $125^{\circ}$ C region required for the test vehicle. The SMD packaging is extremely compact with dimensions of 12mm (L) x 6mm (W) x 1.5mm (T). The temperature at the surface of the heaters can be controlled by varying the supply voltage.



Figure 3.7 Micro PTC Heaters

Figure 3.8 shows the placement of heaters across the Spartan 6 chip in order to create the necessary gradients.



Figure 3.8 Placement of Heaters across the Spartan 6 FPGA

The dimensions of the FPGA and the heaters allow only four heaters to be placed across the chip. The thermal profile is thus modified such that the temperatures are divided into four quadrants instead of 16 partitions.
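One possible way to reduce a 16-partition profile to four quadrant set points is sketched below. The rule used here, averaging each 2 x 2 group of partitions, is an assumption made for illustration, as is the sample profile; the reduction actually applied to the thermal maps is not stated in the text.

```python
def to_quadrants(grid4x4):
    """Collapse a 4 x 4 temperature grid into a 2 x 2 grid of quadrant means."""
    quadrants = []
    for qr in range(2):
        row = []
        for qc in range(2):
            cells = [grid4x4[2 * qr + i][2 * qc + j]
                     for i in range(2) for j in range(2)]
            row.append(sum(cells) / len(cells))
        quadrants.append(row)
    return quadrants


# Hypothetical 16-partition profile in degC, not taken from the thesis data.
profile = [
    [95, 97, 101, 104],
    [96, 99, 104, 108],
    [98, 102, 108, 113],
    [100, 105, 112, 117],
]
heater_targets = to_quadrants(profile)
print(heater_targets)
```

Each of the four resulting values would then serve as the target surface temperature for one heater.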

There is another limitation in the FPGA related to the clock source. Since the IO banks are located on the edges of the chip, it is not possible to source the clock in the center of the chip. Figure 3.9 gives a more practical implementation of the clock sourcing and distribution network. Note that the port labeled 'in' corresponds to 'out\_10' in Figure 3.4. A similar mapping exists between out1 and out\_1, out2 and out\_4, and out3 and out\_7.



Figure 3.9 Port modifications due to IO and floorplan constraints on the FPGA

This configuration was then duplicated such that each of the output ports takes turns to source the CDN and the input port becomes the distribution point. In essence, symmetry across the FPGA was established to prove that the results remain the same regardless of the IO bank that is serving as the source or the sink.

#### **3.5 Simulations**

The CDN was coded in Verilog on the Xilinx ISE Design Suite. Similarly, the compensation methods were modeled in the form of RTL code and then integrated with the CDN. The functionality of the entire system was then verified using simulations.

The iSim waveform analyzer was used to verify the design. The delays were modeled using '#delay' statements and had direct temperature dependence. Figure 3.10 gives the skew seen across the CDN due to the given temperature profile.



Figure 3.10 Skew observed across the ports due to the temperature variations

Since the simulations were purely based on the CDN model, the source was considered to be in the center, labeled 'in'. The skew was measured with respect to this sourcing point across the four ports 1, 4, 7 and 10 from Figure 3.4.
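The behaviour of such a temperature-dependent delay model can be sketched outside the simulator as follows. This is a Python illustration rather than the Verilog used in the design; the per-buffer delay, its temperature slope, the buffer count per path and the port temperatures are all hypothetical values chosen only to show how a temperature profile turns into skew between the ports.

```python
def path_delay_ps(temp_c, n_buffers=12, d0_ps=50.0,
                  k_ps_per_c=0.1, t_ref_c=25.0):
    """Source-to-port delay for a chain of identical buffers whose delay
    grows linearly with temperature (all parameters are assumptions)."""
    per_buffer = d0_ps + k_ps_per_c * (temp_c - t_ref_c)
    return n_buffers * per_buffer


# Hypothetical temperatures seen by the paths to the four observed ports.
port_temps_c = {"out_1": 90.0, "out_4": 100.0, "out_7": 110.0, "out_10": 120.0}

delays = {port: path_delay_ps(t) for port, t in port_temps_c.items()}
fastest = min(delays.values())
# Skew of each port relative to the earliest-arriving clock edge.
skew = {port: d - fastest for port, d in delays.items()}
worst_skew_ps = max(skew.values())
print(skew, worst_skew_ps)
```

With a uniform temperature every path has the same delay and the skew vanishes, so in this model the skew is produced entirely by the gradient.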

The control unit containing the compensation methods was then activated and the effectiveness of the techniques was observed. Figure 3.11 gives the correction in the skew.

| Port   | Skew after compensation (left panel) | Skew after compensation (right panel) |
|--------|--------------------------------------|---------------------------------------|
| out_1  | 8 ps                                 | 39 ps                                 |
| out_4  | 14 ps                                | 17 ps                                 |
| out_7  | 7 ps                                 | 32 ps                                 |
| out_10 | 4 ps                                 | 28 ps                                 |
Figure 3.11 (a) Correction in Skew by Adaptive Voltage Technique (b) Correction in Skew by the Controllable Delay Technique

Again, the skew was measured with respect to the source port in the center. The correction due to adaptive voltage technique can be seen in Figure 3.11 (a) and due to controllable delay can be seen in Figure 3.11 (b). The simulations successfully verified the design and confirmed the functionality of the compensation methods.

## **3.6 Demonstration of the Problem**

The second objective of the test vehicle was to demonstrate the problems encountered during the electrical analysis. This was done by creating the thermal gradients across the FPGA chip using micro PTC heaters described in an earlier section. Figure 3.12 shows the variation of delay with respect to temperature. The floorplan of the FPGA is shown in the adjacent figure.



Figure 3.12 (a) Variations in delay due to temperature depicting linear dependency (b) Floorplan of the FPGA with placement of heaters

The solid box denotes the source while the dotted box above it denotes the distribution point where the delay was measured. The linear dependence of propagation delay on temperature is visible here. The delay observed is a function of 12 buffers that occur along the path between the input and the output. Furthermore, an approximation of the delay variation in a single buffer was made. Figure 3.13 shows the calculated response of the buffer to temperature variations.



Figure 3.13 Temperature Vs Delay plot for a Single Buffer
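One simple way to arrive at a single-buffer estimate is to fit a straight line to the measured path delay and divide its slope by the number of buffers in the path. The sketch below illustrates that idea only; the data points are hypothetical rather than the measurements behind Figure 3.12, and the equal split across buffers (which ignores any interconnect contribution) is an assumption, not necessarily the approximation used here.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


BUFFERS_IN_PATH = 12  # from the text: 12 buffers between input and output

# Hypothetical measurements (NOT thesis data): temperature in deg C, path delay in ps.
temps_c = [85, 95, 105, 115, 125]
path_delay_ps = [5000, 5120, 5240, 5360, 5480]

path_slope, path_offset = linear_fit(temps_c, path_delay_ps)

# Assumption: every buffer contributes equally to the temperature sensitivity.
per_buffer_slope = path_slope / BUFFERS_IN_PATH

print(f"path: {path_slope:.2f} ps/degC, single buffer: {per_buffer_slope:.2f} ps/degC")
```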

Symmetry across the FPGA was then established by rotating the input and the output ports. Figure 3.14 shows the measurement results of the experiment.



Figure 3.14 Variation of delay with respect to temperature observed at various distribution points.

The graph given in Figure 3.14 (a) builds on the results shown earlier in Figure 3.12. The remaining graphs in Figure 3.14 (b), Figure 3.14 (c) and Figure 3.14 (d) show that the linear relationship between delay and temperature remains the same even though the source and distribution points change across the floorplan of the FPGA. For instance, Figure 3.14 (c) has its source at the block '3' and distribution at the remaining three blocks.

#### **3.7 Implementation Scheme for Compensation Methods**

The third objective of the test vehicle was validation of the compensation methods. Two techniques were used to counter the problems faced due to thermal variations over the CDN. The first one is called Adaptive Voltage and the second is termed as Controllable Delay. This section explains the implementation scheme for both these techniques and explains the algorithm that makes use of them to compensate for the temperature based delay variations.

The adaptive voltage scheme was discussed in the earlier chapter. Figure 3.15 gives the operation summary of the same.



Figure 3.15 Block Diagram of the Adaptive Voltage technique

The method makes use of a temperature variable voltage in order to correct or rather, reduce the propagation delay of the affected buffers. The propagation delay is inversely proportional to supply voltage and thus increasing  $V_{DD}$  of the buffer reduces its inherent delay. A feedback mechanism may or may not be incorporated in order to check for stability and accuracy. Figure 3.16 shows the implementation scheme for the Adaptive Voltage technique in the test vehicle.



Figure 3.16 Implementation Scheme for Adaptive Voltage Technique

The scheme essentially changes the supply voltage of the operational buffers. The switch models used to construct the buffers allow an IO port to feed the supply voltages. The voltage on this IO port can then be changed in order to speed up the buffers. Alternatively, the VCCint parameter of the FPGA can be varied to gain control over the central supply voltage of the FPGA, but this would not allow $V_{DD}$ of individual buffers to be modified selectively as needed.
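As a first-order illustration, the sketch below takes the inverse proportionality between propagation delay and supply voltage stated earlier at face value and estimates the supply voltage needed to bring a heated buffer back to its nominal delay. The function name and all numeric values are hypothetical and are not measurements from the test vehicle.

```python
def required_vdd(v_nominal, delay_nominal_ps, delay_hot_ps):
    """Supply voltage that restores the nominal delay, assuming delay ~ 1 / VDD."""
    if delay_nominal_ps <= 0 or delay_hot_ps <= 0:
        raise ValueError("delays must be positive")
    # delay * VDD is constant under the assumed model, so scale VDD by the delay ratio.
    return v_nominal * delay_hot_ps / delay_nominal_ps


# Hypothetical example: a buffer slowed from 500 ps to 550 ps by heating.
v_new = required_vdd(v_nominal=1.2, delay_nominal_ps=500.0, delay_hot_ps=550.0)
print(f"required supply: {v_new:.3f} V")
```

A real device limits how far the supply can be raised, so the computed value may not always be reachable.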

The second compensation technique is called the Controllable Delay method. This was also discussed in the earlier chapter and a summary of its operation can be found in Figure 3.17 below.



Figure 3.17 Block Diagram of the Controllable Delay Technique

The method controls the delay of interconnects by adjusting the capacitive load across the wires between the buffers. A higher load inserts delay into the path, thus slowing it down. This technique is used when it is not possible to reduce the propagation delay of the buffer sufficiently. It therefore inserts delay in the paths that were not affected by temperature and thus directly compensates for the skew. Figure 3.18 shows the implementation scheme for this technique in the test vehicle.



Figure 3.18 Implementation Scheme of the Controllable Delay Technique

The D flip-flop acts as the basic delay element. There are several chains of D flip-flops along the paths in the CDN. The control unit connects an appropriate number of D flip-flops between any source and destination buffers in order to insert delay in the given path.

Although both these techniques are effective in their own right, they need a control unit in order to call and control them as needed. This control unit follows a built-in algorithm that can be seen in Figure 3.19.



Figure 3.19 Algorithm to Implement and Regulate the Compensation Techniques

The algorithm starts by sensing temperature across the CDN die. In the test vehicle, this is accomplished by feeding the control unit with a set of data points that correspond to the upcoming temperature changes. These data points are stored in the control unit before the algorithm starts. Once the algorithm is aware of the temperature map, it determines the maximum temperature gradient across any given path. The temperature gradient directly provides the delay across the path since the delay has a linear relationship with the temperature. The gradient is compared with a predefined threshold and the algorithm decides whether to compensate using the Adaptive Voltage technique or the Controllable Delay technique. Similarly, the algorithm compensates for each of the paths in the CDN, starting from the input and following it to the output. Once the entire network is compensated, it waits for an arbitrary amount of time before returning to sense the temperature again.
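The steps described above can be sketched as a single compensation pass over the network. This is a minimal illustration, not the control unit's actual code: the sensitivity and threshold values are placeholders, and the rule that gradients up to the threshold use Adaptive Voltage while larger ones use Controllable Delay is an assumption based on the statement that Controllable Delay is used when the buffer delay cannot be reduced sufficiently.

```python
from dataclasses import dataclass

# Hypothetical placeholders, not values from the thesis.
DELAY_PER_DEG_PS = 12.0       # assumed linear delay sensitivity of a path (ps per deg C)
GRADIENT_THRESHOLD_C = 20.0   # assumed threshold separating the two techniques


@dataclass
class Path:
    name: str
    zone_temps_c: list  # temperatures of the zones the path crosses, source first


def max_gradient(path):
    """Largest temperature difference seen along the path."""
    return max(path.zone_temps_c) - min(path.zone_temps_c)


def choose_compensation(path):
    """Return (technique, estimated extra delay in ps) for one path."""
    gradient = max_gradient(path)
    extra_delay_ps = DELAY_PER_DEG_PS * gradient  # linear delay-temperature relation
    if gradient == 0:
        return "none", 0.0
    # Assumption: small gradients are corrected through the supply voltage,
    # larger ones by inserting delay in the unaffected paths.
    if gradient <= GRADIENT_THRESHOLD_C:
        return "adaptive_voltage", extra_delay_ps
    return "controllable_delay", extra_delay_ps


def compensation_pass(paths):
    """One pass of the loop: visit every path from the source to the outputs."""
    return {p.name: choose_compensation(p) for p in paths}


paths = [
    Path("out_2", [25.0, 25.0]),
    Path("out_3", [25.0, 40.0]),
    Path("out_4", [25.0, 60.0]),
]
decisions = compensation_pass(paths)
for name, (technique, delay_ps) in decisions.items():
    print(name, technique, delay_ps)
```

In a running controller this pass would be repeated after a waiting period, with fresh temperature values supplied before each pass.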

#### **3.8 Validation of the Compensation Methods**

The implementation of the compensation methods makes it feasible to achieve the third and final objective of the test vehicle. This section gives details of how both the compensation methods perform in several scenarios. Figure 3.20 gives the nomenclature used to demonstrate the measurements.



Figure 3.20 Heater and IO setup across the FPGA Floorplan

The four blocks denote heaters, which in turn represent different temperature zones. The blocks also contain input and output ports which give the source and distribution points in the Clock Distribution Network.

There are some primary assumptions made during the course of the measurements. The delay and skew improvement in the graphs do not represent the actual readings made with the test vehicle. The actual measurements had added delays due to the signal propagation time through probes of the oscilloscope and channels of the tester. These delays were subtracted before plotting the graphs in this section. It is safe to assume that the delays will not change since only the temperature across the FPGA is changing while the environment around the rest of the equipment remains constant.

The changes in the temperature across the test vehicle were done manually by changing the supply voltages of the heaters. The test scenarios and the subsequent temperature maps were defined as a part of the test plan. This information related to the thermal gradients was documented using vectors and stored in the memory of the control unit. This was essential since the FPGA did not possess any temperature sensors and thus was not capable of sensing the temperature and determining the gradients in real time. In order to simulate the working of the temperature sensors, the control unit was programmed to access the stored temperature values in real time. Thus, whenever the temperature changed, the control unit would 'sense' the variation by accessing its memory and send it to the algorithm. Note that the temperature values would arrive in the control unit an instant after the actual change in temperature. This was done to make sure that the response was in real time and not premeditated by the control unit.
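The emulated sensing described above amounts to replaying a stored test plan one vector at a time, so that the algorithm never sees a temperature value before the corresponding change has occurred. The following sketch shows one way such a stand-in could look; the class name, zone names and temperature values are all hypothetical.

```python
class ReplayedTemperatureSensor:
    """Stands in for on-chip sensors by replaying a stored temperature plan."""

    def __init__(self, stored_vectors):
        self._vectors = list(stored_vectors)
        self._index = 0

    def sense(self):
        """Return the next stored vector, or None when the plan is exhausted."""
        if self._index >= len(self._vectors):
            return None
        vector = self._vectors[self._index]
        self._index += 1
        return vector


# Hypothetical test plan: one temperature (deg C) per heater zone and step.
plan = [
    {"zone_1": 25, "zone_2": 85, "zone_3": 25, "zone_4": 25},
    {"zone_1": 25, "zone_2": 105, "zone_3": 95, "zone_4": 25},
]

sensor = ReplayedTemperatureSensor(plan)
first = sensor.sense()    # called only after the first heater change
second = sensor.sense()   # called only after the second heater change
third = sensor.sense()    # plan exhausted
```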

The Adaptive Voltage method was validated first. This technique adjusts the supply voltage of the buffers in order to compensate for the propagation delay due to increased temperature. Figure 3.21 shows the effect of Adaptive Voltage and the improvement in propagation delay due to application of the method.



Figure 3.21 (a) Variation of delay with respect to Temperature (b) Flattening of delay variation due to the Adaptive Voltage technique

The configuration in Figure 3.20 can be used for reference here. The solid box denotes room temperature and the input clock signal for the CDN. The rest of the boxes, named serially (2, 3 and 4) for appropriate representation in the graphs above, denote heaters as well as distribution points. The temperatures are varied in the predefined range from 85°C to 125°C. The increase in propagation delay with rising temperature is visible in Figure 3.21(a), where no compensation technique is applied. In contrast, Figure 3.21(b) gives a much better response, as the slope of the delay increase is reduced significantly by the Adaptive Voltage compensation technique. The starting points of the delay lines differ because the paths exhibit some inherent interconnect delay. This holds since the distribution points are not equidistant from the source, which would have been the case if the clock were supplied from the center.

The next experiment was conducted to establish symmetry across the test vehicle. This was the follow up to the measurements in Figure 3.14. The results can be seen in Figure 3.22.



Figure 3.22 Effectiveness of Adaptive Voltage technique observed irrespective of the IO bank used to source the clock signal or distribute it

The graph in Figure 3.22 (a) is derived from Figure 3.21 (b). The remaining graphs are plotted by shifting the source and distribution points around the FPGA. The nomenclature is similar to the one discussed earlier. The graph in Figure 3.22 (b) denotes source at the block named '2' and outputs plotted in the rest of the three areas. This also holds true for graphs in Figure 3.22 (c) and Figure 3.22 (d). It can be concluded from Figure 3.22 that the adaptive voltage technique is effective in reducing the propagation delay of a set of buffers and thus increasing the immunity of the path to temperature effects through a performance improvement of 63%. The percentage improvement is calculated based on the propagation delay measured before and after the application of the adaptive voltage technique.

This test scheme subjects the Adaptive Voltage technique to a very predictable scenario where the temperature increases with a fixed gradient. A more practical scenario would be to vary the temperature in real time.

The next experiment achieves this by changing the temperature in a random fashion. Again, the change in temperature was stored into the control unit prior to the experiment due to the absence of temperature sensors. The time step is not constant here since larger gradients take longer to heat up than smaller ones. Figure 3.23 shows the results for various input and output configurations of the CDN.



Figure 3.23 Adaptive Voltage Technique compensating in real time

Figure 3.20 can again be used as a reference here. The clock source was located at the solid box named '1' and the rest of the areas were used as outputs. The horizontal axis has two components in the form of time and temperature. As time increases, temperature is varied randomly. It can be seen from the graph that the Adaptive Voltage technique can keep the propagation delay within about 200ps, which is a huge improvement over the delay observed originally (600-650ps).

The experiment was repeated for the remaining configurations and the results, following the nomenclature mentioned before, can be seen in Figures 3.24(a), 3.24(b) and 3.24(c) below.






Figure 3.24 More examples of compensation using the adaptive voltage technique in real time

The Controllable Delay method was validated next. This technique directly improves the skew observed across the network. Figure 3.20 can be used as a reference here as well. The solid box named '1' was again used as the input source and the rest of the boxes denote outputs.



Figure 3.25 (a) Variation in Skew by Temperature (b) Compensation using Controllable Delay

Figure 3.25(a) shows the variation of skew across the CDN due to temperature. The axes are reversed here with the vertical axis denoting temperature. Figure 3.25(b) displays the improvement in terms of stability of the magnitude across the temperature spectrum. The inherent skew due to lengths of interconnects is still present but the variation due to temperature has been reduced to a great extent.

This test scenario, like the one for Adaptive Voltage, is not very practical and a real time approach should be a better test of the technique. The temperature is thus varied in real time, with the controller knowing the upcoming temperature values only at the instant when the change occurs. Figure 3.26 shows the results of this test case.



Figure 3.26 Real time compensation test for the Controllable Delay technique

Again, the time step is not constant since larger gradients consume more time than the smaller ones. The skews remain in the range of about 400ps which is a huge improvement over the response without the compensation techniques (1.2ns–1.8 ns).

## 3.9 Summary

The details of the test vehicle have been provided in this chapter. It established the three objectives of the test vehicle and explained how each was achieved. The implementation and the hardware details laid the foundation and an elaborate test plan ensured the functionality of the CDN as well as the compensation techniques.

The problem was demonstrated by artificially creating the conditions observed in the previous chapter by using external PTC heaters. The compensation techniques were then modified for implementation and simulated to check the correctness. Various test scenarios were then created by changing the temperatures of the heaters. This change in environment affected the performance of the CDN built on the test vehicle. Lastly, the compensation techniques were activated and over 57% improvement in performance was observed.

It can thus be concluded that this chapter validates both the compensation methods and proves that their implementation can successfully tackle the problems that arise in clock distribution networks due to changes in the temperature.

#### CHAPTER 4

## **BUFFER DESIGN FOR ASICs**

The previous chapters presented the problems faced by CDNs due to thermal gradients and techniques to overcome them. They also gave validation of these techniques using an FPGA-based test vehicle. This chapter explores implementation schemes for these techniques for Application Specific Integrated Circuits.

The first section correlates the traditional buffer circuit comprising two cascaded inverters with the electrical analysis as well as the test vehicle. Models used for simulation ensure that the results remain congruent with the ones presented in the last chapter. This section also explains the procedure used to create the CDN on an ASIC. The next section explores the implementation schemes for the compensation techniques. Circuit modifications were used to implement both the methods and these have been presented in detail.

The last section uses simulations to verify the functionality of the compensation techniques. Various test scenarios similar to those used in the test vehicle have been simulated in order to make sure that the methods successfully aid in reducing the skew across the CDN. The control algorithm has also been modified and provides optimum results.

## **4.1 Buffer Circuitry Assumptions**

The buffer was constructed as a simple cascade of two inverters. The sizing of the buffer was done in accordance with the electrical simulations. Xilinx provides the transistor models which were used for the simulations. Thus, correspondence between the simulations as well as the test vehicle was established before constructing and testing the CDN and its compensation methods. Figure 4.1 shows the schematic of the buffer.



Figure 4.1 Schematic of the Buffer

The simulations were carried out in the Cadence ADE environment using the same frequency as the one used for the test vehicle. Figure 4.2 shows the results of the same.



Figure 4.2 Simulations of the Buffer Schematic

The buffers were then converted to a symbol and extended to build an H-tree architecture for the CDN. Figure 4.3 shows the schematic of the same.



Figure 4.3 Schematic of the CDN in the H-Tree Architecture

Lastly, the layout of the CDN was constructed and checked against the schematic shown above. The layout was essential in order to extract parasitic and interconnect capacitances. The extracted file was then added during the simulation so that the parasitic effects were considered by the ADE when computing the results. A significant amount of parasitics are added while building CDNs due to large interconnect lengths. The critical length of a wire is defined as the threshold over which wire delay cannot be ignored when calculating total path delay. The lengths of interconnects in the layout were kept below this critical length to ensure that no more buffer insertion was needed for optimal performance. Figure 4.4 shows the complete layout of the CDN in H-tree fashion.



Figure 4.4 Layout of the CDN in form of H-Tree

# **4.2 Implementation of Compensation Techniques**

The previous chapter showed the implementation schemes for the compensation techniques as necessary in the test vehicle. This section presents the modified implementation schemes for the methods for ASICs.

The Adaptive Voltage technique can be implemented by simply inserting a Negative Temperature Coefficient resistor in the supply line. This can be seen in Figure 4.5.



Figure 4.5 Schematic of first implementation scheme of the Adaptive Voltage technique

An NTC response for the resistor is assumed in this case. This implementation may however lead to larger voltage drops and cause problems in calibration since the NMOS essentially acts as a potentiometer. A better implementation scheme would be to have a resistor divider network with only one resistor displaying NTC characteristics. This is a more stable way of implementing the adaptive voltage technique and can be seen in Figure 4.6.

The fact remains that both of the implementations presented here are crude in their own ways but will serve the purpose of demonstrating the functionality of the compensation methods. The implementation also assumes an NTC response for the NMOS being used as resistors. Regular MOSFETs do not exhibit such characteristics. Modifications in the standard MOSFETs, in terms of materials, are essential in order to achieve an NTC response out of the transistors.



Figure 4.6 Schematic of the voltage-divider based implementation scheme of the Adaptive Voltage technique with one resistor having a NTC response
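The behaviour of the divider-based scheme can be illustrated with an idealised model. The sketch below is not the simulated circuit: it swaps in the standard thermistor beta model for the NTC element, places that element on the supply side of the divider so that the buffer supply rises with temperature, and ignores the current drawn by the buffer. All component values are hypothetical.

```python
import math


def ntc_resistance(temp_c, r_ref_ohm, beta_k, ref_temp_c=25.0):
    """Beta-model NTC resistance: R(T) = R_ref * exp(beta * (1/T - 1/T_ref)), T in kelvin."""
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return r_ref_ohm * math.exp(beta_k * (1.0 / t - 1.0 / t_ref))


def buffer_supply(temp_c, v_supply, r_bottom_ohm, r_ref_ohm, beta_k):
    """Unloaded divider output with the NTC element on the supply side (assumed placement)."""
    r_top = ntc_resistance(temp_c, r_ref_ohm, beta_k)
    return v_supply * r_bottom_ohm / (r_top + r_bottom_ohm)


# Hypothetical component values.
V_SUPPLY = 1.2
R_BOTTOM = 1000.0
R_REF = 1000.0
BETA = 3000.0

v_25 = buffer_supply(25.0, V_SUPPLY, R_BOTTOM, R_REF, BETA)
v_85 = buffer_supply(85.0, V_SUPPLY, R_BOTTOM, R_REF, BETA)
v_125 = buffer_supply(125.0, V_SUPPLY, R_BOTTOM, R_REF, BETA)
print(f"{v_25:.3f} V, {v_85:.3f} V, {v_125:.3f} V")
```

Because the NTC resistance falls as the die heats up, the voltage delivered to the buffer rises without any action from the control unit.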

The Controllable Delay technique has a more robust implementation that is based on the concept that a series connection of transistors makes them weaker. This phenomenon can be seen in Figure 4.7.



Figure 4.7 Effect of series connection of MOSFETs on their strength [21]

A weaker transistor means increased propagation delay. The propagation delay through the path is thus a direct function of the number of active transistors. The control unit can handle the taps that decide how many transistors remain active for the given buffer. The control unit and the related circuitry can be seen in Figure 4.8. Note that only the first inverter is tapped while the second inverter remains unchanged. The design does so in order to assure that the voltage levels are maintained and that the rise and fall times are equal for the signal at the output.
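The dependence of delay on the number of active series transistors can be illustrated with a first-order RC model, in which each conducting transistor is treated as a fixed on-resistance and the stage drives a lumped load capacitance. This is a textbook approximation rather than the simulated circuit: it ignores the capacitance of the internal nodes of the stack, and the resistance and capacitance values are hypothetical.

```python
import math


def stack_delay_ps(active_in_series, r_on_ohm, c_load_ff):
    """50% delay of an RC stage whose resistance is a stack of identical transistors."""
    if active_in_series < 1:
        raise ValueError("at least one transistor must be active")
    r_total_ohm = active_in_series * r_on_ohm
    # ohm * fF = 1e-15 s = 1e-3 ps; ln(2) * R * C is the 50% point of an RC step response.
    return math.log(2) * r_total_ohm * c_load_ff * 1e-3


# Hypothetical device values: 10 kilo-ohm on-resistance, 10 fF load.
delays = {n: stack_delay_ps(n, r_on_ohm=10_000.0, c_load_ff=10.0) for n in (1, 2, 3)}
for n, d in delays.items():
    print(f"{n} transistor(s) in series: {d:.1f} ps")
```

Under this model each additional tap adds the same delay increment, which is what makes the number of active transistors a convenient control knob.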

The control unit again dominates the circuitry being added to implement compensation methods. It decides the use of switches in order to control the number of taps and active transistors in the circuit. Unlike the test vehicle, the Adaptive Voltage technique is automated in the case of ASICs. This is mainly due to the fact that an NTC response was assumed for the resistors and thus the resistance automatically adjusts itself with the change in temperature, regulating the supply voltage as needed.



Figure 4.7 (a) Control Unit for the Controllable Delay method (b) Second inverter kept intact

The algorithm remains unchanged. It follows the same steps as it did for the test vehicle. It starts with sensing the temperature. It then calculates the gradient and decides which method to activate. Lastly, it repeats for all the paths and pauses before sensing temperature again.

#### 4.3 Simulations

The simulations were carried out on the circuit that integrates the H-Tree Architecture explained earlier and the control unit for the compensation methods. The RC extraction files were included in the ADE while running the simulations in order to duly consider the parasitics. Figure 4.8 shows the basic setup for each of the simulation examples that follow.



Figure 4.8 Basic setup and nomenclature for the simulations

The simulations are named such that the source point is called 'input' while the distribution points are called 'Out 1', 'Out 2', 'Out 3' and 'Out 4' and represent different blocks in Figure 4.8. Both the compensation methods have been shown to be functional using the results observed through waveforms.

Figure 4.9 shows the ideal condition of the CDN. No temperature gradient has been applied to the network yet. It exhibits some inherent delays resulting from signal traversal through a chain of buffers.



Figure 4.9 Simulation of the CDN in the ideal condition

The next step involved creation of the thermal profile. This was done by individually changing the temperature condition for each of the transistors inside the buffer. The gradient was created by varying the temperature values fed to the MOSFETs. Figure 4.10 shows the variations in delay due to the temperature map.



Figure 4.10 Creation of skew among the outputs due to introduction of a temperature profile

The problem was thus demonstrated by means of simulations as well. The compensation techniques were now made active to test their functionality. An NTC response of the transistor in the resistor divider was ensured by changing the model file related to it. Since the temperature values were already in place, the results showed a completely compensated network. In order to investigate the results in more detail, the model file granting NTC characteristics to the transistors was applied to just one branch. This simulated the first leg of the algorithm, which compensates for only one path at a time. Figure 4.11 shows the compensated path in the form of 'Out 2'. Note that the output (Out 2) is being compensated with respect to the output that shows the least amount of delay in the first place (Out 1). The main objective of the compensation techniques is to reduce the skew across the outputs and thus the changes take place in accordance with the output that displays the best result. In this case, if any further compensation were applied to the path given by 'Out 1', it would only waste power.



Figure 4.11 Compensation for the first path using Adaptive Voltage technique

The process was continued further in order to compensate for the remaining two paths given by 'Out 3' and 'Out 4' waveforms. Figure 4.12 (a) and Figure 4.12 (b) show the results.



Figure 4.12 Complete compensation using the adaptive voltage technique for all the paths in the CDN

Figure 4.12 (b) also shows the final compensation results. All the paths are being compensated with minimal skew which can be observed at the output. The performance of the Adaptive Voltage implementation in the buffer is better than that of the test vehicle.

The Controllable Delay technique was activated next. Figure 4.10 is the place where the control unit is activated. Note that, in this case, the output given by 'Out 4' is used as reference to adjust the remaining outputs since it displays maximum skew. The first step delays 'Out 1' to align it with 'Out 4'.

Figure 4.13 shows the result as the control unit compensates for the first path denoted by the 'Out 1' waveform. Delay is introduced by increasing the number of transistors in the pull up and the pull down circuits. The waveform still has equal rise and fall times even though the signal tends to be distorted when the number of transistors in series is increased. This is due to the fact that the second inverter was kept intact during the design.



Time (0 to 8ns) for Each Waveform

Figure 4.13 Compensation for the first path using the Controllable Delay technique

Similar approaches were used in order to compensate for the remaining two outputs. Figure 4.14 (a) and (b) show the results after the compensation.



Figure 4.14 Complete compensation for all the paths using the Controllable Delay technique

Figure 4.14 (b) also shows the final output with compensation completed in all the paths. Again, the results are better than the ones observed in the test vehicle.
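The two reference choices used in this section can be summarised in a short sketch: the Adaptive Voltage technique speeds paths up toward the output with the least delay, while the Controllable Delay technique slows paths down toward the output with the most delay. The function names and the delay values below are hypothetical and are not read from the simulated waveforms.

```python
def adaptive_voltage_targets(delays_ps):
    """Delay each path must lose to match the fastest output."""
    reference = min(delays_ps.values())
    return {name: d - reference for name, d in delays_ps.items()}


def controllable_delay_targets(delays_ps):
    """Delay each path must gain to match the slowest output."""
    reference = max(delays_ps.values())
    return {name: reference - d for name, d in delays_ps.items()}


# Hypothetical delays after the temperature profile is applied (ps).
delays_ps = {"Out 1": 400.0, "Out 2": 520.0, "Out 3": 610.0, "Out 4": 700.0}

speed_up = adaptive_voltage_targets(delays_ps)     # 'Out 1' needs no change
slow_down = controllable_delay_targets(delays_ps)  # 'Out 4' needs no change
print(speed_up)
print(slow_down)
```

In both cases the reference output itself is left untouched, which avoids spending power on a path that already has the desired delay.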

#### 4.4 Summary

This chapter provides the circuit modifications that are essential in order to implement the compensation techniques in application specific ICs. The traditional buffer design comprising two cascaded inverters is changed by addition of transistors in the power line as well as the pull-up and pull-down circuits. The transistors in the supply line have been connected such that they act as resistors and implement the adaptive voltage technique. The additional transistors in pull-up and pull-down circuits are switched to implement the controllable delay technique.

Simulations were used to show the basic functionality of the CDN. The verification of the compensation methods was then done by applying several test cases. The temperatures of the MOSFETs were changed to create the thermal gradient. The control algorithm then corrected for propagation delay and skew, thus improving performance by over 89%.

Thus, both the compensation techniques, Adaptive Voltage and Controllable Delay, were implemented in designs targeted at ASICs and then verified for functionality and performance using simulations.

#### CHAPTER 5

## **COMPARISON OF RESULTS**

The last two chapters provided a complete set of solutions for the problems that were observed during the analysis in Chapter 2. It was shown that the discrepancies and variations in skew across the CDN can be corrected using compensation methods. Two different implementations of the techniques were also provided for the test vehicle and the ASICs. The performance improvement was evident from results of simulations and measurements.

However, this increased immunity against temperature comes at the cost of power and area. The additional hardware not only takes extra space in the system but also consumes power in order to calculate the amount of compensation that would be needed and make the adjustment in propagation delay of each path. The degradation in power is much more severe than the degradation observed in area. This chapter provides the details and justifies the increased power and area overhead.

It also compares the performance of both the implementations presented in the last two chapters. Lastly, it correlates the results and explains the performance improvement as well as the power and area degradation for each of the implementations.

### 5.1 Power and Area Overheads

Area overhead is mainly due to the additional circuitry for the compensation techniques. The control units as well as modifications in the buffer circuits lead to increase in the area. Figure 5.1 shows the distribution of area in the test vehicle based on the area reports.




Figure 5.1 Distribution of Area across the Test Vehicle

Area rises significantly in the test vehicle since there is no straightforward way to implement the compensation techniques. The increase will be less when the implementation is done in ASICs. Also, the area penalty is not severe since most CDNs have some spare area around the H-Tree or any symmetrical architecture for that matter.

The degradation of power is more severe. Power in CMOS circuits is divided into two parts: static and dynamic. The dynamic power is a function of frequency, capacitive load and the square of the supply voltage. Variation in the supply voltage will have a non-negligible effect on the total power consumption of the system. Static power remains constant for the most part but can increase when implementing the Controllable Delay method since it may lead to additional charge sharing and some additional leakage current. Also, the subthreshold leakage increases exponentially with rise in temperature which might contribute to static power. Figure 5.2 shows the distribution of power in the test vehicle based on the PowerPC estimator.



Figure 5.2 Distribution of Power across the Test Vehicle
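The quadratic dependence of dynamic power on supply voltage is what makes the Adaptive Voltage technique costly in terms of power. The sketch below uses the standard switching-power expression, which adds an activity factor to the frequency, capacitive load and supply voltage terms mentioned above. The numeric values are hypothetical and are not taken from the power reports.

```python
def dynamic_power_w(activity, c_load_f, v_dd, freq_hz):
    """Switching power: P = activity * C * VDD^2 * f."""
    return activity * c_load_f * v_dd ** 2 * freq_hz


# Hypothetical operating point: 100 pF switched capacitance, 100 MHz clock.
p_nominal = dynamic_power_w(activity=0.5, c_load_f=100e-12, v_dd=1.2, freq_hz=100e6)
p_raised = dynamic_power_w(activity=0.5, c_load_f=100e-12, v_dd=1.3, freq_hz=100e6)

increase_percent = (p_raised / p_nominal - 1.0) * 100.0
print(f"{p_nominal * 1e3:.2f} mW -> {p_raised * 1e3:.2f} mW (+{increase_percent:.1f}%)")
```

Since every other term is unchanged, the relative increase depends only on the square of the voltage ratio.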

### **5.2 Correlation of Results**

The thesis was based on three primary areas of investigation. They are thermal/electrical analysis, test vehicle and ASIC design related simulations. The problem has been demonstrated in each of the areas and the solutions have also been validated. This section presents a comparison of various parameters over the different areas and gives a correlation of the results. Table 5 shows the severity of the problem observed in all the three areas.

|                                    | Thermal and Electrical Analysis | Test Vehicle                    | ASIC Design related Simulations |
|------------------------------------|---------------------------------|---------------------------------|---------------------------------|
| Dependence of Delay on Temperature | Linear                          | Linear (Flat in rare scenarios) | Linear                          |
| Range of Skew                      | 0 to 5ns                        | 0 to 6.8ns                      | 0 to 760ps                      |
| Range of Temperature               | 90°C - 120°C                    | 85°C - 125°C                    | 90°C - 120°C                    |

Table 5: Demonstration of Problem in different areas

The solution schemes, including the adaptive voltage technique and the controllable delay technique, were validated using the test vehicle. They were also implemented by modifying the traditional buffer design in ASICs and verified using subsequent simulations. Table 6 compares the parameters associated with the compensation techniques. Note that the values in the table denote the worst case condition among the two methods.

|                                                                 | Test Vehicle    | ASIC Design related Simulations |
|-----------------------------------------------------------------|-----------------|---------------------------------|
| Range of Skew (Original – from Thermal and Electrical Analysis) | 0 to 6.8ns      | 0 to 760ps                      |
| Range of Skew (After Compensation)                              | 0 to 2.9ns      | 0 to 80ps                       |
| Percentage Improvement                                          | 57%             | 89%                             |
| Original Power                                                  | 0.8W            | 440mW                           |
| Power with Compensation Techniques                              | 1.3W            | 510mW                           |
| Power Degradation                                               | 38%             | 13%                             |
| Original Area                                                   | 6% Utilization  | 1.6µ<sup>2</sup>                |
| Area with Compensation Techniques                               | 11% Utilization | 1.62µ<sup>2</sup>               |
| Area Degradation                                                | 45%             | 1.3%                            |

Table 6: Comparison of Compensation Techniques across Test Vehicle and Simulations

The values for area in the test vehicle are compared by percentage utilization of the logic in the FPGA. Area degradation in the ASIC design related simulations is very low since a lot of spare area was available with the H-Tree architecture when considering the rectangular patch used for calculation of area.
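The percentages in Table 6 can be approximately reproduced from its absolute entries. The sketch below is a worked check only; the choice of normalisation is inferred from the numbers and is not stated in the text. The skew improvement matches when the change is divided by the original skew, whereas the power figures and the test vehicle area figure match when the change is divided by the value with compensation. For the ASIC area, both normalisations give between 1.2% and 1.3%.

```python
def relative_change_percent(before, after, normalise_by):
    """Magnitude of the change from 'before' to 'after' as a percentage of 'normalise_by'."""
    return abs(after - before) / normalise_by * 100.0


# Absolute values taken from Table 6.
skew_fpga = relative_change_percent(6.8, 2.9, normalise_by=6.8)       # ns
skew_asic = relative_change_percent(760.0, 80.0, normalise_by=760.0)  # ps

power_fpga = relative_change_percent(0.8, 1.3, normalise_by=1.3)        # W
power_asic = relative_change_percent(440.0, 510.0, normalise_by=510.0)  # mW

area_fpga = relative_change_percent(6.0, 11.0, normalise_by=11.0)  # % utilization
area_asic = relative_change_percent(1.6, 1.62, normalise_by=1.6)

for label, value in [
    ("skew improvement, test vehicle", skew_fpga),
    ("skew improvement, ASIC", skew_asic),
    ("power degradation, test vehicle", power_fpga),
    ("power degradation, ASIC", power_asic),
    ("area degradation, test vehicle", area_fpga),
    ("area degradation, ASIC", area_asic),
]:
    print(f"{label}: {value:.1f}%")
```

Measured against the original values instead, the power overhead would be 62.5% for the test vehicle and about 15.9% for the ASIC simulations.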

#### 5.3 Summary

This chapter quantifies and explains the effects of implementing the compensation techniques on area and power overhead. The degradation in power is about 38% for the test vehicle but has a much smaller impact in the case of the implementation for ASIC design, at about 13%. Area, on the other hand, increases by about 45% due to practical limitations of the FPGA used in the test vehicle but has a negligible 1.3% increase in the case of the ASICs. CDNs in typical ASICs tend to have empty spaces if a die is dedicated for routing the clock signal and as such, the increase in area would not cause much of a problem.

The second part of the chapter correlates the environment and the implementation across all three parts of the thesis – analysis, test vehicle and ASIC design. The chapter concludes with the comparison between implementation on the test vehicle and the one for ASICs. The correlation was done to establish congruency between the analysis and the two physical implementations of the proposed solution for the problem of variation in skew due to changing thermal gradients.

#### CHAPTER 6

## **CONCLUSION AND FUTURE WORK**

#### 6.1 Conclusion

The thesis focuses on the undesired effects of thermal gradients on the clock distribution networks (CDN) in three-dimensional ICs and techniques to compensate for the same.

A 3D configuration with a PCB, an interposer and a stack of three dies was considered for the thermal and electrical simulations. The interposer is connected to the PCB through micro bumps, and the three dies are stacked on top of the interposer, followed by the heat sink. The stack is connected using Through-Silicon-Vias (TSVs). The bottom and top dies represent logic while the center die is the Clock Distribution Network. Thermal analysis of the structure was done in order to generate the temperature maps, which were then superimposed on the electrical characteristics to obtain the delay profiles along the CDN.

The thesis provides techniques to compensate for the performance degradation in the CDN due to thermal gradients and gives an implementation scheme for them. It uses two methods of compensation: adaptive voltage and controllable delay. An FPGA-based test vehicle was used to demonstrate the effects of temperature on the CDN as well as the effectiveness of the compensation methods in countering the effects of thermal gradients. A clock distribution network in the H-Tree architecture was coded and synthesized on a Spartan-6 FPGA, and micro-PTC heaters were used to create thermal gradients. An intuitive algorithm then created checkpoints along the H-Tree and went from the source to the distribution points, compensating along the way. It determined the gradient across any given checkpoint and then changed the supply voltage or added delays in the corresponding paths to compensate for skew.
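The checkpoint traversal described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used on the test vehicle: the `Branch` class, the function names and the delay values are hypothetical, per-segment delays stand in for the measured gradient at each checkpoint, and only the controllable-delay option is shown (the adaptive-voltage option is omitted).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    """One H-tree segment leaving a checkpoint (hypothetical model)."""
    delay_ps: float                  # segment delay under the current thermal gradient
    added_delay_ps: float = 0.0      # delay inserted by the compensation pass
    children: List["Branch"] = field(default_factory=list)  # branches of the next checkpoint


def compensate(branches: List[Branch]) -> None:
    """Visit checkpoints from the source towards the distribution points.

    At each checkpoint the faster sibling branches are padded so that they
    match the slowest one; the same step is then repeated further down.
    """
    if not branches:
        return
    slowest = max(b.delay_ps for b in branches)
    for b in branches:
        b.added_delay_ps = slowest - b.delay_ps
        compensate(b.children)


def leaf_arrivals(branches: List[Branch], t_ps: float = 0.0) -> List[float]:
    """Clock arrival time at every distribution point (leaf)."""
    arrivals: List[float] = []
    for b in branches:
        t = t_ps + b.delay_ps + b.added_delay_ps
        arrivals.extend(leaf_arrivals(b.children, t) if b.children else [t])
    return arrivals


def skew(branches: List[Branch]) -> float:
    """Difference between the latest and earliest leaf arrival times."""
    arrivals = leaf_arrivals(branches)
    return max(arrivals) - min(arrivals)


# Two-level tree; the left half is assumed hotter and therefore slower.
tree = [
    Branch(100.0, children=[Branch(55.0), Branch(50.0)]),
    Branch(90.0, children=[Branch(50.0), Branch(50.0)]),
]
skew_before = skew(tree)
compensate(tree)
skew_after = skew(tree)
print(skew_before, skew_after)
```

With these illustrative numbers the skew drops from 15 ps to 5 ps. The remaining skew lies between the two halves of the tree, which a purely sibling-by-sibling pass does not observe.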

The test vehicle successfully demonstrated the problems caused by temperature gradients as well as the functionality of the compensation methods.

An implementation scheme for integrating the compensation methods into a traditional buffer design is also discussed, and the functionality of the techniques is again verified using simulations. Lastly, the improvement in performance due to the use of the compensation methods is summarized and compared across the analysis results, test vehicle and simulations. Degradations in area and power performance were also considered during the comparison.

#### 6.2 Future Work

The scope of future work after this thesis primarily extends to two domains: compensation techniques and implementation schemes. The current work builds on existing compensation techniques and introduces new methods that modify and implement them. The compensation is done mainly through active techniques. Passive compensation techniques can also be explored in order to counter the thermal effects.

New methods of compensation can also be developed. The state-of-the-art techniques, as well as the ones presented here, focus on three parameters of the buffer: bias voltages, delay and strength. Additional parameters such as power supply noise at the buffer and crosstalk could be used to create new compensation techniques, since these factors are temperature sensitive.

There is also scope for improvement in the implementation schemes. The designs in this work were developed with the objective of verifying the functionality of the compensation methods, and they displayed excessive degradation in terms of power and area. Performance always trades off against power and area, but at least one of them can be optimized while the other is kept at a level that still provides satisfactory performance. Power-efficient techniques can be built using gating and scaling schemes that ensure only the right amount of compensation is provided at the right time. Area-efficient designs would require advanced logic optimization of the control units and circuit modifications.

### **APPENDIX A**

# ELECTRICAL AND THERMAL TOOL: POWER ET

The tool used is a three-dimensional DC IR drop solver for the simulation of 3D power delivery networks (PDNs). It uses the Finite Volume Method (FVM) [22-24] with a non-uniform grid and can handle inhomogeneous power delivery networks. It is able to solve for at least 10 million unknowns on a 3GB machine. The thermal analysis part of the tool also uses FVM to solve the steady-state heat equation. The tool can account for both the Joule heating effect from the PDN and the convection effect; in this thesis, however, the Joule heating effect was not considered.
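For reference, the steady-state problems described above can be written in their standard textbook form. The symbols are chosen here for illustration and are not taken from the tool's documentation: $\phi$ is the electric potential, $\rho$ the electrical resistivity, $T$ the temperature, $k$ the thermal conductivity, $P$ the volumetric power density, $h$ the convection coefficient, $T_a$ the ambient temperature and $n$ the outward normal on a convective boundary.

$$
\nabla \cdot \left( \frac{1}{\rho} \nabla \phi \right) = 0, \qquad
\nabla \cdot \left( k \nabla T \right) = -P, \qquad
-k \frac{\partial T}{\partial n} = h \left( T - T_a \right)
$$

In a finite volume discretization these equations are integrated over each grid cell, so that the flux leaving one cell equals the flux entering its neighbor. Since Joule heating is not considered in this thesis, $P$ would contain only the explicitly specified heat sources.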

**Implementation:** MATLAB

**Platform:** Windows

**Input Format:** ASCII text file (.txt)

**Output Format:** MATLAB data file (.mat)

**Authors:** Jianyong Xie, M. Swaminathan

### **APPENDIX B**

# **INPUT FILE FORMAT**

thermal\_coefficient(dielectric)

| This is<br>Materia                                                       | a steady<br>1 parame                                                                            | y-state<br>ters                                                       | electric                                                                | al-therr                                                                | nal sin                                                                 | nulation in                                                             | put fil                                                                      | e            |             |
|--------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------|-------------------------------------------------------------------------|------------------------------------------------------------------------------|--------------|-------------|
| number                                                                   | . E.                                                                                            | _resisti                                                              | vity                                                                    | Т_С                                                                     | coeffic                                                                 | ient T_c                                                                | onducti                                                                      | vity ()      | (,y,z)      |
| 7<br>0<br>1<br>2<br>3<br>4<br>5<br>6<br>7<br>8<br>9<br>10<br>#<br>#<br># | 1000000<br>1.68e-0<br>2.82e-0<br>1000000<br>1000000<br>1000000<br>1000000<br>1000000<br>5.6e-00 | 000<br>08<br>00<br>000<br>000<br>000<br>000<br>000<br>000<br>000<br>8 | 0<br>0.0039<br>0.0039<br>0<br>0<br>0.0045<br>0.0045<br>0.0045<br>0.0045 | 0<br>400<br>237<br>110<br>1.2<br>4.3<br>0.8<br>0.3<br>1.3<br>1.2<br>174 | 0<br>400<br>237<br>110<br>1.2<br>4.3<br>0.8<br>0.3<br>1.3<br>1.2<br>174 | 0<br>400<br>237<br>110<br>1.2<br>4.3<br>0.8<br>0.3<br>1.3<br>1.2<br>174 | 0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0<br>0 |              |             |
| Object Parameters (Body) |  |  |  |  |  |  |  |  |  |
| number | x1 | y1 | z1 | x2 | y2 | z2 | material |  |  |
| 8 |  |  |  |  |  |  |  |  |  |
| 0 | 0 | 0 | 0 | 100 | 100 | 1.585 | 7 |  |  |
| 1 | 0 | 0 | 1.585 | 100 | 100 | 1.6 | 1 |  |  |
| 2 | 35 | 35 | 1.6 | 65.0 | 65 | 1.8 | 5 |  |  |
| 3 | 35 | 35 | 1.8 | 65.0 | 65 | 1.81 | 8 |  |  |
| 4 | 35 | 35 | 1.81 | 65.0 | 65 | 2 | 3 |  |  |
| 5 | 45 | 45 | 2 | 55 | 55 | 2.03 | 4 |  |  |
| 6 | 45 | 45 | 2.03 | 55 | 55 | 2.04 | 8 |  |  |
| # |  |  |  |  |  |  |  |  |  |
| # |  |  |  |  |  |  |  |  |  |
| Electrical Excitation (Ebody) |  |  |  |  |  |  |  |  |  |
| number | x1 | y1 | z1 | x2 | y2 | z2 | material | type | value |
| 10 |  |  |  |  |  |  |  |  |  |
| 1 | 0.0 | 0.0 | 1.6 | 1 | 1 | 1.602 | 1 | 0 | 1.2 |
| # |  |  |  |  |  |  |  |  |  |
| # |  |  |  |  |  |  |  |  |  |
| # |  |  |  |  |  |  |  |  |  |
| Thermal Excitation (Tbody) |  |  |  |  |  |  |  |  |  |
| number | x1 | y1 | z1 | x2 | y2 | z2 | material | type | value (Celsius/W) |
| 10 |  |  |  |  |  |  |  |  |  |
| 1 | 45.0 | 45.0 | 2.89 | 55.0 | 55.0 | 2.89 | 5 | 0 | 25 |
| 2 | 0.0 | 0.0 | 1.6 | 35.0 | 100 | 1.6 | 1 | 2 | 20 |
| 3 | 65 | 0.0 | 1.6 | 100.0 | 100 | 1.6 | 1 | 2 | 20 |
| # |  |  |  |  |  |  |  |  |  |
| # |  |  |  |  |  |  |  |  |  |

There are four blocks in the file: material parameters, object parameters, electrical excitation and thermal excitation. The first number after the label of each block denotes the number of columns in that block. Every block ends with three '#' signs. The material index used in any block must correspond to the same material number in the material parameters block. All dimensions are in millimeters.
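The block structure described above lends itself to a simple line-oriented reader. The sketch below is not the tool's own parser (the tool is implemented in MATLAB). It assumes a whitespace-separated layout in which each block consists of a label line, a line of column names, a line with the column count, the data rows, and finally the '#' terminator lines. The function name `parse_blocks` and the shortened sample are hypothetical, and the title line at the top of a real file is left out.

```python
from typing import Dict, List


def parse_blocks(text: str) -> Dict[str, List[List[float]]]:
    """Parse blocks laid out as: label, column names, column count, rows, '#' lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    blocks: Dict[str, List[List[float]]] = {}
    i = 0
    while i < len(lines):
        label = lines[i]              # e.g. "Object Parameters (Body)"
        n_cols = int(lines[i + 2])    # lines[i + 1] holds the column names
        i += 3
        rows: List[List[float]] = []
        while i < len(lines) and lines[i] != "#":
            values = [float(v) for v in lines[i].split()]
            if len(values) != n_cols:
                raise ValueError(
                    f"{label}: expected {n_cols} columns, got {len(values)}"
                )
            rows.append(values)
            i += 1
        while i < len(lines) and lines[i] == "#":   # skip the block terminator
            i += 1
        blocks[label] = rows
    return blocks


# Shortened sample using rows from the listing in this appendix
SAMPLE = """\
Object Parameters (Body)
number x1 y1 z1 x2 y2 z2 material
8
0 0 0 0 100 100 1.585 7
1 0 0 1.585 100 100 1.6 1
#
#
#
Electrical Excitation (Ebody)
number x1 y1 z1 x2 y2 z2 material type value
10
1 0.0 0.0 1.6 1 1 1.602 1 0 1.2
#
#
#
"""

blocks = parse_blocks(SAMPLE)
print(blocks["Electrical Excitation (Ebody)"])
```

Checking each row against the declared column count catches truncated or misaligned rows early, before they reach the solver.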

#### **Object Parameters:**

Every object is a cuboid. The vectors specified denote the two ends of the diagonal of the cuboid.
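As a small illustration of this convention, the sketch below derives the edge lengths and volume of a cuboid from its two diagonal corners. The `Cuboid` class is hypothetical, and the cuboid is assumed to be axis-aligned, which the text implies but does not state explicitly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Cuboid:
    """Axis-aligned box given by two opposite corners, in millimeters."""
    x1: float
    y1: float
    z1: float
    x2: float
    y2: float
    z2: float

    def size(self) -> Tuple[float, float, float]:
        """Edge lengths along x, y and z."""
        return (
            abs(self.x2 - self.x1),
            abs(self.y2 - self.y1),
            abs(self.z2 - self.z1),
        )

    def volume(self) -> float:
        dx, dy, dz = self.size()
        return dx * dy * dz


# Object 2 from the listing: corners (35, 35, 1.6) and (65, 65, 1.8)
body = Cuboid(35, 35, 1.6, 65, 65, 1.8)
print(body.size(), body.volume())
```

For this object the footprint is 30 mm by 30 mm and the thickness is about 0.2 mm.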

#### **Electrical Excitation:**

Type = 0 for voltage excitation (Unit: Volt).

Type = 1 for current excitation (Unit: Ampere). Negative sign is for current consumed and positive for current delivered.

#### **Thermal Excitation:**

Type = 0 for constant temperature boundary condition (Unit: $^{\circ}C$).

Type = 1 for power density. When $z_1 = z_2$, the value is a surface power density (Unit: $W/m^2$).

When $z_2 > z_1$, the value is a volumetric power density (Unit: $W/m^3$).

Type = 3 for convection boundary condition (Unit:  $W/m^2K$ ).

When defining surface excitations, care should be taken that they do not extend beyond the structure defined in the object parameters block.
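The type rules and the bounds check above can be sketched as two small helpers. Both function names are hypothetical. The bounds check is a simplification: it compares the excitation region with the bounding box of all objects rather than with their exact union, and it assumes that the first corner of every box holds the smaller coordinates.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float, float, float]  # x1, y1, z1, x2, y2, z2


def thermal_excitation_kind(kind: int, z1: float, z2: float) -> str:
    """Return a description and unit for a thermal excitation row."""
    if kind == 0:
        return "constant temperature (degC)"
    if kind == 1:
        if z1 == z2:
            return "surface power density (W/m^2)"
        if z2 > z1:
            return "volumetric power density (W/m^3)"
        raise ValueError("z2 must not be smaller than z1")
    if kind == 3:
        return "convection boundary (W/m^2K)"
    raise ValueError(f"type {kind} is not described in this appendix")


def inside_structure(region: Box, bodies: Sequence[Box]) -> bool:
    """Check that a region stays within the bounding box of all bodies."""
    lo = [min(b[i] for b in bodies) for i in range(3)]
    hi = [max(b[i + 3] for b in bodies) for i in range(3)]
    return all(lo[i] <= region[i] and region[i + 3] <= hi[i] for i in range(3))


# First two objects from the listing (a 100 mm x 100 mm footprint)
bodies = [
    (0, 0, 0, 100, 100, 1.585),
    (0, 0, 1.585, 100, 100, 1.6),
]
ok_region = (0.0, 0.0, 1.6, 35.0, 100.0, 1.6)      # surface patch on top of the stack
bad_region = (0.0, 0.0, 1.6, 120.0, 100.0, 1.6)    # extends past x = 100

print(thermal_excitation_kind(1, 1.6, 1.6))
print(inside_structure(ok_region, bodies), inside_structure(bad_region, bodies))
```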

# REFERENCES

- [1] S. Borkar et al., "Parameter variations and impact on circuits and microarchitecture," Proc. of DAC, vol. 64, pp. 338-342, 2003.
- [2] J. Xie, M. Swaminathan, "Electrical-thermal co-simulation of 3D integrated systems with micro-fluidic cooling and Joule heating effects," IEEE Trans. on CPMT, vol. 1, no. 2, pp. 234-246, 2011.
- [3] http://www.intel.com/technology/mooreslaw/index.htm.
- [4] D. Kim et al., "Distributed multi TSV 3D clock distribution network in TSV-based 3D IC," IEEE 20th Conference on EPEPS, pp. 87-90, 2011.
- [5] D. Kim et al., "Vertical Tree 3-dimensional TSV Clock Distribution Network in 3D IC," IEEE 62nd ECTC, pp. 1945-1950, 2012.
- [6] J. Xie and M. Swaminathan, "Fast electrical-thermal co-simulation using multigrid method for 3D integration," IEEE 62nd ECTC, pp. 651-657, 2012.
- [7] 45nm NCSU FreePDK<sup>TM</sup>, http://www.si2.org, 2012.
- [8] PTM (Predictive Technology Model), http://ptm.asu.edu, 2012.
- [9] ITRS Roadmap Interconnect, http://www.itrs.net, 2011.
- [10] J. Kim et al., "High-frequency scalable electrical model and analysis of a Through Silicon Via (TSV)," IEEE Trans. on CPMT, vol. 1, no. 2, pp. 181-195, 2011.
- [11] J. Pak et al., "PDN impedance modeling and analysis of 3D TSV IC by using proposed P/G TSV array model based on separated P/G TSV and chip-PDN models," IEEE Trans. on CPMT, vol. 1, pp. 208-219, 2011.

- [12] K. Kim et al., "Modeling and analysis of a Power Distribution Network in TSV-based 3-D memory IC including P/G TSVs, on-chip decoupling capacitors, and silicon substrate effects," IEEE Transactions on CPMT, vol. 2, no. 12, pp. 2057-2070, 2012.
- [13] J. M. Rabaey et al., "Digital Integrated Circuits," Prentice-Hall, 2003.
- [14] www.ece.msstate.edu/~reese/EE8273/lectures/tech\_scale/tech\_scale.pdf
- [15] K. Shakeri and J. D. Meindl, "Temperature variable supply voltage for power reduction," Proc. of ISVLSI, pp. 1-4, 2002.
- [16] A. Chakraborty et al., "Dynamic thermal clock skew compensation using tunable delay buffers," IEEE Trans. on VLSI Systems, vol. 16, no. 6, pp. 639-649, 2008.
- [17] S.J. Park, N. Natu, M. Swaminathan, Byunghyun Lee, Sang Min Lee, Woong Hwan Ryu, and Kee Sup Kim, "Timing Analysis for Thermally Robust Clock Distribution Network Design for 3D ICs," IEEE 22nd Conference on Electrical Performance of Electronic Packaging and Systems, 2013.
- [18] International Technology Roadmap for Semiconductors (ITRS), 2009 Edition, Assembly and Packaging, pp 21.
- [19] Uyemura, John P. (2002), "Introduction to VLSI Circuits", 3<sup>rd</sup> Edition, John Wiley and Sons Inc, Phoenix, AZ, pp 231-245.
- [20] www.digital.com/aplha\_proc/21264C
- [21] Uyemura, John P. (1999), "CMOS Logic Circuit Design", 2nd Edition, John Wiley and Sons Inc, Phoenix, AZ, pp 481-532.