The propagation delay of a standard cell is the average of two time intervals, the low-to-high and the high-to-low propagation delays:
$$t_p = \frac{t_{PLH} + t_{PHL}}{2}$$
Each of these is the time difference between the 50% point of the input transition and the 50% point of the output transition.
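If the input waveform swings fully from zero to the supply voltage (VDD), or from VDD to zero, the low-to-high and high-to-low propagation delays can be approximated as
$$t_{PLH} = \frac{C_L V_{DD}}{\beta_p \left(V_{DD} - |V_{tp}|\right)^2}$$
$$t_{PHL} = \frac{C_L V_{DD}}{\beta_n \left(V_{DD} - V_{tn}\right)^2}$$
where $C_L$ is the load capacitance, $\beta_p$ and $\beta_n$ are the PMOS and NMOS gain factors, and $V_{tp}$ and $V_{tn}$ are the PMOS and NMOS threshold voltages.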
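As a rough numerical illustration, here is a minimal Python sketch that evaluates the two delay approximations (the high-to-low delay using the NMOS gain factor $\beta_n$) and their average. All component values are hypothetical round numbers chosen only to make the arithmetic visible; they are not data for any real process.

```python
def t_plh(c_load, vdd, beta_p, vtp):
    """Low-to-high delay: C_L * VDD / (beta_p * (VDD - |Vtp|)^2)."""
    return c_load * vdd / (beta_p * (vdd - abs(vtp)) ** 2)


def t_phl(c_load, vdd, beta_n, vtn):
    """High-to-low delay: C_L * VDD / (beta_n * (VDD - Vtn)^2)."""
    return c_load * vdd / (beta_n * (vdd - vtn) ** 2)


def t_p(tplh, tphl):
    """Cell propagation delay: average of the two transitions."""
    return (tplh + tphl) / 2


# Hypothetical values: 10 fF load, 1.2 V supply, gain factors in A/V^2.
C_L, VDD = 10e-15, 1.2
BETA_P, VTP = 100e-6, -0.4
BETA_N, VTN = 200e-6, 0.4

rise = t_plh(C_L, VDD, BETA_P, VTP)
fall = t_phl(C_L, VDD, BETA_N, VTN)
print(f"tPLH = {rise * 1e12:.2f} ps, tPHL = {fall * 1e12:.2f} ps, "
      f"tp = {t_p(rise, fall) * 1e12:.2f} ps")

# Two of the improvement knobs: doubling beta or halving the load
# each scales the delay by the same factor of 0.5.
print(t_phl(C_L, VDD, 2 * BETA_N, VTN) / fall)
print(t_phl(C_L / 2, VDD, BETA_N, VTN) / fall)
```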
To improve the propagation delay of a given standard cell, one could:
- increase the supply voltage,
- reduce the threshold voltage,
- increase the transistor gain factors, or
- reduce the load capacitance.
Reducing the load capacitance and increasing the supply voltage are outside the control of the standard cell itself.
Reducing the threshold voltage depends on the semiconductor foundry and is captured as part of standard cell characterization. The only parameter available to the circuit designer is therefore the transistor gain factor.
The gain factor β is determined by the channel width and length of the transistor.
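When determining the Wp/Wn ratio, it is desirable to set
$$\beta_n = k\,\beta_p$$
Because $\beta = \mu C_{ox} W / L$, assuming equal channel lengths and an electron mobility roughly twice the hole mobility ($\mu_n \approx 2\mu_p$), this gives
$$\frac{W_p}{W_n} = \frac{2}{k}$$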
In the ideal situation, k is equal to 1. This means that for a CMOS inverter to
charge and discharge capacitive loads in the same amount of time, the
channel width of the PMOS transistor must be twice as large as the channel
width of the NMOS transistor.
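The sizing rule can be checked numerically. This sketch assumes equal channel lengths and a mobility ratio $\mu_n/\mu_p = 2$, a common rule of thumb rather than a value from any specific process; `MU_RATIO` and the helper names are illustrative only.

```python
MU_RATIO = 2.0  # assumed mu_n / mu_p; process dependent in practice


def gain_factor(mu, cox, w, l):
    """beta = mu * Cox * W / L."""
    return mu * cox * w / l


def wp_over_wn(k, mu_ratio=MU_RATIO):
    """Solve beta_n = k * beta_p for Wp/Wn with equal channel lengths:
    mu_n * Wn = k * mu_p * Wp  ->  Wp / Wn = (mu_n / mu_p) / k."""
    return mu_ratio / k


def pmos_width(wn, k=1.0, mu_ratio=MU_RATIO):
    return wn * wp_over_wn(k, mu_ratio)


wn = 0.5                # hypothetical NMOS width in um
wp = pmos_width(wn)     # k = 1 -> PMOS twice as wide
print(wn, wp)

# With those widths the gain factors match (normalized mu_p = Cox = L = 1).
beta_n = gain_factor(MU_RATIO, 1.0, wn, 1.0)
beta_p = gain_factor(1.0, 1.0, wp, 1.0)
print(beta_n == beta_p)
```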
Although increasing Wp/Wn reduces the cell propagation delay, it also increases the active-area (diffusion) capacitance and the gate capacitance. This added capacitance slows the gate down. Therefore, circuit designers must make a trade-off when deciding how large the transistors should be, so that the propagation delays are optimal.
Fast circuits consume more area than slow circuits.
Friday, April 23, 2010
Steps in Placement
Detaching the scan chain and scan chain reordering:
What is a scan chain?
A scan chain is a DFT (design-for-test) structure that improves a chip's controllability and observability. It is a collection of scan flip-flops connected in series: the output of each flip-flop is connected to the scan data input of the next flip-flop in the chain. Test data is shifted in at the scan input, and the result of the logic operation is observed at the scan output. By shifting data through the chain, each node along it can be set to an intended value of 0 or 1 (controllability), and the effect of these settings can be captured and shifted out (observability).
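A small behavioural sketch of shift mode may help: with scan enable asserted, each flip-flop takes its predecessor's value on every clock, so a pattern is loaded serially and the chain contents are unloaded the same way. This is a simplified zero-delay model that leaves out the capture cycle; the function names are hypothetical.

```python
from typing import List, Tuple


def scan_shift(state: List[int], scan_in: List[int]) -> Tuple[List[int], List[int]]:
    """Shift bits through a scan chain, one bit per clock.

    state[0] is the flop connected to scan-in, state[-1] drives scan-out.
    Returns the new chain state and the bits seen at scan-out.
    """
    state = list(state)
    observed = []
    for bit in scan_in:
        observed.append(state[-1])      # scan-out shows the last flop before the edge
        state = [bit] + state[:-1]      # each flop takes its predecessor's Q
    return state, observed


def load_pattern(target: List[int]) -> List[int]:
    """Shift in a target pattern; the first bit shifted ends up in the last flop."""
    state, _ = scan_shift([0] * len(target), list(reversed(target)))
    return state


chain = load_pattern([1, 1, 0, 1])     # controllability: set each node to 0 or 1
print(chain)
_, unloaded = scan_shift(chain, [0] * len(chain))
print(unloaded)                        # observability: last flop comes out first
```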
The physical wire connections between adjacent flip-flops depend on the logical order of the chain, and logical proximity does not match physical proximity.
The logical order is decided during logic synthesis, randomly or alphabetically, because physical locations are unknown at that time.
As a result, if the original chains are retained, the routing of the physical wire connections will not be optimized.
The best approach is to detach the scan chains before placement, so that normal placement is not disturbed by scan chain connectivity, and then reorder the chains after placement, once all physical locations are fixed and known. Reordering based on physical location information improves overall routability and reduces total wire length.
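P&R tools use their own reordering algorithms. As a minimal illustration only, this greedy nearest-neighbour heuristic starts at the scan-in port and repeatedly connects the closest unconnected flip-flop by Manhattan distance. The flop names, coordinates and helper functions are hypothetical, and the sketch ignores the scan-out location, clock domains and lockup latches that a real tool must respect.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order: List[str], locs: Dict[str, Point], scan_in: Point) -> float:
    """Total Manhattan wire length from the scan-in port through the chain."""
    total, prev = 0.0, scan_in
    for name in order:
        total += manhattan(prev, locs[name])
        prev = locs[name]
    return total


def reorder_greedy(locs: Dict[str, Point], scan_in: Point) -> List[str]:
    """Repeatedly connect the nearest unconnected flop (ties broken by name)."""
    remaining, order, cur = set(locs), [], scan_in
    while remaining:
        nxt = min(remaining, key=lambda n: (manhattan(cur, locs[n]), n))
        order.append(nxt)
        remaining.remove(nxt)
        cur = locs[nxt]
    return order


# Hypothetical placed flop locations (um); scan-in port at the origin.
locs = {"ff_a": (10, 0), "ff_b": (1, 0), "ff_c": (9, 0), "ff_d": (2, 0)}
synth_order = sorted(locs)                 # alphabetical order from synthesis
new_order = reorder_greedy(locs, (0, 0))
print(synth_order, chain_length(synth_order, locs, (0, 0)))
print(new_order, chain_length(new_order, locs, (0, 0)))
```

Greedy ordering is not guaranteed to be optimal; it only shows why placement-aware ordering shortens the chain compared with the synthesis order.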
All WLMs (wire load models) are removed before timing optimization, and timing is calculated based on virtual routes (VR). A virtual route is the shortest Manhattan distance between two pins.
Manhattan distance is the right-angle ruler of the backend. It is also called the city-block distance, because it is the distance a car would drive in a city laid out in square blocks, like Manhattan (ignoring one-way and oblique streets, and the fact that real streets exist only at the edges of blocks; there is no 3.14th Avenue). Any route from one corner to another corner that is 3 blocks east and 6 blocks north covers at least 9 blocks.
In other words, it is the distance between two points measured along axes at right angles.
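The city-block example can be reproduced directly. This sketch (function names are illustrative; `math.comb` and `math.dist` need Python 3.8+) computes the Manhattan length of a two-pin virtual route, compares it with the straight-line distance, and counts how many different right-angle routes share that shortest length.

```python
import math
from typing import Tuple

Point = Tuple[int, int]


def manhattan(p: Point, q: Point) -> int:
    """Length of a virtual route: sum of horizontal and vertical separation."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def shortest_route_count(p: Point, q: Point) -> int:
    """Number of distinct monotone (shortest) right-angle routes on a grid."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.comb(dx + dy, dx)


start, end = (0, 0), (3, 6)               # 3 blocks east, 6 blocks north
print(manhattan(start, end))              # blocks travelled
print(round(math.dist(start, end), 3))    # straight-line distance, for comparison
print(shortest_route_count(start, end))   # equally short routes
```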
Set placement and timing options:
No cells under the preroutes of the selected metal layers and vias.
The P&R tool prevents pins of standard cells from being placed under preroutes on the metal layers you specify. This means that a standard cell is not placed at a location where any pin of the cell overlaps a preroute on a selected layer. For example, if M3 is selected, a standard cell will not be placed where any of its pins (regardless of the pin's metal layer) overlaps a preroute on M3.
Avoiding pin overlap with preroutes improves routability, because fewer routing resources are available under preroutes due to the preroute itself and any vias and contacts along it.
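The rule amounts to a rectangle-overlap test between every pin of the candidate cell and every preroute on a selected layer. This is a minimal sketch under assumed data structures (`Rect`, `placement_allowed` and the example geometry are hypothetical, not any tool's API); only the preroute's layer is checked, matching the "regardless of the pin's metal layer" behaviour described above.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Rect:
    layer: str
    x1: float
    y1: float
    x2: float
    y2: float

    def moved(self, dx: float, dy: float) -> "Rect":
        return Rect(self.layer, self.x1 + dx, self.y1 + dy, self.x2 + dx, self.y2 + dy)

    def overlaps(self, other: "Rect") -> bool:
        """Geometric overlap in x/y only; touching edges do not count."""
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


def placement_allowed(pins: Iterable[Rect], origin: Tuple[float, float],
                      preroutes: Iterable[Rect], blocked_layers: Set[str]) -> bool:
    """Reject a location if any pin (on any layer) overlaps a preroute
    on one of the selected layers."""
    guarded = [p for p in preroutes if p.layer in blocked_layers]
    for pin in pins:
        placed = pin.moved(*origin)
        if any(placed.overlaps(route) for route in guarded):
            return False
    return True


# Hypothetical cell with one M1 pin, and one vertical M3 power stripe.
cell_pins = [Rect("M1", 0.5, 0.5, 1.0, 1.0)]
stripes = [Rect("M3", 10.0, 0.0, 12.0, 100.0)]
print(placement_allowed(cell_pins, (10.0, 5.0), stripes, {"M3"}))  # pin under stripe
print(placement_allowed(cell_pins, (20.0, 5.0), stripes, {"M3"}))  # clear of stripe
print(placement_allowed(cell_pins, (10.0, 5.0), stripes, {"M2"}))  # M3 not selected
```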
Placement optimization:
PrePlace optimization:
- It generates an initial placement before optimizing the netlist, to obtain wiring information.
- It collapses non-critical buffers and reduces total cell area by downsizing, so that the netlist is easier to place.
- High-fanout nets (HFNs) can contain a very large number of buffers, which can impact placement. So, instead of minimizing the buffers, it rebuilds the HFNs based on more accurate RC estimates.
- It performs quick logic synthesis.
- It places all the standard cells.
- It re-optimizes the logic based on virtual routes: cell sizing, area recovery, gate duplication, buffer insertion and net splitting. It optimizes the gates for setup timing based on virtual routes.
- It performs incremental timing-driven and congestion-driven placement.
Post-placement optimization before CTS:
- Optimization is done with ideal clocks.
- It performs a more specific timing optimization of the netlist and layout, including quick fixing of setup and hold violations and of maximum transition and maximum capacitance violations by buffering the gates.
- It can perform placement optimization based on global routing.
Effect of CTS:
- Clock buffers are added.
- Congestion may increase.
- Non-clock-tree cells may have been moved to less ideal locations.
- New timing and max transition/capacitance violations can be introduced.
- Post-placement optimization after clock tree synthesis improves the timing of the design with propagated clocks. It takes the clock tree into account so that the clock skew is preserved, and it has an option to perform congestion removal before running optimization.
- Perform logic and placement optimization to fix timing and max capacitance/transition violations.
- Hold time fixing is recommended at this stage.
- Reduce congestion by removing unnecessary non-clock-tree buffers.
Worst negative slack (WNS) is the slack of the path with the most negative slack.
Total negative slack (TNS) is the sum of the negative slacks of all violating endpoints, taking the worst slack at each endpoint.
When |TNS| >> |WNS|, there may be many sub-critical path violations, which matter as much as the critical path violation.
Optimization during placement mainly works on the critical path of each clock domain and stops when it cannot improve timing further.
Critical range optimization works on sub-critical paths to reduce TNS and the total number of violating paths.
Iterate post-placement optimization and critical range optimization until the remaining violations are acceptably small, as long as each iteration still improves timing.
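The two metrics can be computed from a list of path slacks. This is a minimal sketch with hypothetical endpoint names and slack values in picoseconds; it assumes the common convention that WNS and TNS are reported as 0 when nothing violates, which individual tools may handle differently.

```python
from typing import Dict, Iterable, Tuple


def worst_per_endpoint(paths: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Keep only the worst (smallest) slack seen at each timing endpoint."""
    worst: Dict[str, int] = {}
    for endpoint, slack in paths:
        worst[endpoint] = min(slack, worst.get(endpoint, slack))
    return worst


def wns_tns(paths: Iterable[Tuple[str, int]]) -> Tuple[int, int, int]:
    """Return (WNS, TNS, number of violating endpoints).
    Convention assumed here: WNS and TNS are 0 when nothing violates."""
    neg = [s for s in worst_per_endpoint(paths).values() if s < 0]
    return (min(neg) if neg else 0), sum(neg), len(neg)


# Hypothetical path slacks in ps (endpoint, slack).
one_bad_path = [("r1/D", -300), ("r1/D", -120), ("r2/D", -50),
                ("r3/D", -40), ("r4/D", 120)]
many_small = [(f"r{i}/D", -10) for i in range(20)]

print(wns_tns(one_bad_path))
print(wns_tns(many_small))   # |TNS| >> |WNS|: many sub-critical violations
```

Only the worst path per endpoint is summed, so the second `r1/D` path does not inflate TNS.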
Timing Driven Placement
The P&R tool requires timing constraints to understand the design's timing objectives. The most common timing constraints include the arrival times of the input signals to the design and the required arrival times at the outputs of the chip. They also include the clock period of the system clock, as well as of other clocks if the design contains multiple clock domains.
The timing information the tool uses is based on the standard cell delays and on the wires connected to all these cells in the design.
Standard cell delays are a function of the input transition and of the total output load, that is, the sum of the output wire capacitance and the input gate capacitances of all logic connected to the output wire.
Wire delays are a function of the resistance of the metal layers and of the sum of the wire capacitance and the input gate capacitance.
Timing-driven placement is the process of placing the standard cells in the rows of the core area using the timing constraints as guidelines for where to place the cells.
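As a first-order illustration of the load and wire-delay dependencies, this sketch computes the output load seen by a driver and an Elmore delay estimate for a single uniform RC wire. It assumes all sink pins sit at the far end of one wire segment; the values are hypothetical, and real tools use extracted parasitics and their own delay calculators rather than this formula.

```python
from typing import Iterable


def load_capacitance(c_wire: float, pin_caps: Iterable[float]) -> float:
    """Output load seen by the driving cell: wire cap plus all sink input caps."""
    return c_wire + sum(pin_caps)


def wire_delay_elmore(r_wire: float, c_wire: float, pin_caps: Iterable[float]) -> float:
    """Elmore delay of a uniform RC wire with all sink pins at its far end:
    the wire's own capacitance counts half, the sink pins count fully."""
    return r_wire * (c_wire / 2 + sum(pin_caps))


# Hypothetical net: 100 ohm, 20 fF wire, driving two pins of 2 fF and 3 fF.
R, C = 100.0, 20e-15
pins = [2e-15, 3e-15]
print(f"load = {load_capacitance(C, pins) * 1e15:.1f} fF")
print(f"wire delay = {wire_delay_elmore(R, C, pins) * 1e12:.2f} ps")
```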
Evaluation of Placement:
After automatic placement, evaluate the placement and make changes to improve the routability of the design.
During placement, the tool estimates routing congestion based on the availability of wire tracks inside the global routing cells. From these estimates it produces a placement congestion map that shows the estimated amount of routing congestion across the design.
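A congestion map of this kind can be sketched as the ratio of required to available tracks per global routing cell. The grid, the track counts and the threshold of 1.0 below are hypothetical; tools report congestion per layer and direction, which this sketch collapses into one number per cell.

```python
from typing import List, Tuple

Grid = List[List[int]]


def congestion_map(demand: Grid, capacity: Grid) -> List[List[float]]:
    """Ratio of required to available tracks in each global routing cell."""
    return [[d / c if c else float("inf") for d, c in zip(drow, crow)]
            for drow, crow in zip(demand, capacity)]


def total_overflow(demand: Grid, capacity: Grid) -> int:
    """Tracks requested beyond capacity, summed over all cells."""
    return sum(max(0, d - c)
               for drow, crow in zip(demand, capacity)
               for d, c in zip(drow, crow))


def hotspots(demand: Grid, capacity: Grid, threshold: float = 1.0) -> List[Tuple[int, int]]:
    cmap = congestion_map(demand, capacity)
    return [(i, j) for i, row in enumerate(cmap)
            for j, ratio in enumerate(row) if ratio > threshold]


# Hypothetical 2 x 3 grid of global routing cells with 10 tracks each.
demand = [[4, 9, 12], [7, 11, 5]]
capacity = [[10, 10, 10], [10, 10, 10]]
print(hotspots(demand, capacity))
print(total_overflow(demand, capacity))
```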
Thursday, April 22, 2010
Placement objectives
What is placement?
Placement is the process of finding an appropriate location within the floorplan of the chip for each cell in the netlist.
Placement objectives:
- Guarantee that the router can complete the routing step.
- Minimize critical net delay.
- Make the chip as dense as possible.
Additional placement objectives are:
- Minimize the estimated interconnect length.
- Meet the timing requirements of the critical nets.
- Minimize interconnect congestion.
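One widely used estimate of interconnect length during placement is the half-perimeter wirelength (HPWL) of each net's bounding box. This sketch shows HPWL as one standard estimate, not necessarily the cost function of any particular tool; the netlist and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hpwl(points: List[Point]) -> float:
    """Half-perimeter of the bounding box of a net's pins."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(nets: List[List[str]], locs: Dict[str, Point]) -> float:
    return sum(hpwl([locs[cell] for cell in net]) for net in nets)


# Hypothetical netlist: two nets over three cells.
nets = [["a", "b", "c"], ["b", "c"]]
locs = {"a": (0, 0), "b": (4, 1), "c": (2, 5)}
print(total_hpwl(nets, locs))

locs["c"] = (3, 2)            # move c closer to a and b
print(total_hpwl(nets, locs))
```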
Channel definition and Slicing Floor Plan
During floorplanning we assign the areas between the blocks for interconnect. This is called channel definition or channel allocation.
T-shaped junction between two rectangular channels


Routing a T-junction between two channels in two-level metal. The dots represent logic cell pins. (a) Routing channel A (the stem of the T) first allows us to adjust the width of channel B. (b) If we route channel B first (the top of the T), this fixes the width of channel A. We have to route the stem of a T-junction before we route the top.
Channel ordering:
Choosing the order in which the rectangular channels are routed is called channel ordering.
Slicing Floor Plan:
Suppose a chip has several blocks. We cut the chip into two pieces along block boundaries, and continue cutting each piece in the same manner until all the blocks are separated. A floorplan that can be divided this way is called a slicing floorplan.

Defining the channel routing order for a slicing floorplan using a slicing tree. (a) Make a cut all the way across the chip between circuit blocks. Continue slicing until each piece contains just one circuit block. Each cut divides a piece into two without cutting through a circuit block. (b) A sequence of cuts: 1, 2, 3, and 4 that successively slices the chip until only circuit blocks are left. (c) The slicing tree corresponding to the sequence of cuts gives the order in which to route the channels: 4, 3, 2, and finally 1.
The figure shows how the sequence we use to slice the chip defines a hierarchy of the blocks. Reversing the slicing order ensures that we route the stems of all the channel T-junctions first.
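The reversed slicing order can be read off the slicing tree with a post-order traversal: every cut's sub-cuts (whose channels form the stems of T-junctions ending on it) are listed before the cut itself. The tree below is a hypothetical chip in which cuts 1 to 4 each split the piece left by the previous cut, so the traversal reproduces the order 4, 3, 2, 1; the class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SliceNode:
    """A cut (with two children) or a circuit block (leaf)."""
    label: str
    children: List["SliceNode"] = field(default_factory=list)


def routing_order(node: SliceNode) -> List[str]:
    """Post-order traversal: channels created by later cuts (the stems of
    T-junctions) are routed before the channel of the cut they end on."""
    order: List[str] = []
    for child in node.children:
        order.extend(routing_order(child))
    if node.children:
        order.append(node.label)
    return order


def block(name: str) -> SliceNode:
    return SliceNode(name)


# Hypothetical chip sliced by cuts 1..4, each splitting the previous piece.
cut4 = SliceNode("4", [block("D"), block("E")])
cut3 = SliceNode("3", [cut4, block("C")])
cut2 = SliceNode("2", [block("B"), cut3])
cut1 = SliceNode("1", [cut2, block("A")])
print(routing_order(cut1))
```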
Cyclic constraint and non-slicing floorplan:

A non-slicing floorplan is one that cannot be cut into pieces without chopping a circuit block in two. In such a floorplan, we cannot route any of the channels involved until the others have been routed. This is called a cyclic constraint.
One way to remove the cyclic constraint is to move blocks, but this can make the floorplan area-inefficient and routing difficult. Otherwise, we may have to use an area-based router, L-shaped channels, or switch boxes (fixed connectors) for routing.

Channel definition and ordering. (a) We can eliminate the cyclic constraint by merging the blocks A and C. (b) A slicing structure.
We can also merge circuit blocks, since it is more efficient to route a row-based block by flattening it than to route between the blocks. This gives a slicing floorplan. Fig. (b) shows the channel definition and routing order for our chip.
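Channel ordering is a topological sort of "stem before top" constraints, and a cyclic constraint is exactly a cycle in that graph. This sketch uses Kahn's algorithm on hypothetical channel names: a four-channel "pinwheel" of the kind found in non-slicing floorplans has no valid order, while an acyclic set of constraints (as after merging blocks) does.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


def channel_order(channels: List[str],
                  stem_before_top: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Order channels so every T-junction stem is routed before its top.

    Uses Kahn's topological sort; returns None when the constraints form a
    cycle (a cyclic constraint, as in a non-slicing floorplan).
    """
    indegree: Dict[str, int] = {c: 0 for c in channels}
    successors = defaultdict(list)
    for stem, top in stem_before_top:
        successors[stem].append(top)
        indegree[top] += 1
    ready = deque(sorted(c for c in channels if indegree[c] == 0))
    order: List[str] = []
    while ready:
        ch = ready.popleft()
        order.append(ch)
        for nxt in successors[ch]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order if len(order) == len(channels) else None


# Hypothetical "pinwheel" of four channels, each the stem of the next: a cycle.
print(channel_order(["c1", "c2", "c3", "c4"],
                    [("c1", "c2"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1")]))
# After merging blocks, the remaining constraints are acyclic.
print(channel_order(["c1", "c2", "c3"], [("c3", "c2"), ("c2", "c1")]))
```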
Tuesday, April 20, 2010
Goal and objectives of the Floor plan and its Evaluation
The goals of floor planning are to:
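- define the core area and aspect ratio,
- arrange the blocks on the chip,
- decide the locations of the I/O pads,
- decide the locations and number of the power pads,
- decide the type of power distribution.
Floorplanning control parameters such as aspect ratio and core utilization are defined as follows:
Aspect ratio: $\text{Aspect ratio} = W/L$, where W and L are the width and length of the core.
Core utilization: the percentage of the core area used by placed standard cells and macros,
$$\text{Core utilization} = \frac{\text{standard cell area} + \text{macro area}}{\text{core area}} \times 100\%$$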
The floorplan also determines the die size:
- Core limited: the core logic determines the die size.
- Pad limited: the number of I/O pads determines the die size, because the pads need more area than the core cells.
We need to control the aspect ratio of the floorplan because the chip has to fit into the die cavity (a fixed-size hole, usually square) inside a package.
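These parameters can be tied together in a rough sizing calculation. This sketch derives core width and length from a target utilization and W/L, then makes a crude square-die check of core-limited versus pad-limited. The pad pitch, pad height (depth of the pad ring, also assumed to be the corner-cell size) and all areas are hypothetical, and core-to-pad spacing is ignored.

```python
import math
from typing import Tuple


def core_dimensions(cell_area: float, macro_area: float,
                    utilization: float, aspect_ratio: float) -> Tuple[float, float]:
    """Core width W and length L giving the target utilization and W/L."""
    core_area = (cell_area + macro_area) / utilization
    width = math.sqrt(core_area * aspect_ratio)
    return width, core_area / width


def core_utilization(cell_area: float, macro_area: float, width: float, length: float) -> float:
    return (cell_area + macro_area) / (width * length)


def die_limit(core_side: float, n_pads: int, pad_pitch: float, pad_height: float) -> str:
    """Rough square-die check (ignores core-to-pad spacing): the die side
    needed by the core vs the side needed to fit the pads around the ring."""
    side_from_core = core_side + 2 * pad_height
    side_from_pads = math.ceil(n_pads / 4) * pad_pitch + 2 * pad_height
    return "core limited" if side_from_core >= side_from_pads else "pad limited"


# Hypothetical numbers in um and um^2.
w, l = core_dimensions(cell_area=800_000, macro_area=200_000, utilization=0.7, aspect_ratio=1.0)
print(round(w, 1), round(l, 1), round(core_utilization(800_000, 200_000, w, l), 3))
print(die_limit(w, n_pads=200, pad_pitch=60, pad_height=120))
print(die_limit(w, n_pads=60, pad_pitch=60, pad_height=120))
```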
Evaluation of Floorplan:
- Run a throwaway (trial) placement.
- Estimate global route congestion.
Monday, April 19, 2010
I/O Pad Placement
There are three types of I/O pads:
- Power
- Ground
- Signal
Electromigration:
Electromigration is the movement of metal atoms from one area to another, caused by excessive current flow, in the direction of electron flow. It can result in shorts between wires, hillocks, and high metal resistance, causing ASIC failure.
The number of ground pads can be determined by
$$N_{gnd} = \frac{I_{total}}{I_{max}}$$
where $N_{gnd}$ is the number of ground pads, $I_{total}$ is the total ASIC current, and $I_{max}$ is the maximum EM-limited current per ground pad, in amperes.
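Since pads come in whole numbers, the ratio has to be rounded up. This sketch assumes the current is shared evenly across ground pads and takes currents as integer milliamps so that the ceiling division is exact (floating-point division of decimal values can land slightly above an integer and be rounded up wrongly). The example currents are hypothetical.

```python
def ground_pads(i_total_ma: int, i_max_ma: int) -> int:
    """Minimum ground pads so no pad exceeds its EM current limit.

    Currents are integer milliamps so the ceiling division is exact.
    Assumes the total current is shared evenly across the ground pads.
    """
    if i_total_ma < 0 or i_max_ma <= 0:
        raise ValueError("currents must be positive")
    return -(-i_total_ma // i_max_ma)   # ceil(I_total / I_max)


# Hypothetical: 1.85 A total, 120 mA allowed per ground pad.
print(ground_pads(1850, 120))
```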


Switching noise:
Switching noise is generated when ASIC outputs make transitions from one state to another.
Insufficient ground and power pads may lead to data errors caused by switching noise.
1. Capacitive coupling (dv/dt)
This is the disturbance caused on an adjacent package pin when switching transients inject pulses through parasitic capacitive coupling.
It can be reduced by the following:
1. Isolate sensitive clock input pins from the switching signal pads.
2. Group bidirectional pads together so that they are all in either input or output mode.
2. Inductive coupling (L di/dt)
Simultaneous switching of the ASIC outputs induces rapid current changes in the power and ground buses. The inductance of the power and ground pins causes voltage fluctuations in the internal ASIC power and ground levels.
These rapid current changes may induce logic errors or cause noise spikes on non-switching output pads that affect signals connected to other systems.
The maximum L(di/dt) occurs when an ASIC output makes a transition to another voltage level and the absolute current increases from zero through a wire of inductance L. Factors such as process, ambient temperature, voltage, location of the output pads and the number of simultaneously switching output pads determine the magnitude of inductive switching noise.
To control inductive switching noise, enough power and ground pads must be assigned and placed correctly. This limits the noise magnitude and prevents the inputs of the ASIC design from interpreting the noise as a valid logic level.
Inductive switching noise can be successfully reduced by the following:
- Reduce the number of outputs that switch simultaneously by dividing them into groups, each with delay buffers inserted into its data path so that the groups switch at different times.
- Reduce the effective power and ground pin inductance by assigning as many power and ground pads as possible.
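The effect of both remedies can be seen in a first-order estimate $V = L_{eff}\,di/dt$. The sketch assumes each switching output ramps its current linearly within the edge time, that the return current splits evenly over identical ground pins in parallel (mutual inductance ignored), and that the inserted delay buffers skew the groups by more than one edge time. All values are hypothetical.

```python
import math


def ground_bounce(n_switching: int, delta_i: float, rise_time: float,
                  l_pin: float, n_gnd_pads: int) -> float:
    """Estimate V = L_eff * di/dt.

    Simplifying assumptions: every switching output ramps delta_i amps
    linearly in rise_time seconds, and the return current splits evenly
    over n_gnd_pads identical pins in parallel (mutual inductance ignored).
    """
    di_dt = n_switching * delta_i / rise_time
    l_eff = l_pin / n_gnd_pads
    return l_eff * di_dt


def staggered(n_outputs: int, n_groups: int) -> int:
    """Outputs switching at once when groups are skewed by delay buffers."""
    return math.ceil(n_outputs / n_groups)


# Hypothetical: 32 outputs, 10 mA each, 1 ns edges, 5 nH per ground pin.
base = ground_bounce(32, 10e-3, 1e-9, 5e-9, n_gnd_pads=4)
more_pads = ground_bounce(32, 10e-3, 1e-9, 5e-9, n_gnd_pads=8)
grouped = ground_bounce(staggered(32, 4), 10e-3, 1e-9, 5e-9, n_gnd_pads=4)
print(f"{base:.2f} V, {more_pads:.2f} V, {grouped:.2f} V")
```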
Sunday, April 18, 2010
Macro Placement
Macro placement takes place after I/O placement and can be done manually or automatically.
The quality of macro placement can be measured physically in terms of:
1.wire length
2.data flow
3.port accessibility
4.timing
Macro placement should not lead to a segmented floorplan. In a segmented floorplan, the standard cell area is not contiguous, and there are excessive interconnections between the standard cells located at the bottom of the die and those located at the top, which increases wire length.
To avoid segmentation, macros should be placed so that the standard cell area is contiguous; for example, macros can be kept along the edges of the ASIC core area.

Segmented Floor Plan
Wire length can also increase because of macro orientation and pin locations, since the nets being connected may end up with different lengths. Macros should therefore be placed so that their ports face the core area or the standard cells, and their orientation should match the available routing layers.

Floor plan with Macros Facing Standard Cells Region.
The placement of macros relative to the standard cells, and macro port accessibility, affect the chip's final routing. The global router gives a statistical and graphical report of the routing congestion analysis.
Routing congestion is caused by insufficient space for routing channels between macros for I/O and macro connections, by areas where routing is prohibited, and by standard cell trap pockets at the edges of the macros or in the corners of the floorplan.
Standard cell trap pockets are long, thin channels between macros. If many cells are placed in such a channel, they may cause routing congestion. Therefore, these channels should be kept free of most standard cells and reserved for repeater or buffer insertion.

Floor plan with Standard Cells Trap Pocket
Blockage layers:
Most physical designs require keep-out or buffer-only regions, defined by blockage layers, which prevent the placer from moving standard cells into those regions.
Blockage layers help avoid routing congestion. If a macro blocks routing layers, wires try to detour around the macro corners, causing congestion there. Blockages can be placed at the corners of the macros to reserve resources for the router. They are placed over pre-placed macros, covering the power and ground rings.
Naturally, wires that pass through keep-out regions tend to be long. By allowing buffer insertion in those areas with a buffer-only region (or blockage), the placer can break up these long nets with buffers and thus avoid the long transition times associated with them.

Fly lines:
After the macros and standard cells are placed, connectivity analysis is performed using fly lines, which show the connections between I/O pads, macros and standard cells. By examining the fly lines, one can identify whether moving or rotating macros would reduce wire length and improve routability during the floorplanning stage.
Saturday, April 17, 2010
Low power design techniques
Why low power in today's chip design?
Dynamic power reduction techniques:
Power optimization techniques can be introduced at the RTL level.
These include clock gating, FSM encoding, bus encoding and deglitching.
Clock gating:
A design contains many flip-flops, so the clock is continuously toggling at their clock inputs. This switching activity of the clock contributes to dynamic power dissipation.
Gating the clock before it reaches the flip-flops reduces power dissipation compared with an ungated clock, because registers that do not need to update stop receiving clock transitions.
The power saving increases with the number of registers that are gated.
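The saving can be estimated with the dynamic power relation $P = C V_{DD}^2 f$ applied to the register clock pins. This sketch rests on several assumptions: gated registers receive clock edges only a fixed fraction of the time, each clock-gating cell drives a fixed number of registers and still sees the free-running clock on its own input capacitance, and the clock tree buffers are not modelled. All values are hypothetical.

```python
import math


def clock_power(n_regs: int, c_clk_pin: float, vdd: float, freq: float,
                enable_fraction: float = 1.0, regs_per_icg: int = 0,
                c_icg: float = 0.0) -> float:
    """Dynamic power of register clock pins, P = C * VDD^2 * f.

    enable_fraction: assumed fraction of cycles in which gated registers
    actually receive a clock edge. regs_per_icg: registers per clock-gating
    cell (0 means no gating cells); each gating cell still sees the
    free-running clock on its input capacitance c_icg.
    """
    n_icg = math.ceil(n_regs / regs_per_icg) if regs_per_icg else 0
    switched_c = n_regs * c_clk_pin * enable_fraction + n_icg * c_icg
    return switched_c * vdd ** 2 * freq


# Hypothetical: 2 fF clock pin, 1.0 V, 500 MHz, registers enabled 25% of the time.
for n in (32, 1000):
    ungated = clock_power(n, 2e-15, 1.0, 500e6)
    gated = clock_power(n, 2e-15, 1.0, 500e6, enable_fraction=0.25,
                        regs_per_icg=32, c_icg=5e-15)
    print(n, f"saving = {(ungated - gated) * 1e6:.1f} uW")
```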
FSM encoding:
Power can be reduced at the algorithmic level by choosing a proper encoding scheme for FSM state assignment. The saving depends on the transitions that take place when going from one state to the next.
For example, going from state 3 to state 4 requires 3 bit transitions in binary encoding (011 to 100), while Gray code requires only one (010 to 110).
Less power is consumed when the FSM states use a Gray-code encoding scheme, provided most transitions are between consecutive states.
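The transition counts can be verified by XOR-ing consecutive state codes and counting the set bits. This sketch uses the standard reflected binary Gray code and assumes, for the second comparison, a counter-like FSM that cycles through states 0 to 7 in order; the helper names are illustrative.

```python
def gray(n: int) -> int:
    """Reflected binary Gray code of n."""
    return n ^ (n >> 1)


def toggles(a: int, b: int) -> int:
    """Number of state bits that flip between encodings a and b."""
    return bin(a ^ b).count("1")


def sequence_toggles(states, encode=lambda s: s) -> int:
    """Total bit flips for a cyclic walk through the given states."""
    codes = [encode(s) for s in states]
    return sum(toggles(codes[i], codes[(i + 1) % len(codes)]) for i in range(len(codes)))


print(toggles(3, 4), toggles(gray(3), gray(4)))                       # single 3 -> 4 step
print(sequence_toggles(range(8)), sequence_toggles(range(8), gray))  # 3-bit counter
```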
Bus encoding:
Gray coding is also useful for power reduction in SoC bus interconnects.
Dynamic power in this case depends on the width of the bus and the load capacitance, so bus segmentation naturally helps reduce power by reducing the capacitance of each bus segment.
In bus-invert (BI) coding, the encoder decides whether to send the actual data or its complement, depending on the number of transitions it would cause on the bus interconnect. This depends on the Hamming distance between the present bus value and the next data value. If the Hamming distance is greater than half the bus width, the next bus value is the complement of the next data; if it is less than half the bus width, the actual data is sent. An extra invert line tells the receiver whether the data was complemented.
Deglitching:
Power dissipation occurs due to the switching activity in cmos logic gates.Power can be saved significantly if the unnecessary switching can be avoided.Glitches are the unwanted momentary transitions that occur due to the delay in logic gates.
Glitches add to the number of transitions , they should be avoided.
The arithmetic operations are prone to produce glitches, if all the multiplies and adders are sequential and latches are not provided to hold their values until they become stable.Latches
hold the previous input value at each level thus avoiding glitches .
Another low power design technique is to replace flip flops with latches wherever possible.
Both latches and flip flops are building blocks of sequentialcircuits and their outputs depend on the current inputs as well as previous inputs and outputs.
The difference between latches and flip flop is that latches are level sensitive and flip flops are edge sensitive.
In D-latch, output Q obtain the value of the D-input at the specified level of the clock signal,it responds to the changes in the input as long as clock signal is asserted.
In D-flip-flop,output responds to changes in the input at the specified edges of the clock signal, thus preventing the output to respond to changes in the input after the rising or falling edge .The output of the flip flop remains constant even if the input changes after the rising or falling edge.
Multi-Threshold -
This techniques uses both low Vt and high Vt cells.Low -vt cells can be used in critical path while high -Vt cells off the critical path.This improves performance without increase in power.
The flip side of this technique that :
multi -vt cells causes fabrication complexity and increases design time.Improper optimization of the design may utilize more Low Vt cells and hence could end up with increased power.
Multi-Vdd (Voltage)
Power supply is directly proportional to dynamic power.Reducing voltage reduces dynamic power .But lower threshold voltage causes delay in the logic gates.Higher voltage can be applied to timing critical path and rest of the chip runs in lower voltage.Different blocks have different voltages which can be integrated in SOC. This increases power planning complexity in terms of laying down the power rails and power grid structure. Level shifters are necessary to interface between different blocks.
Power Gating:
Power gating is where the circuit blocks not in operation are temporarily turned off going to low power mode.And when they are required in operation, turned on to active mode.The goal of power gating is to reduce leakage power by temporarily turning off the circuit blocks and switching the two modes in a suitable manner so to reduce its impact on performance and maximize power performance.
It increases time delay as the gated modes should be safely entered and exited.
Dynamic power reduction techniques:
Power optimization techniques can be introduced at the RTL level.
This includes:
- Clock gating
- FSM encoding
- Avoiding glitches (deglitching)
- Bus encoding
Clock gating:
A design has many flip-flops, so the clock inputs of these flip-flops toggle continuously. This switching activity of the clock contributes to dynamic power dissipation.
Gating the clock before it reaches the flip-flops stops this toggling whenever the registers do not need to update, which reduces power dissipation compared with an ungated clock.
The power saving grows with the number of registers that are gated.
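A minimal behavioral sketch of the idea, assuming the common latch-based integrated clock-gating (ICG) structure, in which the enable is latched while the clock is low and then ANDed with the clock so the gated clock cannot glitch. The helpers `icg` and `rising_edges` and the register count are hypothetical, and counting clock-pin edges is only a proxy for clock switching power.

```python
def icg(clk, en):
    """Latch-based clock gate: the enable latch is transparent while clk is low,
    so the AND output cannot glitch while clk is high."""
    latched_en = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:
            latched_en = e  # latch follows the enable only during the low phase
        gclk.append(c & latched_en)
    return gclk


def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


clk = [0, 1] * 8                  # 8 clock cycles, two samples per cycle
en = [1] * 4 + [0] * 8 + [1] * 4  # registers idle for the middle 4 cycles
gclk = icg(clk, en)

n_regs = 32  # hypothetical number of registers behind this gate
print("ungated clock-pin edges:", n_regs * rising_edges(clk))
print("gated clock-pin edges:  ", n_regs * rising_edges(gclk))
```

With these inputs the gated clock delivers half as many edges (128 versus 256), and the absolute saving scales with `n_regs`, matching the point that more gated registers means more saving.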
FSM encoding:
Power reduction can be done at the algorithmic level by choosing a proper encoding scheme for FSM state assignment. The saving depends on how many state bits toggle when the FSM goes from one state to the next.
For example, going from state 3 to state 4 takes three bit transitions in binary encoding (011 → 100) but only one in Gray code (010 → 110).
Less power is therefore consumed when the FSM states use Gray-code encoding, provided the FSM mostly steps between consecutive states.
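A small sketch that counts state-register toggles, assuming a hypothetical FSM that simply steps through states 0 to 7 and wraps around (a counter-like FSM, where Gray code helps most). `to_gray` uses the standard binary-reflected Gray code; the other helper names are illustrative.

```python
def to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def hamming(a: int, b: int) -> int:
    """Number of bit positions that differ, i.e. state flip-flops that toggle."""
    return bin(a ^ b).count("1")


def total_toggles(states, encode):
    """Sum of state-register bit toggles over a cyclic state sequence."""
    codes = [encode(s) for s in states]
    return sum(hamming(a, b) for a, b in zip(codes, codes[1:] + codes[:1]))


counting_fsm = list(range(8))  # hypothetical FSM: 0 -> 1 -> ... -> 7 -> 0
print("3 -> 4 binary:", hamming(3, 4))
print("3 -> 4 gray:  ", hamming(to_gray(3), to_gray(4)))
print("binary toggles per loop:", total_toggles(counting_fsm, lambda s: s))
print("gray toggles per loop:  ", total_toggles(counting_fsm, to_gray))
```

For this sequence binary encoding toggles 14 bits per loop against 8 for Gray code. For an FSM with arbitrary transitions the same `total_toggles` idea can be used to compare encodings, weighting the most frequent transitions.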
Bus encoding:
Gray coding is also useful for power reduction in SoC bus interconnects.
Dynamic power here depends on the width of the bus and its load capacitance, so bus segmentation naturally helps reduce power by reducing the capacitance of each bus segment.
Bus-invert (BI) coding decides whether to send the actual data or its complement, based on the number of transitions each would cause on the bus interconnect. This depends on the Hamming distance between the present bus value and the next data word. If the Hamming distance is greater than half the bus width, the complement of the next data word is sent; otherwise the actual data is sent. An extra invert line tells the receiver whether the received word has to be complemented.
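A sketch of bus-invert encoding and decoding on an 8-bit bus, assuming the bus starts at all zeros and counting toggles on both the data lines and the extra invert line. The data pattern and the function names are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def bi_encode(words, width):
    """Bus-invert coding: returns (bus_value, invert_bit) for each data word."""
    mask = (1 << width) - 1
    bus = 0  # assume the bus starts at all zeros
    out = []
    for word in words:
        if hamming(bus, word) > width / 2:
            bus, inv = ~word & mask, 1  # more than half the lines would toggle: send complement
        else:
            bus, inv = word, 0          # send the data as-is
        out.append((bus, inv))
    return out


def bi_decode(encoded, width):
    mask = (1 << width) - 1
    return [bus ^ mask if inv else bus for bus, inv in encoded]


def transitions(values):
    """Total toggles on a group of lines, starting from all zeros."""
    return sum(hamming(a, b) for a, b in zip([0] + values, values))


words = [0x00, 0xFF, 0x00, 0xFF, 0x0F]  # hypothetical worst-case-like traffic
enc = bi_encode(words, 8)
raw = transitions(words)
coded = transitions([b for b, _ in enc]) + transitions([i for _, i in enc])
print("unencoded toggles:", raw, " bus-invert toggles:", coded)
```

Here the unencoded bus toggles 28 times and the BI-coded bus (data plus invert line) only 8 times, and `bi_decode` recovers the original words.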
Deglitching:
Power dissipation occurs due to switching activity in CMOS logic gates, so power can be saved significantly if unnecessary switching is avoided. Glitches are unwanted momentary transitions caused by unequal delays through logic gates.
Because glitches add to the number of transitions, they should be avoided.
Arithmetic circuits are prone to glitches when multipliers and adders are cascaded and no latches are provided to hold their values until they become stable. Latches hold the previous input value at each level, thus avoiding glitches.
Another low-power design technique is to replace flip-flops with latches wherever possible.
Both latches and flip-flops are building blocks of sequential circuits: their outputs depend on the current inputs as well as on previous inputs and outputs.
The difference is that latches are level-sensitive while flip-flops are edge-sensitive.
In a D-latch, output Q takes the value of the D input while the clock is at its active level; it follows changes in the input as long as the clock is asserted.
In a D flip-flop, the output changes only at the active (rising or falling) edge of the clock. Changes in the input after that edge do not affect the output, which stays constant until the next active edge.
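A behavioral sketch contrasting the two elements on sampled waveforms, assuming an active-high latch and a rising-edge flip-flop, both starting with Q = 0. The function names and the stimulus are illustrative only.

```python
def d_latch(clk, d):
    """Level-sensitive: Q follows D for as long as clk is high."""
    q, out = 0, []
    for c, x in zip(clk, d):
        if c:  # transparent while the clock is asserted
            q = x
        out.append(q)
    return out


def d_flip_flop(clk, d):
    """Edge-sensitive: Q samples D only on a 0 -> 1 clock transition."""
    q, prev_c, out = 0, 0, []
    for c, x in zip(clk, d):
        if c and not prev_c:  # rising edge
            q = x
        out.append(q)
        prev_c = c
    return out


clk = [0, 1, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 1, 0, 1, 0, 1]
print("latch:", d_latch(clk, d))
print("flop: ", d_flip_flop(clk, d))
```

While the clock is high the latch output tracks every change of D, whereas the flip-flop output holds the value captured at each rising edge.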
Multi-Threshold (multi-Vt):
This technique uses both low-Vt and high-Vt cells. Low-Vt cells are faster but leak more, so they are used on critical paths, while high-Vt cells, which are slower but leak less, are used off the critical path. This improves performance without a significant increase in leakage power.
The flip side of this technique is that multi-Vt cells add fabrication complexity and increase design time. Improper optimization of the design may use too many low-Vt cells and end up with increased power.
Multi-Vdd (Voltage):
Dynamic power is proportional to the square of the supply voltage, so reducing the voltage reduces dynamic power significantly. However, a lower supply voltage increases the delay of the logic gates. A higher voltage can therefore be applied to timing-critical blocks while the rest of the chip runs at a lower voltage. Blocks with different voltages can be integrated in an SoC, but this increases power-planning complexity in laying down the power rails and power grid structure. Level shifters are necessary to interface between blocks at different voltages.
Power Gating:
In power gating, circuit blocks that are not in operation are temporarily turned off (low-power mode) and turned back on (active mode) when they are needed. The goal is to reduce leakage power by switching off idle blocks, while moving between the two modes in a way that minimizes the impact on performance.
Power gating adds delay, because the gated modes have to be entered and exited safely.
Monday, April 12, 2010
Power dissipation in CMOS
Power dissipation in CMOS comes from two components:
Static dissipation due to:
1. Subthreshold conduction while the transistor is OFF.
2. Tunneling current through the gate oxide.
3. Leakage current through reverse-biased diodes.
Dynamic dissipation due to:
1. Charging and discharging of input and load capacitances.
2. Short-circuit current while both the PMOS and NMOS networks are ON.
Ptotal = Pstatic + Pdynamic
Let us look at each of these in turn:
Static power dissipation
Subthreshold Leakage:
The ideal V-I characteristic of a transistor shows drain current Id flowing only when the gate-to-source voltage Vgs > Vt. In reality, when Vgs < Vt the transistor does not turn completely OFF and some leakage current still flows. With Vgs = 0 it scales approximately as
$I_{sub} \propto e^{-qV_t/(nkT)}$
where n is the subthreshold slope factor (n = 1 in the ideal case).
This current is due to carrier diffusion from source to drain in weak inversion.
Static power dissipation therefore depends strongly on temperature: as the chip heats up, subthreshold leakage rises roughly exponentially.
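A numerical sketch of the exponential term above, assuming a fixed Vt and an assumed slope factor n = 1.5. It reports only ratios, since the prefactor of the real subthreshold current is left out. The threshold values and temperatures are hypothetical examples.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def relative_subthreshold_leakage(vt, temp_k, n=1.5):
    """exp(-q*Vt / (n*k*T)) at Vgs = 0, in arbitrary units.
    n is the subthreshold slope factor; 1.5 is an assumed typical value."""
    return math.exp(-Q_E * vt / (n * K_B * temp_k))


room, hot = 298.15, 373.15  # 25 C and 100 C
temp_ratio = relative_subthreshold_leakage(0.4, hot) / relative_subthreshold_leakage(0.4, room)
vt_ratio = relative_subthreshold_leakage(0.3, room) / relative_subthreshold_leakage(0.4, room)
print(f"25 C -> 100 C at Vt = 0.4 V: x{temp_ratio:.1f}")
print(f"Vt 0.4 V -> 0.3 V at 25 C:   x{vt_ratio:.1f}")
```

Under these assumptions leakage grows about 8 times from 25 C to 100 C, and lowering Vt by 100 mV costs about 13 times more leakage, which is the trade-off behind multi-Vt cells. The model ignores the temperature dependence of the prefactor and of Vt itself (Vt drops as temperature rises), so real silicon can be worse.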
Gate oxide tunneling:
When a high electric field is applied across a thin gate oxide (less than about 3-4 nm thick), electrons may tunnel into the conduction band of the oxide or directly through it, which results in leakage. This leakage current depends exponentially on the oxide thickness and on Vdd: it rises sharply as the oxide gets thinner or Vdd gets higher.
There is a finite probability that carriers tunnel through the gate oxide, causing a gate leakage current to flow into the gate.
Junction Leakage:
Many parasitic diodes are formed in the chip, for example the p-n junctions between diffusion and substrate or between diffusion and well. To keep these diodes reverse-biased, the substrate is connected to GND and the n-well to VDD. Even so, reverse-biased diodes conduct a small amount of current:
$I_{reverse} = A \cdot J_s \cdot \left(e^{qV_{bias}/kT} - 1\right)$
Vbias --> bias voltage across the junction (negative for reverse bias)
Js --> reverse saturation current density
A --> junction area
Junction leakage is caused by diffusion and drift of minority carriers at the edges of the depletion region and by generation of electron-hole pairs in the depletion region of reverse-biased junctions.
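A sketch that evaluates the diode equation above for a few reverse-bias values. The junction area and saturation current density are hypothetical round numbers chosen only to show the shape of the curve, not values for any real process.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C


def junction_current(area_m2, js_a_per_m2, v_bias, temp_k=300.0):
    """Diode equation I = A*Js*(exp(q*Vbias/kT) - 1); Vbias < 0 is reverse bias."""
    v_thermal = K_B * temp_k / Q_E  # about 25.9 mV at 300 K
    return area_m2 * js_a_per_m2 * (math.exp(v_bias / v_thermal) - 1.0)


area = 1e-12  # hypothetical 1 um^2 junction
js = 1e-3     # hypothetical saturation current density, A/m^2
for v in (-0.1, -0.5, -1.0):
    print(f"Vbias = {v:+.1f} V -> I = {junction_current(area, js, v):.3e} A")
```

Once the reverse bias exceeds a few kT/q the exponential term vanishes and the current saturates at about -A*Js, which is why junction leakage is set mainly by junction area and Js rather than by the exact bias.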
Dynamic Power Dissipation
Dynamic power dissipation occurs when signals flowing through a CMOS logic circuit change its logic state. Power is drawn from the power supply to charge the output node capacitance.
The output node capacitance consists of the following:
1. Output node capacitance of the logic gate: this is due to the drain diffusion region.
2. Total interconnect capacitance.
3. Input node capacitance of the driven gate: this is due to the gate oxide capacitance.
Consider a CMOS inverter driving a load capacitance CL. When the output rises, the load is charged from the power supply through the PMOS transistor; when the output falls, it is discharged through the NMOS transistor.
During charge-up the supply delivers energy $C_L V_{DD}^2$, but only half of it is stored in the capacitor:
$E_{stored} = \frac{1}{2} C_L V_{DD}^2$
The other half is dissipated as heat in the PMOS transistor. This dissipation is independent of the size of the PMOS.
During the discharge phase, charge is removed from the capacitor and its stored energy is dissipated as heat in the NMOS transistor.
Each full switching cycle (one charge and one discharge) therefore takes a fixed amount of energy:
$E_{cycle} = C_L V_{DD}^2$
If a gate is switched on and off $f_n$ times per second, then
$P_{dynamic} = C_L V_{DD}^2 f_n$
The following steps can be taken to reduce dynamic power:
1) Reduce the power supply voltage Vdd
2) Reduce the voltage swing on all nodes
3) Reduce the switching probability (transition factor)
4) Reduce the load capacitance
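A worked sketch of the dynamic power formula, assuming the switching rate is written as f_n = alpha * f_clk, where alpha is the transition factor from item 3 above. The load, clock frequency and alpha are hypothetical values.

```python
def switching_energy(c_load, vdd):
    """Energy drawn from the supply per full charge/discharge cycle: C * Vdd^2."""
    return c_load * vdd ** 2


def dynamic_power(c_load, vdd, f_n):
    """P = C * Vdd^2 * f_n, where f_n is the number of full switching cycles per second."""
    return switching_energy(c_load, vdd) * f_n


c_l = 10e-15   # hypothetical 10 fF load
f_clk = 500e6  # hypothetical 500 MHz clock
alpha = 0.2    # hypothetical transition factor: the gate switches in 20% of cycles
p_nom = dynamic_power(c_l, 1.0, alpha * f_clk)
p_low = dynamic_power(c_l, 0.8, alpha * f_clk)
print(f"stored in C_L per charge: {0.5 * c_l * 1.0 ** 2:.2e} J")
print(f"P at 1.0 V: {p_nom:.2e} W, at 0.8 V: {p_low:.2e} W (ratio {p_low / p_nom:.2f})")
```

The gate dissipates 1 uW at 1.0 V, and lowering Vdd to 0.8 V cuts that to 64 percent because of the squared dependence, a larger lever than a proportional reduction in C or alpha.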
Short Circuit Power:
The finite rise and fall times of the input to a CMOS logic gate create a direct current path from VDD to GND for a short duration during switching.
During switching, both the NMOS and PMOS are turned ON simultaneously whenever the input voltage satisfies $V_{Tn} < V_{in} < V_{DD} - |V_{Tp}|$, where VTn and VTp are the NMOS and PMOS thresholds. In this range there is a conductive path between Vdd and GND.
For a rising input, the NMOS starts conducting once the input exceeds VTn, and the PMOS stays ON until the input reaches Vdd - |VTp|, so for some time both transistors are ON. A similar event causes short-circuit current to flow when the input is falling. The short-circuit current stops when the transition is complete.
Short-circuit current depends directly on the input rise and fall times: faster transitions reduce the short-circuit component, but the impact on propagation delay must also be considered.
When the input rise and fall times are greater than the output rise and fall times, the short-circuit path stays open for longer, so it is desirable to have equal input and output rise and fall times.
Also, if Vdd is less than VTn + |VTp|, short-circuit current is eliminated, since there is no input voltage at which both transistors can be ON.
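A sketch of the overlap condition above, assuming an idealized linear input ramp from 0 to Vdd. It computes how long the input stays inside the window where both devices conduct. The threshold and ramp values are hypothetical.

```python
def both_on_time(vdd, vtn, vtp_abs, t_ramp):
    """Time during which VTn < Vin < Vdd - |VTp| for a linear 0 -> Vdd input ramp."""
    window = (vdd - vtp_abs) - vtn  # input-voltage range where both devices conduct
    if window <= 0:
        return 0.0  # Vdd < VTn + |VTp|: no overlap, so no short-circuit path
    return t_ramp * window / vdd


for vdd, t_r in [(1.0, 100e-12), (1.0, 50e-12), (0.5, 100e-12)]:
    t_on = both_on_time(vdd, 0.3, 0.3, t_r)
    print(f"Vdd={vdd} V, t_rise={t_r * 1e12:.0f} ps -> both ON for {t_on * 1e12:.1f} ps")
```

The conduction window scales linearly with the input transition time (40 ps for a 100 ps ramp, 20 ps for 50 ps) and disappears entirely once Vdd drops below VTn + |VTp|.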
Saturday, April 10, 2010
Signal integrity issues and solutions
Signal integrity is the quality of a signal on a line. Ignoring signal-integrity issues can cause logic or timing problems, leading to degraded performance or even component failure.
The following issues have to be considered during power planning:
1.Crosstalk
2.IR drop
Crosstalk is an unwanted signal induced on a victim line by a nearby (aggressor) line. It is caused by the coupling capacitance between the lines, which is a function of the separation between them and of the dielectric constant of the material between them.
Crosstalk effects are a function of:
the rise and fall edges on the aggressor,
the distance between the lines,
and the presence of signal reference planes.
Wires have capacitance to their adjacent neighbors and to ground.
1. If wire A switches, it drags wire B along with it due to capacitive coupling.
2. If wire A switches and wire B is supposed to switch at the same time, the coupling can change B's switching delay.
3. If B is not supposed to switch, crosstalk introduces noise on B.
The crosstalk effect is larger on long wires, but it is negligible on short wires driving a large load, since the load capacitance dominates.
Crosstalk delay effects:
The charge delivered to the coupling capacitor is $Q = C_{adj} \cdot \Delta V$, where ΔV is the change in voltage between A and B.
If A switches and B does not, ΔV = Vdd, and the total capacitance seen by A is its coupling capacitance to B plus its capacitance to ground.
If A and B switch in the same direction, ΔV is zero, so Cadj effectively disappears.
If A and B switch in opposite directions, ΔV = 2Vdd. Twice as much charge is required, so the coupling capacitance behaves as if it were twice as large, switching through Vdd.
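A sketch of the three cases above as a switching (Miller) factor k, so that the aggressor effectively drives Cgnd + k·Cadj. The wire capacitances are hypothetical, and directions are encoded as +1 (rising), -1 (falling) and 0 (quiet).

```python
def miller_factor(aggressor, victim):
    """Voltage change across Cadj in units of Vdd.
    aggressor/victim: +1 rising, -1 falling, 0 quiet (the aggressor must switch)."""
    if aggressor == 0:
        raise ValueError("aggressor must switch")
    return (aggressor - victim) / aggressor


def effective_cap(c_gnd, c_adj, aggressor, victim):
    """Capacitance the aggressor driver effectively sees: Cgnd + k * Cadj."""
    return c_gnd + miller_factor(aggressor, victim) * c_adj


c_gnd, c_adj = 10e-15, 5e-15  # hypothetical wire capacitances
for name, victim in [("quiet", 0), ("same direction", +1), ("opposite", -1)]:
    c = effective_cap(c_gnd, c_adj, +1, victim)
    print(f"victim {name:15s}: k = {miller_factor(+1, victim):.0f}, C_eff = {c * 1e15:.0f} fF")
```

The aggressor sees 15 fF with a quiet victim, 10 fF when both switch together and 20 fF when they switch in opposite directions, which is where the extra crosstalk delay comes from.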
Crosstalk induces incremental delay on the victim line, so the delay due to crosstalk is calculated and the length of the driven net is adjusted so that the delay does not exceed a certain value.
In a sense, the router treats crosstalk as additional delay coming from the cross-coupling capacitance. When nets are close together and run in parallel for a long length, the cross-coupling capacitance is large. To avoid this, make sure routes do not run in parallel for long distances.
Crosstalk noise effects: If A switches and B is left floating, B also partially switches because of the noise introduced by A.
If B is driven, its driver opposes the coupling, so the noise introduced on B is only a small percentage of the supply voltage.
During the noise event the aggressor transistor is in saturation while the victim driver is in its linear region.
For equal-sized drivers, velocity saturation makes the aggressor's effective resistance two to four times larger than the victim's.
Crosstalk can introduce noise glitches on the victim net that exceed the switching threshold of the receiving logic element. This effect is checked based on drive strength and relative position, and the nets are shortened or moved apart to reduce crosstalk.
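A sketch of the noise cases above using a common first-order estimate: a floating victim sees charge sharing, Vdd·Cadj/(Cadj + Cgnd), and a driven victim sees that value reduced by 1/(1 + k), where k is the ratio of the aggressor and victim RC time constants. The capacitance and resistance values are hypothetical.

```python
def floating_victim_noise(vdd, c_adj, c_gnd_v):
    """Charge sharing onto a floating victim: dV = Vdd * Cadj / (Cadj + Cgnd_v)."""
    return vdd * c_adj / (c_adj + c_gnd_v)


def driven_victim_noise(vdd, c_adj, c_gnd_v, c_gnd_a, r_aggr, r_victim):
    """First-order estimate: floating-victim noise scaled by 1 / (1 + k),
    with k = tau_aggressor / tau_victim."""
    tau_a = r_aggr * (c_gnd_a + c_adj)
    tau_v = r_victim * (c_gnd_v + c_adj)
    k = tau_a / tau_v
    return floating_victim_noise(vdd, c_adj, c_gnd_v) / (1.0 + k)


vdd, c = 1.0, 10e-15  # hypothetical: Cadj equal to each wire's ground capacitance
print(f"floating victim: {floating_victim_noise(vdd, c, c):.3f} V")
for ratio in (2, 4):  # R_aggr / R_victim range quoted above
    noise = driven_victim_noise(vdd, c, c, c, ratio * 1e3, 1e3)
    print(f"driven victim, R_a/R_v = {ratio}: {noise:.3f} V")
```

With these values a floating victim picks up half the supply swing, while a driven victim with a two to four times weaker aggressor sees only about 0.17 V to 0.10 V.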
Solutions:
1. Increase the distance between the lines. This reduces the coupling capacitance and is the most straightforward solution.
2. Shield the victim from the aggressor by putting a low-impedance trace between the lines. This also provides a path for the return current from the signal line.
3. Reorder the wires. In the original configuration, lines l1, l2 and l3 are susceptible to crosstalk: the transition on l2 is delayed by the opposing transition on l1, while l2 induces a downward transition on l3.
By swapping l2 and l3, the induced transitions cancel on l3, and at the same time l2 moves farther from l1, making it immune to the effects of l1.
[Figure: original and reordered arrangement of lines l1, l2, l3]
4.Microstrip and stripline architectures are used on high-speed PC boards to reduce the crosstalk between traces. These techniques pair a signal trace with a solid reference plane above or below it. The reference plane can be any DC voltage since crosstalk is an AC effect, so usually the planes are ground or one of the power supplies.
The low impedance reference plane captures the return current of the signal trace. This current creates a magnetic field that opposes the field in the signal trace, and creates an overall field that is confined locally and falls off rapidly with distance. Microstrip architecture has one reference plane while stripline has two, one on each side of the signal trace.


Since clock lines are spread throughout the chip, they can be victims of many aggressor lines, so special care must be taken. Long clock nets should be identified and isolated either with spacing or with shielding.
IR drop:
IR drop occurs in both the power and ground networks because of the resistance of the metal layers.
Narrower metal lines have higher resistance and hence cause a larger IR drop.
The amount of voltage drop can be calculated as
$\Delta V_{drop} = I_{avg} \cdot R_{eff}$
Reff - effective resistance from the power pads to the logic gates.
Iavg - average current drawn by the switching logic gates from the power lines fed by the Vdd pad.

http://www.vlsitechnology.org/html/irdrop_1.html
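A sketch applying ΔV = I·R segment by segment along a single power rail fed from one pad, assuming equal segment resistances and equal cell currents tapped at each node (all values hypothetical). It shows why the drop is worst far from the pad and why widening the rail helps.

```python
def rail_voltages(vdd, r_seg, tap_currents):
    """Node voltages along a rail fed from one pad.
    Segment j carries the current of every tap at or beyond node j."""
    voltages, v = [], vdd
    for j in range(len(tap_currents)):
        v -= r_seg * sum(tap_currents[j:])  # I * R drop across segment j
        voltages.append(v)
    return voltages


vdd = 1.0
taps = [0.05] * 4  # hypothetical: four cell groups drawing 50 mA each
nodes = rail_voltages(vdd, r_seg=0.1, tap_currents=taps)
print(["%.3f" % v for v in nodes])
print(f"worst-case drop: {vdd - nodes[-1]:.3f} V")
halved = rail_voltages(vdd, r_seg=0.05, tap_currents=taps)  # rail twice as wide: half the R
print(f"with a rail twice as wide: {vdd - halved[-1]:.3f} V")
```

The segments nearest the pad carry the most current, so the far end sees the full 50 mV drop; halving the segment resistance halves it to 25 mV.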
Static IR drop:
refers to the drop due to current flow when the circuit is in the steady state, with no switching inputs. Steady-state IR drop is caused by the resistance of the metal wires that make up the power distribution network.
Dynamic IR drop:
refers to the voltage drop due to current flow while the circuit is switching, performing some function. Dynamic IR drop occurs when simultaneous switching of on-chip components such as clocks, clocked elements, bus drivers and memory decoder drivers causes a dip or spike in the power/ground grid.
Dynamic IR drop is usually greater than static IR drop, since the current flowing in the metal interconnect is greater than when the circuit is in a steady state.
IR drop effect:
IR drop reduces the speed and noise immunity of cells and macros.
First, a reduced voltage difference between Vdd and Vss lowers a cell's operating performance. If that cell is on the critical path, the slower cell reduces the chip's operating frequency. Second, IR drop reduces noise immunity and in extreme cases causes functional failure.
Methods to reduce IR drop:
- Reduce the current drawn by the logic gates, for example by applying low-power design techniques.
- Widen the metal wires: widen the power lines and add more power lines. Extremely dense power grids are used in high-performance chips. Wire resistance is proportional to the wire length from the power pad to the logic gates.
- Increase the number of Vdd and Vss pads on the chip to reduce the current drawn through each pair of pads.
- Spread out the logic so that current demand is not concentrated in one area.
- If the gates along a metal line switch together, the IR drop increases, so a switching order should be followed where large currents are involved.
- Use C4 (flip-chip) technology, which provides area I/Os and therefore shorter power lines.
- Use clock gating to reduce IR drop.
- Use low-power cells.
- Build a proper CTS structure: minimize the number of clock buffers in the clock tree, as they switch frequently.
Friday, April 9, 2010
What is power budgeting?
Calculating the power dissipation of each block in the design tells us whether the design can meet the power specification, and helps estimate the size of the power grid needed in the floorplan.
Power planning
Why is power planning required?
To ensure that all the components in the chip have adequate power and ground connections.
A power network generally includes the following:
Power pads - supply power to the entire chip.
Power rings - run around the periphery of the die to supply power to the standard-cell core area and to individual hard macros. Rings are put in higher-level routing layers so that the lower layers can be used for signal routing.
Power rails - The horizontal wires are often referred to as rails and the vertical wires as straps. They run from end to end, crossing the entire die or sections of it.
Rails and straps are placed in a uniformly spaced array. Rails connect the power pins of the standard cells and extend to the power rings, where they connect with vias. The widest trunks are put in higher-level routing layers, as with the power rings. After the straps and trunks are inserted, they are all tied together using vias and via stacks.
The uniformly spaced array of straps and rails may be modified to accommodate hard-macro power rings, wiring keep-out areas and other restrictions.
Using lower-level routing (typically metal1), rails are created only in standard-cell placement areas that are not already blocked by hard macros or wiring keep-outs.
There are two types of power planning and management:
1. Core cell power
2. I/O cell power
In core cell power, VDD and VSS power rings are formed around the core and the macros, and straps and trunks are created for the macros depending on their power requirements.
In I/O cell power, power rings are formed around the I/O cells, and trunks are created connecting the core power ring and the power pads.
For a flattened design, a top-down approach is suitable.
For macros, a bottom-up approach is suitable.

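Why are power rings made of the top metal layers? Why should the top metal layers, and not the lower ones, have low resistance?
R = rho * L / A, where rho is the resistivity of the metal, L is the wire length and A is its cross-sectional area (width times thickness).
The lower metal layers are used for signal routing. To supply adequate power to all the components and to decrease the IR drop caused by the resistance of the metal, the top metal layers are made wider and thicker, which gives them a lower resistance. In copper processes the lower routing layers are typically copper as well (aluminum is often used only for the top pad layer), so the advantage of the top layers comes mainly from their larger cross-section.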
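A quick sketch of R = rho * L / A shows how much the cross-section matters. It assumes a rectangular wire cross-section and approximate bulk copper resistivity; the wire dimensions are hypothetical, and real thin-film resistivity of narrow wires is higher than the bulk value, which widens the gap further.

```python
def wire_resistance_ohm(rho_ohm_m: float, length_m: float,
                        width_m: float, thickness_m: float) -> float:
    """R = rho * L / A, with A = width * thickness (rectangular cross-section)."""
    area_m2 = width_m * thickness_m
    return rho_ohm_m * length_m / area_m2


if __name__ == "__main__":
    RHO_CU = 1.7e-8  # approximate bulk copper resistivity in ohm*m
    length = 1e-3    # 1 mm of wire
    # Hypothetical dimensions for a thin lower layer and a thick top layer.
    r_lower = wire_resistance_ohm(RHO_CU, length, 0.1e-6, 0.2e-6)
    r_top = wire_resistance_ohm(RHO_CU, length, 2.0e-6, 1.0e-6)
    print(f"lower layer: {r_lower:.1f} ohm, top layer: {r_top:.1f} ohm")
    # IR drop for 1 mA flowing through each wire (V = I * R)
    print(f"drop at 1 mA: {0.001 * r_lower:.3f} V vs {0.001 * r_top:.4f} V")
```

With these assumed dimensions the top-layer wire has a 100 times larger cross-section and therefore 100 times lower resistance and IR drop.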
Block Level floorplanning
Initial synthesis should be run to determine the total area of the cells in the block.
How much area is needed beyond the cell area depends on the library, the characteristics of the design and the technology.
A typical target is about 70% utilization. An unusually high percentage of registers or hard IP will increase this number; large numbers of multiplexers or other small, pin-dense cells will decrease it.
The main parameters decided in block-level floorplanning are:
-Aspect ratio
-Core utilization
-Shape of the block
-Location of pins
-Power planning
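The relation between cell area, utilization and aspect ratio can be sketched as follows, assuming utilization is defined as total cell area divided by core area and aspect ratio as core height divided by core width (some tools define it the other way round). The 70% utilization matches the starting point mentioned above; the cell area is hypothetical.

```python
import math
from typing import Tuple


def core_size(cell_area_um2: float, utilization: float,
              aspect_ratio: float) -> Tuple[float, float, float]:
    """Return (core area in um^2, core width in um, core height in um).

    utilization  = total cell area / core area
    aspect_ratio = core height / core width (convention assumed here)
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    if aspect_ratio <= 0.0:
        raise ValueError("aspect ratio must be positive")
    core_area = cell_area_um2 / utilization
    width = math.sqrt(core_area / aspect_ratio)  # since area = AR * width^2
    height = aspect_ratio * width
    return core_area, width, height


if __name__ == "__main__":
    # Hypothetical block: 70,000 um^2 of cells after initial synthesis.
    area, w, h = core_size(70_000.0, 0.70, 2.0)
    print(f"core area {area:.0f} um^2 -> {w:.1f} um wide x {h:.1f} um tall")
```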
Design implementation style
There are two styles of implementing a design:
a)Hierarchical
b)Flat
For small and medium ASICs, flattening the design is better suited.
For very large ASICs developed concurrently, partitioning the design into sub-designs (hierarchical implementation) is preferred.
In the flat implementation style, area usage is better, since there is no need to reserve space around each sub-partition for power, ground and routing resources. Timing analysis is also efficient, since the entire design can be analyzed at once rather than at a later stage after it is assembled. However, a lot of information has to be stored in memory, and run time increases rapidly with design size.
A flat design contains only leaf cells.
The hierarchical implementation style has blocks and sub-blocks.
Its problem is that the components of a critical path may sit in different partitions, which makes the critical path longer and timing closure difficult. It is therefore necessary to generate timing constraints so that all the critical components end up in the same partition, close to each other. The hierarchical style is used when there is a substantial amount of data computation and when the sub-circuits can be designed individually.
In this style the design can be partitioned logically or physically.
Logical partition: the design is partitioned according to its logical functions and the interconnectivity between partitions and sub-circuits. Each partition is placed and routed separately and is then placed as a macro or block at the top level of the ASIC.
Physical partition: the design is partitioned during the physical design activity. A partition can be a group of sub-circuits combined together, or a large circuit split into small sub-circuits.
Partitions are formed by recursively dividing the rectangular area containing the design with horizontal and vertical cut lines.
Physical partitioning helps minimize delay (since each cluster is subject to its own constraints) and satisfy the design requirements within a small number of sub-circuits. Initially, these partitions have undefined dimensions and a fixed area (i.e. the total area of the cells or instances added to the partition), with their associated ports, or terminals, assigned to their boundaries such that the connectivity among them is minimized. In order to place these partitions, or blocks, at the chip level, their dimensions as well as their port placement must be defined.
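The recursive cutting with horizontal and vertical lines can be sketched as a slicing floorplan. In this simplified version, assumed for illustration only, each cut splits the partitions into two groups of roughly equal area and alternates between vertical and horizontal cut lines, so every partition receives a rectangle proportional to its area. It does not minimize the connectivity between partitions, which real partitioners do with min-cut heuristics such as Kernighan-Lin or Fiduccia-Mattheyses. Partition names and areas are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Block = Tuple[str, float]  # (name, area); areas must be positive


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def split_by_area(blocks: List[Block]) -> Tuple[List[Block], List[Block]]:
    """Greedily split blocks into two groups of roughly equal total area."""
    left: List[Block] = []
    right: List[Block] = []
    a_left = a_right = 0.0
    for blk in sorted(blocks, key=lambda b: b[1], reverse=True):
        if a_left <= a_right:
            left.append(blk)
            a_left += blk[1]
        else:
            right.append(blk)
            a_right += blk[1]
    return left, right


def slice_floorplan(blocks: List[Block], region: Rect,
                    vertical: bool = True) -> List[Tuple[str, Rect]]:
    """Recursively cut the region, alternating vertical and horizontal lines."""
    if not blocks:
        return []
    if len(blocks) == 1:
        return [(blocks[0][0], region)]
    left, right = split_by_area(blocks)
    a_left = sum(a for _, a in left)
    frac = a_left / (a_left + sum(a for _, a in right))
    if vertical:  # vertical cut line: divide the width
        r1 = Rect(region.x, region.y, region.w * frac, region.h)
        r2 = Rect(region.x + region.w * frac, region.y, region.w * (1 - frac), region.h)
    else:         # horizontal cut line: divide the height
        r1 = Rect(region.x, region.y, region.w, region.h * frac)
        r2 = Rect(region.x, region.y + region.h * frac, region.w, region.h * (1 - frac))
    return (slice_floorplan(left, r1, not vertical)
            + slice_floorplan(right, r2, not vertical))


if __name__ == "__main__":
    # Hypothetical partitions with relative areas 40:30:20:10 in a 100 x 100 die.
    parts = [("A", 40.0), ("B", 30.0), ("C", 20.0), ("D", 10.0)]
    for name, r in slice_floorplan(parts, Rect(0.0, 0.0, 100.0, 100.0)):
        print(f"{name}: x={r.x:.1f} y={r.y:.1f} w={r.w:.1f} h={r.h:.1f}")
```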
What parameters differentiate block design from chip design?
A chip design has I/O pads; a block has pins.
A chip design uses all the available metal layers; a block may not use all of them.
A chip design is generally rectangular; a block can be rectangular or rectilinear.
A chip design requires packaging; a block design ends in a macro.
Thursday, April 8, 2010
Chip level Floorplanning
1. The floorplan should be started with the I/Os at the periphery (depending on the package design).
2. Blocks in the chip that have special design needs should be accommodated.
For example: PLLs, analog blocks, blocks that require a different voltage, and blocks that work with a double-speed clock. If a flash memory has a high programming-voltage input, it should be placed close to the I/O pins.
3. If more than two or three of the larger blocks or other features make the present floorplan impossible, a business decision has to be taken: is a larger, more expensive die financially viable, or can the problem be solved by rearranging the I/Os? If any of the larger blocks are soft IP or available as RTL code, it may be possible to avoid a larger die by repartitioning them into smaller blocks.
4. The RTL should be examined to find logical modules that can be broken out into hierarchical physical elements. If there are multiple instances of a hierarchical logical element, they should be grouped into one physical element.
5. Blocks of the same size are easy to floorplan, so small blocks should be grouped together.
6. The floorplan can be completed by placing the rest of the blocks in the remaining space, based on their I/O and power consumption.
7. It is better to avoid placing blocks that consume more power at the center of the chip.
Why design planning?
Design planning is essential for the following reasons:
1. Major design problems can be targeted well ahead of time.
2. You can come to a conclusion about whether the design is viable or not.
3. In larger chips with hierarchical designs, timing closure is becoming difficult due to long inter-block path delays, which can lead to an unpredictable tape-out schedule.
4. Otherwise, more time may be consumed in the design process.
The two useful strategies in design planning are:
1. Floorplanning
2. Power planning
Monday, April 5, 2010
Why would I like to become a physical design engineer?
So that I get a chance to visit a fab once in my lifetime; hoping for the best.
What is physical design?
The process of converting the design of a system into a layout, which can then be fabricated, is known as physical design.
What do you require for doing physical design?
In other words, what are the inputs to physical design?
I have used the Synopsys place-and-route tool ASTRO. What inputs does ASTRO need?
1.Gate level netlist.
2.Library.
3.Technology file.
4.TLU+.
5.Synopsys design constraints (SDC).
Let us see why we require each input, where we get it from, and what it is used for.
1. Gate level netlist:
Why? The design comes to us as a gate-level netlist, which describes the architecture and function of the design. During synthesis it is mapped to the standard cell library, and logic optimization is carried out to meet the constraints before this netlist is generated.
Where do we get it from?
In the front end, the gate-level netlist is generated at the synthesis stage, which combines the RTL code and the design constraints and produces a final gate-level netlist that the P&R tool can read.
What is it used for?
The P&R tool treats it as the "golden netlist" for all stages of physical design. Finally, the netlist generated after place and route is compared with the original netlist to verify that the functionality has not changed.
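To illustrate what comparing the netlists means, the sketch below checks two tiny combinational netlists by exhaustive simulation over all input combinations. It is a toy under stated assumptions: the netlist format is made up for this example, gates must be listed in topological order, and exhaustive simulation only scales to a handful of inputs. Production flows rely on formal equivalence checking tools (e.g. Synopsys Formality) and LVS instead.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# A gate is (output net, gate type, input nets); this format is hypothetical.
Gate = Tuple[str, str, List[str]]

OPS: Dict[str, Callable[[Sequence[bool]], bool]] = {
    "AND": lambda v: all(v),
    "OR": lambda v: any(v),
    "INV": lambda v: not v[0],
    "BUF": lambda v: v[0],
}


def simulate(netlist: List[Gate], inputs: Dict[str, bool]) -> Dict[str, bool]:
    """Evaluate every gate once; gates must be in topological order."""
    nets = dict(inputs)
    for out, kind, ins in netlist:
        nets[out] = OPS[kind]([nets[i] for i in ins])
    return nets


def equivalent(golden: List[Gate], revised: List[Gate],
               primary_inputs: List[str], primary_outputs: List[str]) -> bool:
    """Compare the primary outputs for every input combination."""
    for values in product([False, True], repeat=len(primary_inputs)):
        vector = dict(zip(primary_inputs, values))
        g = simulate(golden, vector)
        r = simulate(revised, vector)
        if any(g[o] != r[o] for o in primary_outputs):
            return False
    return True


if __name__ == "__main__":
    golden = [("n1", "AND", ["a", "b"]), ("y", "OR", ["n1", "c"])]
    # Post-P&R style netlist: a buffer has been inserted on net n1.
    revised = [("n1", "AND", ["a", "b"]), ("n1_buf", "BUF", ["n1"]),
               ("y", "OR", ["n1_buf", "c"])]
    print(equivalent(golden, revised, ["a", "b", "c"], ["y"]))
```

Inserting a buffer changes the netlist structure but not the function, so the check passes; changing the OR gate into an AND gate would make it fail.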
2. Standard cell library:
Why?
a) Our design is made up of logic functions like AND, OR etc., so the standard cell library provides a list of such logic functions.
b) We are also concerned with the physical shapes of the logic functions.
The standard cell library provides:
1. An abstract view describing the I/Os and the location of the metal pins.
2. A layout view describing the mask layers required for fabrication.
c) It also provides timing information such as:
1. Cell delay
2. Input capacitance, etc.
This helps ASTRO perform static timing analysis during the physical design process.
d) Very importantly, the standard cells (different logic functions) in the library have a fixed height but a variable width, depending on their function and drive strength.
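The fixed-height, variable-width property is what lets cells abut in placement rows. The sketch below packs cells greedily into rows of a given width; the row height, cell names and widths are hypothetical, and a real placer optimizes wire length and timing rather than filling rows in netlist order.

```python
from typing import List, Tuple

ROW_HEIGHT_UM = 2.0  # hypothetical: every standard cell has this height


def pack_rows(cells: List[Tuple[str, float]],
              row_width_um: float) -> List[List[str]]:
    """Place cells left to right in fixed-height rows, starting a new row
    when the next cell no longer fits (simple greedy, in netlist order)."""
    rows: List[List[str]] = [[]]
    used = 0.0
    for name, width in cells:
        if width > row_width_um:
            raise ValueError(f"{name} is wider than a row")
        if used + width > row_width_um:
            rows.append([])
            used = 0.0
        rows[-1].append(name)
        used += width
    return rows


if __name__ == "__main__":
    # Hypothetical cells: same height, widths grow with function and drive strength.
    cells = [("u1/INV_X1", 0.8), ("u2/INV_X4", 1.6), ("u3/NAND2_X1", 1.2),
             ("u4/DFF_X1", 3.4), ("u5/INV_X1", 0.8)]
    rows = pack_rows(cells, row_width_um=5.0)
    for i, row in enumerate(rows):
        print(f"row {i} (y = {i * ROW_HEIGHT_UM:.1f} um): {row}")
    print(f"block height: {len(rows) * ROW_HEIGHT_UM:.1f} um")
```

Because every cell has the same height, the block height is simply the number of rows times the row height, while the cell widths only affect how many cells fit in each row.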
Other libraries may also be required:
custom cells such as I/Os, reusable intellectual property (IP) cores and RAMs.
Where do we get it from?
It is usually supplied by the vendor or generated manually.
3. Technology file:
Why?
Ultimately our design needs to be fabricated in a particular process, such as 180 nm, 130 nm or 90 nm.
All the process-specific data is given in the tech file:
a) It has the process design rules, which help ASTRO do design rule checks.
b) Process-specific information: mask layer information and metal layer information, including their resistance and capacitance values so that ASTRO can estimate the delay introduced by routing; via definitions used for connecting the metal layers; and metal attributes such as spacing, width and color.
Where do we get it from?
From the fab.
4. TLU+
Why?
TLU+ is a set of lookup tables containing the resistance and capacitance values of the metal layers, which ASTRO uses to estimate the delay of a particular route.
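To make the idea of a lookup table concrete, the sketch below stores capacitance per unit length for one metal layer as a function of the spacing to the neighboring wire, interpolates linearly between entries and computes the Elmore delay of the wire treated as a distributed RC line (0.5 * R * C, driver and load ignored). The table values and the resistance per micron are invented for illustration and are not real TLU+ data; the actual TLU+ file format and how ASTRO indexes it are not shown.

```python
from bisect import bisect_left
from typing import List

# Hypothetical table for one metal layer at a fixed width (NOT real TLU+ data):
# spacing to the neighbouring wire (um) -> capacitance per um (fF/um).
SPACING_UM: List[float] = [0.1, 0.2, 0.4, 0.8]
CAP_FF_PER_UM: List[float] = [0.25, 0.20, 0.17, 0.15]
RES_OHM_PER_UM = 0.8  # hypothetical resistance per um for this layer


def interpolate(xs: List[float], ys: List[float], x: float) -> float:
    """Piecewise-linear lookup, clamped at both ends of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # xs[i-1] < x <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def wire_delay_ps(length_um: float, spacing_um: float) -> float:
    """Elmore delay of a distributed RC wire: 0.5 * R * C."""
    c_ff = interpolate(SPACING_UM, CAP_FF_PER_UM, spacing_um) * length_um
    r_ohm = RES_OHM_PER_UM * length_um
    return 0.5 * r_ohm * c_ff * 1e-3  # ohm * fF = 1e-15 s = 1e-3 ps


if __name__ == "__main__":
    print(f"{wire_delay_ps(1000.0, 0.3):.1f} ps for 1 mm at 0.3 um spacing")
```

Wider spacing lowers the coupling capacitance and therefore the estimated delay, which is why the tool needs the spacing-dependent table rather than a single capacitance value per layer.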
5. SDC
Where do we get it from?
The SDC is derived from the design specification.
Why?
A system design has certain specifications to be met, and after fabrication the chip should meet them. Speed, area and power are the three features we look at.
The SDC provided in the front end is also given as an input to physical design so that the constraints are met.
Thursday, April 1, 2010
Physical design flow.
The physical design flow is also called the P&R (place and route) flow.
Back end flow:
1.Floorplanning
2.Placement
3.Clock tree Synthesis
4.Routing
5.Parasitic Extraction
6.Backannotation to STA
7.Physical verification
8.Tapeout
What is an IC layout, mask design or IC mask layout?
Layout is the representation of an integrated circuit in terms of polygons, i.e. geometric shapes that represent the metal, oxide and semiconductor layers that make up the component.
Layout is the top view of the component, containing polygons of definite shape and size that represent the semiconductor layers, metal, poly and field oxide making up the component.
