Friday, April 23, 2010

Standard Cell Delay

The propagation delay of a standard cell is the average of two time intervals, its low-to-high and high-to-low propagation delays:

$$t_p = \frac{t_{pLH} + t_{pHL}}{2}$$

Each propagation delay is measured from the 50% point of the input transition to the 50% point of the output transition.
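The 50% measurement can be sketched in Python. This is a minimal illustration, assuming sampled waveforms, an inverting cell and linear interpolation between samples; the function names and waveform values are hypothetical, not taken from any characterization tool.

```python
def crossing_time(times, volts, level, rising):
    """Return the first time `volts` crosses `level` in the given direction,
    linearly interpolating between the two samples that bracket it."""
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, vdd, input_rising):
    """Delay from the input's 50% point to the output's 50% point.

    Assumes an inverting cell, so the output moves opposite to the input.
    """
    half = 0.5 * vdd
    t_in = crossing_time(times, v_in, half, rising=input_rising)
    t_out = crossing_time(times, v_out, half, rising=not input_rising)
    return t_out - t_in


# Hypothetical sampled waveforms (time in ps, voltage in V, VDD = 1.0 V).
t = [0, 10, 20, 30, 40]
v_in = [0.0, 0.5, 1.0, 1.0, 1.0]   # rising input crosses 0.5 V at 10 ps
v_out = [1.0, 1.0, 0.8, 0.2, 0.0]  # falling output crosses 0.5 V at 25 ps
print(propagation_delay(t, v_in, v_out, vdd=1.0, input_rising=True))  # about 15 ps
```

Measuring a rising input on an inverter gives the high-to-low delay; repeating with a falling input gives the low-to-high delay, and the two are averaged to get tp.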

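If the input waveform switches fully from zero to the supply voltage (VDD), or from VDD to zero, the low-to-high and high-to-low propagation delays can be expressed as

$$t_{pLH} = \frac{C_L V_{DD}}{\beta_p \, (V_{DD} - |V_{tp}|)^2}$$

$$t_{pHL} = \frac{C_L V_{DD}}{\beta_n \, (V_{DD} - |V_{tn}|)^2}$$

where C_L is the load capacitance, β_p and β_n are the gain factors of the PMOS and NMOS transistors, and V_tp and V_tn are their threshold voltages.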

To reduce the propagation delay of a given standard cell, one could:
  • increase the supply voltage,
  • reduce the threshold voltage,
  • increase the transistor gain factors, or
  • reduce the load capacitance.
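The four options can be checked numerically against the delay expressions for tplh and tphl. This is a hedged illustration: the device values (10 fF load, 1.2 V supply, 0.4 V thresholds, gain factors in A/V²) are hypothetical numbers chosen only to show the trend, not data for any real process.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    c_load: float  # load capacitance, F
    vdd: float     # supply voltage, V
    vtn: float     # NMOS threshold voltage, V
    vtp: float     # PMOS threshold voltage, V (negative)
    beta_n: float  # NMOS gain factor, A/V^2
    beta_p: float  # PMOS gain factor, A/V^2


def t_plh(c):
    # Output rising: the PMOS charges the load.
    return c.c_load * c.vdd / (c.beta_p * (c.vdd - abs(c.vtp)) ** 2)


def t_phl(c):
    # Output falling: the NMOS discharges the load.
    return c.c_load * c.vdd / (c.beta_n * (c.vdd - abs(c.vtn)) ** 2)


def t_p(c):
    return (t_plh(c) + t_phl(c)) / 2


base = Cell(c_load=10e-15, vdd=1.2, vtn=0.4, vtp=-0.4, beta_n=200e-6, beta_p=100e-6)

# Apply each of the four options to the baseline, one at a time.
variants = {
    "increase VDD": replace(base, vdd=1.32),
    "reduce Vt": replace(base, vtn=0.3, vtp=-0.3),
    "increase beta": replace(base, beta_n=400e-6, beta_p=200e-6),
    "reduce C_L": replace(base, c_load=5e-15),
}
for name, cell in variants.items():
    print(f"{name:14s}: tp = {t_p(cell) * 1e12:6.1f} ps (baseline {t_p(base) * 1e12:.1f} ps)")
```

With these numbers the baseline gives tplh = 187.5 ps and tphl = 93.75 ps, and every variant lowers tp.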

Load capacitance and supply voltage are determined outside the standard cell, so the cell designer cannot change them.
Threshold voltage is set by the semiconductor foundry's process and is part of the standard cell characterization. The only parameter available to the circuit designer is therefore the transistor gain factor.

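The gain factor β of a transistor is set by its dimensions: β = μ·Cox·(W/L), where μ is the carrier mobility, Cox the gate-oxide capacitance per unit area, W the channel width and L the channel length.

In Wp/Wn ratio determination, it is desired to set

βn = k·βp

With equal channel lengths, and with electron mobility μn roughly twice the hole mobility μp, this gives

Wp/Wn = 2/k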

In the ideal situation, k is equal to 1. This means that for a CMOS inverter to
charge and discharge capacitive loads in the same amount of time, the
channel width of the PMOS transistor must be twice as large as the channel
width of the NMOS transistor.
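The width-ratio rule can be checked with a short calculation. This sketch assumes the long-channel relation β = μ·Cox·W/L, equal channel length and oxide capacitance for both devices, and a mobility ratio μn/μp of 2 as implied by the text; all values are normalized and hypothetical.

```python
def gain_factor(mobility, cox, width, length):
    """beta = mu * Cox * W / L (long-channel gain factor)."""
    return mobility * cox * width / length


def wp_over_wn(k, mobility_ratio=2.0):
    """Width ratio Wp/Wn that makes beta_n = k * beta_p for equal L and Cox.

    mu_n * Wn = k * mu_p * Wp  =>  Wp/Wn = (mu_n / mu_p) / k.
    mobility_ratio = mu_n / mu_p is an assumed value (about 2, as in the text).
    """
    return mobility_ratio / k


# Normalized, hypothetical values: mu_p = 1, mu_n = 2, Cox = 1, L = 1, Wn = 1.
wn = 1.0
wp = wp_over_wn(k=1.0) * wn
beta_n = gain_factor(2.0, 1.0, wn, 1.0)
beta_p = gain_factor(1.0, 1.0, wp, 1.0)
print(wp, beta_n, beta_p)  # 2.0 2.0 2.0
```

For k = 1 the PMOS comes out twice as wide as the NMOS, which equalizes the two gain factors and hence the rise and fall delays in the expressions for tplh and tphl.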

Although increasing the value of Wp/Wn reduces the cell propagation
delay, it also increases the active-area (diffusion) capacitance and the gate
capacitance. This added capacitance adversely affects gate speed. Therefore,
circuit designers must make a trade-off when deciding how large the
transistors should be, so that the propagation delay is optimal.

Fast circuits consume more area than slow circuits.
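The trade-off can be made concrete with a simple RC model. This is an assumed first-order model, not the author's method: a gate of relative size s has drive resistance R_unit/s, input capacitance s·C_g (which loads the previous stage, whose resistance is R_drv) and self (diffusion) capacitance s·C_d. All element values are hypothetical.

```python
import math

# Hypothetical element values (ohms and farads).
R_DRV = 2e3     # resistance of the stage driving this gate
R_UNIT = 10e3   # drive resistance of a unit-size gate
C_G = 1e-15     # input (gate) capacitance of a unit-size gate
C_D = 1e-15     # diffusion (active area) capacitance of a unit-size gate
C_EXT = 50e-15  # fixed external load


def path_delay(s):
    """Delay of the previous stage plus this gate when the gate is scaled by s."""
    upstream = R_DRV * (s * C_G)            # a bigger gate loads its driver more
    own = (R_UNIT / s) * (C_EXT + s * C_D)  # it drives harder but adds self-load
    return upstream + own


# Setting d(delay)/ds = 0 gives s* = sqrt(R_UNIT * C_EXT / (R_DRV * C_G)).
s_opt = math.sqrt(R_UNIT * C_EXT / (R_DRV * C_G))

# Brute-force check of the analytic optimum on a fine grid of sizes.
sizes = [1 + 0.01 * i for i in range(4000)]
s_best = min(sizes, key=path_delay)

for s in (1, 4, s_opt, 40):
    print(f"size {s:5.2f}: delay {path_delay(s) * 1e12:6.1f} ps, relative area {s:5.2f}")
```

In this model, sizing up helps only until the extra input capacitance slows the previous stage; beyond s* the delay rises again while area keeps growing.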

Steps in Placement

Detaching and reordering scan chains:

What is a scan chain?
A scan chain is a design-for-test (DFT) technique for improving a chip's controllability and observability. It is a collection of scan flip-flops connected in series: the output of each flip-flop is connected to the scan data input of the next. Test data is shifted in through the scan input, and the result of the logic operation is observed at the scan output. By shifting data through the chain, every node along it can be set to an intended value (0 or 1), and the effect of these settings can then be shifted out and observed.
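The shift-in, capture, shift-out sequence can be sketched as a behavioral Python model. It is an illustration only, assuming a single chain whose first flip-flop is nearest the scan input; the `capture` logic (one inverter per flip-flop) is hypothetical.

```python
def scan_shift(state, bits_in):
    """Clock the chain once per input bit.

    state[0] is the flip-flop nearest scan-in; state[-1] drives scan-out.
    Returns (new_state, bits observed at scan-out, one per clock).
    """
    state = list(state)
    observed = []
    for bit in bits_in:
        observed.append(state[-1])   # scan-out value before the clock edge
        state = [bit] + state[:-1]   # each FF takes its predecessor's output
    return state, observed


def capture(state):
    """Hypothetical combinational logic between flops: invert every bit."""
    return [1 - b for b in state]


# Control: load the pattern [1, 0, 1, 1] by shifting it in last bit first.
target = [1, 0, 1, 1]
loaded, _ = scan_shift([0, 0, 0, 0], list(reversed(target)))

# Functional clock: the flops capture the logic's response.
response = capture(loaded)

# Observe: shift the response out (the last FF appears first at scan-out).
_, seen = scan_shift(response, [0] * len(response))
print(loaded, response, list(reversed(seen)))
```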

The physical wire connections between adjacent flip-flops follow the logical order of the chain, but logical proximity does not necessarily match physical proximity.

The logical order is decided during logic synthesis, either randomly or alphabetically, because physical locations are unknown at that stage.
As a result, if the original chains are retained, the routing and physical wire connections will not be optimized.

The best approach is to detach the scan chains before placement, so that normal placement is not disturbed by scan-chain connectivity, and then reorder the chains after placement, once all physical locations are fixed and known. Reordering based on physical location improves overall routability and reduces total wire length.
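Physically aware reordering can be illustrated with a greedy nearest-neighbor heuristic. This is a swapped-in technique for illustration, not how any particular P&R tool reorders chains; the flip-flop names and coordinates are hypothetical, and real reordering flows apply additional constraints that this sketch ignores.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def chain_length(order, pos, scan_in, scan_out):
    """Total Manhattan wire length: scan_in -> flops in `order` -> scan_out."""
    pts = [scan_in] + [pos[f] for f in order] + [scan_out]
    return sum(manhattan(p, q) for p, q in zip(pts, pts[1:]))


def reorder_greedy(pos, scan_in):
    """Starting at scan_in, repeatedly hop to the nearest unvisited flop."""
    remaining = set(pos)
    order, here = [], scan_in
    while remaining:
        # Ties are broken by name so the result is deterministic.
        nxt = min(remaining, key=lambda f: (manhattan(here, pos[f]), f))
        order.append(nxt)
        remaining.remove(nxt)
        here = pos[nxt]
    return order


# Hypothetical flop locations (um) known after placement.
pos = {"ff_a": (30, 0), "ff_b": (10, 0), "ff_c": (40, 0), "ff_d": (20, 0)}
scan_in, scan_out = (0, 0), (50, 0)

synth_order = sorted(pos)  # alphabetical order chosen during synthesis
new_order = reorder_greedy(pos, scan_in)
print(synth_order, chain_length(synth_order, pos, scan_in, scan_out))  # total 130
print(new_order, chain_length(new_order, pos, scan_in, scan_out))      # total 50
```

The alphabetical chain zig-zags across the row, while the placement-aware order visits the flops left to right and needs less than half the wire.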


All wire load models (WLMs) are removed before timing optimization, and timing is calculated based on virtual routes (VR). A virtual route is the shortest Manhattan distance between two pins.

Manhattan distance is the right-angle ruler used in the back end. It is also called city-block distance, because it is the distance a car would drive in a city laid out in square blocks, like Manhattan (discounting the facts that Manhattan has one-way and oblique streets and that real streets exist only at the edges of blocks; there is no 3.14th Avenue). Any route from one corner to another that is 3 blocks east and 6 blocks north covers at least 9 blocks.

In other words, it is the distance between two points measured along axes at right angles: |x1 - x2| + |y1 - y2|.
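The city-block example and the virtual-route idea can be sketched in a few lines. Treating a multi-pin net as separate driver-to-sink virtual routes is an assumption made for illustration; the pin coordinates are hypothetical.

```python
import math


def manhattan(a, b):
    """Distance along right-angle axes: |x1 - x2| + |y1 - y2|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


# 3 blocks east and 6 blocks north.
print(manhattan((0, 0), (3, 6)))            # 9
print(round(euclidean((0, 0), (3, 6)), 2))  # 6.71, a straight diagonal that right-angle routing cannot use

# Virtual-route lengths from a driver pin to each sink pin (hypothetical um).
driver = (5, 5)
sinks = [(15, 5), (5, 25), (0, 0)]
vr_lengths = [manhattan(driver, s) for s in sinks]
print(vr_lengths)  # [10, 20, 10]
```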

Set placement and timing options:
Option: no cells under preroutes on the selected metal layers and vias.
The P&R tool prevents pins of standard cells from being placed under preroutes on the metal layers you specify. This means that a standard cell is not placed at a location where any of its pins overlaps a preroute on a selected layer. For example, if M3 is selected, a standard cell will not be placed where any of its pins (regardless of the pin’s metal layer) overlaps a preroute on M3.

Avoiding pin overlap with preroutes improves routability, because fewer routing resources are available under preroutes: the preroute itself and any vias and contacts along it occupy them.
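The rule can be expressed as a geometric check. This is a minimal sketch assuming axis-aligned rectangles and that shapes which only touch at an edge do not count as overlapping; the names `Rect` and `placement_allowed` and all coordinates are hypothetical, not any tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x1: float
    y1: float
    x2: float
    y2: float

    def shifted(self, dx, dy):
        return Rect(self.x1 + dx, self.y1 + dy, self.x2 + dx, self.y2 + dy)

    def overlaps(self, other):
        # Interiors must intersect; edge-touching shapes do not count.
        return (self.x1 < other.x2 and other.x1 < self.x2
                and self.y1 < other.y2 and other.y1 < self.y2)


def placement_allowed(pins, origin, preroutes, blocked_layers):
    """Return False if any pin, placed at `origin`, overlaps a preroute on a
    blocked layer. The pin's own layer is deliberately ignored, matching the
    M3 example in the text."""
    dx, dy = origin
    for pin in pins:
        placed = pin.shifted(dx, dy)
        for layer, shape in preroutes:
            if layer in blocked_layers and placed.overlaps(shape):
                return False
    return True


# Hypothetical cell with one pin, and a vertical M3 stripe at x = 5..6 um.
pins = [Rect(0.1, 0.1, 0.3, 0.3)]
preroutes = [("M3", Rect(5.0, 0.0, 6.0, 100.0))]

print(placement_allowed(pins, (5.0, 0.0), preroutes, {"M3"}))   # False: pin under the M3 stripe
print(placement_allowed(pins, (10.0, 0.0), preroutes, {"M3"}))  # True
print(placement_allowed(pins, (5.0, 0.0), preroutes, {"M4"}))   # True: M3 not selected
```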

Placement optimization:
Pre-placement optimization:
  1. It generates an initial placement before optimizing the netlist, to obtain wiring information.
  2. It collapses non-critical buffers and reduces total cell area by downsizing, so that the netlist is easier to place.
  3. High-fanout nets (HFNs) can contain a large number of buffers, which affects placement; so, instead of simply minimizing buffers, it rebuilds the HFNs based on more accurate RC estimates.
  4. It performs quick logic synthesis.
In-placement optimization:
  1. It places all the standard cells.
  2. It re-optimizes the logic based on virtual routes, performing cell sizing, area recovery, gate duplication, buffer insertion and net splitting, and optimizes gates for setup timing.
  3. It performs incremental timing-driven and congestion-driven placement.

Post-placement optimization before CTS:
  1. Optimization is done with ideal clocks.
  2. It performs a more specific timing optimization of the netlist and the layout, including quick fixing of setup and hold violations and maximum transition and maximum capacitance violations by buffering the gates.
  3. It can do placement optimization based on global routing.
Post-placement optimization after CTS:

Effect of CTS:
  • Clock buffers are added.
  • Congestion may increase.
  • Non-clock-tree cells may have been moved to less ideal locations.
  • New timing and max transition/capacitance violations can be introduced.

  • Post-placement optimization after clock tree synthesis improves the timing of the design with propagated clocks. It takes the clock tree into account so that clock skew is preserved, and it has an option to perform congestion removal before running optimization.
  • Perform logic and placement optimization to fix timing and max cap/tran violations.
  • Fixing hold time is recommended here.
  • Reduce congestion by removing unnecessary non-clock tree buffers.
Critical range optimization:

Worst negative slack (WNS) is the slack of the path with the most negative slack.
Total negative slack (TNS) is the sum of the worst negative slack at each endpoint, taken over all violating endpoints.

When |TNS| is much larger than |WNS|, there may be many sub-critical path violations that are as significant as the critical path violation.
Optimization during placement mainly works on the critical path of each clock domain and stops when it cannot further improve timing.
Critical range optimization works on sub-critical paths to reduce TNS and the total number of violating paths.

Iterate post-placement optimization and critical range optimization while further improvement is seen, until the remaining violations are acceptably small.
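The WNS and TNS definitions can be sketched over a set of endpoint slacks. The endpoint names and slack values below are hypothetical; the point is that fixing only the critical path leaves most of the TNS, which is what critical range optimization then targets.

```python
def wns(slacks):
    """Worst negative slack: the most negative endpoint slack (0.0 if none fail)."""
    return min(min(slacks.values()), 0.0)


def tns(slacks):
    """Total negative slack: sum of the negative slacks over all endpoints."""
    return sum(s for s in slacks.values() if s < 0)


def violations(slacks):
    return sum(1 for s in slacks.values() if s < 0)


# Hypothetical endpoint slacks in ns.
slacks = {"r1/D": -0.30, "r2/D": -0.10, "r3/D": -0.10, "r4/D": -0.10, "r5/D": 0.20}
print(wns(slacks), tns(slacks), violations(slacks))  # WNS -0.3, TNS about -0.6, 4 violating endpoints

# Fixing only the critical path (r1) leaves three sub-critical violations.
after_critical_fix = {**slacks, "r1/D": 0.0}
print(wns(after_critical_fix), tns(after_critical_fix), violations(after_critical_fix))
```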


Timing Driven Placement
The P&R tool requires timing constraints to understand the design's timing objectives. The most common timing constraints include the arrival times of the input signals to the design and the required arrival times at the chip outputs.
They also include the period of the system clock, as well as the periods of any other clocks if the design contains multiple clock domains.

The timing information the tool uses is based on the standard cell delays and on the wires connecting these cells.
A standard cell's delay is a function of its input transition and of its total output load: the capacitance of the output wire plus the input gate capacitance of all the logic connected to that wire.
A wire's delay is a function of the resistance of the metal layers and of the sum of the wire capacitance and the input gate capacitance it drives.
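A simple model of these two delay components is sketched below. It assumes, for the cell, a 2-D lookup table indexed by input transition and output load with bilinear interpolation (the general idea behind Liberty NLDM tables) and, for the wire, the Elmore delay of a uniformly distributed RC line driving lumped pin loads. All table entries and RC values are hypothetical.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the table segment used for x (clamped to the table edges)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def table_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed by (input transition, load)."""
    i, j = _bracket(slews, slew), _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return ((1 - ts) * (1 - tc) * d00 + (1 - ts) * tc * d01
            + ts * (1 - tc) * d10 + ts * tc * d11)


def wire_delay(r_wire, c_wire, c_pins):
    """Elmore delay of a distributed RC wire: R * (C_wire / 2 + C_pins)."""
    return r_wire * (c_wire / 2 + c_pins)


# Hypothetical 2x2 delay table: rows = input transition (ns), cols = load (pF).
SLEWS = [0.01, 0.10]
LOADS = [0.001, 0.010]
DELAY = [[0.05, 0.20],
         [0.08, 0.25]]

c_wire = 0.003          # pF, output wire capacitance
c_pins = 0.001 + 0.001  # pF, input gates of two fanout cells
r_wire = 0.1            # kOhm (kOhm * pF = ns)

# The cell sees the wire plus the fanout gates as its load.
cell_d = table_lookup(SLEWS, LOADS, DELAY, slew=0.055, load=c_wire + c_pins)
wire_d = wire_delay(r_wire, c_wire, c_pins)
print(f"cell {cell_d:.4f} ns, wire {wire_d:.5f} ns")
```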


Timing-driven placement is the process of placing the standard cells in the rows of the core area, using the timing constraints as guidelines for where to place the cells.


Evaluation of Placement:
After performing automatic placement, evaluate the placement and make changes to improve the routability of the design.
During placement, the tool calculates routing congestion based on the availability of wire tracks inside the global routing cells. Using these congestion calculations, it produces a placement congestion map that shows the estimated amount of routing congestion across the design.
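One simple way to turn net locations into a congestion map is sketched below. It is an assumed estimator in the spirit of RUDY (spreading each net's half-perimeter wirelength evenly over its bounding box), not the method any specific P&R tool uses; the grid size, track capacity and nets are hypothetical.

```python
def congestion_map(nets, width, height, capacity):
    """Estimate routing demand per global routing cell (GRC).

    Each net's half-perimeter wirelength (in GRC units) is spread evenly over
    the GRCs in its bounding box, then divided by the track capacity per GRC.
    Values above 1.0 indicate estimated overflow.
    """
    demand = [[0.0] * width for _ in range(height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        hpwl = (x1 - x0) + (y1 - y0)
        n_cells = (x1 - x0 + 1) * (y1 - y0 + 1)
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                demand[y][x] += hpwl / n_cells
    return [[d / capacity for d in row] for row in demand]


# Hypothetical 4x4 GRC grid with one track per GRC and three nets.
nets = [[(0, 0), (3, 0)], [(0, 0), (3, 3)], [(1, 0), (2, 1)]]
cmap = congestion_map(nets, width=4, height=4, capacity=1)
for row in reversed(cmap):  # print the top row first
    print(" ".join(f"{v:5.3f}" for v in row))

hotspots = [(x, y) for y, row in enumerate(cmap) for x, v in enumerate(row) if v > 1.0]
print(hotspots)  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```

The bottom row, where all three nets overlap, is flagged as the congested region that placement changes would need to relieve.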