Measuring Temperatures on Computer Chips with Speed and Accuracy: A New Approach Using Silicon Sensors and Off-Chip Processing

Silicon Temperature Sensors

Silicon sensors are becoming increasingly important transducers in electronic systems. As systems become more complex, more compact, and denser (and run faster and hotter), it becomes increasingly vital to monitor critical temperatures. Traditional sensor techniques, such as thermocouples, thermistors, and RTDs, are now being displaced by silicon sensors, which are easier to integrate and use. Many traditional sensor types are inherently nonlinear and require signal conditioning (e.g., compensation, lookup tables, and excitation circuitry) to accurately convert temperature into an electrically measurable quantity such as voltage or current.*

*Practical examples of signal-conditioning considerations, designs, and circuits can be seen in the seminar notes, Practical Design Techniques for Power and Thermal Management, Section 6.

Silicon sensors, on the other hand, are linear, accurate, and low-cost, and can be integrated on the same IC as amplifiers and any other required processing functions. The actual sensing element in a silicon sensor is a simple P-N transistor junction. At a constant current, the voltage across a regular P-N transistor junction falls by about 2 mV for each 1°C rise in temperature, and this inherent dependency may be used to develop a temperature-measuring system. Silicon sensors are new by sensor-industry standards but are very mature by semiconductor-industry standards. For example, the AD590 1-µA/°C IC sensor was introduced more than 20 years ago!†

†Analog Dialogue 12-1, 1978, pp. 3-5. See also M. P. Timko, "A two-terminal IC temperature transducer," IEEE Journal of Solid-State Circuits, vol. SC-11, 1976, pp. 784-788.

In order to separate the variation with temperature from the effect of current level and remove offsets, the most common technique is to base the measurement on two transistor junctions. By operating two identical transistors at a constant ratio of collector current densities, r, the difference in their base-emitter voltages will be (kT/q) (ln r). Since both k (Boltzmann's constant) and q (electron charge) are physical constants, the resulting voltage is directly proportional to absolute temperature, T (PTAT).
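
As a rough numerical illustration of the PTAT relationship above, the following Python sketch evaluates (kT/q) ln r. The ratio r = 8 and the 25°C operating point are assumptions chosen only for illustration; they are not taken from any particular device.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann's constant k, in J/K
Q_ELECTRON = 1.602176634e-19  # electron charge q, in C


def delta_vbe(temp_kelvin, ratio):
    """Difference in base-emitter voltages (V) for a current-density ratio r."""
    return (K_BOLTZMANN * temp_kelvin / Q_ELECTRON) * math.log(ratio)


def ptat_slope(ratio):
    """Sensitivity in V/K; it is a constant because the output is PTAT."""
    return (K_BOLTZMANN / Q_ELECTRON) * math.log(ratio)


if __name__ == "__main__":
    r = 8  # assumed current-density ratio, for illustration only
    print(f"slope at r={r}: {ptat_slope(r) * 1e6:.1f} uV/K")
    print(f"delta VBE at 25 C: {delta_vbe(298.15, r) * 1e3:.2f} mV")
```

Because the slope depends only on k, q, and r, doubling the absolute temperature doubles the output voltage, which is what makes the two-junction measurement insensitive to the absolute current level.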

Temperature-monitoring products available from ADI, which incorporate this type of temperature sensing, usually integrate it with additional functionality. For example, it may be combined with analog-to-digital conversion circuitry. Figure 1 shows a block diagram of the AD7415; it contains temperature-sensing circuitry, an amplifier, and an ADC, along with a two-wire I2C interface. Other products, such as the ADM9240, which was featured in Analog Dialogue 33-1, include many additional functions, such as voltage monitoring and fan-speed monitoring, as well as on-chip limit setting.

Figure 1. Temperature sensor plus ADC.

Sensor Mounting Considerations: The Problem

While a silicon sensor is a very accurate temperature transducer, it is important to remember that it will only measure its own junction temperature, and thus its own die temperature. This is fine if one is simply interested in monitoring approximate zone temperatures within an enclosure or environmental temperature (and convection and conduction conditions are adequate). If, however, one must monitor the local temperature within a heat source or a computer chip, such as a Pentium® III CPU or a high-performance graphics chip, much more is at stake and the situation is not quite so straightforward. In order to get an accurate measure of the temperature of the heat source, the sensor must be in close proximity to the source itself. The accumulation of thermal resistances between the sensor and heat source will lead to measurement errors and uncertainties. The physical mounting problems that must be solved to get an accurate temperature measurement may be simply impossible to deal with in many situations, resulting in derating and suboptimal performance.

For example, if IC temperature sensors have to be mounted on the circuit board, it is very unlikely that they can be in close physical contact with the "hot spot" of the object being monitored. It might be possible to work around the mounting difficulties with tiny two- and three-terminal devices, but with multi-lead packages it is virtually impossible.

Offset Calibration?

One approach might be to add a well-chosen offset to account for the temperature difference between the sensor and the heat source. The required offset can be derived during system characterization by comparing the displayed temperature with the actual temperature. Since the offset needed at room temperature will almost certainly be different from the offset required at elevated temperatures, a simple offset register is generally not enough. A lookup-table approach is one way of working around the problem. This approach might be effective, albeit unwieldy, for a fixed system, but the lookup tables would have to change whenever the system configuration changes.
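
The lookup-table idea can be sketched in a few lines of Python. The table entries below are hypothetical characterization data invented for illustration (they are not measurements from any real system), and linear interpolation between entries is an assumption, not something the approach requires.

```python
from bisect import bisect_right

# Hypothetical characterization data: (sensor reading in deg C, offset to add in deg C).
OFFSET_TABLE = [(25.0, 2.0), (50.0, 6.0), (75.0, 11.0), (100.0, 17.0)]


def corrected_temperature(reading_c, table=OFFSET_TABLE):
    """Return the sensor reading plus a temperature-dependent offset.

    Offsets are linearly interpolated between table entries and held
    constant beyond the first and last entries.
    """
    readings = [r for r, _ in table]
    if reading_c <= readings[0]:
        offset = table[0][1]
    elif reading_c >= readings[-1]:
        offset = table[-1][1]
    else:
        i = bisect_right(readings, reading_c)
        (r0, o0), (r1, o1) = table[i - 1], table[i]
        offset = o0 + (o1 - o0) * (reading_c - r0) / (r1 - r0)
    return reading_c + offset


if __name__ == "__main__":
    for reading in (20.0, 62.5, 120.0):
        print(reading, "->", corrected_temperature(reading))
```

The sketch also makes the weakness visible: the table is valid only for the configuration in which it was characterized, so any change in airflow or board population calls for a new table.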

For example, consider trying to measure the temperature of a CPU on a motherboard by placing a temperature sensor as close as possible to the CPU. The sensor will most likely be at least 1 cm away from the heat source (CPU). The thermal resistance of the path through the board material between the two is very high, and air currents (i.e., convection, or fan-driven flow if directed from source toward sensor) are the principal way in which the heat is transferred to the sensing junction. Convection currents alone are easily disturbed (for example, by adding another card to the system), leading to measurement inaccuracies. Fan-driven flow has its advantages for cooling, but will distort the convection currents and result in wildly inaccurate measurements of local temperature within the CPU chip.

The ideal solution is to integrate the sensor and associated conditioning circuitry on the CPU die itself. This integration would guarantee accurate temperature sensing, since the sensor would be in close physical as well as thermal proximity to the heat source. Unfortunately, the technology used to build today's high-performance CPUs is not compatible with the technology used to build highly accurate temperature sensors and associated amplification circuitry.

The Answer: Sense the CPU Directly

The best approach to the problem is to provide P-N-junction sensing on the CPU die near the hot spot(s), and then use an external conditioning IC to do the rest. This approach allows CPU temperatures to be measured directly, without the uncertainty introduced by the thermal path between die and sensor. The newest Intel Pentium® II and Pentium® III CPUs contain an on-chip thermal diode monitor (TDM) to facilitate this. On Slot 1 CPUs, two pins, THERMDP and THERMDN, provide access to the on-chip diode. A new generation of products from Analog Devices, the ADM102x series, supplies the conditioning and conversion circuitry required to turn the minute voltage changes into robust, measurable results in digital form.

TDM to Digital: A New Approach

The trick now is to translate the minute voltage changes due to temperature into readily measurable signals and digitize them. The low signal levels would by themselves pose a difficult instrumentation problem, but the problem is further complicated by the noisy environment in which the circuit must operate. Picture, if you will, the electrical environment within a digital computer chip! The signal could very easily be swamped by noise, making it impossible to recover. Also, manufacturing variations from unit to unit cause differences in junction response. We will now discuss how the technique works, how it compares with more traditional techniques, and how to extract optimum performance from it.

The Solution

First, for a given current level, the absolute forward voltage drop of the diode isn't very well controlled in the CPU manufacturing process. Also, because voltage depends on absolute (i.e., Kelvin) temperature, the forward voltage value is many times larger than the change in its value per 1°C temperature change. Therefore, the most important requirement is to remove the absolute value of the diode voltage from the equation before any amplification can occur.

Individual device calibration is an option but not a practical one. Rather, a technique comparable to the two-transistor approach described above is used, except that the ratio of current density (current per unit area), r, depends on changing the current in the same diode instead of using the differing areas of two diodes with equal currents. This technique, called "delta-VBE calibration," forces two different levels of current through the thermal diode junction and measures the change in forward voltage. The first current may be considered as a calibration current and the VBE forward voltage value of the junction is ascertained. The VBE value is then measured again with a second current. The change or difference in VBE is proportional to absolute temperature. It is independent of the junction's forward voltage or other differences due to manufacturing variations.

Figure 2. TDM monitoring.

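VBE = (kT/q) ln (IC/IS)

Since IS is a property of the transistor and is unchanged for either current, taking VBE1 as the value measured at current I and VBE2 as the value measured at current N × I gives

VBE1 - VBE2 = ΔVBE = (kT/q) ln (I/NI) = (kT/q) ln (1/N)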

Since N, k, and q are all known constants,

T = (Constant) (ΔVBE)
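
The cancellation of IS can be checked numerically. The Python sketch below models two junctions with deliberately different saturation currents, applies currents of I and N × I to each, and recovers the temperature from ΔVBE alone using the equations above. The values of I, N, and both saturation currents are assumptions chosen for illustration, and the ideal diode equation is used without any series-resistance or ideality-factor terms.

```python
import math

K_BOLTZMANN = 1.380649e-23    # k, in J/K
Q_ELECTRON = 1.602176634e-19  # q, in C


def vbe(temp_kelvin, current, i_sat):
    """Ideal junction voltage: VBE = (kT/q) ln(IC/IS)."""
    return (K_BOLTZMANN * temp_kelvin / Q_ELECTRON) * math.log(current / i_sat)


def temperature_from_delta_vbe(vbe1, vbe2, n):
    """Recover T from VBE1 (at I) and VBE2 (at N x I).

    Uses VBE1 - VBE2 = (kT/q) ln(1/N), so T = q * dVBE / (k * ln(1/N)).
    """
    delta = vbe1 - vbe2
    return Q_ELECTRON * delta / (K_BOLTZMANN * math.log(1.0 / n))


if __name__ == "__main__":
    true_temp = 350.0    # kelvin, assumed die temperature
    i_low, n = 10e-6, 10  # assumed drive current (A) and current ratio
    for i_sat in (1e-15, 5e-14):  # two assumed "manufacturing variations"
        v1 = vbe(true_temp, i_low, i_sat)
        v2 = vbe(true_temp, n * i_low, i_sat)
        t = temperature_from_delta_vbe(v1, v2, n)
        print(f"IS={i_sat:.0e} A  VBE1={v1:.4f} V  recovered T={t:.3f} K")
```

The two modeled junctions show clearly different absolute VBE values, yet the recovered temperature is the same for both, which is the property that removes the need for individual device calibration.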

The output from the ΔVBE sensor varies at approximately 2.2 mV/°C. This signal requires conditioning and amplification. The actual ΔVBE sensor is shown as a substrate transistor, since this would be the case in practice for an on-chip junction. It could equally well be a discrete transistor. If a discrete transistor is used, the collector is not grounded and should be connected to the base. To help prevent ground noise from interfering with the measurement, the more negative terminal of the sensor is not referenced to ground, but is biased above ground by an internal diode at the D- input. To measure ΔVBE, the sensor is switched between operating currents of I and N × I.

Filtering and Amplifying

The resulting waveform is passed through a 65-kHz low-pass filter to remove noise, then to a chopper-stabilized amplifier that performs the functions of amplification and rectification of the waveform to produce a dc voltage proportional to ΔVBE. This voltage is measured by the ADC to give a temperature output in 8-bit twos-complement format. To further reduce the effects of noise, 16 measurements are made, the results are averaged, and the average result is then provided at the output.
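
The averaging and output-format steps can be illustrated with a short Python sketch. It assumes a resolution of 1°C per LSB and simple rounding of the average; both are illustrative assumptions, and the function names are hypothetical rather than part of any device interface.

```python
def average_to_code(samples_c):
    """Average temperature samples (deg C) and encode as an 8-bit twos-complement byte.

    Assumes 1 deg C per LSB; results outside -128..+127 are clamped.
    """
    if not samples_c:
        raise ValueError("at least one sample is required")
    mean = sum(samples_c) / len(samples_c)
    code = max(-128, min(127, int(round(mean))))
    return code & 0xFF  # keep the low 8 bits (twos-complement encoding)


def code_to_celsius(byte):
    """Decode an 8-bit twos-complement byte back to signed deg C."""
    return byte - 256 if byte & 0x80 else byte


if __name__ == "__main__":
    noisy = [84.0, 86.0] * 8  # 16 samples with +/-1 deg C of assumed noise
    code = average_to_code(noisy)
    print(f"code=0x{code:02X}  temperature={code_to_celsius(code)} C")
```

Averaging 16 readings reduces uncorrelated noise at the cost of a longer measurement, which is an acceptable trade here because die temperature changes slowly compared with the electrical noise being rejected.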

Figure 3. Signal conditioning.

So how good is the TDM approach in practice?

It is interesting to compare a TDM measurement with a more traditional thermistor approach. The following example compares results obtained using a thermistor and a TDM channel to measure the temperature of a 333-MHz Pentium® II in a Slot 1 cartridge. The thermistor is in direct physical contact with the cartridge heatsink. The TDM channel uses the on-chip diode and an ADM1021 (with circuitry similar to that discussed above) to provide the signal conditioning.

Besides being more accurate, the TDM approach does not suffer from thermal lag. While contact with the heat sink is superior to earlier approaches, where contact was not even made, the thermistor still exhibits major disadvantages. As may be seen in Figure 4c, the thermistor completely misses many of the thermal events due to its slow response time. Figure 4a shows a power-up event, while Figure 4b shows a power-down. Errors in excess of 30°C (representing cartridge temperature instead of actual chip temperature) are evident.

Even more significant is Figure 4c, where the CPU is cycled in and out of Suspend mode. The thermistor completely misses these 20°C thermal events. It's easy to see how it would fail to protect a system in the event of a rapid temperature rise due to a fault condition. All plots also demonstrate the offset error (due to package temperature drop) between the TDM and the thermistor as the temperature increases. The offset can be taken care of by system calibration, but there is nothing one can do to compensate for thermal lag. Indeed, if additional system cooling were employed, the errors between the TDM and the thermistor would be greater still.

Figure 4a. Start up.
Figure 4b. Shut down.
Figure 4c. Suspend/wake up cycling.

Using Discrete Transistors for TDM

The TDM approach, then, is very effective if the sensing diode is integrated onto the die of the CPU whose temperature is being measured. What about using this approach to measure temperatures where there isn't an on-chip TDM, or to measure the temperature of heat sources other than ICs? The ΔVBE TDM approach may also be used with stand-alone discrete transistors. Any NPN or PNP general-purpose transistor, such as the 2N3904 or 2N3906, may be used as a remote sensor. With a discrete transistor, connect the base to the collector to form a two-terminal device. Transistors are good temperature sensors since they have low thermal mass and are easily mounted.

Figure 5. Transistor TDM.

If the transistor sensing junction is a significant distance away (>6 feet) and is used in a noisy environment, the best way to preserve signal integrity and prevent interference is to use shielded twisted-pair cable. The maximum cable length is limited by cable capacitance and by series resistance. Capacitance between D+ and D- causes settling-time errors, since the switched current needs to have fully settled before the conversion is made.

Figure 6. Transistor TDM plus ADM1021 for remote sensing.

TDM in Noisy Environments

It is very important to observe some guidelines when utilizing thermal diode sensing techniques, especially in noisy environments. The PC environment is inherently noisy and appears to be getting noisier as PCs get faster. As CPU speeds hurtle toward 1 GHz, EMC noise becomes more of a headache. High-speed graphics ports (AGP), high-speed random-access memory, and high-speed disk access mean that there are many opportunities and paths for noise to couple into sensitive analog circuitry. TDM is a very sensitive approach. The circuitry that drives the thermal diode consists of high-impedance, low-level current sources. To prevent interference, the TDM lines should be kept as short as possible and shielded if there are high-frequency noise sources in the vicinity.

Additional features on the ADM1021

In addition to the TDM channel, the ADM1021 includes an on-chip transistor for local or environmental temperature monitoring. A programmable conversion rate (from 1 conversion per 16 seconds up to 8 conversions per second) facilitates high update rates, where rapid temperature changes must be recorded. If fast updates are not required, then lower update rates may be used to conserve power.

The ADM1021 also contains four limit registers to store local and remote, high and low temperature limits. A functional block diagram is shown in Figure 7. A typical system configuration using the ADM1021 is illustrated in Figure 8.

Figure 7. ADM1021 functional block diagram.
Figure 8. System Architecture.

TDM CPU monitoring facilitates optimum cooling

A thermal profile of a real-life notebook computer is illustrated in Figure 9. This shows how the temperature ramps up on the chip and in the computer environment once the monitoring utility starts running (after power-on and Windows boot). It is interesting to note how hot both the CPU and the internal environment are running. The BIOS sets a CPU temperature limit of 92°C. When this temperature is reached, the fan is switched on and remains on until the temperature drops below 82°C. Because both high and low limits are programmed into the ADM1021, the fan can hold the CPU temperature within a band between 82°C and 92°C. The temperature will oscillate between these two levels. If the fan fails, a higher temperature limit will shut the system down if it is breached. It is also interesting to note that the environment within a notebook housing reaches very high temperatures as well, approximately 10°C below the CPU temperature.

This example illustrates the importance of TDM techniques in CPU temperature management. Prior to this technique, it would have been impossible to extract such high performance levels from the CPU without overheating, or without unduly wasting battery life by continuously cooling the system.
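
The two-limit fan control described in this example amounts to a simple hysteresis loop, sketched below in Python. The 92°C and 82°C limits come from the notebook example; the 100°C shutdown limit and the class and method names are hypothetical, chosen only to make the sketch complete.

```python
class FanController:
    """Hysteresis fan control with a separate over-temperature shutdown limit."""

    def __init__(self, high_c=92, low_c=82, shutdown_c=100):
        # shutdown_c is an assumed value; the example only says a "higher" limit exists.
        self.high_c = high_c
        self.low_c = low_c
        self.shutdown_c = shutdown_c
        self.fan_on = False

    def update(self, cpu_temp_c):
        """Process one temperature reading and return the resulting action."""
        if cpu_temp_c >= self.shutdown_c:
            return "shutdown"
        if cpu_temp_c >= self.high_c:
            self.fan_on = True       # high limit reached: start cooling
        elif cpu_temp_c < self.low_c:
            self.fan_on = False      # dropped below low limit: stop cooling
        # Between the limits the fan keeps its previous state.
        return "fan_on" if self.fan_on else "fan_off"


if __name__ == "__main__":
    controller = FanController()
    for temp in (80, 91, 92, 88, 82, 81, 95, 101):
        print(temp, controller.update(temp))
```

Holding the previous fan state between the two limits is what prevents the fan from chattering on and off around a single threshold, and it is also why the CPU temperature oscillates within the band rather than settling at one value.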

Figure 9. Temperature profile from a typical notebook computer.


Matt Smith