Management of the electrical, mechanical, and thermal environment is of growing importance in today's microprocessor-based systems. This article focuses on accurate voltage and temperature monitoring, along with other hardware monitoring requirements. We consider techniques for making accurate measurements and offer solutions for the range of hardware monitoring tasks in Pentium II-based systems, as embodied in the ADM9240 chip.
As designers of ICs and systems seek to squeeze every last morsel of performance out of their designs, hardware monitoring and control have become an integral part of circuit board design objectives. Examples include the need to maintain accurate supply voltage levels and continually dispose of the heat generated by high-performance chips. The feedback portion of the control loop is accomplished by hardware monitoring -- the continuous measurement of critical system parameters, such as power supply voltages, internal temperatures, cooling fan performance, and other environmental factors. System performance can be maximized by closely controlling these parameters to remain within tight limits so as to maintain optimum operating conditions for the circuitry and avoid reduction of component life.
The latest generation of Intel Pentium-based products clearly demonstrates the importance of hardware monitoring and control. The newest Pentium II microprocessors running with clock rates in excess of 450 MHz require complex, highly regulated supply voltages -- a simple +5 or +3.3-V power supply no longer suffices. Instead, the Pentium core logic demands digitally adjustable voltages ranging from 1.3 to 3.5 V with 50-mV resolution. The actual voltage required depends on a number of factors. Controlling the voltage to this degree is not a trivial requirement when one considers the dynamically varying nature of the currents flowing through the resistance and inductance of the printed-circuit-board (PCB) traces. In addition to the requirements of the processor chip, a typical system will require at least 4 other regulated supplies, +12V, -12V, +5V, and +3.3V, for other functions, such as disk drives, video circuitry, PC cards etc. In order to maintain long-term reliability, all these voltages must be accurately monitored and controlled. Simple voltage-comparator types of circuits can be used to monitor fixed supplies; but monitoring variably loaded high-accuracy supplies requires more sophisticated solutions involving analog-to-digital conversion of voltage levels.
Besides tightly controlled operating voltages, many of today's systems also rely on thermal management approaches, such as active heat sinking, convection cooling, and forced-air cooling, to maintain reliable operating conditions. As ICs and systems become faster, more complex, and denser, removing the excess heat and maintaining safe, reliable operating temperatures has become increasingly important. Temperature sensing, often coupled with fan-speed monitoring and control, is one of the techniques being employed today to ensure system reliability. Controlling fan speed yields greater efficiency, reduced power dissipation, and lower noise levels.
Another important area that benefits by effective hardware monitoring is total cost of ownership (TCO). All the vital functions are monitored continuously and the results communicated to the systems management software. Impending failures can be detected, the sources identified, and corrective action taken -- or even system shutdown invoked -- before expensive damage occurs. For example, a clogged-up cooling fan may be detected by monitoring its speed. When the speed has decreased by 10 to 15% from its nominal speed, the software can note the problem and shut down the system before the deterioration has caused additional damage. Replacing a fan for $10 is more appealing than replacing a $1000 CPU or an even more-expensive system board.
Multi-channel voltage, temperature and fan-speed monitoring, together with programmable limit-setting for each of these parameters, goes a long way towards meeting the monitoring and control objective. We will discuss below some specific techniques for achieving this in Pentium-based systems. The demonstration will employ a new monitoring IC, ADM9240, from Analog Devices, to show approaches to optimizing the monitoring strategy.
As many as six voltage channels require monitoring in a typical system. Typical system supplies include a combination of some or all of the following: +12, -12, +5, +3.3, +2.7, and +2.5 V.
With so many (and different) supplies to keep track of, a multiplexed data-acquisition system with digital readout provides the greatest flexibility. An A/D converter-based solution facilitates software control and limit setting. Once converted to the digital domain, the data is easily manipulated, processed, and stored for historical reference.
Here are a few of the considerations that must be addressed so that the signals are accurately converted from analog to digital: Since the supplies being measured are usually generated using switched mode techniques, noise introduced by the switching can make their voltage difficult to monitor accurately. Switching glitches and load-dependent voltage excursions can be a source of spurious alarms. Thus it is important that the monitoring circuitry reject supply glitches and excursions but still be fast enough to detect when the supply is really out of tolerance. When the supply is indeed out of tolerance, it is important to report it so that the situation can be dealt with as quickly as possible to avoid errors in system performance or even damage.
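One software-side way to reconcile glitch rejection with timely detection is to raise an alarm only after several consecutive out-of-limit readings. The sketch below illustrates that idea only; it is not the ADM9240's mechanism (which filters at its inputs), and the class name, limits, and sample count are assumptions chosen for illustration.

```python
class LimitMonitor:
    """Flag a supply only after `required` consecutive out-of-limit samples.

    Hypothetical helper: the limits and the run length are illustrative values.
    """

    def __init__(self, low, high, required=3):
        self.low = low            # lower limit in volts
        self.high = high          # upper limit in volts
        self.required = required  # consecutive bad samples needed for an alarm
        self.bad_run = 0          # current run of out-of-limit samples

    def update(self, volts):
        """Feed one reading; return True once the fault is considered real."""
        if self.low <= volts <= self.high:
            self.bad_run = 0      # a good sample resets the run (glitch rejected)
        else:
            self.bad_run += 1
        return self.bad_run >= self.required


# A single spike on a +5-V rail is ignored; a sustained sag is reported.
monitor = LimitMonitor(low=4.75, high=5.25, required=3)
readings = [5.00, 5.40, 5.01, 4.60, 4.58, 4.55]
alarms = [monitor.update(v) for v in readings]
```

A larger `required` value rejects longer excursions at the cost of slower detection, which is the trade-off described above.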
The input circuitry of the ADM9240 (Figure 1) serves the dual role of
a. Filtering the input signals
b. Attenuating the input levels to scale them to the reference voltage of the A/D converter
Having the attenuation network integrated on-chip provides an important advantage. Any errors it introduces due to inaccurate resistors or mismatch are already included in the specifications for the channel, so the user does not need to further increase the system error budget.
The input range on the ADM9240 is also biased so that nominal input voltage levels correspond to ¾ full-scale on the ADC (Figure 2). This scaling leaves the top 25% of the range for overvoltage and the lower 75% for undervoltage, all the way down to total failure. Having most of the dynamic range at the lower end takes into account the majority of cases of error and also allows greater flexibility, since it is possible to monitor lower-voltage supplies (specified at less than the standard levels listed), but with some reduction of accuracy.
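The ¾-full-scale convention can be sketched as a pair of conversions between ADC codes and supply voltages. The ¾ fraction comes from the text; the 8-bit resolution and the function names are assumptions made for this illustration.

```python
NOMINAL_FRACTION = 0.75  # nominal input sits at 3/4 of ADC full scale (from the text)
ADC_BITS = 8             # assumption: the converter resolution is not stated here


def full_scale_voltage(nominal_volts):
    """Input voltage that corresponds to ADC full scale for a given nominal rail."""
    return nominal_volts / NOMINAL_FRACTION


def code_to_volts(code, nominal_volts, bits=ADC_BITS):
    """Convert a raw ADC code back to the voltage at the supply."""
    return code * full_scale_voltage(nominal_volts) / (1 << bits)


def volts_to_code(volts, nominal_volts, bits=ADC_BITS):
    """Convert a supply voltage (for example, a limit) to the nearest ADC code."""
    code = round(volts * (1 << bits) / full_scale_voltage(nominal_volts))
    return max(0, min((1 << bits) - 1, code))  # clamp to the valid code range
```

Under these assumptions a supply at exactly its nominal value reads code 192, and limits for any channel can be computed with `volts_to_code` before being programmed.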
Monitoring the Core Voltage
Besides monitoring the fixed supplies, Pentium II-based systems also require accurate monitoring of the processor core voltage, VCCP. Today's Pentium II (P2) processors use a 5-bit VID (voltage identification) code, up from 4 bits on previous-generation products. Depending on the VID code provided by the P2, the core voltage can be set anywhere between 1.3 V and 3.5 V.
VID Code Table
The voltage monitoring requirements discussed earlier apply also to the VCCP supply, but the tolerances are much tighter. The A/D converter input range for monitoring this is set at 0 V to 3.6 V with ¾ full scale at 2.7 V. This provides sufficient dynamic range and accuracy to accommodate other processor core voltages, even beyond P2 requirements.
In dual processor systems, which may employ different processor core voltages, the ADM9240 makes available a second multiplexed input channel (VCCP2).
Monitoring Negative Voltages
Negative voltages can be monitored on positive input channels by inverting the signal's polarity. But this may not be cost effective -- it requires an inverting op amp and wastes chip "real estate". A lower cost scheme, using positive bias and an inverted interpretation of the range, can also be utilized. This is illustrated in Figure 3: Resistor R2 is biased up to +5 V; and the upper and lower limits (as well as the overvoltage (75%) and undervoltage (25%) ranges) will be transposed. Since the offset voltage will be dependent on the +5-V reference level, either an accurate +5-V reference should be used or -- if the 5V supply itself is used -- this input should be measured first and the -12-V supply's limits set accordingly.
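The positive-bias scheme can be sketched numerically as a two-resistor divider between the negative supply and the +5-V reference. The topology assumed here (R1 from the negative supply to the input pin, R2 from the pin to +5 V), the resistor values, and the neglect of the input's own loading are all assumptions for illustration; Figure 3 defines the actual circuit.

```python
def pin_voltage(v_neg, v_ref, r1, r2):
    """Voltage at the monitor input for a divider from v_neg (via r1) to v_ref (via r2)."""
    return v_neg + (v_ref - v_neg) * r1 / (r1 + r2)


def supply_from_pin(v_pin, v_ref, r1, r2):
    """Invert the divider: recover the negative supply from the measured pin voltage."""
    return (v_pin * (r1 + r2) - v_ref * r1) / r2


# Illustrative values only: 40 kOhm and 10 kOhm with a measured +5-V reference.
R1, R2, V_REF = 40e3, 10e3, 5.0
nominal_pin = pin_voltage(-12.0, V_REF, R1, R2)    # about 1.6 V
recovered = supply_from_pin(nominal_pin, V_REF, R1, R2)
```

Because a more negative supply produces a lower pin voltage, the high and low limits swap roles, and any error in `v_ref` shifts the result, which is why the +5-V level should be measured first.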
Temperature monitoring enhances reliability as well as allowing the efficiency of a close approach to maximum performance. It can also serve to protect the system against overheating if the cooling system fails completely or deteriorates to the point of inadequacy.
Silicon sensors are becoming increasingly important as temperature transducers in electronic systems because they are linear, accurate, cheap, and reliable, and can be incorporated on the same IC as other analog or digital functions. They take advantage of the relationship between base-emitter voltage (VBE) and current density (current/emitter area) in silicon bipolar junction transistors to generate a voltage proportional to absolute temperature (PTAT). If currents in a fixed ratio, r, flow through two identical transistors (or if equal currents flow through one transistor and a set of r identical paralleled transistors), the differential VBE is PTAT. Figure 4 is a circuit that illustrates the principle. A1 and A2 are emitter areas, IS is reverse saturation current, and k/q is the ratio of Boltzmann's constant to electron charge, about 86 µV/K.
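For an ideal pair of junctions, the differential voltage follows the standard relation ΔVBE = (kT/q)·ln(r). The short sketch below evaluates it; the ratio of 8 and the 300-K temperature are illustrative assumptions, not values taken from Figure 4.

```python
import math

K_OVER_Q = 8.617e-5  # Boltzmann's constant / electron charge, in volts per kelvin


def delta_vbe(temp_kelvin, ratio):
    """Ideal PTAT voltage for two junctions run at a current-density ratio `ratio`."""
    return K_OVER_Q * temp_kelvin * math.log(ratio)


# With a ratio of 8 at 300 K the differential voltage is roughly 54 mV,
# and it scales linearly with absolute temperature.
example_mv = delta_vbe(300.0, 8) * 1000.0
```

The small size of this signal (a few tenths of a millivolt per kelvin) is why on-chip amplification and scaling precede the ADC.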
On the ADM9240, on-chip temperature sensing uses an additional multiplexer channel (Figure 5). When cycling, the analog inputs and the temperature channel are each selected in turn by the multiplexer and converted into a digital quantity by the ADC.
Offsetting, scaling and data manipulation provide a twos-complement output. Although the theoretical temperature span is from -128°C to +127°C, practical device and package constraints limit it to about -40°C to +125°C.
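Decoding a twos-complement reading is a one-line operation in software. The sketch below assumes an 8-bit reading with 1°C per count, which is what the -128°C to +127°C span implies; the function name is hypothetical.

```python
def raw_to_celsius(raw):
    """Interpret an 8-bit twos-complement value as degrees Celsius (1 degree per LSB)."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("expected an 8-bit value")
    # Values with the top bit set represent negative temperatures.
    return raw - 256 if raw & 0x80 else raw
```

For example, a raw value of 0x19 decodes to +25°C and 0xD8 decodes to -40°C.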
The location of the temperature sensor is important to accurate temperature measurement. Ideally it should be in intimate physical contact with the object being measured. This is not always possible, especially when a single sensor is just one function on a multifunction IC where other considerations must be taken into account. If direct thermal contact is not possible, it is important to characterize the difference between the temperatures of the sensor and the desired measurement point. In this way, a known offset may be used to compensate for the temperature difference.
Fan Speed Measurement
Fan speed sensing provides an invaluable early warning signal of potential problems. Fans are a weak mechanical component in an otherwise highly reliable electronic system. While modern brushless fans are much more reliable than earlier brush types, they are still prone to mechanical wear and tear. Bearing wear and increased friction slow the fan's rotational speed, resulting in reduced airflow. If speed is continually monitored, the telltale signs that predict trouble can be picked up well before insufficient cooling causes serious problems.
Modern fans are available with tachometer outputs (usually two pulses per revolution), to facilitate speed monitoring. Rotational speed is ascertained by simply counting the number of pulses over a fixed period of time.
While this is the simplest possible scheme for speed monitoring, it is slow. For example, with a fan operating at less than 1000 RPM, several seconds would be needed to accumulate a reasonably large and accurate count.
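The slowness of direct pulse counting is easy to quantify. The sketch below assumes two tachometer pulses per revolution, as mentioned above; the target of 100 pulses (roughly 1% resolution) is an illustrative assumption.

```python
def pulses_counted(rpm, window_s, pulses_per_rev=2):
    """Number of tachometer pulses seen in a fixed counting window."""
    return rpm * pulses_per_rev * window_s / 60.0


def window_for_pulses(rpm, pulses, pulses_per_rev=2):
    """Counting time needed to accumulate a given number of pulses."""
    return pulses * 60.0 / (rpm * pulses_per_rev)


# At 1000 rpm only about 33 pulses arrive per second, so accumulating
# 100 pulses takes about 3 seconds.
seconds_needed = window_for_pulses(1000, 100)
```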
The technique employed on the ADM9240 does not count the fan tacho output pulses directly. Instead it uses the tacho output as a gating signal for a high-frequency internal clock. By counting the number of gated pulses, the fan's period may be determined. The accumulated count is proportional to the fan's tacho period and inversely proportional to speed.
Specifically, an on-chip 22.5-kHz oscillator is gated into the input of an 8-bit counter for two periods of the fan tacho output, corresponding to the time for one revolution of the fan (Fig. 6).
To accommodate fans of different speed, a pre-scaler (divisor) may be added before the counter. Consider the following example using the ADM9240.
With a divisor of 2, a fan with two output pulses per revolution running at 4000 rpm gives a count of 168. That is, at 4000 rpm there are 8000 pulses per minute, or 133.3 pulses per second, so the interval between pairs of pulses is 15 ms, giving a count of 0.015 × (22,500/2) = 168.75.
As the fan slows down, the count increases to the counter's maximum count of 255, which occurs at 4000 × (168.75/255) ≈ 2647 rpm.
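The arithmetic of this example can be reproduced with a short sketch. The 22.5-kHz clock, the 8-bit counter, and the one-revolution gate are taken from the text; the function names are hypothetical, and the gate is assumed to span exactly one revolution (two tacho periods of a two-pulse fan).

```python
CLOCK_HZ = 22_500   # on-chip oscillator frequency (from the text)
MAX_COUNT = 255     # full scale of the 8-bit counter


def fan_count(rpm, divisor=2):
    """Ideal (unrounded) count accumulated during one fan revolution."""
    rev_period_s = 60.0 / rpm               # time for one revolution
    return rev_period_s * CLOCK_HZ / divisor


def rpm_from_count(count, divisor=2):
    """Fan speed implied by a counter reading."""
    return 60.0 * CLOCK_HZ / (count * divisor)


count_at_4000 = fan_count(4000)             # 168.75
slowest_rpm = rpm_from_count(MAX_COUNT)     # about 2647 rpm
```

Choosing a larger divisor lowers the count for a given speed, which extends the measurable range toward slower fans at the cost of resolution.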
Interfacing Fan Tachometer Outputs to 5V/3V Logic
Because fans are generally powered from voltages higher than the logic/monitoring circuitry, it is necessary to provide an interface that does not overstress the logic or forward bias some internal junctions. Voltage clamping using a resistor/Zener diode network provides a good solution (Figure 7). The Zener breakdown voltage should be chosen so that it is lower than the power supply voltage to the logic. With 5-V logic, a 4.0-V Zener is suitable.
Controlling the Fan Speed
If the fan moves more air at rated voltage and ambient temperature than is needed for adequate cooling, its speed may be controlled to reduce acoustic fan noise and power consumption while maintaining the temperature at a safe level.
The simplest form of control is linear adjustment of the supply voltage to the fan. For example the speed of a 12-V fan may be limited by adjusting the supply to voltages less than 12 V.
However, one must take into consideration that the fan may not start up reliably if the supply voltage is fixed at a low value. With a D/A converter to vary the speed (Figure 8), the fan can be started at a higher speed, then slowed down to the correct value. With a 12-V fan, the minimum reliable operating voltage may be as high as 6 or 7 V, allowing for a considerable range of adjustment.
The ADM9240's 8-bit DAC can be used for fan speed control. The 1.25-V output of the DAC will need an external amplification/current-boosting stage to drive the fan.
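A minimal sketch of the start-fast-then-slow-down approach follows. The 8-bit resolution and 1.25-V output come from the text; the external stage gain (chosen so that full scale yields 12 V), the 7-V minimum running voltage, the linear code-to-voltage mapping, and the function names are assumptions for illustration.

```python
DAC_FULL_SCALE_V = 1.25                 # DAC output at maximum code (from the text)
DAC_MAX_CODE = 255                      # 8-bit DAC
STAGE_GAIN = 12.0 / DAC_FULL_SCALE_V    # hypothetical external amplifier gain


def dac_code_for_fan_voltage(fan_volts, min_run_volts=7.0, max_volts=12.0):
    """DAC code that produces the requested fan voltage, clamped to a safe range."""
    clamped = max(min_run_volts, min(max_volts, fan_volts))
    dac_volts = clamped / STAGE_GAIN
    return round(dac_volts / DAC_FULL_SCALE_V * DAC_MAX_CODE)


def startup_sequence(target_volts):
    """Codes to write in order: full voltage to start the fan, then the target."""
    return [dac_code_for_fan_voltage(12.0), dac_code_for_fan_voltage(target_volts)]


codes = startup_sequence(9.0)
```

In a real system a delay, or a check of the measured fan speed, would separate the two writes so that the fan is known to be turning before its voltage is reduced.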
Control and Monitoring over a LAN
Hardware monitoring and control may be further extended to operate over a network so that the health of an entire network of computers can be continually monitored. Intel's LANDesk Client Manager (LDCM) is an example of network management software that can monitor and troubleshoot individual workstations over a network and give the systems administrator early warning of potential problems.
Many network problems occur as consequences of the installation of incompatible software -- or hardware -- by an inexperienced user. Both can be monitored remotely over the LAN. Chassis intrusion sensors can be used to detect unauthorized tampering with a system. Typical sensors include simple microswitches, reed switches, Hall-effect switches, or even optical sensors. When the case is opened, the switch is toggled or the optical beam is broken.
The ADM9240 includes an input line that can be connected to a chassis intrusion switch to alert the monitoring system. Generally the switch is wired to a latching circuit, using a flip-flop or a thyristor, so that an intrusion event is retained after the case is closed again. The latch must therefore be resettable by the systems administrator. On the ADM9240, the chassis intrusion line can also be temporarily configured as an output line so that a Clear pulse may be sent to clear the latch.
Figure 9 shows a block diagram of the complete ADM9240. With its combination of voltage, temperature, fan, and chassis intrusion monitoring, a better-controlled operating environment is available for the electronics. The benefits to the user are increased stability and reliability and reduced ownership costs.
Figure 10 illustrates a complete monitoring solution using the ADM9240. This circuit is suitable for Pentium II-type motherboards. All six power supplies are simultaneously monitored for either overvoltage or undervoltage conditions which would pose a threat to the electronics. The high and low limits are programmed over a 2-wire Systems Management Bus (SMBus). The master controller is generally a PIIX4 Southbridge chip but could also be a dedicated microcontroller. In addition to voltage monitoring, the circuit monitors the speed of a pair of cooling fans via J2 and J3. One of these fans (J3) is being speed controlled to limit acoustic noise using the DAC on the ADM9240, while the second fan runs continuously at full speed. Linear speed control provides a reliable, low-noise solution to maintaining low acoustic emissions. Unauthorized tampering with the system is detected and latched via an optical, mechanical or magnetic switch -- appropriately located to avoid tampering -- connected to J1. Note that the detection and latching circuitry is being powered from a backup battery so that monitoring continues even when the system is unplugged. When asserted, the latching logic may be cleared by the systems administrator, using a special command to the ADM9240.
The Next Generation
As this was written, the next generation hardware monitoring solution was in development; and it is just becoming available. The ADM1024 embodies all the principles described above. In addition, it includes thermal diode-monitoring (TDM) techniques. This revolutionary technique allows the temperature of the Pentium die itself to be constantly monitored via a diode-connected transistor on the Pentium II. By switching two different currents through the on-chip diode and measuring the minute change in the diode's forward voltage, the die temperature may be accurately ascertained by a multiplexed bandgap measurement similar in philosophy to that described in Figure 4.
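The two-current measurement can be sketched with the ideal diode relation: the change in forward voltage is (kT/q)·ln(I2/I1), so temperature follows directly from the measured difference. The current ratio of 10, the ideality factor of 1, and the function name below are assumptions for illustration; they are not ADM1024 specifications.

```python
import math

K_OVER_Q = 8.617e-5  # Boltzmann's constant / electron charge, in volts per kelvin


def die_temperature_kelvin(delta_v, current_ratio, ideality=1.0):
    """Temperature implied by the forward-voltage change between two bias currents."""
    return delta_v / (ideality * K_OVER_Q * math.log(current_ratio))


# Round trip: a junction at 350 K switched between currents in a 10:1 ratio
# shows a forward-voltage change of about 69 mV.
delta_v_350 = K_OVER_Q * 350.0 * math.log(10)
recovered_kelvin = die_temperature_kelvin(delta_v_350, 10)
```

Because the result depends only on the ratio of the two currents, not on their absolute values or on the diode's saturation current, the measurement is largely independent of device-to-device variation.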
The principal benefit of this scheme is that accurate temperature measurements are obtained without the problems of positioning an external sensor for intimate contact. Since the sensor is now right at the point of measurement, it can be highly accurate, and thermal lag is completely eliminated. The ADM1024 contains the switched current sources as well as filtering and amplification input stages. As with the ADM9240, an on board ADC converts the temperature measurement into a digital reading.
In addition to these TDM channels, the ADM1024 contains additional configuration registers to provide even greater flexibility. Input channels may be configured as needed to measure fan speed or voltage or thermal inputs. Other channels may be configured to monitor Voltage Identification Bits (VID) or indeed may be used as interrupt monitoring inputs.
The combination of voltage, thermal, fan-speed, chassis, and VID monitoring with extra flexibility makes the ADM1024 suitable for a wide range of next-generation motherboard designs, whether desktop, server, or workstation.