Black box diagnostics use advanced digital controllers, providing an opportunity to revolutionize failure analysis of customer returns by improving the accuracy of failure information and reducing turnaround time of failure diagnostics.
What is in this Article?
- A digital controller IC that minimizes field returns using black box tool and online diagnostics
- Black box contents
- Data detection and recovery
- EEPROM lifespan and data retention
Power supply companies can borrow a concept from the aircraft industry: a black box that monitors operation and stores the data for review after a failure. This concept aids failure analysis of field returns, which can be costly in both time and money for companies and their customers. The added time pressure to diagnose the fault and deliver a comprehensive failure analysis report can further strain the vendor-customer relationship. Having the proper diagnostic tools to quickly debug and resolve the issue can mean the difference between the success and failure of the product. The ADP1055, an advanced PMBus™ digital controller IC for isolated power supply systems, can be configured to provide the in-circuit black box.
With online diagnostics and an in-circuit black box, problems can be mitigated, and what is learned from them leads to more robust design practices and better system knowledge in the long run. The in-circuit black box is a data recorder that captures the relevant and critical information about the power supply just before a critical event or interrupt. Besides power supplies, this concept can easily be applied to other systems.
Black Box Operation
The black box feature of the ADP1055 can record to its EEPROM vital data about the faults that cause the system to shut down. The black box diagnostics tool has two essential functions. First, the first flag ID feature records the first instance of failure, such as an overcurrent, voltage, or temperature fault. Second, when the controller encounters such a fault, a snapshot of the telemetry is captured (as shown in Figure 1). This information is saved to the embedded nonvolatile EEPROM, where it can be retrieved later for debugging purposes. In the presence of multiple faults, the black box captures the first flag ID that caused the system to shut down, along with all the telemetry information.
Because a digitally controlled power supply measures several parameters, the ADP1055 uses dedicated (not multiplexed) Σ-Δ ADCs, averaged over time, for each measurement, such as voltage, current, and temperature. To ensure that accurate data is captured, each measured quantity is recorded into the black box at the moment of shutdown.
This black box feature is extremely helpful in troubleshooting a failed system during testing and evaluation. If a system is recalled for failure analysis, it is possible to read this information from the EEPROM to help investigate the root cause of the failure.
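The first flag ID behavior described above can be pictured as a simple latch. The following is a minimal sketch, not the ADP1055 logic itself: the class name, the fault labels, and the telemetry fields are hypothetical and chosen only for illustration.

```python
class FirstFlagLatch:
    """Keeps the first fault that occurs and the telemetry snapshot taken with it."""

    def __init__(self):
        self.first_flag = None   # hypothetical label of the first fault seen
        self.snapshot = None     # telemetry captured together with that fault

    def report_fault(self, flag_id, telemetry):
        """Latch the fault if it is the first one; return True when it was latched."""
        if self.first_flag is not None:
            return False         # a fault is already latched; later faults are ignored
        self.first_flag = flag_id
        self.snapshot = dict(telemetry)  # copy so later readings cannot alter it
        return True


latch = FirstFlagLatch()
latch.report_fault("overcurrent", {"vout": 11.8, "iout": 42.0, "temp_c": 71})
latch.report_fault("overvoltage", {"vout": 13.9, "iout": 0.0, "temp_c": 72})
print(latch.first_flag, latch.snapshot)
```

Only the first reported fault and its snapshot survive, which mirrors how the first flag ID points to the fault that started the shutdown rather than to its side effects.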
Several options are available for recording to the black box, which include:
- No recording, black box disabled
- Only record telemetry just before the final shutdown
- Record telemetry of final shutdown and all intermittent retry attempts (if the device is set to shut down and retry)
- Record telemetry of the final shutdown, all retry attempts, and normal power-down operations using the CTRL pin or the OPERATION command (as described by PMBus)
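The four recording options form a strict hierarchy, which the sketch below models. The enum names and their numeric values are hypothetical; they are not the register encodings of the ADP1055.

```python
from enum import IntEnum


class Event(IntEnum):
    """Kinds of shutdown events, ordered by how selective a mode must be to log them."""
    FINAL_SHUTDOWN = 1
    RETRY_SHUTDOWN = 2
    NORMAL_POWER_DOWN = 3   # CTRL pin or PMBus OPERATION command


class RecordMode(IntEnum):
    """Hypothetical names for the four recording options listed above."""
    DISABLED = 0
    FINAL_ONLY = 1
    FINAL_AND_RETRIES = 2
    ALL_SHUTDOWNS = 3


def should_record(mode: RecordMode, event: Event) -> bool:
    """Each mode records every event that the less inclusive modes record, plus one more."""
    return int(event) <= int(mode)


for mode in RecordMode:
    recorded = [e.name for e in Event if should_record(mode, e)]
    print(mode.name, recorded)
```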
Black Box Contents
Two pages (Page A and Page B) of the EEPROM are dedicated to storing the black box contents. This allows for a total of 16 records (each page holds eight records of 64 bytes each). The two pages form a circular buffer for recording black box information, so older data is overwritten every 16 records.
The EEPROM is a page erase memory, meaning an entire page must be erased before the page can be written to. Due to the page erase requirement of the EEPROM, after writing the eighth record of any page, the next page is automatically erased to allow for continuous black box recording.
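The mapping from a record number to its place in the two-page circular buffer can be sketched as follows. The function name and the assumption that records are packed back to back from the start of each page are illustrative; the article gives only the record count and size.

```python
from typing import Tuple

RECORDS_PER_PAGE = 8    # from the text: eight records per page
RECORD_SIZE = 64        # from the text: 64 bytes per record
PAGES = ("A", "B")      # the two pages that form the circular buffer


def locate_record(record_number: int) -> Tuple[str, int, int]:
    """Return (page, slot within page, byte offset within page) for a record number."""
    if record_number < 0:
        raise ValueError("record number must be non-negative")
    page = PAGES[(record_number // RECORDS_PER_PAGE) % len(PAGES)]
    slot = record_number % RECORDS_PER_PAGE
    # Assumption: records are stored contiguously from the start of the page.
    return page, slot, slot * RECORD_SIZE


for n in (0, 7, 8, 15, 16):
    print(n, locate_record(n))
```

Record 16 lands in the first slot of Page A again, which is why Page A must be erased once Page B is full.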
Each time a record is written in the black box, the device increments the record number. Each black box write records the PMBus and manufacturer specific registers listed in Figure 1 and Figure 2.
A single black box record takes about 1.2 ms to program. However, there is an added page erase time that must be taken into consideration to ensure that the fault recording occurs successfully. In the ADP1055 device, eight records can be written per page, so whenever the record number equals 8n − 1 (n > 0), that is, the last record of a page, a page erase operation is initiated on the other page. The erase operation takes an additional 32 ms to complete. Hence the write of every record numbered 8n − 1 also requires a page erase, which brings the total recording time to 33.2 ms. The delay between each shutdown and retry cycle should be longer than the black box programming time, which is 1.2 ms at minimum and extends to 33.2 ms in the worst-case scenario.
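The timing rule can be checked with a few lines of arithmetic. This is a sketch using only the figures quoted above; the function names are hypothetical.

```python
PROGRAM_TIME_MS = 1.2       # time to program one record
PAGE_ERASE_TIME_MS = 32.0   # extra time when a page erase follows the write
RECORDS_PER_PAGE = 8


def triggers_page_erase(record_number: int) -> bool:
    """True for record numbers 7, 15, 23, ... (8n - 1 with n > 0)."""
    return record_number % RECORDS_PER_PAGE == RECORDS_PER_PAGE - 1


def recording_time_ms(record_number: int) -> float:
    """Total time needed to store this record, including any page erase."""
    total = PROGRAM_TIME_MS
    if triggers_page_erase(record_number):
        total += PAGE_ERASE_TIME_MS
    return total


# The retry delay should cover the slowest record in a full 16-record cycle.
worst_case_ms = max(recording_time_ms(n) for n in range(16))
print(round(worst_case_ms, 1))
```

Sizing the shutdown-to-retry delay for the worst case avoids a retry beginning while the other page is still being erased.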
Figure 3 shows the timing of the write operation. Another consideration in successful black box recording is the loss of the supply voltage, VDD, to the IC. The ADP1055 requires a constant VDD of 3.3 V for normal operation and black box operation. Typically, in an isolated dc-to-dc converter, an auxiliary or always-on supply provides the power to the controller. In other situations, a holdup capacitor on the VDD pin can be used to maintain the voltage above the UVLO threshold.
Black Box Readback
Two dedicated manufacturer specific commands can be used to read back the contents of the black box data stored in the EEPROM. The READ_BLACKBOX_CURR command is a block read command that returns the current record N (last record saved) with all related data, as defined in the Black Box Contents section. The READ_BLACKBOX_PREV command is a block read command that returns the data for the previous record N − 1 (next to last record saved). Because these commands are block read commands, the first byte received is called the BYTE_COUNT and indicates to the PMBus master how many more bytes to read.
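On the host side, handling a block read response amounts to honoring the BYTE_COUNT. The sketch below assumes the raw bytes have already been read from the bus by some PMBus master; the function name is hypothetical and no command codes are implied.

```python
def parse_block_read(raw: bytes) -> bytes:
    """Split a block read response into its payload using the leading BYTE_COUNT."""
    if not raw:
        raise ValueError("empty response")
    byte_count = raw[0]                   # first byte: how many data bytes follow
    payload = raw[1:1 + byte_count]
    if len(payload) != byte_count:
        raise ValueError("response shorter than BYTE_COUNT indicates")
    return payload


# Made-up example response: BYTE_COUNT = 3 followed by three data bytes.
print(parse_block_read(bytes([3, 0xAA, 0xBB, 0xCC])).hex())
```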
It is recommended to use the ADP1055 GUI for viewing black box data, as it displays the entire black box contents in an easy to read, user accessible format.
The black box feature in the ADP1055 uses packet error checking (PEC) to ensure data validity. A PEC byte at the end of each black box record is specific to that record and is calculated using a cyclic redundancy check (CRC-8) polynomial. In a write to EEPROM, the PEC byte is appended to the data and is the last valid byte of that record. In a read from EEPROM, the header block of each record is used to calculate an expected PEC code, and this internally calculated PEC code is compared to the received PEC byte. If the comparison fails, the PEC_ERR bit in the STATUS_CML register is set, and that record is discarded because the validity of the data has been compromised.
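A bitwise CRC-8 shows how such a PEC comparison works. The article does not state which CRC-8 polynomial the black box uses; this sketch assumes the polynomial used for SMBus packet error checking, x^8 + x^2 + x + 1 (0x07), with a zero initial value. The function names are hypothetical.

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8, most significant bit first (assumed polynomial 0x07)."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def pec_matches(header: bytes, received_pec: int) -> bool:
    """Compare the PEC computed over the header block with the stored PEC byte."""
    return crc8(header) == received_pec


header = bytes([0x00, 0x0B, 0x12, 0x34])   # made-up header contents
stored_pec = crc8(header)
print(pec_matches(header, stored_pec))          # intact record
print(pec_matches(header, stored_pec ^ 0x01))   # a single flipped bit in the PEC
```

With these parameters the standard check string "123456789" yields 0xF4, which is a convenient way to confirm an implementation before trusting it on real records.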
Data Detection and Recovery
The black box algorithm relies on having sufficient time to save the black box data and/or perform a page erase operation to prepare the black box for recording. If VDD collapses before the minimum programming time has elapsed, the data in the EEPROM can be corrupted. In addition, if VDD collapses during an EEPROM erase operation, the data inside the black box may also be corrupted.
In such a case, the black box algorithm can detect that data corruption has occurred and attempt to take corrective actions to resume proper black box recording. Note that the black box does not attempt to correct the corrupted data, but instead disregards the corrupted record and resumes with recording at a different record. The description below details this scenario.
During VDD power-up, the header block of every record in the two pages of the EEPROM is read to determine whether the record is valid. A record is valid if it passes the following tests:
- The header block and the PEC bytes must not be all ones, as that is the initial data of each record following a page erase.
- The calculated PEC code (using the data from the header block) must match the received PEC byte.
- The record number must fall within the valid record range: it must be greater than the current record number and less than the maximum record number.
If a record fails any of the above tests, that record is considered invalid and is discarded. If the record passes all tests, then the pointer to the last valid record number is updated. The information below shows two scenarios where a potential VDD collapse can cause corrupted data and its recovery process.
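The three tests can be written as a single validity check. This is a sketch under several assumptions: the PEC uses CRC-8 with polynomial 0x07 (not stated in the article), the record number is treated as already decoded from the header, and the `StoredRecord` layout is hypothetical.

```python
from dataclasses import dataclass


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with an assumed polynomial of 0x07 and zero initial value."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


@dataclass
class StoredRecord:
    record_number: int   # hypothetical: assumed already decoded from the header
    header: bytes        # header block covered by the PEC
    pec: int             # PEC byte read back from the EEPROM


def is_valid(rec: StoredRecord, current_record_number: int, max_record_number: int) -> bool:
    # Test 1: an erased slot reads back as all ones.
    if rec.pec == 0xFF and all(b == 0xFF for b in rec.header):
        return False
    # Test 2: the PEC calculated from the header must match the stored PEC.
    if crc8(rec.header) != rec.pec:
        return False
    # Test 3: the record number must lie within the valid range.
    return current_record_number < rec.record_number < max_record_number


hdr = bytes([0x00, 0x05, 0x12, 0x34])    # made-up header contents
good = StoredRecord(5, hdr, crc8(hdr))
erased = StoredRecord(0, b"\xff" * 4, 0xFF)
print(is_valid(good, 4, 16_000), is_valid(erased, -1, 16_000))
```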
Scenario 1
In a scenario where VDD collapses before a record is completed, the PEC byte, which is the last byte written, will most likely be corrupted. During the scanning process on VDD power-up, this record will fail Test 2 and will be discarded.
During the scanning process on power-up, if a record on Page A fails Test 2, then Page B will be erased and the next record pointer will be the first record of Page B. The same applies with the pages swapped. For example, consider a failed record on Page B:
Page A has:
0. Valid Rec_No_0
1. Valid Rec_No_1
2. Valid Rec_No_2
3. Valid Rec_No_3
4. Valid Rec_No_4
5. Valid Rec_No_5
6. Valid Rec_No_6
7. Valid Rec_No_7
Page B has:
8. Valid Rec_No_8
9. Valid Rec_No_9
10. Valid Rec_No_10
11. Valid Rec_No_11
12. Invalid Rec_No_12 (corrupted due to loss of VDD; will fail Test 2 on power-up)
13. Empty
14. Empty
15. Empty
At the end of the scanning process,
- The last valid record is Rec_No_11 of Page B
- Invalid Rec_No_12 of Page B is discarded, and the PEC_ERR is set
- Page A will be erased, and the next record for storing is Rec_No_16 of Page A
- READ_BLACKBOX_CURR returns Rec_No_11
- READ_BLACKBOX_PREV returns Rec_No_10
Note that the slots for Rec_No_12 through Rec_No_15 of Page B are lost, but that is an acceptable cost for resuming proper operation of the black box recording process.
Scenario 2
In the scenario where VDD collapses before a page erase is completed, the data on the entire page being erased can be corrupted. During the scanning process on VDD power-up, all the records of this page may fail Test 2 and may also fail Test 3, in which case the records will be discarded.
Page A has:
0. Valid Rec_No_0
1. Valid Rec_No_1
2. Valid Rec_No_2
3. Valid Rec_No_3
4. Valid Rec_No_4
5. Valid Rec_No_5
6. Valid Rec_No_6
7. Valid Rec_No_7 (black box recording was successful; however, the page erase was corrupted due to loss of VDD)
Page B has:
8. Corrupted due to loss of VDD during page erase following Rec_No_7
9. Corrupted due to loss of VDD during page erase following Rec_No_7
10. Corrupted due to loss of VDD during page erase following Rec_No_7
11. Corrupted due to loss of VDD during page erase following Rec_No_7
12. Corrupted due to loss of VDD during page erase following Rec_No_7
13. Corrupted due to loss of VDD during page erase following Rec_No_7
14. Corrupted due to loss of VDD during page erase following Rec_No_7
15. Corrupted due to loss of VDD during page erase following Rec_No_7
At the end of the scanning process,
- The last valid record is Rec_No_7 of Page A
- Invalid Rec_No_8 through Rec_No_15 of Page B are discarded, and the PEC_ERR is set
- Page B will be erased, and the next record for storing is Rec_No_8 of Page B
- READ_BLACKBOX_CURR returns Rec_No_7
- READ_BLACKBOX_PREV returns Rec_No_6
Note that in this scenario, there is no loss of records, as the incomplete Page B erase operation was restarted on power-up.
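Both scenarios can be reproduced with a small model of the power-up scan. The recovery rule used here, resuming at the first slot of the page that follows the last valid record and erasing that page, is inferred from the two examples; it is not a documented description of the ADP1055 algorithm. The slot encoding and function name are hypothetical.

```python
RECORDS_PER_PAGE = 8
PAGES = ("A", "B")
BAD = "bad"   # slot that failed a validity test; None marks an erased slot


def recover(slots):
    """Scan 16 slots (Page A then Page B) and decide where recording resumes.

    Valid slots hold their integer record number.
    """
    valid = sorted(s for s in slots if isinstance(s, int))
    last = valid[-1] if valid else -1
    prev = valid[-2] if len(valid) > 1 else None
    if any(s == BAD for s in slots):
        # Inferred rule: skip to the start of the next page and erase it.
        next_record = (last // RECORDS_PER_PAGE + 1) * RECORDS_PER_PAGE
        erase = PAGES[(next_record // RECORDS_PER_PAGE) % len(PAGES)]
    else:
        next_record, erase = last + 1, None
    return {"curr": last if valid else None, "prev": prev,
            "next": next_record, "erase": erase}


scenario_1 = list(range(12)) + [BAD, None, None, None]   # VDD lost while writing Rec_No_12
scenario_2 = list(range(8)) + [BAD] * 8                  # VDD lost while erasing Page B
print(recover(scenario_1))
print(recover(scenario_2))
```

In the first case the model reports Rec_No_11 as current, Rec_No_10 as previous, and resumes at Rec_No_16 after erasing Page A. In the second it reports Rec_No_7 and Rec_No_6 and resumes at Rec_No_8 after erasing Page B, matching the outcomes listed for each scenario.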
EEPROM Lifespan and Data Retention
The EEPROM of the ADP1055 is designed with the knowledge that power supplies operate in the field for long lifetimes. It has a data retention of up to 15 years at 125°C. During the lifetime of the power supply, there can also be many writes to the EEPROM, and the number of erase/program cycles is another limiting factor in data retention. To preserve data reliability, the ADP1055 limits the maximum number of fault records to either 158,000 (recommended when the ambient temperature of the ADP1055 is less than 85°C) or 16,000 (when the ambient temperature of the ADP1055 is less than 125°C).
Following each black box recording to EEPROM, the current record number is incremented. If a fault occurs when the current record number is greater than the maximum record number mentioned above, no additional black box recording is allowed, because the EEPROM has reached its maximum allowed erase/program cycles and any additional recording is unreliable. The MEM_ERR bit in the STATUS_CML register is set to indicate this condition.
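The record limits translate directly into a simple guard and into an approximate number of erase cycles per page. The sketch below uses only the figures from the text; the names are hypothetical, and the erase count assumes that each page is erased once for every 16 records written, as the circular buffer description implies.

```python
MAX_RECORDS_BELOW_85C = 158_000    # limit quoted for ambient temperature below 85°C
MAX_RECORDS_BELOW_125C = 16_000    # limit quoted for ambient temperature below 125°C
RECORDS_PER_BUFFER = 16            # two pages of eight records


def can_record(current_record_number: int, max_records: int) -> bool:
    """Recording stops once the current record number exceeds the limit."""
    return current_record_number <= max_records


def erases_per_page(max_records: int) -> int:
    """Approximate erase count per page over the allowed number of records."""
    return max_records // RECORDS_PER_BUFFER


for limit in (MAX_RECORDS_BELOW_85C, MAX_RECORDS_BELOW_125C):
    print(limit, erases_per_page(limit), can_record(limit + 1, limit))
```

Under that assumption, the two limits correspond to 9,875 and 1,000 erases per page, respectively.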
Figure 4 describes how the ADP1055, an advanced digital dc-to-dc controller with a black box feature, is used in a typical application. It is suitable for topologies such as full bridge, phase shifted, and active clamp forward. It has several features such as redundant OVP, average and peak overcurrent protection, GPIOs with mini-FPGA, and an active snubber. The ADP1055 can also function as a multiphase controller. No additional hardware is required to configure the black box. There is no firmware involved, as the ADP1055 is FSM (Finite State Machine)-based with dedicated logic for ease of use so the user does not have to learn any new programming language.
The black box feature can also be deployed effectively during the manufacturing flow to detect failures in burn-in and stress testing during product verification and the early stages of pilot production. The black box takes debugging of a power supply to the next level and provides a focused guide for troubleshooting a complicated system. This leads to fewer customer failures and improved reliability metrics, such as mean time between failures (MTBF), through elimination of design issues found by the black box recorder.