Understanding the DSSHA1 Synthesizable SHA-1 Coprocessor
Abstract
Challenge-and-response authentication requires a MAC originator and a MAC recipient to compute a message authentication code based on a hidden secret and public data. The originator is typically a SHA-1 authenticator or a protected memory with a SHA-1 engine; the MAC recipient is the application's host processor. This application note describes the DSSHA1 synthesizable SHA-1 coprocessor, which can be implemented in an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) as an alternative to the DS2460 SHA-1 coprocessor or a microprocessor-based implementation.
Introduction
Challenge-and-response authentication is based on the computation of message authentication codes (MACs). The method involves two entities, the MAC originator and the MAC recipient, which share a hidden secret. To verify the authenticity of the MAC originator, the MAC recipient generates a random number and sends it as a challenge to the originator. The MAC originator must then compute a new MAC based on the secret, message, and challenge, and send it back to the recipient. If the originator proves capable of generating a valid MAC for any challenge, it almost certainly knows the secret and can therefore be considered authentic. SHA-1, a thoroughly scrutinized and internationally standardized algorithm published by the National Institute of Standards and Technology (NIST), is widely used to compute message authentication codes.
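The protocol above can be sketched in a few lines of Python. This is a minimal illustration only: `stand_in_mac` is a hypothetical MAC that applies plain SHA-1 from the standard `hashlib` module to secret, message, and challenge concatenated together, which is NOT the message layout that Analog Devices SHA-1 devices use. The class and function names and the 8-byte challenge length are likewise assumptions chosen for the example.

```python
import hashlib
import hmac
import secrets


def stand_in_mac(secret: bytes, message: bytes, challenge: bytes) -> bytes:
    """Hypothetical MAC: plain SHA-1 over secret || message || challenge.
    Real Analog Devices parts define their own 64-byte message layout."""
    return hashlib.sha1(secret + message + challenge).digest()


class Originator:
    """MAC originator (for example, a secure memory) holding the shared secret."""

    def __init__(self, secret: bytes, message: bytes) -> None:
        self._secret = secret
        self.message = message

    def respond(self, challenge: bytes) -> bytes:
        return stand_in_mac(self._secret, self.message, challenge)


def authenticate(originator: Originator, secret: bytes, challenge_len: int = 8) -> bool:
    """MAC recipient: send a fresh random challenge and check the response."""
    challenge = secrets.token_bytes(challenge_len)  # length chosen arbitrarily
    expected = stand_in_mac(secret, originator.message, challenge)
    response = originator.respond(challenge)
    return hmac.compare_digest(expected, response)  # constant-time comparison


if __name__ == "__main__":
    device = Originator(secret=b"\x11" * 8, message=b"page data")
    print(authenticate(device, b"\x11" * 8))  # True: recipient shares the secret
    print(authenticate(device, b"\x22" * 8))  # False: wrong secret
```

Because a new random challenge is used for every run, a recorded response cannot simply be replayed; `hmac.compare_digest` is used so that the comparison time does not reveal how many leading bytes matched.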
Analog Devices manufactures a series of authentication devices that employ the SHA-1 algorithm. Tutorial 3675, "Protecting the R&D Investment with Secure Authentication," explains the Analog Devices authentication solution in the form of secure memories and the DS2460 SHA-1 coprocessor. The DSSHA1 memory-mapped SHA-1 coprocessor allows the computational capabilities of the DS2460 to be implemented in an application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA), eliminating the need to develop software to perform the complex SHA-1 computation. The MAC computed by the DSSHA1 or the DS2460 is intended for use only with Analog Devices SHA-1 devices.
Description
The DSSHA1 is a synthesizable, memory-mapped SHA-1 coprocessor that includes a 64-byte general-purpose RAM to store the 64-byte input message from which the SHA-1 MAC is computed. The DSSHA1 input and output port signals are designed to connect to an internal 32-bit bus. The host system compares the MAC computed by the DSSHA1 with the MAC returned by a slave accessory; a match authenticates the accessory.
Figure 1 shows the DSSHA1 block diagram, and Table 1 describes the signals that connect the DSSHA1 to the host system. The 64-byte SHA-1 message is written into the RAM through the data bus input, address, and control signals. Pulsing the RUN_SHA input high starts the SHA-1 computation, and the BUSY output stays high while the computation is in progress. When BUSY returns low, the result registers contain the 20-byte message digest, ready to be read.
Name | Type* | Function |
CLK | I | Clock. On the positive edge, data on DATAI[31:0] is clocked in and data on DATAO[31:0] is clocked out. |
RSTZ | I | Active-low reset. RSTZ is sampled on each positive edge of the CLK signal. A reset is required before each 512-bit message load and MAC computation. |
CSZ | I | Active-low chip select. This signal must be low for all accesses to registers and memory. |
WRZ | I | Active-low write enable. This signal must be low during all write operations. |
ADDR[4:0] | I | Address[4:0]. These five signals select the 32-bit word location (00h to 14h) being accessed. |
DATAI[31:0] | I | Data bus input. These 32 signals are the input data bus. |
DATAO[31:0] | O | Data bus output. These 32 signals are the output data bus. |
BUSY | O | Busy. When high, this signal indicates that the SHA-1 coprocessor is performing a computation. Do not access the RAM or registers while this signal is high. |
RUN_SHA | I | Run SHA-1. This signal must be exactly one clock period wide; it initiates a SHA-1 computation on the positive edge of the CLK signal. |
*I = input, O = output. |
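If RUN_SHA is derived from a level-sensitive source, such as a hypothetical software-writable "start" bit, one common way to meet the one-clock-period requirement is a rising-edge detector (in HDL terms, a register holding the previous value plus the logic `start & ~start_q`). The Python sketch below models that behavior, with one list entry per positive CLK edge; it illustrates the technique under that assumption and is not part of the DSSHA1 deliverables.

```python
from typing import Iterable, List


def one_cycle_pulses(levels: Iterable[int]) -> List[int]:
    """Cycle-by-cycle model of a rising-edge detector: the output is high for
    exactly one clock after the input goes from 0 to 1."""
    pulses = []
    previous = 0  # value captured by the input register on the previous edge
    for level in levels:  # one entry per positive CLK edge
        pulses.append(1 if (level and not previous) else 0)
        previous = level
    return pulses


if __name__ == "__main__":
    # A start bit held high for three clocks still yields a single-cycle pulse.
    print(one_cycle_pulses([0, 1, 1, 1, 0, 1, 1]))
```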
Detailed Register Description
The DSSHA1 memory consists of twenty-one 32-bit words, beginning with the input buffer and ending with the registers to read the MAC result (Table 2).
Address (Hex) | Type | Access | Function |
00h to 0Fh | RAM | Read/write | 64-byte input buffer. This is the 512-bit input block that usually includes the 64-bit slave device secret and a 448-bit input message consisting of a random challenge and other data. |
10h to 14h | Registers | Read | 20-byte result. This is the MAC to be compared with the MAC received from the SHA-1 slave device. |
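A host-side checker for bus cycles can combine the signal rules of Table 1 with the address map of Table 2. The sketch below is a hypothetical Python model: the `Access` categories and the `decode` function are illustrative names, and because this note does not specify how the DSSHA1 responds to an invalid access (a write to 10h to 14h, an address above 14h, or any access while BUSY is high), the function simply flags such cycles as INVALID.

```python
from enum import Enum


class Access(Enum):
    IDLE = "idle"
    WRITE_RAM = "write_ram"
    READ_RAM = "read_ram"
    READ_MRR = "read_mrr"
    INVALID = "invalid"


RAM_FIRST, RAM_LAST = 0x00, 0x0F  # 16-word input buffer (read/write)
MRR_FIRST, MRR_LAST = 0x10, 0x14  # 5-word MAC result (read only)


def decode(csz: int, wrz: int, addr: int, busy: int) -> Access:
    """Classify one bus cycle sampled at a rising CLK edge.
    CSZ and WRZ are active low, as in Table 1."""
    if csz:  # chip not selected
        return Access.IDLE
    if not 0 <= addr <= 0x1F:
        raise ValueError("ADDR[4:0] is a 5-bit field")
    if busy:  # Table 1: no data accesses while BUSY is high
        return Access.INVALID
    write = wrz == 0
    if RAM_FIRST <= addr <= RAM_LAST:
        return Access.WRITE_RAM if write else Access.READ_RAM
    if MRR_FIRST <= addr <= MRR_LAST and not write:
        return Access.READ_MRR
    return Access.INVALID  # write to 10h-14h, or address 15h-1Fh


if __name__ == "__main__":
    print(decode(csz=0, wrz=1, addr=0x12, busy=0))
```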
Input Buffer (00h to 0Fh)
The SHA-1 engine receives the data to be processed through the 64-byte input buffer, which holds the 512-bit message from which the SHA-1 engine generates a MAC. Because the input buffer holds the secret along with the other message data, protecting the secret is the designer's responsibility. The format of the data is defined by each Analog Devices SHA-1 slave device.
MAC Result (10h to 14h)
After a SHA-1 computation, the resulting 20-byte MAC resides in the MAC result address space.
Device Operation
Typical use of the DSSHA1 in an application involves writing the message, running the SHA-1 engine, reading the MAC result, and comparing that result, outside the DSSHA1, with the MAC of a 1-Wire SHA-1 device. All of these activities are controlled through a 32-bit interface with separate data input and output lines, which connects easily to an internal bus inside an ASIC or FPGA. The SHA-1 Engine Control section below explains the data input and output format and how to instruct the SHA-1 engine to perform a MAC computation.
SHA-1 Engine Control
The DSSHA1 performs the job of a SHA-1 engine. The input buffer accepts the message, and the MAC result registers receive the result of the SHA-1 computation. Figure 2 illustrates data flow into and out of the SHA-1 engine.
The first step in using the SHA-1 engine is to apply a reset (RSTZ low). Next, load a message into the input buffer in the format of Table 3. After the message is loaded, pulse the RUN_SHA input for one clock period. BUSY goes high and remains high for the duration of the SHA-1 computation, then returns low when the computation completes. At that point, all five MAC result registers (MRR, see Table 4) contain the MAC result for reading.
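The sequence above can be sketched as host-side Python code. The `DSSHA1Bus` interface and its method names (`reset`, `write`, `read`, `pulse_run_sha`, `busy`) are hypothetical placeholders for whatever glue logic or driver a design uses to drive RSTZ, CSZ/WRZ/ADDR/DATAI, RUN_SHA, and BUSY. `FakeBus` is a test double that returns placeholder values, not SHA-1 results. The polling loop also assumes that BUSY is already high at the first poll after the RUN_SHA pulse.

```python
from typing import List, Protocol, Sequence

MASK32 = 0xFFFFFFFF


class DSSHA1Bus(Protocol):
    """Hypothetical host-side access layer; method names are illustrative."""
    def reset(self) -> None: ...
    def write(self, addr: int, word: int) -> None: ...
    def read(self, addr: int) -> int: ...
    def pulse_run_sha(self) -> None: ...
    def busy(self) -> bool: ...


def run_dssha1(bus: DSSHA1Bus, words: Sequence[int], max_polls: int = 10_000) -> List[int]:
    """Reset, load 00h-0Fh, start, wait for BUSY low, return MRR 10h-14h (A..E)."""
    if len(words) != 16:
        raise ValueError("the input buffer holds exactly 16 32-bit words")
    bus.reset()  # reset before every message load
    for addr, word in enumerate(words):  # input buffer 00h..0Fh
        bus.write(addr, word & MASK32)
    bus.pulse_run_sha()  # one-clock-period RUN_SHA pulse
    # Assumes BUSY is already high at the first poll; no data accesses until it drops.
    for _ in range(max_polls):
        if not bus.busy():
            break
    else:
        raise TimeoutError("BUSY did not return low")
    return [bus.read(0x10 + i) for i in range(5)]


class FakeBus:
    """Test double: stays busy for three polls and returns placeholder values
    (NOT a SHA-1 result) so the control flow can be exercised without hardware."""

    def __init__(self) -> None:
        self.ram = [0] * 16
        self.mrr = [0] * 5
        self._polls_left = 0

    def reset(self) -> None:
        self.ram = [0] * 16
        self.mrr = [0] * 5

    def write(self, addr: int, word: int) -> None:
        self.ram[addr] = word

    def read(self, addr: int) -> int:
        return self.ram[addr] if addr < 0x10 else self.mrr[addr - 0x10]

    def pulse_run_sha(self) -> None:
        self._polls_left = 3
        total = sum(self.ram) & MASK32
        self.mrr = [(total + i) & MASK32 for i in range(5)]  # placeholder only

    def busy(self) -> bool:
        if self._polls_left:
            self._polls_left -= 1
            return True
        return False


if __name__ == "__main__":
    print(run_dssha1(FakeBus(), list(range(16))))
```

Polling with an upper bound keeps a stuck BUSY line from hanging the host; an interrupt-driven design could instead wait on the falling edge of BUSY.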
Figure 2. Data flow diagram.
M0[31:24] = (IB + 0) | M0[23:16] = (IB + 1) | M0[15:8] = (IB + 2) | M0[7:0] = (IB + 3) |
M1[31:24] = (IB + 4) | M1[23:16] = (IB + 5) | M1[15:8] = (IB + 6) | M1[7:0] = (IB + 7) |
M2[31:24] = (IB + 8) | M2[23:16] = (IB + 9) | M2[15:8] = (IB + 10) | M2[7:0] = (IB + 11) |
M3[31:24] = (IB + 12) | M3[23:16] = (IB + 13) | M3[15:8] = (IB + 14) | M3[7:0] = (IB + 15) |
M4[31:24] = (IB + 16) | M4[23:16] = (IB + 17) | M4[15:8] = (IB + 18) | M4[7:0] = (IB + 19) |
M5[31:24] = (IB + 20) | M5[23:16] = (IB + 21) | M5[15:8] = (IB + 22) | M5[7:0] = (IB + 23) |
M6[31:24] = (IB + 24) | M6[23:16] = (IB + 25) | M6[15:8] = (IB + 26) | M6[7:0] = (IB + 27) |
M7[31:24] = (IB + 28) | M7[23:16] = (IB + 29) | M7[15:8] = (IB + 30) | M7[7:0] = (IB + 31) |
M8[31:24] = (IB + 32) | M8[23:16] = (IB + 33) | M8[15:8] = (IB + 34) | M8[7:0] = (IB + 35) |
M9[31:24] = (IB + 36) | M9[23:16] = (IB + 37) | M9[15:8] = (IB + 38) | M9[7:0] = (IB + 39) |
M10[31:24] = (IB + 40) | M10[23:16] = (IB + 41) | M10[15:8] = (IB + 42) | M10[7:0] = (IB + 43) |
M11[31:24] = (IB + 44) | M11[23:16] = (IB + 45) | M11[15:8] = (IB + 46) | M11[7:0] = (IB + 47) |
M12[31:24] = (IB + 48) | M12[23:16] = (IB + 49) | M12[15:8] = (IB + 50) | M12[7:0] = (IB + 51) |
M13[31:24] = (IB + 52) | M13[23:16] = (IB + 53) | M13[15:8] = (IB + 54) | M13[7:0] = (IB + 55) |
M14[31:24] = (IB + 56) | M14[23:16] = (IB + 57) | M14[15:8] = (IB + 58) | M14[7:0] = (IB + 59) |
M15[31:24] = (IB + 60) | M15[23:16] = (IB + 61) | M15[15:8] = (IB + 62) | M15[7:0] = (IB + 63) |
Mt = input buffer of SHA-1 engine; 0 ≤ t ≤ 15; 32-bit words with a start address at 00h and ending address at 0Fh. IB = input buffer. |
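Table 3 is a big-endian mapping: the byte at IB + 4t lands in bits 31:24 of word Mt, and the byte at IB + 4t + 3 lands in bits 7:0. A minimal Python sketch of that conversion, assuming the 64 message bytes are already assembled in a `bytes` object, uses the standard `struct` module:

```python
import struct
from typing import List, Sequence


def pack_input_buffer(ib: bytes) -> List[int]:
    """Map IB bytes to words per Table 3: Mt[31:24] = IB + 4t, ..., Mt[7:0] = IB + 4t + 3."""
    if len(ib) != 64:
        raise ValueError("input buffer must be exactly 64 bytes")
    return list(struct.unpack(">16I", ib))  # big-endian: first byte -> bits 31:24


def unpack_input_buffer(words: Sequence[int]) -> bytes:
    """Inverse mapping, for example to inspect what was written to 00h..0Fh."""
    return struct.pack(">16I", *words)


if __name__ == "__main__":
    words = pack_input_buffer(bytes(range(64)))
    print(f"M0 = {words[0]:08X}h, M15 = {words[15]:08X}h")
```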
Table 4 shows how the five 32-bit variables A to E that hold the MAC are mapped to the respective locations.
Address (Hex) | MAC Result Registers (MRR) |
10h | MRR[31:0] = A[31:0] (least significant) |
11h | MRR[31:0] = B[31:0] |
12h | MRR[31:0] = C[31:0] |
13h | MRR[31:0] = D[31:0] |
14h | MRR[31:0] = E[31:0] (most significant) |
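When the five MRR words are read from 10h to 14h, they can be combined into a single 160-bit value with E as the most significant word and A as the least significant, as Table 4 specifies. A minimal Python sketch (the function name is illustrative):

```python
from typing import Sequence


def mac_value(mrr: Sequence[int]) -> int:
    """Combine MRR words read from 10h..14h (A..E) into one 160-bit integer,
    with E as the most significant word and A as the least significant."""
    if len(mrr) != 5:
        raise ValueError("expected five MRR words (10h..14h)")
    value = 0
    for word in reversed(mrr):  # E first, so it ends up in the top 32 bits
        value = (value << 32) | (word & 0xFFFFFFFF)
    return value


if __name__ == "__main__":
    print(f"{mac_value([1, 2, 3, 4, 5]):040X}")
```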
MAC Comparison
The master must compare the slave MAC with the MAC computed by the DSSHA1. If the two MACs are equal, the slave is verified as authentic; if they differ, authentication fails and the slave is considered fraudulent.
Functional Verification
The test message "abc" can be used to verify DSSHA1 functionality. With standard SHA-1 padding, this test message translates into the following input block:
W[0] = 61626380 | W[8] = 00000000 |
W[1] = 00000000 | W[9] = 00000000 |
W[2] = 00000000 | W[10] = 00000000 |
W[3] = 00000000 | W[11] = 00000000 |
W[4] = 00000000 | W[12] = 00000000 |
W[5] = 00000000 | W[13] = 00000000 |
W[6] = 00000000 | W[14] = 00000000 |
W[7] = 00000000 | W[15] = 00000018 |
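The padding shown follows the standard SHA-1 rule: append a single 1 bit (80h), zero-fill up to 448 bits, then append the message length in bits as a 64-bit big-endian value (here 24 bits = 18h). A Python sketch for messages short enough to fit in one 512-bit block:

```python
import struct
from typing import List


def pad_single_block(message: bytes) -> List[int]:
    """Apply standard SHA-1 padding and return W[0..15].
    Only messages of up to 55 bytes fit into a single 512-bit block."""
    if len(message) > 55:
        raise ValueError("message does not fit in one 512-bit block")
    block = message + b"\x80"                     # append a single 1 bit
    block += b"\x00" * (56 - len(block))          # zero-fill to 448 bits
    block += struct.pack(">Q", len(message) * 8)  # 64-bit big-endian bit length
    return list(struct.unpack(">16I", block))


if __name__ == "__main__":
    for t, w in enumerate(pad_single_block(b"abc")):
        print(f"W[{t}] = {w:08X}")
```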
Using the format of Table 3, the input block of this test message will be the values in Table 5.
M0[31:24] = 61h | M0[23:16] = 62h | M0[15:8] = 63h | M0[7:0] = 80h |
M1[31:24] = 00h | M1[23:16] = 00h | M1[15:8] = 00h | M1[7:0] = 00h |
M2[31:24] = 00h | M2[23:16] = 00h | M2[15:8] = 00h | M2[7:0] = 00h |
M3[31:24] = 00h | M3[23:16] = 00h | M3[15:8] = 00h | M3[7:0] = 00h |
M4[31:24] = 00h | M4[23:16] = 00h | M4[15:8] = 00h | M4[7:0] = 00h |
M5[31:24] = 00h | M5[23:16] = 00h | M5[15:8] = 00h | M5[7:0] = 00h |
M6[31:24] = 00h | M6[23:16] = 00h | M6[15:8] = 00h | M6[7:0] = 00h |
M7[31:24] = 00h | M7[23:16] = 00h | M7[15:8] = 00h | M7[7:0] = 00h |
M8[31:24] = 00h | M8[23:16] = 00h | M8[15:8] = 00h | M8[7:0] = 00h |
M9[31:24] = 00h | M9[23:16] = 00h | M9[15:8] = 00h | M9[7:0] = 00h |
M10[31:24] = 00h | M10[23:16] = 00h | M10[15:8] = 00h | M10[7:0] = 00h |
M11[31:24] = 00h | M11[23:16] = 00h | M11[15:8] = 00h | M11[7:0] = 00h |
M12[31:24] = 00h | M12[23:16] = 00h | M12[15:8] = 00h | M12[7:0] = 00h |
M13[31:24] = 00h | M13[23:16] = 00h | M13[15:8] = 00h | M13[7:0] = 00h |
M14[31:24] = 00h | M14[23:16] = 00h | M14[15:8] = 00h | M14[7:0] = 00h |
M15[31:24] = 00h | M15[23:16] = 00h | M15[15:8] = 00h | M15[7:0] = 18h |
Mt = input buffer of SHA-1 engine; 0 ≤ t ≤ 15; 32-bit words with a start address at 00h and ending address at 0Fh. |
The output of the computation of this block is:
A[31:0] = 42541B35
B[31:0] = 5738D5E1
C[31:0] = 21834873
D[31:0] = 681E6DF6
E[31:0] = D8FDF6AD
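These words are not the standard SHA-1 digest of "abc", which is A9993E36 4706816A BA3E2571 7850C26C 9CD0D89D. Working through the numbers shows that each listed word equals the corresponding standard digest word minus the SHA-1 initial hash value (67452301, EFCDAB89, 98BADCFE, 10325476, C3D2E1F0) modulo 2^32; for example, A9993E36 minus 67452301 is 42541B35. In other words, the listed result is the A to E working state after the 80 SHA-1 rounds, before the final addition of the initial hash values. The Python sketch below is a software reference model built on that observation (it is not the DSSHA1 Verilog) and may be useful as a cross-check when comparing simulation results:

```python
import hashlib
import struct
from typing import List, Sequence

H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """32-bit rotate left."""
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_rounds(block: Sequence[int]) -> List[int]:
    """Run the 80 SHA-1 rounds over one 16-word block and return A..E
    WITHOUT adding the initial hash values back in."""
    w = list(block) + [0] * 64
    for t in range(16, 80):  # message schedule expansion
        w[t] = rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1)
    a, b, c, d, e = H_INIT
    for t in range(80):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        temp = (rotl(a, 5) + (f & MASK) + e + k + w[t]) & MASK
        e, d, c, b, a = d, c, rotl(b, 30), a, temp
    return [a, b, c, d, e]


def to_standard_digest(state: Sequence[int]) -> bytes:
    """Feed-forward step that standard SHA-1 applies and the listed output omits."""
    return struct.pack(">5I", *((h + x) & MASK for h, x in zip(H_INIT, state)))


if __name__ == "__main__":
    abc = [0x61626380] + [0] * 14 + [0x18]
    state = sha1_rounds(abc)
    print(" ".join(f"{x:08X}" for x in state))
    # prints: 42541B35 5738D5E1 21834873 681E6DF6 D8FDF6AD
    assert to_standard_digest(state) == hashlib.sha1(b"abc").digest()
```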
Analog Devices SHA-1 devices transmit these words most significant word first, with the bytes of each word sent least significant byte (LSB) first. The byte-level transmission sequence of the MAC is therefore:
AD F6 FD D8 | F6 6D 1E 68 | 73 48 83 21 | E1 D5 38 57 | 35 1B 54 42 |
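The ordering rule can be expressed as a short Python sketch: iterate over the words from E down to A and emit each word little-endian. Note that the result is neither the big-endian nor the little-endian encoding of the whole 160-bit value. The function name is illustrative.

```python
from typing import List, Sequence


def transmission_bytes(state: Sequence[int]) -> List[int]:
    """Order [A, B, C, D, E] for transmission: most significant word (E) first,
    each word sent least significant byte first."""
    out: List[int] = []
    for word in reversed(state):  # E, D, C, B, A
        out.extend(word.to_bytes(4, "little"))  # LSB first within each word
    return out


if __name__ == "__main__":
    mac = [0x42541B35, 0x5738D5E1, 0x21834873, 0x681E6DF6, 0xD8FDF6AD]
    print(" ".join(f"{b:02X}" for b in transmission_bytes(mac)))
```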
Timing Specification
Figure 3 and Figure 4 show the timing diagrams for writing to and reading from the DSSHA1. Table 6 shows delay values measured from 50% of supply to 50% of supply using an ARM TSMC CL018G (0.18µm generic process) 1.8V SAGE-X standard cells library, version 2004q3v1, at +25°C. The output signals are unloaded, and input signals are driven with a standard slew of 0.200ns from 10% to 90% of supply.
Parameter | Symbol | Min | Max | Units |
CLK cycle (Note 1) | tCYC | 12.500 | | ns |
Chip select setup before rising edge of CLK (Note 1) | tCSS | 0.229 | | ns |
Chip select hold after rising edge of CLK (Note 1) | tCSH | 0.000 | | ns |
Address and data setup before rising edge of CLK (Note 1) | tAS | 0.229 | | ns |
Address and data hold after rising edge of CLK (Note 1) | tAH | 0.000 | | ns |
Active output time to DATAO valid (Notes 1, 2) | tAO | | 0.984 | ns |
Deactivate DATAO[31:0] (Note 1) | tD | | 0.984 | ns |
Note 1: These values depend upon the process used to realize the circuit. Values shown are for example purposes only and modeled using the ARM TSMC CL018G (0.18µm generic process) 1.8V SAGE-X standard cells library, version 2004q3v1. The ARM part number is A0082. |
Note 2: This time is defined as the longest possible delay to valid output for the typical corners. |
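As a quick worked check of what the clock-period figure implies, and assuming the 12.500ns tCYC value is the minimum clock period for this example library (Note 1 applies: other processes give other values), the highest usable clock frequency follows directly:

```latex
f_{\max} = \frac{1}{t_{CYC}} = \frac{1}{12.5\,\mathrm{ns}} = 80\,\mathrm{MHz}
```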
Applications Information
The DSSHA1 is designed to be integrated into an FPGA or ASIC. Combined with other modules, it provides a practical way to secure a design through authentication. In Figure 5, a design module with a microprocessor offloads the SHA-1 computation to the DSSHA1. The design first generates a random challenge, then compares the result from the DSSHA1 with the response received from the DS28E01-100. If the result and the response match, the DS28E01-100 is authenticated and the design can enable the product's functionality. Varying the authentication process in software and hardware is often desirable because it makes successful attacks less likely. Refer to application note 1098, "White Paper 3: Why are 1-Wire SHA-1 Devices Secure?" for more information.
Physical Estimates
- Gate count 6,423 (NAND 2x1 used for calculation).
- Area is 85,470µm² without routing.
- Area is 102,256µm² with routing estimate.
Library used for estimate:
ARM TSMC CL018G (0.18µm generic process) 1.8V SAGE-X standard cells library, version 2004q3v1. The ARM part number is A0082.
Verification
The industry typically denotes the level of verification of an IP block with the following conventions:
- Gold IP has been proven in target silicon.
- Silver IP has been proven in FPGA target silicon.
- Bronze IP has been verified in silicon models with logical timing closure.
- In-development IP has not yet been verified.
Note: The DSSHA1 has achieved silver status.
Deliverables
The DSSHA1 package comes complete with:
- Verilog HDL
- Verilog Test Bench
- Readme information on setup and scripts
Summary
The DSSHA1 synthesizable SHA-1 coprocessor is an alternative to the DS2460 or a microprocessor-based implementation. It can be embedded in an FPGA or ASIC, where it appears as a memory-mapped device. For operation, the 16-word input buffer is first filled with the data for the MAC computation. Pulsing the RUN_SHA signal starts the computation, which drives the BUSY signal from low to high. After BUSY returns low, the MAC is ready and can be read from the five result registers. The host processor compares the MAC computed by the DSSHA1 with the MAC delivered by the secure memory; authenticity is confirmed if both MAC values are identical.