Introduction

This EE-Note highlights relevant details when migrating a system design from ADSP-2106x or ADSP-2116x SHARC® processors to ADSP-2126x, ADSP-2136x, or ADSP-2137x SHARC processors. While all SHARC processors are code compatible and have similar core and peripheral architecture, some key differences in the processor cores, internal memory operation, external memory access, and peripherals configuration can provide challenges to a successful migration.

It is important to note that this document identifies migration issues, provides guidelines for resolution, and refers to product documentation (data sheets, processor hardware reference books, and tools manuals) for detailed information. To implement the guidelines in this EE-Note, you will need to use the product documentation.

Table 1 compares processor features. Some of these feature comparisons identify easy-to-spot differences between processors (such as different operating voltages) that must be accommodated when migrating the system design.

These types of migration issues are not the focus of this note. Instead, this note focuses on more subtle, less obvious feature differences (such as different pipeline depths, which relate to stall issues) that must be addressed when optimizing the system design migration.

This EE-Note addresses the following migration issues:

- Internal Memory Access
- Pipeline Depth
- SISD/SIMD Program Execution
- PLL Configuration
- External Memory Access
- External Port Throughput
- SPORT Feature Differences
- DAI/SRU Programming
- DMA/IOP Usage
- Interrupt Vector Table Setup
- Power Dissipation Calculations

<table>
<thead>
<tr>
<th>Processors →→→→</th>
<th>ADSP-21060/21061/21062</th>
<th>ADSP-21065L</th>
<th>ADSP-21160/21161</th>
<th>ADSP-21261</th>
<th>ADSP-21262</th>
<th>ADSP-21266(^1)</th>
<th>ADSP-21362/21363/21364/21365(^1)/21366(^1)</th>
<th>ADSP-21367(^1)/21368/21369</th>
<th>ADSP-21371/21375</th>
</tr>
</thead>
<tbody>
<tr>
<td>Max Freq. (MHz)</td>
<td>40</td>
<td>66</td>
<td>100</td>
<td>150</td>
<td>200</td>
<td>150/200</td>
<td>333</td>
<td>400</td>
<td>266</td>
</tr>
<tr>
<td>Core Voltage (3.3V I/O)</td>
<td>3.3(^2)</td>
<td>3.3</td>
<td>1.8</td>
<td>1.2</td>
<td>1.2</td>
<td>1.2</td>
<td>1.2</td>
<td>1.3</td>
<td>1.2</td>
</tr>
<tr>
<td>Dual- / Single-Ported RAM</td>
<td>Dual</td>
<td>Dual</td>
<td>Dual</td>
<td>Dual</td>
<td>Dual</td>
<td>Single</td>
<td>Single</td>
<td>Single</td>
<td></td>
</tr>
<tr>
<td>Int. Mem. Mbits (RAM/ROM)</td>
<td>4/0</td>
<td>0.5/0</td>
<td>1/0</td>
<td>1/3</td>
<td>2/4</td>
<td>2/4</td>
<td>3/4</td>
<td>2/6</td>
<td>1/4 (ADSP-21371), 0.5/2 (ADSP-21375)</td>
</tr>
<tr>
<td>Pipeline Depth</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>5</td>
<td>5</td>
<td>5</td>
</tr>
<tr>
<td>SISD/SIMD</td>
<td>SISD</td>
<td>SISD</td>
<td>SIMD</td>
<td>SIMD</td>
<td>SIMD</td>
<td>SIMD</td>
<td>SIMD</td>
<td>SIMD</td>
<td>SIMD</td>
</tr>
<tr>
<td>PLL Config.</td>
<td>XTAL only</td>
<td>XTAL only</td>
<td>H/W only</td>
<td>H/W+S/W</td>
<td>H/W+S/W</td>
<td>H/W+S/W</td>
<td>H/W+S/W</td>
<td>H/W+S/W</td>
<td>H/W+S/W</td>
</tr>
<tr>
<td>Ext. Port (A/D)</td>
<td>32/48</td>
<td>24/32</td>
<td>24/32</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>24/32</td>
<td>24/32(^3)</td>
</tr>
<tr>
<td>Ext./Para. Port Throughput(^4)</td>
<td>160M Bytes/s</td>
<td>264M Bytes/s</td>
<td>200M Bytes/s</td>
<td>66M Bytes/s</td>
<td>66M Bytes/s</td>
<td>66M Bytes/s</td>
<td>55M Bytes/s</td>
<td>222M Bytes/s</td>
<td>176M Bytes/s(^3)</td>
</tr>
<tr>
<td>Execute External</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>Yes</td>
</tr>
<tr>
<td>Parallel Port (muxed A/D)</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>16</td>
<td>16</td>
<td>16</td>
<td>16</td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr>
<td>MP/Shared Memory</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>Yes(^5)</td>
<td>n/a</td>
</tr>
<tr>
<td>SDRAM Controller</td>
<td>n/a</td>
<td>Yes</td>
<td>Yes</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>SPORTs (duplex)</td>
<td>2 (full)</td>
<td>2 (full)</td>
<td>4 (full)</td>
<td>4 (half)</td>
<td>6 (half)</td>
<td>6 (half)</td>
<td>6 (half)</td>
<td>8 (half)</td>
<td>8 (half)(^3)</td>
</tr>
<tr>
<td>I²S Support</td>
<td>n/a</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
</tr>
<tr>
<td>Link Ports</td>
<td>up to 6</td>
<td>n/a</td>
<td>up to 4</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr>
<td>BGA (Balls)</td>
<td>225</td>
<td>225</td>
<td>225</td>
<td>136</td>
<td>136</td>
<td>136</td>
<td>256</td>
<td>n/a</td>
<td>n/a</td>
</tr>
<tr>
<td>LQFP (leads)</td>
<td>n/a</td>
<td>n/a</td>
<td>n/a</td>
<td>144</td>
<td>144</td>
<td>144</td>
<td>144</td>
<td>208</td>
<td>208</td>
</tr>
</tbody>
</table>

Table 1. SHARC processor feature comparison

\(^1\) Processor includes audio specific peripherals and on-chip factory programmed ROM. IP holder license agreement required.
\(^2\) The ADSP-21060/21061/21062 are also available in 5.0V versions.
\(^3\) Value shown applies to the ADSP-21371. The ADSP-21375 external data bus is 16 bits wide, the ADSP-21375 throughput is 88 MBytes/s, and the ADSP-21375 has four SPORTs.
\(^4\) External Port throughput is estimated for data accesses over a 32-bit-wide data bus. See External Port Throughput for details.
\(^5\) Shared memory is available on the ADSP-21368. See External Memory Access for details.

Internal Memory Access

The Dual- / Single-Ported RAM row in Table 1 identifies an important difference between ADSP-2106x/2116x and ADSP-2126x/2136x/2137x SHARC processors. This memory access architecture difference can greatly influence the success of a design migration.

Access on Legacy SHARC Processors

The internal memory of ADSP-2106x/2116x SHARC processors (referred to as legacy SHARC processors) has a dual-ported memory structure with two (2) memory blocks accessible by any two (2) of the program memory (PM), data memory (DM), and I/O buses in the same memory cycle.

On legacy SHARC processors, a PM access and a DM access to the same block rely on the instruction cache to provide single-cycle throughput after the first iteration of a looped instruction. The PM or DM and the I/O bus can access either of the two blocks in the same core clock cycle.

The I/O bus used by the DMA controller provides legacy SHARC processors with core access to memory-mapped IOP registers used to control peripherals. These processors allow mixing code and data segments in both blocks, with core stalls for PM and DM memory block conflicts.

Access on Newer SHARC Processors

The internal memory of ADSP-2126x/2136x/2137x SHARC processors (referred to as newer SHARC processors) permits similar mixing of code and data segments across any internal memory block and allows DMA and core access of any block as is available on legacy SHARC processors. A crucial difference between the two architectures is that the dual-ported internal memory blocks in the legacy SHARC processors—which prevent memory block conflicts between the core (PM or DM bus) and the IOP (I/O bus)—are not available in the newer SHARC processors. Instead, newer SHARC processors provide four single-ported memory blocks. Because the memory blocks are single ported, there is an additional memory block conflict when the core and IOP attempt to access the same memory block in the same cycle. The extra two blocks of memory—as compared to legacy SHARC processors—are intended to help avoid this type of memory block conflict.

The SHARC processors handle memory block conflicts as follows:

- On all SHARC processors (legacy and newer), a conflict between DM and PM access is always resolved in favor of DM, with the PM access occurring in the second cycle.
- On newer SHARC processors, a conflict between DM/PM and I/O is resolved in favor of I/O accesses. Because the I/O bus runs at half the core clock frequency (CCLK), I/O accesses are requested at a maximum rate of once in two core clock cycles. This provides a fair sharing of memory access to the core and I/O buses.
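
The sharing behavior described in the second rule can be illustrated with a small cycle-count model. This is only a sketch under stated assumptions: the core is assumed to request the same single-ported block on every core clock cycle (a worst case), the I/O bus is assumed to request it at its maximum rate of once every two cycles, and the function name `simulate_block_sharing` is hypothetical rather than part of any tool.

```python
def simulate_block_sharing(cycles, io_period=2):
    """Count grants when the core and the I/O bus target one single-ported block.

    Assumption: the core wants the block every cycle; the I/O bus requests it
    once every `io_period` core clock cycles (2 = maximum I/O request rate).
    """
    core_grants = 0
    io_grants = 0
    core_stalls = 0
    for cycle in range(cycles):
        io_requests = (cycle % io_period == 0)
        if io_requests:
            io_grants += 1    # conflict is resolved in favor of the I/O access
            core_stalls += 1  # the core access waits for the next cycle
        else:
            core_grants += 1  # no I/O request this cycle, so the core proceeds
    return core_grants, io_grants, core_stalls


print(simulate_block_sharing(8))
```

Under these assumptions the core and the I/O bus each receive half of the block accesses, which is the fair sharing the rule describes. Placing DMA buffers in a block the core is not using removes the stalls altogether.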

The I/O bus is used in core accesses of memory-mapped IOP registers, used to configure peripherals, and by the DMA controller to transfer data to/from memory and peripherals. The I/O bus is also used by the DMA controller to access Transfer Control Blocks (TCBs) for DMA chaining, as TCBs are stored in internal memory.

Despite the potential for block access conflicts, with some forethought and analysis, system designers can use memory with full performance by following these guidelines:

- Use the default linker description file (LDF) as the starting point for describing system memory and placing program/data.
- If performance becomes an issue due to conflict-caused stalls, do the following:
  - Place code and data in separate blocks whenever possible.
  - Allow DMA (data buffers and TCBs, if applicable) to use a block not being used by the core.
  - Allow DMA to ping-pong between memory blocks instead of within a block.
  - Use the PM bus for instructions only.

The “Memory” chapter of the ADSP-2136x SHARC Processor Programming Reference includes information on these conflict-caused stalls and provides diagrams describing the use of the buses with the internal memory blocks.

### Pipeline Depth

The **Pipeline Depth** row in Table 1 identifies a difference among SHARC processors that affects program execution performance. To accommodate faster memory and processor core speeds, the ADSP-2136x and ADSP-2137x SHARC processors changed from a 3-stage pipeline to a 5-stage pipeline. Table 2 shows a comparison.

<table>
<thead>
<tr>
<th>Stage</th>
<th>ADSP-2106x</th>
<th>ADSP-2116x</th>
<th>ADSP-2126x</th>
<th>ADSP-2136x</th>
<th>ADSP-2137x</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Fetch</td>
<td>Fetch</td>
<td>Fetch</td>
<td>Fetch 1</td>
<td>Fetch 1</td>
</tr>
<tr>
<td>2</td>
<td>Decode</td>
<td>Decode</td>
<td>Decode</td>
<td>Fetch 2</td>
<td>Fetch 2</td>
</tr>
<tr>
<td>3</td>
<td>Execute</td>
<td>Execute</td>
<td>Execute</td>
<td>Decode</td>
<td>Decode</td>
</tr>
<tr>
<td>4</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>Address</td>
<td>Address</td>
</tr>
<tr>
<td>5</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>Execute</td>
<td>Execute</td>
</tr>
</tbody>
</table>

*Table 2. Pipeline structure comparison*

This increase in pipeline depth introduces some slight changes in behavior. This behavior is seen only in short loops and in the latency in interrupts and branches. For developers porting code from the 3-stage pipeline SHARC processors, it is important to note the following migration issues relating to **Stalls**, **Hardware Loops**, and **Latencies**, which stem from increased pipeline depth.

### Stalls

Potential sources of stalls include:

- DAG register load to usage in address generation:
  
  ```
  M0 = 1;
  DM(I2,M0) = R1; /* 2 cycle stall */
  ```

- DAG register load to usage in indirect jump/call:
  
  ```
  M0 = 1;
  JUMP (M0,I1);  /* 2 cycle stall */
  ```

- Post- to pre-modify using same index register:
  
  ```
  dm(I0,M1) = R1;  
  R2 = dm(-1,I0); /* 1 cycle stall */
  ```

- Ureg load to start of a H/W counter-based loop:
  
  ```
  USTAT1 = 0x5;
  LCNTR = USTAT1, do ( ... ) until LCE; /* 1 cycle stall */
  ```

- Compute to usage of generated condition:
  
  ```
  R0 = R0 - 1;
  if ne jump BEGIN_OF_LOOP; /* 1 cycle stall */
  ```

### Hardware Loops

Short hardware loops are an additional source of stalls. To achieve no-overhead loops (eliminating all stall cycles), apply these guidelines:

- A loop of length one must iterate at least four times.
- A loop of length two must iterate at least two times.
- A loop of length three must iterate at least two times.
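
When auditing ported code, the three guidelines above can be captured in a small lookup. The sketch below is illustrative only: the helper name `loop_is_stall_free` is hypothetical, and because this note gives no guideline for loops longer than three instructions, the helper returns `None` for those rather than guessing.

```python
# Minimum iteration counts for no-overhead short loops, as listed above.
# Keys are loop lengths in instructions; values are minimum iterations.
MIN_ITERATIONS = {1: 4, 2: 2, 3: 2}


def loop_is_stall_free(length, iterations):
    """Return True/False for loop lengths covered by the guidelines.

    Returns None when the loop length is not covered (an assumption of this
    sketch: no claim is made about loops longer than three instructions).
    """
    required = MIN_ITERATIONS.get(length)
    if required is None:
        return None
    return iterations >= required


for length, iterations in [(1, 3), (1, 4), (2, 2), (3, 1)]:
    print(length, iterations, loop_is_stall_free(length, iterations))
```

A check like this can flag short loops whose iteration counts were acceptable on a 3-stage pipeline but fall below the minimums listed for the 5-stage pipeline.
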

### Latencies

Interrupts and jumps/calls have different latencies due to pipeline lengthening.

For interrupts, the lengthened pipeline causes some response latency:
- 5 cycles if forced by write to IRPTL
- 6 cycles if generated by hardware

For jumps/calls, there are some related latency issues:
- Immediate branch: 3 cycles
- Delayed branch: 1 cycle
- Conditional branch: 4 cycles if taken, 0 cycles if not taken

SISD/SIMD Program Execution

One change that can greatly improve performance during a migration is updating the system to use SIMD execution. The SISD/SIMD row in Table 1 identifies the SHARC processors that support SIMD.

ADSP-2106x SHARC processors are single-instruction, single-data (SISD) machines. Their single processing element and SHARC architecture provide up to 66 MIPS, 66 MMACS, and 132 MFLOPS of performance.

ADSP-2116x, ADSP-2126x, ADSP-2136x, and ADSP-2137x SHARC processors support single-instruction, multiple-data (SIMD) execution. This architecture enhancement provides a second identical processing element, which can effectively double performance. For example, when combined with the increased (266 MHz) core instruction rate, ADSP-21375 processors can perform at 266 MIPS, 533 MMACS, and 1.596 GFLOPS.

Several EE-Notes describe how to implement SIMD operation, including:
- Extended-Precision Fixed-Point Arithmetic on SIMD SHARC Processors (EE-270)

PLL Configuration

As processor speeds increased from legacy SHARC processors to newer SHARC processors, support for greater flexibility in clock and phase-locked loop (PLL) configuration became more important. The PLL Config. row in Table 1 identifies the SHARC processors that support PLL configuration through clock crystal input (XTAL only), clock crystal input plus clock ratio selection pins (H/W only), or clock crystal input with clock ratio selection pins plus software configuration (H/W+S/W).

Clock Control on Earliest SHARC Processors

On ADSP-2106x SHARC processors, the CLKIN input frequency from a microprocessor-grade clock crystal provides the clock input for processor core operation directly. One member of this processor family (the ADSP-21065L processor) doubled the input frequency, running the processor core at twice the frequency of the CLKIN input.

Unlike ADSP-2116x, ADSP-2126x, ADSP-2136x, and ADSP-2137x SHARC processors, ADSP-2106x SHARC processors do not have clock configuration pins to program an internal phase-locked loop (PLL) and thus the core clock rate.

Clock Control on Later SHARC Processors

On ADSP-2116x SHARC processors, an on-chip PLL provides a “clean” clock for the processor core. The ratio between the CLKIN input and the PLL output (to the processor core) is controlled by setting external clock configuration (CLKCFG) pins. The state of these CLKCFG pins defines an effective multiply ratio, yielding a desired core clock (CCLK) rate from a slower, readily available crystal or crystal oscillator (XO). The PLL locks to the CLKIN source and provides the requested CCLK rate just after startup. The CLKCFG pins can only be selected while the SHARC processor is in reset state.

One member of this processor family (ADSP-21161 processors) includes a clock doubling (CLKDBL) pin, which multiplies the CLKIN input by a factor of 2 before the 2x clock source passes through the internal PLL.

**Clock Control on Newer SHARC Processors**

On ADSP-2126x, ADSP-2136x, and ADSP-2137x SHARC processors, a software-configurable PLL is available, offering greater flexibility in core clock frequency control. In addition to the CLKCFG pins, the PLL on these processors is configurable using software, permitting a choice between relying on the CLKCFG pin settings or applying an additional set of multipliers and divisors giving a wider range of granularity than the three (3) ratios supplied by the CLKCFG pins alone. This software clock control can be applied during an initialization routine or anytime the processor is operating (not in reset state).

**Clock Control Guidelines**

It is important to understand some details about PLL programmability to ease migration from earlier SHARC processors:

- **PLL headroom limit** (this limit affects ADSP-2126x, ADSP-2136x, and ADSP-2137x SHARC processors)

  Use the initial divisor (INDIV) bit in PMCTL to divide the CLKIN source by two (2) before passing it to the PLL input.

  - When the INDIV bit is cleared, CLKIN * PLLM should be < 400 MHz.
  - When the INDIV bit is set, CLKIN * PLLM should be < 800 MHz.

- Use the divisor enable (DIVEN) bit to cause the PLL to lock using the PLLD divisor value that has been entered.

- Remember to clear the DIVEN bit when setting the PLL in bypass mode.
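
The headroom limits above lend themselves to a quick check before committing a clock plan. The following is a minimal sketch that encodes only the two limits stated in this note; the function name `pll_headroom_ok` and its parameters are hypothetical, and the data sheet for the specific processor remains the authority on valid PLL settings.

```python
def pll_headroom_ok(clkin_mhz, pllm, indiv):
    """Check the CLKIN * PLLM headroom limit stated in this note.

    clkin_mhz: CLKIN frequency in MHz
    pllm:      PLL multiplier value
    indiv:     True when the INDIV bit is set (CLKIN divided by 2 before the PLL)
    """
    # With INDIV set, the PLL input is halved, so the product limit doubles.
    limit_mhz = 800.0 if indiv else 400.0
    return clkin_mhz * pllm < limit_mhz


# Example: a 25 MHz CLKIN with several candidate multipliers.
for pllm, indiv in [(8, False), (16, False), (16, True)]:
    print(pllm, indiv, pll_headroom_ok(25.0, pllm, indiv))
```

In this example, a multiplier of 16 on a 25 MHz input reaches the 400 MHz limit and is rejected unless the INDIV bit is set.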

For details, refer to Managing the Core PLL on Third-Generation SHARC Processors (EE-290) and/or the “System Design” chapter of the appropriate Hardware Reference for the SHARC processor being used.

**External Memory Access**

Depending on the amount of data a system stores in external memory and whether the system executes instructions directly from external memory, external memory access can be one of the most challenging issues for system migration. Looking at Table 1, the following rows identify external memory access features that affect system migration:

- **Ext. Port (A/D)**: width of address and data buses extended off-chip

- **Execute External**: support for program execution from external memory

- **Parallel Port (muxed A/D)**: width of multiplexed external address and data bus

- **MP/Shared Memory**: support for multiprocessor and shared memory access

- **SDRAM Controller**: glueless support for external SDRAM in system

**Access on Legacy SHARC Processors**

Legacy SHARC processors (ADSP-2106x and ADSP-2116x processors) have external ports with dedicated data and address pins, which extend the 32-bit data bus and the 24-bit address bus off-chip.

These legacy SHARC processors have a variety of feature support in their external ports. One key feature that influences migration is support for data packing (automatically accommodating differences between internal on-chip bus width and external off-chip bus width).

These processors can be configured to work with 16-, 32-, or (in some cases) 64-bit external data buses. For example, ADSP-21160 processors have a 64-bit data bus, but can be configured to operate with a 16-, 32-, or 64-bit external bus with data packing. Another example of flexible external port support is that ADSP-21161 processors can apply unused link port data pins (when the link ports are disabled) to widen their external data bus to 48 bits, allowing direct execution from external space without the need for instruction packing.

Among legacy SHARC processors, note that the ADSP-21065L, ADSP-21160, and ADSP-21161 processors can execute instructions from external memory space.

Combined with some memory control signals (/RD, /WR, ACK, and /MSx), the external port on legacy SHARC processors allows system-friendly connectivity to common SRAM devices as well as parallel DACs and ADCs.

Some legacy SHARC processors (ADSP-21065L and ADSP-21161 processors) include SDRAM controller functionality within their external port. This functionality adds external control signals for SDRAM devices (/RAS, /CAS, DQM, /SDWE, SDCLK0-1, SDCKE, and SDA10) and internal logic to manage the startup and refresh needs of the SDRAM device.

Multiprocessor-based clustering and host processor support are available on most legacy SHARC processors. These features consist of additional external port signals (/HBR, /HBG, ID2-0, and /BRx). These signals allow multiple SHARC processors to arbitrate for access to common external memory and/or specified segments of each other's internal memory, referred to as multiprocessor memory space. These multiprocessor control signals also allow a host processor to access each SHARC processor's memory-mapped I/O processor space.

Access on Newer SHARC Processors

There are two types of external ports on newer SHARC processors.

Some of the newer SHARC processors (ADSP-2126x and ADSP-21362/3/4/5/6) have a parallel port (simpler support than a full external port), which uses fewer external pins. Other newer SHARC processors (ADSP-21367/8/9 and ADSP-21371/5) have full external ports. It is important to understand the difference between these two types of ports and how the difference in support may affect system migration.

A parallel port on a newer SHARC processor is 16 bits wide and uses a multiplex scheme to share the address and data signals on the same external pins (AD15-0). This parallel port feature is important because:

- Multiplexing of address and data pins on the port requires an external latch (glue logic) to address external memory (SRAM).
- The port cannot support SDRAM usage.
- The port cannot support execution of instructions from external memory space.
- The port cannot support host or multiprocessor access.
- The port cannot support processor core access directly to external memory. All port access to external memory is accomplished using DMA. The port’s control and DMA setup registers can be read/written by the core, permitting the DMA through the parallel port.

Because of this last limitation on parallel port support, the software development tools provide a separate external memory window (External Data (DM) Byte Memory) to display external memory data. Starting with the VisualDSP++® 4.5 development tools, the DMA ONLY specifier was introduced for external memory segments on these processors.

The ADSP-21367/8/9 and ADSP-21371/5 SHARC processors have a more robust external port that returns to much of the functionality provided on ADSP-2106x and ADSP-2116x processors. Specifically, the external port on these newer SHARC processors has dedicated address and data pins and the external port control signals (/RD, /WR, ACK, and /MSx). Note that the external port data bus width on these newer SHARC processors is 32 bits, except on the ADSP-21375 processor, which has a 16-bit data bus.

While the external port on these newer SHARC processors provides no host or multiprocessor functionality, these processors do include an SDRAM controller. There is some variety in the external port features among these processors. For example, ADSP-21371/5 processors support program execution from external memory (Bank 0 only). Also, ADSP-21368 processors provide some shared memory support — a common bank of SRAM or SDRAM among four ADSP-21368 SHARC processors. Additional external signals (ID1-0 and /BRx) arbitrate among the ADSP-21368 processors for the bus.

Unlike legacy SHARC processors with SDRAM control, the ADSP-21367/8/9 processors’ SDRAM controller does not have pins to drive the DQM pins on typical SDRAM devices. The SDRAM’s DQM pin can be tied active when connected to an ADSP-21367/8/9 processor.

For more information on using SDRAM with newer SHARC processors, see the appropriate SHARC processor and SDRAM device data sheets and see Interfacing SDRAM Memory to ADSP-21368 and ADSP-2137x SHARC Processors (EE-286).

**External Port Throughput**

Because data throughput using the external port can greatly influence system design, migration planning should include a comparison of this feature of legacy SHARC processors and newer SHARC processors. Table 1 lists this feature in the Ext./Para. Port Throughput row.

**Data Throughput on Legacy SHARC Processors**

Calculations of external port throughput for legacy SHARC processors are defined using the speed at which the external port is timed. For example, an ADSP-21160 processor external port functions at the CLKIN rate (maximum rate of 50 MHz). Calculating the data throughput for this processor (assuming a 32-bit-wide external bus) yields:

$$\text{Throughput} = \frac{4\ \text{Bytes}}{1\ \text{access}} \times \frac{50 \times 10^6\ \text{accesses}}{1\ \text{sec}} = 200\ \text{MBytes/sec}$$

It is important to note that ADSP-2116x processors also support a 64-bit external bus. Using the whole 64-bit data bus yields a throughput of 400 MBytes/s.

**Data Throughput on Newer SHARC Processors**

Calculations of external port throughput for newer SHARC processors are not defined as simply as on legacy SHARC processors, because the external port throughput does not scale with the increase in core clock speed. The throughput comparison in Table 1 shows that the external port throughput of newer SHARC processors does not differ greatly from that of legacy SHARC processors.

On ADSP-21369 processors, the throughput calculation for the external port stems from:

- Using a 32-bit bus running at 166 MHz (CCLK/2 = 333/2 = 166 MHz)
- External accesses that require three cycles to complete
- Calculating effective external bus speed as 166/3 (55.3 MHz)

All of which provides a calculation of:

\[
\text{Throughput} = \frac{4\ \text{Bytes}}{1\ \text{access}} \times \frac{55.3 \times 10^6\ \text{accesses}}{1\ \text{sec}} \approx 222\ \text{MBytes/sec}
\]

On some of the newer SHARC processors (ADSP-21367/8/9), asynchronous accesses occur at the speed of the external port logic, which is clocked by the SDCLK—even though the accesses are asynchronous. At a 333-MHz CCLK, the fastest SDCLK selectable is 166 MHz (CCLK/2).

If the system uses external SRAM only, the external port throughput for an ADSP-21367/8/9 at 400 MHz is slightly better. With an effective external bus speed of 200/3 (66.67 MHz), the calculation is:

\[
\text{Throughput} = \frac{4\ \text{Bytes}}{1\ \text{access}} \times \frac{66.67 \times 10^6\ \text{accesses}}{1\ \text{sec}} \approx 266\ \text{MBytes/sec}
\]

This technique is used in the ADSP-21367/8/9 SHARC processor data sheet to derive the throughput, but it is important to know that using SDRAM in the system (for example, in Bank 0) means this number is not realistic.

If the system includes SDRAM, the above calculation does not apply. Systems with SDRAM use the CCLK:SDCLK ratio, which is set to 2:1 for a 333-MHz CCLK and a 166-MHz SDCLK. This means that the asynchronous transfers (even in other banks) run at an external bus speed of 166/3 (55.3 MHz), yielding a throughput of approximately 220 MBytes/s.
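
The throughput figures in this section all follow the same arithmetic: bytes per access, times the external bus clock, divided by the number of bus cycles each access takes. The sketch below reproduces that arithmetic for the cases discussed above. The function name `ext_port_throughput` is hypothetical, and the three-cycle access figure is the one quoted in this note, not a general property of every memory type.

```python
def ext_port_throughput(bus_clock_hz, cycles_per_access=1, bytes_per_access=4):
    """Estimate external port throughput in bytes per second."""
    return bytes_per_access * bus_clock_hz / cycles_per_access


cases = {
    # ADSP-21160: 32-bit bus at the 50 MHz CLKIN rate, one access per cycle
    "ADSP-21160, 32-bit": ext_port_throughput(50e6),
    # ADSP-21160: full 64-bit bus (8 bytes per access)
    "ADSP-21160, 64-bit": ext_port_throughput(50e6, bytes_per_access=8),
    # ADSP-21369 at 333 MHz CCLK: bus at CCLK/2, three cycles per access
    "ADSP-21369, 333 MHz": ext_port_throughput(333e6 / 2, cycles_per_access=3),
    # ADSP-21367/8/9 at 400 MHz CCLK with SRAM only
    "ADSP-21367/8/9, 400 MHz": ext_port_throughput(400e6 / 2, cycles_per_access=3),
}

for name, bytes_per_sec in cases.items():
    print(f"{name}: {bytes_per_sec / 1e6:.1f} MBytes/s")
```

Note that carrying the unrounded bus clock (333/2 = 166.5 MHz) through the calculation gives 222 MBytes/s, whereas rounding to 166 MHz first gives the slightly lower figure of about 221 MBytes/s.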

**Data Throughput Summary**

Table 3 summarizes the best throughput for data access in the various SHARC processors.

Note that ADSP-21367/8/9 processor SDRAM write throughput is dependent upon whether the core or DMA controller manages the access. Also note that ADSP-2116x processors support access to synchronous burst SRAM (SBSRAM) devices, yet ADSP-2126x and ADSP-2136x processors do not provide this support.

<table>
<thead>
<tr>
<th>Processor</th>
<th>Access Type</th>
<th>Oper.</th>
<th>Page</th>
<th>Throughput per EPort Cycles (not CCLK)</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADSP-21367/8/9</td>
<td>Sequential, uninterrupted</td>
<td>Read</td>
<td>Same</td>
<td>32 words per 37 cycles</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Write</td>
<td>Same</td>
<td>Core: 1 word per cycle</td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
<td></td>
<td>DMA: 1 word per 2 cycles</td>
</tr>
<tr>
<td>ADSP-2137x</td>
<td>Sequential, uninterrupted</td>
<td>Read</td>
<td>Same</td>
<td>32 words per 37 cycles</td>
</tr>
<tr>
<td></td>
<td></td>
<td>Write</td>
<td>Same</td>
<td>1 word per cycle</td>
</tr>
</tbody>
</table>

*Table 3. Data throughput comparison*

**Instruction Packing and Throughput**

The external port’s instruction packing lets the processor fetch instructions from external memory. Support for external execution is available on the ADSP-21065L, ADSP-21161, and ADSP-2137x processors only.

ADSP-21065L and ADSP-21161 instruction packing features include:

- 48-bit instructions in 32-bit external memory
- Two (2) CLKIN or external port cycles per instruction; for example, two (2) 32-bit locations per instruction. This manner of packing instructions into 32-bit memory wastes 16 bits of memory per instruction.

ADSP-21161 instruction packing features also include:

- Support for 48-bit instructions in 48-bit external memory (using unused link port data pins when link ports are disabled), or packed instructions in 32-bit external memory, 16-bit external memory, and 8-bit external memory.
- One (1) CLKIN or external port cycle per instruction in 48-bit-wide memory
- Two (2) CLKIN or external port cycles per instruction in 32-bit-wide memory (wastes 16 bits/instruction)
- Four (4) CLKIN or external port cycles per instruction in 16-bit-wide memory (wastes 2 bytes/instruction)
- Eight (8) CLKIN or external port cycles per instruction in 8-bit-wide memory (wastes 2 bytes/instruction)

ADSP-2137x instruction packing features include:

- 48-bit instructions in 32-bit external memory
- Three (3) SDCLK cycles per 2 instructions; for example, three (3) 32-bit locations per 2 instructions. This is a more efficient manner to pack instructions in 32-bit memory.
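
The memory overhead of each packing scheme above can be checked with simple arithmetic: the bits occupied by the external locations minus the 48 bits each instruction needs. The following sketch does that calculation. The helper name `wasted_bits_per_instruction` is hypothetical, and the figures come only from the lists in this section.

```python
INSTRUCTION_BITS = 48  # SHARC instruction width


def wasted_bits_per_instruction(mem_width_bits, locations, instructions=1):
    """Unused external memory bits per instruction for a packing scheme.

    mem_width_bits: width of one external memory location
    locations:      number of locations used to hold `instructions` instructions
    """
    occupied = mem_width_bits * locations
    return (occupied - INSTRUCTION_BITS * instructions) / instructions


schemes = [
    ("48-bit memory, 1 location", 48, 1, 1),
    ("32-bit memory, 2 locations", 32, 2, 1),
    ("16-bit memory, 4 locations", 16, 4, 1),
    ("8-bit memory, 8 locations", 8, 8, 1),
    ("ADSP-2137x: 32-bit memory, 3 locations per 2 instructions", 32, 3, 2),
]

for name, width, locations, instructions in schemes:
    waste = wasted_bits_per_instruction(width, locations, instructions)
    print(f"{name}: {waste:.0f} bits wasted per instruction")
```

The last case shows why the ADSP-2137x scheme is more efficient: three 32-bit locations hold exactly two 48-bit instructions, so no external memory is left unused.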

**SPORT Feature Differences**

Synchronous serial port (SPORT) features vary among SHARC processors. The differences stem from the steady increase in functionality and data throughput over the life span of the SHARC family. In Table 1, these differences are highlighted with the SPORTs (duplex) and I²S Support rows, but the differences in features are more subtle and detailed than these rows imply.

The SPORTs on legacy SHARC processors (ADSP-2106x and ADSP-21160 processors) are capable of full-duplex operation. Because of feature changes introduced with the ADSP-21161, however, system designs must use two SPORTs together to implement full-duplex operation.

The ADSP-21161 and newer SHARC processors (ADSP-2126x, ADSP-2136x, and ADSP-2137x) SPORT data pins support programmable direction of either inputs or outputs. This feature is an enhancement over the original ADSP-2106x and ADSP-21160 processors’ SPORTs, which were fixed transmitters or receivers.

Another feature enhancement is I²S support. The SPORTs on ADSP-21065L, ADSP-21161, and newer SHARC processors provide support for I²S.

TDM support within the SPORT has also changed over time. ADSP-21065L and ADSP-21161 processors were the first to provide SPORTs with a secondary (channel B) Tx/Rx pair. On these legacy SHARC processors, the SPORT supports multichannel TDM mode on the primary A channel only. On newer SHARC processors, the SPORT supports TDM mode on the B channel as well, doubling TDM throughput over legacy SHARC processors on paired SPORTs.

TDM channel support has also broadened. The ADSP-2106x SPORTs support only up to 32 channels in TDM mode. On ADSP-2126x and ADSP-2136x SHARC processors, TDM support has been expanded to 128 channels per frame.

The SPORTs on the newest SHARC processors (ADSP-21367/8/9 and ADSP-2137x) have no restrictions regarding which SPORTs are used in TDM mode. Earlier SHARC processors had a pairing scheme in which a particular SPORT was used for transmitting, and a corresponding SPORT had to be used for receiving, or if not needed for receiving, could not be used at all.

Framing error logic is available in SPORTs for ADSP-21367/8/9 processors. This logic can detect frame syncs occurring early and assert an interrupt, even before the previous transmit or receive completes. On the interrupt, a SPERRSTAT register is polled to determine the SPORT that suffered the framing error.

The maximum internally generated clock varies on SHARC processors. On the ADSP-2106x SHARC processors, the SPORTs ran at up to 50 MHz, with a divide by 0 in the CLKDIV registers. ADSP-2116x processors restricted the maximum serial clock rate with a divide by 2 of the CCLK. ADSP-2126x processors restrict the maximum clock rate to a divide by 4 of the CCLK. ADSP-2136x processors restrict the maximum clock rate to a divide by 8 of the CCLK.
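
The CCLK divisors in the preceding paragraph make it easy to see that the maximum serial clock does not grow with the core clock. The sketch below applies those divisors to example core clock rates taken from the Max Freq. row of Table 1. The helper name `max_sport_clock_mhz` is hypothetical, and the ADSP-2106x is omitted because its limit is stated as an absolute rate rather than as a CCLK divisor.

```python
# Minimum CCLK divisor for the internally generated serial clock,
# as stated in the paragraph above.
SCLK_DIVISOR = {
    "ADSP-2116x": 2,
    "ADSP-2126x": 4,
    "ADSP-2136x": 8,
}


def max_sport_clock_mhz(family, cclk_mhz):
    """Maximum internally generated SPORT clock for a given core clock."""
    return cclk_mhz / SCLK_DIVISOR[family]


# Example core clock rates from the Max Freq. row of Table 1.
for family, cclk_mhz in [("ADSP-2116x", 100), ("ADSP-2126x", 200), ("ADSP-2136x", 333)]:
    print(f"{family} at {cclk_mhz} MHz CCLK: {max_sport_clock_mhz(family, cclk_mhz)} MHz")
```

Under these assumptions, a design that relies on a serial clock near 50 MHz should be rechecked when moving to an ADSP-2136x processor, where the divide-by-8 limit yields a lower maximum even at a 333 MHz core clock.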

**DAI/SRU Programming**

When planning system migration, it is important not just to examine how features used for the earlier design have changed, but also to closely examine completely new features, identifying how these can improve performance in the new system. The DAI/SRU features in the new SHARC processors fall into this category.

With ADSP-2126x processors, a new way for peripherals to share pins was introduced, the Signal Routing Unit (SRU). This feature is fully described in the processor Hardware Reference, but it bears mention here because it is important to know that this is an easily-programmed group of pins that allows very flexible use of the many peripherals. There are several ways of programming the signal routing, including a GUI plug-in for the VisualDSP++ development tools, manual register manipulation, and a software macro usable in C and assembly. The VisualDSP++ examples use the SRU macro.

Note that the GUI plug-in does not come with the VisualDSP++ package. For more information, refer to *Using the Expert DAI for ADSP-2126x, ADSP-2136x and ADSP-2137x SHARC Processors (EE-243)*.

In the past, when routing SPORT clocking signals as outputs, some system design choices have led to signal integrity issues. Use the information available in the “Serial Ports” chapter of the *ADSP-21368 SHARC Processor Hardware Reference* to ensure that this issue does not occur during system migration.

**DMA/IOP Usage**

On the ADSP-2137x SHARC processors, the external DMA port has been enhanced to provide support for audio delay-lines. Essentially, this feature consists of a chained DMA or block of audio data writes to external memory, followed by reads of samples (taps) for audio playback. This feature was implemented first on ADSP-21367/8/9 SHARC processors, but was limited to reading single samples only for each tap. Since each tap in external memory requires an entry in internal memory (the tap list), having an internal word to describe each external word is not space-efficient. This limitation was rectified in the later ADSP-2137x SHARC processors by allowing reads of multiple samples for each tap.

When porting chained DMA setup code, keep in mind that because the internal memory starting address changes between SHARC processors, the position of the PCI bit in the chain pointer register differs between newer and legacy SHARC processors. If DMA code is ported without adjusting the PCI bit, no interrupt is generated when the DMA completes.
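One way to guard against this during a port is to keep the PCI bit position in a single per-target definition instead of hard-coding it in each chain pointer value. The C sketch below illustrates the idea. The macro names, the helper `chain_pointer`, and the bit positions (17 and 19) are assumptions made for illustration only, as is the assumption that the address field occupies the bits below the PCI bit. Take the real positions from the Hardware Reference for each processor.

```c
#include <stdint.h>

/* ASSUMED PCI bit positions, for illustration only. Confirm the actual
   positions in the Hardware Reference of the source and target parts. */
#define CP_PCI_LEGACY (1u << 17) /* hypothetical legacy position */
#define CP_PCI_NEWER  (1u << 19) /* hypothetical newer position  */

/* Build a chain pointer value. The address field is assumed to be every
   bit below the PCI bit; the PCI bit is set only when an interrupt is
   wanted on completion of the DMA. */
static uint32_t chain_pointer(uint32_t next_tcb, uint32_t pci_bit, int want_irq)
{
    uint32_t addr_field = next_tcb & (pci_bit - 1u);
    return want_irq ? (addr_field | pci_bit) : addr_field;
}
```

Under these assumptions, a value built with `CP_PCI_LEGACY` leaves the `CP_PCI_NEWER` bit clear, which corresponds to the missing-interrupt symptom described above.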

**Interrupt Vector Table Setup**

When planning a migration, remember to check the interrupt vector table setup. When migrating between legacy and newer SHARC processors (particularly when porting legacy assembly code), check the new mappings of the peripheral interrupt vectors, which may have moved to different locations in the interrupt vector table. Porting code without these modifications could result in an inability to service SPORT or other interrupts.

**Power Dissipation Calculations**

Power dissipation is a critical system feature that can greatly influence successful system migration. Over the lifetime of SHARC processors, power dissipation has become more difficult to specify due to the larger proportion of leakage current and the increasing design attention on power consumption and heat dissipation.

Calculations for ADSP-21060L Processors

The power dissipation calculation for this processor is:

\[ P_{\text{EXT}} = 0.074 \text{ W} \]
\[ P_{\text{INT}} = I_{\text{DDINHIGH}} \times V_{\text{DD}} = 0.475 \text{ A} \times 3.3 \text{ V} = 1.5675 \text{ W} \]
\[ P_{\text{TOTAL}} = P_{\text{EXT}} + P_{\text{INT}} = 1.6415 \text{ W} \]

The ADSP-21061L SHARC processor is unique in adding an “idle16” instruction. This instruction executes a NOP while slowing the core clock to 1/16 of its original rate, which reduces idle current to approximately 50 mA, compared with approximately 180 mA for “IDLE” and approximately 475 mA for \( I_{\text{DDINHIGH}} \) in the example above.

For more details, refer to the appropriate ADSP-2106x SHARC processor data sheet.

Calculations for ADSP-21065L Processors

The power dissipation calculation for this processor is:

\[ P_{\text{EXT}} = 0.068 \text{ W} \]
\[ P_{\text{INT}} = I_{\text{DDINHIGH}} \times V_{\text{DD}} = 0.275 \text{ A} \times 3.3 \text{ V} = 0.9075 \text{ W} \]
\[ P_{\text{TOTAL}} = P_{\text{EXT}} + P_{\text{INT}} = 0.9755 \text{ W} \]

Calculations for ADSP-21161N Processors

In the ADSP-21160M/N and ADSP-21161N processors, separate voltages for the core and I/O ring were introduced. The power dissipation calculation for the ADSP-21161N processor is:

\[ P_{\text{EXT}} = 0.185 \text{ W} \]
\[ P_{\text{INT}} = I_{\text{DDINHIGH}} \times V_{\text{DDINT}} = 0.660 \text{ A} \times 1.8 \text{ V} = 1.188 \text{ W} \]
\[ P_{\text{TOTAL}} = P_{\text{EXT}} + P_{\text{INT}} = 1.373 \text{ W} \]

Note: The 0.660 A value above includes 10 mA of \( A_{\text{IDD}} \) for the PLL supply.
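The three legacy calculations share one form: internal power is the supply current multiplied by the core supply voltage, and the result is added to the external power. The C sketch below captures that form so the quoted values can be re-evaluated or varied. It assumes that total power is the sum of the external and internal terms, which is consistent with the ADSP-21065L and ADSP-21161N figures above. The function and parameter names are illustrative.

```c
/* Internal power in watts: supply current (A) times core voltage (V). */
static double internal_power_w(double i_ddinhigh_a, double v_core_v)
{
    return i_ddinhigh_a * v_core_v;
}

/* Total power in watts, ASSUMING total = external + internal. */
static double total_power_w(double p_ext_w, double i_ddinhigh_a, double v_core_v)
{
    return p_ext_w + internal_power_w(i_ddinhigh_a, v_core_v);
}
```

For example, `total_power_w(0.185, 0.660, 1.8)` evaluates the ADSP-21161N case. Substituting an idle current for \( I_{\text{DDINHIGH}} \) gives a rough view of how much the internal term falls when the core idles.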

Calculations for Newer SHARC Processors

In the newer SHARC processors (ADSP-2126x processors and beyond), as leakage current became a more substantial portion of dissipated power, Analog Devices switched to communicating power dissipation calculation information through EE-Notes, instead of data sheets. This change allows more information and explanation to be shared with the system designers.

With an EE-Note for each group of SHARC processors (ADSP-2126x, ADSP-21362/3/4/5/6, ADSP-21367/8/9, and ADSP-21371/5), system designers can accurately predict \( P_{\text{EXT}} \) using an actual peripheral usage case, estimate \( P_{\text{INT}} \) based on both static (leakage) and dynamic (switching) components, and illustrate the effects of voltage, temperature, and frequency on power dissipation.

The data in the EE-Note, similar to data supplied in data sheets for earlier SHARC processors, is based on characterization data. The power dissipation calculation data provides valuable information for understanding power supply requirements and for estimating power savings that may be achieved by managing the core clock rate using the programmable PLL.

Reference EE-Notes on SHARC processor power dissipation include:

- *Estimating Power Dissipation for ADSP-21368 SHARC Processors (EE-299)*
- *Estimating Power for the ADSP-21362 SHARC Processors (EE-277)*
- *Estimating Power Dissipation for Industrial Grade ADSP-21262 SHARC Processors (EE-250)*
- *Estimating Power Dissipation for ADSP-21262S SHARC DSPs (EE-216)*

### Conclusion

Migrating a system design from legacy SHARC processors to newer SHARC processors is a manageable task when the differences between the processors are understood. The challenges to migration stem not just from obvious specification differences between the parts, but also from the more subtle, less obvious performance feature differences.

The intent of this EE-Note is to provide clear information on the subtle feature differences to ease system migration. Using this note is only the beginning, though. The issues raised here should lead system designers to the relevant sections of the processor documentation that affect their migration planning.

Document History

<table>
<thead>
<tr>
<th>Revision</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Rev 1 – June 27, 2007 by Divya Sunkara</td>
<td>Initial release</td>
</tr>
</tbody>
</table>