The ADSP-21csp01 is the first member of a new Analog Devices family of 16-bit fixed-point digital signal processors designed specifically to process multiple signals concurrently, rapidly and efficiently, and to execute compiled code written in a high-level language efficiently. Its core design permits more software to be written and debugged in C, simplifying development of fixed-point DSP applications and speeding time-to-market for product and system designers. Applications such as simultaneous voice-over-data modems, cellular basestations, and computer telephony systems benefit from improved DSP throughput, reduced chip count, and faster time-to-market.
Its newly designed architecture (Figure 1) comprises an arithmetic section supported by a large set of general-purpose data registers; a data-address generation section consisting of two address generators; and a program sequencer supported by a 64-word instruction cache. This core is augmented by 20 kilobytes of on-chip SRAM, configured as 4 K × 24 program memory RAM and 4 K × 16 data memory RAM; a 16-bit DMA (direct memory access) port; two serial ports with DMA; and a boot controller. These features, combined with the 50-MIPS (million instructions per second) performance of the ADSP-21csp01 and the 24-bit address bus, supply the processing horsepower and I/O bandwidth required for processing multiple signals concurrently.
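The 20-kilobyte figure follows directly from the two RAM configurations quoted above. The short C check below works it through; the helper names are illustrative only and do not come from any Analog Devices header.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes occupied by a RAM of `words` words, each `bits_per_word` wide.
   Hypothetical helper, used here only to check the arithmetic. */
static uint32_t ram_bytes(uint32_t words, uint32_t bits_per_word)
{
    assert(bits_per_word % 8 == 0);   /* both on-chip widths are whole bytes */
    return words * (bits_per_word / 8);
}

/* Total on-chip SRAM for the configuration described in the text. */
static uint32_t onchip_sram_bytes(void)
{
    const uint32_t k = 1024;
    return ram_bytes(4 * k, 24)    /* 4 K x 24 program memory: 12,288 bytes */
         + ram_bytes(4 * k, 16);   /* 4 K x 16 data memory:     8,192 bytes */
}
```

The sum is 20,480 bytes, i.e., 20 kilobytes.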
Concurrent Signal Processing
Inexorable trends are driving signal-processing systems to smaller size, lower cost, lower power consumption and higher performance, and these trends are significantly influencing the direction of DSP architecture. The new high-performance processors must be designed with the capability to perform tasks that previously would have required several processors (Figure 2).
More importantly, emerging applications such as voice-over-data modems, which simultaneously process modem/FAX signals as well as speech signals, require the DSP to process concurrent signals.
To accomplish this, the DSP must be able to address a large program and data memory space, large enough to store the program instructions and data for all the algorithms required by the application. The DSP must also have enough speed and efficiency to execute the multiple algorithms and perform the multiple tasks of the application in real time. Furthermore, to accommodate the multiple signals used in the application, the DSP must also have multiple I/O ports, plus DMA channels to stream data in and out of the DSP's internal memory without interrupting the processor (Figure 3).
A powerful solution is the ADSP-21csp01 Concurrent Signal Processor. With its 50-MIPS instruction rate, a highly parallel instruction set that performs many operations in a single cycle (550 MOPS), a 24-bit address reach to access up to 16 Mwords of instructions and data, high I/O bandwidth, and DMA channels, it can accommodate the multiple signals from a codec (or multiple codecs) and can handle multiple tasks in real time.
Architectural Details
The arithmetic section of the ADSP-21csp01 consists of a 16-bit arithmetic/logic unit (ALU), a 16×16-bit multiplier/accumulator (MAC) with dual 40-bit accumulators, and a barrel shifter. The single-cycle, non-pipelined arithmetic units operate independently of one another and have provisions for multi-precision operations. The 21csp core has a total of 96 on-chip registers: 64 addressing registers and 32 arithmetic registers, including two sets of multiply result registers. Two banks of data registers provide data operands to the arithmetic units and store arithmetic results. Any data register can be used to supply a data operand to any arithmetic unit. This high degree of flexibility simplifies programming and enhances the efficiency of systems implemented with high-level languages. The arrangement of data registers in a primary bank and a secondary bank simplifies task switching, since it takes only a single cycle to switch between register banks.
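To make the role of the 40-bit accumulator concrete, the following C sketch models a 16×16-bit multiply/accumulate whose running sum is held to 40 bits, giving 8 bits of headroom above a 32-bit product. It is only a behavioral illustration: the names mac40 and dot16 are hypothetical, integer (not fractional) arithmetic is assumed, and saturating on overflow is an assumption rather than documented ADSP-21csp01 behavior.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_BITS 40
#define ACC_MAX  ((INT64_C(1) << (ACC_BITS - 1)) - 1)
#define ACC_MIN  (-(INT64_C(1) << (ACC_BITS - 1)))

/* One multiply/accumulate step: acc + x*y, clamped to the 40-bit range.
   Saturation here is an assumption made for the sketch. */
static int64_t mac40(int64_t acc, int16_t x, int16_t y)
{
    assert(acc >= ACC_MIN && acc <= ACC_MAX);
    int32_t product = (int32_t)x * (int32_t)y;  /* magnitude never exceeds 2^30 */
    int64_t sum = acc + product;
    if (sum > ACC_MAX) return ACC_MAX;
    if (sum < ACC_MIN) return ACC_MIN;
    return sum;
}

/* Sum of products over n 16-bit operand pairs, one MAC step per pair. */
static int64_t dot16(const int16_t *a, const int16_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac40(acc, a[i], b[i]);
    return acc;
}
```

The extra accumulator bits let a long sum of products grow well beyond 32 bits before any clamping is needed, which is why the intermediate result is kept wider than a single product.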
The address generators of the ADSP-21csp01 allow data to be accessed with indirect addressing using an address (I) register in conjunction with a modify (M) register or an immediate modify value. Sixteen sets of these registers are arranged in a primary bank and a secondary bank. The address can be updated in either pre-update or post-update mode (i.e., before or after the address is output to the address bus). Zero-overhead looping instructions that can nest up to five levels produce fast, efficient, and tightly coded loops.
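The difference between the two update modes can be sketched in C as follows. This is a simplified model, not the processor's actual register interface: the type agen_t and both function names are hypothetical, addresses are assumed to wrap at 24 bits, and the pre-update mode is assumed to write the modified address back to the I register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_MASK 0x00FFFFFFu   /* 24-bit address bus */

/* Hypothetical model of one address (I) / modify (M) register pair. */
typedef struct {
    uint32_t i;   /* current address */
    int32_t  m;   /* signed modify value */
} agen_t;

/* Post-update: output the current address, then add M to I. */
static uint32_t addr_post_update(agen_t *g)
{
    assert(g != NULL);
    uint32_t out = g->i;
    g->i = (g->i + (uint32_t)g->m) & ADDR_MASK;
    return out;
}

/* Pre-update: add M to I first, then output the new address. */
static uint32_t addr_pre_update(agen_t *g)
{
    assert(g != NULL);
    g->i = (g->i + (uint32_t)g->m) & ADDR_MASK;
    return g->i;
}
```

Starting from I = 0x100 with M = 2, a post-update access uses address 0x100 and leaves I at 0x102; a following pre-update access uses 0x104.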
To support the automatic maintenance of circular buffers (with an absolute minimum of instructions), the address generators also employ a set of length (L) registers and base (B) registers. As many as 16 circular buffers can be maintained (8 with the primary registers and another 8 with the secondary registers), with a starting address for each at any memory location. The ability to maintain many circular buffers simultaneously is a key advantage in the processing of multiple signals concurrently, since the data set associated with each signal needs to reside in its own buffer. Also, an algorithm that processes a single signal might require several circular buffers. This requirement gets multiplied when concurrent signals are to be processed. With address generators dedicated to each circular buffer, no extra processing time is required to swap pointer values in and out of address registers.
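What the address generator does automatically for a circular buffer can be written out in C as a minimal sketch. The struct circ_t and the function circ_post_modify are hypothetical names; the model assumes the modify value is smaller in magnitude than the buffer length, so a single wrap correction suffices.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one circular-buffer register set. */
typedef struct {
    uint32_t i;   /* I: current address           */
    uint32_t b;   /* B: base (starting) address   */
    uint32_t l;   /* L: buffer length, in words   */
} circ_t;

/* Return the current address, then step I by m, wrapping within
   [B, B + L). In hardware this costs no extra instructions; in C the
   wrap test has to be spelled out. */
static uint32_t circ_post_modify(circ_t *c, int32_t m)
{
    assert(c->l > 0 && c->i >= c->b && c->i < c->b + c->l);
    assert(m > -(int32_t)c->l && m < (int32_t)c->l);

    uint32_t out = c->i;
    int64_t next = (int64_t)c->i + m;
    if (next >= (int64_t)c->b + c->l)
        next -= c->l;               /* ran past the end: wrap to the start */
    else if (next < (int64_t)c->b)
        next += c->l;               /* ran before the base: wrap to the end */
    c->i = (uint32_t)next;
    return out;
}
```

For a four-word buffer based at address 10, stepping by 1 yields the address sequence 10, 11, 12, 13, 10, and so on. Because the base can be any address, each signal's buffer can be placed wherever memory is free.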
The program sequencer is used in conjunction with a 64-word instruction cache to sustain three-bus performance for fetching an instruction and two data values. The cache is selective: only the instructions whose fetches conflict with program memory data accesses are cached. This allows full-speed execution of core, looped operations such as digital filter multiply-accumulates and processing of FFT butterflies.
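The effect of a selective cache on a tight loop can be illustrated with a toy cycle-count model in C. Everything specific in it is an assumption made for illustration, not a documented detail of the ADSP-21csp01: the direct-mapped organization, the one-cycle penalty on a miss, and the names icache_t, step_cycles and loop_cycles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_WORDS 64

/* Toy model of a 64-word instruction cache (direct-mapped by assumption). */
typedef struct {
    struct { uint32_t pc; int valid; } line[CACHE_WORDS];
} icache_t;

/* Cycles taken by the instruction at `pc`. A nonzero pm_data_access
   means the instruction also reads a data operand from program memory,
   so its own fetch conflicts with that access. */
static int step_cycles(icache_t *c, uint32_t pc, int pm_data_access)
{
    assert(c != NULL);
    if (!pm_data_access)
        return 1;                      /* no conflict: not cached, no penalty */
    unsigned slot = pc % CACHE_WORDS;
    if (c->line[slot].valid && c->line[slot].pc == pc)
        return 1;                      /* hit: fetch comes from the cache */
    c->line[slot].valid = 1;           /* miss: cache the instruction */
    c->line[slot].pc = pc;
    return 2;                          /* assumed one-cycle penalty */
}

/* Total cycles for a single-instruction loop executed `iterations` times. */
static long loop_cycles(icache_t *c, uint32_t pc, int pm_data_access,
                        int iterations)
{
    long total = 0;
    for (int k = 0; k < iterations; k++)
        total += step_cycles(c, pc, pm_data_access);
    return total;
}
```

Under these assumptions, a 100-iteration multiply-accumulate loop that takes one operand from program memory pays the penalty only on its first pass, for 101 cycles in total, and runs at full speed thereafter.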
Another important aspect of efficiently processing multiple signals in real time is interrupt latency. The ADSP-21csp01 responds to external and internal interrupts in a minimal amount of time. This is an extremely important factor, since response time to the external signals is critical to real-time performance.
Unified Memory Space
Modified Harvard architecture, a key characteristic of a DSP, allows two data words, as well as the next instruction, to be fetched in a single cycle. This three-bus performance is what sets a DSP apart from other microprocessors and RISC processors. Traditionally, DSP memories have been configured into two separate spaces to support the Harvard architecture. These two memory sections offer the efficiency required in the dual operand fetch, but at the cost of flexibility. For example, the DSP might have a total of 8 K words of memory arranged as two separate 4-K word blocks. However, the particular application might have a need for a total of 8 K words-deployed as a 6-K-word program section and a 2-K-word data section. The DSP's memory space has enough total memory, but not in the needed configuration. The result is a need for external memory to make up the difference.
The ADSP-21csp01 eliminates this problem by providing memory in a unified, undedicated address space. This memory is multiported to accommodate the fetch of two data operands in a single cycle with optimal flexibility. Any portion of the memory can be used for program instructions or for data stored in either program memory or data memory. This memory configuration also provides the additional flexibility required by high-level-language tools such as a C compiler.
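The 6-K/2-K example above can be checked with a few lines of C. The function names are hypothetical, and sizes are counted in words, as in the text, without regard to word width.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed Harvard split: each section must fit in its own dedicated block. */
static int fits_split(uint32_t prog_need, uint32_t data_need,
                      uint32_t prog_cap, uint32_t data_cap)
{
    return prog_need <= prog_cap && data_need <= data_cap;
}

/* Unified space: only the combined requirement matters. */
static int fits_unified(uint32_t prog_need, uint32_t data_need,
                        uint32_t total_cap)
{
    assert(prog_need <= UINT32_MAX - data_need);   /* guard the addition */
    return prog_need + data_need <= total_cap;
}
```

With 1 K = 1024 words, fits_split(6 K, 2 K, 4 K, 4 K) is false, because the program section overflows its 4-K block, while fits_unified(6 K, 2 K, 8 K) is true: the same 8 K words suffice once they are no longer dedicated.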
Development Tools
The architectural innovations of the ADSP-21csp01 are accompanied by new advances in development tools. An integrated development environment (IDE) running under Windows 95 allows the definition of a project where assembly, linking and project building are performed in a single step. An environment menu lets the user specify assembler and linker options, eliminating the older use of command-line switches. The IDE remembers user preferences and settings, as well as the names of all the files comprising the project. After the initial IDE set-up, code generation and debug can be performed quickly.
The ADSP-21csp01 EZ-ICE (In-Circuit Emulator), with its easy-to-use Microsoft Windows interface, allows non-intrusive access to the internal processor registers through a JTAG serial boundary-scan interface. Consisting of a PC plug-in card and a small attached probe, the EZ-ICE supports full-speed operation, up to 30 software breakpoints, nine hardware break ranges, single-step execution, register read and modify, and program and data memory upload/download.
The ADSP-21csp01 EZ-LAB is a PC plug-in development system that includes an ADSP-21csp01 with a connector for analog front-end cards. The EZ-LAB board can also be operated in stand-alone mode, booting from the on-board EPROM. Software is included for program debugging.
High-Level-Language Programming
As demands are placed on system manufacturers to get products to market more quickly, designers are required to employ methods that keep the design cycle as short as possible. Also, algorithms and standards are changing at an increasing rate. Development methods that can simplify the creation of code and preserve existing code by making it more transportable among different platforms offer key benefits to system designers. High-level languages, such as ANSI C, can provide this level of simplified code generation and transportability for the large and growing number of skilled C programmers.
The ADSP-21csp01 features a new DSP core, which includes key architectural features for efficient implementation of C compilers. The program sequencer supports PC-relative jumps and calls, a capability that simplifies relocatable code. The large number of registers, and the flexibility to use a single register to store a variable used in different arithmetic operations, improve the computational efficiency and ensure optimum data flow of compiled code. The C compiler does not need to generate additional instructions in order to save and restore values to and from the registers. The address generator architecture provides the functions required for efficient stack maintenance. The C compiler can manipulate frame pointers and generate linked lists with significantly fewer instructions.
Overall, the architectural features of the ADSP-21csp01 allow for a C compiler that generates code three to five times more efficient than code compiled for the earlier Analog Devices ADSP-21xx family.
The ADSP-21csp01 is housed in a 160-lead PQFP package and will be in production in mid-1996. Samples, and the beta version of the development tools, will be available in late spring.