Blackfin Processors include a high-performance 16-/32-bit embedded processor core with a 10-stage RISC MCU/DSP pipeline, a variable-length ISA for optimal code density, and full SIMD support with instructions for accelerated video and multimedia processing. The Blackfin core is described below:
The Blackfin Processor core includes an 8-entry by 32-bit data register file for general use by the computational units. Supported data types include 8-, 16-, or 32-bit signed or unsigned integers and 16- or 32-bit signed fractional values. In every clock cycle, this multiported register file supports two 32-bit reads and two 32-bit writes. It can also be accessed as a 16-entry by 16-bit data register file.
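To illustrate the dual view of the data register file (eight 32-bit entries, or sixteen 16-bit halves), here is a minimal Python sketch. The class and method names are hypothetical and chosen only for this illustration; the model ignores port counts and timing.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


class DataRegisterFile:
    """Toy model: 8 x 32-bit registers, also viewable as 16 x 16-bit halves."""

    def __init__(self):
        self.regs = [0] * 8  # R0..R7

    def read32(self, n):
        return self.regs[n]

    def write32(self, n, value):
        self.regs[n] = value & MASK32

    def read_half(self, n, high):
        # high=True selects Rn.H, high=False selects Rn.L
        shift = 16 if high else 0
        return (self.regs[n] >> shift) & MASK16

    def write_half(self, n, high, value):
        # Replace one 16-bit half and keep the other half unchanged.
        shift = 16 if high else 0
        keep = self.regs[n] & ~(MASK16 << shift) & MASK32
        self.regs[n] = keep | ((value & MASK16) << shift)


rf = DataRegisterFile()
rf.write32(0, 0x12345678)
rf.write_half(0, high=True, value=0xABCD)  # R0.H = 0xABCD
r0 = rf.read32(0)
r0_low = rf.read_half(0, high=False)
```

Writing only the high half leaves the low half intact, which is the property that lets the same storage serve as either view.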
The address register file provides a general purpose addressing mechanism in addition to supporting circular buffering and stack maintenance. This register file consists of 8 entries and includes a frame pointer and a stack pointer. The frame pointer is useful for subroutine parameter passing, while the stack pointer is useful for storing the return address from subroutine calls.
The Data Arithmetic Unit contains roughly twice the system resources of previous Analog Devices 16-bit architectures. It contains two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs, and a single 40-bit barrel shifter.
All computational resources can process 8-, 16-, or 32-bit operands from the data register file (R0 through R7). Each register can be accessed as a full 32-bit register or as a 16-bit high or low half.
In a single clock cycle, this SIMD architecture can read and write up to two 32-bit values. However, since the high and low halves of the R0 through R7 registers are individually addressable (Rx, Rx.H, or Rx.L), each computational block can choose from either two 32-bit input values or four 16-bit input values with no restrictions on input data. The results of the computation can be written back into the register file as either a 32-bit entity or as the high or low 16-bit half of a register. Additionally, the method of accumulation can vary between data paths. For example, A0 could accumulate by addition while A1 accumulates by subtraction. This capability is referred to as 'flexible SIMD'.
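The idea of two data paths accumulating differently can be sketched in Python as follows. This is an illustration under stated assumptions, not the hardware's instruction semantics: the function names are hypothetical, the choice of which halves feed which accumulator is arbitrary, and accumulator width is not modeled.

```python
def s16(x):
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def halves(reg32):
    """Split a 32-bit register value into signed (high, low) 16-bit halves."""
    return s16(reg32 >> 16), s16(reg32)


def dual_mac(a0, a1, rx, ry):
    """One step: A0 adds the low*low product, A1 subtracts the high*high product."""
    xh, xl = halves(rx)
    yh, yl = halves(ry)
    return a0 + xl * yl, a1 - xh * yh


a0, a1 = 0, 0
a0, a1 = dual_mac(a0, a1, 0x00030002, 0x00050004)  # highs 3, 5; lows 2, 4
a0, a1 = dual_mac(a0, a1, 0xFFFF0001, 0x00020006)  # highs -1, 2; lows 1, 6
```

Both accumulators are updated from the same pair of 32-bit inputs, but one sums while the other subtracts.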
Both accumulators are 40 bits in length, providing 8 bits of extended precision. Similar to the general purpose registers, both accumulators can be accessed as 16-, 32-, or 40-bit values. The Blackfin architecture also supports a combined add/subtract instruction that can generate two 16-, 32-, or 40-bit results or four 16-bit results. In the case where four 16-bit results are desired, the high and low half results can be interchanged. This capability significantly improves, for instance, FFT benchmark results.
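The value of the 8 extra accumulator bits, and the shape of a combined add/subtract, can be shown with a small Python sketch. The helper names are hypothetical, and the sketch wraps on overflow; it does not attempt to model any saturation behavior of the real hardware.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a signed two's-complement field of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


# Largest product of two signed 16-bit values: (-32768) * (-32768) = 2**30.
product = (-32768) * (-32768)

acc32 = 0  # accumulator without guard bits
acc40 = 0  # accumulator with 8 guard bits
for _ in range(4):
    acc32 = wrap_signed(acc32 + product, 32)
    acc40 = wrap_signed(acc40 + product, 40)


def add_sub16(x, y, swap=False):
    """Return (x + y, x - y) wrapped to 16 bits; optionally exchange the pair."""
    s, d = wrap_signed(x + y, 16), wrap_signed(x - y, 16)
    return (d, s) if swap else (s, d)


pair = add_sub16(100, 30)
swapped = add_sub16(100, 30, swap=True)
```

After four worst-case products the 32-bit accumulator has wrapped around, while the 40-bit accumulator still holds the exact sum. The sum/difference pair is the basic shape of an FFT butterfly, which is why producing both in one step helps that kind of workload.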
Two data address generators (DAGs) provide addresses for simultaneous dual operand fetches from memory. The DAGs share a register file that contains four sets of 32-bit index (I), length (L), base (B), and modify (M) registers. There are also eight additional 32-bit address registers (P0 through P5, the frame pointer, and the stack pointer) that can be used as pointers for general indexing of variables and stack locations.
The four sets of I, L, B, and M registers are useful for implementing circular buffering. Used together, each set of index, length, and base registers can implement a unique circular buffer in internal or external memory. The Blackfin architecture also supports a variety of addressing modes, including indirect, autoincrement and autodecrement, indexed, and bit-reversed. Finally, all address registers are 32 bits in length, supporting the full 4 Gbyte address range of the Blackfin Processor architecture.
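As a rough illustration of circular buffering with an index, length, base, and modify value, and of the index order that bit-reversed addressing produces, consider the Python sketch below. It is a software model under assumptions: the function names, the buffer address, and the restriction that the modify value does not exceed the length are all choices made for this example, not a description of the DAG hardware.

```python
def circular_update(i, m, b, l):
    """Post-modify index i by m, wrapping inside [b, b + l). Assumes |m| <= l."""
    i += m
    if i >= b + l:
        i -= l
    elif i < b:
        i += l
    return i


base, length = 0x1000, 16  # hypothetical 16-byte buffer at address 0x1000
index = base
visited = []
for _ in range(6):
    visited.append(index)
    index = circular_update(index, 4, base, length)  # step by one 32-bit word


def bit_reverse(n, bits):
    """Reverse the lowest `bits` bits of n."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (n & 1)
        n >>= 1
    return result


fft_order = [bit_reverse(k, 3) for k in range(8)]
```

The index wraps back to the base after the fourth word, so the access pattern repeats without any explicit bounds check in the loop body. The bit-reversed sequence is the reordering typically needed for the input or output of a radix-2 FFT.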
The program sequencer controls the flow of instruction execution and supports conditional jumps and subroutine calls, as well as nested zero-overhead looping. A multistage, fully interlocked pipeline guarantees that code executes as expected and that all data hazards are hidden from the programmer. This type of pipeline guarantees result accuracy by stalling whenever necessary. This greatly simplifies the programming task, since the software engineer does not need to fully understand pipeline latency issues. On-chip interlocking hardware ensures that operand data is valid at the time a particular instruction executes.
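The effect of interlocking can be sketched with a toy issue model in Python: an instruction is held back until every source operand is valid. The function name, the program encoding, and the `result_latency` value of 2 cycles are hypothetical and are not taken from the Blackfin pipeline.

```python
def run_with_interlocks(program, result_latency=2):
    """Return the cycle in which each instruction issues.

    program: list of (dest_register, [source_registers]).
    result_latency: hypothetical number of cycles before a result may be read.
    """
    ready = {}  # register -> first cycle in which its value may be read
    cycle = 0
    issue_cycles = []
    for dest, sources in program:
        # Stall until every source operand is valid.
        earliest = max([cycle] + [ready.get(s, 0) for s in sources])
        issue_cycles.append(earliest)
        ready[dest] = earliest + result_latency
        cycle = earliest + 1
    return issue_cycles


program = [
    ("R0", ["R1", "R2"]),  # R0 = R1 + R2
    ("R3", ["R0", "R4"]),  # depends on R0, so it must wait
    ("R5", ["R6", "R7"]),  # independent of the instructions above
]
issue = run_with_interlocks(program)
```

In this model the second instruction issues one cycle later than it otherwise would. The program still produces correct results without the programmer inserting any delay, which is the point of hardware interlocks.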
The Blackfin architecture supports 16- and 32-bit instruction lengths in addition to limited multi-issue 64-bit instruction packets. This ensures maximum code density by encoding the most frequently used control instructions as compact 16-bit words and the more complex math operations as 32-bit double words.
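The code-density argument comes down to simple arithmetic, sketched here in Python. The instruction mix (70 short and 30 long instructions) is a made-up example, not measured data, and 64-bit packets are left out for simplicity.

```python
SIZES = {"16-bit": 2, "32-bit": 4}  # instruction sizes in bytes


def code_bytes(mix):
    """Total bytes for a mix given as {encoding: instruction_count}."""
    return sum(SIZES[enc] * count for enc, count in mix.items())


# Hypothetical program: 70 control instructions, 30 math instructions.
mix = {"16-bit": 70, "32-bit": 30}

variable_length = code_bytes(mix)          # mixed 16-/32-bit encoding
fixed_length = 4 * sum(mix.values())       # every instruction padded to 32 bits
ratio = variable_length / fixed_length
```

For this assumed mix, the variable-length encoding needs 260 bytes against 400 bytes for a fixed 32-bit encoding. The saving grows with the share of instructions that fit in 16 bits.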