Rick Gentile

Multicore Means More than Moore
November 2005

Last month I attended and presented at both the GSPx conference and the Fall Processor Forum during the same week. It was interesting to see how much focus was placed, at both these events, on multi-core processors. Not only did each conference feature dedicated tracks on the subject, but multi-core was even the subject of a keynote address at GSPx ("Signal Processing Catches the Multicore Wave").

Upon reflection, I came away from these presentations with an impression of several common themes. Central to the discussion is the concern that even though processors are keeping up with Moore's law in terms of speed, the key to meaningful performance increases lies not in incrementally cranking up clock frequencies, but in creating efficient inter-processor communication architectures that allow more processing cores to be added.

Less well known, but equally important here, is Amdahl's law of parallel computing, which states that the speedup of a parallel algorithm is limited by the fraction of the problem that must be performed sequentially. This law recognizes that no matter how many processing units are available, some sequential portion of the code may still need to be executed. That sequential portion could be the communication time needed to send messages to the parallel execution units, or the synchronization time needed to combine their outputs into a single result.
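As a rough worked illustration of Amdahl's law, the speedup on N cores when a fraction s of the work is sequential is bounded by 1 / (s + (1 - s) / N). The function name and the sample fractions below are assumptions chosen for illustration, not figures from the conference talks.

```python
def amdahl_speedup(sequential_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `sequential_fraction` of the work cannot be parallelized."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: 0%, 10% and 50% sequential on a dual-core part.
    for s in (0.0, 0.1, 0.5):
        print(f"sequential={s:.0%}  dual-core speedup={amdahl_speedup(s, 2):.2f}x")
```

With 10% of the work sequential, two cores deliver roughly 1.8 times the single-core performance rather than 2 times, and no number of cores can push that workload past 10 times.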

In grail-like pursuit of "N times" the performance for "N" cores, many different architectures have evolved, both homogeneous and heterogeneous. The heterogeneous model has been especially common in processing engines targeted at convergent embedded applications that require equal parts control and DSP.

Analog Devices, on the other hand, has approached the convergence challenge with identical, symmetrically paired processor cores (the ADSP-BF561) that are equally good at control and signal processing. Our growing real-world experience with this device was the basis for one of the talks I gave at GSPx, in which I looked specifically at the different types of programming models that can be used to gain the maximum performance on a dual-core processor.

The old-fashioned heterogeneous dual-core architecture involves discrete and often different tasks running on each of the differing cores. One core might handle the brawn, for example, running an RTOS and its control tasks (network stacks, graphics, and other functional parts of the application), while the second core gets to be the brains of the outfit, handling serious data processing (e.g. audio and video compression). This kind of rigid partitioning is pretty much your only choice when your processor cores are different animals.

A symmetric multiprocessor (SMP) approach, however, gets off to a good start by requiring a single set of development tools and a design team with a single architectural knowledge base. Unused processing resources on one core can often be leveraged by the other, identical core, and traditional processing-intensive applications can be split evenly across the two cores. With this "Homogeneous Model," code running on each core is identical and only the data being processed is different. In a streaming multi-channel audio application, for example, this would mean that one core processes half of the audio channels, and the other core processes the remaining half.
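The Homogeneous Model can be sketched in a few lines. This is only an illustration under stated assumptions: Python threads stand in for the two cores, and `process_channels`, `run_homogeneous`, and the 0.5 gain are hypothetical names and values, not part of any Blackfin toolchain.

```python
from concurrent.futures import ThreadPoolExecutor


def process_channels(channels):
    """Identical code run by every core: here, a hypothetical gain of 0.5 per sample."""
    return [[0.5 * sample for sample in channel] for channel in channels]


def run_homogeneous(channels, cores=2):
    """Split the channels into contiguous groups, one per core, and process them in parallel."""
    per_core = max(1, (len(channels) + cores - 1) // cores)  # ceiling division
    groups = [channels[i:i + per_core] for i in range(0, len(channels), per_core)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        # map() returns results in the same order as the input groups
        results = list(pool.map(process_channels, groups))
    return [channel for group in results for channel in group]


if __name__ == "__main__":
    audio = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # four made-up channels
    print(run_homogeneous(audio))
```

Both workers execute the same function; only the slice of channels handed to each one differs, which is the defining property of this model.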

In the "Master-Slave" SMP model, both cores perform intensive computation in order to achieve better utilization of the architecture, with one core (the Master) controlling the flow of the processing and performing at least half the processing load. Portions of specific algorithms are handled by the Slave, assuming these portions can be parallelized.
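A minimal sketch of the Master-Slave idea follows, again with a Python thread standing in for the second core. The `energy` kernel and the way each frame is split are assumptions made for illustration; a real application would offload whichever portions of its algorithm can actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def energy(samples):
    """Parallelizable kernel: sum of squares over a slice of samples."""
    return sum(x * x for x in samples)


def master_process(frames):
    """Master loop: controls the flow, offloads part of each frame, keeps the rest."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as slave:  # the single slave "core"
        for frame in frames:
            split = (len(frame) + 1) // 2  # master keeps at least half of the samples
            pending = slave.submit(energy, frame[split:])  # slave's portion
            own = energy(frame[:split])  # master's portion, computed meanwhile
            results.append(own + pending.result())  # master merges the partial results
    return results


if __name__ == "__main__":
    print(master_process([[1, 2, 3], [4, 5, 6, 7]]))
```

The master decides what is dispatched and when, and the final merge is a sequential step that only the master performs.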

Pipelining, a variation on the Master-Slave Model, allocates serial processing steps to each core so one core's output becomes the next core's input. This kind of task separation, however, is heavily dependent on the processor architecture and memory hierarchy.
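The pipelined arrangement might look like the following sketch, in which one thread's output queue is the next thread's input. The two stages (mean removal, then peak detection), the queue depth, and all function names are hypothetical, and blocks are assumed to be non-empty; on real hardware the hand-off would go through shared memory or DMA rather than a Python queue.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage_one(blocks, handoff):
    """First core: hypothetical pre-processing step (remove each block's mean)."""
    for block in blocks:
        mean = sum(block) / len(block)
        handoff.put([x - mean for x in block])
    handoff.put(_DONE)


def stage_two(handoff, results):
    """Second core: consumes stage one's output (here, the peak absolute value)."""
    while True:
        block = handoff.get()
        if block is _DONE:
            break
        results.append(max(abs(x) for x in block))


def run_pipeline(blocks):
    """Run both stages concurrently, connected by a small bounded buffer."""
    handoff = queue.Queue(maxsize=2)
    results = []
    producer = threading.Thread(target=stage_one, args=(blocks, handoff))
    consumer = threading.Thread(target=stage_two, args=(handoff, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[1.0, 3.0], [2.0, 2.0, 8.0]]))
```

The bounded queue makes the dependence on the memory hierarchy visible: if the second stage is slower, the first stage stalls as soon as the buffer fills.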

With Moore's law doing a great job, we've certainly got the advantage of ever faster processing engines to work with. But at the same time, it's pretty clear that today's applications want more, more, MORE (pun intended...). Symmetric multiprocessor architectures like the BF561 certainly support many more programming models than do asymmetric approaches, and ultimately this is the architecture that gets us far closer to "N times" the performance for "N" processing elements.

I'd like to hear your thoughts on this.


Rick Gentile leads the Blackfin DSP Applications Group at Analog Devices. His column appears here regularly.


