Industry's First Integrated Wavelet Video Codec Sets New Standards for Cost, Image Quality and Flexibility
by Alex Zatsman, Mark Rossman, Rich Greene, Will Hooper, Phil Halmark, Bill Valentine, and David Skolnick
The ADV601 is a low-cost, single-chip, all-digital, dedicated-function CMOS VLSI device for compression and decompression of digital video signals in real time. It can support compression ratios of up to 350:1, with essentially lossless 4:1 compression of natural images. The ADV601 supports all common interlaced video formats (see Table 1). The device has been optimized for video applications that demand real-time compression at low cost and with broadcast quality. Such applications include nonlinear video editing, video capture systems, remote CCTV surveillance, camcorders, high quality teleconferencing and video distribution systems, video insertion equipment, image and video archival systems, and digital video tape. In addition to compression and decompression, the subband coding architecture of the ADV601 offers inherent support for video scaling and spatial filtering.
With landmark standards already existing in the form of JPEG, H.261, MPEG 1 and MPEG 2, do we need a new compression paradigm? Briefly, yes. There are many closed-system applications where cost, image quality, and flexibility are more important than interoperability (the advantage of a standard). For applications that do not require interoperability, the choice of a compression solution should be driven by the evaluation of these factors, plus symmetry: low cost and complexity in both encode (compression) and decode (decompression). The following overview of wavelet compression and its role in the ADV601 should be helpful in understanding the value it can bring to video compression applications.
Figure 1 shows that in Encode mode, the ADV601 accepts component digital video through its Video interface and outputs a compressed bit stream through the Host interface. In Decode mode, the reverse is true: the ADV601 transforms a compressed bit stream at its Host interface to component digital video at the Video interface. The host has access to all of the ADV601's control and status registers via the Host interface.
The ADV601 codec's compression algorithm is based on the bi-orthogonal wavelet transform, implemented with 7-tap high-pass and 9-tap low-pass filters, and performs field-independent subband coding. Subband coders transform two-dimensional spatial video data into spatial-frequency-filtered subbands. Adjustable quantization and entropy encoding processes then provide the compression (see Figure 2).
The ADV601 is based on wavelet theory, a new mathematical tool first explicitly introduced in Morlet and Grossmann's works on geophysics during the mid-1980s. This theory quickly became popular in theoretical physics and applied mathematics, and the late '80s and '90s have seen a dramatic growth in wavelet applications to signal and image processing.[2,3,4,5]
Understanding the wavelet kernel is key to comprehending the advantages of wavelets in video applications. This portion of the device contains filters and decimators that work on the image in both horizontal and vertical directions. The filters are based on carefully chosen segmented wavelet basis functions such as those shown in Figure 3. These basis functions have three key benefits: they correlate better to the broadband nature of images than do the sinusoidal waves of Fourier transforms; the ADV601 can implement them with simple, compact 7- and 9-tap FIR filters (key to low-cost silicon); and they provide full-image filtering, which eliminates the block-shaped artifacts that occur in the compressed image when an image is broken up into smaller areas to be compressed separately (JPEG and MPEG* can both be subject to this artifact in certain applications).
*Compression based on standards from the Joint Photographic Experts Group and the Moving Picture Experts Group.
The filter tree involves successive high- and low-pass filtering of two-dimensional (x and y) data, with decimation by 2 at each step (Figure 4a), resulting in successively smaller blocks of data, shown combined in a Mallat diagram (Figure 4b). All three components (e.g., Y, Cb, Cr) of a color video signal field are alternately passed through the filter tree to create a total of 42 new images (14 for Y, 14 for Cb, and 14 for Cr).
An example of what the transform does to a black and white image (luminance only) is shown in Figure 5. In this case the Analog Devices Headquarters image has been transformed into 14 new images, each containing a different set of information about the original image. One can clearly see the vertical edges resulting from high-pass x-filtering in block A, the horizontal edges from high-pass y-filtering in block D, and the reduced-size original image from decimation and low-pass filtering in block N. No compression has occurred yet; the total number of data points used to describe the 14 blocks shown is identical to the number used in the original image. But now that the image has been transformed, we can do some useful things: 1) implement nearly lossless compression, 2) achieve lossy compression at either constant quality or constant bit rate, 3) create high-quality scaled images without computational overhead, and 4) create an error-resilient compressed bit stream, since each block contains information about the whole image.
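One stage of such a filter tree can be sketched in a few lines of Python. This is only an illustration, not the ADV601's implementation: simple Haar averaging and differencing filters stand in for the 7- and 9-tap bi-orthogonal filters, and the function names and the 4x4 test image are hypothetical. The sketch shows that filtering and decimating in x and then in y yields four sub-bands whose total sample count equals that of the input, and that a vertical edge shows up only in the band that was high-pass filtered in x.

```python
# Illustrative only: Haar filters stand in for the ADV601's 7- and 9-tap
# bi-orthogonal filters, which are not reproduced here.

def split_1d(samples):
    """Split an even-length sequence into low-pass and high-pass halves,
    each decimated by 2 (pairwise average and pairwise difference)."""
    low = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return low, high

def transpose(image):
    """Swap rows and columns so that y-filtering can reuse the row filter."""
    return [list(column) for column in zip(*image)]

def split_rows(image):
    """Filter every row in x; return (low-pass image, high-pass image)."""
    pairs = [split_1d(row) for row in image]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def split_2d(image):
    """One tree stage: filter in x, then in y, giving four sub-bands."""
    low_x, high_x = split_rows(image)
    bands = {}
    for name, half in (("low_x", low_x), ("high_x", high_x)):
        low_y, high_y = split_rows(transpose(half))
        bands[name + "_low_y"] = transpose(low_y)
        bands[name + "_high_y"] = transpose(high_y)
    return bands

def count(image):
    """Number of samples in a two-dimensional list."""
    return sum(len(row) for row in image)

# Hypothetical 4x4 test image with a vertical edge after the first column.
image = [[0, 8, 8, 8] for _ in range(4)]
bands = split_2d(image)

for name, band in bands.items():
    print(name, band)
print("input samples:", count(image),
      "sub-band samples:", sum(count(b) for b in bands.values()))
```

Applying `split_2d` again to the `low_x_low_y` band would give the next, smaller stage of the tree; that band is also a half-size version of the input, which is why scaled images come out of the transform at no extra cost.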
Figure 6 shows the scheme for using the ADV601 in lossless mode. In this case the 42 transformed blocks are sent to two types of lossless entropy coders. The entropy coders benefit from the increased correlation found within the transformed blocks. In this mode of operation, the compression performance is tied to the degree of complexity in the original. If the original image is of a simple ramp, all the blocks will contain zeros, except for the smallest block in the upper left-hand corner (N). This would yield lossless compression in excess of 300:1 (requiring less than 0.5 Mb/s for 4:2:2-coded CCIR601 resolution video at 60 fields per second). But if the image is of white noise, there are no correlation opportunities, and the compression must be near 1:1 (requiring approximately 168 Mb/s). In a typical real video application, the compression would range from 2:1 (84 Mb/s) to 5:1 (about 34 Mb/s), depending on the complexity within each field. In applications calling for near-lossless compression, the ADV601 can be used in this mode if large swings in bit rate are allowed.
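The bit rates quoted above follow from simple arithmetic, sketched here in Python. The field geometry is an assumption not stated in the text: 720 x 243 active samples per field and 8 bits per sample, with 4:2:2 sampling taken to mean an average of two samples per pixel (one Y, plus Cb and Cr on alternate pixels). The function names are hypothetical.

```python
# Assumed field geometry (not given in the article text): 720 x 243 active
# samples per field, 8 bits per sample.
WIDTH = 720
LINES_PER_FIELD = 243
BITS_PER_SAMPLE = 8
SAMPLES_PER_PIXEL = 2      # 4:2:2: one Y per pixel, Cb and Cr on alternate pixels
FIELDS_PER_SECOND = 60

def uncompressed_bits_per_second():
    """Raw video rate before any compression."""
    return (WIDTH * LINES_PER_FIELD * SAMPLES_PER_PIXEL
            * BITS_PER_SAMPLE * FIELDS_PER_SECOND)

def compressed_mbits_per_second(ratio):
    """Bit rate in Mb/s after compressing by ratio:1."""
    return uncompressed_bits_per_second() / ratio / 1e6

for ratio in (1, 2, 350):
    print(f"{ratio}:1 -> {compressed_mbits_per_second(ratio):.2f} Mb/s")
```

Under these assumptions the uncompressed rate is 167,961,600 bits/s, or about 168 Mb/s, and a 2:1 ratio halves it to about 84 Mb/s.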
Figure 7 shows how one might use the ADV601 in a lossy compression mode. As the image is being transformed, a set of statistics is extracted for all 42 blocks, including the sum of squares (or energy), minimum pixel value, and maximum pixel value for each block. This information goes to the quantizer and is coupled with the human visual model, which relates the importance of each block to the human visual system. The quantizer algorithm takes all this information, plus the user-programmed bit rate, and calculates 42 values, called "bin widths," for each field; they can be thought of as the accuracy budget per block. When a block is lightly quantized (i.e., many small quanta), this number will be large. Heavy quantization (few large quanta) leads to a much smaller number. The actual quantizer is on the ADV601, but a host or external DSP performs the bin-width calculation. Two examples follow to help illustrate how this works.
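The statistics gathering and quantization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ADV601's algorithm: `step` is a hypothetical quantizer step size (the text does not specify how the bin-width values map to a step size), the sample block is made up, and empirical Shannon entropy is used only as a rough proxy for what the chip's entropy coders would produce.

```python
import math
from collections import Counter

def block_stats(block):
    """Per-block statistics named in the text: energy, minimum, maximum."""
    values = [v for row in block for v in row]
    return {"energy": sum(v * v for v in values),
            "min": min(values),
            "max": max(values)}

def quantize(block, step):
    """Uniform quantizer: replace each coefficient by the index of its bin.
    `step` is a hypothetical step size; larger steps mean coarser quantization."""
    return [[round(v / step) for v in row] for row in block]

def dequantize(indices, step):
    """Reconstruct approximate coefficients from bin indices."""
    return [[q * step for q in row] for row in indices]

def entropy_bits_per_sample(indices):
    """Empirical Shannon entropy of the bin indices, in bits per sample."""
    values = [q for row in indices for q in row]
    total = len(values)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(values).values())

# Hypothetical block of transform coefficients.
block = [[-7, -3, 0, 3],
         [1, 5, -1, 7]]

print(block_stats(block))
for step in (1, 4, 100):
    indices = quantize(block, step)
    print("step", step, "indices", indices,
          "entropy", round(entropy_bits_per_sample(indices), 3))
```

Coarser steps collapse more coefficients into the same bin, so the entropy of the indices falls and an entropy coder has less to send; with a very large step every coefficient becomes zero.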
The first case is a high-quality application calling for visually lossless compression while maintaining an accurate bit rate. In this case all low-frequency bands (the smaller blocks) will be given the maximum bin width to ensure perfect reconstruction. The high-frequency bands (the largest blocks) will get as large a bin width as can be allocated, based on the complexity of the image. A small amount of accuracy in the high-frequency information is thus given up in order to maintain the desired bit rate. This does not pose a difficult problem, because the human visual system cannot resolve high spatial frequencies to the same level as low spatial frequencies. It has been shown that light quantization of these blocks cannot be detected by the human eye (even with broadcast quality video playback equipment).
In the second case, extremely high compression (over 100:1) is required. This means that more than 99% of the bits in each field must be eliminated! Here, only the smallest block gets a large bin width. The remaining budget of bits is sprinkled across the rest of the blocks as determined by the bin-width allocator. Typical compression schemes based solely on information within each field usually fail at high compression; thus the ADV601's ability to maintain adequate information about the image is remarkable. When the algorithm was tested at 350:1 with a football sequence, it was possible to clearly identify the action and even read numbers on players' uniforms. The video quality at such high compression ratios doesn't fit every application but can be more than adequate for video sequence identification and surveillance.
So what do the artifacts look like, using wavelets, when the compression ratio is too high to render an image accurately? As the compression ratio increases, less accuracy is available to describe the high spatial frequencies, so more and more noise appears in those bands. As a result, wavelet video compression degrades much like conventional broadcast analog video. Though artifacts in video are never pleasing, humans are highly conditioned to accept this type of artifact. Since the ADV601 allows control over the gain of each subband, it is possible to reduce the noise by making the image "soft". Most other compression schemes break the image up into smaller blocks and process each one separately. As compression increases, the first artifact to appear is a stationary grid of blocks laid on top of the image. There is general agreement that such blocking artifacts are more objectionable to the human visual system than high-frequency noise or image softening.
Figure 8 is a functional block diagram of the ADV601, and Figure 9 shows how it would be used in a typical host-based ADV601 application. The ADV601 video interface is designed to work with all popular analog video decoders and encoders, including those from Analog Devices, Philips, Brooktree, and Raytheon. The video interface is also capable of interfacing directly to all parallel CCIR656-compliant devices (also known as "D1"). Table 1 shows the field rates and image sizes supported by the ADV601. The DRAM manager provides a glueless interface to the 256Kx16 fast page-mode DRAM required to support the ADV601 during both encode and decode modes. The general-purpose host interface can be configured for widths of 8, 16, or 32 bits. The host interface also includes a 512x32-bit FIFO to help enable smooth transfer of compressed video data.
A host-based software driver assists the ADV601 in calculating the 42 bin-width values for each field; it is part of a complete Video for Windows driver package that Analog Devices has developed to support the ADV601. Analog Devices has also created a plug-and-play PCI board for Windows 95, called Videolab, for evaluating the video quality of the ADV601.
The ADV601 can also be used in stand-alone applications, with the assistance of an ADSP-21xx-class DSP to calculate the bin-width values for each field. The ADV601 DSP serial interface provides a glueless interface to all Analog Devices DSPs.
The ADV601JS, packaged in a 160-pin PQFP, operates over the 0 to +70°C commercial temperature range. Get in touch with Analog Devices or your local sales office for further information. Budgetary pricing is $35 in 10,000s.