Use of Video Technology to Improve Automotive Safety Becomes More Feasible with Blackfin Processors

INTRODUCTION

While most currently available safety features utilize a wide array of sensors (principally involving microwaves, infrared light, lasers, accelerometers, or position detection), only recently have processors been introduced that can meet the real-time computation requirements that allow video image processing to contribute substantially to safety technology. The Analog Devices Blackfin media-processor family offers attractive solutions for this growing market, with its high processing speeds, versatile data-movement features, and video-specific interfaces. This article will discuss the roles that Blackfin processors can play in the emerging field of video-based automotive safety.

VIDEO IN AUTOMOTIVE SAFETY SYSTEMS

Figure 1 shows the basic video operational elements of an automotive safety system, indicating where image sensors might be placed throughout a vehicle, and how a lane departure system might be integrated into the chassis. There are a few things worth noting. First, multiple sensors can be shared by different automotive safety functions. For example, the rear-facing sensors can be used when the vehicle is backing up, as well as to track lanes as the vehicle moves forward. In addition, the lane-departure system might accept feeds from any of a number of camera sources, choosing the appropriate inputs for a given situation.

In a basic system, a video stream feeds its data to the embedded processor. In more advanced systems, the processor receives other sensor information, such as position data from GPS receivers.
Figure 1. Basic camera-placement regions for automotive safety applications.

Smart Airbags

In this system, the media processor must acquire multiple image streams at high frame rates, process the images to profile the size and position of each occupant under all types of lighting conditions, and constantly monitor all the crash sensors, located throughout the car, in order to make the best deployment decision possible in a matter of milliseconds.

Collision Avoidance and Adaptive Cruise Control

ACC systems are promoted as a convenience feature, while true collision avoidance systems actively aim to avoid accidents by coordinating the braking, steering, and engine controllers of the car. As such, collision avoidance systems have been slower to evolve because of the complexity of the task, the critical reliability considerations, and the legal and social consequences. It is estimated that deployment of these systems may be well on its way by 2010. In view of the typical 5-year automotive design cycle, such system designs are already underway.

Collision warning systems, like ACC, are a subset of the collision-avoidance category. These provide a warning of a possibly impending accident, but they don't actively avoid it. There are two main subcategories within this niche:

Blind spot monitors: Cameras are mounted strategically around the periphery of the vehicle to provide a visual display of the driver's blind spots and to sound a warning if the processor senses the presence of another vehicle in a blind-spot zone. In reverse gear, these systems also serve as back-up warnings, cautioning the driver about obstructions in the rear of the car. A display could be integrated with the rear-view mirror, providing a full, unobstructed view of the car's surroundings. Moreover, the system might include a video of blind spots within the car cabin, allowing the driver to monitor a rear-facing infant, for example.
Lane-departure monitors: These systems can notify drivers if it is unsafe to change lanes or if they are straying out of a lane or off the road, thus aiding in detecting driver fatigue. Forward-facing cameras monitor the car's position relative to the roadway's centerline and side markers, up to 50 to 75 feet in front of the car. The system sounds an alarm if the car starts to leave the lane unintentionally.

LANE DEPARTURE: A SYSTEM EXAMPLE

The overall system diagram of Figure 2 is fairly straightforward, considering the complexity of the signal processing functions being performed. Interestingly, in a video-based lane departure system, the bulk of the processing is image-based, and is carried out within a signal processor rather than by an analog signal chain. This represents a big savings on the system bill-of-materials. The output to the driver consists of a warning to correct the car's projected path before the vehicle leaves the lane unintentionally. It may be an audible rumble-strip sound, a programmed chime, or a voice message.

The video input system to the embedded processor must perform reliably in a harsh environment, including wide and drastic temperature shifts and changing road conditions. As the data stream enters the processor, it is transformed, in real time, into a form that can be processed to output a decision. At the simplest level, the lane departure system looks for the vehicle's position with respect to the lane markings in the road. To the processor, this means the incoming stream of road imagery must be transformed into a series of lines that delineate the road surface.

The processor can find lines within a field of data by looking for edges. These edges form the boundaries within which the driver should keep the vehicle while it is moving forward. The processor must track these line markers and determine whether to notify the driver of irregularities.

Keep in mind that several other automobile systems also influence the lane-departure system.
For example, use of the braking system and the turn signals typically will block lane departure warnings during intentional lane changes and slow turns.
Figure 2. Basic steps in a lane-departure algorithm and how the processor might connect to the outside world.

Let's now drill deeper into the basic components of the lane-departure system example. Figure 3 follows the same basic operational flow as Figure 2 but with more insight into the algorithms being performed. The video stream coming into the system needs to be filtered and smoothed to reduce noise caused by temperature, motion, and electromagnetic interference. Without this step, it would be difficult to find clean lane markings.
Figure 3. Algorithm flow, showing results of intermediate image-processing steps.

The next processing step involves edge detection; if the system is set up properly, the edges found will represent the lane markings. These lines must then be matched to the direction and position of the vehicle. The Hough transform is used for this step. Its output is tracked across frames of images, and a decision is made based on all the compiled information. The final challenge is to send a warning in a timely manner without sounding false alarms.

Image Acquisition

In devices supporting ITU-R 656, each boundary between blanking data and active video data is set using a 4-byte data sequence that is embedded within the data stream. The Blackfin parallel peripheral interface (PPI) automatically decodes this sequence, without processor intervention, to collect the incoming active video frames. With this embedded control scheme, the physical connection is simply eight data lines and a clock.

The PPI also connects to a wide range of image sensors and data converters that do not have an embedded control scheme. In these cases, the PPI provides up to three frame syncs to manage incoming or outgoing data. For a video stream, these frame syncs function as physical horizontal sync, vertical sync, and field lines (HSYNC, VSYNC, and FIELD).

For automotive safety applications, image resolutions typically range from VGA (640 × 480 pixels/image) down to QVGA (320 × 240 pixels/image). Regardless of the actual image size, the format of the data transferred remains the same, but lower clock speeds can be used when less data is transferred. Moreover, in the most basic lane-departure warning systems, only gray-scale images are required. The data bandwidth is therefore halved (from 16 bits/pixel to 8 bits/pixel) because chroma information can be ignored.

Memory and Data Movement

Because video data rates are very demanding, frame buffers must be set up in external memory, as shown in Figure 4.
In this scenario, while the processor operates on one buffer, a second buffer is being filled by the PPI via a DMA transfer. A simple semaphore can be set up to maintain synchronization between the frames. With Blackfin's flexible DMA controller, an interrupt can be generated at virtually any point in the memory fill process, but it is typically configured to occur at the end of each video line or frame.
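The double-buffering scheme can be illustrated with a small host-side simulation. This is only a sketch: the class and method names (`PingPongBuffers`, `dma_frame_done`, `acquire_frame`) are hypothetical, the DMA and its end-of-frame interrupt are simulated by an ordinary method call, and a real system would use the processor's own DMA descriptors and interrupt handlers.

```python
class PingPongBuffers:
    """Two frame buffers: one being filled (simulated DMA), one being processed."""

    def __init__(self, frame_size):
        self.buffers = [bytearray(frame_size), bytearray(frame_size)]
        self.fill_index = 0        # buffer the simulated DMA is currently filling
        self.frame_ready = False   # simple semaphore: set on "interrupt", cleared by the core

    def dma_frame_done(self, frame_bytes):
        """Stands in for the end-of-frame DMA interrupt handler (assumed behavior)."""
        self.buffers[self.fill_index][:] = frame_bytes
        self.fill_index ^= 1       # the next frame goes into the other buffer
        self.frame_ready = True

    def acquire_frame(self):
        """Called by the processing loop; returns the last completed buffer or None."""
        if not self.frame_ready:
            return None
        self.frame_ready = False
        return self.buffers[self.fill_index ^ 1]
```

The processing loop would call `acquire_frame()` and work on the returned buffer while the other one is being filled. The sketch does not handle the case where processing takes longer than one frame period; a real design would need to detect or prevent that overrun.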
Figure 4. Use of external memory for a frame buffer.

Once a complete frame is in SDRAM, the data is normally transferred into internal L1 data memory so that the core can access it with single-cycle latency. To do this, the DMA controller can use two-dimensional transfers to bring in pixel blocks. Figure 5 shows an example of how a 16 × 16 macroblock, a construct used in many compression algorithms, can be stored linearly in L1 memory via a 2D DMA engine.
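A 2D-to-1D transfer of this kind can be modeled in a few lines of software. The sketch below is a simulation, not driver code: the function name `dma_2d_to_1d` is hypothetical, addresses are counted in pixels rather than bytes, and it assumes that the X stride is applied after every element except the last one in a row, where the Y stride is applied instead.

```python
def dma_2d_to_1d(src, start, x_count, x_modify, y_count, y_modify):
    """Gather a 2D block from a linearly stored image into a flat list.

    src      : image stored linearly, row after row
    start    : index of the first element of the block
    x_count  : elements per row of the block
    x_modify : stride applied after each element within a row
    y_count  : number of rows in the block
    y_modify : stride applied after the last element of each row (assumed model)
    """
    dest = []
    addr = start
    for _row in range(y_count):
        for col in range(x_count):
            dest.append(src[addr])
            # Inner-loop stride, except at the end of a row
            addr += x_modify if col < x_count - 1 else y_modify
    return dest
```

Under these assumptions, pulling a 16 × 16 macroblock out of a 640-pixel-wide image would use `x_count=16`, `x_modify=1`, `y_count=16`, and `y_modify=640 - 16 + 1`, so that the address jumps from the last pixel of one block row to the first pixel of the next.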
Figure 5. A 2D to 1D DMA transfer from SDRAM into L1 memory.

To efficiently navigate through a source image, four parameters need to be controlled: X Count, Y Count, X Modify, and Y Modify. X and Y Counts describe the number of elements to read in/out in the horizontal and vertical directions, respectively. Horizontal and vertical are abstract terms in this application because the image data is actually stored linearly in external memory. X and Y Modify values achieve this abstraction by specifying an amount to stride through the data after the requisite X Count or Y Count has been transferred.

From a performance standpoint, up to four unique SDRAM internal banks can be active at any time. This means that in the video framework, no additional bank-activation latencies are observed when the 2D-to-1D DMA is pulling data from one bank while the PPI is feeding another.

Projection Correction

Image Filtering

An image filter needs to operate fast enough to keep up with the succession of input images. Thus, it is imperative that image filter kernels be optimized for execution in the fewest possible number of processor cycles. One effective means of filtering is accomplished with a basic two-dimensional convolution operation. Let's look at how this computation can be performed efficiently on a Blackfin processor.

Convolution is one of the fundamental operations in image processing. In two-dimensional convolution, the calculation performed for a given pixel is a weighted sum of intensity values from pixels in the neighborhood of that pixel. Since the neighborhood of a mask is centered on a given pixel, the mask area usually has odd dimensions. The mask size is typically small relative to the image; a 3 × 3 mask is a common choice because it is computationally reasonable on a per-pixel basis but large enough to detect edges in an image. The basic structure of the 3 × 3 kernel is shown in Figure 6.
As an example, the output of the convolution process for a pixel at row 20, column 10 in an image would be: Out(20,10) = A × (19,9) + B × (19,10) + C × (19,11) + D × (20,9) + E × (20,10) + F × (20,11) + G × (21,9) + H × (21,10) + I × (21,11)
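The weighted sum above can be written out directly. The following sketch uses hypothetical function names and plain nested lists; the mask is laid out as rows (A, B, C), (D, E, F), (G, H, I) and is applied without flipping, to match the expression above. As a simplifying assumption, only interior pixels are computed and the one-pixel border of the output is left at zero.

```python
def convolve3x3_at(image, mask, row, col):
    """Weighted sum of the 3 x 3 neighborhood centered on (row, col)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += mask[dr + 1][dc + 1] * image[row + dr][col + dc]
    return total


def convolve3x3(image, mask):
    """Apply the mask to every interior pixel; border outputs stay 0 (assumption)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = convolve3x3_at(image, mask, r, c)
    return out
```

Calling `convolve3x3_at(image, mask, 20, 10)` evaluates exactly the nine-term sum given for Out(20,10). This straightforward form costs nine multiply-accumulates per output pixel and makes no attempt at the optimizations a production kernel would use.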
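Figure 6. Basic structure of the 3 × 3 convolution kernel.

The high-level algorithm can be described with the following steps:

1. Place the center of the mask over an element of the input matrix.
2. Multiply each pixel in the mask neighborhood by the corresponding mask element.
3. Sum the products into a single result.
4. Store the sum in the output matrix at the location corresponding to the center of the mask.

Figure 7 shows an input matrix, F, a 3 × 3 mask matrix, H, and an output matrix, G.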
Figure 7. Input matrix, F; 3 × 3 mask matrix, H; and output matrix, G.

After each output point is computed, the mask is moved to the right. On the image edges, the algorithm wraps around to the first element in the next row. For example, when the mask is centered on element F2M, the H23 element of the mask matrix is multiplied by element F31 of the input matrix. As a result, the usable section of the output matrix is reduced by one element along each edge of the image.

By aligning the input data properly, both of Blackfin's multiply-accumulate (MAC) units can be used in a single processor cycle to process two output points at a time. During this same cycle, multiple data fetches occur in parallel with the MAC operation. This method allows efficient computation of 2 output points for each loop iteration, or 4.5 cycles per pixel instead of the 9 cycles per pixel of Figure 7.

Edge Detection

Noise in the image will almost guarantee that pixels having equal gray scale levels in the original image will not have equal levels in the noisy image. Noise will be introduced based on many factors that can't be easily controlled, such as ambient temperature, vehicular motion, and outside weather conditions. Quantization errors in the image will result in edge boundaries extending across a number of pixels. These factors work together to complicate edge detection. Because of this, any image-processing algorithm selected must keep noise immunity as a prime goal.

One popular detection method uses a set of common derivative-based operators to help locate edges within the image. Each of the derivative operators is designed to find places where there are changes in intensity. In this scheme, the edges can be modeled by a smaller image that contains the properties of an ideal edge.

We'll discuss the Sobel Edge Detector because it is easy to understand and illustrates principles that extend into more complex schemes.
The Sobel Detector uses two convolution kernels to compute gradients for both horizontal and vertical edges. The first is designed to detect changes in vertical contrast (Sx). The second detects changes in horizontal contrast (Sy).
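A minimal sketch of a Sobel detector follows. The kernel coefficients are the standard Sobel values; the kernel matrices themselves do not appear in the text here, so which kernel the article labels Sx and which Sy is an assumption. The function names and the `threshold` parameter are hypothetical, the gradient magnitude is approximated by a sum of absolute values rather than a square root, and the result is turned into a black-and-white edge map.

```python
# Standard Sobel kernels (assignment of the names SX / SY is an assumption)
SX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # responds to left-right intensity change
SY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # responds to top-bottom intensity change


def apply3x3(image, kernel, r, c):
    """Weighted sum of the 3 x 3 neighborhood centered on (r, c)."""
    return sum(kernel[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def sobel_edges(image, threshold):
    """Return a binary edge map (255 = edge, 0 = no edge); borders stay 0."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply3x3(image, SX, r, c)
            gy = apply3x3(image, SY, r, c)
            magnitude = abs(gx) + abs(gy)   # cheap stand-in for sqrt(gx*gx + gy*gy)
            edges[r][c] = 255 if magnitude >= threshold else 0
    return edges
```

The threshold value is application dependent and would normally be tuned against real road imagery; nothing in this sketch suggests a particular value.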
The output matrix holds an edge likelihood magnitude (based on horizontal and vertical convolutions) for each pixel in the image. This matrix is then thresholded in order to take advantage of the fact that large responses in magnitude correspond to edges within the image. Therefore, at the input of the Hough Transform stage, the image consists only of either pure white or pure black pixels, with no intermediate gradations.

If the true magnitude is not required for an application, it can be approximated, which saves a costly square root operation. Common techniques for building a threshold matrix this way include summing the gradients from each pixel or simply taking the larger of the two gradients.

Straight Line Detection: Hough Transform
The Hough transform detects lines by using the polar representation of a line:

ρ = x cos θ + y sin θ    (1)

The meaning of this equation can be visualized by extending a perpendicular from the given line to the origin, such that θ is the angle that the perpendicular makes with the abscissa and ρ is the length of the perpendicular. Thus, one pair of coordinates (ρ, θ) can fully describe the line. Lines L1 and L2 in Figure 8a demonstrate this concept. Figure 8b shows that L1 is defined by θ1 and the length of the red perpendicular, while L2 is defined by θ2 and the length of the blue perpendicular line.
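The (ρ, θ) description can be checked numerically. The short sketch below assumes the standard polar form of a line, ρ = x cos θ + y sin θ, and uses a hypothetical helper name. It illustrates the property that every point on a given line yields the same ρ when evaluated at that line's θ.

```python
import math


def rho_for_point(x, y, theta):
    """Length of the perpendicular from the origin to the line through (x, y)
    whose normal makes angle theta (radians) with the x axis."""
    return x * math.cos(theta) + y * math.sin(theta)


# Example line: x + y = 10. Its normal points along 45 degrees.
theta_line = math.radians(45)
points_on_line = [(0, 10), (5, 5), (10, 0)]
rhos = [rho_for_point(x, y, theta_line) for x, y in points_on_line]
```

All three entries of `rhos` agree (to floating-point precision) with 10/√2, the distance from the origin to the line, whereas evaluating the same points at some other angle gives three different values. This agreement at a single (ρ, θ) is what allows collinear pixels to be detected as a peak.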
Figure 8a. The output of an edge detector is a binary image like this one, which can be visually inspected by a human observer to show lines. A Hough Transform allows localization of these two lines.
Figure 8b. The two white lines in the image above can be described by the lengths and angles of the red and blue perpendicular line segments extending from the origin.
Figure 8c. The Hough transform of the image in Figure 8a. The range for θ is [0, 2π], and the range for ρ is one-half the diagonal of the input image in Figure 8a. The two bright regions correspond to local maxima, which can be used to reconstruct the two lines in Figure 8a.

Another way to look at the Hough Transform is to consider a way that the algorithm could be implemented intuitively:

1. Visit only the white pixels in the binary image.
2. For each such pixel and each θ value being considered, compute ρ from Equation (1).
3. Record the resulting (ρ, θ) pair in an accumulation table.
4. After all white pixels have been visited, search the table for the (ρ, θ) pairs encountered most often; these describe the most probable lines in the input image.

The Hough transform is computationally intensive because a sinusoidal curve is calculated for each pixel in the input image. However, certain techniques can speed up the computation considerably.

First, some of the computation terms can be computed ahead of time, so that they can be referenced quickly through a lookup table. In Blackfin's fixed-point architecture it is very useful to store the lookup table only for the cosine function. Since the sine values are 90 degrees out of phase with the cosines, the same table can be used, with an offset. With the lookup tables in use, the computation of Equation (1) can be represented as two fixed-point multiplications and one addition.

Another factor that can improve performance is a set of assumptions about the nature and location of lane markings within the input image. By considering only those input points that could potentially be lane markings, a large number of unnecessary calculations can be avoided, since only a narrow range of θ values need be considered for each white pixel.

The output of a Hough Transform is a set of straight lines that could potentially be lane markings. Certain parameters of these lines can be calculated by simple geometric equations. Among the parameters useful for further analysis are the offset from the camera's center axis, the widths of the detected lines, and the angles with respect to the position of the camera. Since lane markings in many highway systems are standardized, a set of rules can eliminate some lines from the list of lane-marking candidates.
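The lookup-table and restricted-angle ideas can be sketched as follows. Several details here are assumptions made for illustration only: the table has one entry per degree, values are stored in a 16-bit signed fractional (Q15) format, the accumulator is a Python dictionary, and all names are hypothetical. The standard form ρ = x cos θ + y sin θ is assumed for Equation (1).

```python
import math

Q15_SHIFT = 15
Q15_MAX = (1 << Q15_SHIFT) - 1          # 32767, largest Q15 value
N_ANGLES = 360                          # one entry per degree (assumed resolution)

# Cosine-only lookup table in Q15
COS_Q15 = [int(round(math.cos(math.radians(d)) * Q15_MAX)) for d in range(N_ANGLES)]


def cos_q15(deg):
    return COS_Q15[deg % N_ANGLES]


def sin_q15(deg):
    # sin(theta) = cos(theta - 90 degrees): same table, offset index
    return COS_Q15[(deg - 90) % N_ANGLES]


def rho_fixed(x, y, deg):
    """Two fixed-point multiplies and one add, then rounding back to an integer."""
    acc = x * cos_q15(deg) + y * sin_q15(deg)
    return (acc + (1 << (Q15_SHIFT - 1))) >> Q15_SHIFT


def hough_accumulate(white_pixels, theta_degrees):
    """Vote in (rho, theta) space, visiting only the angles in theta_degrees."""
    votes = {}
    for x, y in white_pixels:
        for deg in theta_degrees:
            key = (rho_fixed(x, y, deg), deg)
            votes[key] = votes.get(key, 0) + 1
    return votes
```

Passing a narrow `theta_degrees` range (for example, only angles near those expected for lane markings) reduces the work per white pixel proportionally. The strongest line candidate is then the key with the most votes, for instance `max(votes, key=votes.get)`.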
The set of possible lane-marking variables can then be used to derive the position of the car.

Lane Tracking

The problem of estimating lane geometry is a challenge that often calls for using a Kalman filter to estimate the road curvature. Specifically, the Kalman filter can predict future road information, which can then be used in the next frame to reduce the computational load presented by the Hough transform.

As described earlier, the Hough transform is used to find lines in each image. But these lines also need to be tracked over a series of images. In general, a Kalman filter can be described as a recursive filter that estimates the future state of an object. In this case, the object is a line. The state of the line is based on its location and its motion path across several frames.

Along with the road state itself, the Kalman filter provides a variance for each state. The predicted state and the variance can be used in conjunction to narrow the search space of the Hough transform in future frames, which saves processing cycles.

Decision Making: Current Car Position or Time to Lane-Crossing

With a processing framework in place, system designers can add their own intellectual property (IP) to the decision phase of each of the processing threads. The simplest approach might be to take into account other vehicle attributes when making a decision. For example, a lane-change warning could be suppressed when a lane change is perceived to be intentional, as when a blinker is used or when the brake is applied. More complex systems may factor in GPS coordinate data, occupant driving profile, time of day, weather, and other parameters.

CONCLUSION

Copyright 1995- Analog Devices, Inc. All rights reserved.