Monterey Bay Aquarium Research Institute
Automated Visual Event Detection
 

To study the distribution and abundance of oceanic animals, MBARI uses high-resolution video equipment on remotely operated vehicles (ROVs). Quantitative video transects (QVTs) supplant traditional net tows for assessing the quantity and diversity of organisms in the water column. QVTs are run from 50 m to 4000 m depth and provide high-resolution data at the scale of individual animals as well as their natural aggregation patterns. However, the current, manual method of analyzing QVTs is labor-intensive and tedious.

We are developing an automated system for detecting marine organisms visible in the videos. Video frames are processed with a neuromorphic selective-attention algorithm, and the resulting candidate objects of interest are tracked across frames using linear Kalman filters. Objects that can be tracked successfully over several frames are labeled as potentially "interesting" and marked in the video. By flagging these candidates, the system is intended to enhance the productivity of human video annotators and/or to cue a subsequent object classification module.
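To make the tracking step concrete, the sketch below shows a minimal constant-velocity linear Kalman filter following a single candidate detection across frames and flagging the track as "interesting" once it persists for several frames. The motion model, noise covariances, and persistence threshold (MIN_TRACK_LENGTH) are illustrative assumptions, not values from MBARI's implementation.

```python
# Minimal sketch of the tracking stage: a constant-velocity Kalman filter
# follows a candidate detection (x, y) across frames; a track that survives
# several consecutive frames is flagged as "interesting".
# All parameters below are assumed for illustration only.
import numpy as np

DT = 1.0                 # time step between frames (frame units)
MIN_TRACK_LENGTH = 5     # frames a track must persist to be flagged (assumed)

# State: [x, y, vx, vy]; measurement: [x, y]
F = np.array([[1, 0, DT, 0],
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)     # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)      # only position is observed
Q = 0.01 * np.eye(4)     # process noise (assumed)
R = 4.0 * np.eye(2)      # measurement noise in pixels^2 (assumed)

class KalmanTrack:
    def __init__(self, x, y):
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = 10.0 * np.eye(4)   # initial state covariance
        self.length = 1             # number of frames tracked so far

    def predict(self):
        """Propagate the state one frame ahead (used for data association)."""
        self.state = F @ self.state
        self.P = F @ self.P @ F.T + Q
        return self.state[:2]

    def update(self, x, y):
        """Fold a new measurement into the state estimate."""
        z = np.array([x, y])
        innovation = z - H @ self.state
        S = H @ self.P @ H.T + R                # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)     # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ H) @ self.P
        self.length += 1

    @property
    def interesting(self):
        return self.length >= MIN_TRACK_LENGTH

# Example: follow one detection drifting slowly across five frames.
detections = [(100, 200), (103, 202), (106, 205), (109, 207), (112, 210)]
track = KalmanTrack(*detections[0])
for x, y in detections[1:]:
    track.predict()
    track.update(x, y)
print("interesting:", track.interesting)   # True after 5 frames
```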

The continued use of ROVs and future use of Autonomous Underwater Vehicles (AUVs) for QVTs offer the potential for even more data, perhaps many times what we currently collect and analyze. Hence, we see tremendous benefit in automating portions of the analysis. We also see great benefit in automating the analysis of video from fixed ocean observatory cameras, where autonomous response to potential events (e.g., panning and zooming to an event) and automated processing of largely "boring" (event-sparse) video streams from tens, hundreds, or even thousands of network cameras could be key to making those cameras useful, practical scientific instruments.

Above:  This pair of images shows a single frame from an MBARI video taken during a survey of midwater animals. The upper image shows a barely visible siphonophore (middle) and a jelly (lower left). The lower image shows this same frame, in which the computer has picked out these animals from among the marine snow and debris, even though they are barely visible to the untrained eye. Note: These images have been enhanced for publication. The original frame is darker and has even less contrast.

Above: Flow diagram of a typical model for the control of bottom-up attention. This diagram is based on Koch and Ullman's hypothesis that a centralized two-dimensional saliency map can provide an efficient control strategy for the deployment of attention on the basis of bottom-up cues. The input image is decomposed through several pre-attentive feature detection mechanisms (sensitive to color, intensity, etc.), which operate in parallel over the entire visual scene. Neurons in the feature maps then encode spatial contrast in each of those feature channels. In addition, neurons in each feature map spatially compete for salience through long-range connections that extend far beyond the spatial range of the classical receptive field of each neuron (here shown for one channel; the others are similar). After competition, the feature maps are combined into a unique saliency map, which topographically encodes saliency irrespective of the feature channel in which stimuli appeared salient. The saliency map is sequentially scanned by attention through the interplay between a winner-take-all network (which detects the point of highest saliency at any given time) and inhibition of return (which suppresses the last attended location from the saliency map, so that attention can shift to the next most salient location). Top-down attentional bias and training can modulate most stages of this bottom-up model.
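As a rough illustration of this attention loop, the sketch below implements a heavily simplified version in Python: a single intensity channel, difference-of-Gaussians center-surround contrast at a few scales, and a winner-take-all scan with inhibition of return. The pyramid scales, fixation count, and inhibition radius are assumptions chosen for illustration; a full model would add color and orientation channels and the long-range spatial competition described above, and this is not MBARI's or Koch and Ullman's actual implementation.

```python
# Simplified sketch of the bottom-up saliency / winner-take-all /
# inhibition-of-return loop, using only an intensity channel.
# Scales, fixation count, and inhibition radius are illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(img, center_sigma, surround_sigma):
    """Crude center-surround contrast: absolute difference of Gaussians."""
    return np.abs(gaussian_filter(img, center_sigma) -
                  gaussian_filter(img, surround_sigma))

def saliency_map(frame):
    """Combine contrast maps at a few scales into one normalized saliency map."""
    frame = frame.astype(float)
    sal = np.zeros_like(frame)
    for center, surround in [(1, 4), (2, 8), (4, 16)]:   # assumed scales
        sal += center_surround(frame, center, surround)
    return sal / sal.max() if sal.max() > 0 else sal

def scan_attention(sal, n_fixations=3, inhibition_radius=20):
    """Winner-take-all scan of the saliency map with inhibition of return."""
    sal = sal.copy()
    fixations = []
    yy, xx = np.mgrid[0:sal.shape[0], 0:sal.shape[1]]
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(sal), sal.shape)  # winner-take-all
        fixations.append((y, x))
        # Inhibition of return: suppress a disc around the attended location
        # so attention moves on to the next most salient point.
        sal[(yy - y) ** 2 + (xx - x) ** 2 <= inhibition_radius ** 2] = 0.0
    return fixations

# Example: a dark frame with two faint bright blobs (stand-ins for animals).
frame = np.zeros((120, 160))
frame[30:36, 40:46] = 0.3     # faint object 1
frame[80:88, 110:118] = 0.5   # faint object 2
print(scan_attention(saliency_map(frame)))   # attends the blobs in turn
```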

 

Last updated: Feb. 04, 2009