Oceangoing platforms are integrating high-resolution, multi-camera video feeds for scientific observation and navigation, producing a deluge of visual data. The volume and rate of this data collection can rapidly outpace researchers’ ability to process and analyze it. To manage this deluge, MBARI and partners at CVision AI and MIT’s Media Lab are building FathomNet (Fig. 1), a publicly available database that makes use of existing and future expertly curated data from a number of sources, including MBARI’s Video Annotation and Reference System (VARS). FathomNet will provide much-needed training data (e.g., annotated and localized imagery) for developing machine learning algorithms that will enable fast, sophisticated analysis of visual data. Together, these tools will allow us to better understand our ocean and its inhabitants, facilitating effective and responsible marine stewardship.
Figure 1 – Diagram illustrating the bottleneck that FathomNet aims to address: the shortage of localized images, which is preventing the advancement of machine learning at MBARI.
FathomNet will facilitate the development of novel techniques and technologies that allow researchers to mine existing MBARI video and image collections in ways not previously possible. By leveraging a fully digitized archive of nearly 26,000 hours of video, we will be able to refine and rediscover taxa, patterns, and associations in existing data that could not be detected with manual analysis techniques. MBARI’s VARS database contains nearly 6.5 million observations, 1 million images, and over 4,600 concepts (or classes). Although only a limited subset of this data (~6% of the available framegrabs; Fig. 2) is currently incorporated into FathomNet, the potential for data augmentation and further refinement is promising. We envision developing algorithms to mine existing MBARI video observations from the past 30 years of work in the deep sea. These developments will also speed up future annotation by automating the identification of common or prioritized taxa and features, while at the same time enabling novel analyses of, for example, temporal changes and species interactions.
Figure 2 – MBARI’s Video Annotation and Reference System (VARS) contains more than 6M annotations, 1M framegrabs, and 4k classes in the knowledgebase; FathomNet currently contains 60k images of midwater and benthic classes (to the genus level), approximately 6% of the potential framegrabs within VARS. A smaller subset of the images currently in FathomNet has more than 23k bounding boxes and labels spanning 198 taxonomic classes.
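To make the figures above more concrete, here is a minimal Python sketch of what a localized, labeled image record might look like, together with the coverage arithmetic (60k of ~1M framegrabs, about 6%). The `Localization` record, its fields, and the helper functions are hypothetical illustrations, not the actual FathomNet or VARS schema; the counts are the rounded numbers quoted in the Fig. 2 caption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Localization:
    """Hypothetical record: one bounding box plus a taxonomic label.

    This is NOT the FathomNet/VARS schema, only an illustration.
    """
    image_id: str
    concept: str  # taxonomic label, e.g. a genus name
    x: int        # left edge of the box, in pixels
    y: int        # top edge of the box, in pixels
    width: int    # box width, in pixels
    height: int   # box height, in pixels


def coverage(images_in_fathomnet: int, framegrabs_in_vars: int) -> float:
    """Fraction of VARS framegrabs currently represented in FathomNet."""
    return images_in_fathomnet / framegrabs_in_vars


def distinct_concepts(localizations: Iterable[Localization]) -> int:
    """Number of distinct classes spanned by a set of bounding boxes."""
    return len({loc.concept for loc in localizations})


# Rounded counts from the Fig. 2 caption.
frac = coverage(60_000, 1_000_000)
print(f"{frac:.0%}")  # prints 6%
```

Counting distinct concepts over all boxes is how one would arrive at a figure like "198 taxonomic classes" from a collection of labeled boxes.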
We are implementing a machine learning workflow (Fig. 3) that uses VARS imagery and data to generate image training sets for taxa and habitats of interest. The localized VARS data and imagery are being curated and validated by MBARI deep-sea biology experts; this is the first step in developing machine learning algorithms that can serve a variety of purposes. These expertly curated images will be available for model training by MBARI staff and external partners through the FathomNet database and repository. In addition, pre-trained models that can be used to analyze other image collections (e.g., imagery from other MBARI platforms or from external partners such as NOAA or the National Geographic Society; Fig. 4) will be made available in an ML Model Zoo. Working closely with MBARI scientists and collaborators, we have compiled a priority list of midwater and benthic taxa that captures abundant or ecologically important organisms from habitats that also have detailed quantitative transects (e.g., Midwater 1, Smooth Ridge, Station M, and Sur Ridge in the Monterey Bay National Marine Sanctuary).
Figure 3 – Diagram illustrating the FathomNet workflow. The blue “human-in-the-loop” circle represents the step with the highest labor risk: expert annotator time for localization and ML proposal validation.
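The human-in-the-loop step in Fig. 3 can be sketched as a review round in which model proposals are filtered and then accepted or rejected by an expert. The sketch below is a hypothetical illustration written in Python, not FathomNet’s implementation: the `Proposal` type, the `review_round` function, the `expert_accepts` callback (standing in for an annotator’s decision), and the `min_score` cutoff are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Proposal:
    """Hypothetical ML proposal: a box and label suggested by a model."""
    image_id: str
    concept: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    score: float                    # model confidence in [0, 1]


def review_round(
    proposals: Iterable[Proposal],
    expert_accepts: Callable[[Proposal], bool],
    min_score: float = 0.5,
) -> Tuple[List[Proposal], List[Proposal]]:
    """Send confident proposals to an expert; return (accepted, rejected).

    Proposals scoring below min_score are skipped before review, which is
    one (assumed) way to spend less of the annotators' limited time.
    """
    accepted: List[Proposal] = []
    rejected: List[Proposal] = []
    for p in proposals:
        if p.score < min_score:
            continue  # never shown to the expert
        if expert_accepts(p):
            accepted.append(p)
        else:
            rejected.append(p)
    return accepted, rejected
```

In such a loop, accepted proposals would be added to the training set and the model retrained, so each round should produce better proposals and require less expert correction.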
Ultimately, FathomNet will yield an underwater image training set and machine learning algorithms that scientists can use to address diverse questions in the ocean sciences. FathomNet will be expertly curated, with hierarchical labels and localizations to the genus and species level, rivaling existing image training sets for terrestrial applications. By leveraging external partnerships already established as part of the Big Ocean, Big Data effort (e.g., MBNMS, NGS, NOAA, OET, MIT Media Lab, CVision AI) and disseminating the data via a publicly accessible platform, FathomNet will significantly impact the academic, non-profit, and state/federal ocean management sectors. The public FathomNet database will also incorporate image and localization data from additional partner sources such as NOAA and the National Geographic Society.
Figure 4 – (Top Panel) Using MBARI’s underwater imagery in FathomNet, a machine learning object detection and classification algorithm correctly identifies morphotype classes. Subsequent few-shot learning models for each morphotype should enable classification of objects in imagery to the genus level. (Bottom Panel) GradCAM++ results from (left) NOAA’s ROV Okeanos and (right) National Geographic Society’s DropCam footage. For both images, the machine learning model was trained on MBARI’s benthic imagery in FathomNet. GradCAM++ outputs saliency maps (left) for the class Sebastolobus, correctly detecting multiple Sebastolobus individuals in the footage, and (right) for the class Merluccius, detecting a similar-looking class of animals in the footage.
• Principal investigator: Kakani Katija
• Project manager: Lonny Lundsten
• Lead engineer: Brian Schlining
• Image analysts: Lonny Lundsten (primary), Giovanna Sainz (primary), Kyra Schlining, Kris Walz, Megan Bassett, Larissa Lemon
• External partners: Katy Croff Bell (MIT Media Lab), Ben Woodward (CVision AI)