Oregon State University’s Center for Genome Research and Biocomputing (CGRB) and the Plankton Ecology Lab at OSU Hatfield have been collaborating on an image processing pipeline to automate the classification of in situ images of plankton: microscopic organisms at the base of the food web in the world’s oceans and freshwater ecosystems. The imagery collection from a 10-day cruise typically contains approximately 80 TB of video, which in some cases converts into image data yielding several billion segments representing individual plankton and particles that need to be identified; a near-impossible task for human experts to carry out manually. While we have a fully functional Convolutional Neural Net (CNN) algorithm that does an excellent job of predicting the identity of the plankton organisms or particles, we have been limited by GPU computational capabilities. We started working with PCI bus-based Tesla K40 and K80 GPUs, which were good enough to manage millions of segments. However, when it came to billions of segments, the task became nearly insurmountable.

37 days down to 10

Processing a single batch of video files (composed of 344,000 individual frames, each yielding hundreds to thousands of individual plankton segments) on a PCI-based Tesla K80 could take upwards of 37 days – a prohibitive turnaround given the amount of data needing to be analyzed and the time constraints of our research. We were provided access to IBM Minsky servers through the Oregon State CGRB and OSUOSL OpenPOWER GPU Development Servers. These servers were configured with 160 hardware threads and 2x Tesla P100 GPUs socketed on the motherboard and attached to the system bus via NVLink. These new machines transformed our workflow, cutting processing time from 37 days to around 10. This was a massive speed-up that gave us a single machine capable of both segmenting the video files and running GPU classification with massive throughput on large data sets.
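The segmentation step described above – extracting individual plankton and particle regions from each video frame – can be illustrated with a minimal sketch. The pipeline’s actual segmentation code is not shown in this post; the function and threshold below are hypothetical, using a simple intensity threshold and 4-connected flood fill on a small grayscale frame to show the idea:

```python
from collections import deque

def segment_frame(frame, threshold):
    """Label connected bright regions in a grayscale frame.

    frame: 2D list of pixel intensities. Returns a list of segments,
    each a list of (row, col) pixel coordinates. Illustrative only;
    the production pipeline processes full-resolution video frames.
    """
    rows, cols = len(frame), len(frame[0])
    seen = [[False] * cols for _ in range(rows)]
    segments = []
    for r in range(rows):
        for c in range(cols):
            if frame[r][c] >= threshold and not seen[r][c]:
                # Flood-fill one connected component (4-connectivity).
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and frame[ny][nx] >= threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                segments.append(pixels)
    return segments
```

Each segment found this way would then be cropped and handed to the CNN classifier; a frame with two bright blobs yields two segments, which is how hundreds to thousands of candidates can come out of a single frame.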

10 days down to 5

As we continue forward, IBM and OpenPOWER have again provided us access to the new IBM Newell Power9 servers with 4x Tesla V100 GPUs. The V100s represent another large step forward from the P100s. With four NVLink-attached GPUs on the system bus, we have been able to process similarly sized batches in a matter of 5 days, which is beginning to expedite the availability of data to address important questions in biological oceanography. These new machines allow us to start looking at real-time processing of data aboard ship, reducing labor and expanding the scope of work we can address. While we are pushing the boundaries of computational biological oceanography, the availability of such fast processing machines has put us closer to our ultimate goal: to automate the classification of underwater images of plankton in near real time.
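The throughput gain from four GPUs comes from data parallelism: a batch of segments is split across devices and classified concurrently. The sketch below is a hypothetical illustration of that partitioning idea only – `classify_chunk` stands in for real CNN inference, and threads stand in for the NVLink-attached V100s:

```python
from concurrent.futures import ThreadPoolExecutor

def classify_chunk(gpu_id, chunk):
    """Placeholder for CNN inference on one GPU.

    A real implementation would move `chunk` to device `gpu_id` and
    run the trained classifier; here each segment is just tagged with
    the device that would have handled it.
    """
    return [(gpu_id, seg) for seg in chunk]

def classify_batch(segments, n_gpus=4):
    """Split a batch of segments evenly across n_gpus workers."""
    chunks = [segments[i::n_gpus] for i in range(n_gpus)]
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        results = pool.map(classify_chunk, range(n_gpus), chunks)
    # Flatten the per-device results back into one list.
    return [item for part in results for item in part]
```

With ideal scaling, quadrupling the devices divides wall-clock time by roughly four, which is consistent with the drop from the two-GPU P100 configuration to the four-GPU V100 one described above.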

Read more about the project at Assessing Ocean Health at a Massive Speed & Scale

Christian Briseno-Avena
Postdoctoral Researcher
Hatfield Marine Science Center
Oregon State University

Robert K. Cowen
Hatfield Marine Science Center
Oregon State University

Christopher M. Sullivan
Assistant Director for Biocomputing
Center for Genome Research and Biocomputing
Oregon State University
