The Center for Genome Research and Biocomputing (CGRB) at Oregon State University works closely with its research groups to embrace new technologies. Our researchers look to the CGRB to lower the activation energy of adopting these technologies while increasing the scope of their work and removing bias associated with computational pathways. Over the last two years, our groups have been moving to machines with higher thread counts and lower latency so they can interact with research data in new ways. Because of this move, the CGRB started looking at IBM POWER and OpenPOWER hardware to fill this role, as the hardware had recently moved to a little-endian byte order. Our groups were also looking into new processing methods built on new GPU technologies, and the new POWER machines seemed to be out in front on this end as well.
To ensure that our researchers can take full advantage of the computational resources within the CGRB, our computational staff help compile and manage software packages and libraries. This lets our researchers focus their time and energy on research questions rather than on making tools and software work on the computing hardware. As we moved onto the POWER architecture, we were impressed with how easy it was to compile and work with tools that were originally developed on x86-based hardware. This was largely because of the major change to little endian, which changed the way we were able to use the new hardware in front of us.
The main trick we used to ensure software would compile easily was to go back in time and use the Autoconf tool the way it was originally designed: run the “autoconf” command first, which regenerates the “configure” file on the target machine so that the build is set up for that architecture. This is what we did back in the 1990s, when we had many different flavors of Unix and needed a way to compile the same tool on different architectures. Once we realized how easy it was, we had an undergraduate student work solely on compiling the tools from our x86 machines on the POWER hardware. This worked very well: the student compiled over 2000 programs within a short time, and our groups were quickly and easily able to start using the POWER hardware. All ported tools now have build scripts on GitHub at http://github.com/ppc64le/build-scripts.
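To make the byte-order point concrete, here is a minimal Python sketch of why matching endianness matters when moving tools between architectures. It is only an illustration, not part of our porting workflow; the helper names `pack_u32` and `read_native_u32` are made up for this example.

```python
import struct
import sys


def pack_u32(value: int, byte_order: str) -> bytes:
    """Pack a 32-bit unsigned integer as 'little' or 'big' endian bytes."""
    fmt = {"little": "<I", "big": ">I"}[byte_order]
    return struct.pack(fmt, value)


def read_native_u32(raw: bytes) -> int:
    """Interpret 4 raw bytes in the host's native byte order."""
    return struct.unpack("=I", raw)[0]


if __name__ == "__main__":
    value = 0x0A0B0C0D
    little = pack_u32(value, "little")
    big = pack_u32(value, "big")

    print("host byte order:", sys.byteorder)
    print("little-endian bytes:", little.hex())
    print("big-endian bytes:   ", big.hex())

    # A binary file written on a little-endian x86 machine reads back
    # unchanged only on a host that is also little endian.
    print("x86-style bytes read correctly here:",
          read_native_u32(little) == value)
```

On x86 and on little-endian POWER (ppc64le) the last line should report `True`, which is one reason code that quietly assumes x86 byte order tends to keep working after the move.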
So Easy Students Can Do It:
After my student had ported so many tools, we quickly compiled the binaries needed to interact with our computational cluster and created environment settings that dynamically detect the architecture, so our users can take advantage of this hardware without even knowing they are using it. Unfortunately for that plan, many tools finished faster on the new POWER hardware, and users quickly figured out how to schedule jobs directly onto the POWER machines. This led the CGRB research groups to start purchasing new POWER machines for some of our biggest workloads, taking advantage of features like increased threading, CAPI cards (GZIP and NVMe) and NVIDIA GPGPUs directly on the motherboard.
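The idea behind architecture-aware environment settings can be sketched in a few lines of Python. This is a hypothetical illustration rather than the CGRB's actual configuration: the directory layout under `/opt/tools` and the helper names are assumptions made for the example.

```python
import os
import platform
from typing import Optional

# Hypothetical layout: one directory of compiled tools per architecture.
ARCH_BIN_DIRS = {
    "x86_64": "/opt/tools/x86_64/bin",
    "ppc64le": "/opt/tools/ppc64le/bin",
}


def bin_dir_for(machine: str) -> Optional[str]:
    """Return the tool directory for a machine type, or None if unknown."""
    return ARCH_BIN_DIRS.get(machine)


def prepend_to_path(path_value: str, directory: str) -> str:
    """Put directory at the front of a PATH-style string, without duplicates."""
    parts = [p for p in path_value.split(os.pathsep) if p and p != directory]
    return os.pathsep.join([directory] + parts)


if __name__ == "__main__":
    machine = platform.machine()  # "ppc64le" on little-endian POWER Linux
    tool_dir = bin_dir_for(machine)
    if tool_dir is None:
        print(f"no architecture-specific tools configured for {machine}")
    else:
        os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), tool_dir)
        print(f"{machine}: using tools from {tool_dir}")
```

With a setup along these lines, the same command name resolves to the right binary on either architecture, so a job script does not need to know where it lands.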
Since our groups gained access to these machines, many of the limits we had run into with deep-learning and AI processing have disappeared, which has changed the scope of work we can take on. For example, one group processes 100TB of data through CPU and GPU after each week at sea, mining plankton information to determine ocean health. Another project generates 250TB of data for every 3 weeks of sampling in order to identify and quantify owl sounds in the forest. Both research projects use the CPU threading capabilities of the POWER machines together with the new NVIDIA GPGPUs. The way GPGPU hardware is integrated on the POWER machines has changed how we work with real data and GPU hardware.
Pitfalls and Future Direction:
There are of course a few tools that will not compile, and we found that these were generally tied to SSE, SSE2 and SSE3 code that some developers use to access Intel-specific memory libraries. There are now new tools from the IBM development group that help developers swap out these libraries for POWER-based hardware and make the code work on both architectures. For developers who want access to POWER-based hardware for development or porting, the CGRB has worked with the OSUOSL (https://osuosl.org) to provide free access to these new machines with GPGPU and NVMe technologies (see link below). My student was offered an internship with IBM and worked over a summer to create build scripts for these tools, which are available on the GitHub site. In the end, moving onto the POWER hardware revolutionized our forward computational pathway and changed the way we approach our research questions by removing limits and extending our reach. I recently gave a talk on this process at the OpenPOWER Summit, and you can find a recording of the talk in the links below.
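To gauge up front whether a tool is likely to hit the SSE problem described above, one option is to scan its source tree for the standard x86 intrinsic headers (`xmmintrin.h` for SSE, `emmintrin.h` for SSE2, `pmmintrin.h` for SSE3). The following Python sketch is a hypothetical helper written for this post, not one of the IBM tools mentioned; the function names and the list of file extensions are assumptions.

```python
import re
from pathlib import Path

# Standard x86 intrinsic headers and the SSE level each one provides.
SSE_HEADERS = {
    "xmmintrin.h": "SSE",
    "emmintrin.h": "SSE2",
    "pmmintrin.h": "SSE3",
}
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
SOURCE_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".hpp"}


def find_sse_includes(source: str) -> list[str]:
    """Return the SSE levels included by a piece of C/C++ source, in order."""
    found: list[str] = []
    for header in INCLUDE_RE.findall(source):
        name = header.rsplit("/", 1)[-1]
        level = SSE_HEADERS.get(name)
        if level is not None and level not in found:
            found.append(level)
    return found


def scan_tree(root: str) -> dict[str, list[str]]:
    """Map each source file under root to the SSE levels it includes."""
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            levels = find_sse_includes(path.read_text(errors="ignore"))
            if levels:
                hits[str(path)] = levels
    return hits


if __name__ == "__main__":
    for filename, levels in sorted(scan_tree(".").items()):
        print(f"{filename}: {', '.join(levels)}")
```

An empty report does not guarantee a clean build, since a tool can also use inline assembly or compiler flags specific to x86, but files that do show up are good candidates for a closer look before porting.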
Free Access to Development Environment:
The CGRB has worked with IBM and OpenPOWER to provide free access to POWER8/9 machines with GPU access. These machines will allow developers to gain access and work on porting tools to these incredible architectures. To get free access to these resources, simply click on the link below.
OpenPOWER GPU Free Development Access:
Github Site for PPC64LE build scripts:
OpenPOWER Summit 2018 Keynote Talk:
IBM Edge 2016 Conference Interview:
CGRB Test the CAPI Gzip Card:
Christopher M. Sullivan
Assistant Director for Biocomputing
Center for Genome Research and Biocomputing
Oregon State University