Have you ever wondered whether there is other intelligent life in the Universe? The Search for Extraterrestrial Intelligence (SETI) Institute has been working on this question for years, listening for signs of life in outer space. This endeavor entails data collection and analysis on an enormous scale. The Allen Telescope Array is a set of 42 radio telescopes perched in the Cascade Mountains at the Hat Creek Observatory. Each one is trained on a different part of the sky, collecting signals that together produce a data stream of more than 60 gigabits per second.


Allen Telescope Array

SETI collects all this data hoping to find a signal that stands out, or a set of signals that form a pattern distinct from the usual background. Upon discovery of such an anomaly, they divide the data into smaller chunks and take a closer look at known activities and events at the time the signal was detected. Of course, this process requires understanding the nature of all signals over time. Today, their analysis tools can only examine patterns within a short time frame; the goal is to analyze patterns spanning multiple years.
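To make the "signal that stands out" idea concrete, here is a minimal sketch of one common approach: flag time windows whose mean power rises well above the overall background. The function name, window size, and threshold are illustrative assumptions, not SETI's actual pipeline.

```python
# Hypothetical sketch: flag windows of a power time series whose
# mean power exceeds the overall mean by `threshold` standard
# deviations. This is a generic outlier check, not SETI's method.
import statistics

def flag_anomalies(powers, window=4, threshold=3.0):
    """Return start indices of windows whose mean power is an outlier."""
    mean = statistics.mean(powers)
    stdev = statistics.pstdev(powers)
    cutoff = mean + threshold * stdev
    flagged = []
    for start in range(len(powers) - window + 1):
        chunk = powers[start:start + window]
        if statistics.mean(chunk) > cutoff:
            flagged.append(start)
    return flagged
```

A real pipeline would of course work in the frequency domain and account for drifting signals, but the core step, comparing local statistics against a long-baseline background, is the same reason SETI wants analysis over multiple years rather than a short time frame.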

How can SETI process and analyze this huge amount of data? You may remember that SETI used to leverage a volunteer army of personal computers to crunch data in their spare time. Now IBM is working to help the project take advantage of the latest data processing technologies. This job requires sophisticated modeling to distinguish between different types of signals, and machine learning algorithms that can weed out man-made signal interference. Apache® Spark™ is the tool we’ve chosen. This open-source cluster-computing framework offers in-memory processing, which enables analytic applications to run up to 100 times faster than other technologies on the market today.
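The shape of such a job is a classic map-and-filter pipeline over signal chunks: transform each chunk into features, then drop anything that matches known interference. In Spark this would run as distributed transformations over partitioned data; the plain-Python sketch below mimics that structure so it stays self-contained. The field names, the interference list, and the `looks_artificial` test are all invented for illustration.

```python
# Plain-Python stand-in for a Spark-style map/filter pipeline.
# In Spark this logic would be expressed as transformations on a
# distributed dataset; here an ordinary list comprehension plays
# that role. All thresholds and frequencies are hypothetical.

KNOWN_INTERFERENCE_HZ = {1420.005e6}  # made-up list of known RFI carriers

def looks_artificial(chunk):
    """Placeholder classifier: narrowband signal not on a known RFI carrier."""
    return (chunk["bandwidth_hz"] < 10.0
            and chunk["freq_hz"] not in KNOWN_INTERFERENCE_HZ)

def candidates(chunks):
    # "map" step omitted for brevity; "filter" step keeps only
    # chunks that survive the interference check.
    return [c for c in chunks if looks_artificial(c)]
```

The appeal of Spark here is that the same filter logic scales from a laptop test to the full telescope data stream, with intermediate results held in memory across iterative passes rather than re-read from disk.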

“With Spark as a Service on Bluemix, we’ll be able to work with IBM to develop promising new ways to analyze signal data as we hunt for evidence of intelligence elsewhere in the cosmos. This is an exciting example of synergy in the service of science.”

We’re currently establishing a proof-of-concept, performing computations on a large number of signals at high speed using IBM Analytics for Apache Spark. For in-depth details on the type of signals and the nature of computations, read Types of Big Data from the Allen Telescope Array.

To learn more about our work, listen to A Journey Through Space with IBM Analytics for Apache Spark a DBTA roundtable webcast. (Follow the link and register to listen to the recording.) As things progress, I’ll be posting more updates about this project. Stay tuned.

“Apache,” “Spark,” and “Apache Spark” are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
