“Those who fail to learn from history are doomed to repeat it”

These words are often attributed to Winston Churchill (although they were probably first written by the philosopher George Santayana), and they have recently taken on new meaning in the context of security analytics.

The most important development in the world of security is the widespread adoption of analytical techniques to distill massive amounts of logs, flows and raw data into a few meaningful and actionable events. Applying machine learning techniques to security has made this both possible and commonplace. It is changing the way we implement security and is clearly the biggest change facing us as security professionals.


What is sometimes forgotten is that machine learning and analytics are more effective with more historical data than with less. This is true for all machine learning techniques, both supervised and unsupervised. Small data sets cause many problems.

Hence the words of Churchill/Santayana or, as Mark Twain is reputed to have said, “history doesn’t repeat itself but it often rhymes”. Simply put, the more history you have, the better and more reliable your analytical results are.
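To make the point concrete, here is a minimal sketch in Python. It is not taken from any particular security product. It assumes a hypothetical metric (daily event counts with a true mean of 100 and a standard deviation of 20, both made-up numbers) and measures how much an estimated "normal" baseline wobbles when it is learned from a week of history versus roughly 13 months.

```python
import random
import statistics


def baseline_spread(history_days, trials=200, seed=7):
    """Return how much the estimated daily mean varies across repeated trials.

    Each trial simulates `history_days` of a hypothetical daily event count
    and estimates the baseline as the mean of that history.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = []
    for _ in range(trials):
        # Assumed (made-up) distribution: true mean 100, std dev 20.
        history = [rng.gauss(100, 20) for _ in range(history_days)]
        estimates.append(statistics.mean(history))
    # Standard deviation of the estimates = uncertainty of the baseline.
    return statistics.stdev(estimates)


short = baseline_spread(history_days=7)    # one week of history
long_ = baseline_spread(history_days=395)  # roughly 13 months of history

print(f"7 days of history:   baseline uncertainty ~{short:.2f}")
print(f"395 days of history: baseline uncertainty ~{long_:.2f}")
```

For independent samples like these, the uncertainty of the estimated mean shrinks roughly with the square root of the amount of history, so the longer window yields a much steadier baseline. Real security data is messier (seasonality, drift, bursts), so treat this only as an illustration of the trend.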

Long-term retention of data has always been a mandate driven by compliance. For example, most companies interpret PCI as requiring data to be retained for 13 months. But this has often been implemented using impractical “frozen archives” – archived data that takes weeks to bring back online, making historical compliance reporting possible but very painful.

Since the quality of the analytics depends so heavily on how much data the algorithms have to work with, the need to retain data online is coming to the forefront. New security retention systems let you kill two birds with one stone: they make long-term retention for compliance easy while helping the machine learning algorithms perform better. As an example, read the developerWorks article on how to enable years of readily accessible online storage of Guardium data: https://www.ibm.com/developerworks/library/se-sonarg-big-data-security-guardium-trs/index.html

 



1 comment on "Machine learning and analytics to find the signal within security noise"

  1. Thanks, Ron. Also, check out “Use balancing to produce more relevant models and data results” at https://www.ibm.com/developerworks/library/ba-1608balancing-spss-modeler-trs/index.html for easy-to-understand examples of how data can be skewed (and how you can fix it).
