Announcing the immediate availability of IBM Db2 Big SQL v5.0.4

IBM Db2 Big SQL, an advanced SQL engine on Hadoop, has been keeping pace with the fast-evolving open source ecosystem by supercharging your analytical workloads on data lakes. Its core capabilities focus on data virtualization, SQL compatibility, scalability, performance, and enterprise security and governance, making it a strong query engine for drawing insights from disparate data sources, including Hadoop.

Db2 Big SQL v5.0.4 is now available. It introduces automatic failover for high availability (HA) and performance enhancements for data lakes, along with ease-of-use and enterprise features. Let’s take a quick look at what’s new in this release:

 

Enterprise readiness

Being enterprise ready means being secure, scalable, stable and easy to run in production. Some of the new capabilities added in 5.0.4 are:

  • New ZooKeeper-based solution for automatic failover for HA

High availability is critical for any enterprise. In particular, failover from the primary to the secondary system during an outage needs to be automatic and seamless, with no disruption or manual intervention. In 5.0.4, Db2 Big SQL introduces a new failover solution based on ZooKeeper, which keeps it consistent with other Hadoop components.

  • Create and manage access policies in Apache Ranger for federated sources

Previously, Apache Ranger policies for Db2 Big SQL could provide access control only for Hadoop objects. Now you can also set Ranger policies for federated sources, which makes Apache Ranger a truly centralized access policy manager for Hadoop.

 

Performance

Every enterprise looks for high performance from its workloads and applications. Db2 Big SQL brings an advanced SQL engine to Hadoop that executes queries efficiently, even very complex or tool-generated ones, without hitting out-of-memory errors. Performance continues to improve with each release; here are some of the highlights:

  • Better reader performance on sorted tables

When scanning ORC and Parquet tables, the Db2 Big SQL readers apply query predicates to skip reading portions of files. When tables are sorted on the columns used in query predicates, table scan times can be significantly reduced. Testing shows 4-6x speedups in the top 10 TPC-DS queries.
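To illustrate why sorting helps, here is a minimal Python sketch of statistics-based skipping. It is not the Db2 Big SQL reader; it only assumes that each chunk of a file carries min/max statistics for a column, as ORC and Parquet do, and that a chunk can be skipped when those statistics rule out a match. The names `build_row_groups` and `scan_equal` are hypothetical.

```python
import random

def build_row_groups(values, group_size):
    """Split values into row groups and record min/max statistics for each."""
    groups = []
    for i in range(0, len(values), group_size):
        chunk = values[i:i + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups

def scan_equal(groups, target):
    """Evaluate `col = target`; return (matching rows, number of groups read)."""
    matches, groups_read = [], 0
    for g in groups:
        if target < g["min"] or target > g["max"]:
            continue  # statistics prove no row can match, so skip the group
        groups_read += 1
        matches.extend(v for v in g["rows"] if v == target)
    return matches, groups_read

random.seed(0)
data = [random.randrange(1000) for _ in range(10_000)]

# Same data, same group size; only the physical order differs.
unsorted_groups = build_row_groups(data, 500)
sorted_groups = build_row_groups(sorted(data), 500)

unsorted_matches, unsorted_read = scan_equal(unsorted_groups, 500)
sorted_matches, sorted_read = scan_equal(sorted_groups, 500)
print("groups read (unsorted):", unsorted_read)
print("groups read (sorted):  ", sorted_read)
```

On unsorted data nearly every group spans the full value range, so few groups can be skipped. After sorting, each group covers a narrow range and the predicate touches only the groups that can actually contain the value.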

  • Join Range Filter Predicate (JRFP) is ON by default

JRFP has been available since v4.2, but it had limitations. In this release, the capability is enhanced to generate JRFP pushdowns for as many joins as possible. It is turned on by default, and the SQL optimizer pushes the predicate down when appropriate. Significant performance improvements were observed, especially for star schema queries, because JRFP skips reading a large portion of fact table rows and so reduces overall query processing time. Performance testing shows 3-6x speedups for the top 10 most improved TPC-DS queries.
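The general idea can be sketched in Python. This is an assumption-laden illustration based on the feature's name, not Db2 Big SQL's implementation: it assumes the filtered dimension side of a join yields a min/max range of join keys, and that this range is applied to fact rows before the join probe. The function `join_with_range_filter` and the sample tables are hypothetical.

```python
def join_with_range_filter(dim_rows, fact_rows, dim_filter):
    """Hash join that derives a key range from the filtered dimension side
    and uses it to discard fact rows before probing."""
    # Build side: keep only dimension rows that pass the filter.
    build = {d["key"]: d for d in dim_rows if dim_filter(d)}
    if not build:
        return [], 0
    lo, hi = min(build), max(build)  # derived range predicate on the join key

    joined, probed = [], 0
    for f in fact_rows:
        if not (lo <= f["dim_key"] <= hi):
            continue  # outside the derived range: cannot join, so skip it
        probed += 1
        d = build.get(f["dim_key"])
        if d is not None:
            joined.append((f["amount"], d["name"]))
    return joined, probed

# Hypothetical star schema: a date dimension and a fact table referencing it.
dims = [{"key": k, "name": f"day-{k}", "year": 2000 + k // 365} for k in range(1095)]
facts = [{"dim_key": k % 1095, "amount": k} for k in range(5000)]

result, probed = join_with_range_filter(dims, facts, lambda d: d["year"] == 2001)
print("fact rows probed:", probed, "of", len(facts))
```

In this sketch the range check runs row by row. In a real engine, a range predicate like this could be handed to the file reader, where it would combine with min/max statistics to skip whole portions of the fact table.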

 

Ease of use

Ease of use means capabilities that are efficient, effective, engaging, error tolerant, and easy to learn. One focus of this release is simplifying YARN configuration for Db2 Big SQL. Here are the highlights:

  • YARN is widely used to centrally manage resources for the services that run in a Hadoop cluster, but getting the configuration right is always a challenge. This release makes several enhancements for better control over YARN:
    • Users are notified through the Ambari UI when disk failures are detected on any node, head or worker
    • When a database or disk is damaged, YARN is prevented from starting that node, so no queries are sent to it
  • HBase is no longer mandatory when installing Db2 Big SQL
  • Db2 Big SQL can be installed on heterogeneous clusters, i.e., clusters with mixed hardware and OS versions
  • Db2 Big SQL now supports very wide tables, with up to 2048 columns
  • The product documentation has an all-new look and feel to help you find the details you are looking for or solutions to the problems you want to solve. Developer-written best practices are now part of the documentation, offering quick tips and tricks.

 

