IBM Support

Announcing IBM Db2 Big SQL v5.0.3 - Hadoop Dev

Technical Blog Post



Announcing the immediate availability of IBM Db2 Big SQL v5.0.3 – maintenance release 

IBM Db2 Big SQL, an advanced SQL engine on Hadoop, has been keeping pace with the fast-evolving open source ecosystem and supercharging your analytical workloads on data lakes. The core capabilities of Db2 Big SQL focus on data virtualization, SQL compatibility, scalability, performance, and of course enterprise security and governance, making it a desirable query engine for seeking insights from disparate data sources, including Hadoop.

Db2 Big SQL v5.0.3 is now available. It brings enhanced performance and data virtualization capabilities to data lakes, along with ease-of-use and enterprise capabilities. Let’s take a quick look at what’s new in this release:

Data virtualization / query data efficiently from where it resides

Data virtualization is a differentiating capability of Db2 Big SQL. Augment or enrich Hadoop data with other data sources such as RDBMS and NoSQL stores to generate high-quality data for deep insights. Db2 Big SQL evaluates the remote databases’ statistics and vendor capabilities before rewriting the query for the best execution plan. Some highlights in this release are:

  • New data sources: MySQL, PostgreSQL, MariaDB
  • Function mapping pushdown for efficient query processing, which brings back only the relevant result set rather than the whole dataset.
  • Create a local cache of federated data by creating materialized query tables (MQTs) on Hadoop for improved performance.
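
The benefit of function pushdown can be illustrated with a minimal sketch. This is not Db2 Big SQL code: an in-memory SQLite database stands in for a remote data source, and the `orders` table, its columns, and both helper functions are hypothetical. It only shows that evaluating the function and filter at the source transfers fewer rows than fetching the whole table and filtering locally.

```python
import sqlite3


def make_remote_source() -> sqlite3.Connection:
    """Build an in-memory SQLite database standing in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, "acme"), (2, "Globex"), (3, "ACME"), (4, "initech")],
    )
    return conn


def fetch_without_pushdown(conn):
    """Transfer every row, then apply upper() and the filter locally."""
    rows = conn.execute("SELECT id, customer FROM orders ORDER BY id").fetchall()
    matches = [row for row in rows if row[1].upper() == "ACME"]
    return len(rows), matches  # (rows transferred, matching rows)


def fetch_with_pushdown(conn):
    """Let the source evaluate UPPER() and the filter; only matches are transferred."""
    rows = conn.execute(
        "SELECT id, customer FROM orders WHERE UPPER(customer) = ? ORDER BY id",
        ("ACME",),
    ).fetchall()
    return len(rows), rows  # (rows transferred, matching rows)


if __name__ == "__main__":
    source = make_remote_source()
    print("without pushdown:", fetch_without_pushdown(source))
    print("with pushdown:   ", fetch_with_pushdown(source))
```

Both functions return the same matching rows. The difference is the number of rows that leave the source, which is what matters when the source is remote and the table is large.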

Performance

Good performance is pivotal for any enterprise workload or application. Db2 Big SQL brings an advanced SQL engine to Hadoop that executes queries efficiently, even very complex or tool-generated ones, without hitting out-of-memory errors. Performance improvements to the query engine are ongoing. Some highlights are:

  • Advanced I/O engine for all open source file formats 

Db2 Big SQL now has one Java I/O engine for all open source file formats to provide better out-of-the-box performance and improved resource utilization. Other benefits include:

      • Enhanced Db2 Big SQL architecture
      • Improved performance on all tables/file types
      • Better interoperability with open source

  • New tool to optimize performance by inspecting file layout

Query performance can suffer when the file layout is fragmented into many small files. The tool detects this condition and recommends corrective action to optimize file size and layout. For more information, see Detection and compaction of small files in HDFS.
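
As a rough illustration of what detecting this condition can involve, here is a minimal sketch. It is not the Db2 Big SQL tool. The function name `inspect_layout`, the `LayoutReport` fields, and every threshold are hypothetical, and the 128 MiB target is an assumption based on the common HDFS default block size. Given the sizes of the files behind a table, it flags the layout as fragmented when most files are well below the target size and the data would fit in fewer files.

```python
import math
from dataclasses import dataclass

BLOCK_BYTES = 128 * 1024 * 1024  # assumed target file size (128 MiB)


@dataclass
class LayoutReport:
    file_count: int
    small_file_count: int
    recommended_file_count: int
    needs_compaction: bool


def inspect_layout(file_sizes, target_bytes=BLOCK_BYTES,
                   small_fraction=0.5, max_small_ratio=0.5):
    """Flag a table directory as fragmented when most of its files are small.

    All thresholds are illustrative assumptions, not values used by Db2 Big SQL.
    """
    sizes = list(file_sizes)
    if not sizes:
        return LayoutReport(0, 0, 0, False)
    # A file counts as "small" when it is below a fraction of the target size.
    small = sum(1 for size in sizes if size < target_bytes * small_fraction)
    # Number of files the same data would occupy at the target size.
    recommended = max(1, math.ceil(sum(sizes) / target_bytes))
    fragmented = (small / len(sizes) > max_small_ratio
                  and recommended < len(sizes))
    return LayoutReport(len(sizes), small, recommended, fragmented)


if __name__ == "__main__":
    mib = 1024 * 1024
    # One hundred 1 MiB files plus a single full-size file.
    print(inspect_layout([mib] * 100 + [128 * mib]))
```

In practice the file sizes would come from a listing of the table's HDFS directory, and the corrective action would be to rewrite the data into roughly `recommended_file_count` files.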

Ease of use

Ease of use means providing capabilities that are efficient, effective, engaging, error tolerant, and easy to learn. One focus of this release is simplifying the install, upgrade, and patch management process so users can easily benefit from the latest code.

  • Simplified install, upgrade and patch management experience

The Hadoop ecosystem is very complex, and installing Db2 Big SQL has brought some challenges. As the open source community works on making interoperability between components simpler and easier, we are making strides in improving the install and upgrade experience as well. Some highlights are:

  • Flexible deployments for highly customized enterprise environments
  • Fast patch management that reduces downtime for upgrades

Enterprise readiness

Being enterprise ready means being secure, scalable, stable and easy to run in production. Some of the new capabilities added in 5.0.3 are:

  • Query transactional Hive tables from Db2 Big SQL

When Hive tables are enabled with transactional support, you can now query those tables from Db2 Big SQL by federating to the Hive server. These transactional Hive tables are great for scenarios where we need slowly changing dimension tables.

  • Enhanced usability and serviceability for security and governance of data
  • Integration with IBM Information Governance Catalog (IGC) brings comprehensive and centralized governance to Hadoop. Having Db2 Big SQL’s metadata in Metadata Asset Manager allows central monitoring and management of data lineage, which supports lineage reports, impact analysis reports, and so on.
  • Db2 Big SQL’s Ranger plugin allows resource-based access policies to be defined, and the resource lookup makes it easy to see which policies are defined for which objects. There is now an easy way to set up Ranger policies in Db2 Big SQL when they are already defined for Hive in Ranger.

