IBM Support

Announcing IBM Db2 Big SQL v5.0.4 on Cloudera's CDH v5.x Platform - Hadoop Dev

Technical Blog Post



Announcing the immediate availability of Db2 Big SQL on the Cloudera platform.

 
Extending its support for Hadoop platforms, Db2 Big SQL now supports Cloudera's Distribution including Apache Hadoop (CDH) v5.x, in addition to Hortonworks Data Platform (HDP) v2.6 and v3.1.
 
Db2 Big SQL is an advanced SQL engine optimized for advanced analytics in a Big Data environment. IBM brings the power of industry-leading Db2 to open source technologies for machine learning, interactive, and batch analytics use cases. With Big Data, there is an overwhelming amount of data, but unless you plan and execute the search for meaning in that data, it is impossible to reap the benefits of modernization.
 
To analyze this data, Db2 Big SQL offers enterprise SQL engine capabilities such as ANSI SQL compliance, an advanced cost-based optimizer, optimized query rewrite, federation, high performance, high concurrency, data security, automatic workload management, automatic memory management, application portability, and more. Some key areas where Db2 Big SQL accelerates the digital transformation journey are discussed here:

Advanced data fabric for analytics

Digital transformation is about deriving new business value from digitizing and optimizing operations, rather than finding a needle in a haystack. This involves unlocking hidden silos of data that were never available for analysis before, which gives businesses better insight for decision-making and product innovation. With the advanced data fabric of the Cloudera platform and Db2 Big SQL, it becomes easier to manage large-scale clusters within complex data environments and to achieve the business goals of improving predictions and finding new opportunities. Some highlights that enable this are:
  • Query data where it resides, including Hive, HBase, traditional RDBMS and NoSQL databases
  • Smart federation enables enhanced data virtualization with various data warehouses and also provides access to S3 object storage
  • A single database connection enables data access across Hadoop and other sources, whether on cloud, on premises, or both, to perform analytics across the entire enterprise

Seize cost saving opportunities 

While Hadoop provides high scalability, Db2 Big SQL’s advanced cost-based optimizer and massively parallel processing (MPP) architecture can execute queries not only faster but also smarter. Its ability to support more concurrent users and more complex SQL with less hardware makes it an attractive way to cut costs while reaping the benefits of data warehousing on commodity hardware. Db2 Big SQL is the only SQL-on-Hadoop solution to understand different dialects of SQL from vendors and products such as Oracle, Db2 and Netezza. It is highly compatible with ANSI SQL standards and supports PL/SQL as well. With these rich capabilities, benefits include:
  • Migrate existing applications without major rewrites
  • Port BI applications for business intelligence tools including Cognos, Tableau and others
  • Offload costly ETL processing to free your EDW to perform analytics and operations
  • Archive data from an EDW that is running out of capacity for new data
  • Scale elastically: Db2 Big SQL can successfully run all 99 TPC-DS queries at up to 100TB with numerous concurrent users, and can run multiple workers per node for efficient CPU and memory utilization
  • Provide a stable environment for applications and avoid unnecessary query rewrites when platforms change or migrate
  • Manage the Db2 Big SQL service on the CDH platform more easily through seamless integration with Cloudera Manager
  • Db2 Big SQL’s Unified Console provides comprehensive monitoring and administration of the queries being executed

Empower end-users

With data sizes ranging from gigabytes to petabytes, business analysts and data scientists run interactive queries to explore and understand data before building models or charts. With its unmatched scalability and performance, Db2 Big SQL empowers users and applications to unlock insights from data with the analytics tools of their choice, while achieving high concurrency for BI workloads by executing complex queries smarter. When providing data access to users, we need to ensure that only the necessary data can be accessed based on privileges, and that sensitive data is anonymized or masked for users who do not have the right access privileges. Such advanced capabilities enable self-service analytics in a governed and safe manner. While keeping data secure, Db2 Big SQL opens up the following opportunities:
  • For developers, the usage pattern allows access to Db2 Big SQL from specific products or tooling that only allow Open Database Connectivity (ODBC) or Java™ Database Connectivity (JDBC)
  • Robust SQL-based role-based access control (RBAC), row-based dynamic filtering, and column-based dynamic masking are natively available in Db2 Big SQL. For the Cloudera platform, Apache Sentry integration provides centralized security administration for data lakes
  • Support for popular open source file formats like Parquet, ORC, Avro, Text, Sequence, etc. enables reusing the schema definitions already set up
  • A centralized Hive metastore provides seamless interoperability among the various SQL engines on Hadoop, so you are not locked into just one
  • Db2 Big SQL enables short, rapid queries that search by keywords or keyword ranges. It uses HBase when random, real-time read/write access to your data is needed
  • Data scientists can access data directly using their tool of choice and build, test and deploy models seamlessly using Db2 Big SQL

 

One entry point for users and applications, many uses. Designed for your enterprise workload needs.
 
 

