IBM watsonx.data on AWS

Scale AI workloads for your data, anywhere

By Christie Yeh, Arpit Nanavati

This article presents an overview of IBM watsonx.data on Amazon Web Services (AWS). It highlights the new features first showcased at AWS re:Invent 2023 and walks through the solution architecture on AWS and key use cases for the product. You'll also find links to resources and demos for you to dive deeper into watsonx.data and the IBM / AWS partnership.

Data growth brings AI scale challenges

Clients store and access data across a wide range of hybrid environments, including public and private cloud, on premises, and edge environments. The aggregate volume of data stored is set to grow by more than 250% in the next five years. This proliferation introduces significant complexity and makes it harder to capitalize on that data.

IBM’s hybrid cloud strategy focuses on ensuring that you can build your workloads once and deploy them anywhere. This approach requires tight connectivity to leading hybrid cloud environments, including Amazon Web Services. Ensuring that clients can capitalize on their data everywhere is central to this strategy.

Various market drivers are prompting clients to change their architecture strategy in order to better capitalize on this data. These drivers include infrastructure modernization, AI, generative AI, digital transformation, and security and compliance. Clients want to ensure that they can capitalize on data regardless of where it's located and without introducing complexity or security and compliance challenges. And they want to do that without migrating data into a single environment.

IBM has long been a market leader in relational database management systems (RDBMS), such as IBM Db2, and in online analytical processing (OLAP) with leading data warehouses, such as IBM Netezza on AWS and IBM Db2 Warehouse as a Service. We've listened to our clients, enabling them to deploy these data management systems across their hybrid cloud environments. We've also made these systems available to AWS clients, powered by Red Hat OpenShift and as Software as a Service (SaaS) on AWS. Our clients have let us know that as their data management solutions have evolved, access has become increasingly complex across data silos.

While they love the performance capabilities that IBM provides in our RDBMS, specifically in a data warehouse for OLAP and big data workloads, clients find it challenging to reduce costs and drive significant scale for processing, all while managing data across environments. To realize cost reduction and scale, many clients have turned to data warehouses, data lakes, or cloud data warehouses, but they still face significant complexity and cost challenges.

Progression of data management approaches

Figure 1: Progression of traditional data management approaches

Traditional data warehouses were designed to offer high performance for processing terabytes of structured data for reporting and business intelligence (BI) workloads. However, these data warehouses have become expensive to scale.

Data lakes were designed with semi-structured and unstructured data in mind, utilizing lower cost storage. However, data became scattered and unorganized, resulting in data swamps: badly designed, inadequately documented, or poorly maintained data lakes, usually the result of a lack of proper processes, standards, and governance. These data lakes are complex to manage and compromise the ability to efficiently analyze and exploit the data.

With pay-as-you-go pricing, the shift to cloud data warehouses promised a way to drive analytics costs down. In practice, customers often find that they're spending more. Without proper financial guardrails to manage and predict costs, running full time in the cloud can be even more expensive than running full time on premises.

Given the prohibitive costs of high-performance on-premises and cloud data warehouses, and the performance, governance, and maintenance challenges of legacy data lakes, neither option satisfies the need for analytical flexibility and price-performance. A new approach is required: the data lakehouse architecture.

IBM’s strategy combines hybrid cloud with open source software. This approach ensures that clients don't necessarily need to consolidate their data but instead can process it where they need it to be. To meet our clients' needs, earlier this year we were excited to announce the release of IBM watsonx.data.

Meet IBM watsonx.data

IBM watsonx.data is a fit-for-purpose data store. It's based on an open lakehouse architecture and is supported by querying, governance, and open data formats for accessing and sharing data, which makes it possible for enterprises to scale AI workloads using all their data. This data store is based on fit-for-purpose engines, such as Presto and Spark.

Figure 2: IBM watsonx.data product page

Lakehouses are a new approach meant to combine the advantages of data warehouses and data lakes, but first-generation lakehouse vendors still have key constraints that limit their ability to address cost and complexity challenges. These constraints include:

  • A single query engine for processing, which limits the types of workloads that can be effectively run on it
  • Cloud-only deployment in most cases, with no support for multi-cloud, hybrid cloud, or on-premises deployment
  • Minimal governance and metadata capabilities to deploy across your entire ecosystem

Figure 3: Data lakehouse approach

With watsonx.data, clients are able to:

  • Access all data across hybrid-cloud through a single point of entry, eliminating data silos and duplication of data across environments through a built-in metadata layer
  • Connect to data in minutes and accelerate time to trusted insights with built-in governance, security, and automation capabilities
  • Optimize AI and analytics workloads and select the right engine, for the right workload, at the right cost, reducing data warehouse cost by up to 50%
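As a rough illustration of the ideas in the list above, the following minimal sketch models a shared metadata layer that acts as a single point of entry, plus a simple rule for picking an engine per workload. It is not the watsonx.data API. Presto and Spark are the engines named in this article; the `MetadataLayer` class, the workload categories, the routing rules, the bucket path, and the "iceberg" format label are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical routing rules (an assumption, not product behavior):
# interactive SQL goes to Presto, heavier batch and ML work goes to Spark.
ENGINE_BY_WORKLOAD = {
    "interactive_sql": "presto",
    "batch_etl": "spark",
    "ml_training": "spark",
}


@dataclass(frozen=True)
class Table:
    name: str
    location: str      # where the data already lives, for example an object store path
    table_format: str  # an open table format label, for example "iceberg"


class MetadataLayer:
    """Toy shared catalog: one registry of tables that every engine consults."""

    def __init__(self):
        self._tables = {}

    def register(self, table):
        self._tables[table.name] = table

    def resolve(self, name):
        # Raises KeyError if the table was never registered.
        return self._tables[name]


def plan(metadata, table_name, workload):
    """Pick an engine for the workload and point it at the data in place."""
    table = metadata.resolve(table_name)
    engine = ENGINE_BY_WORKLOAD[workload]  # KeyError for unknown workloads
    return {
        "engine": engine,
        "location": table.location,
        "format": table.table_format,
    }


metadata = MetadataLayer()
metadata.register(Table("sales", "s3://example-bucket/sales/", "iceberg"))
print(plan(metadata, "sales", "interactive_sql"))
print(plan(metadata, "sales", "batch_etl"))
```

Both plans resolve to the same storage location, which is the point of the sketch: the data is registered once and different engines read it in place rather than from separate copies.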

Figure 4: IBM watsonx.data components -- query engines, open table formats, built-in enterprise governance

IBM watsonx.data on AWS

IBM and AWS came together to help businesses achieve their data management goals by using IBM and AWS technologies together. IBM watsonx.data can be provisioned as a fully managed SaaS solution on AWS from AWS Marketplace. Some of the unique features of running IBM watsonx.data on AWS include:

  1. Hybrid data and real-time business intelligence: Combine, stream, and ingest data from existing AWS data sources, such as Amazon Aurora and Amazon S3, and IBM on AWS data sources, such as IBM Netezza on AWS and IBM Db2 Warehouse on AWS, with new data to unlock new, faster insights without the cost and complexity of duplicating and moving data across different environments.
  2. Fit-for-purpose data engineering: Reduce data pipelines, simplify data transformation, and enrich data for consumption using IBM watsonx.data ingestion pipelines, AWS services, and an AI-infused conversational interface. For example, you can ingest Amazon Redshift data into IBM watsonx.data by using AWS Glue to stage the data in Amazon S3.
  3. Create access controls for data security and privacy: Enable self-service access for AWS and IBM data sources to ensure governance, security, and compliance using centralized identity and access management, and local automated policies within IBM watsonx.data.
  4. Data lineage to streamline accuracy for generative AI predictions: Use data lineage features to capture, reproduce, and roll back data to a historical point in time to achieve compliance, security, and auditability for AI/ML workloads.
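The point-in-time rollback described in the last item can be pictured with a small, self-contained sketch. It models a table as an append-only log of snapshots, which is loosely how open table formats version data. This is a conceptual toy, not watsonx.data code: the `SnapshotTable` class, its method names, and the integer timestamps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotTable:
    """Toy append-only snapshot log (hypothetical, for illustration only)."""

    # Each entry is a (timestamp, rows) pair; rows are stored as an immutable tuple.
    snapshots: list = field(default_factory=list)

    def commit(self, timestamp, rows):
        # Store a full copy of the rows; timestamps must strictly increase.
        if self.snapshots and timestamp <= self.snapshots[-1][0]:
            raise ValueError("timestamps must increase")
        self.snapshots.append((timestamp, tuple(rows)))

    def as_of(self, timestamp):
        # Reproduce the table as it was at or before the given time.
        candidates = [rows for ts, rows in self.snapshots if ts <= timestamp]
        if not candidates:
            raise LookupError("no snapshot at or before that time")
        return list(candidates[-1])

    def current(self):
        return list(self.snapshots[-1][1])

    def rollback_to(self, timestamp, now):
        # Roll back by committing the historical state as a new snapshot,
        # so the full history stays available for audit.
        self.commit(now, self.as_of(timestamp))


table = SnapshotTable()
table.commit(1, ["row-a"])
table.commit(2, ["row-a", "row-b"])
table.commit(3, ["row-a", "row-b", "bad-row"])

table.rollback_to(2, now=4)
print(table.current())
print(len(table.snapshots))
```

The rollback adds a fourth snapshot instead of deleting the third. Keeping the bad state in the history is a design choice of this sketch that favors auditability.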

AWS architecture and services integrations

The following diagram shows the architecture of IBM watsonx.data on AWS infrastructure, its integration with AWS services, and how secure connectivity is established to new and existing data sources.

Figure 5: IBM watsonx.data on AWS platform architecture and ecosystem integrations

IBM watsonx.data can integrate with AWS services, including the Amazon S3, Amazon Aurora, Amazon Redshift, and AWS Glue services described earlier, enabling you to pull in data from your existing AWS data sources.

When you integrate data from existing AWS and IBM data sources with new data in IBM watsonx.data, you can realize a wide range of benefits, including:

  • Unified view of data: Provides a unified view of data from different environments, enabling you to gain broader and more comprehensive insights
  • Reduced data duplication and movement: Eliminates the need to duplicate and move data across different environments, saving time, storage costs, and network bandwidth
  • Faster insights: Enables faster and more efficient analysis and insights by combining data from multiple sources
  • Improved decision-making: Empowers better decision-making by providing a holistic view of data from different sources
  • Reduced complexity: Simplifies data management and analysis by providing a single platform for accessing and analyzing data from different environments
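To make the "unified view" benefit more concrete, here is a minimal sketch of what a query across two sources can look like in Presto-style SQL, where each table is addressed by a three-part `catalog.schema.table` name. The helper functions and the catalog, schema, table, and column names (`aurora_catalog`, `s3_lake`, `orders`, `click_events`, `customer_id`) are hypothetical and would depend on how your own catalogs are registered.

```python
def qualified(catalog, schema, table):
    """Return a Presto-style three-part name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(left, right, key, columns):
    """Build a join across two catalogs so that each source is queried in place.

    Illustrative only: the inputs are trusted constants here. Plain string
    formatting like this is not safe for untrusted input.
    """
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# Hypothetical catalogs: one backed by a relational source, one by object storage.
orders = qualified("aurora_catalog", "public", "orders")
events = qualified("s3_lake", "web", "click_events")

sql = federated_join_sql(orders, events, "customer_id", ["l.order_id", "r.page"])
print(sql)
```

The resulting statement joins both sources in a single query, which is how the engine can combine data without first copying it into one environment.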

Summary and next steps

You've now learned about IBM watsonx.data features, the AWS solution architecture, and integrations with AWS services. With these integrations, you can maximize your AWS and IBM investments. You can also be assured that AWS and IBM are working together to deliver highly available and secure analytics and AI workloads within a unified and governed ecosystem.

To learn more, check out the following resources:

IBM | AWS partnership

With Amazon Web Services (AWS) and IBM, unleash the transformative value of generative AI in your business with greater speed, scale and trust. The IBM, AWS, and Red Hat partnership brings a unique combination of leading enterprise AI, cloud, infrastructure, and open source technologies delivered with deep IBM consulting expertise. This enables companies to quickly and responsibly scale AI workloads using a comprehensive stack of generative AI, composed of Amazon Bedrock and IBM watsonx running on AWS Cloud and across hybrid cloud environments.

Learn more about the IBM / AWS partnership.