In this section, you learn how to create a hybrid cloud data architecture using IBM Cloud Pak for Data and AWS. This section includes four tutorials, each covering different features of IBM services running in the AWS environment.
Data fabric is a highly scalable, distributed data architecture comprising shared data assets and streamlined data integration and governance capabilities that can be used to tackle modern data challenges. A typical data fabric solution consists of multiple components, such as Data Catalog, Data Integration, Data Governance, and Data Visualization.
In the tutorials that follow, you learn how to solve the challenges faced by different personas in data and AI:
Data scientists can spend as much as 80% of their time discovering, curating, and cleansing data. How can you provide them with quality data for building AI-based solutions?
Data engineers face many challenges when integrating data from multiple data sources. How can they collect and integrate data quickly and efficiently?
Data stewards deal with data privacy and protection challenges. How can you ensure that the data is governed and that no sensitive information is shared with data consumers?
A governed data fabric architecture built with IBM Cloud Pak for Data can address these gaps.
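To make the data steward's question concrete, here is a minimal Python sketch of column-level masking applied before records reach a data consumer. It is not the Watson Knowledge Catalog API and does not show how Cloud Pak for Data enforces its rules; the `POLICY` mapping, the column names, the masking methods, and the `SECRET_KEY` value are all hypothetical assumptions chosen for illustration.

```python
import hashlib
import hmac
import re

# Hypothetical key for pseudonymization; a real deployment would load it from a secret store.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical masking policy: sensitive column -> masking method.
POLICY = {
    "name": "pseudonymize",  # stable token instead of the real value
    "email": "partial",      # hide the local part, keep the domain
    "ssn": "redact",         # replace every digit
}


def mask_value(value: str, method: str) -> str:
    """Apply one masking method to a single value."""
    if method == "redact":
        # Replace digits but keep separators so the format is still recognizable.
        return re.sub(r"\d", "X", value)
    if method == "partial":
        local, _, domain = value.partition("@")
        return "*" * len(local) + "@" + domain
    if method == "pseudonymize":
        # Keyed hash: the same input always yields the same token,
        # so masked data can still be joined without exposing the value.
        return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
    raise ValueError(f"unknown masking method: {method}")


def apply_policy(record: dict, policy: dict = POLICY) -> dict:
    """Return a masked copy of the record; the source record is left unchanged."""
    return {
        key: mask_value(str(val), policy[key]) if key in policy else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    row = {"name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789", "balance": 42.5}
    print(apply_policy(row))
```

A keyed hash is used instead of a plain hash because low-entropy values such as names are easy to recover from an unkeyed digest by guessing; truncating the token to 12 hex characters is an arbitrary choice for readability.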
After completing this section, you will understand how to:
Create a connection between external data sources and IBM Cloud Pak for Data
Ingest data from multiple data sources
Clean, filter, and reshape data
Query data from multiple data sources without copying or moving the data
Create a data integration pipeline to transform and integrate data from heterogeneous data sources
Protect sensitive data (such as PII) before sharing it with data consumers
Schedule a job to periodically run a data integration pipeline
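The "clean, filter, and reshape" objective can be pictured with plain Python. This sketch is not a Data Refinery flow; it only mimics three typical preparation operations (normalizing text and types while dropping incomplete rows, filtering by a threshold, and pivoting long data into a wide layout) on hypothetical sales records. The field names and the `clean`, `filter_rows`, and `pivot` helpers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
RAW = [
    {"region": " east ", "month": "2023-01", "sales": "100"},
    {"region": "EAST", "month": "2023-02", "sales": "150"},
    {"region": "west", "month": "2023-01", "sales": ""},  # missing value
    {"region": "West ", "month": "2023-02", "sales": "90"},
]


def clean(rows):
    """Trim and lowercase text, convert sales to float, and drop rows with missing sales."""
    out = []
    for r in rows:
        if not r["sales"].strip():
            continue  # drop incomplete rows
        out.append({
            "region": r["region"].strip().lower(),
            "month": r["month"].strip(),
            "sales": float(r["sales"]),
        })
    return out


def filter_rows(rows, min_sales):
    """Keep only rows whose sales are at or above the threshold."""
    return [r for r in rows if r["sales"] >= min_sales]


def pivot(rows):
    """Reshape long rows (region, month, sales) into {region: {month: sales}}."""
    wide = defaultdict(dict)
    for r in rows:
        wide[r["region"]][r["month"]] = r["sales"]
    return dict(wide)


if __name__ == "__main__":
    print(pivot(filter_rows(clean(RAW), min_sales=95)))
```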
Architecture
Flow
Create a connection between external data sources (such as Amazon S3 or Amazon Aurora PostgreSQL) and IBM Cloud Pak for Data.
Use IBM Data Virtualization to query data from multiple data sources without creating a data replica.
Use IBM DataStage to create an ETL pipeline.
Use IBM Data Refinery flows to clean and filter the data.
Use IBM Watson Knowledge Catalog to profile and govern the data.
Supply the data to a machine learning environment, such as Amazon SageMaker or a Jupyter notebook, to build machine learning models.
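The extract, transform, and load stages that DataStage handles in this flow can be pictured with a small, self-contained Python job. This is a conceptual sketch, not DataStage: the CSV string stands in for a file in Amazon S3, an in-memory SQLite database stands in for the target system, and the column names, the `run_job` function, and the "drop cancelled orders" rule are hypothetical.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV object in Amazon S3; a real job would read it through a connection.
SOURCE_CSV = """order_id,customer,amount_usd,status
1,acme,120.50,shipped
2,globex,75.00,cancelled
3,acme,30.25,shipped
"""


def extract(text):
    """Parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Cast amounts to float, drop cancelled orders, and total revenue per customer."""
    totals = {}
    for r in rows:
        if r["status"] == "cancelled":
            continue
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + float(r["amount_usd"])
    return sorted(totals.items())


def load(conn, rows):
    """Write results to the target table, replacing earlier output so reruns are idempotent."""
    conn.execute("CREATE TABLE IF NOT EXISTS revenue (customer TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT OR REPLACE INTO revenue VALUES (?, ?)", rows)
    conn.commit()


def run_job(conn):
    """One pipeline run; a scheduler could call this periodically."""
    load(conn, transform(extract(SOURCE_CSV)))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_job(conn)
    print(conn.execute("SELECT * FROM revenue ORDER BY customer").fetchall())
```

Using `INSERT OR REPLACE` on a primary key keeps reruns from duplicating rows, which matters once a job like this is scheduled to run periodically.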
Video demo
For a quick introduction, start with our video introducing data access and governance using IBM Cloud Pak for Data on AWS.
Included components
Here are the components and services that are included in this section:
IBM Cloud Pak for Data: A data and AI platform with a data fabric that makes all data available for AI and analytics on any cloud.
IBM DataStage: An integration tool that helps you design, develop, and run jobs that move and transform data.
IBM Data Refinery: A cloud service that provides a self-service data preparation client to transform raw data into data that's ready for analytics.
IBM Watson Knowledge Catalog: A data catalog that activates business-ready data for AI and analytics through intelligent cataloging, backed by active metadata and policy management.
Amazon Redshift: A cloud data warehouse that provides fast, easy, and secure data warehousing at scale to accelerate time to insights.
Amazon Aurora: A relational database designed for high performance and availability at global scale, with full MySQL and PostgreSQL compatibility.
Featured technologies
Data Fabric: An architectural approach to simplifying data access in an organization to facilitate self-service data consumption.
Analytics: Uncover insights with data collection, organization, and analysis.
Data Management: Organize and maintain data processes throughout the information lifecycle.
Data Privacy: Practices and controls that ensure user data is used responsibly.
Data access and governance use cases
This section covers two use cases, as illustrated in the following diagram:
In the next tutorial, you learn how to solve data silo challenges without copying or moving data by using the Data Virtualization service offered by IBM Cloud Pak for Data.
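The core idea of data virtualization, querying several sources in one statement while the data stays where it is, has a small-scale analogy in SQLite's `ATTACH DATABASE`. The sketch below is only that analogy and says nothing about how the Data Virtualization service works internally: two separate database files stand in for two data sources, and the file, table, and column names as well as the `create_source` and `federated_totals` helpers are hypothetical.

```python
import os
import sqlite3
import tempfile


def create_source(path, ddl, insert_sql, rows):
    """Create one standalone database file that plays the role of a data source."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(ddl)
        conn.executemany(insert_sql, rows)
        conn.commit()
    finally:
        conn.close()


def federated_totals(crm_path, sales_path):
    """Join customers and orders across two database files without copying either table."""
    conn = sqlite3.connect(crm_path)  # the "main" schema is the CRM source
    try:
        # Attach the second source under its own schema name; no rows are copied.
        conn.execute("ATTACH DATABASE ? AS sales", (sales_path,))
        return conn.execute(
            """
            SELECT c.name, SUM(o.amount)
            FROM customers AS c
            JOIN sales.orders AS o ON o.customer_id = c.id
            GROUP BY c.name
            ORDER BY c.name
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        crm = os.path.join(folder, "crm.db")
        sales = os.path.join(folder, "sales.db")
        create_source(crm, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
                      "INSERT INTO customers VALUES (?, ?)", [(1, "acme"), (2, "globex")])
        create_source(sales, "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
                      "INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (1, 5.5), (2, 7.25)])
        print(federated_totals(crm, sales))
```

The join runs against both files in place, and neither source gains a copy of the other's table, which is the property the tutorial relies on when it avoids moving data between silos.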