Data virtualization

Use data virtualization on IBM Cloud Pak for Data to make queries across multiple data sources

By IBM Developer Staff

For decades, companies have tried to break down silos by copying data from different operational systems into central data stores for analysis, such as data marts, data warehouses, and data lakes. This approach is often costly and error-prone. Most companies struggle to manage an average of 33 unique data sources, which are diverse in structure and type and are often trapped in silos that make the data hard to find and access.

With data virtualization, you can query data across many systems without having to copy and replicate it, which helps reduce costs. It can also simplify your analytics and keep them more current and accurate, because you’re querying the latest data at its source.

Watson Query

Watson Query connects multiple data sources across locations and presents all of this data as one virtual data view. This read-only data view makes it easier to get value out of your data. After you create connections to your data sources, you can quickly view all of your organization's data. The virtual data view enables real-time analytics without data movement, duplication, ETL jobs, or additional storage, which can greatly shorten processing times.
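To make the idea of a single virtual view over separate sources concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is not Watson Query and does not use any Watson Query API. The two attached in-memory databases (crm and billing), their tables, and the customer_totals view are all hypothetical names chosen for illustration.

```python
import sqlite3

# Stand-in for two independent data sources: each ATTACH of ':memory:'
# creates a separate in-memory SQLite database (names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)

# The view stores no rows of its own; it is evaluated against the sources
# each time it is queried. In SQLite, only a TEMP view may span attached
# databases.
conn.execute(
    """
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(i.amount) AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    """
)


def customer_totals(connection):
    """Read the combined view as if it were a single table."""
    query = "SELECT name, total FROM customer_totals ORDER BY name"
    return connection.execute(query).fetchall()


before = customer_totals(conn)
print(before)  # [('Acme', 200.0), ('Globex', 50.0)]

# A change at the source is visible on the next query, with no copy step.
conn.execute("INSERT INTO billing.invoices VALUES (?, ?)", (2, 25.0))
after = customer_totals(conn)
print(after)  # [('Acme', 200.0), ('Globex', 75.0)]
```

The point of the sketch is the second query: because nothing was copied, the view reflects the new invoice immediately. A real virtualization layer does the same across remote systems rather than attached local databases.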

Security

Centralized authentication and authorization are enforced for platform users to access data sources in a trusted environment. Various data virtualization roles provide granular access management to the virtualized assets. To use data virtualization functions, you must be assigned the specific data virtualization roles that match your job description.

All communication between the environment and the application is encrypted with IBM technology and with SSL/TLS by using standard protocols.

Platform support

Watson Query supports queries that use standard SQL through common interfaces such as R, Spark, Python, and Jupyter Notebooks. Queries are also supported by the most common analytics application tools, including IBM Watson Studio and IBM Cognos Analytics.
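As a rough illustration of querying with standard SQL from Python, the sketch below defines a small helper that works with any connection object following the Python DB-API 2.0 convention. The helper name fetch_dicts, the sales table, and its data are hypothetical, and sqlite3 is used only as a stand-in so that the example runs without a server. A real connection would come from whichever driver your environment provides, and parameter marker syntax can differ between drivers.

```python
import sqlite3


def fetch_dicts(connection, sql, params=()):
    """Run a SQL query on a DB-API 2.0 connection and return rows as dicts."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        # cursor.description holds one tuple per result column; item 0 is its name.
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        cursor.close()


# Stand-in connection and data (hypothetical table and values).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE sales (region TEXT, amount REAL)")
demo.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 10.0), ("EMEA", 5.0), ("APAC", 7.5)],
)

result = fetch_dicts(
    demo,
    "SELECT region, SUM(amount) AS total FROM sales "
    "WHERE amount > ? GROUP BY region ORDER BY region",
    (1.0,),
)
print(result)
# [{'region': 'APAC', 'total': 7.5}, {'region': 'EMEA', 'total': 15.0}]
```

Returning rows as dictionaries keyed by column name makes the result easy to hand to a notebook or a data frame library, regardless of which source the data was read from.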

To learn more about data virtualization in Cloud Pak for Data, see Watson Query on Cloud Pak for Data.

Administering in Watson Query

After you've initiated the Watson Query service, you can:

  • Connect to data sources: Watson Query supports many relational and nonrelational data sources that you can add to your data source environment. Watson Query connects to relational data sources by using the Java™ Database Connectivity (JDBC) protocol. To learn more, see Connecting and authenticating to the Watson Query service.

  • Access data sources by using remote connectors: Watson Query supports the use of remote connectors to access local files on remote systems or to access remote data sources. To learn more, see Accessing data sources by using remote connectors in Watson Query.

  • Create virtual objects in Watson Query: You can use the Watson Query service to create virtual objects from various data sources so that you can query and use the data as if it came from a single source. Watson Query supports creating a virtual object from a single table, from multiple tables, or from files. You can also create a join view from multiple virtualized tables. To learn more, see Creating virtual objects in Watson Query.

  • Manage access to virtual objects in Watson Query: Watson Query administrators and engineers can grant users or groups access to virtual objects in Watson Query. To learn more, see Managing access to virtual objects in Watson Query.
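The access rule in the last item above (administrators and engineers grant users or groups access to virtual objects) can be sketched as a toy in-memory model. This is not how Watson Query implements access control. The class name, the role labels "admin" and "engineer", and the object and user names are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass, field

# Hypothetical role labels for the roles that are allowed to grant access.
GRANTING_ROLES = {"admin", "engineer"}


@dataclass
class VirtualObjectAccess:
    """Toy model mapping each virtual object to the users or groups allowed to read it."""

    grants: dict = field(default_factory=dict)

    def grant(self, grantor_role, virtual_object, grantee):
        """Record a grant, but only when the grantor holds a granting role."""
        if grantor_role not in GRANTING_ROLES:
            raise PermissionError(f"role {grantor_role!r} cannot grant access")
        self.grants.setdefault(virtual_object, set()).add(grantee)

    def can_read(self, virtual_object, user, groups=()):
        """Return True if the user, or any group the user belongs to, has a grant."""
        allowed = self.grants.get(virtual_object, set())
        return user in allowed or any(group in allowed for group in groups)


access = VirtualObjectAccess()
access.grant("engineer", "CUSTOMER_TOTALS", "analysts")

print(access.can_read("CUSTOMER_TOTALS", "dana", groups=["analysts"]))  # True
print(access.can_read("CUSTOMER_TOTALS", "sam"))  # False
```

Granting to a group rather than to individual users keeps the number of grants small as people join or leave a team.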

Governing virtual data in Watson Query

Watson Query can integrate with Watson Knowledge Catalog to govern the virtual data that you publish to governed catalogs. Data governance involves applying business context, data policies, and data protection rules to your virtual data. To learn more, see Governing virtual data in Watson Query.

Summary

This section described data virtualization within IBM Cloud Pak for Data. You can view the product documentation to learn more about this topic.