IBM watsonx.data

May 10, 2023

At Think 2023, IBM announced watsonx, an AI and data platform designed to help enterprises scale and accelerate the impact of the most advanced AI with trusted data.

IBM watsonx.data is an open, hybrid, governed data store built on a data lakehouse architecture and optimized for data, analytics, and AI workloads. It is a new multi-engine data management system designed to maximize the value and performance of Parquet and Iceberg data on object storage. Over the last several years, the industry has effectively decoupled storage from compute and made the engine ephemeral through containerization. A multi-engine approach additionally requires decoupling the database system catalog, so that no single engine owns maintenance of the data.
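The decoupling described above can be pictured with a small toy model. This is not watsonx.data code: the names SharedCatalog, TableEntry, and Engine, as well as the bucket path, are hypothetical and invented purely for illustration. The sketch only shows the idea that several engines resolve the same table through one catalog that none of them owns.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableEntry:
    """Catalog record: where a table lives and how it is stored."""
    name: str
    location: str       # object storage path (hypothetical)
    table_format: str   # e.g. "iceberg"
    file_format: str    # e.g. "parquet"


@dataclass
class SharedCatalog:
    """Toy stand-in for metadata kept outside any single engine."""
    tables: dict = field(default_factory=dict)

    def register(self, entry: TableEntry) -> None:
        self.tables[entry.name] = entry

    def lookup(self, name: str) -> TableEntry:
        return self.tables[name]


@dataclass
class Engine:
    """Ephemeral compute: keeps no table metadata of its own."""
    name: str
    catalog: SharedCatalog

    def plan_scan(self, table: str) -> str:
        entry = self.catalog.lookup(table)
        return (f"{self.name} scans {entry.location} "
                f"as {entry.table_format}/{entry.file_format}")


catalog = SharedCatalog()
catalog.register(
    TableEntry("sales", "s3://example-bucket/warehouse/sales", "iceberg", "parquet")
)

# Two different engines share the same catalog object.
presto = Engine("presto", catalog)
spark = Engine("spark", catalog)

for engine in (presto, spark):
    print(engine.plan_scan("sales"))
```

Because the table is registered once and looked up by both engines, adding or replacing an engine in this model does not require re-cataloging the data.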

Data can be ingested in open file formats such as Parquet, Avro, and CSV. In addition, most customers will use open table formats, notably Iceberg, to achieve enterprise performance. Data that lands in IBM watsonx.data can be drawn from a wide range of data management systems, including Oracle, Teradata, and Snowflake, as well as from other systems and ingestible files.
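One way to picture what a table format adds on top of plain files is that it tracks which files make up the table at each commit. The toy model below illustrates only that idea. It is not Iceberg's actual metadata layout or API, and the names ToyTable, Snapshot, and the file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: the set of files at one commit."""
    snapshot_id: int
    data_files: tuple


@dataclass
class ToyTable:
    """Toy table format: an ordered list of snapshots over data files."""
    snapshots: list = field(default_factory=list)

    def commit_append(self, new_files):
        """Publish a new snapshot made of the previous files plus new ones."""
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snap = Snapshot(len(self.snapshots) + 1, previous + tuple(new_files))
        self.snapshots.append(snap)
        return snap

    def current_files(self):
        """Files a reader sees when it opens the latest snapshot."""
        return self.snapshots[-1].data_files if self.snapshots else ()

    def files_as_of(self, snapshot_id):
        """Files that belonged to the table at an earlier snapshot."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(snapshot_id)


table = ToyTable()
table.commit_append(["part-0001.parquet"])
table.commit_append(["part-0002.parquet"])

print(table.current_files())
print(table.files_as_of(1))
```

In this sketch a reader always works from one complete snapshot, so it never sees a half-finished write, and earlier states of the table remain addressable by snapshot ID.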

IBM watsonx.data will run on a Red Hat OpenShift cluster and can be installed with or without Cloud Pak for Data. In addition, IBM watsonx.data supports the Presto, Netezza, Db2, and Spark engines.

IBM watsonx.data would be helpful to data warehouse customers (Db2, Netezza, or any other data warehouse technology) who are looking to augment their investment by expanding to open data formats or who are considering a multi-engine approach to analytics. It would also suit existing Hadoop customers who want to leverage their existing investment and gain more insights from the data in their data lake, as well as new and existing IBM customers who want more flexibility and a low-cost data analytics solution.

IBM watsonx.data provides the following benefits to users:

Deploy anywhere with full support for hybrid-cloud and multi-cloud environments.
Cost-effective: price/performance-optimized OLAP with query rewrites and pushdown.
Shared metadata across multiple query engines eliminates the need to re-catalog, accelerating time to value while ensuring governance and eliminating costly implementation.
Easy-to-use integrated console: connect to your existing analytics data across hybrid cloud and deploy query engines in minutes. Explore and transform data using common SQL.
Fit-for-purpose query engines: drive analytics costs down with cost-efficient compute and storage and fit-for-purpose analytics engines (Presto, Spark, Db2, Netezza) that dynamically scale up and down.
Semantic automation: discover, augment, refine, and visualize the data and metadata in watsonx.data through the power of watsonx.ai models.
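The pushdown mentioned in the list above can be illustrated with a generic toy example. It does not show how watsonx.data or any of its engines implements pushdown. The data, the function names, and the "rows moved" counter are all hypothetical and exist only to show why filtering at the source reduces the work an engine has to do.

```python
# Toy data set standing in for rows held by a remote data source.
ROWS = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 40},
    {"region": "APAC", "amount": 5},
]


def source_scan(predicate=None):
    """Toy data source: filters rows itself when a predicate is pushed down."""
    if predicate is None:
        return list(ROWS)
    return [row for row in ROWS if predicate(row)]


def query_without_pushdown(predicate):
    """Engine pulls every row from the source, then filters locally."""
    moved = source_scan()
    result = [row for row in moved if predicate(row)]
    return result, len(moved)


def query_with_pushdown(predicate):
    """Engine hands the predicate to the source and receives only matches."""
    moved = source_scan(predicate)
    return moved, len(moved)


def is_eu(row):
    return row["region"] == "EU"


plain_result, plain_moved = query_without_pushdown(is_eu)
pushed_result, pushed_moved = query_with_pushdown(is_eu)

print("same answer:", plain_result == pushed_result)
print("rows moved without pushdown:", plain_moved)
print("rows moved with pushdown:", pushed_moved)
```

Both queries return the same rows, but in this model the pushed-down version moves two rows from the source instead of four.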

Other Resources

IBM watsonx.data

Written by Kapil Rajyaguru