IBM Data Virtualization Installation

kapil rajyaguru
6 min read · Apr 11, 2022
Photo by Claudio Schwarz on Unsplash

Companies often try to break down silos by copying disparate data into central data stores, such as data marts, data warehouses, and data lakes, for analysis. However, this is costly and error-prone when most companies manage an average of 400 unique data sources for business intelligence. With data virtualization, you can access data at the source without moving it, which accelerates time to value with faster and more accurate queries. This article documents step-by-step instructions for deploying IBM Data Virtualization on IBM Cloud Pak for Data running on Red Hat OpenShift.

Assumptions

  • The Red Hat OpenShift cluster has a high-speed internet connection and can pull images directly from the IBM Entitled Registry. If this is not set up yet, please follow the instructions provided here.
  • The IBM Cloud Pak for Data control plane and foundational services are installed and running. If not, please follow the instructions provided here.
  • The IBM Cloud Pak for Data operator is installed in the “ibm-common-services” namespace, and foundational services are installed in the “cpd-instance” namespace.
  • The DV operator will be installed in the “ibm-common-services” namespace, and the DV service will be installed in the “cpd-instance” namespace.
  • This installation is for demo purposes, so the latest version of the software is installed automatically on the Red Hat OpenShift cluster.
  • The user has knowledge of and experience with managing a Red Hat OpenShift cluster.

Prerequisites

  • Red Hat OpenShift cluster version 4.6 or later with a minimum of 64 vCPU and 256 GB RAM
  • Bastion host running Linux with 2 vCPU and 4 GB RAM
  • Internet access for the bastion host and the Red Hat OpenShift cluster
  • OpenShift Container Storage (OCS) attached to the Red Hat OpenShift cluster. This link will help you determine supported storage. In this demo, I used OCS.
  • A user with OpenShift cluster and project administrator access
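The version requirement above can be checked from the bastion host before you start. The following is only a minimal sketch: the function name version_at_least is made up for illustration, and it assumes GNU sort (for the -V option) is available on the Linux bastion host.

```shell
#!/bin/sh
# version_at_least ACTUAL MINIMUM
# Succeeds when the dotted version ACTUAL is the same as or newer than MINIMUM.
# (Hypothetical helper, not part of the repo used in this article.)
version_at_least() {
  # sort -V orders version strings; if MINIMUM sorts first, ACTUAL is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage against a live cluster (needs a logged-in oc session):
#   actual=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   version_at_least "$actual" 4.6 || echo "OpenShift $actual is older than 4.6"
```

A plain string comparison would rank 4.10 below 4.6, which is why the sketch relies on version-aware sorting.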

Step 1: Download files from the GitHub repo using the following command.

git clone https://github.com/kapilrajyaguru/Data-Virtualization.git

After downloading the files, switch to the Data-Virtualization directory.

cd Data-Virtualization/
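Before moving on, it can help to confirm that the files used in the later steps are actually present. This is a small sketch under assumptions: the file list is taken from the commands in this article (the repo may contain more), and the function name missing_files is hypothetical.

```shell
#!/bin/sh
# missing_files DIR
# Prints the name of every file referenced in this article that is absent from DIR.
# (Hypothetical helper; the list mirrors the commands used in the steps below.)
missing_files() {
  for f in db2u-operator.yaml dv-operator-sub.yaml crio.conf \
           machineconfig.yaml kubeletconfig.yaml dv-service.yaml; do
    [ -f "$1/$f" ] || echo "$f"
  done
}

# Usage from inside the cloned directory:
#   missing_files .
```

No output means every file the steps rely on was found.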

Step 2 — Creating an operator subscription for services

  • Create the Db2U operator subscription by running the following command
oc apply -f db2u-operator.yaml
  • Validate that the operator was successfully created.

    Run the following command to confirm that the subscription was triggered:
oc get sub -n ibm-common-services ibm-db2u-operator -o jsonpath='{.status.installedCSV} {"\n"}'

Verify that the command returns db2u-operator.v1.1.11.

  • Run the following command to confirm that the cluster service version (CSV) is ready:
oc get csv -n ibm-common-services db2u-operator.v1.1.11 -o jsonpath='{ .status.phase } : { .status.message} {"\n"}'

Verify that the command returns Succeeded: install strategy completed with no errors.

  • Run the following command to confirm that the operator is ready:
oc get deployments -n ibm-common-services -l olm.owner="db2u-operator.v1.1.11" -o jsonpath="{.items[0].status.availableReplicas} {'\n'}"

Verify that the command returns an integer greater than or equal to 1. If the command returns 0, wait for the deployment to become available.
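If you would rather not rerun the command by hand, the wait can be scripted. The sketch below polls the same oc query until at least one replica is available. The function name wait_for_available and the default retry count and pause are my own choices for illustration, not anything defined by the operator.

```shell
#!/bin/sh
# wait_for_available CSV_NAME [TRIES] [SLEEP_SECONDS]
# Polls the operator deployment owned by CSV_NAME until availableReplicas >= 1.
# (Hypothetical helper; retry defaults are arbitrary.)
wait_for_available() {
  csv=$1; tries=${2:-30}; pause=${3:-10}
  while [ "$tries" -gt 0 ]; do
    n=$(oc get deployments -n ibm-common-services -l "olm.owner=$csv" \
          -o jsonpath='{.items[0].status.availableReplicas}' 2>/dev/null)
    # An empty result (field not set yet) is treated as 0.
    if [ "${n:-0}" -ge 1 ]; then
      echo "available replicas: $n"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  echo "timed out waiting for $csv" >&2
  return 1
}

# Usage:
#   wait_for_available db2u-operator.v1.1.11
```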

Step 3 — Create the Data Virtualization operator subscription.

oc apply -f dv-operator-sub.yaml
  • Validate that the operator was successfully created.
    Run the following command to confirm that the subscription was triggered:
oc get sub -n ibm-common-services ibm-dv-operator-catalog-subscription -o jsonpath='{.status.installedCSV} {"\n"}'

Verify that the command returns ibm-dv-operator.v1.7.6.

  • Run the following command to confirm that the cluster service version (CSV) is ready:
oc get csv -n ibm-common-services ibm-dv-operator.v1.7.6 -o jsonpath='{ .status.phase } : { .status.message} {"\n"}'

Verify that the command returns Succeeded: install strategy completed with no errors.

  • Run the following command to confirm that the operator is ready:
oc get deployments -n ibm-common-services -l olm.owner="ibm-dv-operator.v1.7.6" -o jsonpath="{.items[0].status.availableReplicas} {'\n'}"

Verify that the command returns an integer greater than or equal to 1. If the command returns 0, wait for the deployment to become available.
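Because this demo installs the latest version, the CSV name on your cluster may differ from the one shown above. One way around hard-coding it is to read the installed CSV from the subscription first and then check its phase. This is a sketch only, and the function name operator_phase is hypothetical.

```shell
#!/bin/sh
# operator_phase SUBSCRIPTION
# Looks up the CSV installed by SUBSCRIPTION in ibm-common-services, prints
# "<csv> <phase>", and succeeds only when the phase is Succeeded.
# (Hypothetical helper, not part of the repo used in this article.)
operator_phase() {
  csv=$(oc get sub -n ibm-common-services "$1" -o jsonpath='{.status.installedCSV}')
  if [ -z "$csv" ]; then
    echo "no installedCSV reported for $1" >&2
    return 1
  fi
  phase=$(oc get csv -n ibm-common-services "$csv" -o jsonpath='{.status.phase}')
  echo "$csv $phase"
  [ "$phase" = "Succeeded" ]
}

# Usage:
#   operator_phase ibm-dv-operator-catalog-subscription
```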

Step 4 — CRI-O Container Settings

If you have already installed Watson Knowledge Catalog, steps 4 and 5 have already been performed, so you can skip them and jump directly to Step 6.

  • Copy crio.conf to /tmp directory
cp crio.conf /tmp/
  • Log in to Red Hat OpenShift from the command line and apply the MachineConfig object YAML file from the cloned repo, as follows.
    Note: If you use Cloud Pak for Data on OpenShift Container Platform version 4.6, the ignition version is 3.1.0. If you use Cloud Pak for Data on OpenShift Container Platform version 4.8, change the ignition version to 3.2.0 in machineconfig.yaml.
oc apply -f machineconfig.yaml

The above action will reboot your cluster nodes one by one. Monitor all of the nodes to ensure that the changes are applied by using the following command:

watch oc get nodes

You can also use the following command to confirm that the MachineConfig sync is complete:

watch oc get mcp
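For a scripted check instead of watching the output, you can compare the machine counts reported in the pool status. The sketch below assumes the worker pool is the one being updated, and the function name mcp_updated is hypothetical.

```shell
#!/bin/sh
# mcp_updated POOL
# Prints the pool's machine counts and succeeds when every machine in the
# MachineConfigPool is both updated and ready.
# (Hypothetical helper, not part of the repo used in this article.)
mcp_updated() {
  counts=$(oc get mcp "$1" -o jsonpath='{.status.machineCount} {.status.updatedMachineCount} {.status.readyMachineCount}')
  # Split "total updated ready" into three variables.
  read -r total updated ready <<EOF
$counts
EOF
  echo "total=$total updated=$updated ready=$ready"
  [ -n "$total" ] && [ "$total" = "$updated" ] && [ "$total" = "$ready" ]
}

# Usage:
#   mcp_updated worker && echo "worker pool is in sync"
```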

Step 5: Kernel Parameter Settings
The following step enables unsafe sysctls by configuring the kubelet so that Db2U can make the unsafe sysctl calls that Db2 needs to manage its memory settings.
Update all of the nodes to use a custom KubeletConfig:

oc apply -f kubeletconfig.yaml

Update the label on the machineconfigpool:

oc label machineconfigpool worker db2u-kubelet=sysctl 

Wait for the cluster to restart and then run the following command to verify that the machineconfigpool is updated:

oc get machineconfigpool

Next, wait until all of the worker nodes are updated and ready.
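To spot nodes that are still rebooting or draining, you can filter the node list for anything whose status is not exactly Ready. This is a minimal sketch; the function name not_ready_nodes is made up, and it reads the standard `oc get nodes --no-headers` columns (name first, status second).

```shell
#!/bin/sh
# not_ready_nodes
# Reads `oc get nodes --no-headers` output on stdin and prints the name of
# every node whose STATUS column is not exactly "Ready". A node that is being
# drained shows "Ready,SchedulingDisabled" and is therefore listed too.
# (Hypothetical helper, not part of the repo used in this article.)
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage:
#   oc get nodes --no-headers | not_ready_nodes
```

An empty result means every node reports Ready and is schedulable again.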

Step 6 — Create a DvService custom resource to install Data Virtualization.

Important: By creating a DvService custom resource with spec.license.accept: true, you are accepting the license terms for Data Virtualization. You can find links to the relevant licenses in IBM Cloud Pak for Data License Information.
Create the custom resource from the dv-service.yaml file in the cloned repo.

oc apply -f dv-service.yaml

When you create the custom resource, the Data Virtualization operator installs Data Virtualization.

  • Get the status of Data Virtualization (dv-service):
    Run the following command:
oc get dvservice dv-service

The result is similar to the following example, where the READY field indicates whether the DvService is installed.

NAME         READY
dv-service   True
  • To check whether the DvService has finished installing Data Virtualization service pods, run the following command:
oc get DvService dv-service -o jsonpath="{.status.reconcileStatus}"

Data Virtualization is installed when the command returns Completed. You must now provision a Data Virtualization instance to use Data Virtualization.
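The installation can take a while, so you may want to poll the status instead of rerunning the command. The sketch below repeats the reconcileStatus query until it returns Completed. The function name wait_for_dv and the retry defaults are my own, and the -n cpd-instance flag is an assumption based on the namespace stated at the start of this article.

```shell
#!/bin/sh
# wait_for_dv [TRIES] [SLEEP_SECONDS]
# Prints the DvService reconcileStatus on each poll and succeeds once it
# reads Completed. Any other value is simply treated as "not done yet".
# (Hypothetical helper; namespace and retry defaults are assumptions.)
wait_for_dv() {
  tries=${1:-60}; pause=${2:-30}
  while [ "$tries" -gt 0 ]; do
    status=$(oc get DvService dv-service -n cpd-instance \
               -o jsonpath='{.status.reconcileStatus}' 2>/dev/null)
    echo "reconcileStatus: ${status:-unknown}"
    if [ "$status" = "Completed" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  return 1
}

# Usage:
#   wait_for_dv && echo "Data Virtualization is installed"
```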

Step 7 — Provision the Data Virtualization service:

  • On the navigation menu, click Services > Instances.
  • From the list of instances, locate the Data Virtualization service, click the action menu, and select Provision instance.
  • To configure the service, specify the resources you want to allocate to the Data Virtualization worker nodes in the Nodes step.
  • Specify the number of Data Virtualization worker nodes to allocate to the service.

Recommended: One worker node is sufficient for many workloads.

  • Specify the number of cores to allocate per node.
    You are constrained by the total number of available cores on the OpenShift® compute nodes.
  • Specify the amount of memory in GB to allocate per node.
    You are constrained by the total amount of memory on the OpenShift compute nodes.
  • You can scale the Data Virtualization service up and down at any time after you provision it. For more information, see Scaling Data Virtualization.
  • In the Storage step, specify the storage classes and persistent volume sizes that you want to use for the service nodes and caching storage. For more information, see Storage requirements.
  • Select the storage class in the Node storage section and specify the size to allocate to your nodes. The default size shown in the Node storage section is 50Gi.
  • The term worker pod in Data Virtualization refers to a c-db2u-dv-db2u-x pod that runs one Data Virtualization worker component, where x starts at 1. You can allocate multiple worker components, which are effectively multiple c-db2u-dv-db2u-x pods, to the Data Virtualization service instance.
  • Select the storage class in the Caching storage section and specify the amount of storage to allocate to your data caches.
    Note: Part of the cache storage space is used to refresh active caches with a periodic refresh schedule. This refresh schedule impacts the storage space available for creating new cache entries.
  • Click Next.
  • Ensure that the summary is correct and click Configure.
    Wait for the service to be provisioned. This might take some time because of the number of components that must start up.
  • Optional: If you want to use Cloud Pak for Data while you wait for the Data Virtualization provisioning process to complete, click Home.

I hope this quick step-by-step guide helps you deploy IBM Data Virtualization on IBM Cloud Pak for Data running on a Red Hat OpenShift cluster.

kapil rajyaguru

Enabling Organizations with IT Transformation & Cloud Migrations | Principal CSM Architect at IBM, Ex-Microsoft, Ex-AWS. My opinions are my own.