Datasets

About datasets

An open dataset is data that is freely available for everyone to use and republish, without restrictions from copyright, patents, or other mechanisms of control. Open datasets are licensed, or have clear terms of use, that permit users to use the data in any way they want, including transforming, combining, and sharing it with others, even commercially.

Red Hat Marketplace offers open datasets so that customers can access the software and data they need in their OpenShift clusters from a single location. This shortens customers’ data discovery time and accelerates the building and updating of their applications in response to the needs of their own customers or to current situations. It also gives them easy access to the data they need to uncover new business insights.


View all datasets

Filter products using the Datasets product category.

Procedure

  • To view all datasets, on the Home page, in the Product Categories section, select Datasets.

Result

All datasets are displayed on the page.

Next steps

To filter your view, select an option from the following facets:

  • Topics
  • Industries
  • Source

Download datasets to a local computer

This task describes how to download a dataset to a local computer.

Prerequisites

  • Purchase or get a free dataset from Red Hat Marketplace.

About this task

When you download a dataset to your local computer, you download the latest available version of the dataset.

Procedure

  1. On the main menu, click Workspace, click Datasets, and then click on the product box.
  2. On the Downloads tab, find the Download files directly section.
  3. To download the dataset, on the row of the file, click Download.

Result

The dataset is downloaded to your local computer through your browser.


Mount to OpenShift - Automated

This task describes how to automatically mount datasets to OpenShift clusters.

Prerequisites

  • Purchase or get a free dataset from Red Hat Marketplace.
  • Register your OpenShift cluster with Red Hat Marketplace. The cluster must run OpenShift major version 4, with any supported minor version.
  • Install the version of the OpenShift Command-line Interface (CLI), commonly known as oc, that matches your version of OpenShift.

About this task

When you mount a dataset to OpenShift, Red Hat Marketplace helps ensure you always have the latest version on your cluster. This process automatically mounts your dataset into all OpenShift pods in the namespaces and clusters that you select.

To install a dataset into multiple namespaces on a single cluster, repeat the procedure below for each namespace.

Namespaces with the prefix kube- or openshift- are protected and cannot be selected in the procedure below. You also cannot select namespaces where the dataset is already mounted.
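
The selection rules above can be expressed as a small check. The following Python sketch is illustrative only: the function name and the already_mounted set are hypothetical and are not part of any Red Hat Marketplace API. It only mirrors the two rules stated in this section.

```python
# Prefixes that this documentation describes as protected.
PROTECTED_PREFIXES = ("kube-", "openshift-")


def is_selectable(namespace, already_mounted):
    """Return True if the namespace could be chosen for an automated mount.

    `already_mounted` is a hypothetical set of namespaces where the
    dataset is already mounted.
    """
    if namespace.startswith(PROTECTED_PREFIXES):
        return False
    return namespace not in already_mounted


namespaces = ["kube-system", "openshift-operators", "analytics", "reporting"]
selectable = [ns for ns in namespaces if is_selectable(ns, {"reporting"})]
print(selectable)  # ['analytics']
```

Only the prefix is checked, so a namespace such as my-kube-app would still count as selectable in this sketch.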

When you unregister your OpenShift cluster after using this automated mount capability, you can still continue to use your datasets as if they were manually mounted, keeping Red Hat Marketplace and CSI drivers operational.

Procedure

  1. On the main menu, click Workspace, click Datasets, and then click on the product box.
  2. On the Downloads tab, click Mount.
  3. The automated method is pre-selected. To mount a dataset to OpenShift, follow the steps listed on the page, and then click Mount.

Result

Red Hat Marketplace Dataset Operator and Red Hat Marketplace CSI Driver are now installed in the following namespaces:

  • openshift-operators
  • openshift-redhat-marketplace

Red Hat Marketplace Dataset custom resource definition (CRD) is installed into the namespace you selected during the procedure.

This enables automatic mounting of datasets purchased from the marketplace into your OpenShift workloads. All newly created OpenShift pods in the namespaces you selected during the mount procedure have access to every dataset file in their /var/redhat-marketplace/datasets/ directory. Existing pods must be restarted before they can access the dataset, because volumes cannot be added to running containers.

For more information, see the wiki for Red Hat Marketplace Dataset Operator and CSI Driver on GitHub.

Next steps

Connect the dataset directory to your application.
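
As a starting point for connecting an application, the following Python sketch lists the files visible under the mount directory named above. It is a minimal example, not an official client. The helper name list_dataset_files is hypothetical, and because this page does not describe the layout below /var/redhat-marketplace/datasets/, the sketch simply walks the directory recursively.

```python
from pathlib import Path

# Mount point named in this documentation.
DATASET_ROOT = Path("/var/redhat-marketplace/datasets")


def list_dataset_files(root=DATASET_ROOT):
    """Return every regular file under `root`, sorted by path.

    Returns an empty list if the directory does not exist, for example
    in a pod that was created before the dataset was mounted.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())


# Print each dataset file with its size; prints nothing if no mount exists.
for path in list_dataset_files():
    print(path, path.stat().st_size, "bytes")
```

An empty result in an existing pod usually means the pod has not been restarted since the dataset was mounted.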


Mount to OpenShift - Manual

This task describes how to manually mount a dataset to OpenShift.

Prerequisites

  • Purchase or get a free dataset from Red Hat Marketplace.
  • Install OpenShift Container Platform major version 4 or later, with any supported minor version.
  • Install the version of the OpenShift Command-line Interface (CLI), commonly known as oc, that matches your version of OpenShift.
  • Log in to your OpenShift cluster as a cluster admin, and navigate to the appropriate namespace.

About this task

When you mount a dataset to OpenShift, Red Hat Marketplace helps ensure you always have the latest version on your cluster.

Note that manually mounted datasets are not managed by the Red Hat Marketplace automated mount feature and must be unmounted manually.

Procedure

  1. On the main menu, click Workspace, click Datasets, and then click on the product box.
  2. On the Downloads tab, click Mount. The Mount to OpenShift page opens.
  3. On the Mount to OpenShift page, click the Manual option to switch to the set of instructions for manually mounting your dataset.
  4. Follow the steps listed on the page.

Result

Red Hat Marketplace Dataset Operator and Red Hat Marketplace CSI Driver are now installed in the following namespaces:

  • openshift-operators
  • openshift-redhat-marketplace

Red Hat Marketplace Dataset custom resource definition (CRD) is installed into the namespace you selected during the procedure.

All newly created OpenShift pods in the namespaces you selected during the procedure have access to every dataset file in their /var/redhat-marketplace/datasets/ directory. Existing pods must be restarted before they can access the dataset, because volumes cannot be added to running containers.

For more information, see the wiki for Red Hat Marketplace Dataset Operator and CSI Driver on GitHub.
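
If the existing pods belong to a Deployment, one possible way to restart them is a rolling restart with oc rollout restart. The Python sketch below only builds and prints that command. It assumes that oc is installed and logged in to the cluster, and the namespace and deployment names are hypothetical placeholders. Pods managed in other ways need a different restart method.

```python
import subprocess


def restart_command(namespace, deployment):
    """Build the oc command that triggers a rolling restart of a Deployment."""
    return ["oc", "rollout", "restart", f"deployment/{deployment}", "-n", namespace]


def restart_deployment(namespace, deployment):
    """Run the restart; raises CalledProcessError if oc reports a failure."""
    subprocess.run(restart_command(namespace, deployment), check=True)


# Hypothetical names; this line only prints the command and runs nothing.
print(" ".join(restart_command("analytics", "report-builder")))
```

Call restart_deployment only after confirming that the names match your own workload.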

Next steps

Connect the dataset directory to your application.


Unmount a dataset from an OpenShift namespace

This task describes how to unmount a dataset from a specific OpenShift cluster and namespace.

Prerequisites

Procedure

  1. On the main menu, click Workspace, click Datasets, and then click on the product box.
  2. On the Downloads tab, review the table that shows where the dataset is mounted.
  3. Find the row containing the cluster and namespace that you want to remove the dataset from, open the overflow menu (three dots), and click Unmount.
  4. In the confirmation modal that appears, click Unmount to confirm the action.

Result

The dataset is unmounted from the specified namespace, and containers running in that namespace no longer have access to the dataset.

Red Hat Marketplace Dataset Operator and Red Hat Marketplace CSI Driver are not removed during this procedure. You must uninstall these components manually using the OpenShift cluster console.


About update frequency for datasets

Update frequency indicates how often a new version of the dataset becomes available. When you mount a dataset to OpenShift, Red Hat Marketplace helps ensure you are always working with the latest available data. When you download a dataset locally, you must download each new version yourself as it becomes available.
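
For local downloads, a simple reminder check can help you decide when to download again. The Python sketch below assumes that the update frequency can be expressed as a number of days, which is a simplification made for illustration. The function name and parameters are hypothetical and do not come from Red Hat Marketplace.

```python
from datetime import date, timedelta


def redownload_due(last_download, update_interval_days, today=None):
    """Return True if at least one update interval has passed since the last download."""
    today = today or date.today()
    return today - last_download >= timedelta(days=update_interval_days)


# A dataset assumed to update weekly, last downloaded eight days earlier.
print(redownload_due(date(2024, 1, 1), 7, today=date(2024, 1, 9)))  # True
```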