Red Hat OpenShift 4 on Exoscale

VSHN Supported Features and Configuration

While this page describes the product features, a more detailed technical insight into how we operate OpenShift 4 can be found at openshift.docs.vshn.ch.

Red Hat provides a document with OpenShift Container Platform 4.x Tested Integrations (for x86_64) which also applies to this product.

Supported by default

These features and configurations are available out of the box and are installed and configured by default.


Authentication

By default, authentication is via VSHN Account (LDAP). User management is done via the VSHN Portal (self-service).
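For reference, identity providers in OpenShift 4 are configured through the OAuth cluster resource. The following is a minimal sketch of an LDAP identity provider, not the actual VSHN configuration; all hostnames, DNs and secret names are hypothetical placeholders:

    apiVersion: config.openshift.io/v1
    kind: OAuth
    metadata:
      name: cluster
    spec:
      identityProviders:
      - name: vshn-ldap              # display name on the login page (placeholder)
        type: LDAP
        mappingMethod: claim
        ldap:
          url: "ldaps://ldap.example.com/ou=users,dc=example,dc=com?uid"  # placeholder
          bindDN: "cn=reader,dc=example,dc=com"                           # placeholder
          bindPassword:
            name: ldap-bind-password  # secret in the openshift-config namespace
          attributes:
            id: ["dn"]
            preferredUsername: ["uid"]
            name: ["cn"]
            email: ["mail"]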

Network Policy

Network policies are supported by default, and a set of default policies is applied: allow-from-same-namespace, allow-from-openshift-ingress and allow-from-openshift-monitoring.
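For illustration, these default policies are plain Kubernetes NetworkPolicy objects. The following sketch shows what allow-from-same-namespace looks like, based on the upstream OpenShift documentation examples; the exact manifests on the cluster may differ:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-same-namespace
    spec:
      podSelector: {}       # applies to every pod in the namespace
      ingress:
      - from:
        - podSelector: {}   # only pods from the same namespace may connect

The allow-from-openshift-ingress policy is analogous, but selects the ingress namespace via a namespaceSelector on the network.openshift.io/policy-group: ingress label.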

Integrated registry

The integrated registry is installed and enabled by default. It uses the cloud provider's object storage to store images.
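To illustrate how the registry is backed by object storage, here is a minimal sketch of the image registry operator configuration with an S3-compatible backend; bucket, region and endpoint are hypothetical examples:

    apiVersion: imageregistry.operator.openshift.io/v1
    kind: Config
    metadata:
      name: cluster
    spec:
      managementState: Managed
      storage:
        s3:
          bucket: example-registry-bucket              # hypothetical bucket name
          region: ch-gva-2                             # example region
          regionEndpoint: https://sos-ch-gva-2.exo.io  # S3-compatible endpoint

The credentials for the bucket are provided separately via a secret read by the registry operator.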

Machine Config (Compute nodes)

A set of default machine configurations is available (see infrastructure specifics).
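Such node customizations are expressed as MachineConfig resources handled by the Machine Config Operator. A minimal sketch follows; the file path and contents are purely illustrative, and the Ignition version depends on the OpenShift release:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: 99-worker-example                           # illustrative name
      labels:
        machineconfiguration.openshift.io/role: worker  # target the worker pool
    spec:
      config:
        ignition:
          version: 3.1.0                                # depends on the OpenShift release
        storage:
          files:
          - path: /etc/example.conf                     # illustrative file only
            mode: 0644
            contents:
              source: data:,example-setting             # inline file content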

Operator Hub

The Operator Hub is enabled by default; however, no support is given for any Operators installed via the Operator Hub.

Image Builds

Building images on the platform is supported and enabled by default. Please note that using the Docker build strategy isn’t secure as it exposes the host system to root privilege escalation.

Cluster Monitoring

Cluster monitoring is enabled and used to ensure and assess cluster stability. Alerts are sent to VSHN and handled accordingly. Alert rules are tweaked regularly by VSHN.

Cluster metrics are further used to monitor the resource usage and resource availability on the whole cluster.

Cluster limits

We adhere to the official numbers which are documented under Planning your environment according to object maximums. However, the following limits are set by VSHN:

Infrastructure Nodes

Each cluster has at least 3 nodes dedicated to OpenShift infrastructure components such as the router, registry, web console and monitoring components. No user workload is allowed on these nodes.

OpenShift Cluster Maintenance

OpenShift and node updates are applied continuously when they’re available. See also Version and Upgrade Policy.

OpenShift Cluster Backup

A full backup of the etcd database is made every 4 hours. This includes a dump of all objects in JSON format, so single objects can be restored on request. The backup data is encrypted before it is stored in an object storage backend, usually on the same cloud the cluster runs on. K8up is used as the backup operator, with Restic as the backup backend.

Persistent storage volumes are not automatically backed up; users of persistent volumes are responsible for backing them up themselves. K8up is available on the cluster to help with that task. We're also happy to help, just let us know.
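As a sketch of how a namespace could schedule its own persistent volume backups with K8up (the API version depends on the installed K8up release, and all bucket and secret names are illustrative):

    apiVersion: k8up.io/v1
    kind: Schedule
    metadata:
      name: pv-backup
    spec:
      backend:
        repoPasswordSecretRef:
          name: backup-repo              # secret with the Restic repository password
          key: password
        s3:
          endpoint: https://sos-ch-gva-2.exo.io
          bucket: example-backups        # hypothetical bucket
          accessKeyIDSecretRef:
            name: backup-credentials
            key: username
          secretAccessKeySecretRef:
            name: backup-credentials
            key: password
      backup:
        schedule: '0 3 * * *'            # daily at 03:00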

Supported on request

These features or configuration adjustments must be specifically requested, and some restrictions apply. Activating and configuring these features incurs additional engineering costs, and operating them can incur further engineering costs (although no fixed additional recurring costs apply).


Authentication

Authentication can be configured to use a custom provider in addition to the default VSHN Account. See Supported identity providers for a list of available providers.

Cluster-wide HTTP or HTTPS proxy

Configuring OpenShift to use a cluster-wide HTTP or HTTPS proxy is possible, but incurs additional individual engineering effort. The documentation states: The cluster-wide proxy is only supported if you used a user-provisioned infrastructure installation or provide your own networking, such as a virtual private cloud or virtual network, for a supported provider.
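For reference, the cluster-wide proxy is configured through the Proxy cluster resource; the proxy URLs below are placeholders:

    apiVersion: config.openshift.io/v1
    kind: Proxy
    metadata:
      name: cluster
    spec:
      httpProxy: http://proxy.example.com:3128   # placeholder
      httpsProxy: http://proxy.example.com:3128  # placeholder
      noProxy: example.com                       # extra hosts that bypass the proxy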

OpenShift Central Logging

The integrated central logging based on Kibana and Elasticsearch is installed and configured on request. Logging uses a significant amount of resources; for that reason, the infrastructure nodes need to be made bigger.

Cluster Admin

For private clusters, "Cluster Admin" can be granted. This implies "with great power comes great responsibility". A sign-off is needed.

Disabling of Red Hat remote health monitoring (Telemetry)

OpenShift 4, by default, continuously sends data to Red Hat, see about remote health monitoring for details. This is enabled by default, but can be disabled on request. The exact metrics sent to Red Hat are documented in data-collection.md.

Registry configuration

Exposing the registry via the ingress controller can be configured.
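Enabling the default route on the image registry operator is a small configuration change; a sketch:

    apiVersion: imageregistry.operator.openshift.io/v1
    kind: Config
    metadata:
      name: cluster
    spec:
      defaultRoute: true   # expose the registry via the ingress controller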

Custom Machine Sets

Custom MachineSets can be defined to customize compute node availability.
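The general shape of a MachineSet is sketched below; the provider-specific part is omitted because it depends on the underlying cloud (and requires Machine API support there), and all names are illustrative:

    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    metadata:
      name: app-nodes           # illustrative name
      namespace: openshift-machine-api
    spec:
      replicas: 3
      selector:
        matchLabels:
          machine.openshift.io/cluster-api-machineset: app-nodes
      template:
        metadata:
          labels:
            machine.openshift.io/cluster-api-machineset: app-nodes
        spec:
          providerSpec: {}    # cloud-specific machine settings go here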

Cluster Autoscaling

Autoscaling will only be enabled on request and will be configured according to the defined needs.
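On clouds with Machine API support, autoscaling is driven by a ClusterAutoscaler resource (plus MachineAutoscaler resources per MachineSet); a minimal sketch with illustrative limits:

    apiVersion: autoscaling.openshift.io/v1
    kind: ClusterAutoscaler
    metadata:
      name: default             # the resource must be named "default"
    spec:
      resourceLimits:
        maxNodesTotal: 10       # illustrative upper bound
      scaleDown:
        enabled: true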

OpenShift Pipelines

Pipelines are in Technology Preview and therefore only available on request. VSHN prefers running pipelines outside of the OpenShift cluster, for example with GitLab CI or GitHub Actions.

Egress IP

The egress IP feature depends on the possibilities of the underlying networking infrastructure and therefore is only supported where the infrastructure allows it.

Audit logging

While audit logging is enabled by default on each OpenShift 4 control plane node (see Viewing node audit logs), audit logs are not forwarded to or stored outside the cluster, and there is no availability guarantee by default. If audit logs need special treatment, this needs to be requested.

Unsupported

These features or configuration adjustments are not supported by VSHN. They can still be activated or changed, but they are neither monitored, backed up nor maintained. No guarantees are given; use them at your own risk.


Metering Operator

The OpenShift metering operator is not supported.

The metering component consists of some very complex services with a high resource demand. We do not have the required expertise to run these services.

Upgrade channels

We only support stable upgrade channels. Changing the channel isn’t supported or encouraged.

The stable upgrade channel offers the most tested upgrades, which we see as a cornerstone of a stable service offering. Other channels could be used on non-production clusters. Specifically, the fast channel is used for VSHN-internal lab clusters for our own update QA.

Network configuration

OVN-Kubernetes is the default Container Network Interface (CNI) network provider. OpenShift SDN in "NetworkPolicy" isolation mode is still supported for existing clusters. Changing to a different isolation mode isn’t supported.

Networking is a complex component, and OVN-Kubernetes brings full integration and support by Red Hat. The most common use cases are handled by this configuration.

Red Hat OpenShift Service Mesh

Support for Red Hat OpenShift Service Mesh is not available from VSHN (yet).

This is mainly due to a lack of experience running a service mesh.

Jaeger

Support for Jaeger is not available from VSHN (yet).

This is mainly due to a lack of experience running Jaeger.

Container-native virtualization

No support is available for container-native virtualization.

This is mainly due to a lack of experience running container-native virtualization, and it is currently in Technology Preview.

OpenShift Serverless

Support for OpenShift Serverless is not available from VSHN (yet).

This is mainly due to a lack of experience running OpenShift Serverless, and it is currently in Technology Preview.

Operator Lifecycle Manager (OLM)

The Operator Lifecycle Manager is installed and fully functional on the cluster, but we don't guarantee full functionality of Operators installed via OLM by end users.

There are many Operators available via OperatorHub and we are not able to provide support for any of them.

Airgapped (disconnected) environments

Installing and running OpenShift 4 in an airgapped environment, meaning that the cluster has no Internet access, is currently not supported by VSHN.

The cluster needs access to specific endpoints which are documented in the official OpenShift documentation and in the VSHN Knowledgebase. Supporting airgapped setups is on our long-term roadmap.

Bring-Your-Own-Subscription

OpenShift clusters managed by VSHN are bound to VSHN's CCSP subscription with Red Hat.

Attaching an OpenShift cluster to another subscription would bring a significant operational support burden.

Disk Encryption

Encryption of local disks is currently not supported. If encryption at rest is needed it’s up to the storage provider (CSI) to support that.

The needed infrastructure (e.g. Tang server) to provide this feature is not available yet.

Features marked as Technology Preview by Red Hat are unsupported by VSHN as well. A list of Technology Preview features is available in the release notes; for OpenShift 4.5, this list can be found in the OpenShift Container Platform release notes.

Still interested in one (or more) of these unsupported options? Get in contact with sales@vshn.ch and we'll figure out together what we can offer.

Version and Upgrade Policy

The official Red Hat OpenShift Container Platform Life Cycle Policy applies and has implications on the supported versions.

Only the latest available OpenShift 4 release is supported. Installations must be upgraded to the next minor release within three months after a new release becomes available, or at the latest when the next minor release is available.

Errata updates are installed as they are released and include updates to OpenShift itself as well as the Red Hat CoreOS nodes. By default the stable upgrade channel is used.

Support Data Sharing

To get support from Red Hat, we usually have to share status information with them. This is done using the oc adm must-gather command, which collects support information without sensitive data such as secrets. More information about this tool is documented under Gathering data about your cluster.

Cluster Resource Handling and Availability

By the nature of a clustered system like Kubernetes, some constraints apply to how resources are available to users of the platform and how to work with them:

  • To have enough room to handle failing nodes and to ease maintenance processes, it's important to adhere to at least n+1 node availability and to have at least three worker nodes in the cluster. For example, on a three-node cluster only the resources of two-thirds of the nodes may be used.

  • Some resources on each node and in the whole cluster are always reserved for system services.

    • Cluster level: there need to be enough resources available to run the control plane and other system services like the registry or monitoring components; that's why there are dedicated nodes in the cluster to run this workload.

    • Node level: an amount of resources is reserved on each node to allow operating system services to function properly.

Exoscale Specific


The official documentation from Red Hat applies: Installing a cluster on bare metal. As Exoscale is not an official Red Hat OpenShift installer supported provider, the so-called UPI (User-provisioned infrastructure) installation mode applies.

Each cluster is installed in its own Exoscale account. Billing is usually done via VSHN.

The default installation on Exoscale uses public IP addresses for all VMs and restricts access to the cluster with Exoscale’s Security Groups.

Please contact us to discuss your particular requirements if the network architecture outlined above does not fit your needs.

Default Configuration / Minimum Requirements

This table shows the default configuration which is applied when nothing else is specified and defines the minimum requirements.


Load Balancer

2 load balancer nodes, all in one zone, separated via anti-affinity rules.

  • Machine type: Medium

  • Disk: 20 GB SSD

Control Plane

3 control plane nodes, all in one zone, separated via anti-affinity rules.

  • Machine type: Extra-Large

  • Disk: 120 GB SSD

Infrastructure Nodes

3 nodes, all in one zone, separated via anti-affinity rules.

  • Machine type: Extra-Large

  • Disk: 120 GB SSD

When integrated logging is requested, these nodes are upgraded to at least Huge.

Compute Nodes

3 nodes, all in one zone, separated via anti-affinity rules.

  • Machine type: Extra-Large

  • Disk: 120 GB SSD

Storage Nodes

3 nodes, all in one zone, separated via anti-affinity rules.

  • Machine type: Extra-Large (CPU Optimized)

  • Disk: 300 GB SSD

These storage nodes are used for system storage (Logging and Metrics) and aren't available to user workloads. For storage to be consumed by user applications, see below.

The storage per VM is distributed as follows: 120 GB for the operating system and 180 GB as backing storage for the storage cluster. This gives roughly 175 GB of usable space in the storage cluster. Assuming that Prometheus and Alertmanager consume roughly 110 GB in total, usage stays comfortably below the 85% capacity threshold at which the cluster goes read-only, ensuring that we don't run into issues with Ceph.

When integrated logging is requested, 4 nodes, each with 800 GB storage, are needed.

For Metrics and Logging together, we estimate that roughly 750 GB of storage will be consumed in total. Adding 20% to that gives a requirement of 900 GB of usable storage for the storage cluster. With a replication factor of 3, that amount of usable storage requires 2.7 TB of backing disk. Since we're limited to at most 800 GB of disk per node on Exoscale, and 120 GB need to be set aside for the OS, we have at most 680 GB per node for the storage cluster. 2.7 TB / 680 GB = 3.97, so we need 4 nodes with 800 GB disk each for Metrics + Logging.

Persistent Storage

Storage is only available with APPUiO Managed Storage Cluster.

Cloud Region

Defaults to CH-GVA-2 (Geneva, Switzerland)

VSHN supports all available zones of Exoscale.

For a detailed description about the machine types, have a look at the official Exoscale documentation.

Limitations

The following limitations are known on this infrastructure:

  • No autoscaling for worker nodes since there’s no support for Exoscale in OpenShift itself.

  • No support for services of type LoadBalancer. This needs to be engineered case by case with the ExternalIP feature of OpenShift, as sketched below. Note that the Exoscale Cloud Controller Manager is neither supported nor tested on OpenShift.
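A service using the ExternalIP feature looks roughly like the following sketch; the IP address is a documentation placeholder and would have to be an address that is routed to the cluster and permitted by the cluster's ExternalIP policy:

    apiVersion: v1
    kind: Service
    metadata:
      name: example-app          # illustrative
    spec:
      selector:
        app: example-app
      ports:
      - port: 80
        targetPort: 8080
      externalIPs:
      - 203.0.113.10             # placeholder (TEST-NET-3 documentation range)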

Cloud Costs

If you want to calculate what the Exoscale resource costs look like, the minimal set of resources consists of:

  • 2 x Medium VMs

  • 9 x Extra-Large VMs

  • 1120 GB SSD storage

  • 2 x Elastic IP

  • S3 object storage for the OpenShift registry

A good price calculator can be found under Exoscale Pricing Calculator.