Rancher Kubernetes Engine (RKE) Security Best Practices for Cluster Setup - Part 1 of 4

This is the first installment in our four-part RKE security blog series. Don’t forget to follow along with the upcoming posts in the series:

Part 1 — Rancher Kubernetes Engine (RKE) Security Best Practices for Cluster Setup

Part 2 — Rancher Kubernetes Engine (RKE) Security Best Practices for Authentication, Authorization, and Cluster Access

Part 3 — Rancher Kubernetes Engine (RKE) Security Best Practices for Container and Runtime Security

Part 4 — Rancher Kubernetes Engine (RKE) Security Best Practices for Cluster Maintenance and Network Security

Introduction

The Rancher Ecosystem

The Rancher product ecosystem contains four core products that organizations can use for their Kubernetes deployments: Rancher 2.X, RKE, Longhorn, and K3s. The two most relevant to this post are:

Rancher 2.X is a multi-cluster, multi-cloud Kubernetes management platform. Organizations can use Rancher 2.X to configure and manage Kubernetes clusters in various public clouds, all from a single dashboard. Rancher 2.X provides simple, consistent cluster operations, including provisioning, version management, visibility and diagnostics, monitoring, and alerting. The Rancher dashboard also provides a catalog of relevant services that teams may implement in their environments.

For on-premises environments, Rancher uses RKE to create and manage Kubernetes clusters. With RKE, organizations can use declarative configuration to create repeatable cluster deployments, apply updates more quickly, and simplify cluster scaling. RKE achieves this by running a containerized version of the open-source Kubernetes distribution.

This blog will focus on deploying Kubernetes clusters with RKE for on-premises deployments. Securing various public cloud offerings requires knowledge unique to each environment, and Longhorn and K3s are out of scope for this post.

RKE Cluster Setup

RancherOS

To use RKE, organizations must run a Linux-based operating system that meets the documented machine requirements. Minimizing the attack surface of the operating system is a critical first step towards cluster security. Rancher has helped with this first step by releasing RancherOS, a minimal Linux distribution with Docker installed. With RancherOS, every process is containerized and managed by Docker. By removing unnecessary libraries and services, RancherOS is also significantly smaller than most traditional operating systems. RancherOS ships the most recent stable version of Docker to take advantage of bug fixes and updates.

To get started with RancherOS, users can test the operating system with docker-machine or download and boot the ISO file. Rancher is also transparent about the vulnerabilities of RancherOS and welcomes user feedback to make the product better.

The Cluster Configuration File

The configuration of an RKE cluster is set up using a config file usually named cluster.yml. The config file contains all of the information required to set up a cluster, including:

Node information
Audit logging setup
Private registry location
Bastion host setup
Ingress controller setup
Authorization mode
And other configurations

Rancher has documented a full list of configuration options outlining the various defaults and customizable options. The ability to write your Kubernetes cluster configuration in a declarative format is useful for security because cluster configurations can be version-controlled, reviewed, and securely updated.
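To make the list above more concrete, here is a minimal sketch of how a few of those settings might appear together in cluster.yml. The address, user, and key path are placeholders rather than recommended values, and the key names should be checked against the documentation for the RKE version in use.

```yaml
# Illustrative cluster.yml skeleton; all values are placeholders.
nodes:
  - address: 10.0.0.10            # placeholder node address
    user: ubuntu                  # SSH user that RKE connects as
    role:
      - controlplane
      - etcd
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa

# Authorization mode for the cluster
authorization:
  mode: rbac

# Audit logging on the Kubernetes API server
services:
  kube-api:
    audit_log:
      enabled: true

# Ingress controller setup
ingress:
  provider: nginx
```

Keeping a file like this in version control means every change to cluster security settings can go through the same review process as application code.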

High Availability (HA)

Keeping clusters highly available and able to withstand outages is a significant security consideration. Organizations should set up Kubernetes and its supporting components so that there is no single point of failure. To do this, Kubernetes requires three or more nodes for the control plane, including etcd. In the RKE config file, each node can be assigned the controlplane, etcd, or worker role, or a combination of them.

There are two principal options for an HA setup. The first is the stacked etcd topology, in which each etcd member runs on the same node as the control plane components. This setup has the advantage of using half as many machines as would be needed if etcd ran on separate nodes. However, a stacked cluster runs the risk of coupled failure: if one node goes down, both an etcd member and a control plane instance are lost, and redundancy is compromised.

The second HA topology is the external etcd topology. With etcd nodes separate from the control plane components, losing a node takes out either an etcd member or a control plane instance, but not both, so clusters have greater redundancy against node failure. However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes is required for an HA cluster with this topology.

Make sure to keep high availability in mind when designing your RKE clusters. Luckily, RKE makes it easy for users to specify their preferences using a simple declaration in the cluster.yml file. Users can select the components that they want on each node. For example:

nodes:
  # Control plane and etcd share this node (stacked topology)
  - address: 1.1.1.1
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
    port: 2222
  # Worker node; 2.2.2.2 is a placeholder address
  - address: 2.2.2.2
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
    port: 2222
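For a highly available stacked layout, the same node syntax can be repeated so that three nodes each carry the controlplane and etcd roles. The following is only a sketch: the addresses are placeholders, and the number of worker nodes is an assumption that depends on the workload.

```yaml
# Sketch of a stacked HA layout; addresses are placeholders.
nodes:
  # Three stacked control plane / etcd nodes, so etcd keeps quorum
  # if any single node fails.
  - address: 10.0.0.11
    user: ubuntu
    role: [controlplane, etcd]
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.12
    user: ubuntu
    role: [controlplane, etcd]
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.13
    user: ubuntu
    role: [controlplane, etcd]
    ssh_key_path: /home/user/.ssh/id_rsa
  # Workers are kept separate so application pods do not compete
  # with the control plane for resources.
  - address: 10.0.0.21
    user: ubuntu
    role: [worker]
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.22
    user: ubuntu
    role: [worker]
    ssh_key_path: /home/user/.ssh/id_rsa
```

For the external etcd topology, the first three entries would instead be split into three nodes with only the etcd role and three with only the controlplane role.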

Virtual Private Network

Strict network isolation is critical to prevent unauthorized access to cluster API endpoints, nodes, or pods. Ensure that your cluster machines are hosted on a private network to minimize the attack surface. When creating multiple RKE clusters, give each cluster its own private network. This allows for more fine-grained control of network access and isolates workloads more effectively.
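When nodes have more than one network interface, RKE's node configuration accepts an internal_address field alongside address, which can help keep traffic between hosts on the private network. The snippet below is a sketch under that assumption; both addresses are placeholders for a hypothetical management network and cluster network.

```yaml
nodes:
  - address: 10.0.0.10            # placeholder: address RKE uses to reach the node over SSH
    internal_address: 10.0.1.10   # placeholder: private address for inter-host cluster traffic
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
```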

Secure OpenSSH Access

RKE uses the SSH protocol to connect to nodes during the initial bootstrapping of a cluster. Exploiting SSH access to nodes is a common attack vector. Rancher documents the host requirements necessary for RKE to function; however, there are extra steps users can take to secure SSH access.

Having host access is a privilege that organizations should only grant to specific users.
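One way to narrow SSH exposure is to route RKE's connections through a bastion host, which cluster.yml supports through the bastion_host section mentioned earlier. The sketch below assumes a hypothetical bastion at a placeholder address with its own key; it is an illustration of the configuration shape rather than a complete hardening guide.

```yaml
# Sketch: reach cluster nodes only through a bastion (jump) host.
bastion_host:
  address: 10.0.0.5                             # placeholder bastion address
  user: ubuntu                                  # SSH user on the bastion
  port: 22
  ssh_key_path: /home/user/.ssh/bastion_rsa     # hypothetical key used only for the bastion

# Optionally use keys held by a local SSH agent instead of key files on disk.
ssh_agent_auth: true
```

With this layout, firewall rules on the nodes can restrict inbound SSH to the bastion's address alone.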

Securing etcd

The etcd database has specific requirements for the ownership and permissions of its files and directories at installation time. Users can set the user and group that own the etcd files directly in the configuration file. A valid user ID (UID) and group ID (GID) are used to prevent accidental or malicious access to the etcd files by other users. For example:

services:
  etcd:
    gid: 52034
    uid: 52034

A second part of securing the etcd database is taking snapshots of the database in case of an outage. Users can configure their etcd backup configuration directly in the cluster.yml file.

services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 2
      retention: 6
    gid: 52034
    uid: 52034

The example here outlines how backups can be configured: a snapshot is taken every two hours, and the six most recent snapshots are retained. These settings should vary based on the needs of an organization. To restore a cluster from a previous backup, users can run a single command, rke etcd snapshot-restore, referencing the original config file and the desired snapshot.

Container Runtime

RKE currently relies on Docker as its container runtime, whereas K3s supports both Docker and containerd. Docker and containerd are both container runtimes; Docker includes extra functionality, such as a networking suite, whereas containerd is designed for simplicity and focuses on managing the container lifecycle.

Docker’s networking capabilities allow users to gain more access to the host network than would be possible with containerd. Hopefully, Rancher will expand RKE's support to other container runtimes such as CRI-O and containerd. In the meantime, make sure to understand the various ways of securing Docker.

Private Registries

All of Rancher’s system images are hosted on Docker Hub. By default, this means that building a cluster depends on Docker Hub for access to images and requires your cluster to reach public networks. A private registry removes that external dependency and avoids the outages or throttling associated with a public registry. Docker supports private registries so that containers can be built, run, and managed entirely from the safety of an internal network.

Rancher provides instructions for setting up a private registry through the cluster.yml file.

private_registries:
  - url: example_registry.com
    user: username
    password: password
    is_default: true

The private registry should be set as the default registry to avoid unnecessary calls to Docker Hub and to spare users from specifying the image location in their YAML files. To create a production-ready private registry server, follow Docker’s configuration guide.