This is the first installment in our four-part RKE security blog series. Don’t forget to follow along with the other posts in the series:
Part 1 — Rancher Kubernetes Engine (RKE) Security Best Practices for Cluster Setup
Part 3 — Rancher Kubernetes Engine (RKE) Security Best Practice for Container and Runtime Security
Introduction
The Rancher Ecosystem
The Rancher product ecosystem contains four core products that organizations can leverage for their Kubernetes deployments. They are:
Rancher 2.X is a multi-cluster, multi-cloud Kubernetes management platform. Organizations can use Rancher 2.X to configure and manage Kubernetes clusters in various public clouds, all from a single dashboard. Rancher 2.X provides simple, consistent cluster operations, including provisioning, version management, visibility and diagnostics, monitoring, and alerting. The Rancher dashboard also provides a catalog of relevant services that teams may implement in their environments.
For on-premises environments, Rancher leverages RKE to create and manage Kubernetes clusters. With RKE, organizations can leverage declarative configuration to create repeatable cluster deployments, enable quicker updates, and simplify cluster scaling. This is achieved by RKE’s use of a containerized version of the open-source Kubernetes distribution.
This blog will focus on deploying Kubernetes clusters with RKE for on-premises deployments. Securing various public cloud offerings requires knowledge unique to each environment, and Longhorn and K3s are out of scope for this post.
RKE Cluster Setup
RancherOS
To use RKE, organizations must run a Linux-based operating system that meets RKE’s machine requirements. Minimizing the attack surface of the operating system is a critical first step towards cluster security. Rancher has helped with this step by releasing RancherOS, a minimal Linux distribution with Docker installed. With RancherOS, every process is containerized and managed by Docker. Also, by removing unnecessary libraries and services, RancherOS is significantly smaller than most traditional operating systems. RancherOS ships the most recent stable version of Docker to take advantage of bug fixes and updates.
To get started with RancherOS, users can test the operating system with docker-machine or download and boot the ISO file. Rancher is also transparent about the vulnerabilities of RancherOS and welcomes user feedback to make the product better.
The Cluster Configuration File
The configuration of an RKE cluster is set up using a config file usually named cluster.yml. The config file contains all of the information required to set up a cluster, including:
- Node information
- Audit logging setup
- Private registry location
- Bastion host setup
- Ingress controller setup
- Authorization mode
- And other configurations
Rancher has documented a full list of configuration options outlining the various defaults and customizable options. The ability to write your Kubernetes Cluster configuration in a declarative format is useful for security since cluster configurations can be version-controlled and securely updated.
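As a rough sketch of how the items above map onto the file, a cluster.yml skeleton might look like the following. Every address, path, and hostname is a placeholder, and the keys shown are only a small subset of what RKE accepts, so check them against the configuration reference for your RKE version before relying on them.

```yaml
# Illustrative cluster.yml skeleton; all values are placeholders.
nodes:
  - address: 10.0.0.10              # hypothetical node address
    user: ubuntu
    role:
      - controlplane
      - etcd
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa

# Jump host that RKE connects through to reach the nodes
bastion_host:
  address: 10.0.0.5                 # hypothetical bastion address
  user: ubuntu
  ssh_key_path: /home/user/.ssh/bastion_rsa

# Internal registry used instead of Docker Hub
private_registries:
  - url: registry.example.internal  # hypothetical registry host
    is_default: true

# API server audit logging
services:
  kube-api:
    audit_log:
      enabled: true

authorization:
  mode: rbac

ingress:
  provider: nginx
```

Keeping a file like this in version control means every change to the cluster's security-relevant settings can be reviewed before it is applied.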
High Availability (HA)
Keeping clusters highly available and able to withstand outages is a significant security consideration. Organizations should set up Kubernetes and its supporting components so that there is no single point of failure. To do this, Kubernetes requires three or more nodes for the control plane, including etcd. With the RKE config file, nodes can be specified as the control plane, etcd, or worker nodes.
There are two principal options for an HA setup. The first is a Stacked etcd topology. A stacked topology is where the etcd database is located on the same node as the control plane. This setup has the advantage of using half as many machines as would be needed if etcd ran on a separate node from the other control plane components. However, a stacked cluster runs the risk of failed coupling. If one node goes down, both an etcd member and control plane components are lost, and redundancy is compromised.
The second HA topology is the External etcd topology. With external etcd nodes separate from the control plane components, clusters have greater redundancy from node failure. However, this topology requires twice the number of hosts as the stacked HA topology. A minimum of three hosts for control plane nodes and three hosts for etcd nodes are required for an HA cluster with this topology.
Make sure to keep high availability in mind when designing your RKE clusters. Luckily, RKE makes it easy for users to specify their preferences using a simple declaration in the cluster.yml file. Users can select the components that they want on each node. For example:
nodes:
  - address: 1.1.1.1
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
    port: 2222
Or
nodes:
  - address: 1.1.1.1
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
    port: 2222
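To make the stacked topology concrete, here is a sketch of a nodes section with three stacked control plane and etcd nodes plus two workers. The addresses are placeholders, and the node counts are only one reasonable choice. For an external etcd topology, the controlplane and etcd roles would instead be assigned to separate hosts.

```yaml
# Stacked etcd topology sketch; all addresses are placeholders.
nodes:
  # Three nodes that each run control plane components and an etcd member
  - address: 10.0.0.11
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.12
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.13
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
  # Worker nodes that run application workloads only
  - address: 10.0.0.21
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
  - address: 10.0.0.22
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
```

With three etcd members, the cluster keeps quorum if any single stacked node fails.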
Virtual Private Network
Strict network isolation is critical to prevent unauthorized access to cluster API endpoints, nodes, or pods. Ensure that your cluster machines are hosted on a private network to minimize the attack surface. When creating multiple RKE clusters, give each cluster its own private network. This allows for more fine-grained control of network access and isolates workloads more effectively.
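When a node has more than one network interface, RKE's node configuration includes an internal_address field that sets the address used for inter-host communication on a private network. A minimal sketch, assuming placeholder addresses on a hypothetical 10.0.0.0/24 private subnet:

```yaml
# Sketch only: both addresses are placeholders.
nodes:
  - address: 203.0.113.10       # address RKE uses to reach the node over SSH
    internal_address: 10.0.0.10 # private address used for inter-host traffic
    user: ubuntu
    role:
      - worker
    ssh_key_path: /home/user/.ssh/id_rsa
```

If internal_address is omitted, RKE falls back to address for traffic between hosts.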
Secure OpenSSH Access
RKE uses the SSH protocol to connect to nodes during the initial bootstrapping process for a cluster. Exploiting SSH access to nodes is a common attack vector. Rancher documents the host requirements necessary for RKE to function; however, there are extra steps users can take to secure SSH access.
- Use Public/Private keys for authentication
- Avoid using passwords or disable them completely
- Configure a bastion host to run RKE bootstrapping commands
Having host access is a privilege that organizations should only grant to specific users.
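The bastion host recommendation can be expressed directly in cluster.yml. The fragment below is a sketch with placeholder addresses, users, and key paths. It assumes key-based authentication is already set up on both the bastion and the nodes, and that the nodes are reachable only from the bastion.

```yaml
# Sketch only: addresses, users, and key paths are placeholders.
bastion_host:
  address: 203.0.113.5          # hypothetical bastion address
  user: ubuntu
  port: 22
  ssh_key_path: /home/user/.ssh/bastion_rsa

nodes:
  - address: 10.0.0.11          # private address, reached through the bastion
    user: ubuntu
    role:
      - controlplane
      - etcd
    ssh_key_path: /home/user/.ssh/id_rsa
```

Using separate keys for the bastion and the nodes means that compromising one key does not grant access to both.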
Securing etcd
The etcd database has specific requirements to guarantee the proper permissions for files and directories during installation time. Users can set the etcd file permission directly from the configuration file. A valid UserID (UID) or GroupID (GID) is used to prevent accidental or malicious user access to the etcd files. For example:
services:
  etcd:
    gid: 52034
    uid: 52034
A second part of securing the etcd database is taking snapshots of the database in case of an outage. Users can configure their etcd backup configuration directly in the cluster.yml file.
services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 2
      retention: 6
    gid: 52034
    uid: 52034
The example here shows one way to configure backups: a snapshot is taken every two hours, and the six most recent snapshots are retained. These settings should vary based on the needs of an organization. To restore a cluster from a previous backup, users can run a single command referencing the original config file and the desired snapshot.
Container Runtime
RKE currently requires Docker as its container runtime and works with any modern Docker version; K3s, by contrast, supports both Docker and containerd. Docker and containerd are both container runtimes. Docker includes extra functionality, such as a networking suite, whereas containerd is designed for simplicity and focuses on managing the container lifecycle.
Docker’s networking capabilities give users more access to the host network than would be possible with containerd. Hopefully, Rancher will expand RKE’s support to other container runtimes such as CRI-O and containerd. In the meantime, make sure to understand the various ways of securing Docker.
Private Registries
All of Rancher’s system images are hosted online on Docker Hub. This configuration creates a dependency on Docker Hub for access to images when building a cluster and requires your cluster nodes to reach public networks. A private registry avoids the outages or throttling associated with a public registry and removes the external dependency. Docker supports private registries, so containers can be built, run, and managed entirely from the safety of an internal network.
Rancher provides instructions for setting up a private registry through the cluster.yml file.
private_registries:
  - url: example_registry.com
    user: username
    password: password
    is_default: true
The private registry should be set as the default registry to avoid unnecessary calls to Docker Hub and so that users do not have to specify the image location in their YAML files. To create a production-ready private registry server, follow Docker’s configuration guide.