Kubernetes, generally abbreviated as K8s, is an open-source orchestration tool originally developed by Google (and now maintained by the Cloud Native Computing Foundation) for managing containerized applications in different environments. In this article, you will learn about Kubernetes architecture, its different components, and concepts. If you’re interested in becoming a Certified Kubernetes Administrator, you can enroll in our Kubernetes Course, where you will gain full knowledge of Kubernetes to automate deployment, scaling, and management of applications.
If you’re familiar with Docker, you might have heard about Docker Swarm, an orchestration tool provided by Docker. However, industry surveys consistently show that the large majority of organizations choose Kubernetes over Docker Swarm. If you want to learn Kubernetes and Docker, you can enroll in our Docker and Kubernetes Certification, where you will learn to build, test, and deploy Docker applications with Kubernetes. But why do you even need Kubernetes? When you deploy your containerized applications in a production environment, you will need to manage many containers. If any container goes down, a replacement container must start immediately to ensure near-zero downtime for your application. But are you going to do that manually? Obviously not! Kubernetes automates the deployment, scaling, and management of your applications.
What is Kubernetes Architecture?
Kubernetes follows a client-server architecture. A node is a machine, physical or virtual, on which Kubernetes is installed; it is a worker machine where Kubernetes launches containers. But what if the node on which your application is running fails? The application will go down. So, you need more than one node. A cluster is a set of nodes grouped together, so even if one node crashes, your application is still accessible from the other nodes. Having more than one node also helps share the load. Now, you need something to manage and monitor your cluster. That’s where the control plane, or the master node, comes in. It is another node with Kubernetes installed and configured as a master. The master node watches over the worker nodes in the cluster and is responsible for the actual orchestration of the containers on the worker nodes. In short, a Kubernetes cluster consists of at least one worker node that runs pods and a control plane that manages the worker nodes.
Before you jump into the different components of Kubernetes, you should be aware of the different terminologies you will come across later.
- Pods: These are the smallest deployable units in Kubernetes. A pod encapsulates one or more closely related containers that can be treated as a single application. Although you can put more than one container in a pod, it is a good practice to house one container per pod. Each time a pod is created, it is assigned a new IP address.
- Deployment: Kubernetes deployment helps you achieve the desired state of a cluster in a declarative way. You just need to specify the desired state in a YAML file. Then the Deployment controller gradually updates the current state to the desired state by creating or destroying pods as required.
- Services: Pods are ephemeral in nature, i.e., they don’t last long. They are created and destroyed to match the desired state of the cluster. Now, there are conditions such as Deployments, where the set of pods running at one instance of time is different from the set of pods running at another instance of time. So, if some pods depend on other pods, it would be difficult for them to keep track of the IP addresses. That’s where Kubernetes Services jump in. Service defines a logical set of pods and a policy to access them. Now, you need not manage the IP addresses, because the Service can take care of that.
- Volume: Kubernetes supports many types of volumes, including ephemeral and persistent. Ephemeral volumes exist only as long as the pod exists, whereas persistent volumes store data beyond the lifetime of the pods. A pod can use any number of volume types at a time.
- Namespaces: Namespaces in Kubernetes offer a way to isolate groups of resources within a single cluster. They are designed to be used in settings where numerous users are dispersed across different teams or projects.
- ConfigMaps: ConfigMaps are API objects that hold configuration information in key-value pairs. Their main job is to keep the container image and its configuration apart. A ConfigMap can represent either a complete configuration file or specific properties. In Kubernetes, you should keep configuration options separate from the application code to keep images small and portable. ConfigMaps allow you to configure pods differently depending on the environment in which they are running.
- Secrets: A Kubernetes Secret is an object used to store private information like usernames, passwords, tokens, and keys. When users want to keep confidential information accessible to a pod, they can create Secrets themselves or have the system create them during the installation of an app. Passwords, tokens, or keys could be unintentionally exposed during Kubernetes operations if they were simply included in a pod specification or container image. The Secret's primary purpose is therefore to keep such information from being accidentally disclosed while still keeping it accessible wherever it is needed.
- ReplicaSets: A ReplicaSet is a Kubernetes controller that ensures the specified number of pod replicas is running at all times.
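A minimal sketch tying several of these terms together: the Deployment below declares a desired state of three pod replicas, and the Service gives those pods one stable address even as individual pod IPs come and go. All names, labels, and image tags here (web, app: web, nginx:1.25) are illustrative.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web              # illustrative name
spec:
  replicas: 3            # desired state: three pod replicas
  selector:
    matchLabels:
      app: web
  template:              # pod template used to create the replicas
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  selector:
    app: web             # targets every pod carrying this label
  ports:
  - port: 80
    targetPort: 80
```

Saving both documents to one file and running `kubectl apply -f web.yaml` hands the desired state to the cluster; the Deployment controller then creates or destroys pods as required.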
As discussed earlier, a Kubernetes cluster consists of two main components - Control Plane and Worker Nodes. Now, these components themselves consist of different components.
The image below depicts the different components of a Kubernetes cluster.
Now, let us discuss these components in detail.
Control Plane Components
The control plane is responsible for managing the cluster’s state. It is the control plane that makes important decisions about the cluster and responds to the cluster events, such as starting a new pod when required.
The major components that comprise the control plane are - the API server, the scheduler, the controller manager, the etcd, and an optional cloud controller manager.
- kube-apiserver: It is the frontend of the control plane and exposes the Kubernetes API. It validates all the internal and external requests and processes them. When you use the kubectl command-line interface, you basically interact with the kube-apiserver through REST calls. The kube-apiserver scales horizontally by deploying more instances.
- etcd: etcd is a consistent, distributed key-value store and the single source of truth about the status of the cluster. It is fault-tolerant and holds the configuration data and information about the state of the cluster.
- scheduler: It is the responsibility of the kube-scheduler to schedule the pods on the different nodes considering the resource utilization and availability. It makes sure that none of the nodes in the cluster is overloaded. The scheduler knows the total resources available and thus schedules the pods on the best fit node.
- controller-manager: The controller-manager is a group of all the controller processes that keep running in the background to control and manage the state of the cluster. It is the controller-manager that makes the changes in the cluster to make sure the current state matches the desired state.
- cloud-controller-manager: In a cloud environment, it is the cloud-controller-manager that helps you link your cluster with the cloud providers’ API. In a local setup where you install minikube, you don’t have a cloud-controller-manager.
There are different kinds of controllers that help you to configure behavior on your Kubernetes cluster.
- ReplicaSet: A ReplicaSet makes sure that a certain number of Pods are active at all times. If more Pods are running than specified, the extras are deleted; if fewer are running, new Pods are created.
- Deployment: A Deployment controller runs a Pod with the desired number of replicas. There is nothing special about the Pods themselves; the Deployment wraps a ReplicaSet and adds settings that a standalone Pod or ReplicaSet lacks, such as the deployment strategy to use when rolling out updates.
- DaemonSet: A DaemonSet makes sure that a copy of a Pod is running on all or some of the cluster's nodes. It is the appropriate controller for delivering exactly one Pod per node: once you submit the DaemonSet spec (or manifest file) to the API server, one Pod is scheduled on each matching node. Using node selectors, a DaemonSet can also target only a subset of nodes.
- StatefulSet: Deployment controllers are appropriate for managing stateless apps; StatefulSets, on the other hand, help execute workloads that need persistent storage. A StatefulSet maintains a separate, sticky identity for each Pod it oversees, and when a Pod needs to be rescheduled, it keeps the same identity.
- Job: A Kubernetes Job is a controller that watches over Pods while they perform a specific task. Jobs are mostly used for batch processing. As soon as you submit a Job manifest file to the API server, the Pod launches and carries out its task; it shuts down automatically once the task is finished. Consequently, this is known as a run-to-completion Job. These Pods must be deleted manually; Kubernetes will not do it for you.
- CronJob: The CronJob controller is very similar to the Job controller. The primary distinction is that a CronJob operates according to a user-defined schedule. After the schedule is specified using cron syntax, the CronJob controller automatically creates Jobs according to that schedule. You can also define how many Jobs may run simultaneously and how many successful and unsuccessful Jobs should be kept around for logging and debugging.
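As a sketch of the last two controllers, the hypothetical CronJob below creates a run-to-completion Job every five minutes and keeps a few finished Jobs around for debugging; the name, schedule, and command are illustrative.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report                   # illustrative name
spec:
  schedule: "*/5 * * * *"        # cron syntax: every five minutes
  successfulJobsHistoryLimit: 3  # finished Jobs kept for inspection
  failedJobsHistoryLimit: 1
  jobTemplate:                   # the Job created on each tick
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36
            command: ["sh", "-c", "echo generating report"]
```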
Worker Node Components
The worker nodes comprise three important components - the kubelet, the kube-proxy, and the container runtime, such as Docker.
- kubelet: The kubelet is an agent that runs on each node. It receives a set of PodSpecs through various mechanisms and makes sure that the containers described in those PodSpecs are running and healthy.
- kube-proxy: Each worker node has a proxy service called kube-proxy that manages individual host subnetting and makes services available to the outside world. It performs request forwarding across the multiple isolated networks in a cluster to the appropriate pods/containers.
- container-runtime: The container runtime is the software responsible for running the containers. Kubernetes supports Open Container Initiative-compliant runtimes such as Docker, CRI-O, and containerd.
Addons
Addons utilize Kubernetes resources (such as DaemonSets and Deployments) to implement cluster features. Because they provide cluster-level functionality, namespaced resources for addons belong in the kube-system namespace. These addons extend the functionality of Kubernetes.
Many addons are available in Kubernetes. A few of them have been discussed below:
- DNS: All the clusters should have Cluster DNS. Cluster DNS is a DNS server that provides DNS records for Kubernetes services. This DNS server is automatically incorporated into DNS searches for containers launched by Kubernetes.
- Web UI (Dashboard): Dashboard is a multi-purpose, web UI for Kubernetes clusters. It enables users to control and debug both the cluster itself and any running applications.
- Container Resource Monitoring: Container Resource Monitoring offers a UI for exploring general time-series metrics about containers recorded in a central database.
- Ingress Controllers: An Ingress controller is a specialized load balancer for Kubernetes (and other containerized) environments. It accepts traffic coming from outside the Kubernetes platform and distributes it to pods (containers) running on the platform. It can also control egress traffic within a cluster for services that must communicate with services outside the cluster. Additionally, it keeps track of running Kubernetes pods and automatically updates the load-balancing rules whenever a pod is added to or removed from a service.
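As an illustration, the Ingress resource below routes all HTTP traffic for one hostname to a backend Service. This is only a sketch: it assumes an ingress controller (such as the NGINX Ingress Controller) is already installed in the cluster, and the hostname and Service name are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  rules:
  - host: example.com          # illustrative hostname
    http:
      paths:
      - path: /
        pathType: Prefix       # match every path under /
        backend:
          service:
            name: web-svc      # hypothetical backend Service
            port:
              number: 80
```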
Additional Kubernetes Web Application Architecture Components
Kubernetes can manage the associated application data to a cluster in addition to managing the containers for an application. Users of Kubernetes can ask for storage resources without being familiar with the specifics of the storage infrastructure.
A Kubernetes volume is essentially a directory, possibly containing data, that is accessible to the containers in a pod. The volume type determines the volume's contents, how it is created, and the medium on which it is stored. Persistent volumes (PVs) are tied to an existing storage resource and are specific to a cluster; they are typically provisioned by an administrator. PVs can therefore outlast any particular pod.
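This is exactly the "ask for storage without knowing the infrastructure" idea: a pod claims storage through a PersistentVolumeClaim, and the cluster binds the claim to a matching PV behind the scenes. A minimal sketch, with illustrative names and sizes:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim             # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi             # the pod's author needs only this much detail
---
apiVersion: v1
kind: Pod
metadata:
  name: db
spec:
  containers:
  - name: db
    image: postgres:16
    volumeMounts:
    - mountPath: /var/lib/postgresql/data
      name: data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-claim    # references the claim above
```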
Kubernetes uses container images stored in a container registry. It can be a third-party registry or managed by your organization.
A physical cluster contains virtual clusters called namespaces. They are designed to give numerous users and teams virtually independent work environments and stop teams from hampering one another by restricting access to certain Kubernetes objects.
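Creating such a virtual cluster takes only a short manifest; the team name below is illustrative.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a      # illustrative team namespace
```

After `kubectl apply`, resources can be created in and listed from that namespace with the `-n` flag, for example `kubectl get pods -n team-a`.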
At the pod level, Kubernetes containers within a pod can share IP addresses and network namespaces, and they can access other ports via localhost.
Kubernetes Design Principles
Scalability, high availability, security, and portability are some characteristics that Kubernetes was created to serve.
- Scalability: Kubernetes provides horizontal scaling of pods based on CPU usage. You can configure a CPU-utilization threshold, and when it is exceeded, Kubernetes launches new pods on its own. For instance, if the threshold is set at 60 percent and average utilization climbs to 170 percent, Kubernetes eventually deploys enough additional pods to bring the average CPU utilization back down toward 60 percent. Kubernetes also offers load balancing across the pods of a given application. Through StatefulSets, Kubernetes supports horizontal scaling of stateful pods as well, including NoSQL and RDBMS databases. Similar to a Deployment, a StatefulSet guarantees durable and reliable storage even when a pod is eliminated.
- High Availability: Kubernetes addresses high availability at both the application and infrastructure levels. ReplicaSets ensure that the desired (minimum) number of stateless pod replicas are active for a particular application, and StatefulSets play the same role for stateful pods. At the infrastructure level, Kubernetes supports several distributed storage backends, including AWS EBS, Azure Disk, Google Persistent Disk, NFS, and others. This dependable, accessible storage layer gives stateful workloads high availability. Additionally, each master component can be set up for multi-node replication (multi-master) to increase availability.
- Security: Kubernetes addresses security on several levels: cluster, application, and network. The API endpoints are protected with Transport Layer Security (TLS). Only authenticated users (either service accounts or regular users) can carry out operations on the cluster via API requests. At the application level, Kubernetes Secrets can hold sensitive data (such as tokens or passwords); note that a Secret is accessible to pods within the same namespace. For pod access, network policies can be defined in a deployment. A network policy outlines how pods are allowed to communicate with one another and with other network endpoints.
- Portability: The portability of Kubernetes can be seen in its support for various operating systems (a cluster can run on any common Linux distribution), processor architectures (bare metal or virtual machines), cloud providers (AWS, Azure, or Google Cloud Platform), and additional container runtimes beyond Docker. It can also handle workloads across hybrid (private and public cloud) or multi-cloud settings due to the idea of a federation. Additionally, it offers fault tolerance between availability zones for a single cloud provider.
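The CPU-based scaling described above is handled by a HorizontalPodAutoscaler. A sketch with the 60 percent target from the scalability example, assuming a Deployment named web exists and the metrics server is installed:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # add pods when average CPU exceeds 60%
```

Kubernetes then adds or removes replicas, within the configured bounds, to hold average utilization near the target.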
Configuring Kubernetes Architecture Security
There are various best practices to adhere to in order to secure Kubernetes clusters, nodes, and containers:
- Update Kubernetes with the most recent version. Security updates for recently discovered vulnerabilities are only supported for the most recent three Kubernetes versions.
- Securely configure the Kubernetes API server. Encrypt connections between the API server and kubelets using TLS and disable anonymous/unauthenticated access.
- The etcd should be secured. Since the rest of the cluster trusts etcd implicitly, serve its client connections only through TLS and restrict who can access it.
- Switch off the kubelet's anonymous access feature. Start the kubelet with the --anonymous-auth=false flag and use the NodeRestriction admission controller to restrict what the kubelet can access.
- Utilize Kubernetes-native security controls to lower operational risk. To prevent conflicts between your own security controls and the orchestrator, whenever possible, use native Kubernetes controls to implement security standards.
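For the kubelet hardening above, the same settings can also be expressed declaratively in a KubeletConfiguration file (passed to the kubelet via --config); a minimal sketch:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false      # equivalent to --anonymous-auth=false
  webhook:
    enabled: true       # delegate authentication to the API server
authorization:
  mode: Webhook         # delegate authorization decisions as well
```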
Kubernetes Architecture Best Practices
Here are some best practices you can implement for your Kubernetes architecture:
- Kubernetes keeps rolling out updates that include platform upgrades, new features, and bug fixes. So, you should always keep your clusters on the latest Kubernetes release.
- By now, you know about Kubernetes namespaces. When multiple teams are working on a large Kubernetes cluster, you would be required to organize it and keep it secured. That’s where you should use Kubernetes namespaces.
- You should keep your Kubernetes cluster and architecture secure by following the above points. You should consider using role-based access control (RBAC), which helps you define what each user can do in a Kubernetes cluster. In addition, third-party tools such as kube-hunter or kube-bench can be helpful.
- You should be careful while choosing base images. If possible, use smaller Docker images that are faster to pull and present a smaller attack surface. Alpine-based images, for example, are often several times smaller than typical full-distribution base images.
- You should define resource requests and limits to prevent your pods from consuming too much CPU or memory. A resource request is the minimum amount of resources guaranteed to a container; a resource limit is the maximum amount it may utilize.
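The requests-and-limits practice above looks like this in a pod spec; the name, image, and quantities are illustrative, not recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api               # illustrative name
spec:
  containers:
  - name: api
    image: nginx:1.25
    resources:
      requests:
        cpu: "250m"       # scheduler guarantees a quarter of a CPU core
        memory: "128Mi"
      limits:
        cpu: "500m"       # container is throttled beyond half a core
        memory: "256Mi"   # container is OOM-killed if it exceeds this
```

The scheduler uses the requests to pick a node with enough free capacity, while the kubelet enforces the limits at runtime.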
In this article, you learned that Kubernetes is an open-source orchestration tool for managing containerized applications. You also learned about the client-server architecture of Kubernetes. The master node, or the control plane, is composed of four core components, namely the API server, the scheduler, the controller manager, and etcd, plus an optional cloud controller manager. The worker node, in turn, consists of three components: the kubelet, the kube-proxy, and the container runtime. You also, by now, know about the design principles and some of the best practices to follow while working with Kubernetes.
Kubernetes is a vast topic that cannot be covered in a single article. If you want to learn more about Kubernetes and enter the world of DevOps, you can enroll in our Devops Tools Courses.