K8s Deployments By Namespace Test

A Deployment provides declarative updates for Pods and ReplicaSets. While a Pod encapsulates an application’s container (or, in some cases, multiple containers), storage resources, a unique network IP, and options that govern how the container(s) should run, a ReplicaSet is used to maintain a stable set of replica (identical) Pods running at any given time.

Using a Deployment, you can easily:

  • Deploy a ReplicaSet
  • Update Pods (PodTemplateSpec)
  • Roll back to older Deployment versions
  • Scale Deployment up or down
  • Pause and resume the Deployment
  • Determine state of replicas
  • Clean up older ReplicaSets
  • Canary Deployment

Whenever a Deployment is used to perform such operations, administrators will naturally want to know the status of the Deployment - whether it is paused or progressing. Most importantly, administrators will want to be alerted if the Deployment was unable to deliver the desired result of the operation. For instance, an administrator would want to be alerted if any Deployment fails to create the desired number of available replicas in a ReplicaSet, fails to update one or more replicas with changes to a Pod template, or does not have enough Pods to reach full capacity. This is because such failures may result in a mismatch between the cluster's desired state and its actual state, which in turn may affect the availability and performance of the containerized applications that overlay the cluster. This is where the Deployments by Namespace test helps!

This test auto-discovers Deployments in a Namespace, and for each Deployment, reports the overall status of the deployment - i.e., whether the desired state of the Deployment is the same as its actual state. If the state of the Deployment is Unhealthy, then you can use this test to figure out what could have caused the anomaly - is it because of a replica failure? is it because of unavailable Pods? or is it because of the Pods that are not yet up-to-date with changes made to the Pod template?

Target of the test : A Kubernetes/OpenShift Cluster

Agent deploying the test : A remote agent

Outputs of the test : One set of results for each Deployment in every namespace configured in the Kubernetes/OpenShift cluster being monitored

First-level Descriptor: Namespace

Second-level Descriptor: Deployment

Configurable parameters for the test
Parameter Description

Test Period

How often the test should be executed.

Host

The IP address of the host for which this test is to be configured.

Port

Specify the port at which the specified Host listens. By default, this is 6443.

Load Balancer / Master Node IP

To run this test and report metrics, the eG agent needs to connect to the Kubernetes API on the master node and run API commands. To enable this connection, the eG agent has to be configured with either of the following:

  • If only a single master node exists in the cluster, then configure the eG agent with the IP address of the master node.
  • If the target cluster consists of more than one master node, then you need to configure the eG agent with the IP address of the load balancer that is managing the cluster. In this case, the load balancer will route the eG agent's connection request to any available master node in the cluster, thus enabling the agent to connect with the API server on that node, run API commands on it, and pull metrics.

By default, this parameter will display the Load Balancer / Master Node IP that you configured when manually adding the Kubernetes/OpenShift cluster for monitoring, using the Kubernetes Cluster Preferences page in the eG admin interface (see Figure 3). The steps for managing the cluster using the eG admin interface are discussed elaborately in How to Monitor the Kubernetes/OpenShift Cluster Using eG Enterprise?

Whenever the eG agent runs this test, it uses the IP address that is displayed (by default) against this parameter to connect to the Kubernetes API. If there is any change in this IP address at a later point in time, then make sure that you update this parameter with it, by overriding its default setting.

K8s Cluster API Prefix

By default, this parameter is set to none. Do not disturb this setting if you are monitoring a Kubernetes/OpenShift Cluster.

To run this test and report metrics for Rancher clusters, the eG agent needs to connect to the Kubernetes API on the master node of the Rancher cluster and run API commands. The Kubernetes API of Rancher clusters is of the default format: http(s)://{IP Address of kubernetes}/{api endpoints}. The Server section of the kubeconfig.yaml file downloaded from the Rancher console helps in identifying the Kubernetes API of the cluster. For example, https://{IP address of Kubernetes}/k8s/clusters/c-m-bznxvg4w/ is usually the URL of the Kubernetes API of a Rancher cluster.

For the eG agent to connect to the master node of a Rancher cluster and pull out metrics, the eG agent should be made aware of the API endpoints in the Kubernetes API of the Rancher cluster. To aid this, you can specify the API endpoints available in the Kubernetes API of the Rancher cluster against this parameter. In our example, this parameter can be specified as: /k8s/clusters/c-m-bznxvg4w/.
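The way the Host, Port, SSL, and K8s Cluster API Prefix parameters could combine into a request URL can be sketched as follows. This is only an illustration: how the eG agent composes its URLs internally is not documented here, and the function name build_api_url, the sample IP address 192.0.2.10, and port 443 for the Rancher case are assumptions. The path /apis/apps/v1/.../deployments is the standard Kubernetes API path for Deployments.

```python
def build_api_url(host, port=6443, ssl=True, api_prefix="none", namespace=None):
    """Compose the URL of the Deployments list endpoint (illustrative helper)."""
    scheme = "https" if ssl else "http"
    # "none" is treated as "no prefix configured" (plain Kubernetes/OpenShift).
    if api_prefix in (None, "", "none"):
        prefix = ""
    else:
        prefix = "/" + api_prefix.strip("/")
    if namespace:
        path = f"/apis/apps/v1/namespaces/{namespace}/deployments"
    else:
        path = "/apis/apps/v1/deployments"
    return f"{scheme}://{host}:{port}{prefix}{path}"


# Plain cluster with default settings.
print(build_api_url("192.0.2.10"))
# Rancher cluster, using the prefix from the example above (port is assumed).
print(build_api_url("192.0.2.10", port=443,
                    api_prefix="/k8s/clusters/c-m-bznxvg4w/",
                    namespace="default"))
```

With the prefix left at none, the URL points directly at the API server; with a Rancher prefix, the same API path is simply appended to the prefix.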

SSL

By default, the Kubernetes/OpenShift cluster is SSL-enabled. This is why the eG agent, by default, connects to the Kubernetes API via an HTTPS connection. Accordingly, this flag is set to Yes by default.

If the cluster is not SSL-enabled in your environment, then set this flag to No.

Authentication Token

The eG agent requires an authentication bearer token to access the Kubernetes API, run API commands on the cluster, and pull metrics of interest. The steps for generating this token have been detailed in How Does eG Enterprise Monitor a Kubernetes/OpenShift Cluster?

The steps for generating this token for a Rancher cluster have been detailed in How Does eG Enterprise Monitor a Rancher Cluster?

Typically, once you generate the token, you can associate that token with the target Kubernetes/OpenShift cluster, when manually adding that cluster for monitoring using the eG admin interface. The steps for managing the cluster using the eG admin interface are discussed elaborately in How to Monitor the Kubernetes/OpenShift Cluster Using eG Enterprise?

By default, this parameter will display the Authentication Token that you provided in the Kubernetes Cluster Preferences page of the eG admin interface, when manually adding the cluster for monitoring (see Figure 3).

Whenever the eG agent runs this test, it uses the token that is displayed (by default) against this parameter for accessing the API and pulling metrics. If for any reason, you generate a new authentication token for the target cluster at a later point in time, then make sure you update this parameter with the change. For that, copy the new token and paste it against this parameter.
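As a rough illustration of how a bearer token accompanies a Kubernetes API call, the sketch below builds (but does not send) an HTTP request using Python's standard urllib.request module. The helper name make_api_request, the sample URL, and the placeholder token are assumptions; the eG agent's own implementation may differ.

```python
import urllib.request


def make_api_request(url, token):
    """Build a request object carrying the bearer token (nothing is sent)."""
    headers = {
        "Authorization": f"Bearer {token}",  # token generated for the cluster
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


# Placeholder values for illustration only.
request = make_api_request(
    "https://192.0.2.10:6443/apis/apps/v1/deployments", "EXAMPLE-TOKEN"
)
print(request.get_header("Authorization"))
```

If a new token is generated later, only the token value passed to the request changes; the URL and the header name stay the same.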

Report System Namespace

The kube-system namespace consists of all objects created by the Kubernetes system. Monitoring such a namespace may not only increase the eG agent's processing overheads, but may also clutter the eG database. Therefore, to optimize agent performance and to conserve database space, this test, by default, excludes the kube-system namespace from monitoring. Accordingly, this flag is set to No by default.

If required, you can set this flag to Yes, and enable monitoring of the kube-system namespace.
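The effect of this flag can be pictured as a simple filter over the discovered namespaces. The sketch below is an assumption about the behaviour described above, not the agent's actual code; the function name namespaces_to_monitor is hypothetical.

```python
SYSTEM_NAMESPACE = "kube-system"


def namespaces_to_monitor(namespaces, report_system_namespace=False):
    """Return the namespaces to report on, honouring the flag."""
    if report_system_namespace:
        return list(namespaces)
    # Default behaviour (flag = No): skip the kube-system namespace.
    return [ns for ns in namespaces if ns != SYSTEM_NAMESPACE]


discovered = ["default", "kube-system", "payments"]
print(namespaces_to_monitor(discovered))
print(namespaces_to_monitor(discovered, report_system_namespace=True))
```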

Proxy Host

If the eG agent connects to the Kubernetes API on the master node via a proxy server, then provide the IP address of the proxy server here. If no proxy is used, then the default setting - none - of this parameter need not be changed.

Proxy Port

If the eG agent connects to the Kubernetes API on the master node via a proxy server, then provide the port number at which that proxy server listens here. If no proxy is used, then the default setting - none - of this parameter need not be changed.

Proxy Username, Proxy Password, Confirm Password

These parameters are applicable only if the eG agent uses a proxy server to connect to the Kubernetes/OpenShift cluster, and that proxy server requires authentication. In this case, provide a valid user name and password against the Proxy Username and Proxy Password parameters, respectively. Then, confirm the password by retyping it in the Confirm Password text box.

If no proxy server is used, or if the proxy server used does not require authentication, then the default setting - none - of these parameters need not be changed.

DD Frequency

Refers to the frequency with which detailed diagnosis measures are to be generated for this test. The default is 1:1. This indicates that, by default, detailed measures will be generated every time this test runs, and also every time the test detects a problem. You can modify this frequency, if you so desire. Also, if you intend to disable the detailed diagnosis capability for this test, you can do so by specifying none against DD frequency.
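The DD Frequency value can be read as a pair of numbers. The sketch below assumes that the value has the form normal:abnormal (the order is inferred from the description above, and the function name parse_dd_frequency is hypothetical), and that none disables detailed diagnosis.

```python
def parse_dd_frequency(value):
    """Split a DD Frequency setting into (normal, abnormal); None if disabled."""
    text = value.strip().lower()
    if text == "none":
        return None  # detailed diagnosis disabled for this test
    normal, abnormal = (int(part) for part in text.split(":"))
    return normal, abnormal


print(parse_dd_frequency("1:1"))
print(parse_dd_frequency("none"))
```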

Detailed Diagnosis

To make diagnosis more efficient and accurate, eG Enterprise embeds an optional detailed diagnostic capability. With this capability, the eG agents can be configured to run detailed, more elaborate tests as and when specific problems are detected. To enable the detailed diagnosis capability of this test for a particular server, choose the On option. To disable the capability, choose the Off option.

The option to selectively enable/disable the detailed diagnosis capability will be available only if the following conditions are fulfilled:

  • The eG manager license should allow the detailed diagnosis capability
  • Both the normal and abnormal frequencies configured for the detailed diagnosis measures should not be 0.
Measurements made by the test
Measurement    Description    Measurement Unit    Interpretation

Status

Indicates whether/not the desired state of this deployment is the same as its actual state.

 

This measure reports the value Healthy if the desired state of the Deployment is the same as its actual state. In other words, if the value of the Total pods with deployment measure is the same as the value of the Total available pods with deployment measure, then this measure will report the value Healthy. If not, this measure will report the value Unhealthy. For instance, if the Deployment seeks to deploy a ReplicaSet with 3 replicas (Pods) in it, and succeeds in creating such a ReplicaSet, then the value of this measure will be Healthy. On the other hand, if the Deployment created a ReplicaSet with only two available replica Pods, then the value of this measure will be Unhealthy.

The numeric values that correspond to these measure values are as follows:

Measure Value    Numeric Value
Healthy          1
Unhealthy        0

Note:

By default, this test reports the Measure Values listed in the table above to indicate the state of a Deployment. In the graph of this measure however, the same is indicated using the numeric equivalents only.
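The comparison behind the Status measure can be sketched as below. This is a minimal illustration of the rule stated above (desired Pods equal to available Pods means Healthy); the function name deployment_status is hypothetical and the agent's actual logic may consider more than these two counts.

```python
STATUS_NUMERIC = {"Healthy": 1, "Unhealthy": 0}


def deployment_status(desired_pods, available_pods):
    """Return the Status measure value for a Deployment."""
    return "Healthy" if desired_pods == available_pods else "Unhealthy"


# The two cases from the example above: 3 desired replicas.
for available in (3, 2):
    status = deployment_status(3, available)
    print(available, status, STATUS_NUMERIC[status])
```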

Is deployment paused?

Indicates whether/not this Deployment has been paused.

 

You can pause a Deployment before triggering one or more updates and then resume it. This allows you to apply multiple fixes in between pausing and resuming without triggering unnecessary rollouts.

The values that this measure can report and their corresponding numeric values are listed in the table below:

Measure Value    Numeric Value
Yes              1
No               0

Note:

By default, this test reports the Measure Values listed in the table above to indicate whether/not a Deployment has been paused. In the graph of this measure however, the same is indicated using the numeric equivalents only.

Time since deployment creation

Indicates how old this Deployment is.

 

The value of this measure is expressed in number of days, hours, and minutes.

You can use the detailed diagnosis of this measure to know the images that a Deployment pulled from the Container Registry to create containers on replica Pods.
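A Deployment's age in days, hours, and minutes can be derived from its creation timestamp. The sketch below assumes that the timestamp comes from the metadata.creationTimestamp field of the Deployment object, in the UTC format that the Kubernetes API uses (for example, 2024-01-01T00:00:00Z); the function name deployment_age is hypothetical.

```python
from datetime import datetime, timezone


def deployment_age(creation_timestamp, now=None):
    """Return the age of a Deployment as (days, hours, minutes)."""
    created = datetime.strptime(
        creation_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    total_minutes = int((now - created).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return days, hours, minutes


# Fixed "now" so that the result is reproducible.
reference = datetime(2024, 1, 3, 5, 30, tzinfo=timezone.utc)
print(deployment_age("2024-01-01T00:00:00Z", now=reference))  # (2, 5, 30)
```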

Is deployment progressing?

Indicates whether/not this Deployment is in progress.

 

A Deployment enters various states during its lifecycle.

Kubernetes marks a Deployment as progressing when one of the following tasks is performed:

  • The Deployment creates a new ReplicaSet.
  • The Deployment is scaling up its newest ReplicaSet.
  • The Deployment is scaling down its older ReplicaSet(s).
  • New Pods become ready or available (ready for at least MinReadySeconds).

Kubernetes marks a Deployment as complete when it has the following characteristics:

  • The Deployment has minimum availability. Minimum availability means that the Deployment's number of available replicas equals or exceeds the number required by the Deployment strategy.

  • All of the replicas associated with the Deployment have been updated to the latest version you have specified, meaning any updates you've requested have been completed.

  • No old pods for the Deployment are running.

Your Deployment may get stuck trying to deploy its newest ReplicaSet without ever completing. This can occur due to some of the following factors:

  • Insufficient quota
  • Readiness probe failures
  • Image pull errors
  • Insufficient permissions
  • Limit ranges
  • Application runtime misconfiguration

Typically, a Deployment is considered to have Failed if it does not make progress within the duration configured against progressDeadlineSeconds.

This measure reports the value Yes for a Deployment, if it is in the progressing or complete state. The value No is reported, if the Deployment is in the failed-to-progress state. The value Unknown is reported if the Deployment is not in any of the above-mentioned states - i.e., if the state cannot be determined.

The numeric values that correspond to these measure values are as follows:

Measure Value    Numeric Value
Yes              1
No               0
Unknown          2

Note:

By default, this test reports the Measure Values listed in the table above to indicate whether/not a Deployment is progressing. In the graph of this measure however, the same is indicated using the numeric equivalents only.
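Kubernetes exposes these states through the status.conditions list of a Deployment, where the condition of type Progressing carries a status of True, False, or Unknown. The sketch below maps such a condition to the measure values above. It is an illustration only: the function name condition_measure is hypothetical, and treating a missing condition as Unknown is an assumption.

```python
MEASURE_NUMERIC = {"Yes": 1, "No": 0, "Unknown": 2}


def condition_measure(conditions, condition_type):
    """Map a Deployment condition to Yes / No / Unknown."""
    for condition in conditions:
        if condition.get("type") == condition_type:
            mapping = {"True": "Yes", "False": "No"}
            return mapping.get(condition.get("status"), "Unknown")
    return "Unknown"  # assumption: a missing condition is reported as Unknown


# A Deployment that exceeded its progress deadline.
stalled = [{"type": "Progressing", "status": "False",
            "reason": "ProgressDeadlineExceeded"}]
value = condition_measure(stalled, "Progressing")
print(value, MEASURE_NUMERIC[value])
```

The same lookup could be applied to the Available condition type, which uses the same True / False / Unknown status values.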

Is deployment available?

Indicates whether/not this Deployment is available.

Number

A Deployment is said to be Available, if it has minimum availability. Minimum availability is dictated by the parameters specified in the deployment strategy. For instance, if the default Rolling Update strategy is used, then the Max Unavailable parameter of the strategy indicates the Minimum Availability during an upgrade. For example, if Max Unavailable is set to 25%, then it means that a minimum of 75% of Pods should be available in the Deployment when an update is in progress.
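The Max Unavailable arithmetic can be worked through with a small sketch. It assumes that a percentage value is converted to a Pod count by rounding down, which is how Kubernetes treats maxUnavailable; the function name min_available is hypothetical.

```python
def min_available(replicas, max_unavailable):
    """Return the minimum number of Pods that must stay available."""
    if isinstance(max_unavailable, str) and max_unavailable.endswith("%"):
        percent = int(max_unavailable[:-1])
        # Percentage of desired replicas, rounded down to a whole Pod.
        unavailable = replicas * percent // 100
    else:
        unavailable = int(max_unavailable)
    return max(replicas - unavailable, 0)


print(min_available(4, "25%"))   # 3
print(min_available(10, "25%"))  # 8
print(min_available(5, 1))       # 4
```

With 10 desired replicas, 25% is 2.5 Pods, which rounds down to 2, so at least 8 Pods must remain available during the update.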

If the Deployment is Available, then the value of this measure is Yes. If the Deployment is unavailable - i.e., if the Minimum Availability criterion is not met - then the value of this measure is No. If the availability of the Deployment cannot be determined, then the value of this measure will be Unknown.

The numeric values that correspond to these measure values are as follows:

Measure Value    Numeric Value
Yes              1
No               0
Unknown          2

Note:

By default, this test reports the Measure Values listed in the table above to indicate whether/not a Deployment is Available. In the graph of this measure however, the same is indicated using the numeric equivalents only.

If the value of this measure for any Deployment is No, then you can use the detailed diagnosis of this measure to identify the unavailable Pods in the Deployment, and the reason for their unavailability. A Pod is said to be Available, if it is ready without any containers crashing for at least the duration configured against minReadySeconds in the Deployment specification. Any Pod that is not ready, or is ready but has containers crashing for a period of time beyond the minReadySeconds duration, is automatically considered Unavailable.

Is deployment replica failure?

Indicates whether/not any replica in this Deployment has failed.

 

The value Yes for this measure, indicates that one/more replicas in the Deployment could not be created. The value No indicates that all replicas were created. The value Unknown implies that the state of the replicas could not be determined.

The numeric values that correspond to these measure values are as follows:

Measure Value    Numeric Value
Yes              1
No               0
Unknown          2

Note:

By default, this test reports the Measure Values listed in the table above to indicate whether/not any replicas in the Deployment could not be created. In the graph of this measure however, the same is indicated using the numeric equivalents only.

If the failure of one/more replicas causes a mismatch between the desired state and actual state of the Deployment, then the value of the Status measure of that Deployment will change to Unhealthy.

Observed generations

Indicates the generation most recently observed by the Deployment controller for this Deployment.

Number

A generation is a sequence number representing a specific generation of the desired state.

If the value of this measure for a Deployment matches with the desired generation sequence number of that Deployment, it implies that the Deployment is complete.

If it does not match, then it means that a Deployment is progressing or has failed. In other words, if the value of the Is progressing? measure for a Deployment is Yes or No, then it means that the desired generation sequence number and the observed generation sequence number of that Deployment are not the same.
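The generation comparison can be sketched as below. It assumes that the desired generation is read from the metadata.generation field of the Deployment object and the observed generation from status.observedGeneration, which are the fields Kubernetes uses for this purpose; the function name generation_matches is hypothetical.

```python
def generation_matches(deployment):
    """True if the observed generation equals the desired generation."""
    desired = deployment["metadata"]["generation"]
    observed = deployment.get("status", {}).get("observedGeneration")
    return observed == desired


up_to_date = {"metadata": {"generation": 4},
              "status": {"observedGeneration": 4}}
lagging = {"metadata": {"generation": 5},
           "status": {"observedGeneration": 4}}
print(generation_matches(up_to_date), generation_matches(lagging))
```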

Replicas

Indicates the desired number of non-terminated Pod replicas targeted by this Deployment.

Number

 

Updated replicas

Indicates the total number of non-terminated Pod replicas that have been updated by this Deployment with changes (if any) made to the Pod template specification.

Number

Typically, whenever changes are made to a Deployment's Pod template - say, labels or container images of the template are changed - then a Deployment rollout is triggered. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate.

Ideally, the value of this measure should be the same as the value of the Total pods with deployment measure. If not, then it means that the desired number of Pod replicas are not yet fully updated with the changes to the Pod template.

Ready replicas

Indicates the number of ready Pods created by this Deployment.

Number

 

Total available pods

Indicates the number of available Pods created by this Deployment.

Number

A Pod is said to be Available, if it is ready without any containers crashing for at least the duration configured against minReadySeconds in the Deployment specification.

Ideally, the value of this measure should be the same as the value of the Total pods with deployment measure. If not, then the Status measure of this test will report the value Unhealthy. This means that the desired state of the Deployment is not the same as its actual state.

Total unavailable pods

Indicates the total number of unavailable Pods created by this Deployment.

Number

Any Pod that is not ready, or is ready but has containers crashing for a period of time beyond the minReadySeconds duration, is automatically considered Unavailable.

Ideally, the value of this measure should be 0. If this measure reports a non-zero value or a value equal to or close to the value of the Total pods with deployment measure, then the Status measure of this test will report the value Unhealthy. This means that the desired state of the Deployment is not the same as its actual state.

In the event that this measure reports a non-zero value, then use the detailed diagnosis of this measure to identify the unavailable Pod replicas in the Deployment.

Collision count

Indicates the count of hash collisions for this deployment.

Number

The Deployment controller uses this field as a collision avoidance mechanism when it needs to create the name for the newest ReplicaSet.

Pods for this deployment

Indicates the number of Pods created by this Deployment that are currently running.

Number

To know which Pods created by this Deployment are currently running, use the detailed diagnosis of this measure.

Retain old replica count

Indicates the number of old ReplicaSets that this Deployment should retain to allow rollback.

Number

A Deployment’s revision history is stored in the ReplicaSets it controls.

When configuring a Deployment, you can optionally specify .spec.revisionHistoryLimit, where you can indicate the number of old ReplicaSets to retain to allow rollback. This value is reported as the value of this measure.

Old ReplicaSets consume resources in etcd and crowd the output of kubectl get rs. The configuration of each Deployment revision is stored in its ReplicaSets; therefore, once an old ReplicaSet is deleted, you lose the ability to roll back to that revision of the Deployment. By default, 10 old ReplicaSets are kept; however, the ideal value depends on the frequency and stability of new Deployments.

More specifically, setting this field to zero means that all old ReplicaSets with 0 replicas will be cleaned up. In this case, a new Deployment rollout cannot be undone, since its revision history is cleaned up.
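How the reported value follows from the Deployment spec can be sketched as below. The default of 10 and the effect of a zero setting are taken from the description above; the function names are hypothetical.

```python
DEFAULT_REVISION_HISTORY_LIMIT = 10


def retained_old_replicasets(spec):
    """Return the number of old ReplicaSets a Deployment keeps."""
    return spec.get("revisionHistoryLimit", DEFAULT_REVISION_HISTORY_LIMIT)


def rollback_possible(spec):
    """With a limit of zero, no revision history is kept to roll back to."""
    return retained_old_replicasets(spec) > 0


print(retained_old_replicasets({}))                     # 10
print(rollback_possible({"revisionHistoryLimit": 0}))   # False
```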

Pods added recently to namespace

Indicates the number of Pods that were added to this Deployment during the last measurement period.

Number

Use the detailed diagnosis of this measure to identify the name of each Pod that was recently added, the IP address of each Pod, the Node with which each Pod is associated, and the IP address of the Node.

Pods recently removed from the namespace

Indicates the number of Pods that were removed from this Deployment during the last measurement period.

Number

Use the detailed diagnosis of this measure to identify the name of each Pod that was recently removed, the IP address of each Pod, the Node with which each Pod is associated, and the IP address of the Node.
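One way to picture the two measures above is as a comparison of the Pod names seen in consecutive measurement periods. The sketch below is an assumption about how such a comparison could work, not a description of the agent's implementation; the function name pod_changes and the Pod names are hypothetical.

```python
def pod_changes(previous_pods, current_pods):
    """Return (added, removed) Pod names between two measurement periods."""
    previous, current = set(previous_pods), set(current_pods)
    added = sorted(current - previous)
    removed = sorted(previous - current)
    return added, removed


last_run = ["web-7d9f-abcde", "web-7d9f-fghij"]
this_run = ["web-7d9f-fghij", "web-7d9f-klmno"]
added, removed = pod_changes(last_run, this_run)
print(len(added), added)
print(len(removed), removed)
```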

Total available deployments in namespace

Indicates the total number of Deployments available in the namespace.

Number

This measure is reported only for the Summary descriptor of each namespace discovered by this test.

Use the detailed diagnosis of this measure to identify the name of the Deployments that are available.

Total unavailable deployments in namespace

Indicates the total number of Deployments that are unavailable in the namespace.

Number

This measure is reported only for the Summary descriptor of each namespace discovered by this test.

Use the detailed diagnosis of this measure to identify the name of the Deployments that are unavailable.
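The per-namespace Summary counts can be pictured as a simple tally of the Is deployment available? values. The sketch below is illustrative only; the function name namespace_summary and the input layout (namespace, Deployment name, availability flag) are assumptions.

```python
def namespace_summary(deployments):
    """Count available and unavailable Deployments per namespace.

    deployments: iterable of (namespace, name, is_available) tuples.
    """
    summary = {}
    for namespace, _name, is_available in deployments:
        counts = summary.setdefault(
            namespace, {"available": 0, "unavailable": 0}
        )
        counts["available" if is_available else "unavailable"] += 1
    return summary


sample = [
    ("default", "web", True),
    ("default", "worker", False),
    ("payments", "api", True),
]
print(namespace_summary(sample))
```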

You can use the detailed diagnosis of the Age measure to know the images that a Deployment pulled from the Container Registry to create containers on replica Pods.

Figure 1 : The detailed diagnosis of the Age measure of the Deployments by Namespace test

If the value of the Is available? measure for any Deployment is No, then you can use the detailed diagnosis of this measure to identify the unavailable Pods in the Deployment, and the reason for their unavailability. A Pod is said to be Available, if it is ready without any containers crashing for at least the duration configured against minReadySeconds in the Deployment specification. Any Pod that is not ready, or is ready but has containers crashing for a period of time beyond the minReadySeconds duration, is automatically considered Unavailable.

Figure 2 : The detailed diagnosis of the Is available? measure reported by the Deployments by Namespace test

If the Total unavailable pods with deployment measure reports a non-zero value, then use the detailed diagnosis of this measure to identify the unavailable Pod replicas in the Deployment. You can also use the detailed diagnostics to figure out what could have rendered a Pod unavailable. This information aids troubleshooting.

Figure 3 : The detailed diagnosis of the Total unavailable pods with deployment measure