Category Archives: terraform

How to automatically include Kubernetes pods in Prometheus monitoring.

When the Prometheus server is installed inside a Kubernetes cluster, the pods that we would like to include in the monitoring have to expose their Prometheus metrics endpoint. A Kubernetes Service should be created for each pod, exposing the exporter's port and carrying the "prometheus.io/scrape" = "true" annotation, which tells the Prometheus server that this Service points to an exporter. The Prometheus server then automatically scrapes the target Service and, through it, the exporter running in the pod.
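
On the server side this discovery is typically handled by a scrape job in the style of the community Prometheus Helm chart. The snippet below only illustrates that mechanism and assumes that style of configuration; your server's scrape config may differ.

# Illustrative scrape job (community Prometheus Helm chart style).
# Targets are kept only when the backing Service carries the
# prometheus.io/scrape: "true" annotation, and Service labels
# (such as "instance") are copied onto the scraped target.
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep
      regex: true
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)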

In this example the pod's app label is prometheus-exporter, which was added to the selector -> app section of the YAML code below, and the rest of the naming has been changed accordingly.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-exporter-service
  annotations:
    prometheus.io/scrape: "true"
  labels:
    app: prometheus-exporter
    instance: prometheus-exporter
spec:
  type: ClusterIP
  selector:
    app: prometheus-exporter
  sessionAffinity: ClientIP
  ports:
    - protocol: TCP
      port: 9696
      targetPort: 9696

There are two notable settings here which are crucial for successful scraping.

  1. Define prometheus.io/scrape: “true” in annotations. This will tell the Prometheus server to scrape this service.
  2. Define instance: prometheus-exporter in labels. This is important because, by default, the Prometheus server uses the internal IP and the service port as the instance name, and that is what Grafana displays, which is not a very user-friendly way of presenting the data. With this label set, the defined instance name shows up in Grafana instead of something like 10.0.0.180:9696.

If you use Terraform to create your Kubernetes services, as I do, you can use the code below.

resource "kubernetes_service" "prometheus_exporter_service" {
  metadata {
    name      = "prometheus-exporter-service"
    namespace = "prometheus-exporter"
    annotations = {
      "prometheus.io/scrape" = "true"
    }
    labels = {
      app      = "prometheus-exporter"
      instance = "prometheus-exporter"
    }
  }
  spec {
    type = "ClusterIP"
    selector = {
      app      = "prometheus-exporter"
    }
    session_affinity = "ClientIP"
    port {
      port        = 9696
      target_port = 9696
      protocol    = "TCP"
    }
  }
}

Install a secure Prometheus server into an AWS Kubernetes (EKS) cluster using Terraform.

In this article we will install a secure Prometheus server into an EKS/Kubernetes cluster automatically using Terraform. The advantage of this installation method is that it is fully repeatable and every aspect of the installation is controlled by Terraform.

This Prometheus server will accept connections over HTTPS, and its URL will be protected by basic access authentication set on the nginx ingress. The following will be provisioned by Terraform automatically:

  • DNS CNAME record for your defined URL.
  • Free SSL certificate generated for your defined URL.
  • HTTPS secure access to your defined URL.
  • Password protection for your defined URL.
  • Prometheus server.

Prerequisites

In this example we will only show how to install the Prometheus server and won't be dealing with installing the prerequisites themselves. The following configuration is needed before the Terraform procedure is run:

Prerequisite Configuration

All Terraform files are published to GitHub and can be found in the kubernetes-prometheus-terraform repository. All configuration variables are located in the variables.tf file.

Define your URL

First configure the settings for your URL.

variable "main_domain" {
default = "example.com"
}

variable "prometheus_hostname" {
type = string
default = "prometheus"
}

Replace the following:

  • main_domain: your Route 53 managed root domain.
  • prometheus_hostname: your hostname/subdomain which will be accessible externally. The default is prometheus; with the default settings the full Prometheus server URL would be prometheus.example.com, composed from the two variables as sketched below.
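
A hypothetical local, purely to illustrate how the two variables are combined (the published code interpolates them directly in the ingress definition further down):

locals {
  # Illustration only: the full externally reachable hostname.
  prometheus_url = "${var.prometheus_hostname}.${var.main_domain}" # prometheus.example.com by default
}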

Nginx ingress controller configuration

variable "nginx_namespace" {
default = "default"
}

variable "nginx_name" {
default = "nginx-ingress-controller"
}

Adjust the following:

  • nginx_namespace is the namespace your nginx ingress controller is installed into.
  • nginx_name is the name of the nginx ingress controller. Both values are used to look up the controller's load balancer, as sketched below.
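
A rough sketch of how that lookup and the DNS CNAME record could be wired together; the resource names here are illustrative and the published repository may differ:

# Sketch only: find the nginx ingress controller's Service and point a
# CNAME for the Prometheus URL at its load balancer hostname.
data "kubernetes_service" "nginx_ingress" {
  metadata {
    name      = var.nginx_name
    namespace = var.nginx_namespace
  }
}

data "aws_route53_zone" "main" {
  name = var.main_domain
}

resource "aws_route53_record" "prometheus" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "${var.prometheus_hostname}.${var.main_domain}"
  type    = "CNAME"
  ttl     = 300
  records = [data.kubernetes_service.nginx_ingress.status.0.load_balancer.0.ingress.0.hostname]
}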

Secrets Configuration

The last step is to add your Kubernetes cluster details as an AWS secret in AWS Secrets Manager. In this example we use the name main_kubernetes for the AWS secret, and you need to configure it with the following secret keys (a sketch of how Terraform can read them back follows the list):

This information is available on your EKS cluster’s main page.

  • cluster_id is the name of your cluster.
  • cluster_oidc_issuer_url is the "OpenID Connect provider URL", also shown on the main page of your EKS cluster.
  • oidc_provider_arn is located in the IAM section of your AWS console. Go to IAM -> Identity providers, click on the provider with the same name as your cluster_oidc_issuer_url, and copy the ARN shown in the upper right corner. Alternatively, running the aws iam list-open-id-connect-providers command will achieve the same.
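
A minimal sketch of how the main_kubernetes secret and its keys could be consumed in Terraform (the published repository may structure this differently):

# Sketch only: read the main_kubernetes secret from AWS Secrets Manager and
# decode its keys (cluster_id, cluster_oidc_issuer_url, oidc_provider_arn).
data "aws_secretsmanager_secret" "main_kubernetes" {
  name = "main_kubernetes"
}

data "aws_secretsmanager_secret_version" "main_kubernetes" {
  secret_id = data.aws_secretsmanager_secret.main_kubernetes.id
}

locals {
  cluster = jsondecode(data.aws_secretsmanager_secret_version.main_kubernetes.secret_string)
  # e.g. local.cluster.cluster_id or local.cluster.oidc_provider_arn
}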

We also need to add a Kubernetes secret to secure the HTTPS URL of the Prometheus server. Do the following:

  • Run htpasswd -c auth username, replacing "username" with the user you intend to use to log in to your Prometheus server console. This command will ask for the password you would like to use and will save the credentials in a file called "auth".
[root@app]# htpasswd -c auth username
New password:
Re-type new password:
Adding password for user username
  • Run kubectl create secret generic http-auth --from-file=auth -n prometheus; this creates the Kubernetes secret from the credentials generated above. If the prometheus namespace does not exist yet (which is the case before the first run), you can execute this step later. A Terraform alternative is sketched below.
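
If you prefer to keep this step in Terraform as well, the same secret could be created with the kubernetes_secret resource. This is only a sketch and assumes the htpasswd-generated auth file sits next to the Terraform code:

# Sketch only: Terraform-managed equivalent of the kubectl command above.
resource "kubernetes_secret" "http_auth" {
  metadata {
    name      = "http-auth"
    namespace = "prometheus"
  }
  data = {
    auth = file("${path.module}/auth")
  }
  type = "Opaque"
}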

Installation

Once you have completed the configuration changes for your environment you can run the Terraform code manually or using any of the popular pipeline integrations such as GitHub Actions, GitLab pipelines, Jenkins, etc.
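
For a manual run the usual sequence applies, assuming your AWS credentials and kubeconfig are already set up:

terraform init
terraform plan
terraform apply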

Verify your installation

Once your Terraform code has finished successfully, visit your URL.

Enter the username and password that you previously defined as a Kubernetes secret.

Enjoy your HTTPS- and password-secured Prometheus server running on EKS/Kubernetes.

Additional security options

Additionally, you can restrict the IP range the URL can be accessed from to further improve security by adding the following to your Terraform code. Replace 192.168.1.1/32 with your IP range or a single IP. This is useful, for example, if this Prometheus instance will serve as a slave Prometheus server in a federated setup.

"nginx.ingress.kubernetes.io/whitelist-source-range" = "192.168.1.1/32"

The modified terraform code would look like this.

resource "kubernetes_ingress_v1" "prometheus-ingress" {
wait_for_load_balancer = true
metadata {
name = "${var.prometheus_name}-ingress"
namespace = var.prometheus_namespace
annotations = {
"cert-manager.io/cluster-issuer" = "letsencrypt-prod"
"kubernetes.io/tls-acme" = "true"
"nginx.ingress.kubernetes.io/auth-type" = "basic"
"nginx.ingress.kubernetes.io/auth-secret" = data.kubernetes_secret.prometheus-http.metadata[0].name
"nginx.ingress.kubernetes.io/auth-realm" = "Authentication Required - Prometheus"
"nginx.ingress.kubernetes.io/whitelist-source-range" = "192.168.1.1/32"
}
}

spec {
ingress_class_name = "nginx"
tls {
hosts = ["${var.prometheus_hostname}.${var.main_domain}"]
secret_name = "${var.prometheus_hostname}.${var.main_domain}.tls"
}
rule {
host = "${var.prometheus_hostname}.${var.main_domain}"
http {
path {
backend {
service {
name = kubernetes_service_v1.prometheus-service.metadata.0.name
port {
number = var.prometheus_port
}
}
}

path = "/"
}

}
}

}
}

A reference to a resource type must be followed by at least one attribute access, specifying the resource name.

There are multiple posts going around about this error; in my case the problem was caused by missing quotes around the value of provisioningMode. Without the quotes, Terraform parses efs-ap as a reference to a resource type rather than as a string, hence the error. So instead of this:

resource "kubernetes_storage_class" "us-east-1a" {
  metadata {
    name = "us-east-1a"
  }
  storage_provisioner = "efs.csi.aws.com"
  reclaim_policy      = "Retain"
  parameters = {
    provisioningMode = efs-ap
    fileSystemId     = var.us-east-1a-vol
    directoryPerms   = "777"
  }
  mount_options = ["file_mode=0700", "dir_mode=0777", "mfsymlinks", "uid=1000", "gid=1000", "nobrl", "cache=none"]
}

Use this:

resource "kubernetes_storage_class" "us-east-1a" {
  metadata {
    name = "us-east-1a"
  }
  storage_provisioner = "efs.csi.aws.com"
  reclaim_policy      = "Retain"
  parameters = {
    provisioningMode = "efs-ap"
    fileSystemId     = var.us-east-1a-vol
    directoryPerms   = "777"
  }
  mount_options = ["file_mode=0700", "dir_mode=0777", "mfsymlinks", "uid=1000", "gid=1000", "nobrl", "cache=none"]
}

Azure Terraform – Error: expected “user_data” to be a base64 string, got #!/usr/bin/bash

I am testing an Azure setup with Terraform to automate VM instance provisioning using user data scripts. User data is a set of scripts or other metadata that is inserted into an Azure virtual machine at provision time. It runs right after the OS has been installed and the server boots. I would typically use it to update the OS and install the tools and software I need on the provisioned node, using a shell script defined as a template_file data source like this:

data "template_file" "init-wp-server" {
  template = file("./init-wp-server.sh")
}

I tried to specify the user data file the same way I used to do on AWS and I was getting the following error:

Error: expected "user_data" to be a base64 string, got #!/usr/bin/bash

This is because Azure requires the user data to be Base64-encoded. So instead of using this in the azurerm_linux_virtual_machine definition:

  user_data = data.template_file.init-wp-server.rendered

The following should be used:

  user_data = base64encode(data.template_file.init-wp-server.rendered)
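
As a side note, the hashicorp/template provider behind the template_file data source has been archived; the built-in templatefile() function can replace it entirely, for example:

  # Alternative sketch: render and encode the script with the built-in
  # templatefile() function instead of the template_file data source.
  user_data = base64encode(templatefile("./init-wp-server.sh", {}))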

Azure Terraform Error – Please change your resource to Standard sku for Availability Zone support.

I am testing creating resources with Terraform on Azure. I tried to force one AZ per public IP and ran the following code:

# Create public IPs
resource "azurerm_public_ip" "external_ip" {
  count               = length(var.wp_test_instance_location)
  name                = "external_ip-0${count.index}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Dynamic"
  zones               = tolist([var.wp_test_instance_location[count.index], ])
  tags = {
    environment = "test"
    terraform   = "Y"
  }
}

Then I ran into the following error message:

Error: creating/updating Public Ip Address: (Name "external_ip-01" / Resource Group "rg-bright-liger"): network.PublicIPAddressesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="ZonesNotAllowedInBasicSkuResource" Message="Request for resource /subscriptions/[MASKED]/resourceGroups/rg-bright-liger/providers/Microsoft.Network/publicIPAddresses/external_ip-01 is invalid as it is Basic sku and in an Availability Zone which is a deprecated configuration.  Please change your resource to Standard sku for Availability Zone support." Details=[]

The issue is that the SKU setting for the public IP has to be Standard instead of Basic (which is the default, and I didn't set it before). Also, the allocation_method parameter has to be changed from Dynamic to Static. The corrected code looks like this:

# Create public IPs
resource "azurerm_public_ip" "external_ip" {
  count               = length(var.wp_test_instance_location)
  name                = "external_ip-0${count.index}"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  allocation_method   = "Static"
  zones               = tolist([var.wp_test_instance_location[count.index], ])
  sku                 = "Standard"
  tags = {
    environment = "test"
    terraform   = "Y"
  }
}