
AKS federated identity credentials let Kubernetes pods access Azure resources such as Key Vault or a Storage Account without providing account credentials or connection strings in the code.

Azure Kubernetes Service, or AKS, is a managed Kubernetes platform service which provides an environment for cloud-native apps. The service interoperates with Azure security, identity, cost management and migration services. The platform is popular for microservice applications, but it is also a perfect environment for long-running data processes and orchestrations.

Regardless of functionality, Azure cloud-native applications require access to other services, for example to perform CRUD operations on files in a Data Lake or Storage Account, or to fetch a secret from a Key Vault. The code usually takes a connection string or credentials to create a client for the service and uses those credentials to perform the task. Here are some examples.

Python code that creates a ClientSecretCredential using a service principal client id and secret:

from azure.identity import ClientSecretCredential
token_credential = ClientSecretCredential(
    self.active_directory_tenant_id,
    self.active_directory_application_id,
    self.active_directory_application_secret
)

# Instantiate a BlobServiceClient using a token credential
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient(account_url=self.oauth_url, credential=token_credential)

C# code that creates a blob service client using a connection string:

BlobServiceClient blobServiceClient = new BlobServiceClient("DefaultEndpointsProtocol=https;AccountName=<your-account-name>;AccountKey=<your-account-key>;EndpointSuffix=core.windows.net");

The Default Credentials

The DefaultAzureCredential class in the Azure.Identity namespace is usually the best option: it picks up credentials from the environment context where the code is running, so you avoid passing sensitive information to the code and the overhead of managing credentials in Kubernetes secrets. The catch is that the default credential chain still needs a security context to resolve, for example a developer logged in to Azure through the CLI.
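Under the hood, DefaultAzureCredential tries a chain of credential sources in order (environment variables, managed identity, Azure CLI login, and so on) and uses the first one that succeeds. A minimal pure-Python sketch of that chaining idea; the providers below are illustrative stand-ins, not the SDK's real internals:

```python
# Illustration of the credential-chain idea behind DefaultAzureCredential:
# try each provider in order and return the first token that succeeds.
# These providers are stand-ins, not the real Azure SDK classes.

class CredentialUnavailableError(Exception):
    pass

def environment_credential():
    # Would read AZURE_CLIENT_ID / AZURE_CLIENT_SECRET / AZURE_TENANT_ID
    raise CredentialUnavailableError("no environment variables set")

def managed_identity_credential():
    # Would call the instance metadata endpoint when running inside Azure
    return "token-from-managed-identity"

def azure_cli_credential():
    # Would shell out to `az account get-access-token`
    return "token-from-azure-cli"

def get_default_credential(chain):
    for provider in chain:
        try:
            return provider()
        except CredentialUnavailableError:
            continue
    raise RuntimeError("no credential source available")

token = get_default_credential(
    [environment_credential, managed_identity_credential, azure_cli_credential]
)
print(token)  # the first provider that succeeds wins
```

This is why the same code works unchanged on a developer laptop (CLI login) and in a pod with a federated identity (managed identity): only the winning link in the chain differs.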

AKS Federated Identity

AKS has introduced federated identity to solve the problem of providing that security context. This pod-managed identity allows the hosted workload or application to access resources through Azure Active Directory (Azure AD). For example, a workload stores files in Azure Storage, and when it needs to access those files, the pod authenticates against the resource as an Azure managed identity. The feature works both in the cloud and on on-premises clusters, but it is still in preview.

Source: https://learn.microsoft.com/en-us/azure/aks/workload-identity-overview

Requirements and Configurations

The feature requires the cluster to have the OIDC issuer enabled, which allows the API server to publish discoverable public signing keys. You can use the CLI to create or update a cluster with the --enable-oidc-issuer and --enable-managed-identity flags. In Terraform, set oidc_issuer_enabled and workload_identity_enabled to true.

az aks update -g myResourceGroup -n myAKSCluster --enable-oidc-issuer --enable-managed-identity

To get the OIDC issuer URL, which you will need in the next steps, use the following CLI command:

az aks show -n myAKSCluster -g myResourceGroup --query "oidcIssuerProfile.issuerUrl" -otsv

User Assigned Managed Identity

The next step is to create a user-assigned managed identity, which enables Azure resources to authenticate to services that support Azure AD authentication without storing credentials in code. The identity can be created in the portal by searching for “User Assigned Managed Identity”, with Terraform, or with the CLI:

az identity create --name myIdentity --resource-group myResourceGroup

resource "azurerm_user_assigned_identity" "saman_identity_poc" {
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  name                = "saman-cluster-poc"
}

Kubernetes Service Account

With the Azure user-assigned identity created, we have to create a Kubernetes service account, which provides an identity for processes that run in a pod. We have to provide the client id of the managed identity in the annotations section; the client id is shown on the Overview page of the user-assigned managed identity. Create the service account using the kubectl CLI or Terraform as follows:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: saman-identity-poc
  annotations:
    azure.workload.identity/client-id: THE CLIENT ID OF THE IDENTITY
  labels:
    azure.workload.identity/use: "true"
EOF

Terraform:

resource "kubernetes_service_account" "aks_poc" {
  metadata {
    name      = "saman-identity-poc"
    namespace = "saman"
    annotations = {
      "azure.workload.identity/client-id" = data.azurerm_user_assigned_identity.wlid_managed_identity.client_id
    }
    labels = {
      "azure.workload.identity/use" = "true"
    }
  }
}

Federated Identity Credentials

The federation can be created now that we have the two identities, one from Azure and one from Kubernetes. For illustration, the screenshot from the Azure portal shows the information required in the user-assigned identity. You can reach this view by selecting the identity created earlier in the portal and choosing “Federated credentials” from the navigation. Click the “Add Credentials” button on the federated credentials page, then choose “Kubernetes accessing Azure resources” to see the following fields:

AKS Federated Identity: the add-credentials fields.
  • The cluster issuer URL is the one we got in the OIDC step.
  • The namespace is the one used to create the Kubernetes service account.
  • The service account is the one created in the previous step using kubectl, in my example “saman-identity-poc”.
  • The name field is a unique name of your choice for this federation.

When the fields are filled in, save the form to create the federation. You can achieve the same result with the following Terraform definition:

resource "azurerm_federated_identity_credential" "saman_identity_poc" {
  name                = "saman-identity-poc-federated-credential"
  resource_group_name = azurerm_resource_group.rg.name
  parent_id           = azurerm_user_assigned_identity.saman_identity_poc.id
  issuer              = azurerm_kubernetes_cluster.aks_cluster.oidc_issuer_url
  subject             = "system:serviceaccount:saman:saman-identity-poc"
  audience            = ["api://AzureADTokenExchange"]
}
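The subject field ties the federation to one specific service account and must follow the pattern system:serviceaccount:<namespace>:<service-account-name>. A small helper (hypothetical, just to make the format explicit) using the names from this example:

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the subject string expected by a federated identity credential."""
    return f"system:serviceaccount:{namespace}:{service_account}"

print(federated_subject("saman", "saman-identity-poc"))
# system:serviceaccount:saman:saman-identity-poc
```

If the namespace or service account name in the subject does not match the Kubernetes side exactly, the token exchange fails, so it is worth double-checking this string.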

AKS Federated Identity Conclusion

  • Avoid passing credentials and connection strings to your code
  • Create a User Assigned Managed Identity
  • Create a Kubernetes service account
  • Create a federation between the identity and the service account
  • When the federation is created, assign roles to the user-assigned managed identity on the Azure resource and let the Azure identity providers take care of the rest

Databricks on Azure is essential in data, AI and IoT solutions, but the environment automation can be challenging. Azure DevOps is a great tool for automation: using Pipelines and product CLI integrations can minimise or even remove these challenges. My team is currently working on a cutting-edge IoT platform where data flows from edge devices to Azure. We are dealing with data which is sensitive and under GDPR, so no one should have direct access to the data platform in the production environments.

In the project, data is generated by sensors and sent to the cloud by the edge devices. Ingestion, processing and analysis of the data are too complicated for traditional relational databases, so other tools are needed to refine it. We use Databricks in our Lambda architecture for batch processing of data at rest and for predictive analytics and machine learning. This blog post is about Databricks cluster and environment management; I will not go deeper into the architecture or the IoT solution.

The Automation Problems

As in any reliable project, we have three environments: development, user acceptance testing (UAT) and production. In my two previous posts, Azure Infrastructure using Terraform and Pipelines and Implement Azure Infrastructure using Terraform and Pipelines, I reviewed in depth why and how Terraform solves environment generation and management problems. Let's study the code Terraform provides for Databricks.

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West US"
}

resource "azurerm_databricks_workspace" "example" {
  name                = "databricks-test"
  resource_group_name = azurerm_resource_group.example.name
  location            = azurerm_resource_group.example.location
  sku                 = "standard"

  tags = {
    Environment = "Production"
  }
}

Wait a minute, but that is only the empty environment!
What about the clusters, pools, libraries, secrets and workspaces?

The Solution: Databricks Automation with Azure DevOps

Fortunately, Databricks has a CLI which we can use for Databricks environment automation in Azure DevOps Pipelines. Pipelines let us run PowerShell or Bash scripts as a job step, and by using the CLI in our Bash script we can create, manage and maintain our Databricks environments. This approach removes the need for any manual work in the production Databricks workspace. Let's review the bash script.

#!/bin/bash
set -e

CLUSTER_JSON_FILE="./cluster.json"
INSTANCE_POOL_JSON_FILE="./instance-pool.json"
WAIT_TIME=10

wait_for_cluster_running_state () {
  while true; do
    CLUSTER_STATUS=$(databricks clusters get --cluster-id $CLUSTER_ID | jq -r '.state')
    if [[ $CLUSTER_STATUS == "RUNNING" ]]; then
        echo "Operation ready."
        break
    fi
    echo "Cluster is still in pending state, waiting $WAIT_TIME sec.."
    sleep $WAIT_TIME
  done
}

wait_for_pool_running_state () {
  while true; do
    POOL_STATUS=$(databricks instance-pools get --instance-pool-id $POOL_INSTANCE_ID | jq -r '.state')
    if [[ $POOL_STATUS == "ACTIVE" ]]; then
        echo "Operation ready."
        break
    fi
    echo "Pool instance is still in not ready yet, waiting $WAIT_TIME sec.."
    sleep $WAIT_TIME
  done
}

arr=( $(databricks clusters list --output JSON | jq -r '.clusters[].cluster_name'))
echo "Current clusters:"
echo "${arr[@]}"

CLUSTER_NAME=$(cat $CLUSTER_JSON_FILE | jq -r  '.cluster_name')

# Cluster already exists
if [[ " ${arr[@]} " =~ $CLUSTER_NAME ]]; then
    echo 'The cluster is already created, skipping the cluster operation.'
    exit 0
fi

# Cluster does not exist
if [[ -z "$arr" || ! " ${arr[@]} " =~ $CLUSTER_NAME ]]; then
  printf "Setting up the databricks environment. Cluster name: %s\n" $CLUSTER_NAME

  #Fetching pool-instances
  POOL_INSTANCES=( $(databricks instance-pools list --output JSON | jq -r 'select(.instance_pools != null) | .instance_pools[].instance_pool_name'))
  POOL_NAME=$(cat $INSTANCE_POOL_JSON_FILE | jq -r  '.instance_pool_name')
  if [[ -z "$POOL_INSTANCES" || ! " ${POOL_INSTANCES[@]} " =~ $POOL_NAME ]]; then
    # Creating the pool-instance
    printf 'Creating new Instance-Pool: %s\n' $POOL_NAME
    POOL_INSTANCE_ID=$(databricks instance-pools create --json-file $INSTANCE_POOL_JSON_FILE | jq -r '.instance_pool_id')
    wait_for_pool_running_state
  fi

  if [[ " ${POOL_INSTANCES[@]} " =~ $POOL_NAME ]]; then
    POOL_INSTANCE_ID=$(databricks instance-pools list --output JSON | jq -r --arg I "$POOL_NAME" '.instance_pools[] | select(.instance_pool_name == $I) | .instance_pool_id')
    printf 'The Pool already exists with id: %s\n' $POOL_INSTANCE_ID
  fi

  # Transforming the cluster JSON
  NEW_CLUSTER_CONFIG=$(cat $CLUSTER_JSON_FILE | jq -r --arg var $POOL_INSTANCE_ID '.instance_pool_id = $var')

  # Creating databricks cluster with the cluster.json values
  printf 'Creating cluster: %s\n' $CLUSTER_NAME

  CLUSTER_ID=$(databricks clusters create --json "$NEW_CLUSTER_CONFIG" | jq -r '.cluster_id')
  wait_for_cluster_running_state

  # Adding cosmosdb Library to the cluster
  printf 'Adding the cosmosdb library to the cluster %s\n' $CLUSTER_ID
  databricks libraries install \
    --cluster-id $CLUSTER_ID \
    --maven-coordinates "com.microsoft.azure:azure-cosmosdb-spark_2.4.0_2.11:1.3.5"
  wait_for_cluster_running_state
  echo 'CosmosDB-Spark -library added successfully.'
  
  echo "Databricks setup created successfully."
fi

First, we create a pool for the cluster, waiting for the completion status and the id. Then we create a cluster using that pool and wait for it to become ready. Once the cluster is ready, we can use the cluster id to add libraries, as the script above does. There are two supporting JSON files which include the environment properties.
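The one non-obvious step in the script is the jq transformation that injects the freshly created pool id into the cluster definition before creation. The same transformation in Python, for clarity (the file contents mirror cluster.json from this post; the pool id value is made up):

```python
import json

# Equivalent of the script's:
#   cat $CLUSTER_JSON_FILE | jq -r --arg var $POOL_INSTANCE_ID '.instance_pool_id = $var'
def inject_pool_id(cluster_config: dict, pool_instance_id: str) -> dict:
    """Return a copy of the cluster config with instance_pool_id replaced."""
    updated = dict(cluster_config)
    updated["instance_pool_id"] = pool_instance_id
    return updated

cluster = {
    "cluster_name": "main-cluster",
    "spark_version": "6.4.x-scala2.11",
    "instance_pool_id": "FROM_EXTERNAL_SOURCE",  # placeholder overwritten at run time
}
new_config = inject_pool_id(cluster, "pool-0123456789")
print(json.dumps(new_config, indent=2))
```

Keeping the placeholder value in cluster.json makes it obvious that the real pool id is always supplied by the pipeline, never committed to the repository.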

cluster.json
{
    "cluster_name": "main-cluster",
    "spark_version": "6.4.x-scala2.11",
    "autoscale": {
        "min_workers": 1,
        "max_workers": 4
    },
    "instance_pool_id": "FROM_EXTERNAL_SOURCE"
}

instance-pool.json
{
    "instance_pool_name": "main-pool",
    "node_type_id": "Standard_D3_v2",
    "min_idle_instances": 2,
    "idle_instance_autotermination_minutes": 60
}

In our Azure DevOps Pipelines definition, we first have to install the Python runtime and then the Databricks CLI. With the required runtimes in place, we can run the bash script. Here is the code snippet for the pipeline steps:

- bash: |
    python -m pip install --upgrade pip setuptools wheel
    python -m pip install databricks-cli

    databricks --version
  displayName: Install Databricks CLI

- bash: |
    cat >~/.databrickscfg <<EOL
    [DEFAULT]
    host = https://westeurope.azuredatabricks.net
    token = $(DATABRICKS_TOKEN)
    EOL
  displayName: Configure Databricks CLI

- task: ShellScript@2
  inputs:
    workingDirectory: $(Build.SourcesDirectory)/assets/databricks
    scriptPath: $(Build.SourcesDirectory)/assets/databricks/setup.sh
    args: ${{ parameters.project }}-${{ parameters.workspace }}
  displayName: Setup Databricks

To run the script against the Databricks environment you need a token, which can be generated under the workspace user settings.
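The configure step above writes the token into ~/.databrickscfg, which is a plain INI file the CLI reads. A quick way to sanity-check that the file the pipeline produces parses as expected (the token value here is a made-up example):

```python
import configparser

# The CLI config written by the pipeline step is plain INI; parse it the same way.
sample = """\
[DEFAULT]
host = https://westeurope.azuredatabricks.net
token = dapi-example-token
"""

config = configparser.ConfigParser()
config.read_string(sample)
print(config["DEFAULT"]["host"])   # https://westeurope.azuredatabricks.net
```

Because the token lands in a file on the build agent, store DATABRICKS_TOKEN as a secret pipeline variable so it never appears in logs or source control.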

Databricks Automation with Azure DevOps Pipelines: the Databricks token settings.

The environment variables and settings are in JSON files, and the complete solution for Databricks Automation with Azure DevOps Pipelines, along with the supporting tool files, is available in my GitHub repository.

Continuous integration and delivery are part of DevOps processes and currently part of most software projects. In my previous blog post, I reviewed the problem and why infrastructure as code (IaC) matters; the analysis included the architecture diagram and the Azure components. In this blog post, as the continuation, you can read and learn how to implement Azure infra using Terraform and Pipelines as part of your CI/CD in Azure DevOps. This blog post includes a complete technical guide.

Terraform

Terraform is an infrastructure-as-code tool for managing and developing cloud components. The tool is not tied to a specific cloud ecosystem, which makes it popular among developers working in different ecosystems. Terraform uses various providers, and in Microsoft's cloud it uses the Azure provider (azurerm) to create, modify and delete Azure resources. As explained in the previous blog post, you need the Terraform CLI installed on the environment and an Azure account to deploy your resources. To verify the installation, run "terraform --version" in the shell terminal to view the installed version of the CLI.

Terraform uses a declarative state model: the developer writes the desired state of the wanted infrastructure, and Terraform takes care of the rest to achieve the result. The workspace consists of one or many .tf files; the folder on the environment also includes hidden settings and plugin files. The basic Azure provider block, with a subscription and a resource group block, looks like this in the main.tf file.

provider "azurerm" {
    version         = "~>1.32.0"
    use_msi         = true
    subscription_id = "xxxxxxxxxxxxxxxxx"
    tenant_id       = "xxxxxxxxxxxxxxxxx"
}

resource "azurerm_resource_group" "rg" {
    name     = "myExampleGroup"
    location = "westeurope"
}
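The declarative model can be illustrated with a toy reconciliation step: compare the desired set of resources against the current set and derive the actions needed. This is purely an illustration of the idea, not how Terraform's engine is actually implemented:

```python
# Toy illustration of a declarative plan: given current and desired resource
# names, compute what must be created, destroyed, or kept.
def plan(current: set, desired: set) -> dict:
    return {
        "create": sorted(desired - current),
        "destroy": sorted(current - desired),
        "keep": sorted(current & desired),
    }

actions = plan(
    current={"rg-old", "storage-main"},
    desired={"rg-new", "storage-main"},
)
print(actions)
```

This is why the state file matters so much: without an accurate record of "current", the diff against "desired" cannot be computed correctly.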

The provider needs to authenticate to Azure before it can provision the infrastructure. At the moment there are four authentication models:

  • Authenticating using the Azure CLI
  • Authenticating using a Managed Service Identity
  • Authenticating using a service principal with a client certificate
  • Authenticating using a service principal with a client secret

Azure DevOps Pipelines Structure

The pipeline definition is written in a YAML file, which includes one or many stages of the CI/CD process. It's worth mentioning that Azure Pipelines does not currently support all YAML features. This blog post is not about YAML; to read more, refer to “Learn YAML in Y minutes”. In the structure of the YAML build file, a stage is the top level of a specific process and includes one or many jobs, each of which has one or many steps. Here is an example of the pipeline structure:

  • Stage A
    • Job 1
      • Step 1.1
      • Step 1.2
    • Job 2
      • Step 2.1
      • Step 2.2
  • Stage B

Terraform + Azure DevOps Pipelines

Now we have the basic understanding to implement Azure infra using Terraform and Azure DevOps Pipelines. With the knowledge of Terraform definition files and the YAML format, it is time to jump to the implementation. In the root folder of my GitHub InfraProvisioning code repository, there are three folders and the azure-pipelines.yml file. The YAML file is the build definition, with references to the subfolders that contain the job and step definitions for each stage. The stages in the YAML file map to the validate, plan and apply steps Terraform requires to provision the model.

variables:
  project: shared-resources

stages:
  - stage: validate
    displayName: Validate
    variables:
      - group: shared
    jobs:
      - template: pipelines/jobs/terraform/validate.yml

  - stage: plan_dev
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/master'))
    displayName: Plan for development
    variables:
      - group: shared
      - group: development
    jobs:
      - template: pipelines/jobs/terraform/plan.yml
        parameters:
          workspace: dev

  - stage: apply_dev
    displayName: Apply for development
    variables:
      - group: shared
      - group: development
    jobs:
      - template: pipelines/jobs/terraform/apply.yml
        parameters:
          project: ${{ variables.project }}
          workspace: dev

Each stage in the azure-pipelines.yml file refers to sub .yml files:

  • jobs/terraform/validate.yml: downloads the latest version of Terraform, installs it and validates the installation.
  • jobs/terraform/plan.yml: gets the existing infrastructure definition, compares it to the changes, generates the modified infrastructure plan and publishes the plan for the next stage.
  • jobs/terraform/apply.yml: gets the plan file, extracts it, applies the changes and saves the output back to the storage account for the next run and comparison.

One more thing: in my previous blog post, I explained how Terraform uses blob storage to save the state files. Include the following job in your build definition if you want to create those initial Azure resources automatically; you can comment the step out once you have the required blob storage. After creating the storage account, create a new blob container inside it and a new secret. You will need these values later when adding the variables to the Azure DevOps environment.

jobs:
  - job: runbash
    steps:
      - task: Bash@3
        inputs:
          targetType: 'filePath' # Options: filePath, inline
          filePath: ./tools/create-terraform-backend.sh
          arguments: dev

Settings in Azure DevOps

Most of the environment variables, like the Azure Resource Manager values, are defined as variable groups in the Library section under Pipelines in the left navigation. The idea is to have as many environments as necessary in different subscriptions. If you inspect the apply.yml file, you can find the following variables:

  • ARM_CLIENT_ID: $(ARM_CLIENT_ID)
  • ARM_CLIENT_SECRET: $(ARM_CLIENT_SECRET)
  • ARM_TENANT_ID: $(ARM_TENANT_ID)
  • ARM_SUBSCRIPTION_ID: $(ARM_SUBSCRIPTION_ID)

To implement Azure infra using Terraform and Pipelines, we need to create an application in Azure Active Directory so Azure DevOps can access our resources in Azure. Follow these steps to create the application:

  1. Navigate to Azure Portal and choose your Active Directory from the navigation.
  2. Under AAD, choose App registrations and create a new application. You can name it TerraformAzureDevOps.
  3. From the application's main page, copy the Application ID and the Tenant ID. We will need these values later.
  4. Choose Certificates & secrets from the navigation and create a new secret. Copy this value before leaving the view, because you will see it only once.

Back in Azure DevOps, under the Library section, we have to create the following variable groups with the following variables:

Name: Development

  • ARM_CLIENT_ID: [The application ID we created in the AAD]
  • ARM_CLIENT_SECRET: [The secret from the AAD]
  • ARM_SUBSCRIPTION_ID: [The Subscription ID from Azure]
  • TERRAFORM_BACKEND_KEY: [The secret from the storage account created using the create-terraform-backend.sh script ]
  • TERRAFORM_BACKEND_NAME: [The name of the blob folder created using the create-terraform-backend.sh script]
  • WORKSPACE: [Your choice of name, e.g. Dev]

Name: Shared

  • ARM_TENANT_ID: [The AAD tenant ID]
  • TERRAFORM_VERSION: 0.12.18

Under each variable group, the “Allow access to all pipelines” setting should be on!

Implement Azure infra Modifications

Once you have the build definition up and running in your Azure DevOps environment, all you have to do in the future is edit the terraform/main.tf file to manage your Azure infrastructure.

Implementing Azure infra using Terraform and Pipelines will save you a lot of time and money, not to mention it will also improve the quality, maintainability and security of the environment.

You can find the complete solution and source files in my GitHub repository. Lastly, I want to give credit to my colleague Antti Kivimäki from Futurice, who has helped my team with difficult Terraform tasks.

The size and complexity of cloud infrastructure have a direct relationship with the management, maintainability and cost control of cloud projects. It might be easy to create resources using the Azure Portal or the CLI, but when there is a need to update properties, change plans, upgrade services or re-create all services in new regions, the situation gets trickier. The combination of Azure DevOps Pipelines and Terraform brings an enterprise-level solution to this problem.

The Infrastructure Problem

My previous blog post introduced a simple SaaS architecture which included App Services, a SQL database, a storage account and background processes using Azure Functions. But why should we create automation for such a small environment?

In my project, the need started to rise as I was creating background processes for the SaaS platform. Let's assume we have three Azure Functions to send emails, send SMS messages and modify user pictures, each started by a storage account queue trigger. What if each Function has to use the API to write values to the database, and for some reason the path of the API changes? It would be insane to manually update the URL of the new endpoint in three places, not to speak of a service running in three Azure regions, where no one should update any value manually in the production environment without an approval process.

Azure Infrastructure using Terraform

There are a few ways to get infrastructure as code with Azure DevOps Pipelines. You can always export Azure resource ARM templates and build the pipeline on top of them, but I found that complicated and time-consuming. The second option was to use the Azure CLI in the pipelines to create resources, but maintenance becomes a problem: the more resources you have, the more laborious managing the script gets.

The third option was to use third-party tools, and to be honest, I fell in love with Terraform. The tool has providers for the major clouds, and of course for Azure. The easiest way to start is to install the CLI on your computer and follow the 12-section learning path. Let's add the supplementary parts to the existing architecture:

Azure Infrastructure using Terraform and Pipelines

The explanations for items one to five are in my previous blog post, so let's concentrate on the new parts.

The New Components and the Build

Azure Infrastructure using Terraform and Pipelines requires knowledge of Terraform, and at this stage I assume that you are familiar with the Terraform init, plan and apply terms. When running Terraform locally, all configuration and metadata files are stored on your computer's hard disk. By moving the build to Azure DevOps Pipelines, we need somewhere else to store the Terraform-generated metadata and infrastructure code files.

Git repositories are a perfect place to store the Terraform infrastructure code files (.tf) and any other tools that are part of the pipeline. In my project I have created a separate repository for the infrastructure and named it InfraProvisioning.

To save the Terraform state and metadata files, we need different storage than the Git repository: each Terraform execution compares the current changes with the existing infrastructure and overwrites the existing state. Azure Blob Storage is a perfect location for these files; the resource is marked as number six in the architecture diagram. But how should we automatically create the storage and include it as part of the automated infrastructure?
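Once such a storage account exists, Terraform is pointed at it through an azurerm backend block in the configuration. A sketch along these lines (the names are placeholders for whatever the backend script creates; the access key is usually supplied via the ARM_ACCESS_KEY environment variable rather than committed):

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "shared-resources"   # placeholder
    storage_account_name = "mytfstatestorage"   # placeholder
    container_name       = "tfstate"            # placeholder
    key                  = "terraform.tfstate"
  }
}
```

With this in place, every pipeline run reads and writes the same remote state, which is what makes the plan/apply stages comparable between runs.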

I have created a Bash script, available for download from my GitHub project. The script logs in to Azure using the CLI and creates a shared resource group, a storage account and blob storage. The Bash file is part of the build process and parametrised, but all parameters can be replaced with hard-coded values in the script. The Bash file is the initial part of the pipeline, and how to implement the pipeline is the subject of my next blog post.