Apache Airflow Secrets

Vault secrets backend with Kubernetes auth

There is often more than one environment in a product infrastructure. A good approach is to split the chart settings into separate environment values files and to create as many Vault roles as you have environments.

Vault ACL policies

The simplest policy you can create to allow access to the dev environment secrets from anywhere looks like this:

path "dev/*" {    
    capabilities = ["read", "list"]
}
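
If you manage Vault with code rather than the UI, the same policy can be registered and bound to a per-environment Kubernetes auth role with the hvac Python client. This is a minimal sketch under assumptions: the admin token, the airflow-worker service account name, the airflow-dev namespace, and the TTL are placeholders that must match your own cluster and chart settings.

import hvac

# Assumes the Vault address is reachable and the token can manage policies and auth roles.
client = hvac.Client(url='https://vault.product.internal.domain', token='<admin-token>')

# Register the policy shown above under the name "dev".
client.sys.create_or_update_policy(
    name='dev',
    policy='path "dev/*" {\n    capabilities = ["read", "list"]\n}\n',
)

# Create one Kubernetes auth role per environment and attach the matching policy.
# Service account name and namespace are hypothetical placeholders.
client.auth.kubernetes.create_role(
    name='dev',
    bound_service_account_names=['airflow-worker'],
    bound_service_account_namespaces=['airflow-dev'],
    policies=['dev'],
    ttl='1h',
)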

Helm chart configuration

Create a values.dev.yaml file in the chart directory with a secrets configuration section:

config:
  secrets:
    backend: 'airflow.providers.hashicorp.secrets.vault.VaultBackend'
    backend_kwargs: '{"auth_type": "kubernetes", "kubernetes_role": "dev", "kubernetes_jwt_path": "/var/run/secrets/kubernetes.io/serviceaccount/token", "connections_path": "connections", "variables_path": "variables", "url": "https://vault.product.internal.domain", "auth_mount_point": "path", "mount_point": "dev/microservice-secrets", "kv_engine_version": 1}'

Other config sections from the main values.yaml file are not overwritten: Helm merges the values files, so only the keys defined in values.dev.yaml are overridden.
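
Because backend_kwargs is a JSON document embedded in a YAML string, it is easy to break the quoting when editing it by hand. A small helper sketch that builds the exact string with the standard json module, using the same values as above:

import json

backend_kwargs = json.dumps({
    'auth_type': 'kubernetes',
    'kubernetes_role': 'dev',
    'kubernetes_jwt_path': '/var/run/secrets/kubernetes.io/serviceaccount/token',
    'connections_path': 'connections',
    'variables_path': 'variables',
    'url': 'https://vault.product.internal.domain',
    'auth_mount_point': 'path',
    'mount_point': 'dev/microservice-secrets',
    'kv_engine_version': 1,
})

# Paste the printed single-line string into config.secrets.backend_kwargs.
print(backend_kwargs)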

Parameter descriptions:

  • kubernetes_role is the role name from the Access methods Roles section.
  • auth_mount_point is the path from the Access methods Configuration section.
  • mount_point is the path to the secret from the Secrets section.
  • kv_engine_version must match the version in the Secrets engine configuration.
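
To verify these values before (re)deploying the chart, the backend can also be instantiated directly from a Python shell in any pod that has apache-airflow-providers-hashicorp installed. This is a minimal sketch with the same kwargs; myvar and myconn are hypothetical secret names, used again in the test DAG below:

from airflow.providers.hashicorp.secrets.vault import VaultBackend

backend = VaultBackend(
    auth_type='kubernetes',
    kubernetes_role='dev',
    kubernetes_jwt_path='/var/run/secrets/kubernetes.io/serviceaccount/token',
    connections_path='connections',
    variables_path='variables',
    url='https://vault.product.internal.domain',
    auth_mount_point='path',
    mount_point='dev/microservice-secrets',
    kv_engine_version=1,
)

# Returns the string stored under the "value" key of the "variables/myvar" secret,
# or None if the secret does not exist.
print(backend.get_variable('myvar'))

# Returns an airflow.models.Connection built from the "connections/myconn" secret.
print(backend.get_connection('myconn'))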

Warning

The important fact is that Airflow DAG variables are NOT keys inside a single Vault secret. The Variable.get() function expects the variable name to be the secret name, the keyword value in the KEY field, and the value itself in the VALUE field.

Airflow connections (dev/microservice-secrets/connections) in Vault, however, have a logical and predictable structure. The secret name is the connection name. The KEY field inside the secret takes one of the connection attribute names: conn_type, host, login, password, port, or schema. Each attribute has its own value stored in the VALUE field.
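
To make this layout concrete, this is how the secrets used by the test DAG below could be written with the hvac client. This is a sketch for KV engine version 1; the admin token and all secret values are placeholders.

import hvac

client = hvac.Client(url='https://vault.product.internal.domain', token='<admin-token>')

# A variable: the secret name is the variable name, the only key is "value".
client.secrets.kv.v1.create_or_update_secret(
    mount_point='dev/microservice-secrets',
    path='variables/myvar',
    secret={'value': 'hello from vault'},
)

# A connection: the secret name is the connection id, the keys are the
# connection attributes.
client.secrets.kv.v1.create_or_update_secret(
    mount_point='dev/microservice-secrets',
    path='connections/myconn',
    secret={
        'conn_type': 'postgres',
        'host': 'db.product.internal.domain',
        'login': 'airflow',
        'password': 's3cr3t',
        'port': 5432,
        'schema': 'public',
    },
)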


Testing DAG

This example DAG demonstrates how to get variables and a connection from Vault.

Library import section.

import logging
from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.hooks.base import BaseHook
from airflow.operators.python import PythonOperator

The get_secrets function has two try blocks: the first one fetches the variables, the second fetches the connection.

def get_secrets(**kwargs):
    import time

    # First try block: read two variables from the Vault variables path.
    try:
        test_var = Variable.get(kwargs['var_name1'])
        logging.info(f'myvar value is {test_var}')
        test_var = Variable.get(kwargs['var_name2'])
        logging.info(f'myvar2 value is {test_var}')
    except Exception as e:
        logging.info(str(e))

    # Second try block: read a connection from the Vault connections path.
    try:
        conn = BaseHook.get_connection(kwargs['conn_id1'])
        logging.info(
            f"Password: {conn.password}, Login: {conn.login}, "
            f"URI: {conn.get_uri()}, Host: {conn.host}"
        )
    except Exception as e:
        logging.info(str(e))

    # Keep the task (and its pod) alive for a few minutes for debugging.
    time.sleep(3 * 60)

DAG and PythonOperator configuration. I use op_kwargs to pass the variable names and the connection name to the get_secrets function.

with DAG(
        'test_vault_connection',
        start_date=datetime(2022, 11, 15),
        schedule_interval=None
) as dag:
    test_task = PythonOperator(
        task_id='test-task',
        python_callable=get_secrets,
        op_kwargs={
            'var_name1': 'myvar',
            'var_name2': 'myvar2',
            'conn_id1': 'myconn',
        },
    )

Join all these parts together and you will have a complete working DAG. Since schedule_interval is None, it runs only when triggered manually.

Debugging DAG

A separate pod is created in Kubernetes for each DAG run. I added the time.sleep call to the get_secrets function to prevent the pod from being deleted immediately.

If the DAG does not log to stdout, you can check the log files inside the container:

$ kubectl -n namespace exec -it <pod_name> -- bash
airflow@<pod_name>:/opt/airflow$ cat logs/dag_id\=<dag_name>/*/*/*

Links

Vault Documentation

Apache Airflow Documentation

Hashicorp Vault Secrets Backend

Objects relating to sourcing connections & variables from Hashicorp Vault

Production Guide

November 23, 2022