Vault secrets backend with Kubernetes auth
There is usually more than one environment in a product infrastructure. A good approach is to split the chart settings into different environment values files and to create as many Vault roles as you have environments.
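For example, a chart directory split by environment might look like this (the file names are just an illustration):
airflow-chart/
  values.yaml         # settings shared by all environments
  values.dev.yaml     # dev-only overrides, including the Vault secrets backend
  values.prod.yaml    # prod-only overrides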
Vault ACL policies
The simplest policy you can create to allow access to the dev environment secrets from anywhere looks like this:
path "dev/*" {
capabilities = ["read", "list"]
}
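If you manage Vault from code rather than the UI or CLI, a minimal sketch with the hvac Python client could create this policy and a matching Kubernetes auth role as shown below. The Vault token, service account name and namespace are placeholder assumptions; double-check the hvac method signatures against your hvac version.
import hvac

# Placeholder address and token; in practice use a properly authenticated admin client.
client = hvac.Client(url='https://vault.product.internal.domain', token='<admin-token>')

# Create the ACL policy shown above for the dev environment.
dev_policy = '''
path "dev/*" {
  capabilities = ["read", "list"]
}
'''
client.sys.create_or_update_policy(name='dev', policy=dev_policy)

# Create a Kubernetes auth role named "dev" bound to the Airflow service account.
# "airflow" and "airflow-namespace" are example values.
client.auth.kubernetes.create_role(
    name='dev',
    bound_service_account_names='airflow',
    bound_service_account_namespaces='airflow-namespace',
    policies=['dev'],
    mount_point='path',  # the Kubernetes auth mount point used as auth_mount_point below
)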
Helm chart configuration
Create values.dev.yaml in the chart directory with a secrets configuration section:
config:
  secrets:
    backend: 'airflow.providers.hashicorp.secrets.vault.VaultBackend'
    backend_kwargs: '{"auth_type": "kubernetes", "kubernetes_role": "dev", "kubernetes_jwt_path": "/var/run/secrets/kubernetes.io/serviceaccount/token", "connections_path": "connections", "variables_path": "variables", "url": "https://vault.product.internal.domain", "auth_mount_point": "path", "mount_point": "dev/microservice-secrets", "kv_engine_version": 1}'
Because Helm merges values files, the other config sections from the main values.yaml file will not be overwritten.
Parameter descriptions:
kubernetes_role is the role name from the Access methods Roles section.
auth_mount_point is the path from the Access methods Configuration section.
mount_point is the path to the secret from the Secrets section.
kv_engine_version must be equal to the version in the Secrets engine configuration.
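One way to sanity-check these parameters is to instantiate the backend by hand from a Python shell inside an Airflow pod. The sketch below simply repeats the backend_kwargs from above; the get_variable call is part of the provider's secrets backend API, but verify it against your apache-airflow-providers-hashicorp version.
from airflow.providers.hashicorp.secrets.vault import VaultBackend

# Same kwargs as in backend_kwargs above.
backend = VaultBackend(
    auth_type='kubernetes',
    kubernetes_role='dev',
    kubernetes_jwt_path='/var/run/secrets/kubernetes.io/serviceaccount/token',
    connections_path='connections',
    variables_path='variables',
    url='https://vault.product.internal.domain',
    auth_mount_point='path',
    mount_point='dev/microservice-secrets',
    kv_engine_version=1,
)

# Should print the value stored under the "value" key of the "myvar" secret.
print(backend.get_variable('myvar'))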
Warning
The important fact is that Airflow DAG variable names are NOT keys inside Vault secrets. Variable.get() expects the secret name to be the variable name, with the literal keyword value in the KEY field and the value itself in the VALUE field.
However, Airflow connections (dev/microservice/connections) in Vault have a logically predictable structure. The secret name is the connection name, and the KEY field inside the secret takes one of the connection attribute names: conn_type, host, login, password, port or schema. Each attribute has its own value stored in the VALUE field.
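To illustrate this structure, writing a variable and a connection that the test DAG below can read might look roughly like this with hvac and a KV v1 engine mounted at dev/microservice-secrets (the secret values and connection attributes are made-up examples):
import hvac

client = hvac.Client(url='https://vault.product.internal.domain', token='<admin-token>')

# Variable: the secret name is the variable name, and the value lives under the "value" key.
client.secrets.kv.v1.create_or_update_secret(
    mount_point='dev/microservice-secrets',
    path='variables/myvar',
    secret={'value': 'some-variable-value'},
)

# Connection: the secret name is the connection id, keys are the connection attributes.
client.secrets.kv.v1.create_or_update_secret(
    mount_point='dev/microservice-secrets',
    path='connections/myconn',
    secret={
        'conn_type': 'postgres',
        'host': 'db.product.internal.domain',
        'login': 'airflow',
        'password': 'example-password',
        'port': '5432',
        'schema': 'public',
    },
)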
Testing DAG
This example DAG demonstrates how to get variables and a connection from Vault.
The library import section:
import logging
from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.hooks.base import BaseHook
from airflow.operators.python import PythonOperator
The get_secrets function has two try blocks: the first one gets the variables and the second one gets the connection.
def get_secrets(**kwargs):
    import time
    try:
        test_var = Variable.get(kwargs['var_name1'])
        logging.info(f'myvar value is {test_var}')
        test_var = Variable.get(kwargs['var_name2'])
        logging.info(f'myvar2 value is {test_var}')
    except Exception as e:
        logging.info(str(e))
    try:
        conn = BaseHook.get_connection(kwargs['conn_id1'])
        logging.info(
            f"Password: {conn.password}, Login: {conn.login},"
            f"URI: {conn.get_uri()}, Host: {conn.host}"
        )
    except Exception as e:
        logging.info(str(e))
    time.sleep(3 * 60)
DAG and PythonOperator configuration. I use op_kwargs to pass the variable names and the connection name to the get_secrets function.
with DAG(
    'test_vault_connection',
    start_date=datetime(2022, 11, 15),
    schedule_interval=None
) as dag:
    test_task = PythonOperator(
        task_id='test-task',
        python_callable=get_secrets,
        op_kwargs={
            'var_name1': 'myvar',
            'var_name2': 'myvar2',
            'conn_id1': 'myconn',
        },
    )
Join all these parts together and you will have a complete working DAG.
Debugging DAG
A separate pod is created in Kubernetes for each DAG run. I added the time.sleep call to the get_secrets function to prevent the pod from being deleted immediately.
If the DAG does not write its logs to stdout, you can check the log files inside the container:
$ kubectl -n namespace exec -it <pod_name> -- bash
airflow@<pod_name>:/opt/airflow$ cat logs/dag_id\=<dag_name>/*/*/*
Links
Hashicorp Vault Secrets Backend
Objects relating to sourcing connections & variables from Hashicorp Vault