Key Vault

Datalogz uses Azure Key Vault to store sensitive secrets required to run the application.

Throughout this guide, secrets are values that should be stored in the Key Vault, and environment variables are values that should be stored in a dot file (such as .env) in your project folder. Datalogz supports both the ManagedIdentityCredential and the EnvironmentCredential classes for authenticating to a Key Vault. ManagedIdentityCredential uses the VM's managed identity; EnvironmentCredential uses an App Registration as a Principal.

To set up your Azure Key Vault, follow these steps:

  • Log in to the Azure Portal.

  • Create a new Key Vault if one has not already been created.

  • Proceed to either the ManagedIdentityCredential or the EnvironmentCredential section below.

ManagedIdentityCredential (System)

  1. Navigate to the Virtual Machine you wish to use as a Managed System Identity and select Identity on the left sidebar.

  2. Enable System Assigned identity and add the following role assignments to the VM:

    1. “Key Vault Secrets Officer”

    2. “Virtual Machine Contributor”

    3. “Virtual Machine User Login”

  3. Navigate to Key Vault and select Access configuration on the left sidebar.

  4. Set permission model to “Azure role-based access control” and click Apply.

  5. The IAM role assignments were already configured in step 2. If desired, you can confirm that they are present on the Key Vault's IAM page.

  6. Add the Key Vault to the default subnet in the same virtual network as the VM.

    • You can find this in the Key Vault > Networking tab.

    • Select "Allow public access from specific virtual networks and IP addresses."

    • Add the existing virtual network where your VM is located.

    • A service endpoint will be created for this subnet.

      • Note: If the subnet cannot take additional service endpoints, a new subnet will be required.

  7. SSH into the VM and install the Azure CLI:

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

  8. Log in to Azure from the SSH terminal using the VM's system-assigned identity:

az login --identity

  9. Assign the identity to the VM and verify:

az vm identity assign --resource-group $AZURE_RESOURCE_GROUP_NAME --name $AZURE_VM_NAME
az ad sp list --display-name $AZURE_VM_NAME

EnvironmentCredential

  1. Create a new App Registration to act as a Principal for accessing the Key Vault.

  2. In the Key Vault's Access configuration, set the permission model to “Vault access policy”.

  3. Create an access policy and add the Secrets Officer role to the App Principal.

  4. Add the Key Vault to the default subnet in the same virtual network as the VM.

  5. After you SSH into the VM, make sure the following environment variables are set in the dot file of each of these two repositories so that the application can authenticate to the Key Vault:

    1. datalogz-bi-diagnostic

      • File: .prod.env

        AZURE_TENANT_ID=
        AZURE_CLIENT_ID=
        AZURE_CLIENT_SECRET=
    2. datalogz-bi-gateway

      • File: .env.prod

        AZURE_TENANT_ID=
        AZURE_CLIENT_ID=
        AZURE_CLIENT_SECRET=
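
Before starting the services, it can help to confirm that the three variables are actually populated in each dot file. The following is a minimal sketch and not part of Datalogz: parse_env_text and missing_credentials are hypothetical helper names, and the parser only understands simple KEY=value lines.

```python
"""Illustrative check that the EnvironmentCredential variables are set."""

# The three variables listed above for the EnvironmentCredential option.
REQUIRED = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(env: dict[str, str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]


# Hypothetical file content: tenant set, client ID empty, secret commented out.
sample = "AZURE_TENANT_ID=abc\nAZURE_CLIENT_ID=\n# AZURE_CLIENT_SECRET=...\n"
print(missing_credentials(parse_env_text(sample)))
# prints ['AZURE_CLIENT_ID', 'AZURE_CLIENT_SECRET']
```

To check a real file, pass its contents to parse_env_text, for example the text read from .prod.env or .env.prod. An empty list means all three variables are present.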

Environment Variables List

Repository: datalogz-bi-diagnostic

File: .prod.env

Pull the code and add the following environment variables to a dot file in your project directory named .prod.env using the correct values based on the examples provided.

ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=POSTGRES

# Key Vault Authentication
# Option 1 - If using Managed Identity Access to Key Vault
AZURE_RESOURCE_GROUP_NAME=
AZURE_VM_NAME=
AZURE_KEY_VAULT_NAME=
AZURE_KEY_VAULT_URL=

# Option 2 - If using Environment Access to Key Vault
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=

# Warehouse
# If using Snowflake, change WAREHOUSE_TYPE to SNOWFLAKE
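
For Option 1, AZURE_KEY_VAULT_URL normally follows directly from AZURE_KEY_VAULT_NAME. The sketch below assumes the Azure public cloud, where vault URIs take the form https://<name>.vault.azure.net/ (sovereign clouds use a different DNS suffix). The helper names are hypothetical, the vault name datalogz-kv is made up for illustration, and the name check reflects the commonly documented vault naming rules: 3 to 24 characters, letters, digits, and hyphens.

```python
import re


def is_valid_vault_name(name: str) -> bool:
    """Check the usual Key Vault naming rules.

    3-24 characters, letters/digits/hyphens only, starts with a letter,
    ends with a letter or digit, and contains no consecutive hyphens.
    """
    return (
        3 <= len(name) <= 24
        and re.fullmatch(r"[A-Za-z][A-Za-z0-9-]*[A-Za-z0-9]", name) is not None
        and "--" not in name
    )


def key_vault_url(name: str, dns_suffix: str = "vault.azure.net") -> str:
    """Build the vault URL for a given vault name (public-cloud suffix by default)."""
    if not is_valid_vault_name(name):
        raise ValueError(f"Invalid Key Vault name: {name!r}")
    return f"https://{name}.{dns_suffix}/"


# "datalogz-kv" is a made-up vault name used only for this example.
print(key_vault_url("datalogz-kv"))
# prints https://datalogz-kv.vault.azure.net/
```

If your vault lives in a sovereign cloud, copy the Vault URI from the Key Vault's Overview page instead of deriving it.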

Key Vault

Navigate to your Key Vault in the Azure Portal and add the following secrets, populating them with the correct values based on the examples provided.

# Specifies the URL or connection string to the Celery result backend
# e.g. db+postgresql://<user>:<pass>@<host>/airflow
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
# Some PostgreSQL servers may require <user> to be in <user@host> format.
AIRFLOW--CELERY--RESULT-BACKEND

# Specifies the URL or connection string to the Airflow metadata database.
# e.g. postgresql+psycopg2://<user>:<pass>@<host>/airflow
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
# Some PostgreSQL servers may require <user> to be in <user@host> format.
AIRFLOW--DATABASE--SQL-ALCHEMY-CONN

# Specifies the Fernet key used for encrypting and decrypting Airflow connections and variables.
# This must be a URL-safe base64-encoded 32-byte key.
# e.g. jHfPb-mvRhWyofw8bzyCJym-HyKjSNNbwS8bLJjK0Vo=
AIRFLOW-FERNET-KEY

# Specifies the username and password for the Airflow web UI used for debugging.
AIRFLOW-WWW-USER
AIRFLOW-WWW-PASSWORD

# Specifies the private access token used for callbacks to gateway on task success/fail
# e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
AIRFLOW-TO-GATEWAY-ACCESS-TOKEN
GATEWAY-SERVICE-BASE-URL=http://bi_gateway_service:5000

# dbt transformation service
DBT-ACCESS-TOKEN

# if using Postgres Warehouse
DIAGNOSTICS-POSTGRES-HOST
DIAGNOSTICS-POSTGRES-DATABASE-NAME=datalogz_wh
DIAGNOSTICS-POSTGRES-PASSWORD
DIAGNOSTICS-POSTGRES-SCHEMA-NAME=public
DIAGNOSTICS-POSTGRES-USERNAME=datalogz_diagnostics_admin

# if using Snowflake Warehouse
SNOWFLAKE-ACCOUNT-IDENTIFIER
DIAGNOSTICS-SNOWFLAKE-WAREHOUSE-NAME=DATALOGZ_BIOPS
DIAGNOSTICS-SNOWFLAKE-DATABASE-NAME=DATALOGZ_WH
DIAGNOSTICS-SNOWFLAKE-PASSWORD
DIAGNOSTICS-SNOWFLAKE-ROLE-NAME=DATALOGZ_DIAGNOSTICS_ADMIN_ROLE
DIAGNOSTICS-SNOWFLAKE-SCHEMA-NAME=PUBLIC
DIAGNOSTICS-SNOWFLAKE-USERNAME=DATALOGZ_DIAGNOSTICS_ADMIN

# (Optional)
# if using Azure Storage Account
# Specifies the connection string to the Azure Blob Storage account.
# e.g. DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net
AZURE-BLOB-CONNECTION-STRING
AZURE-BLOB-CONTAINER-NAME
AZURE-BLOB-STORAGE-ACCOUNT-NAME
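
The Fernet key in the list above is 32 random bytes encoded as URL-safe base64, which yields a 44-character string ending in "=". A minimal sketch using only the Python standard library follows. It produces keys in the same format as Fernet.generate_key() from the cryptography package; the function names here are illustrative.

```python
import base64
import os
import re


def generate_fernet_key() -> str:
    """Return 32 random bytes as a URL-safe base64 string (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that a string has the shape of a Fernet key.

    32 bytes encode to 43 URL-safe base64 characters plus one '=' pad.
    """
    if re.fullmatch(r"[A-Za-z0-9_-]{43}=", key) is None:
        return False
    return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32


key = generate_fernet_key()
print(len(key), is_valid_fernet_key(key))
# prints 44 True
```

Store the generated value as the AIRFLOW-FERNET-KEY secret. Changing the key later makes previously encrypted Airflow connections and variables unreadable unless they are re-encrypted.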

Repository: datalogz-bi-gateway

File: .env.prod

Pull the code and add the following environment variables to a dot file in your project directory named .env.prod using the correct values based on the examples provided.

ENV=PROD

# This variable is used to construct the redirect urls for OAuth.
# The hostname value should ideally be set to the host name of the server that serves the frontend to the client.
# If the host name is not registered, the IP address can be used instead.
HOST_NAME=app.your_hostname.com
HTTP_SCHEME=https
CRON_SERVICE_URL=https://airflow_webserver:8080

# Include either "MICROSOFT" or "TABLEAU_SERVER"
CONFIGURED_IDPS=["MICROSOFT"]

# Create this SQL user in advance, following the SQL code in the Databases section
ENTITLEMENT_USER_NAME=datalogz_gateway_user

# Key Vault Authentication
# Option 1 - If using Managed Identity Access to Key Vault
AZURE_RESOURCE_GROUP_NAME=
AZURE_VM_NAME=
AZURE_KEY_VAULT_NAME=
AZURE_KEY_VAULT_URL=

# Option 2 - If using Environment Access to Key Vault
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=
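
CONFIGURED_IDPS is written as a JSON-style list, and HOST_NAME together with HTTP_SCHEME forms the base of the OAuth redirect URLs. The sketch below shows one way such values could be parsed and sanity-checked before deployment. How the gateway reads them internally is not documented here, so the function names and the /auth/callback path are purely hypothetical.

```python
import json
from urllib.parse import urlunsplit

# The two identity providers named in the comment above.
ALLOWED_IDPS = ("MICROSOFT", "TABLEAU_SERVER")


def parse_idps(raw: str) -> list:
    """Parse CONFIGURED_IDPS and reject empty lists or unknown providers."""
    idps = json.loads(raw)
    if not isinstance(idps, list) or not idps:
        raise ValueError("CONFIGURED_IDPS must be a non-empty JSON list")
    unknown = [idp for idp in idps if idp not in ALLOWED_IDPS]
    if unknown:
        raise ValueError(f"Unsupported identity providers: {unknown}")
    return idps


def redirect_url(scheme: str, host: str, path: str = "/auth/callback") -> str:
    """Join HTTP_SCHEME and HOST_NAME with a (hypothetical) callback path."""
    return urlunsplit((scheme, host, path, "", ""))


print(parse_idps('["MICROSOFT"]'))
# prints ['MICROSOFT']
print(redirect_url("https", "app.your_hostname.com"))
# prints https://app.your_hostname.com/auth/callback
```

Whatever the real callback path is, the scheme and host registered with the identity provider must match HTTP_SCHEME and HOST_NAME exactly, or the OAuth redirect will be rejected.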

Key Vault

Navigate to your Key Vault in the Azure Portal and add the following secrets, populating them with the correct values based on the examples provided.

# Specifies the connection string to the application's database
# e.g. postgresql://datalogz_gateway_admin:<password>@<host>:<port>/datalogz_bi
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
# Some PostgreSQL servers may require <user> to be in <user@host> format.
APP-DB-CONNECTION-STRING

# Specifies the connection string to the application's OLAP warehouse
# e.g. postgresql://datalogz_diagnostics_admin:<password>@<host>:<port>/datalogz_wh?options=-csearch_path%3Dbiops_marts%2Cbiops_general
# If SSL is required, include parameters: &sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
# Some PostgreSQL servers may require <user> to be in <user@host> format.
BI-DB-CONNECTION-STRING

# Specifies the private access token used for callbacks to gateway on task success/fail
# e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
AIRFLOW-TO-GATEWAY-ACCESS-TOKEN

# Specifies the algorithm used to encode and decode JWT tokens
# The secret should be at least 32 characters long; longer is better.
JWT-ALGORITHM=HS256
JWT-SECRET-KEY

# Specifies the API key for the mail client to send emails
MAIL-CLIENT-API-KEY

# Specifies the client ID and Secret for the Microsoft OAuth2 application
MICROSOFT-CLIENT-ID
MICROSOFT-CLIENT-SECRET

# Power BI API Authentication
# Option 1
# Specifies the client ID and Secret for the Power BI OAuth2 application
POWERBI-CLIENT-ID
POWERBI-CLIENT-SECRET

# Option 2
# Specifies the tenant ID, client ID, and Secret for the Power BI Service Principal
POWERBI-SP-TENANT-ID
POWERBI-SP-CLIENT-ID
POWERBI-SP-CLIENT-SECRET 
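
The connection-string secrets in the list above (APP-DB-CONNECTION-STRING and BI-DB-CONNECTION-STRING) need percent-encoding for passwords with special characters and for the options=-csearch_path=... parameter, which is easy to get wrong by hand. The following is a minimal sketch with the Python standard library, assuming the URL layout shown in the examples. build_pg_url and its parameters are illustrative names, not part of Datalogz, and the host and password in the example are made up.

```python
from urllib.parse import quote, urlencode


def build_pg_url(user, password, host, port, database, schemas=(), sslmode=None):
    """Assemble a postgresql:// URL with percent-encoded credentials and options."""
    # Encode every reserved character in the credentials, including '@' and '/'.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"postgresql://{auth}@{host}:{port}/{database}"
    params = {}
    if schemas:
        # A comma-separated search_path; '=' and ',' are encoded by urlencode below.
        params["options"] = "-csearch_path=" + ",".join(schemas)
    if sslmode:
        params["sslmode"] = sslmode
    if params:
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(
    build_pg_url(
        "datalogz_diagnostics_admin",
        "p@ss/word",          # made-up password containing reserved characters
        "db.example.com",     # made-up host
        5432,
        "datalogz_wh",
        schemas=("biops_marts", "biops_general"),
        sslmode="require",
    )
)
# prints postgresql://datalogz_diagnostics_admin:p%40ss%2Fword@db.example.com:5432/datalogz_wh?options=-csearch_path%3Dbiops_marts%2Cbiops_general&sslmode=require
```

Note that "=" encodes to %3D and "," encodes to %2C; a literal hyphen needs no encoding.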
