Datalogz uses Azure Key Vault to store sensitive secrets required to run the application.
This guide refers to secrets, which should be stored in the Key Vault, and environment variables, which should be stored in a .env file in your project folder. Datalogz supports two ways to authenticate to a Key Vault: the ManagedIdentityCredential class, and the EnvironmentCredential class, which uses an App Registration as a Principal.
To set up your Azure Key Vault, follow these steps:
Log in to the Azure Portal.
Create a new Key Vault if one has not already been created.
Proceed to either the ManagedIdentityCredential or the EnvironmentCredential option below.
ManagedIdentityCredential (System)
Navigate to the Virtual Machine you wish to use as a Managed System Identity and select Identity on the left sidebar.
Enable System Assigned identity and add the following role assignments to the VM:
“Key Vault Secrets Officer”
“Virtual Machine Contributor”
“Virtual Machine User Login”
Navigate to Key Vault and select Access configuration on the left sidebar.
Set permission model to “Azure role-based access control” and click Apply.
IAM permissions have already been configured in step 4; if desired, you can confirm that they are present on the Key Vault's IAM page.
Add the Key Vault to the default subnet in the same virtual network as the VM.
You can find this in the Key Vault > Networking tab.
Select "Allow public access from specific virtual networks and IP addresses."
Add the existing virtual network where your VM is located.
A service endpoint will be created for this subnet.
Note: If the subnet cannot take additional service endpoints, a new subnet will be required.
SSH into the VM and install the Azure CLI:
    curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
Log in to Azure using 2FA from the SSH terminal, following the prompts.
EnvironmentCredential (App Registration)
Create a new App Registration to act as a Principal to access the Key Vault
Set the Access configuration permission model to “Vault access policy”.
Create a policy and add the Secrets Officer role to the App Principal.
Add the Key Vault to the default subnet in the same virtual network as the VM.
Authenticate to the Key Vault by making sure the environment variables below are included in the dot files for these two repositories (after you SSH into the VM):
Pull the code and add the following environment variables to a dot file named .prod.env in your project directory, using the correct values based on the examples provided.
    ENV=PROD
    DBT_ENV=prod
    WAREHOUSE_TYPE=POSTGRES
    # Key Vault Authentication
    # Option 1 - If using Managed Identity Access to Key Vault
    AZURE_RESOURCE_GROUP_NAME=
    AZURE_VM_NAME=
    AZURE_KEY_VAULT_NAME=
    AZURE_KEY_VAULT_URL=
    # Option 2 - If using Environment Access to Key Vault
    AZURE_TENANT_ID=
    AZURE_CLIENT_ID=
    AZURE_CLIENT_SECRET=
    # Warehouse
    # If using Snowflake, change WAREHOUSE_TYPE to SNOWFLAKE
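Before starting the services, it can help to check which of the two Key Vault authentication options in this file is fully populated. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and treating every variable listed under an option as required is an assumption of this sketch, not something confirmed by Datalogz.

```python
# Variable names come from the two options in the env file above. Treating
# every name in a group as required is an assumption of this sketch.
MANAGED_IDENTITY_VARS = (
    "AZURE_RESOURCE_GROUP_NAME",
    "AZURE_VM_NAME",
    "AZURE_KEY_VAULT_NAME",
    "AZURE_KEY_VAULT_URL",
)
ENVIRONMENT_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def complete_auth_options(values: dict[str, str]) -> list[str]:
    """Return the names of the options whose variables are all non-empty."""
    options = []
    if all(values.get(name) for name in MANAGED_IDENTITY_VARS):
        options.append("managed_identity")
    if all(values.get(name) for name in ENVIRONMENT_VARS):
        options.append("environment")
    return options


# Hypothetical file content with only Option 2 filled in.
sample = """
ENV=PROD
# Option 1
AZURE_VM_NAME=
# Option 2
AZURE_TENANT_ID=example-tenant
AZURE_CLIENT_ID=example-client
AZURE_CLIENT_SECRET=example-secret
"""
print(complete_auth_options(parse_env_text(sample)))
```

To check a real file, pass its contents to parse_env_text, for example via pathlib.Path(".prod.env").read_text(). An empty list means neither option is fully populated.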
Key Vault
Navigate to your Key Vault in the Azure Portal and add the following secrets, populating them with the correct values based on the examples provided.
    # Specifies the URL or connection string to the Celery result backend
    # e.g. db+postgresql://<user>:<pass>@<host>/airflow
    # If SSL is required include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
    # Some PostgreSQL servers may require <user> to be in <user@host> format.
    AIRFLOW--CELERY--RESULT-BACKEND
    # Specifies the URL or connection string to the Airflow metadata database.
    # e.g. postgresql+psycopg2://<user>:<pass>@<host>/airflow
    # If SSL is required include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
    # Some PostgreSQL servers may require <user> to be in <user@host> format.
    AIRFLOW--DATABASE--SQL-ALCHEMY-CONN
    # Specifies the Fernet key used for encrypting and decrypting Airflow connections and variables.
    # This must be a URL-safe base64-encoded 32-byte key
    # e.g. jHfPb-mvRhWyofw8bzyCJym-HyKjSNNbwS8bLJjK0Vo=
    AIRFLOW-FERNET-KEY
    # Specifies the username and password for the Airflow web UI used for debugging.
    AIRFLOW-WWW-USER
    AIRFLOW-WWW-PASSWORD
    # Specifies the private access token used for callbacks to gateway on task success/fail
    # e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
    AIRFLOW-TO-GATEWAY-ACCESS-TOKEN
    GATEWAY-SERVICE-BASE-URL=http://bi_gateway_service:5000
    # dbt transformation service
    DBT-ACCESS-TOKEN
    # If using Postgres Warehouse
    DIAGNOSTICS-POSTGRES-HOST
    DIAGNOSTICS-POSTGRES-DATABASE-NAME=datalogz_wh
    DIAGNOSTICS-POSTGRES-PASSWORD
    DIAGNOSTICS-POSTGRES-SCHEMA-NAME=public
    DIAGNOSTICS-POSTGRES-USERNAME=datalogz_diagnostics_admin
    # If using Snowflake Warehouse
    SNOWFLAKE-ACCOUNT-IDENTIFIER
    DIAGNOSTICS-SNOWFLAKE-WAREHOUSE-NAME=DATALOGZ_BIOPS
    DIAGNOSTICS-SNOWFLAKE-DATABASE-NAME=DATALOGZ_WH
    DIAGNOSTICS-SNOWFLAKE-PASSWORD
    DIAGNOSTICS-SNOWFLAKE-ROLE-NAME=DATALOGZ_DIAGNOSTICS_ADMIN_ROLE
    DIAGNOSTICS-SNOWFLAKE-SCHEMA-NAME=PUBLIC
    DIAGNOSTICS-SNOWFLAKE-USERNAME=DATALOGZ_DIAGNOSTICS_ADMIN
    # (Optional)
    # If using Azure Storage Account
    # Specifies the connection string to the Azure Blob Storage account.
    # e.g. DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey;EndpointSuffix=core.windows.net
    AZURE-BLOB-CONNECTION-STRING
    AZURE-BLOB-CONTAINER-NAME
    AZURE-BLOB-STORAGE-ACCOUNT-NAME
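The AIRFLOW-FERNET-KEY secret above has a strict format: a Fernet key is 32 random bytes encoded as URL-safe base64, which always yields 44 characters ending in "=". The sketch below shows one way to generate and sanity-check such a value with the Python standard library. The function names are hypothetical and are not part of Datalogz or Airflow.

```python
import base64
import binascii
import os
import re

# 32 bytes in URL-safe base64 is 43 alphabet characters plus one "=" pad.
_FERNET_SHAPE = re.compile(r"[A-Za-z0-9_-]{43}=")


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check the key's shape and that it decodes to exactly 32 bytes."""
    if not _FERNET_SHAPE.fullmatch(key):
        return False
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


new_key = generate_fernet_key()
print(len(new_key), is_valid_fernet_key(new_key))
```

Generate the key once and store it in the Key Vault. Changing it later would prevent Airflow from decrypting connections and variables that were encrypted with the old key.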
Repository: datalogz-bi-gateway
File: .env.prod
Pull the code and add the following environment variables to a dot file in your project directory named .env.prod using the correct values based on the examples provided.
    ENV=PROD
    # This variable is used to construct the redirect urls for OAuth.
    # The hostname value should be ideally set to the host name of the server that serves the frontend to the client.
    # Moreover, the IP address could also be used if the host name is not registered.
    HOST_NAME=app.your_hostname.com
    HTTP_SCHEME=https
    CRON_SERVICE_URL=https://airflow_webserver:8080
    # Include either "MICROSOFT" or "TABLEAU_SERVER"
    CONFIGURED_IDPS=["MICROSOFT"]
    # Create this SQL user in advance following the SQL code in Databases section
    ENTITLEMENT_USER_NAME=datalogz_gateway_user
    # Key Vault Authentication
    # Option 1 - If using Managed Identity Access to Key Vault
    AZURE_RESOURCE_GROUP_NAME=
    AZURE_VM_NAME=
    AZURE_KEY_VAULT_NAME=
    AZURE_KEY_VAULT_URL=
    # Option 2 - If using Environment Access to Key Vault
    AZURE_TENANT_ID=
    AZURE_CLIENT_ID=
    AZURE_CLIENT_SECRET=
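AZURE_KEY_VAULT_NAME and AZURE_KEY_VAULT_URL must describe the same vault. In the public Azure cloud the vault URI has the form https://<vault-name>.vault.azure.net/, and vault names are 3 to 24 characters long, use only letters, digits, and hyphens, start with a letter, end with a letter or digit, and contain no consecutive hyphens. The following sketch derives the URL from the name. The function name is hypothetical, and whether Datalogz expects the trailing slash is an assumption, so compare the result with the Vault URI shown on the Key Vault's Overview page.

```python
import re

# A letter, then letters/digits/single hyphens, then a letter or digit.
_VAULT_NAME = re.compile(r"[A-Za-z](?:[A-Za-z0-9]|-(?!-))*[A-Za-z0-9]")


def key_vault_url(vault_name: str, dns_suffix: str = "vault.azure.net") -> str:
    """Build the vault URI for a vault name, validating the name first.

    dns_suffix defaults to the public Azure cloud; sovereign clouds use
    other suffixes, so pass the one that matches your environment.
    """
    if not 3 <= len(vault_name) <= 24 or not _VAULT_NAME.fullmatch(vault_name):
        raise ValueError(f"invalid Key Vault name: {vault_name!r}")
    return f"https://{vault_name.lower()}.{dns_suffix}/"


# "example-kv" is a placeholder name, not a real vault.
print(key_vault_url("example-kv"))
```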
Key Vault
Navigate to your Key Vault in the Azure Portal and add the following secrets, populating them with the correct values based on the examples provided.
    # Specifies the connection string to the application's database
    # e.g. postgresql://datalogz_gateway_admin:<password>@<host>:<port>/datalogz_bi
    # If SSL is required include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
    # Some PostgreSQL servers may require <user> to be in <user@host> format.
    APP-DB-CONNECTION-STRING
    # Specifies the connection string to the application's OLAP warehouse
    # e.g. postgresql://datalogz_diagnostics_admin:<password>@<host>:<port>/datalogz_wh?options=-csearch_path%3Dbiops_marts%2Dbiops_general
    # If SSL is required include parameters: &sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
    # Some PostgreSQL servers may require <user> to be in <user@host> format.
    BI-DB-CONNECTION-STRING
    # Specifies the private access token used for callbacks to gateway on task success/fail
    # e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
    AIRFLOW-TO-GATEWAY-ACCESS-TOKEN
    # Specifies the algorithm used to encode and decode JWT tokens
    # The secret should be at least 32 characters long; the longer the better.
    JWT-ALGORITHM=HS256
    JWT-SECRET-KEY
    # Specifies the API key for the mail client to send emails
    MAIL-CLIENT-API-KEY
    # Specifies the client ID and Secret for the Microsoft OAuth2 application
    MICROSOFT-CLIENT-ID
    MICROSOFT-CLIENT-SECRET
    # PowerBI API Authentication
    # Option 1
    # Specifies the client ID and Secret for the Power BI OAuth2 application
    POWERBI-CLIENT-ID
    POWERBI-CLIENT-SECRET
    # Option 2
    # Specifies the client ID and Secret for the PowerBI Service Principal
    POWERBI-SP-TENANT-ID
    POWERBI-SP-CLIENT-ID
    POWERBI-SP-CLIENT-SECRET
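The connection strings above are URIs, so reserved characters in the user name or password must be percent-encoded. This matters in particular for the <user@host> user format, where the "@" has to become "%40". The sketch below shows one way to assemble such a string with the Python standard library. The function name and the sample values are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_postgres_url(user, password, host, port, database, params=None):
    """Assemble a postgresql:// URI, percent-encoding the credentials.

    params is an optional dict of query parameters such as sslmode or
    options; both keys and values are percent-encoded.
    """
    userinfo = f"{quote(user, safe='')}:{quote(password, safe='')}"
    url = f"postgresql://{userinfo}@{host}:{port}/{database}"
    if params:
        url += "?" + urlencode(params, quote_via=quote)
    return url


# Placeholder credentials and host, for illustration only.
print(
    build_postgres_url(
        "datalogz_gateway_admin@myserver",
        "p@ss/word",
        "db.example.com",
        5432,
        "datalogz_bi",
        {"sslmode": "require"},
    )
)
```

Passing {"options": "-csearch_path=biops_marts"} as params produces the same "%3D" encoding for "=" that appears in the BI-DB-CONNECTION-STRING example.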