Create a new Virtual Private Cloud (VPC) to provide an isolated network environment for your Datalogz deployment in the cloud.
If you already have an existing VPC you wish to deploy Datalogz into, skip this step.
The VPC should be created in the region of your choice and should be configured with an IPv4 CIDR block that can support the expected number of subnets and IP addresses for your deployment.
Log in to the cloud provider's console and navigate to the VPC management section.
Click "Create VPC."
Name your VPC "Datalogz VPC" or choose another appropriate name.
Enter the IPv4 CIDR block, such as 10.0.0.0/16, that provides enough IP addresses for your planned deployment.
Leave all the other options as default and click "Create VPC."
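If you prefer to script this step, a minimal AWS CLI sketch is shown below. The CIDR block matches the example above; the Name tag value is illustrative.

```bash
# Create the VPC with the example CIDR from this guide
aws ec2 create-vpc \
  --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=datalogz-vpc}]'
```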
Next, you will need to create subnets in this VPC to enable different tiers of your application to communicate with each other securely.
Amazon Web Services
The following documentation describes the resources required to deploy Datalogz on AWS.
Resource List
(1) AWS Virtual Private Cloud
(1) AWS Internet Gateway
(2) Subnets
(1 or 2) AWS EC2 Instance w/ SSH Key Pair
(1) AWS RDS for PostgreSQL Server
App DB
Warehouse DB
Airflow DB
(1) AWS Secrets Manager
(0 or 1) AWS S3 Bucket w/ IAM Role
For Snowflake customers:
(1) Snowflake Warehouse
Warehouse DB (replaces the Postgres Warehouse DB)
Security Group
Create a Security Group in AWS with appropriate inbound and outbound rules to allow specific types of traffic between the VPC and the internet, such as SSH, PostgreSQL, and HTTPS.
Ensure that access is restricted to specific IP addresses or network ranges, and implement other security measures to protect sensitive data and resources.
Go to the VPC dashboard and select the Security Group option.
Click on the Create Security group button.
Provide a name and description for the security group. For example, name it "datalogz-security-group" and provide a description such as "Allow SSH, PostgreSQL, and HTTPS for Datalogz".
Select the VPC that was created previously in the VPC Deployment section.
In the Inbound Rules section:
Add a rule to allow incoming traffic on port 5432 for PostgreSQL, with the source set to 10.0.0.0/16 (the IPv4 CIDR range of the VPC).
Add a rule to allow incoming traffic on port 22 for SSH, with the source set to the IP address or IP address range of the developers who will be accessing the EC2 instance.
Add a rule to allow incoming traffic on port 443 for HTTPS. Set the source to the IP address(es) of your VPN for a private deployment.
In the Outbound Rules section:
Add a rule to allow outgoing traffic on port 443 for HTTPS, with the destination set to anywhere.
Add a rule to allow outgoing traffic on port 80 for HTTP, with the destination set to anywhere.
Click on the Create Security group button to create the security group.
These steps will create a security group that allows incoming traffic on ports 22, 5432, and 443, and outgoing traffic on ports 80 and 443. The security group will also restrict incoming traffic to specific sources, such as the IP address range of the VPC and the IP address range of the developers who will be accessing the EC2 instance via SSH.
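If you prefer the CLI, a minimal sketch of the same setup is shown below; the placeholder VPC ID, group ID, and 203.0.113.0/24 source range are examples you must replace with your own values.

```bash
# Create the security group in the Datalogz VPC (replace the VPC ID)
aws ec2 create-security-group \
  --group-name datalogz-security-group \
  --description "Allow SSH, PostgreSQL, and HTTPS for Datalogz" \
  --vpc-id vpc-0123456789abcdef0

# Inbound rules (replace sg-... with the new group's ID and 203.0.113.0/24
# with your developers' or VPN address range)
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 5432 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 --cidr 203.0.113.0/24
```

Note that a newly created security group's default outbound rule allows all traffic; tighten it to ports 80 and 443 in the console if your policy requires it.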
Secrets Manager
Datalogz uses AWS Secrets Manager to store sensitive secrets required to run the application.
Access the Secrets Manager service and select "Store a new secret".
Choose "Other type of secret" as the secret type.
Input your secret's key and value. Use "Add row" to add each of the secrets in the Environment Variables List below.
For Encryption key, select "aws/secretsmanager".
Select "Next".
Name your secret "datalogz_secrets". Add a description such as "secret values used by Datalogz to securely access credentials".
Select "Next".
This step is optional. Datalogz recommends creating a Lambda function to enable secret rotation. Once configured (or skipped), select "Next".
Review your secrets. If everything looks good, select "Next". Otherwise, select "Previous" to make updates where necessary.
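The console flow above stores each key/value row in a single secret. A CLI sketch of the same result is shown below; the key names come from the Environment Variables List that follows, and the values are placeholders.

```bash
aws secretsmanager create-secret \
  --name datalogz_secrets \
  --description "secret values used by Datalogz to securely access credentials" \
  --secret-string '{"AIRFLOW-FERNET-KEY": "<value>", "AIRFLOW-POSTGRES-DB-HOST": "<value>"}'
```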
Environment Variables List
Repository: datalogz-bi-diagnostics
File: .prod.env
```
ENV=PROD
DBT_ENV=prod
# Warehouse
# If using Snowflake, change WAREHOUSE_TYPE to SNOWFLAKE
WAREHOUSE_TYPE=POSTGRES
```
Secrets Manager
```
# Specifies the URL or connection string to the Celery result backend
# e.g. db+postgresql://<user>:<pass>@<host>/airflow
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
AIRFLOW--CELERY--RESULT-BACKEND

# Specifies the URL or connection string to the Airflow metadata database.
# e.g. postgresql+psycopg2://<user>:<pass>@<host>/airflow
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<name_of_cert>.crt.pem
AIRFLOW--DATABASE--SQL-ALCHEMY-CONN

# Specifies the Fernet key used for encrypting and decrypting Airflow connections and variables.
# This must be a 32-byte, base64-encoded string
# e.g. jHfPb-mvRhWyofw8bzyCJym-HyKjSNNbwS8bLJjK0Vo=
AIRFLOW-FERNET-KEY

# Specifies the hostname or IP address of the PostgreSQL database server used for Airflow metadata storage.
AIRFLOW-POSTGRES-DB-HOST

# Specifies the name of the PostgreSQL database, user, and password used for Airflow metadata storage.
AIRFLOW-POSTGRES-DB-NAME=airflow
AIRFLOW-POSTGRES-DB-USER
AIRFLOW-POSTGRES-DB-PASSWORD

# Specifies the username and password for the Airflow web UI.
AIRFLOW-WWW-USER
AIRFLOW-WWW-PASSWORD

# Specifies the private access token used for callbacks to the gateway on task success/fail
# e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
AIRFLOW-TO-GATEWAY-ACCESS-TOKEN

GATEWAY-SERVICE-BASE-URL=http://bi_gateway_service:5000

# If using the Postgres Warehouse
DIAGNOSTICS-POSTGRES-HOST
DIAGNOSTICS-POSTGRES-DATABASE-NAME=datalogz_wh
DIAGNOSTICS-POSTGRES-PASSWORD
DIAGNOSTICS-POSTGRES-SCHEMA-NAME=public
DIAGNOSTICS-POSTGRES-USERNAME=datalogz_diagnostics_admin

# If using the Snowflake Warehouse
SNOWFLAKE-ACCOUNT-IDENTIFIER
DIAGNOSTICS-SNOWFLAKE-WAREHOUSE-NAME=DATALOGZ_BIOPS
DIAGNOSTICS-SNOWFLAKE-DATABASE-NAME=DATALOGZ_WH
DIAGNOSTICS-SNOWFLAKE-PASSWORD
DIAGNOSTICS-SNOWFLAKE-ROLE-NAME=DATALOGZ_DIAGNOSTICS_ADMIN_ROLE
DIAGNOSTICS-SNOWFLAKE-SCHEMA-NAME=PUBLIC
DIAGNOSTICS-SNOWFLAKE-USERNAME=DATALOGZ_DIAGNOSTICS_ADMIN
```
Repository: datalogz-bi-gateway
File: .env.prod
```
ENV=PROD
HOST_NAME=app.your_hostname.com
HTTP_SCHEME=https
CRON_SERVICE_URL=https://airflow_webserver:8080
# Include either "MICROSOFT" or "TABLEAU_SERVER"
CONFIGURED_IDPS=["MICROSOFT"]
# Create this SQL user in advance following the SQL code in the Databases section
ENTITLEMENT_USER_NAME=datalogz_gateway_user
```
Secrets Manager
```
# Specifies the connection string to the application's database
# e.g. postgresql://datalogz_gateway_admin:<password>@<host>:<port>/datalogz_bi
# If SSL is required, include parameters: ?sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
APP-DB-CONNECTION-STRING

# Specifies the connection string to the application's OLAP warehouse
# e.g. postgresql://datalogz_diagnostics_admin:<password>@<host>:<port>/datalogz_wh?options=-csearch_path%3Dbiops_marts%2Dbiops_general
# If SSL is required, include parameters: &sslmode=require&sslrootcert=/opt/airflow/<cert_name>.crt.pem
BI-DB-CONNECTION-STRING

# Specifies the private access token used for callbacks to the gateway on task success/fail
# e.g. eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
AIRFLOW-TO-GATEWAY-ACCESS-TOKEN

# Specifies the algorithm used to encode and decode JWT tokens
# The secret should be at least 32 characters long, but the longer the better.
JWT-ALGORITHM=HS256
JWT-SECRET-KEY

# Specifies the API key for the mail client to send emails
MAIL-CLIENT-API-KEY

# If using Microsoft Active Directory
# Specifies the client ID and Secret for the Microsoft OAuth2 application
MICROSOFT-CLIENT-ID
MICROSOFT-CLIENT-SECRET

# If using Power BI
# Specifies the client ID and Secret for the Power BI OAuth2 application
POWERBI-CLIENT-ID
POWERBI-CLIENT-SECRET
```
Internet Gateway and Route Table
Create an Internet Gateway and Route Table in AWS with appropriate routing rules to allow specific types of traffic between the VPC and the internet, such as HTTP, HTTPS, and DNS.
If you are deploying Datalogz into a private VPC to connect to an on-premises RESTful API exposed by the BI tool (e.g. Tableau Server), you can skip this step.
Ensure that access is restricted to specific IP addresses or network ranges and that appropriate security measures are implemented to protect the VPC and its resources.
Go to the VPC dashboard and select the Internet Gateway option.
Click on the Create Internet Gateway button.
Provide a name for the Internet Gateway such as datalogz-internet-gateway.
Click on the Create Internet Gateway button to create the gateway.
Select the new Internet Gateway, choose Actions > Attach to VPC, and attach it to the Datalogz VPC. Note down the ID of the Internet Gateway, as it will be needed when adding routes below.
Go to the Route Table option and click on the Create Route Table button.
Provide a name for the Route Table, such as datalogz-route-table.
Select the VPC that was created earlier in the VPC Deployment section.
Click on the Create Route Table button to create the route table.
Go to the Subnet associations section of the route table and click on the Edit subnet associations button.
Select the two subnets that were created earlier in the Subnets section.
Click on the Save associations button to save the subnet associations.
Finally, go to the Routes section of the route table and click on the Edit routes button.
Click on the Add route button, and enter the destination as 0.0.0.0/0 and the target as the ID of the Internet Gateway that was created earlier.
Click on the Save changes button to save the new route.
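A CLI sketch of the same gateway and routing setup; all resource IDs below are placeholders to replace with your own.

```bash
# Create and attach the Internet Gateway
aws ec2 create-internet-gateway \
  --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=datalogz-internet-gateway}]'
aws ec2 attach-internet-gateway \
  --internet-gateway-id igw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0

# Create the route table, associate both subnets, and add the default route
aws ec2 create-route-table --vpc-id vpc-0123456789abcdef0 \
  --tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=datalogz-route-table}]'
aws ec2 associate-route-table --route-table-id rtb-0123456789abcdef0 --subnet-id subnet-0123456789abcdef0
aws ec2 associate-route-table --route-table-id rtb-0123456789abcdef0 --subnet-id subnet-abcdef0123456789a
aws ec2 create-route --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 --gateway-id igw-0123456789abcdef0
```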
Subnets
When you create an RDS Postgres instance and add it to a VPC, the database requires that the VPC contains subnets in at least 2 different availability zones.
Go to subnets and create a new subnet.
Enter the VPC ID of the Datalogz VPC created in the VPC Deployment section.
Enter 10.0.0.0/25 as the IPv4 CIDR.
Choose an Availability Zone for the subnet.
Click Create Subnet.
Create at least two subnets in different Availability Zones. The RDS Postgres database requires that the VPC contains at least two subnets in different availability zones. Repeat the above steps to create another subnet with the following changes:
For the IPv4 CIDR, enter 10.0.0.128/25.
For the Availability Zone, choose a different one than the one selected for the first subnet.
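A CLI sketch of both subnets (replace the VPC ID; the Availability Zone names assume us-east-1):

```bash
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.0.0/25 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 \
  --cidr-block 10.0.0.128/25 --availability-zone us-east-1b
```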
S3 Bucket
Datalogz uses AWS S3 to stage external files for ingestion into the target warehouse. This is the default for Snowflake warehouses and optional for Postgres warehouses.
Sign in to your AWS console
Navigate to the S3 service and select "Create Bucket"
Name your bucket "datalogzbidiagnostics"
Disable ACLs for object ownership configuration
Choose "Block all public access" for public access configuration
Disable versioning for the bucket
For the default encryption configuration, choose Amazon S3-managed keys and select the "Enabled" option for the bucket key
Click on "Create Bucket" to create your S3 bucket.
IAM Roles
To enable access to the S3 bucket and the Secrets Manager that we created, we will create a custom policy and a role.
Create Policy
Log in to the AWS IAM Management Console.
Click on "Policies".
Create a policy that specifically allows read and write permissions only on the S3 bucket that we created in the S3 deployment guide, and only allows read access to the secrets we created in the Secrets Manager deployment guide.
Click on the JSON tab and paste the following JSON string, replacing "datalogz-s3" with the name of the S3 bucket you created in the S3 deployment guide.
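The exact policy document is not reproduced here, but a minimal example of the intended shape is sketched below: read/write on the bucket plus read-only access to the secret. The bucket and secret names are this guide's examples; adjust the Resource ARNs to your account and region.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatalogzS3List",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::datalogz-s3"
    },
    {
      "Sid": "DatalogzS3ReadWrite",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::datalogz-s3/*"
    },
    {
      "Sid": "DatalogzSecretsRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
      "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:datalogz_secrets-*"
    }
  ]
}
```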
Click "Next: Tags" and optionally add tags for the policy.
Click "Next: Review".
Enter a name for the policy, such as "datalogz-policy-to-access-s3-and-secrets-from-ec2".
Click "Create Policy".
Create Role
Go to "Roles" and click "Create Role".
Choose "AWS service" as the trusted entity type.
Choose "EC2" under "Common use cases".
Click "Next".
In the "Permissions" section, search for the policy we just created and select it.
Click "Next".
Enter a name for the role, such as "datalogz-role-access-resources-from-ec2", and click "Create role".
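The same role can be created from the CLI; the trust policy below is the standard EC2 service trust relationship, and the policy ARN placeholder must match the policy created above.

```bash
# Trust policy allowing EC2 to assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name datalogz-role-access-resources-from-ec2 \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy \
  --role-name datalogz-role-access-resources-from-ec2 \
  --policy-arn arn:aws:iam::<account-id>:policy/datalogz-policy-to-access-s3-and-secrets-from-ec2
```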
Datalogz runs on EC2 Instances inside Docker Containers for a simple, cost-effective deployment that can be scaled vertically as demand increases.
EC2 System Requirements
Option 1: Monolith VM supporting backend and frontend services
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $160 / mo.
Total Estimated Cost: $80 - $160 / mo.
Option 2: Split VM each supporting either frontend or backend service
Backend VM
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB (no change from the Minimum)
Estimated Cost as of 2/1/2023: $160 / mo.
Frontend VM
Minimum / Recommended
CPU: 2 vCPU
Memory: 4 GB RAM
OS Disk: 50 GB (no change)
Estimated Cost as of 2/1/2023: $40 / mo.
Total Estimated Cost: $120 - $200 / mo.
Provisioning
Go to the EC2 section
Click on Launch Instance
Name your EC2 instance
Select Ubuntu as the Amazon Machine Image (AMI)
Select instance type t2.xlarge
Generate a key pair for SSH access to the instance
Expand Network settings
Select the VPC that we created in the VPC Deployment section
Select either of the two subnets that we created in the Subnets section
Set Auto-assign public IP to Enable
Under Firewall (security groups), choose Select existing security group
From the drop-down, select the security group that we created in the Security Group section
Next, configure the storage to at least 50 GiB
Click Advanced details
Select the IAM role that we created in the IAM Roles section
Leave the rest as default
Click Launch Instance
The security group inbound rules on this machine should allow HTTP/HTTPS traffic from your private network IP range so your users can access the site. The outbound rules should allow HTTPS traffic so the Gateway API Service can make HTTPS connections to 3rd-party services, such as Microsoft for the OAuth 2.0 flow.
The security group inbound rules on this machine should also allow SSH traffic from your team's IP range so your team can remotely log in and deploy the builds.
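A CLI sketch of the launch; the AMI, key pair, subnet, security group, and instance profile values are placeholders for the resources created above (the console's Create Role flow creates an instance profile with the same name as the role).

```bash
# Look up a current Ubuntu AMI for your region first, then:
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.xlarge \
  --key-name <your-key-pair> \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --associate-public-ip-address \
  --iam-instance-profile Name=datalogz-role-access-resources-from-ec2 \
  --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=50}' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=datalogz-ec2}]'
```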
SSH into VM
Set the correct permissions on your SSH key pair before connecting to the virtual machine, updating the following variables with your key name, user name, and VM IP address.
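For example, using the key name from the SSH example below:

```bash
chmod 400 "ec2_bi-monolith_testing_us-east-1_001.pem"
```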
SSH into the VM to install Docker and clone the repositories.
Example:
ssh -i "ec2_bi-monolith_testing_us-east-1_001.pem" ubuntu@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com
Run an update
Ubuntu: sudo apt update
Installing Docker
Both the Datalogz frontend and backend applications are deployed using Docker.
Install Docker Desktop (Windows and Linux)
Please continue with the Docker Desktop installation referencing the official docs.
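On a headless Ubuntu VM you may prefer Docker Engine over Docker Desktop. A minimal sketch using Docker's convenience script (inspect the script before running it):

```bash
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Optional: run docker without sudo (log out and back in afterwards)
sudo usermod -aG docker ubuntu
```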
Set up the credential store by running the following command in a terminal:
git config --global credential.helper 'store'
The next time you run git pull on a remote origin and sign in, your credentials will be cached for future reuse.
SSL Certificates
A Datalogz Proof-of-Concept (POC) deployment uses self-signed keys generated during the build process to enable encrypted communication over HTTPS. You will access your VM using either its public IP address or a provider-assigned DNS name, such as an Azure-provided DNS ending in *.cloudapp.azure.com. For example:
Monolith VM
https://x.x.x.x OR https://mono-mycomanywin11.eastus2.cloudapp.azure.com
Split VM
https://x.x.x.x OR https://app-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.y OR https://api-mycomanywin11.eastus2.cloudapp.azure.com
A Production (PROD) deployment lets you provide your own certificates so Datalogz can be served from new subdomains on an existing domain. For example:
Monolith VM
https://app.datalogz.mycompany.com
Split VM
https://app.datalogz.mycompany.com
https://api.datalogz.mycompany.com
RDS Databases
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads.
Database System Requirements
Option 1: Postgres RDS Only
Minimum
Memory Optimized
Compute: 2 vCores, 16 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $500 / mo.
Recommended
Memory Optimized
Compute: 4 vCores, 32 GiB Memory, 6400 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $850 / mo.
Total Estimated Cost: $500 - $850 / mo.
Option 2: Postgres RDS + Snowflake
Postgres
Minimum/Recommended
Memory Optimized
Compute: 2 vCores, 8 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $200 / mo.
Snowflake (See Snowflake Option Section)
Object Creation
AWS
Go to RDS and click Create Database
Choose Standard create and the PostgreSQL engine, then choose your desired size
Settings:
Name your database instance
Provide the credentials to your database and store them somewhere secure. We will use these credentials later in the Secrets Manager section
Instance configuration:
Leave all the settings as default
Storage:
Set Allocated Storage to 50 GiB
Connectivity:
Select the VPC and the subnets that we created in the VPC and Subnets sections
Select the security group that we created in the Security Group section
Leave the rest as default
Leave all the other sections as default
Click Create database
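A CLI sketch of the same instance; RDS requires a DB subnet group spanning the two subnets, and every identifier below is a placeholder to replace.

```bash
aws rds create-db-subnet-group \
  --db-subnet-group-name datalogz-db-subnets \
  --db-subnet-group-description "Datalogz RDS subnets" \
  --subnet-ids subnet-0123456789abcdef0 subnet-abcdef0123456789a

aws rds create-db-instance \
  --db-instance-identifier datalogz-postgres \
  --engine postgres \
  --db-instance-class db.r5.large \
  --master-username <admin-user> \
  --master-user-password '<password>' \
  --allocated-storage 50 \
  --db-subnet-group-name datalogz-db-subnets \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --no-publicly-accessible
```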
Option 1: Postgres Only
App DB
Name: datalogz_bi
```sql
---- Run the following commands from the master (postgres) database
---- Run these commands one at a time
CREATE DATABASE datalogz_bi;
CREATE ROLE datalogz_gateway_admin WITH PASSWORD '<password>' LOGIN; -- Add the password!
GRANT ALL PRIVILEGES ON DATABASE datalogz_bi TO datalogz_gateway_admin WITH GRANT OPTION;

CREATE DATABASE airflow;
CREATE ROLE airflow WITH PASSWORD '<password>' LOGIN; -- Add the password!
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;

---- Connect to the datalogz_bi database as datalogz_gateway_admin
---- Run the following commands from the datalogz_bi database
GRANT ALL PRIVILEGES ON SCHEMA public TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO datalogz_gateway_admin;
```
Warehouse DB
Name: datalogz_wh
```sql
---- Run the following commands from the master (postgres) database
-- Setup the BI Warehouse Database (OLAP - App Workflows)
CREATE DATABASE datalogz_wh;
CREATE ROLE datalogz_diagnostics_admin WITH PASSWORD '<password>' LOGIN; -- Add the password!
CREATE ROLE datalogz_gateway_admin WITH PASSWORD '<password>' LOGIN CREATEROLE; -- Add the password!
CREATE ROLE datalogz_gateway_user WITH PASSWORD '<password>' LOGIN; -- Add the password!
CREATE ROLE write_user_role;
CREATE ROLE read_user_role;
GRANT ALL ON DATABASE datalogz_wh TO datalogz_diagnostics_admin WITH GRANT OPTION;
GRANT ALL ON DATABASE datalogz_wh TO datalogz_gateway_admin WITH GRANT OPTION;
GRANT ALL ON DATABASE datalogz_wh TO write_user_role;

---- Run the following commands from the datalogz_wh database
CREATE SCHEMA powerbi_ingest;
CREATE SCHEMA powerbi_source;
CREATE SCHEMA tableau_ingest;
CREATE SCHEMA tableau_source;
CREATE SCHEMA biops_general;
CREATE SCHEMA biops_int;
CREATE SCHEMA biops_marts;
GRANT USAGE ON SCHEMA powerbi_ingest TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA powerbi_source TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA tableau_ingest TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA tableau_source TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA biops_general TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA biops_int TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA biops_marts TO datalogz_diagnostics_admin;
GRANT USAGE ON SCHEMA biops_general TO datalogz_gateway_admin;
GRANT USAGE ON SCHEMA biops_marts TO datalogz_gateway_admin;
GRANT USAGE ON SCHEMA biops_general TO read_user_role;
GRANT USAGE ON SCHEMA biops_marts TO read_user_role;
GRANT ALL ON SCHEMA biops_general TO write_user_role;
ALTER SCHEMA powerbi_ingest OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA powerbi_source OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA tableau_ingest OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA tableau_source OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA biops_general OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA biops_marts OWNER TO datalogz_diagnostics_admin;
ALTER SCHEMA biops_int OWNER TO datalogz_diagnostics_admin;
GRANT read_user_role TO datalogz_gateway_user;
GRANT write_user_role TO datalogz_gateway_admin;
CREATE TABLE biops_general.entitlement (
    account_id            int,
    environment_id        int,
    connector_id          int,
    account_connector_key varchar(128),
    user_name             varchar(256),
    system_name           varchar(256),
    entitlement_type      varchar(256),
    entitlement_status    varchar(256),
    created_at            timestamp default current_timestamp,
    updated_at            timestamp default current_timestamp
);
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA powerbi_ingest TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA powerbi_source TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA tableau_ingest TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA tableau_source TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA biops_general TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA biops_int TO datalogz_diagnostics_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA powerbi_ingest TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA powerbi_source TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA tableau_ingest TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA tableau_source TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA biops_general TO datalogz_gateway_admin;
GRANT SELECT ON TABLE datalogz_wh.biops_general.entitlement TO read_user_role;
GRANT INSERT, SELECT, UPDATE ON TABLE datalogz_wh.biops_general.entitlement TO write_user_role;
-- Before running this step, ensure the extension has been enabled
-- (On Azure, go to Server parameters in the portal and set: Azure.extensions -> FUZZYSTRMATCH enabled)
CREATE EXTENSION IF NOT EXISTS FUZZYSTRMATCH WITH SCHEMA biops_marts;
CREATE EXTENSION IF NOT EXISTS plpgsql;
```
Option 2: Postgres + Snowflake
Postgres
App DB
Name: datalogz_bi
```sql
-- Setup the Application Database (OLTP - App Workflows)
CREATE DATABASE datalogz_bi;
CREATE ROLE datalogz_gateway_admin WITH PASSWORD '<password>' LOGIN; -- Add the password!
GRANT ALL PRIVILEGES ON DATABASE datalogz_bi TO datalogz_gateway_admin WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON SCHEMA public TO datalogz_gateway_admin;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO datalogz_gateway_admin;

-- Setup the Airflow Database (OLTP - API Connector Workflows)
CREATE DATABASE airflow;
CREATE ROLE airflow WITH PASSWORD '<password>' LOGIN; -- Add the password!
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
```
BI Warehouse DB (See Snowflake Option Section)
Name: datalogz_wh
Build
Follow these steps to build the Datalogz services on your virtual machine:
SSH into the Linux virtual machine to build the following repositories:
datalogz-bi-diagnostics (ELT)
datalogz-bi-gateway (API)
datalogz-bi-frontend (APP)
Run git pull on the main/master branch of each repository
Read the README.md for each repository
Build the services:
Build datalogz-bi-diagnostics
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to Secrets Manager
Add the following environment variables to .prod.env
ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=POSTGRES
From the project directory, run source ./init_env.sh to run through the interactive build script:
Choose env: prod
Choose warehouse: postgres
Choose cloud: azure, aws
Choose IAM method for VM: env, id
Choose vm setup: mono or split
Choose deployment method: private
The ./init_env.sh script will build the correct docker compose file based on the options that are chosen.
Build datalogz-bi-gateway
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to Secrets Manager
Add the following environment variables to .prod.env
ENV=PROD
PYTHONDONTWRITEBYTECODE=1
CRON_SERVICE_URL=https://airflow_webserver:8080
HTTP_SCHEME=https
HOST_NAME=localhost
Change localhost to your host DNS name or the private IP of the VM
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Build datalogz-bi-frontend
Add the following environment variables to .env
REACT_APP_IDENTITY_PROVIDER = 'TABLEAU_SA'
VM_MANAGED_IDENTITY = 'TRUE'
Values for REACT_APP_IDENTITY_PROVIDER can be:
MICROSOFT
TABLEAU_SSO_FOR_CLOUD
TABLEAU_SA
WORKOS
GOOGLE
If REACT_APP_IDENTITY_PROVIDER is set, VM_MANAGED_IDENTITY can also be set to TRUE to embed Service Principal credentials into the API build for new Connectors to use by default.
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
OR
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Snowflake Option
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads. Teams can choose to run their BI Warehouse on Snowflake by following the steps below:
Snowflake
Minimum
Compute: X-Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $600 / mo.
Recommended
Compute: Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $1000 / mo.
Total Estimated Cost: $600 - $1000 / mo.
Object Creation
Snowflake
Warehouse DB
Name: datalogz_wh
The following instructions make reference to POWERBI, but this can be replaced with either TABLEAU or LOOKER as appropriate.
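The full Snowflake object-creation script is not reproduced here; the sketch below shows the general shape using the object names from the Environment Variables List (warehouse DATALOGZ_BIOPS, database DATALOGZ_WH, role DATALOGZ_DIAGNOSTICS_ADMIN_ROLE, user DATALOGZ_DIAGNOSTICS_ADMIN). Adjust sizing, schemas, and grants to your environment.

```sql
-- Sketch only: names come from this guide's environment variables
CREATE WAREHOUSE IF NOT EXISTS DATALOGZ_BIOPS WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;
CREATE DATABASE IF NOT EXISTS DATALOGZ_WH;
CREATE ROLE IF NOT EXISTS DATALOGZ_DIAGNOSTICS_ADMIN_ROLE;
CREATE USER IF NOT EXISTS DATALOGZ_DIAGNOSTICS_ADMIN
  PASSWORD = '<password>' -- Add the password!
  DEFAULT_ROLE = DATALOGZ_DIAGNOSTICS_ADMIN_ROLE
  DEFAULT_WAREHOUSE = DATALOGZ_BIOPS;
GRANT ROLE DATALOGZ_DIAGNOSTICS_ADMIN_ROLE TO USER DATALOGZ_DIAGNOSTICS_ADMIN;
GRANT USAGE ON WAREHOUSE DATALOGZ_BIOPS TO ROLE DATALOGZ_DIAGNOSTICS_ADMIN_ROLE;
GRANT ALL ON DATABASE DATALOGZ_WH TO ROLE DATALOGZ_DIAGNOSTICS_ADMIN_ROLE;
-- Replace POWERBI with TABLEAU or LOOKER as appropriate
CREATE SCHEMA IF NOT EXISTS DATALOGZ_WH.POWERBI_INGEST;
CREATE SCHEMA IF NOT EXISTS DATALOGZ_WH.POWERBI_SOURCE;
GRANT ALL ON ALL SCHEMAS IN DATABASE DATALOGZ_WH TO ROLE DATALOGZ_DIAGNOSTICS_ADMIN_ROLE;
```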