The Datalogz system design provides customers a flexible approach for deploying Datalogz according to their preferences.
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads. Customers may choose which database technologies they would like to use for both the application and warehouse databases.
Typical OLTP Workloads include:
- Create and manage Accounts and Environments
- Create and manage Roles, Users, and Permissions
- Create and manage Connectors
- Create and manage BI Activity Dashboards
- Create and manage Operations and Actions
- Create and manage Impact Reports

Typical OLAP Workloads include:
- Transform raw JSON data into enriched dimensional datasets
- Identify issues in the BI environment related to ROI, security, and compliance
- Generate Change History for all BI metadata endpoints
- Produce Context Logs for identifying root causes of Issues
- Generate Recommendations for improving BI environments

The following options are available today:
1. Postgres Only
   - App DB: PostgreSQL (OLTP)
   - BI WH: PostgreSQL (OLAP)
2. Postgres + Snowflake
   - App DB: PostgreSQL (OLTP)
   - BI WH: Snowflake (OLAP)
This option utilizes a single PostgreSQL server with two databases -- one for OLTP workloads and another for OLAP workloads.
This option utilizes a PostgreSQL database for OLTP workloads and a Snowflake database for OLAP workloads.
The Datalogz application uses Apache Airflow for connector management, providing BI Admins with pre-built metadata pipelines they can schedule on an hourly, daily, or weekly basis. New issues and recommendations are generated after each connector refresh based on the latest data that has changed.
Your connectors will retrieve metadata from the following API endpoints:
- PowerBI: Endpoints listed here.
- Tableau: Endpoints listed here.
- Looker: Endpoints listed here.
BI Admins must configure connectors to authorize the Datalogz application. This grants read-only access to standard and admin-level APIs based on a selection of Groups. Groups are generally defined as follows for each system:
- PowerBI: Workspaces
- Tableau: Projects
- Looker: Folders
The admin-level APIs unlock the most insight for your BI Admins, expanding the types of Issues and Recommendations Datalogz can provide. After a new connector is created, BI Admins can use Datalogz RBAC to assign fine-grained permissions to Users who should only have access to certain metadata from certain Groups.
The Service Principal you create for Datalogz utilizes a combination of admin and standard endpoints to retrieve the activity, lineage, query expression, and inventory metadata.
The following data flow diagram shows the endpoints from the Power BI REST API used to build the metadata model for Datalogz BI Ops. This runs on the daily or weekly schedule defined when creating your Datalogz connector.
All warehouse transformations are source-controlled and executed using an open-source technology named dbt-core. Metadata loaded into the BI Warehouse is transformed using dbt-core to produce insights and recommendations that can be used to improve BI operations. The dbt-core service is included as part of the ELT API and shown in the diagram below:
These deployment guides help teams deploy the Datalogz BI Ops product.
Proceed to view a high-level overview of the Datalogz system design.
Datalogz uses an Azure Storage Account to stage external files for ingestion into the target warehouse. This is the default for Snowflake warehouses and optional for Postgres warehouses.
Login to Azure Portal and create a new Gen2 Storage Account (hierarchical namespace).
Add the storage account to the appropriate virtual network from Storage Account > Networking panel.
Create a new container named datalogzbidiagnostics
Set the access level to Private
Ensure the following environment variables have been added to your Azure Key Vault. These are required for your VM or environment credentials to be authenticated to read/write to the storage container:
Navigate to Key Vault > Secrets and click Generate/Import
Create a new secret key and value for each of the following:
AZURE-BLOB-CONTAINER-NAME
AZURE-BLOB-STORAGE-ACCOUNT-NAME
AZURE-BLOB-CONNECTION-STRING
The values for these are available in your Storage Account:
The value for AZURE-BLOB-CONTAINER-NAME should be set to datalogzbidiagnostics.
The value for AZURE-BLOB-STORAGE-ACCOUNT-NAME should be set to the name of your storage account.
The value for AZURE-BLOB-CONNECTION-STRING can be found in Storage Account > Access Keys > Connection String.
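As a quick sanity check, the storage account name embedded in the connection string should match the value you store for AZURE-BLOB-STORAGE-ACCOUNT-NAME. A minimal sketch, using a made-up connection string (a real one comes from Storage Account > Access Keys):

```shell
# Hypothetical connection string; copy the real value from
# Storage Account > Access Keys > Connection String.
CONN="DefaultEndpointsProtocol=https;AccountName=mystorageacct;AccountKey=abc123==;EndpointSuffix=core.windows.net"

# Extract AccountName -- it should match AZURE-BLOB-STORAGE-ACCOUNT-NAME.
NAME=$(echo "$CONN" | tr ';' '\n' | sed -n 's/^AccountName=//p')
echo "$NAME"
```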
Datalogz runs on Virtual Machines inside Docker Containers for a simple, cost-effective deployment that can be scaled vertically as demand increases. You may deploy using either Windows or Linux.
Provisioning
Create new virtual machine(s) to host your Datalogz frontend application and backend API in the region of your choice. You may choose either a Windows 10/11 distribution or a Linux distribution.
The security group inbound rules on this machine should allow HTTP/HTTPS traffic from your private network IP – so your users can access the site.
The security group outbound rules on this machine should allow HTTPS traffic from your network IP – so the Gateway API Service can make HTTPS connection to 3rd party services such as Microsoft for running the OAuth2.0.
The security group inbound rules on this machine should allow SSH or RDP traffic from your private team’s IP – so your team can remotely login and deploy the builds.
Ensure you have enabled Managed Identity access on the Virtual Machine in the "Identity" panel on the Virtual Machine page in the Azure portal. This ensures the VM's identity can be used to retrieve keys from the Key Vault, so no sensitive credentials need to be stored in an .env file on the VM itself.
Grant access to the key vault to this VM identity following instructions here.
SSH or RDP into the VM to install Docker and clone the repositories.
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $160 / mo.
Total Estimated Cost: $80 - $160 / mo.
Backend VM
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: No changes required
Estimated Cost as of 2/1/2023: $160 / mo.
Frontend VM
Minimum / Recommended
CPU: 2 vCPU
Memory: 4 GB RAM
OS Disk: No change
Estimated Cost as of 2/1/2023: $40 / mo.
Total Estimated Cost: $120 - $200 / mo.
Datalogz is a portable application that can be deployed to either Windows or Linux machines using Docker to virtualize resources.
Only certain Windows 10/11 VM images on Azure support nested virtualization. Please choose a VM size from this list marked with three asterisks (***), denoting hyper-threaded sizes capable of running nested virtualization.
Datalogz recommends using the following sizes based on whether you choose Option 1 or 2 above:
Size for Single (Monolith) VM: D4d_v4
Sizes for Split VM:
Backend: D2d_v4
Frontend: D2d_v4
SSH
Set the correct permissions on your SSH key pair before connecting to the virtual machine, updating the following variables with your key name, user name, and VM IP address
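For example (the key file name, user, and address below are placeholders for your own values):

```shell
# Placeholder key file so this snippet is self-contained; in practice,
# use the key pair you downloaded when creating the VM.
KEY=./datalogz_vm_key.pem
touch "$KEY"

# SSH refuses private keys that are group- or world-readable.
chmod 400 "$KEY"
stat -c '%a' "$KEY"    # prints the octal permissions on Linux

# Then connect (hypothetical user and address):
# ssh -i "$KEY" azureuser@<vm-public-ip>
```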
Download the Remote Desktop Protocol (RDP) connection file for the Windows 10/11 VM from the Azure portal and ensure the machine connecting to the remote resources is allowed on port 3389 (RDP).
Both the Datalogz frontend and backend applications are deployed using Docker.
Installing Docker on Windows 10/11 performs best when using Windows Subsystem for Linux (WSL2). Here are the steps required to install this pre-requisite.
Use Remote Desktop Protocol to connect to the VM
Open a Powershell Terminal as Administrator and run the following command to setup Windows Subsystem for Linux:
wsl --install
Restart the VM. When the VM restarts, Windows Subsystem for Linux may start automatically. You can create a new user named dl_windows_linux_user with your own password to access WSL2 directly, but it's unlikely you will need to.
Now you can proceed to install Docker Desktop by following the official docs in the next section.
Please continue with the Docker Desktop installation referencing the official docs.
After installation has completed, open Docker Desktop and accept the terms of service so the Docker Engine can start. If the Docker Engine does not start, you may need to disconnect and reconnect to the VM.
Download and install Git (Link)
Set up credential store by running the command in a Command Prompt:
git config --global credential.helper 'store'
The next time you run git pull on a remote origin and sign in, your credentials will be cached for future reuse.
Deploying a Datalogz Proof-of-Concept (POC) will use self-signed keys generated during the build process to enable encrypted communications over HTTPS, and you will access your VM either using the Public IP Address of the VM or an Azure-provided DNS ending in *.cloudapp.azure.com. For example:
https://x.x.x.x OR https://mono-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.x OR https://app-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.y OR https://api-mycomanywin11.eastus2.cloudapp.azure.com
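The build scripts generate these self-signed keys for you; for reference, a roughly equivalent manual command (file names and the CN are illustrative) would be:

```shell
# Generate a self-signed certificate and key valid for one year.
# The CN below is an example Azure-provided DNS name; substitute your own.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout datalogz.key -out datalogz.crt -days 365 \
  -subj "/CN=mono-mycomanywin11.eastus2.cloudapp.azure.com"

# Inspect the subject of the generated certificate:
openssl x509 -in datalogz.crt -noout -subject
```

Browsers will warn on self-signed certificates, which is why they are suited to a POC rather than production.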
Deploying Datalogz into Production (PROD) enables you to provide your own Certificates for deploying Datalogz to new subdomains on an existing domain. For example:
https://app.datalogz.mycompany.com
https://api.datalogz.mycompany.com
Datalogz uses Azure Key Vault to store sensitive secrets required to run the application.
To set up your Azure Key Vault follow these steps:
Login to Azure Portal
Create a new Key Vault if one has not already been created.
Proceed to either ManagedIdentity or EnvironmentCredential Identity options.
Navigate to the Virtual Machine you wish to use as a Managed System Identity and select Identity on the left sidebar.
Enable System Assigned identity and add the following role assignments to the VM:
“Key Vault Secrets Officer”
“Virtual Machine Contributor”
"Virtual Machine User Login"
Navigate to Key Vault and select Access configuration on the left sidebar.
Set permission model to “Azure role-based access control” and click Apply.
IAM permissions were already configured in step 4; if desired, you can confirm they are present on the Key Vault's IAM page.
Add the Key vault to the default subnet in the same virtual network as the VM.
You can find this in the Key Vault > Networking tab.
Select "Allow public access from specific virtual networks and IP addresses."
Add the existing virtual network where your VM is located.
A service endpoint will be created for this subnet.
Note: If the subnet cannot take additional service endpoints, a new subnet will be required.
SSH into VM and install the Azure CLI
Login to Azure using 2FA from the SSH terminal, following the prompts
Assign the Identity to the VM and verify
Create a new App Registration to act as a Principal to access the Key Vault
Set Access configuration policy to “vault” access control
Create policy and add the Secrets Officer role to the App Principal.
Add the Key vault to the default subnet in the same virtual network as the VM.
Pull the code and add the following environment variables to a dot file in your project directory named .prod.env, using the correct values based on the examples provided.
Pull the code and add the following environment variables to a dot file in your project directory named .prod.env, using the correct values based on the examples provided.
Pull the code and add the following environment variables to a dot file in your project directory named .prod.env, using the correct values based on the examples provided.
Navigate to your Key Vault in the Azure Portal and add the following environment variables populating them with the correct values based on the examples provided.
Datalogz allows Azure customers to easily integrate email notifications into Datalogz using the Microsoft Graph REST API and an Azure App Registration.
Documentation: Setting Up Email Notification with Datalogz Using Microsoft Graph REST API
Datalogz now allows users to set up email notifications for their team environments via the Microsoft Graph REST API (specifically, the 'user: sendMail' method). This feature enables any user in an organization to connect their Microsoft account through OAuth. Their email will then be utilized to send out email notifications to team members.
Please note that this feature is available only in private deployments.
Follow these steps to connect your Microsoft account:
Navigate to the 'Settings' page and select 'Email Settings'.
Click the 'Connect' button and execute the OAuth flow.
You will be redirected to the Microsoft consent screen. At this point, select a Microsoft account that has access to a mailbox (in other words, your Microsoft account should have Outlook access).
After granting consent, you will be redirected back to the Datalogz app. If the process is successful, a success message will appear in the popup.
If the process fails, the most common issue is that the selected Microsoft account does not have access to a mailbox. To resolve this, you can either ask your Microsoft account admin to provide a license to give your user access to a mailbox or you can choose a different account with mailbox access.
Furthermore, the exact reason for process failure will be detailed in the popup, as shown in the screenshot. This information should help you understand and address the issue.
Before initiating the connection process, please ensure that the redirect URL in the Microsoft App has been properly added: https://<host>/api/v0/oauth/azure_mail/redirect. Failure to do so may result in unsuccessful integration between the Datalogz app and your Microsoft account. Remember to replace <host> with your specific hostname.
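For example, substituting a hypothetical hostname:

```shell
# Hypothetical host; use your deployment's actual hostname.
HOST="app.datalogz.mycompany.com"
REDIRECT="https://${HOST}/api/v0/oauth/azure_mail/redirect"
echo "$REDIRECT"
```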
This integration will provide you with a seamless communication experience, ensuring that all team members stay informed and aligned. If you encounter any issues during setup, please refer to our troubleshooting section or contact our support team for further assistance.
Follow these steps to build the Datalogz services on your virtual machine.
To build the Datalogz application, follow these steps:
1. ssh into a Linux virtual machine to build the following repositories:
   - datalogz-bi-diagnostics (ELT)
   - datalogz-bi-gateway (API)
   - datalogz-bi-frontend (APP)
2. git pull the main/master branches of each repository
3. Read the README.md for each repository
4. Build the services:
Build datalogz-bi-diagnostics
Add the following environment variables to .prod.env
ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=[SNOWFLAKE | POSTGRES]
If you are running Azure Managed Identity VMs, add the following:
AZURE_KEY_VAULT_URL=
AZURE_KEY_VAULT_NAME=
AZURE_VM_NAME=
AZURE_RESOURCE_GROUP_NAME=
From the project directory, run source ./init_env.sh to run through the interactive build script:
- Choose env: prod
- Choose warehouse: postgres, snowflake
- Choose cloud: azure, aws
- Choose IAM method for VM: env, identity
- Choose vm setup: mono or split

The ./init_env.sh script will build the correct docker compose file based on the options that are chosen.
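The mapping from the vm setup choice onto a compose file can be sketched as follows (assumed logic for illustration only; the actual ./init_env.sh may differ):

```shell
# Illustrative only: how the "mono" vs "split" choice maps onto the
# compose file names referenced later in this guide.
ENV=prod
SETUP=mono   # or: split

if [ "$SETUP" = "mono" ]; then
  COMPOSE_FILE="docker-compose.mono.${ENV}.yml"
else
  COMPOSE_FILE="docker-compose.${ENV}.yml"
fi
echo "$COMPOSE_FILE"
```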
Build datalogz-bi-gateway
Add the following environment variables to .prod.env
ENV=PROD
PYTHONDONTWRITEBYTECODE=1
CRON_SERVICE_URL=https://airflow_webserver:8080
HTTP_SCHEME=https
HOST_NAME=localhost
Change localhost to your host DNS or the private IP of the VM.
If you are running Azure Managed Identity VMs, add the following:
AZURE_RESOURCE_GROUP_NAME=
AZURE_VM_NAME=
AZURE_KEY_VAULT_NAME=
AZURE_KEY_VAULT_URL=
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Build datalogz-bi-frontend
Add the following environment variables to .env
REACT_APP_IDENTITY_PROVIDER = 'MICROSOFT'
VM_MANAGED_IDENTITY = 'TRUE'
Values for REACT_APP_IDENTITY_PROVIDER can be:
MICROSOFT
TABLEAU_SSO_FOR_CLOUD
TABLEAU_SA
WORKOS
If REACT_APP_IDENTITY_PROVIDER is set, MANAGED_IDENTITY can also be set to TRUE to embed Service Principal credentials into the API build for new Connectors to use by default.
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
OR
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads.
Azure Database for PostgreSQL (flexible server)
Minimum
Memory Optimized
Compute: 2 vCores, 16 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $500 / mo.
Recommended
Memory Optimized
Compute: 4 vCores, 32 GiB Memory, 6400 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $850 / mo.
Total Estimated Cost: $500 - $850 / mo.
Azure Database for PostgreSQL
Minimum/Recommended
Memory Optimized
Compute: 2 vCores, 8 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $200 / mo.
Minimum
Compute: X-Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $600 / mo.
Recommended
Compute: Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $1000 / mo.
Total Estimated Cost: $800 - $1200 / mo.
App DB (Azure Database for PostgreSQL)
Name: datalogz_bi
Warehouse DB (Azure Database for PostgreSQL)
Name: datalogz_wh
App DB (Azure Database for PostgreSQL)
Name: datalogz_bi
Warehouse DB (Snowflake)
Name: datalogz_wh
Throughout this guide there will be references to secrets, which should be stored in the Key Vault, and environment variables, which should be stored in an .env file in your project folder. Datalogz supports both the ManagedIdentity and EnvironmentCredential methods of authenticating to a Key Vault, the latter using an App Registration as a Principal.
Confirm the non-sensitive environment variables listed have been added to .prod.env and that sensitive environment variables have been added to the Key Vault.
Create a new Virtual Private Cloud (VPC) to provide an isolated network environment for your Datalogz deployment in the cloud.
The VPC should be created in the region of your choice and should be configured with an IPv4 CIDR block that can support the expected number of subnets and IP addresses for your deployment.
Log in to the cloud provider's console and navigate to the VPC management section.
Click "Create VPC."
Name your VPC "Datalogz VPC" or choose another appropriate name.
Enter the IPv4 CIDR block, such as 10.0.0.0/16, that provides enough IP addresses for your planned deployment.
Leave all the other options as default and click "Create VPC."
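As a quick capacity check on the example CIDR, a /16 block leaves 16 host bits:

```shell
# 32 address bits minus a /16 prefix leaves 16 host bits:
# 2^16 = 65536 addresses available to divide among subnets.
echo $((1 << (32 - 16)))
```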
Next, you will need to create subnets in this VPC to enable different tiers of your application to communicate with each other securely.
Create an Internet Gateway and Route Table in AWS with appropriate routing rules to allow specific types of traffic between the VPC and the internet, such as HTTP, HTTPS, and DNS.
Ensure that access is restricted to specific IP addresses or network ranges and that appropriate security measures are implemented to protect the VPC and its resources.
1. Go to the VPC dashboard and select the Internet Gateway option.
2. Click on the Create Internet Gateway button.
3. Provide a name for the Internet Gateway, such as datalogz-internet-gateway.
4. Click on the Create Internet Gateway button to create the gateway.
5. Note down the ID of the Internet Gateway created, as it will be needed when adding routes below.
6. Go to the Route Table option and click on the Create Route Table button.
7. Provide a name for the Route Table, such as datalogz-route-table.
8. Select the VPC that was created earlier in the VPC Deployment section.
9. Click on the Create Route Table button to create the route table.
10. Go to the Subnet Association section of the route table and click on the Edit Subnet Association button.
11. Select the two subnets that were created earlier in the Subnet Deployment section.
12. Click on the Save association button to save the subnet associations.
13. Finally, go to the Routes section of the route table and click on the Edit routes button.
14. Click on the Add routes button, and enter the destination as 0.0.0.0/0 and the target as the ID of the Internet Gateway noted earlier.
15. Click on the Save changes button to save the new route.
When you create an RDS Postgres instance and add it to a VPC, the database requires that the VPC contains subnets in at least 2 different availability zones.
Go to subnets and create a new subnet.
Enter the VPC ID of the Datalogz VPC created in the VPC Deployment section.
Enter 10.0.0.0/25 as the IPv4 CIDR.
Choose an Availability Zone for the subnet.
Click Create Subnet.
Create at least two subnets in different Availability Zones. The RDS Postgres database requires that the VPC contains at least two subnets in different availability zones. Repeat the above steps to create another subnet with the following changes:
In step 3, enter the IPv4 CIDR as 10.0.0.128/25.
In step 4, choose a different Availability Zone than the one selected for the first subnet.
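The two /25 CIDRs split the parent range cleanly, which you can verify with a little arithmetic:

```shell
# Each /25 subnet has 2^(32-25) = 128 addresses; the two subnets together
# cover the 256 addresses of 10.0.0.0/24
# (10.0.0.0-10.0.0.127 and 10.0.0.128-10.0.0.255).
PER_SUBNET=$((1 << (32 - 25)))
echo "$PER_SUBNET"
echo $((PER_SUBNET * 2))
```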
The following documentation describes the resources required to deploy Datalogz on AWS.
(1) AWS Virtual Private Cloud
(1) AWS Internet Gateway
(2) Subnets
(1 or 2) AWS EC2 Instance w/ SSH Key Pair
(1) AWS RDS for PostgreSQL Server
App DB
Warehouse DB
Airflow DB
(1) AWS Secrets Manager
(0 or 1) AWS S3 Bucket w/ IAM Role
For Snowflake customers:
(1) Snowflake Warehouse
Warehouse DB (instead of Warehouse DB in Postgres)
Create a Security Group in AWS with appropriate inbound and outbound rules to allow specific types of traffic between the VPC and the internet, such as SSH, PostgreSQL, and HTTPS. Ensure that access is restricted to specific IP addresses or network ranges, and implement other security measures to protect sensitive data and resources.
Go to the VPC dashboard and select the Security Group option.
Click on the Create Security group button.
Provide a name and description for the security group. For example, name it "datalogz-security-group" and provide a description such as "Allow SSH, PostgreSQL, and HTTPS for Datalogz".
Select the VPC that was created previously in the VPC Deployment section.
In the Inbound Rules section:
Add a rule to allow incoming traffic on port 5432 for PostgreSQL, with the source set to 10.0.0.0/24 (the IP address range covering the subnets created earlier).
Add a rule to allow incoming traffic on port 22 for SSH, with the source set to the IP address or IP address range of the developers who will be accessing the EC2 instance.
Add a rule to allow incoming traffic on port 443 for HTTPS. Set the source to the IP address(es) of your VPN for a private deployment.
In the Outbound Rules section:
Add a rule to allow outgoing traffic on port 443 for HTTPS, with the destination set to anywhere.
Add a rule to allow outgoing traffic on port 80 for HTTP, with the destination set to anywhere.
Click on the Create Security group button to create the security group.
These steps will create a security group that allows incoming traffic on ports 22, 5432, and 443, and outgoing traffic on ports 80 and 443. The security group will also restrict incoming traffic to specific sources, such as the IP address range of the VPC and the IP address range of the developers who will be accessing the EC2 instance via SSH.
Datalogz uses AWS S3 to stage external files for ingestion into the target warehouse. This is the default for Snowflake warehouses and optional for Postgres warehouses.
Sign in to your AWS console
Navigate to the S3 service and select "Create Bucket"
Name your bucket "datalogzbidiagnostics"
Disable ACLs for object ownership configuration
Choose "Block all public access" for public access configuration
Disable versioning for the bucket
For the default encryption configuration, choose Amazon S3-managed keys and select the "Enabled" option for the bucket key
Click on "Create Bucket" to create your S3 bucket.
Datalogz uses AWS Secrets Manager to store sensitive secrets required to run the application.
Access the Secrets Manager service and select "Store".
Choose "Other Type of secret" as the secret type.
Input your secret's key and value. Use "Add Row" to add multiple secrets from the following list.
For Encryption key, select "aws/secretsmanager".
Select "Next".
Name your secret "datalogz_secrets". Add a description such as "secret values used by Datalogz to securely access credentials".
Select "Next".
This step is optional. Datalogz recommends creating a lambda function to enable secrets rotation. Once configured, select "Next".
Review your secrets. If everything looks good, select "Next". Otherwise, select "Previous" to make updates where necessary.
To enable access to the S3 bucket and the Secrets Manager that we created, we will create a custom policy and a role.
Create Policy
Log in to the AWS IAM Management Console.
Click on "Policies".
Create a policy that specifically allows read and write permissions only on the S3 bucket that we created in the S3 deployment guide, and only allows read access to the secrets we created in the Secrets Manager deployment guide.
Click on the JSON tab and paste the following JSON string, replacing "datalogz-s3" with the name of the S3 bucket you created in the S3 deployment guide.
Click "Next: Tags" and optionally add tags for the policy.
Click "Next: Review".
Enter a name for the policy, such as "datalogz-policy-to-access-s3-and-secrets-from-ec2".
Click "Create Policy".
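A minimal sketch of the shape such a policy could take (the bucket name and secret ARN pattern are assumptions; replace them with your actual resources):

```shell
# Illustrative policy only: least-privilege access to the staging S3 bucket
# and read access to the datalogz_secrets secret. Replace the bucket name
# and secret ARN with the resources you created.
cat > datalogz-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::datalogzbidiagnostics"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::datalogzbidiagnostics/*"
    },
    {
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:*:*:secret:datalogz_secrets-*"
    }
  ]
}
EOF
```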
Create Role
Go to "Roles" and click "Create Role".
Choose "AWS service" as the trusted entity type.
Choose "EC2" under "Common use cases".
Click "Next".
In the "Permissions" section, search for the policy we just created and select it.
Click "Next".
Enter a name for the role, such as "datalogz-role-access-resources-from-ec2".
Click "Create Role".
Datalogz runs on EC2 Instances inside Docker Containers for a simple, cost-effective deployment that can be scaled vertically as demand increases.
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $160 / mo.
Total Estimated Cost: $80 - $160 / mo.
Backend VM
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: No changes required
Estimated Cost as of 2/1/2023: $160 / mo.
Frontend VM
Minimum / Recommended
CPU: 2 vCPU
Memory: 4 GB RAM
OS Disk: No change
Estimated Cost as of 2/1/2023: $40 / mo.
Total Estimated Cost: $120 - $200 / mo.
Go to the EC2 section
Click on Launch Instance
Name your EC2
Select Ubuntu as Amazon Machine Image
Select instance type t2.xlarge
Generate a key pair for SSH into the EC2
Expand Networking settings
Select the VPC that we created in the VPC deployment Guide
Select any of the 2 subnets that we created in the Subnet Deployment Guide
Click Auto Assign public IP and click Enable
Under the Firewall (security groups), select "Select Existing Security group"
From the drop-down, select the security group that we created in the Security Group section
Next, configure the storage to at least 50 GiB
Click Advanced details
Select the IAM role that we created in the IAM Roles section
Leave the rest as default
Click Launch Instance
The security group inbound rules on this machine should allow HTTP/HTTPS traffic from your private network IP so your users can access the site. The security group outbound rules on this machine should allow HTTPS traffic from your network IP so the Gateway API Service can make HTTPS connections to 3rd party services, such as Microsoft for running the OAuth2.0.
The security group inbound rules on this machine should allow SSH traffic from your private team's IP so your team can remotely log in and deploy the builds.
Set the correct permissions on your SSH key pair before connecting to the virtual machine, updating the following variables with your key name, user name, and VM IP address.
SSH into the VM to install Docker and clone the repositories.
Example:
ssh -i "ec2_bi-monolith_testing_us-east-1_001.pem" ubuntu@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com
Run an update
Ubuntu: sudo apt update
Both the Datalogz frontend and backend applications are deployed using Docker.
Ubuntu: sudo apt-get install git
Set up credential store by running the command in a Command Prompt:
git config --global credential.helper 'store'
The next time you run git pull
on a remote origin and sign-in, your credentials will be cached for future reuse.
Deploying a Datalogz Proof-of-Concept (POC) will use self-signed keys generated during the build process to enable encrypted communications over HTTPS, and you will access your VM either using the Public IP Address of the VM or the AWS-provided public DNS ending in *.compute.amazonaws.com. For example:
https://x.x.x.x OR https://ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com
https://x.x.x.y OR https://ec2-xx-xxx-xxx-xy.compute-1.amazonaws.com
Deploying Datalogz into Production (PROD) enables you to provide your own Certificates for deploying Datalogz to new subdomains on an existing domain. For example:
https://app.datalogz.mycompany.com
https://api.datalogz.mycompany.com
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads.
Minimum
Memory Optimized
Compute: 2 vCores, 16 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $500 / mo.
Recommended
Memory Optimized
Compute: 4 vCores, 32 GiB Memory, 6400 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $850 / mo.
Total Estimated Cost: $500 - $850 / mo.
Minimum/Recommended
Memory Optimized
Compute: 2 vCores, 8 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $200 / mo.
Go to RDS and click Create Database
Choose Standard Create
, Postgres
and choose your desired size
Settings:
Name your database instance
Provide the credentials to your database. Store these credentials somewhere secure. We will come to these credentials later in the Secrets Manager section
Instance configuration:
Leave all the settings as default
Storage:
Set Allocated Storage to 50 GiB
Connectivity:
Select the VPC and the subnets that we created in the VPC deployment section
Select the security group that we created in the Security Group section
Leave the rest as default
Leave all the other sections as default
Click Create database
App DB
Name: datalogz_bi
Warehouse DB
Name: datalogz_wh
App DB
Name: datalogz_bi
BI Warehouse DB (See Snowflake Option Section)
Name: datalogz_wh
Please continue with the Docker Desktop installation referencing the official docs.
Ubuntu: Download and install Git
Follow these steps to build the Datalogz services on your virtual machine.
To build the Datalogz application, follow these steps:
1. ssh into a Linux virtual machine to build the following repositories:
   - datalogz-bi-diagnostics (ELT)
   - datalogz-bi-gateway (API)
   - datalogz-bi-frontend (APP)
2. git pull the main/master branches of each repository
3. Read the README.md for each repository
4. Build the services:
Build datalogz-bi-diagnostics
Confirm the non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to the Secrets Manager.
Add the following environment variables to .prod.env
ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=POSTGRES
From the project directory, run source ./init_env.sh to run through the interactive build script:
Choose env: prod
Choose warehouse: postgres
Choose cloud: azure or aws
Choose IAM method for VM: env or id
Choose vm setup: mono or split
Choose deployment method: private
The ./init_env.sh script will build the correct docker compose file based on the options that are chosen.
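The way the chosen options determine the compose file can be pictured as follows. This is a hypothetical sketch of the selection logic, not the actual contents of ./init_env.sh; only the two file names used later in this guide are taken from the source.

```shell
# Hypothetical sketch of how the "vm setup" answer selects a compose file.
# The real ./init_env.sh also folds in the env, warehouse, cloud, and IAM choices.
compose_file_for() {
  vm_setup="$1"   # "mono" or "split"
  if [ "$vm_setup" = "mono" ]; then
    echo "docker-compose.mono.prod.yml"
  else
    echo "docker-compose.prod.yml"
  fi
}

compose_file_for mono    # monolith VM
compose_file_for split   # split VM
```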
Build datalogz-bi-gateway
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to the Key Vault
Add the following environment variables to .prod.env
ENV=PROD
PYTHONDONTWRITEBYTECODE=1
CRON_SERVICE_URL=https://airflow_webserver:8080
HTTP_SCHEME=https
HOST_NAME=localhost
Change localhost to your host DNS or the private IP of the VM
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Build datalogz-bi-frontend
Add the following environment variables to .env
REACT_APP_IDENTITY_PROVIDER = 'TABLEAU_SA'
VM_MANAGED_IDENTITY = 'TRUE'
Values for REACT_APP_IDENTITY_PROVIDER can be:
MICROSOFT
TABLEAU_SSO_FOR_CLOUD
TABLEAU_SA
WORKOS
If REACT_APP_IDENTITY_PROVIDER is set, VM_MANAGED_IDENTITY can also be set to TRUE to embed Service Principal credentials into the API build for new Connectors to use by default.
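A small pre-build check can catch a mistyped provider value before the docker build runs. This helper is an assumption, not part of the repository; the four accepted values come from the list above.

```shell
# Hypothetical pre-build check: REACT_APP_IDENTITY_PROVIDER must be one of
# the four supported values listed above.
valid_identity_provider() {
  case "$1" in
    MICROSOFT|TABLEAU_SSO_FOR_CLOUD|TABLEAU_SA|WORKOS) return 0 ;;
    *) return 1 ;;
  esac
}

valid_identity_provider TABLEAU_SA && echo "ok"
```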
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
OR
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads. Teams can choose to run their BI Warehouse on Snowflake by following these steps:
Minimum
Compute: X-Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $600 / mo.
Recommended
Compute: Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $1000 / mo.
Total Estimated Cost: $600 - $1000 / mo.
Warehouse DB
Name: datalogz_wh
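The warehouse database above can be created with two Snowflake statements. A sketch, assuming a SYSADMIN-style role; the warehouse name and X-Small size are assumptions based on the minimum sizing above.

```shell
# Snowflake DDL for the BI Warehouse, held in a variable so it can be
# reviewed and then run via snowsql or a worksheet. Names are placeholders.
snowflake_setup_sql='CREATE WAREHOUSE IF NOT EXISTS DATALOGZ_WH_XS WAREHOUSE_SIZE = XSMALL;
CREATE DATABASE IF NOT EXISTS datalogz_wh;'

echo "$snowflake_setup_sql"
# e.g.: echo "$snowflake_setup_sql" | snowsql -a <account> -u <user>
```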
The following instructions make reference to POWERBI, but this can be replaced with either TABLEAU or LOOKER as appropriate.
The following documentation describes the settings and permissions required to set up the Datalogz PowerBI Connector.
Datalogz supports connecting to the PowerBI API by Service Principal or Microsoft Admin.
Option 1: Service Principal
The Service Principal is restricted to read-only access to Admin API endpoints for a single PowerBI tenant and does not require admin consent to be granted by a Microsoft O365 Administrator. Option 2: Microsoft Admin
The Microsoft Admin is also restricted to read-only access to Admin API endpoints, but for all PowerBI tenants registered in Azure Active Directory (AD), and requires the Tenant.Read.All permission, which must be granted admin consent by a Microsoft O365 Administrator.
Option 1: Service Principal
Create an Azure AD App Registration
Note down the following properties which will be used later:
Client ID
Tenant ID
Create a new Secret and store this in a secure location to be used later:
Client Secret (value)
Create an Azure Security Group
Add the Azure AD App Registration as a Member to the Security Group
Add the Security Group to the PowerBI Tenant Admin Settings
Enable PowerBI REST API permissions for this security group in the tenant admin settings.
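The App Registration, secret, and Security Group steps above can also be done from the Azure CLI. The commands below are echoed rather than executed so they can be reviewed first; display names are placeholders, and the <...> IDs come from the portal or earlier command output.

```shell
# Hedged sketch: app registration, secret, group, and membership via the az CLI.
# Commands are printed for review; fill in the <...> IDs before running them.
APP_NAME="datalogz-powerbi-sp"
GROUP_NAME="datalogz-powerbi-group"

echo "az ad app create --display-name $APP_NAME"
echo "az ad app credential reset --id <app-client-id>"   # emits the Client Secret
echo "az ad group create --display-name $GROUP_NAME --mail-nickname $GROUP_NAME"
echo "az ad group member add --group $GROUP_NAME --member-id <sp-object-id>"
```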
Login to the PowerBI app portal as a PowerBI Service Administrator
Navigate to the settings in the upper right, and click Admin Portal.
Under Tenant settings, scroll down to Admin API settings and enable the following permissions:
Allow service principals to use PowerBI APIs
Allow service principals to use read-only admin APIs
Add the Security Group created above to the security groups list
Scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Add the Security Group created above to the security groups list, unless other users rely on this setting applying to the entire organization, in which case leave the setting as shown.
Create a second App Registration in Azure for your Datalogz client.
Login to your Azure portal and navigate to Azure App registration
Register an Application
Add the following redirect URIs
Add the following Read-Only Microsoft Graph API Permissions to login with Microsoft.
Login to https://app.datalogz.io using Microsoft Azure Active Directory.
The following API permissions will be required to be approved by this user during this authentication process for a successful login and account creation. No admin consent is required for this step.
After logging in, proceed to create a new PowerBI Connector from the Connectors tab, selecting the Service Principal option. Complete Steps 1 - 4. In step 3 you will select the specific Workspaces you want to assign to this connector.
Once the Service Principal has been provided access to the read-only Admin APIs by following the steps above, the app is able to use the following endpoints documented here.
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Now create a Role and grant read access to this connector. Navigate to Role Settings from the profile menu in the upper-right of your window. Once the role is created and assigned to the connector, new users can be invited and assigned to the role(s) and connectors they should have access to.
To invite users, navigate to User Settings from the profile menu in the upper-right of your window. You can send email invitations to invite users, designating their user type as Admin or Member as described below:
- Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations. - Member: Can manage personal settings and view overview and recommendations.
Navigate back to the Connector page to check on the status of the connector. Once it successfully completes its first run, the Overview and Recommendations views will be populated.
Option 2: Microsoft O365 Global Administrator
Enable PowerBI REST API permissions for this security group in the tenant admin settings.
Login to the PowerBI app portal as a Microsoft 365 Global Administrator.
Navigate to the settings in the upper right, and click Admin Portal.
Under Tenant settings, scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Create a new App Registration in Azure for your Datalogz client.
Login to your Azure portal and navigate to Azure App registration
Register an Application
Add the following redirect URIs
Add the following Read-Only API Permissions
Once the App Registration has those delegated permissions, the app is able to use any admin or non-admin API that needs those permissions (such as WorkspaceGetInfo). The Tenant.Read.All permission is required for Activity Events and Workspace Datasets, Tables, Columns, and Queries.
Create a new Client Secret
Add the following environment variables to your Key Vault. The Client ID and Client Secret will be the same for both PowerBI and Microsoft, since a single app registration was created above.
These variables are split out in case you want to create app registrations for the Microsoft Graph API and PowerBI API separately.
Login to your on premises deployment of Datalogz as a Microsoft 365 Global Administrator.
The following API permissions will be required to be approved by your administrator during this authentication process.
As the Microsoft 365 Global Administrator, after logging in, proceed to create a new PowerBI Connector from the Connectors tab. Complete Steps 1 - 4. In step 3 you will select the specific Workspaces you want to assign to this connector.
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Now create a Role and grant read access to this connector. Navigate to Role Settings from the profile menu in the upper-right of your window. Once the role is created and assigned to the connector, new users can be invited and assigned to the role(s) and connectors they should have access to.
To invite users, navigate to User Settings from the profile menu in the upper-right of your window. You can send email invitations to invite users, designating their user type as Admin or Member as described below:
- Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations. - Member: Can manage personal settings and view overview and recommendations.
Navigate back to the Connector page to check on the status of the connector. Once it successfully completes its first run, the Overview and Recommendations views will be populated.
Datalogz supports connecting to the Tableau Cloud or Server API by Service Account and Personal Access Token
To connect Tableau with Datalogz, you'll need to follow these steps:
Create a Tableau access token:
Log in to your Tableau Online account and click on your profile in the top right corner of the page.
Enter a name for your token and click the "Create new token" button. Save the token name and secret for the next step.
Retrieve your host name, site name, and API version:
You can find your API version by visiting the following page: [Tableau REST API Version Docs]
Use the Personal Access Token name and secret created in the previous step to login and create new Tableau connectors.
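Those values plug into the Tableau REST API sign-in endpoint. A sketch with placeholder host, site, and token values; the URL and JSON body shape follow the Tableau REST API's Personal Access Token sign-in:

```shell
# Build the Tableau REST sign-in URL from host and API version (placeholders).
HOST_NAME="your-tableau-host"
SITE_NAME="your-site"
API_VERSION="3.19"

signin_url() {
  echo "https://$1/api/$2/auth/signin"
}

signin_url "$HOST_NAME" "$API_VERSION"
# PAT sign-in request (token name/secret are placeholders):
#   curl -s -X POST "$(signin_url "$HOST_NAME" "$API_VERSION")" \
#     -H 'Content-Type: application/json' -H 'Accept: application/json' \
#     -d "{\"credentials\":{\"personalAccessTokenName\":\"<token-name>\",
#          \"personalAccessTokenSecret\":\"<token-secret>\",
#          \"site\":{\"contentUrl\":\"$SITE_NAME\"}}}"
```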
Login using Tableau and create a Tableau connector.
The following documentation describes the resources required to deploy Datalogz on Microsoft Azure.
(1) Azure Virtual Network
(1 or 2) Azure Virtual Machine w/ SSH Key Pair
(1) Azure Database for PostgreSQL Server
App DB
Warehouse DB
Airflow DB
(1) Azure Key Vault
(0 or 1) Azure Storage Account (ADLS Gen2 Data Lake)
For Snowflake customers:
(1) Snowflake Warehouse
Warehouse DB (instead of Warehouse DB in Postgres)
Datalogz supports connecting to the Tableau Cloud or Server API by Service Account and Personal Access Token
In Tableau, creating a personal access token (PAT) requires specific permissions. To create a PAT, you must have the "Manage All Site Settings" or "Manage Site Content" permission. By default, site administrators have both of these permissions, but they can also be assigned to other users by a site administrator.
To check your permissions, follow these steps:
Log in to your Tableau account.
Click on your profile picture in the top right corner and select "Account Settings."
In the left-hand menu, click on "Site Roles and Permissions."
Look for the "Site Role" section and check if you have either the "Site Administrator" or "Content Manager" role assigned to your account.
If you don't have either of these roles assigned, you won't be able to create a PAT. You can either ask a site administrator to assign the necessary permissions to your account or contact your Tableau administrator for assistance.
To connect Tableau with Datalogz using your PAT, you'll need to follow these steps:
Create a Tableau access token:
Log in to your Tableau Online account and click on your profile in the top right corner of the page.
Enter a name for your token and click the "Create new token" button. Save the token name and secret for the next step.
Retrieve your host name, site name, and API version:
Use the Personal Access Token name and secret created in the previous step to login and create new Tableau connectors.
This guide will walk you through setting up the Datalogz PowerBI Connector. Datalogz can connect to PowerBI via two methods: Service Principal or Microsoft Admin.
Service Principal - This method allows read-only access to PowerBI's administrative APIs for a single PowerBI tenant. It doesn't require permission from a Microsoft O365 Administrator.
Microsoft Admin - This method provides read-only access to administrative APIs for all PowerBI tenants registered in Azure Active Directory (AD). It requires the 'Tenant.Read.All' permission, which needs to be granted by a Microsoft O365 Administrator.
Follow these steps to set up the Datalogz PowerBI Connector using the Service Principal method:
Create an Azure AD App Registration
Keep a record of the following properties: Client ID, Tenant ID.
Create a new Secret and securely store the value (Client Secret) for later use.
Create an Azure Security Group
Add Azure AD App Registration to the Security Group
Include the Azure AD App Registration as a member of the Security Group.
Link the Security Group to PowerBI Tenant Admin Settings
Enable PowerBI REST API permissions for the Security Group
Navigate to Admin Portal in the settings (upper right corner).
Allow service principals to use PowerBI APIs.
Allow service principals to use read-only admin APIs.
Add the Security Group created above to the security groups list
Scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Add the Security Group created above to the security groups list, unless other users rely on this setting applying to the entire organization, in which case leave the setting as shown.
Login to Datalogz
Create a Datalogz account
The BI Admin creates the Datalogz account by signing in. This process requires approval of the following API permissions: email, openid, User.Read.
Create a new PowerBI Connector
After logging in, navigate to the Connectors tab and select 'Service Principal' to create a new PowerBI Connector.
Select the PowerBI Workspaces to be included in the connector.
Check the connector refresh status
After the connector setup is complete, monitor the connector refresh status from the Connectors page. Once the metadata refresh is done, the 'Overview' and 'Recommendations' tabs will populate.
Create a Role and assign access
Create a Role and give it read access to this connector.
New users can be invited and assigned to the connector once the role is created.
Invite Users
To invite users, go to 'User Settings' from the profile menu in the top-right corner. Users can be invited via email, and assigned as 'Admin' or 'Member'.
Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations.
Member: Can manage personal settings and view overview and recommendations.
Follow these steps to set up the Datalogz PowerBI Connector using the Microsoft Admin method:
Enable PowerBI REST API permissions
Log in as a Microsoft 365 Global Administrator
In 'Tenant settings', locate 'Admin API settings' and enable:
Allow API responses with detailed metadata.
Allow API responses with DAX and mashup expressions.
Create a Datalogz account
Create a new Multi-Tenant PowerBI Connector
After logging in, navigate to the Connectors tab and select 'Microsoft Admin' to create a new Multi-Tenant PowerBI Connector.
Select the PowerBI Workspaces to be included in the connector.
Check the connector refresh status
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Create a Role and assign access
Same as in the Service Principal method.
Invite Users
Same as in the Service Principal method.
Choose "My Account Settings" from the dropdown menu.
Scroll down to "Personal Access Tokens".
Your host and site names can be found in the <host_name> and <site_name> parts of the URL as shown below: https://<host_name>/#/site/<site_name>/
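For reference, the two values can be pulled out of such a URL mechanically. These sed helpers are illustrative assumptions, not part of the product; the example URL is a placeholder.

```shell
# Extract <host_name> and <site_name> from a URL of the form
# https://<host_name>/#/site/<site_name>/
url="https://example.online.tableau.com/#/site/mysite/"

host_from_url() { echo "$1" | sed -E 's#https?://([^/]+)/.*#\1#'; }
site_from_url() { echo "$1" | sed -E 's#.*/site/([^/]+)/?.*#\1#'; }

host_from_url "$url"   # the <host_name> part
site_from_url "$url"   # the <site_name> part
```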
Create Tableau Connector
Choose "My Account Settings" from the dropdown menu.
Scroll down to "Personal Access Tokens".
Your host and site names can be found in the <host_name> and <site_name> parts of the URL as shown below: https://<host_name>/#/site/<site_name>/
You can find your API version on the Tableau REST API version page. The latest version is 3.19 (2023.1) and works for both Tableau Cloud and Server.
Login using Tableau and create a Tableau connector.
Log in to the app as a PowerBI Service Administrator.
In 'Tenant settings', locate 'Admin API settings' and enable:
Login to https://app.datalogz.io using Microsoft Azure Active Directory
Once the Service Principal has been provided access to the read-only Admin APIs by following the steps above, the app is able to use the endpoints documented here.
Enable PowerBI REST API permissions for this security group in the tenant admin settings.
Navigate to Admin Portal in the settings (upper right corner).
Login to the PowerBI app portal as a Microsoft 365 Global Administrator