Datalogz supports connecting to the Tableau Cloud or Server API by Service Account or Personal Access Token.
In Tableau, creating a personal access token (PAT) requires specific permissions. To create a PAT, you must have the "Manage All Site Settings" or "Manage Site Content" permission. By default, site administrators have both of these permissions, but they can also be assigned to other users by a site administrator.
To check your permissions, follow these steps:
Log in to your Tableau account.
Click on your profile picture in the top right corner and select "Account Settings."
In the left-hand menu, click on "Site Roles and Permissions."
Look for the "Site Role" section and check if you have either the "Site Administrator" or "Content Manager" role assigned to your account.
If you don't have either of these roles assigned, you won't be able to create a PAT. You can either ask a site administrator to assign the necessary permissions to your account or contact your Tableau administrator for assistance.
To connect Tableau with Datalogz using your PAT, you'll need to follow these steps:
Create a Tableau access token:
Log in to your Tableau Online account and click on your profile in the top right corner of the page.
Choose "My Account Settings" from the dropdown menu.
Scroll down to "Personal Access Tokens".
Enter a name for your token and click the "Create new token" button. Save the token name and secret for the next step.
Retrieve your host name, site name, and API version:
You can find your API version by visiting the following page: [Tableau REST API Version Docs]. The latest version is 3.19 (2023.1) and works for both Tableau Cloud and Server.
Your host and site names can be found in the <host_name> and <site_name> parts of the URL as shown below: https://<host_name>/#/site/<site_name>/
Login using Tableau and create a Tableau connector:
Login to https://app.datalogz.io using Tableau.
Use the Personal Access Token name and secret created in the previous step to login and create new Tableau connectors.
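As a quick connectivity check, the PAT can be exercised directly against the Tableau REST API sign-in endpoint before creating the connector. This is a sketch: <host_name>, <site_name>, and the token name/secret are placeholders for your own values.

```shell
# Sign in to the Tableau REST API (v3.19) with a Personal Access Token.
# All angle-bracketed values are placeholders.
curl -s -X POST "https://<host_name>/api/3.19/auth/signin" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{
        "credentials": {
          "personalAccessTokenName": "<token_name>",
          "personalAccessTokenSecret": "<token_secret>",
          "site": { "contentUrl": "<site_name>" }
        }
      }'
# A successful response contains a credentials token for subsequent API calls;
# an error here usually means the PAT or site contentUrl is wrong.
```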
These deployment guides help teams to deploy the Datalogz BI Ops product.
Proceed to view a high-level overview of the Datalogz system design.
This guide will walk you through setting up the Datalogz PowerBI Connector. Datalogz can connect to PowerBI via two methods: Service Principal or Microsoft Admin.
Service Principal - This method allows read-only access to PowerBI's administrative APIs for a single PowerBI tenant. It doesn't require permission from a Microsoft O365 Administrator.
Microsoft Admin - This method provides read-only access to administrative APIs for all PowerBI tenants registered in Azure Active Directory (AD). It requires the 'Tenant.Read.All' permission, which needs to be granted by a Microsoft O365 Administrator.
Follow these steps to set up the Datalogz PowerBI Connector using the Service Principal method:
Create an Azure AD App Registration
Keep a record of the following properties: Client ID, Tenant ID.
Create a new Secret and securely store the value (Client Secret) for later use.
Create an Azure Security Group
Add Azure AD App Registration to the Security Group
Include the Azure AD App Registration as a member of the Security Group.
Link the Security Group to PowerBI Tenant Admin Settings
Enable PowerBI REST API permissions for the Security Group
Log in to the app portal as a PowerBI Service Administrator.
Navigate to Admin Portal in the settings (upper right corner).
In Tenant settings, locate 'Admin API settings' and enable:
Allow service principals to use PowerBI APIs.
Allow service principals to use read-only admin APIs.
Add the Security Group created above to the security groups list
Scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Add the Security Group created above to the security groups list. If other users rely on this setting applying to the entire organization, leave the setting unchanged instead.
Login to Datalogz
Login to https://app.datalogz.io using Microsoft Azure Active Directory
Create a Datalogz account
The BI Admin creates the Datalogz account by signing in. This process requires approval of the following API permissions: email, openid, User.Read.
Create a new PowerBI Connector
After logging in, navigate to the Connectors tab and select 'Service Principal' to create a new PowerBI Connector.
Select the PowerBI Workspaces to be included in the connector.
Once the Service Principal has been provided access to the read-only Admin APIs by following the steps above, the app is able to use the following endpoints documented here.
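To verify the Service Principal's access independently of Datalogz, you can acquire an app-only token and call a read-only admin endpoint yourself. This is a sketch assuming the `jq` tool is available for JSON parsing; <tenant_id>, <client_id>, and <client_secret> come from the App Registration created earlier.

```shell
# Acquire an app-only token for the Power BI API via the client credentials flow.
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/<tenant_id>/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client_id>" \
  -d "client_secret=<client_secret>" \
  -d "scope=https://analysis.windows.net/powerbi/api/.default" | jq -r .access_token)

# List workspaces through the read-only admin API (the $top parameter is required).
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://api.powerbi.com/v1.0/myorg/admin/groups?\$top=100"
```

A 401/403 response here indicates the tenant admin settings or security group membership above have not taken effect yet (propagation can take several minutes).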
Check the connector refresh status
After the connector setup is complete, monitor the connector refresh status from the Connectors page. Once the metadata refresh is done, the 'Overview' and 'Recommendations' tabs will populate.
Create a Role and assign access
Create a Role and give it read access to this connector.
New users can be invited and assigned to the connector once the role is created.
Invite Users
To invite users, go to 'User Settings' from the profile menu in the top-right corner. Users can be invited via email, and assigned as 'Admin' or 'Member'.
Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations.
Member: Can manage personal settings and view overview and recommendations.
Follow these steps to set up the Datalogz PowerBI Connector using the Microsoft Admin method:
Enable PowerBI REST API permissions
Enable PowerBI REST API permissions for this security group in the tenant admin settings.
Log in as a Microsoft 365 Global Administrator
Navigate to PowerBI Admin Portal in the settings (upper right corner).
In 'Tenant settings', locate 'Admin API settings' and enable:
Allow API responses with detailed metadata.
Allow API responses with DAX and mashup expressions.
Create a Datalogz account
Login to https://app.datalogz.io as a Microsoft 365 Global Administrator
Create a new Multi-Tenant PowerBI Connector
After logging in, navigate to the Connectors tab and select 'Microsoft Admin' to create a new Multi-Tenant PowerBI Connector.
Select the PowerBI Workspaces to be included in the connector.
Check the connector refresh status
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Create a Role and assign access
Same as in the Service Principal method.
Invite Users
Same as in the Service Principal method.
The following documentation describes the resources required to deploy Datalogz on Microsoft Azure.
(1) Azure Virtual Network
(1 or 2) Azure Virtual Machine w/ SSH Key Pair
(1) Azure Database for PostgreSQL Server
App DB
Warehouse DB
Airflow DB
(1) Azure Key Vault
(0 or 1) Azure Storage Account (ADLS Gen2 Data Lake)
For Snowflake customers:
(1) Snowflake Warehouse
Warehouse DB (replaces the Warehouse DB in Postgres)
Datalogz runs on Virtual Machines inside Docker Containers for a simple, cost-effective deployment that can be scaled vertically as demand increases. You may deploy using either Windows or Linux.
Provisioning
Create new virtual machine(s) to host your Datalogz frontend application and backend API in the region of your choice. You may choose either a Windows 10/11 distribution or a Linux distribution.
The security group inbound rules on this machine should allow HTTP/HTTPS traffic from your private network IP – so your users can access the site.
The security group outbound rules on this machine should allow HTTPS traffic from your network IP – so the Gateway API Service can make HTTPS connections to 3rd party services such as Microsoft for running OAuth 2.0 flows.
The security group inbound rules on this machine should allow SSH or RDP traffic from your private team’s IP – so your team can remotely login and deploy the builds.
Ensure you have enabled Managed Identity access on the Virtual Machine in the “Identity” panel on the Virtual Machine page in the Azure portal. This ensures the VM's identity can be used to retrieve keys from the Key Vault, so no sensitive credentials need to be located in an .env file on the VM itself.
Grant access to the key vault to this VM identity following instructions here.
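Once access is granted, you can confirm from inside the VM that the managed identity works. This is a sketch assuming the `jq` tool is available; <vault-name> and <secret-name> are placeholders.

```shell
# Request a Key Vault access token from the Azure Instance Metadata Service.
# This only works from inside the VM and requires no credentials on disk.
TOKEN=$(curl -s -H "Metadata: true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https%3A%2F%2Fvault.azure.net" \
  | jq -r .access_token)

# Read a secret from the Key Vault using the token.
curl -s -H "Authorization: Bearer $TOKEN" \
  "https://<vault-name>.vault.azure.net/secrets/<secret-name>?api-version=7.4"
```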
SSH or RDP into the VM to install Docker and clone the repositories.
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $160 / mo.
Total Estimated Cost: $80 - $160 / mo.
Backend VM
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: No changes required
Estimated Cost as of 2/1/2023: $160 / mo.
Frontend VM
Minimum / Recommended
CPU: 2 vCPU
Memory: 4 GB RAM
OS Disk: No change
Estimated Cost as of 2/1/2023: $40 / mo.
Total Estimated Cost: $120 - $200 / mo.
Datalogz is a portable application that can be deployed to either Windows or Linux machines using Docker to virtualize resources.
Only certain Windows 10/11 VM Images on Azure support nested virtualization. Please choose a VM size from this list marked with three asterisks (***), denoting hyper-threaded sizes capable of running nested virtualization.
Datalogz recommends using the following sizes based on whether you choose Option 1 or 2 above:
Size for Single (Monolith) VM: D4d_v4
Sizes for Split VM:
Backend: D2d_v4
Frontend: D2d_v4
SSH
Set the correct permissions on your SSH key pair before connecting to the virtual machine, updating the following variables with your key name, user name, and VM IP address.
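For example (the key file name, user name, and IP address below are placeholders for your own values):

```shell
# Restrict the private key so the SSH client will accept it, then connect.
chmod 400 my-datalogz-key.pem
ssh -i my-datalogz-key.pem azureuser@x.x.x.x
```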
Download the Remote Desktop Protocol (RDP) connection file for the Windows 10/11 VM from the Azure portal and ensure the machine connecting to the remote resources is allowed on port 3389 (RDP).
Both the Datalogz frontend and backend applications are deployed using Docker.
Installing Docker on Windows 10/11 performs best when using Windows Subsystem for Linux (WSL2). Here are the steps required to install this pre-requisite.
Use Remote Desktop Protocol to connect to the VM
Open a Powershell Terminal as Administrator and run the following command to setup Windows Subsystem for Linux:
wsl --install
Restart the VM. When the VM restarts, Windows Subsystem for Linux may start automatically. You can create a new user named dl_windows_linux_user with your own password to access WSL2 directly, but it's unlikely you will need to.
Now you can proceed to install Docker Desktop by following the official docs in the next section.
Please continue with the Docker Desktop installation referencing the official docs.
After installation has completed, open Docker Desktop and accept the terms and conditions so the Docker Engine can start. If the Docker Engine does not start, you may need to disconnect and reconnect to the VM.
Download and install Git (Link)
Set up credential store by running the command in a Command Prompt:
git config --global credential.helper 'store'
The next time you run git pull on a remote origin and sign in, your credentials will be cached for future reuse.
Deploying a Datalogz Proof-of-Concept (POC) will use self-signed keys generated during the build process to enable encrypted communications over HTTPS, and you will access your VM either using the Public IP Address of the VM or an Azure-provided DNS ending in *.cloudapp.azure.com. For example:
https://x.x.x.x OR https://mono-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.x OR https://app-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.y OR https://api-mycomanywin11.eastus2.cloudapp.azure.com
Deploying Datalogz into Production (PROD) enables you to provide your own Certificates for deploying Datalogz to new subdomains on an existing domain. For example:
https://app.datalogz.mycompany.com
https://api.datalogz.mycompany.com
Datalogz allows Azure customers to easily integrate email notifications into Datalogz using the Microsoft Graph REST API and an Azure App Registration.
Documentation: Setting Up Email Notification with Datalogz Using Microsoft Graph REST API
Datalogz now allows users to set up email notifications for their team environments via the Microsoft Graph REST API (specifically, the 'user: sendMail' method). This feature enables any user in an organization to connect their Microsoft account through OAuth. Their email will then be utilized to send out email notifications to team members.
Please note that this feature is available only in private deployments.
Follow these steps to connect your Microsoft account:
Navigate to the 'Settings' page and select 'Email Settings'.
Click the 'Connect' button and execute the OAuth flow.
You will be redirected to the Microsoft consent screen. At this point, select a Microsoft account that has access to a mailbox (in other words, your Microsoft account should have Outlook access).
After granting consent, you will be redirected back to the Datalogz app. If the process is successful, a success message will appear in the popup.
If the process fails, the most common issue is that the selected Microsoft account does not have access to a mailbox. To resolve this, you can either ask your Microsoft account admin to provide a license to give your user access to a mailbox or you can choose a different account with mailbox access.
Furthermore, the exact reason for process failure will be detailed in the popup, as shown in the screenshot. This information should help you understand and address the issue.
Before initiating the connection process, please ensure that the redirect URL https://<host>/api/v0/oauth/azure_mail/redirect has been properly added in the Microsoft App. Failure to do so may result in unsuccessful integration between the Datalogz app and your Microsoft account. Remember to replace <host> with your specific hostname.
This integration will provide you with a seamless communication experience, ensuring that all team members stay informed and aligned. If you encounter any issues during setup, please refer to our troubleshooting section or contact our support team for further assistance.
Datalogz uses Azure Storage Account to stage external files for ingestion into the target warehouse. This is default for Snowflake warehouses and optional for Postgres warehouses.
Login to Azure Portal and create a new Gen2 Storage Account (hierarchical namespace).
Add the storage account to the appropriate virtual network from Storage Account > Networking panel.
Create a new container named datalogzbidiagnostics
Set the access level to Private
Ensure the following environment variables have been added to your Azure Key Vault. These are required for your VM or environment credentials to be authenticated to read/write to the storage container:
Navigate to Key Vault > Secrets and click Generate/Import
Create a new secret key and value for each of the following:
AZURE-BLOB-CONTAINER-NAME
AZURE-BLOB-STORAGE-ACCOUNT-NAME
AZURE-BLOB-CONNECTION-STRING
The values for these are available in your Storage Account:
The value for AZURE-BLOB-CONTAINER-NAME should be set to datalogzbidiagnostics.
The value for AZURE-BLOB-STORAGE-ACCOUNT-NAME should be set to the name of your storage account.
The value for AZURE-BLOB-CONNECTION-STRING can be found in Storage Account > Access Keys > Connection String.
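The same secrets can be added from the Azure CLI instead of the portal. This is a sketch; <vault-name>, <storage-account-name>, and the connection string are placeholders for your own values.

```shell
# Add the three blob storage secrets to the Key Vault.
az keyvault secret set --vault-name <vault-name> \
  --name AZURE-BLOB-CONTAINER-NAME --value datalogzbidiagnostics
az keyvault secret set --vault-name <vault-name> \
  --name AZURE-BLOB-STORAGE-ACCOUNT-NAME --value <storage-account-name>
az keyvault secret set --vault-name <vault-name> \
  --name AZURE-BLOB-CONNECTION-STRING --value "<connection-string>"
```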
Follow these steps to build the Datalogz services on your virtual machine.
To build the Datalogz application, follow these steps:
ssh into a Linux virtual machine to build the following repositories.
datalogz-bi-diagnostics (ELT)
datalogz-bi-gateway (API)
datalogz-bi-frontend (APP)
git pull the main/master branches of each repository
Read the README.md for each repository
Build the services:
Build datalogz-bi-diagnostics
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to the Key Vault.
Add the following environment variables to .prod.env
ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=[SNOWFLAKE | POSTGRES]
If you are running Azure Managed Identity VMs, add the following:
AZURE_KEY_VAULT_URL=
AZURE_KEY_VAULT_NAME=
AZURE_VM_NAME=
AZURE_RESOURCE_GROUP_NAME=
From the project directory, run source ./init_env.sh to run through the interactive build script:
Choose env: prod
Choose warehouse: postgres, snowflake
Choose cloud: azure, aws
Choose IAM method for VM: env, identity
Choose vm setup: mono or split
The ./init_env.sh script will build the correct docker compose file based on the options that are chosen.
Build datalogz-bi-gateway
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to the Key Vault.
Add the following environment variables to .prod.env
ENV=PROD
PYTHONDONTWRITEBYTECODE=1
CRON_SERVICE_URL=https://airflow_webserver:8080
HTTP_SCHEME=https
HOST_NAME=localhost
Change localhost to your host DNS or the private IP of the VM
If you are running Azure Managed Identity VMs, add the following:
AZURE_RESOURCE_GROUP_NAME=
AZURE_VM_NAME=
AZURE_KEY_VAULT_NAME=
AZURE_KEY_VAULT_URL=
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Build datalogz-bi-frontend
Add the following environment variables to .env
REACT_APP_IDENTITY_PROVIDER = 'MICROSOFT'
VM_MANAGED_IDENTITY = 'TRUE'
Values for REACT_APP_IDENTITY_PROVIDER can be:
MICROSOFT
TABLEAU_SSO_FOR_CLOUD
TABLEAU_SA
WORKOS
If REACT_APP_IDENTITY_PROVIDER is set, VM_MANAGED_IDENTITY can also be set to TRUE to embed Service Principal credentials into the API build for new Connectors to use by default.
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
OR
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Create a Security Group in AWS with appropriate inbound and outbound rules to allow specific types of traffic between the VPC and the internet, such as SSH, PostgreSQL, and HTTPS.
Ensure that access is restricted to specific IP addresses or network ranges, and implement other security measures to protect sensitive data and resources.
Go to the VPC dashboard and select the Security Group option.
Click on the Create Security group button.
Provide a name and description for the security group. For example, name it "datalogz-security-group" and provide a description such as "Allow SSH, PostgreSQL, and HTTPS for Datalogz".
Select the VPC that was created previously in the VPC Deployment section.
In the Inbound Rules section:
Add a rule to allow incoming traffic on port 5432 for PostgreSQL, with the source set to 10.0.0.0/24 (the IP address range of the VPC).
Add a rule to allow incoming traffic on port 22 for SSH, with the source set to the IP address or IP address range of the developers who will be accessing the EC2 instance.
Add a rule to allow incoming traffic on port 443 for HTTPS. Set the source to the IP address(es) of your VPN for a private deployment.
In the Outbound Rules section:
Add a rule to allow outgoing traffic on port 443 for HTTPS, with the destination set to anywhere.
Add a rule to allow outgoing traffic on port 80 for HTTP, with the destination set to anywhere.
Click on the Create Security group button to create the security group.
These steps will create a security group that allows incoming traffic on ports 22, 5432, and 443, and outgoing traffic on ports 80 and 443. The security group will also restrict incoming traffic to specific sources, such as the IP address range of the VPC and the IP address range of the developers who will be accessing the EC2 instance via SSH.
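The same security group can be created with the AWS CLI. This is a sketch: the VPC ID and the developer/VPN CIDR ranges are placeholders for your own values.

```shell
# Create the security group in the Datalogz VPC.
SG_ID=$(aws ec2 create-security-group \
  --group-name datalogz-security-group \
  --description "Allow SSH, PostgreSQL, and HTTPS for Datalogz" \
  --vpc-id vpc-xxxxxxxx --query GroupId --output text)

# Inbound rules.
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 5432 --cidr 10.0.0.0/24      # PostgreSQL from the VPC
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 22 --cidr <developer-cidr>   # SSH from your team
aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
  --protocol tcp --port 443 --cidr <vpn-cidr>        # HTTPS from your VPN

# Outbound rules.
aws ec2 authorize-security-group-egress --group-id "$SG_ID" \
  --protocol tcp --port 443 --cidr 0.0.0.0/0         # HTTPS to anywhere
aws ec2 authorize-security-group-egress --group-id "$SG_ID" \
  --protocol tcp --port 80 --cidr 0.0.0.0/0          # HTTP to anywhere
```

Note that new security groups include a default allow-all egress rule; revoke it with aws ec2 revoke-security-group-egress if you want only the two outbound rules above to apply.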
The following documentation describes the resources required to deploy Datalogz on AWS.
(1) AWS Virtual Private Cloud
(1) AWS Internet Gateway
(2) Subnets
(1 or 2) AWS EC2 Instance w/ SSH Key Pair
(1) AWS RDS for PostgreSQL Server
App DB
Warehouse DB
Airflow DB
(1) AWS Secrets Manager
(0 or 1) AWS S3 Bucket w/ IAM Role
For Snowflake customers:
(1) Snowflake Warehouse
Warehouse DB (replaces the Warehouse DB in Postgres)
Create a new Virtual Private Cloud (VPC) to provide an isolated network environment for your Datalogz deployment in the cloud.
The VPC should be created in the region of your choice and should be configured with an IPv4 CIDR block that can support the expected number of subnets and IP addresses for your deployment.
Log in to the cloud provider's console and navigate to the VPC management section.
Click "Create VPC."
Name your VPC "Datalogz VPC" or choose another appropriate name.
Enter the IPv4 CIDR block, such as 10.0.0.0/16, that provides enough IP addresses for your planned deployment.
Leave all the other options as default and click "Create VPC."
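Equivalently, the VPC can be created with the AWS CLI (the Name tag below follows the example above):

```shell
# Create the VPC with a /16 CIDR block and a Name tag.
aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=Datalogz-VPC}]'
```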
Next, you will need to create subnets in this VPC to enable different tiers of your application to communicate with each other securely.
Datalogz uses AWS S3 to stage external files for ingestion into the target warehouse. This is default for Snowflake warehouses and optional for Postgres warehouses.
Sign in to your AWS console
Navigate to the S3 service and select "Create Bucket"
Name your bucket "datalogzbidiagnostics"
Disable ACLs for object ownership configuration
Choose "Block all public access" for public access configuration
Disable versioning for the bucket
For the default encryption configuration, choose Amazon S3-managed keys and select the "Enabled" option for the bucket key
Click on "Create Bucket" to create your S3 bucket.
Create an Internet Gateway and Route Table in AWS with appropriate routing rules to allow specific types of traffic between the VPC and the internet, such as HTTP, HTTPS, and DNS.
Ensure that access is restricted to specific IP addresses or network ranges and that appropriate security measures are implemented to protect the VPC and its resources.
Go to the VPC dashboard and select the Internet Gateway option.
Click on the Create Internet Gateway button.
Provide a name for the Internet Gateway, such as datalogz-internet-gateway.
Click on the Create Internet Gateway button to create the gateway.
Note down the ID of the Internet Gateway created, as it will be needed later when adding routes to the route table.
Go to the Route Table option and click on the Create Route Table button.
Provide a name for the Route Table, such as datalogz-route-table.
Select the VPC that was created earlier in the VPC Deployment section.
Click on the Create Route Table button to create the route table.
Go to the Subnet Association section of the route table and click on the Edit Subnet Association button.
Select the two subnets that were created earlier in the Subnet Deployment section.
Click on the Save association button to save the subnet associations.
Finally, go to the Routes section of the route table and click on the Edit routes button.
Click on the Add routes button, and enter the destination as 0.0.0.0/0 and the target as the ID of the Internet Gateway that was created earlier.
Click on the Save changes button to save the new route.
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads.
Minimum
Memory Optimized
Compute: 2 vCores, 16 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $500 / mo.
Recommended
Memory Optimized
Compute: 4 vCores, 32 GiB Memory, 6400 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $850 / mo.
Total Estimated Cost: $500 - $850 / mo.
Minimum/Recommended
Memory Optimized
Compute: 2 vCores, 8 GiB Memory, 3200 IOPS
Storage: 512 GB
Estimated Cost as of 2/1/2023: $200 / mo.
Go to RDS and click Create Database
Choose Standard Create, Postgres, and choose your desired size
Settings:
Name your database instance
Provide the credentials to your database. Store these credentials somewhere secure. We will come to these credentials later in the Secrets Manager section
Instance configuration:
Leave all the settings as default
Storage:
Set Allocated Storage to 50 GiB
Connectivity:
Select the VPC and the subnets that we created in the VPC deployment section
Select the security group that we created in the Security Group section
Leave the rest as default
Leave all the other sections as default
Click Create database
Postgres option:
App DB
Name: datalogz_bi
Warehouse DB
Name: datalogz_wh
Snowflake option:
App DB
Name: datalogz_bi
BI Warehouse DB (See Snowflake Option Section)
Name: datalogz_wh
When you create an RDS Postgres instance and add it to a VPC, the database requires that the VPC contains subnets in at least 2 different availability zones.
Go to subnets and create a new subnet.
Enter the VPC ID of the Datalogz VPC created in the VPC Deployment section.
Enter 10.0.0.0/25 as the IPv4 CIDR.
Choose an Availability Zone for the subnet.
Click Create Subnet.
Create at least two subnets in different Availability Zones, as the RDS Postgres database requires. Repeat the above steps to create another subnet with the following changes:
In step 3, enter the IPv4 CIDR as 10.0.0.128/25.
In step 4, choose a different Availability Zone than the one selected for the first subnet.
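The two subnets can also be created with the AWS CLI. This is a sketch: the VPC ID and the Availability Zone names are placeholders for your own region.

```shell
# Create two subnets in different Availability Zones for RDS.
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
  --cidr-block 10.0.0.0/25 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-xxxxxxxx \
  --cidr-block 10.0.0.128/25 --availability-zone us-east-1b
```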
Datalogz uses AWS Secrets Manager to store sensitive secrets required to run the application.
Access the Secrets Manager service and select "Store a new secret".
Choose "Other Type of secret" as the secret type.
Input your secret's key and value. Use "Add Row" to add multiple secrets from the following list.
For Encryption key, select "aws/secretsmanager".
Select "Next".
Name your secret "datalogz_secrets". Add a description such as "secret values used by Datalogz to securely access credentials".
Select "Next".
This step is optional. Datalogz recommends creating a Lambda function to enable secrets rotation. Once configured, select "Next".
Review your secrets. If everything looks good, select "Next". Otherwise, select "Previous" to make updates where necessary.
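The same secret can be stored from the AWS CLI as a single JSON document. This is a sketch; the key/value pairs shown are placeholders for the actual secrets from the list above.

```shell
# Store the Datalogz secrets as one JSON secret in Secrets Manager.
aws secretsmanager create-secret --name datalogz_secrets \
  --description "secret values used by Datalogz to securely access credentials" \
  --secret-string '{"<SECRET_KEY_1>":"<value1>","<SECRET_KEY_2>":"<value2>"}'
```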
To enable access to the S3 bucket and the Secrets Manager that we created, we will create a custom policy and a role.
Create Policy
Log in to the AWS IAM Management Console.
Click on "Policies".
Create a policy that specifically allows read and write permissions only on the S3 bucket that we created in the S3 deployment guide, and only allows read access to the secrets we created in the Secrets Manager deployment guide.
Click on the JSON tab and paste the following JSON string, replacing "datalogz-s3" with the name of the S3 bucket you created in the S3 deployment guide.
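The exact JSON string is not reproduced in this guide; the following is a sketch of a least-privilege policy matching the description. The bucket name and the secret ARN pattern are placeholders to replace with your own values.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DatalogzS3ReadWrite",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::datalogz-s3", "arn:aws:s3:::datalogz-s3/*"]
    },
    {
      "Sid": "DatalogzSecretsRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue", "secretsmanager:DescribeSecret"],
      "Resource": "arn:aws:secretsmanager:<region>:<account-id>:secret:datalogz_secrets-*"
    }
  ]
}
```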
Click "Next: Tags" and optionally add tags for the policy.
Click "Next: Review".
Enter a name for the policy, such as "datalogz-policy-to-access-s3-and-secrets-from-ec2".
Click "Create Policy".
Create Role
Go to "Roles" and click "Create Role".
Choose "AWS service" as the trusted entity type.
Choose "EC2" under "Common use cases".
Click "Next".
In the "Permissions" section, search for the policy we just created and select it.
Click "Next".
Enter a name for the role, such as "datalogz-role-access-resources-from-ec2".
Click "Create Role".
Follow these steps to build the Datalogz services on your virtual machine.
To build the Datalogz application, follow these steps:
ssh into a Linux virtual machine to build the following repositories.
datalogz-bi-diagnostics (ELT)
datalogz-bi-gateway (API)
datalogz-bi-frontend (APP)
git pull the main/master branches of each repository
Read the README.md for each repository
Build the services:
Build datalogz-bi-diagnostics
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to Secrets Manager.
Add the following environment variables to .prod.env
ENV=PROD
DBT_ENV=prod
WAREHOUSE_TYPE=POSTGRES
From the project directory, run source ./init_env.sh to run through the interactive build script:
Choose env: prod
Choose warehouse: postgres
Choose cloud: azure, aws
Choose IAM method for VM: env, id
Choose vm setup: mono or split
Choose deployment method: private
The ./init_env.sh script will build the correct docker compose file based on the options that are chosen.
Build datalogz-bi-gateway
Confirm non-sensitive environment variables listed here have been added to .prod.env and that sensitive environment variables have been added to Secrets Manager.
Add the following environment variables to .prod.env
ENV=PROD
PYTHONDONTWRITEBYTECODE=1
CRON_SERVICE_URL=https://airflow_webserver:8080
HTTP_SCHEME=https
HOST_NAME=localhost
Change localhost to your host DNS or the private IP of the VM
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Build datalogz-bi-frontend
Add the following environment variables to .env
REACT_APP_IDENTITY_PROVIDER = 'TABLEAU_SA'
VM_MANAGED_IDENTITY = 'TRUE'
Values for REACT_APP_IDENTITY_PROVIDER can be:
MICROSOFT
TABLEAU_SSO_FOR_CLOUD
TABLEAU_SA
WORKOS
If REACT_APP_IDENTITY_PROVIDER is set, VM_MANAGED_IDENTITY can also be set to TRUE to embed Service Principal credentials into the API build for new Connectors to use by default.
Build the services from the project directory:
Monolith VM:
Run docker compose -f docker-compose.mono.prod.yml up --build -d
OR
Split VM:
Run docker compose -f docker-compose.prod.yml up --build -d
Datalogz runs on EC2 Instances inside Docker Containers for a simple, cost-effective deployment that can be scaled vertically as demand increases.
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $160 / mo.
Total Estimated Cost: $80 - $160 / mo.
Backend VM
Minimum
CPU: 2 vCPU
Memory: 8 GB RAM
OS Disk: 50 GB
Estimated Cost as of 2/1/2023: $80 / mo.
Recommended
CPU: 4 vCPU
Memory: 16 GB RAM
OS Disk: No changes required
Estimated Cost as of 2/1/2023: $160 / mo.
Frontend VM
Minimum / Recommended
CPU: 2 vCPU
Memory: 4 GB RAM
OS Disk: No change
Estimated Cost as of 2/1/2023: $40 / mo.
Total Estimated Cost: $120 - $200 / mo.
Go to the EC2 section
Click on Launch Instance
Name your EC2
Select Ubuntu as Amazon Machine Image
Select instance type t2.xlarge
Generate a key pair for SSH into the EC2
Expand Networking settings
Select the VPC that we created in the VPC deployment Guide
Select any of the 2 subnets that we created in the Subnet Deployment Guide
Click Auto Assign public IP and click Enable
Under the Firewall (security groups), select "Select Existing Security group"
From the drop-down, select the security group that we created in the Security Group section
Next, configure the storage to at least 50 GiB
Click Advanced details
Select the IAM role that we created in the IAM Roles section
Leave the rest as default
Click Launch Instance
The security group inbound rules on this machine should allow HTTP/HTTPS traffic from your private network IP so your users can access the site. The security group outbound rules on this machine should allow HTTPS traffic from your network IP so the Gateway API Service can make HTTPS connections to 3rd party services, such as Microsoft, for running OAuth 2.0 flows.
The security group inbound rules on this machine should allow SSH traffic from your private team's IP so your team can remotely log in and deploy the builds.
Set the correct permissions on your SSH key pair before connecting to the virtual machine, updating the following variables with your key name, user name, and VM IP address.
SSH into the VM to install Docker and clone the repositories.
Example:
ssh -i "ec2_bi-monolith_testing_us-east-1_001.pem" ubuntu@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com
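Putting the two steps above together, a minimal sketch follows; the touch is only to make the snippet self-contained — in practice the file is the .pem you downloaded when generating the EC2 key pair.

```shell
# Example key file name from the ssh command above (a placeholder).
KEY_FILE="ec2_bi-monolith_testing_us-east-1_001.pem"
touch "$KEY_FILE"   # stand-in so the snippet runs; normally the downloaded .pem

# SSH refuses private keys with open permissions; restrict to owner read-only.
chmod 400 "$KEY_FILE"
stat -c '%a %n' "$KEY_FILE"

# Then connect (commented out here; requires the live VM address):
# ssh -i "$KEY_FILE" ubuntu@ec2-xx-xxx-xxx-xx.compute-1.amazonaws.com
```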
Run an update
Ubuntu: sudo apt update
Both the Datalogz frontend and backend applications are deployed using Docker.
Please continue with the Docker Engine installation, referencing the official docs.
Ubuntu: https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository
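As an alternative to the repository setup in the official docs linked above, Docker publishes a convenience install script; a hedged sketch follows (review the script before piping it to sh — the repository method remains the recommended path for production).

```shell
# Docker's convenience install script for Linux.
DOCKER_INSTALL_URL="https://get.docker.com"
echo "Install script: $DOCKER_INSTALL_URL"

# On the VM (requires network access and sudo):
# curl -fsSL "$DOCKER_INSTALL_URL" -o get-docker.sh
# sudo sh get-docker.sh
# sudo usermod -aG docker "$USER"   # run docker without sudo (re-login required)
```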
Download and install Git (Link)
Ubuntu: sudo apt-get install git
Set up credential store by running the command in a Command Prompt:
git config --global credential.helper 'store'
The next time you run git pull on a remote origin and sign in, your credentials will be cached for future reuse.
Deploying a Datalogz Proof-of-Concept (POC) will use self-signed keys generated during the build process to enable encrypted communications over HTTPS, and you will access your VM either using the Public IP Address of the VM or an Azure-provided DNS ending in *.cloudapp.azure.com. For example:
https://x.x.x.x OR https://mono-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.x OR https://app-mycomanywin11.eastus2.cloudapp.azure.com
https://x.x.x.y OR https://api-mycomanywin11.eastus2.cloudapp.azure.com
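For reference, a self-signed certificate of the kind generated during the POC build can be produced with openssl; the CN below is a placeholder hostname, not a value from your deployment.

```shell
# Generate a self-signed certificate and key, non-interactively.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout selfsigned.key -out selfsigned.crt \
  -subj "/CN=mono-mycomanywin11.eastus2.cloudapp.azure.com"

# Inspect the generated certificate's subject.
openssl x509 -in selfsigned.crt -noout -subject
```

Because the certificate is self-signed, browsers will show a trust warning when accessing the POC by IP or the *.cloudapp.azure.com DNS name; that warning goes away in PROD when you supply your own certificates.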
Deploying Datalogz into Production (PROD) enables you to provide your own Certificates for deploying Datalogz to new subdomains on an existing domain. For example:
https://app.datalogz.mycompany.com
https://api.datalogz.mycompany.com
The following documentation describes the settings and permissions required to set up the Datalogz PowerBI Connector.
Datalogz supports connecting to the PowerBI API by Service Principal or Microsoft Admin.
Option 1: Service Principal
The Service Principal is restricted to read-only access to Admin API endpoints for a single PowerBI tenant, and does not require admin consent to be granted by a Microsoft O365 Administrator.
Option 2: Microsoft Admin
The Microsoft Admin is also restricted to read-only access to Admin API endpoints, but for all PowerBI tenants registered in Azure Active Directory (AD), and requires the Tenant.Read.All permission, for which admin consent must be granted by a Microsoft O365 Administrator.
Option 1: Service Principal
Create an Azure AD App Registration
Note down the following properties which will be used later:
Client ID
Tenant ID
Create a new Secret and store this in a secure location to be used later:
Client Secret (value)
Create an Azure Security Group
Add the Azure AD App Registration as a Member to the Security Group
Add the Security Group to the PowerBI Tenant Admin Settings
Navigate to the settings in the upper right, and click Admin Portal.
Under Tenant settings, scroll down to Admin API settings and enable the following permissions:
Allow service principals to use PowerBI APIs
Allow service principals to use read-only admin APIs
Add the Security Group created above to the security groups list
Scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Add the Security Group created above to the security groups list, unless other users rely on this setting applying to the entire organization; in that case, leave the setting as shown.
Create a second App Registration in Azure for your Datalogz client.
Register an Application
Add the following redirect URIs
Add the following Read-Only Microsoft Graph API Permissions to login with Microsoft.
The following API permissions will be required to be approved by this user during this authentication process for a successful login and account creation. No admin consent is required for this step.
After logging in, proceed to create a new PowerBI Connector from the Connectors tab, selecting the Service Principal option. Complete Steps 1 - 4. In step 3, you will select the specific Workspaces you want to assign to this connector.
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Now create a Role and grant read access to this connector. Navigate to Role Settings from the profile menu in the upper-right of your window. Once the role is created and assigned to the connector, new users can be invited and assigned to the role(s) and connectors they should have access to.
To invite users, navigate to User Settings from the profile menu in the upper-right of your window. You can send email invitations to invite users, designating their user type as Admin or Member as described below:
- Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations.
- Member: Can manage personal settings and view the overview and recommendations.
Navigate back to the Connector page to check on the status of the connector. Once it successfully completes its first run, the Overview and Recommendations views will be populated.
Option 2: Microsoft O365 Global Administrator
Navigate to the settings in the upper right, and click Admin Portal.
Under Tenant settings, scroll down to Admin API settings and enable the following permissions:
API responses with detailed metadata.
API responses with DAX and mashup expressions.
Create a new App Registration in Azure for your Datalogz client.
Register an Application
Add the following redirect URIs
Add the following Read-Only API Permissions
Once the App Registration has those delegated permissions, the app is able to use any admin or non-admin API that needs those permissions (such as WorkspaceGetInfo). The Tenant.Read.All permission is required for Activity Events and for Workspace Datasets, Tables, Columns, and Queries.
Create a new Client Secret
Add the following environment variables to your Key Vault. The Client ID and Client Secret will be the same for both PowerBI and Microsoft, since a single app registration was created above.
These variables are split out in case you want to create app registrations for the Microsoft Graph API and PowerBI API separately.
Login to your on premises deployment of Datalogz as a Microsoft 365 Global Administrator.
The following API permissions will be required to be approved by your administrator during this authentication process.
As the Microsoft 365 Global Administrator, after logging in, proceed to create a new PowerBI Connector from the Connectors tab. Complete Steps 1 - 4. In step 3 you will select the specific Workspaces you want to assign to this connector.
After you have completed your connector setup, the connector refresh status can be viewed from the Connectors page. After a few minutes the metadata refresh will complete and the Overview and Recommendations tabs will be populated.
Now create a Role and grant read access to this connector. Navigate to Role Settings from the profile menu in the upper-right of your window. Once the role is created and assigned to the connector, new users can be invited and assigned to the role(s) and connectors they should have access to.
To invite users, navigate to User Settings from the profile menu in the upper-right of your window. You can send email invitations to invite users, designating their user type as Admin or Member as described below:
- Admin: Can create and manage connectors and roles, manage account settings, and view the overview and recommendations.
- Member: Can manage personal settings and view the overview and recommendations.
Navigate back to the Connector page to check on the status of the connector. Once it successfully completes its first run, the Overview and Recommendations views will be populated.
Enable PowerBI REST API permissions for this security group in the .
Login to the app portal as a PowerBI Service Administrator
Login to your Azure portal and navigate to
Login to using Microsoft Azure Active Directory.
Once the Service Principal has been provided access to the read-only Admin APIs by following the steps above, the app is able to use the following endpoints documented .
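At that point the Service Principal can authenticate with the OAuth 2.0 client-credentials flow and call the read-only Admin APIs. A hedged sketch follows; the tenant ID, client ID, and secret are placeholders for the values noted from the App Registration above.

```shell
# Placeholders from the App Registration created earlier.
TENANT_ID="00000000-0000-0000-0000-000000000000"
CLIENT_ID="11111111-1111-1111-1111-111111111111"
CLIENT_SECRET="replace-me"

# Azure AD v2.0 token endpoint and the PowerBI resource scope.
TOKEN_URL="https://login.microsoftonline.com/${TENANT_ID}/oauth2/v2.0/token"
SCOPE="https://analysis.windows.net/powerbi/api/.default"
echo "Token endpoint: $TOKEN_URL"

# With real credentials (requires network access):
# TOKEN=$(curl -s -X POST "$TOKEN_URL" \
#   --data-urlencode "grant_type=client_credentials" \
#   --data-urlencode "client_id=${CLIENT_ID}" \
#   --data-urlencode "client_secret=${CLIENT_SECRET}" \
#   --data-urlencode "scope=${SCOPE}" | python3 -c \
#   'import sys, json; print(json.load(sys.stdin)["access_token"])')
# Example read-only Admin API call: list workspaces as admin.
# curl -s -H "Authorization: Bearer $TOKEN" \
#   "https://api.powerbi.com/v1.0/myorg/admin/groups?\$top=100"
```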
Enable PowerBI REST API permissions for this security group in the .
Login to the app portal as a Microsoft 365 Global Administrator.
Login to your Azure portal and navigate to
View the release notes for Datalogz by version.
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads. Teams can choose to run their BI Warehouse using Snowflake by following these steps:
Minimum
Compute: X-Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $600 / mo.
Recommended
Compute: Small Warehouse
Storage: Pay as you go
Estimated Cost as of 2/1/2023 assuming 10 users: $1000 / mo.
Total Estimated Cost: $600 - $1000 / mo.
Warehouse DB
Name: datalogz_wh
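The warehouse above can be created with a single statement; a sketch follows, assuming the snowsql CLI is installed and configured, with the size set per the minimum/recommended guidance (AUTO_SUSPEND/AUTO_RESUME values are illustrative).

```shell
# Create the BI warehouse; start at XSMALL (minimum) or SMALL (recommended).
SQL="CREATE WAREHOUSE IF NOT EXISTS datalogz_wh
     WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;"
echo "$SQL"

# Run against your Snowflake account (requires configured credentials):
# snowsql -q "$SQL"
```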
The following instructions make reference to POWERBI, but this can be replaced with either TABLEAU or LOOKER as appropriate.
Enhanced treemap with new features, lineage graph integration and UI updates.
New Features and Enhancements:
Treemap Visualization Updates:
Improved treemap with the display of types, children count on tiles, and modification to the copied node's width.
Added base dataset name and similar dataset details at the top of the treemap.
Introduced legend colors at the top of the treemap for better data interpretation.
Updated treemap to version 0.2.5 for enhanced features and functionality.
Lineage Graph Integration:
Implemented setup for showing the lineage graph on the side panel, enhancing data navigation and visualization.
Updated functionality to show only the highlighted path and nodes when clicking on a particular node in the lineage graph.
UI/UX Improvements:
Inventory and Capacity Table Enhancements:
Made minor updates to capacity table columns on the impact page and inventory card tooltip, improving data display and user interaction.
Updated the tooltip to be visible only on mouse hover over the icon.
Color Scheme and Accessibility:
Revised treemap color scheme to be colorblind safe, ensuring accessibility for all users.
New Features and Enhancements:
Treemap Schema Update: Added "count_children" integer value to treemap schemas, enhancing the frontend's capability to display hierarchical data more effectively.
Snowflake Proxy Enhancements:
Introduced new methods in the Snowflake proxy specifically for duplicate datasets treemap, improving data handling and visualization capabilities.
Fixed handling of null children in the Snowflake BI proxy, ensuring better data integrity and error handling.
UI/UX Improvements:
Capacity Table Update on Impact Page: Made a minor update to the capacity table columns on the impact page, likely for better data representation and user understanding.
Data Model and API Updates:
RecommendationPreview Model Enhancements:
Added workspace_id and workspace_name to the RecommendationPreview Pydantic model, enriching the data model with more contextually relevant information.
Implemented workspace_id and workspace_name in the /preview endpoint, aligning the API's output with the updated data model.
New Features and Enhancements:
Treemap Visualization Updates:
Added "removed" objects to the treemap, enhancing data representation.
Updated the Impact Page to show all months of capacity impact in a single table.
Simplified calculation for the count of duplicate datasets for PowerBI.
Anonymization and Data Privacy:
Implemented anonymization of metric descriptions and other data for demo connectors.
Added more anonymizer functions for creating fake data, supporting nested JSON structures.
UI/UX Improvements:
Tableau API Hardening: Strengthened the Tableau API extraction process for better data integrity and reliability.
Minor Updates to Metric Descriptions: Adjusted metric descriptions for PowerBI datasource and Tableau dataset to remove PII and ensure more controlled anonymization.
Operational and Infrastructure Changes:
Dockerfile and Deployment Updates:
Multiple updates to Dockerfile for Airflow images to ensure the vault folder is writable by the Airflow user.
Adjusted demo data to meet gateway Pydantic model requirements for issue preview data.
Development Integrations and Merge Resolutions: Multiple merges from the development branch, integrating features and resolving conflicts.
SQL and Model Refinements:
Various SQL adjustments (referred to as 'fluffing') and model updates, including refresh history model remake.
Fixed a bug related to unreasonably high refresh counts.
Enhanced treemap with expand feature, improved tooltips, UI updates, and conditional chart display logic.
New Features and Enhancements:
Treemap Visualization Enhancements:
Introduced an expand feature for the treemap, allowing users to explore data more interactively.
Upgraded treemap package to improve tooltip information, including additional fields for enhanced data insights.
Implemented a global switch to show/hide children nodes, providing a cleaner and more focused user experience.
Integrated API to the duplicate dataset treemap for real-time data visualization.
UI/UX Improvements:
Inverted getValueColor for more intuitive coloring on the treemap.
Added asset name to the side panel body header for better asset identification.
Updated treemap style for a more modern and user-friendly interface.
Minor update to columns layout on the modelTable inventory page.
Various style enhancements, including updated breadcrumb style and border color changes.
Conditional Display Logic:
Implemented conditional logic to display different charts based on issue types, i.e., hiding the metrics chart for duplicate dataset issues and vice versa.
Operational and Infrastructure Changes:
Merge Resolutions and Development Integrations: Multiple merges from the development branch, resolving merge conflicts and integrating various features into the main branch.
Readme Update: Updated the readme file, likely with new instructions or documentation relevant to the recent changes.
New Features and Enhancements:
Treemap Data APIs: Added new methods get_list_duplicate_datasets and get_duplicate_dataset_details in the postgres_bi_proxy for enhanced data handling and visualization of duplicate datasets.
Duplicate Dataset API Enhancements: Included demo data in duplicate dataset APIs, along with the addition of "type", "children", and "sim_score" to the duplicate dataset details query for a more comprehensive understanding of dataset similarities.
New Features and Enhancements:
PowerBI and Tableau Enhancements: Updated duplicate datasets treemap data to show 'added' and 'copied' tables, and included 'removed' columns for PowerBI similarity treemap.
Performance Improvements: General performance enhancement in models related to treemap data.
Docker Compose Updates: Updated various docker-compose files (docker-compose.dev.yml, docker-compose.override.yml, docker-compose.sf.prod.yml) for better environment setup and management.
Bug Fixes and Refinements:
Data Model Adjustments: Fixed issues related to column width on the issueTracking table and counts of tables and columns copied.
Autodoc Documentation: Added autodoc documentation for Tableau and PowerBI base schemas, improving the documentation quality and accessibility.
Issue Descriptions Update: Ensured that some description is always retrieved for issues, enhancing clarity and understanding of issue tracking.
Operational and Infrastructure Changes:
Dockerfile Update for Airflow: Updated the Dockerfile for Airflow images to ensure the vault folder is writable by the Airflow user.
Exception Handling in ML Run: Added exception handling to ml_run.py for more robust error management.
DBT Dependencies Management: Added dbt deps back to the default command list in dbt_app app.py.
Enhanced UI/UX in inventory and impact pages, lineage graph improvements, and new impact chart features.
New Features and Enhancements:
Inventory Table Update: Minor update to the inventory table columns for improved data representation and user experience.
Grouped Bar Chart on Impact Page: Implemented a new grouped bar chart feature on the impact page, replacing the previous stacked bar chart for a clearer visualization of data.
Impact Page Developments: Integration of the impact API and inclusion of impact chart data from Redux for enhanced data handling and visualization.
Lineage Graph Enhancements: Significant improvements to the lineage graph, including:
Highlighting the full path on node click for better focus and understanding.
Custom styling for children and root nodes.
Addition of child nodes inside the parent node using ReactFlow subflow and dagre layout.
Workspace filter addition and default selection of the first workspace on the lineage graph page.
Customization of lineage node handles to match dagre layout.
UI/UX Improvements:
Column Width Adjustment: Fixed the column width on the issueTracking table for better readability.
Clock Icon Addition: Added a clock icon to the impact card for a more intuitive representation of time-related metrics.
Potential Dev Hours Savings: Included potential development hours savings on the impact tab, providing a clearer indication of efficiency gains.
Licensing Table Modifications: Made specific changes to the licensing table, including separate columns lists for PowerBI and Tableau.
Operational and Infrastructure Changes:
Enhanced Hover Functionality: Updated the hover-over mechanism on metric cards to reduce accidental triggers.
Workspace Filter and Input Field Enhancements: Improved workspace filter functionality and focus management on the lineage graph window.
New Features and Enhancements:
Inventory Page Update: Minor update to inventory page columns for better data representation and user experience.
Recommendation Configuration Enhancement: Added asset_id, asset_name, and asset_type in Recommendation Config to provide more detailed recommendations.
Snowflake Proxy Improvement: Updated the Snowflake proxy to deserialize JSON properly from the VARIANT column, enhancing data handling and accuracy.
Usage Impact Chart Development: Introduced a method for get_usage_impact_chart in the snowflake_bi_proxy, along with a refactor of the query to match the latest design. This feature aims to provide better insights into usage impact through comprehensive charting.
Bug Fixes and Refinements:
Bot User Creation Bug Fix: Resolved an issue causing new bot users not to be created on fresh builds due to a missing account_id foreign key in the UserOrmModel.
SQL Query Refinement for Impact Charts: Updated and cleansed the SQL query used to retrieve data for impact charts, ensuring more accurate and relevant information.
Operational and Infrastructure Changes:
Data Model Naming Correction: Fixed the naming of the database model used to gather data for usage impact charts.
API Addition for Impact Charts: Added a new API to return impact charts data, enhancing the ability to visualize usage impact effectively.
Column Additions for Inventory Page: Added dev_usage_ratio and link columns to the inventory page, providing more detailed and relevant information for inventory analysis.
Engineering vs. Usage Hours Analysis: Introduced analysis for Engineering versus Usage Hours, enabling deeper insights into engineering efforts versus actual usage.
New Features and Enhancements:
Enhanced Dataset Embedding: Updated ml_app to embed sentences of datasets instead of the JSON itself, improving data matching accuracy.
Duplicate Dataset Detection: Implemented a new approach for duplicate dataset detection using embeddings, which improves performance and accuracy. This includes changes in processing datasets for both Tableau and PowerBI.
Airflow Configuration Optimization: Aligned with Airflow production recommendations, now using LocalExecutor on a single VM configuration.
Improved Similarity Metrics: Updated precision on similarity scores to 3 significant figures and relaxed duplication rules for a more accurate detection of duplicate tables.
Enhanced Dataset Tree Maps: Added new types, similarity scores, and children to the top level of dim_dataset_treemap, enabling better visualization and understanding of dataset structures.
Performance Enhancements: Numerous optimizations and refactors to improve the efficiency and speed of various processes, especially in similarity detection and embedding calculations.
Tableau and PowerBI Integration Improvements: Numerous updates and fixes to enhance integration with Tableau and PowerBI, including better handling of dataset similarities and license summaries.
Embedding Model Enhancements: Transitioned to a smaller FastText model for reduced memory footprint and updated Docker compose files for better service management.
CI/CD Pipeline Updates: Added ml_app to the CI/CD pipeline and updated Docker image names for consistency.
Bug Fixes and Refinements:
Syntax and Logic Fixes: Addressed various issues including fixing Snowflake syntax, resolving ambiguous alias names, and correcting incremental logic errors.
UI and UX Improvements: Adjustments to sorting and filtering logic in various models for a more intuitive user experience.
Data Handling Corrections: Fixed issues related to data embedding, including removing newline characters in sentences for embedding generation and adjusting data types and structures for better processing.
Account: Use your work or school account to sign in to Power BI. Administrators manage work or school accounts in Azure Active Directory. Your level of access is determined by the Power BI license associated with that account and the capacity type where content is stored. See license and Premium.
Admin portal: The location where Power BI admins manage users, features, and settings for Power BI in their organization.
Aggregates: When the values of multiple rows are grouped together as input on criteria to form a single value of more significant meaning or measurement. Only implicit measures can be aggregated.
Alert (alerts): A feature that notifies users of changes in the data based on limits they set. Alerts can be set on tiles pinned from report visuals. Users receive alerts on the service and on their mobile app.
Annotate: To write lines, text, or stamps on a snapshot copy of a tile, report, or visual on the Power BI mobile app for iOS and Android devices.
App (apps): A bundle of dashboards, reports, and datasets. It also refers to the mobile apps for consuming content such as the Power BI app for iOS.
AppSource: Centralized online repository where you can browse and discover dashboards, reports, datasets, and apps to download.
ArcGIS for Power BI: ArcGIS is a mapping and analytics platform created by the company Esri. The name of the visual included in the Power BI visuals library is called ArcGIS for Power BI.
Auto Insights: Now called Quick Insights.
BI: Business intelligence.
Bookmark: A view of data captured in the Bookmarks pane of a report in Power BI Desktop or service. In Desktop, the bookmarks are saved in the pbix report file for sharing on the PowerBI service.
Breadcrumbs: The navigation at the top left to quickly navigate between reports and dashboards.
Calculation: A mathematical determination of the size or number of something.
Capacity (Power BI Premium): Data models running on hardware fully managed by Microsoft in Microsoft cloud data centers to help ensure consistent performance at scale. BI solutions are delivered to the entire organization regardless of Power BI license.
Card (visual type): A Power BI visualization type.
Card (Power BI Home): Power BI Home displays rectangular and square pictures that represent dashboards, reports, apps, and more. These pictures are referred to as cards.
Certified custom visual: A Power BI custom visual that met requirements and passed strict security testing.
Connect live: A method of connecting to SQL Server Analysis Services data models. Also called a live connection.
Connector: Power BI Desktop includes an ever-growing collection of data connectors that are built to connect to a specific data source. Examples include GitHub, MailChimp, Power BI dataflows, Google Analytics, Python, SQL Server, Zendesk, and more than 100 additional data sources.
Container: The areas on the navigation pane are containers. In the nav pane, you'll find containers for: Browse, Data hub, Apps, Metrics, Deployment pipelines, Learn, Workspaces, and Home.
Content: Content for the Power BI service is generally dashboards, reports, and apps. It can also include workbooks and datasets.
Content list: The content index for an app.
Content view: The view that lists Power BI content you created or content that was shared by other designers.
Continuous variable: A continuous variable can be any value between its minimum and maximum limits; otherwise, it is a discrete variable. Examples are temperature, weight, age, and time. Continuous variables can include fractions or portions of the value. The total number of blue skateboards sold is a discrete variable since we can't sell half a skateboard.
Correlation: A correlation tells us how the behavior of things are related. If their patterns of increase and decrease are similar, then they're positively correlated. And if their patterns are opposite, then they're negatively correlated. For example, if sales of our red skateboard increase each time we run a TV marketing campaign, then sales of the red skateboard and the TV campaign are positively correlated.
Cross-filter: Applies to visual interactions. Cross-filtering removes data that doesn't apply. For example, selecting Moderation in the doughnut chart cross-filters the line chart. The line chart now displays only data points that apply to the Moderation segment.
Cross-highlight: Applies to visual interactions. Cross-highlighting retains all the original data points but dims the portion that doesn't apply to your selection. For example, selecting Moderation in the doughnut chart cross-highlights the column chart. The column chart dims all the data that doesn't apply to the Moderation segment and highlights all the data that does apply to the Moderation segment.
Custom visual: Visuals that are created by the community and Microsoft. They can be downloaded from the Microsoft Store for use in Power BI reports.
Dashboard: In the Power BI service, a dashboard is a single page, often called a canvas, that uses visualizations to tell a story. Because it is limited to one page, a well-designed dashboard contains only the most important elements of that story. Dashboards can be created and viewed only in the Power BI service, not in Power BI Desktop.
Data connector: See connector.
Data model (Excel data model): In Power BI content, a data model refers to a map of data structures in a table format. The data model shows the relationships that are being used to build databases. Report designers, administrators, and developers create and work with data models to create Power BI content.
Dataflow: Dataflows ingest, transform, integrate, and enrich big data by defining data source connections, Extract Transform Load (ETL) logic, refresh schedules, and more. Formerly called 'data pool'.
Dataset: A dataset is a collection of data used to create visualizations and reports.
Desktop (or Power BI Desktop): Free Power BI tool used primarily by report designers, admins, and developers.
Diamond (Power BI Premium): The shape of the icon that signifies a workspace is a Premium capacity workspace.
Dimension: Dimensions are categorical (text) data. A dimension describes a person, object, item, product, place, and time. In a dataset, dimensions are a way to group measures into useful categories. For our skateboard company, some dimensions might include looking at sales (a measure) by model, color, country/region, or marketing campaign.
Drill up, drill down, drillthrough: In Power BI, 'drill down' and 'drill up' refer to the ability to explore the next level of detail in a report or visual. 'Drill through' refers to the ability to select a part of a visual and be taken to another page in the report filtered to the data that relates to the part of the visual you selected on the original page. Drill to details commonly means to show the underlying records.
Editing view: The mode in which report designers can explore, design, build, and share a report.
Ellipsis: (...) menu. Selecting an ellipsis displays additional menu options. Also referred to as the More actions or More options menu depending on the menu options.
Embed code: A common standard across the internet. In Power BI, the customer can generate an embed code and copy it to place content such as a report visual on a website or blog.
Embedded: See Power BI Embedded.
Embedding: In the Power BI developer offering, the process of integrating analytics into apps using the Power BI REST APIs and the Power BI SDK.
Environment: [Power BI Desktop, Power BI Mobile, the Power BI service, and others] Another way to refer to one of the Power BI tools. It's OK to use Power BI environment (tenant) in documentation where it might help business analysts who are familiar with the term 'tenant' to know it's the same thing.
Explicit measures: Power BI uses explicit measures and implicit measures (see definition). Explicit measures are created by report designers and saved with the dataset. They are displayed in Power BI as fields and can therefore be used over and over. For example, a report designer creates an explicit measure TotalInvoice that sums all invoice amounts. Colleagues who use that dataset and who have edit access to the report can select that field and use it to create a visual. When an explicit measure is added or dragged onto a report canvas, Power BI does not apply an aggregation. Creating explicit measures requires edit access to the dataset.
Filter versus highlight: A filter removes data that does not apply. A highlight grays out the data that does not apply.
Focus mode: Use focus mode to pop out a visual or tile to see more detail. You can still interact with the visual or tile while in focus mode.
Full-screen mode: Use full-screen mode to view Power BI content without the distraction of menus and navigation panes.
Gateway or on-premises data gateway: A bridge to underlying data sources. It provides quick and secure data transfer between the Power BI service and on-premises data sources that support refresh. Usually managed by IT.
High-density visuals: Visuals with more data points than Power BI can render. Power BI samples the data to show the shape and outliers.
Home: The default landing page for Power BI service users. Doesn't modify anything. Can be called Power BI Home or simply Home.
Implicit measures: Power BI uses implicit measures and explicit measures. Implicit measures are created dynamically when you drag a field onto the report canvas to create a visual, and Power BI automatically aggregates the value using one of the built-in standard aggregations (SUM, COUNT, MIN, AVG, and others). Creating implicit measures requires edit access to the report.
Insights: See quick insights.
KPIs: Key performance indicators. A type of visual.
Left navigation (left nav): This was replaced with nav pane but might still appear in some documentation. The controls along the left edge of Power BI service. First instance: navigation pane. Subsequent mentions or tight spaces: nav pane.
License: Your level of access is determined by the Power BI license associated with your account and the capacity type where content is stored.
List page or content list: One of the section pages for the elements in the nav pane. For example, Create Data hub or My workspace.
Measure: A measure is a quantitative (numeric) field that can be used to do calculations. Common calculations are sum, average, and minimum.
Microsoft R (R): R is a programming language and software environment for statistical computing and graphics.
Mobile app: Apps that allow you to run Power BI on iOS, Android, and Windows mobile devices.
Modeling (Power BI Desktop): Getting the data you've connected to ready for use in Power BI. This includes creating relationships between tables in multiple data sources, creating measures, and assigning metrics.
My workspace: The workspace for each Power BI customer to use to create content.
Native: Included with the product. For example, Power BI comes with a set of native visualization types. But you can also import other types such as Power BI visuals.
Navigation pane or nav pane: The controls along the left edge of the Power BI service.
Notification: Messages sent by and to the Power BI Notification center.
Notification center: The location in the service where messages are delivered to users such as notice of sunsetting certain features.
OneDrive for work or school vs OneDrive: OneDrive is a personal account and OneDrive for work or school is for work accounts.
On-premises: The term used to distinguish local computing (in which computing resources are located on a customer's own facilities) from cloud computing.
On-premises data gateways: See gateways or on-premises data gateways.
PaaS: Platform as a service, for example, Power BI Embedded.
Page: Reports have one or more pages. Each tab on the report canvas represents a page.
Paginated reports: Paginated reports are designed to be printed or shared. They display all the data in a table even if the table spans multiple pages.
pbiviz: The file extension for a Power BI custom visual.
pbix: The file extension for a Power BI Desktop file.
Permissions: What a user can and can't do in Power BI is based on permissions.
Phone report: The name for a Power BI report that's been formatted for viewing on a phone.
Phone view: The user interface in the Power BI service for laying out a phone report.
Pin/unpin: The action a report designer takes when placing a visual, usually from a report, onto a dashboard.
Power BI, Power BI service, Power BI Desktop, Power BI mobile: Some of the Power BI offerings. Power BI is the general term.
Power BI Desktop: Also referred to as Desktop. The free Windows application of Power BI you can install on your local computer.
Power BI Embedded: A product used by developers to embed Power BI dashboards and reports into their own apps, sites, and tools.
Power BI Premium: An add-on to the Power BI Pro license that enables organizations to predictably scale BI solutions.
Power BI Pro: A monthly per-user license that provides the ability to build reports and dashboards, collaborate on shared data, and more.
Power BI Report Builder: A free standalone Windows Desktop application used for authoring paginated reports.
Power BI Report Server: An on-premises report server with a web portal in which you display and manage reports and KPIs.
Power BI service: An online SaaS (software as a service).
Premium workspace: A workspace running in a capacity signified to customers by a diamond icon.
Pro license or Pro account: See account and license.
Publish: Power BI service report designers bundle the contents of a Power BI workspace to make it available to others as a Power BI app.
Q&A: The Power BI feature that lets you ask natural language questions about a dataset and get responses in the form of visualizations.
Q&A virtual analyst: [Power BI Mobile] For iOS, the conversational UI for Q&A.
QR codes: [Power BI Mobile] A matrix barcode that can be generated for dashboards or tiles in the Power BI service, giving colleagues quick access to that content from a mobile device.
query string parameter: Add one to a URL to pre-filter the results seen in a Power BI report.
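For example, a report URL filter takes the form `Table/Field eq 'value'`. A small Python sketch that builds such a URL (the report URL, table, and field below are made-up examples; the filter syntax is the documented form):

```python
from urllib.parse import quote

def filtered_report_url(report_url, table, field, value):
    """Append a ?filter= query string parameter to a report URL.

    Table, field, and value are hypothetical examples; the filter
    expression uses the documented Table/Field eq 'value' form.
    """
    expression = f"{table}/{field} eq '{value}'"
    return f"{report_url}?filter={quote(expression, safe='/')}"

url = filtered_report_url(
    "https://app.powerbi.com/groups/me/reports/<report-id>/ReportSection",
    "Store", "Territory", "NC",
)
print(url)
```

Opening the resulting URL shows the report pre-filtered to the matching rows.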
Quick Insights: Automatically generated insights that reveal trends and patterns in data.
Reading view: Read-only view for reports (as opposed to Editing View).
real-time streaming: The ability to stream data and update dashboards in real time from sources such as sensors, social media, usage metrics, and more.
recent: The container on the home page that holds the items that were accessed most recently.
related content: Shows the individual pieces of content that contribute to the current content. For example, for a dashboard, you can see the reports and datasets providing the data and visualizations on the dashboard.
relative links: Links from dashboard tiles to other dashboards and reports that have been shared directly or distributed through a Power BI app. This enables richer dashboards that support drillthrough.
report: A multi-perspective view into a single dataset with visualizations that represent different findings and insights from that dataset. Can have a single visualization or many, a single page or many pages.
report editor: The report editor is the tool in which new reports are created and changes are made to existing reports by report designers.
report measures: Also called custom calculations. Excel calls these calculated fields. See also measures.
responsive visuals: Visuals that change dynamically to display the maximum amount of data and insights no matter the screen size.
row-level security (RLS): Power BI feature that enables database administrators to control access to rows in a database table based on the characteristics of the user executing a query (for example, group membership).
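Conceptually (this sketch is engine-agnostic and all names are hypothetical, not Power BI's implementation), RLS reduces to filtering rows by a characteristic of the querying user:

```python
# Conceptual sketch of row-level security: a user's group
# membership determines which rows a query can return.
# All names and data here are hypothetical.
USER_GROUPS = {
    "ana@example.com": {"region": "West"},
    "raj@example.com": {"region": "East"},
}

def rows_visible_to(user, rows):
    """Return only the rows matching the user's allowed region."""
    allowed = USER_GROUPS[user]["region"]
    return [row for row in rows if row["region"] == allowed]

sales = [
    {"region": "West", "amount": 100},
    {"region": "East", "amount": 200},
]
print(rows_visible_to("ana@example.com", sales))
```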
SaaS: Software as a service is a way of delivering applications over the internet as a web-based service. Also referred to as web-based software, on-demand software, or hosted software.
screenshot: Simple screenshots of a report can be emailed using the 'send a screenshot' feature.
service: See Power BI service. A standalone resource available to customers by subscription or license. A service is a product offering delivered exclusively via the cloud.
settings: The location for Power BI users to manage their own general settings such as whether to preview new features, set the default language, close their account, and more.
share (sharing): In Power BI, sharing typically means directly sharing an individual item (a dashboard or report) with one or more people by using their email address.
Shared with me: The container in the nav pane that holds all the individual items that were directly shared by another Power BI user.
snapshot: In Power BI, a snapshot is a static image compared with a live image of a tile, dashboard, or report.
SQL Server Analysis Services (SSAS): An online analytical data engine used in decision support and business analytics, providing the analytical data for business reports and client applications such as Power BI, Excel, Reporting Services reports, and other data visualization tools.
SQL Server Reporting Services (SSRS): A set of on-premises tools and services to create, deploy, and manage report servers and paginated reports.
streaming data: See real-time streaming. The ability to stream data and update dashboards in real time.
subscriptions (subscribe): You can subscribe to report pages, apps, and dashboards and receive emails containing a snapshot. Requires a Power BI Pro license.
summarization: [Power BI Desktop] The operation being applied to the values in one column.
tiles: Power BI dashboards contain report tiles.
time series: A time series is a way of displaying time as successive data points. Those data points could be increments such as seconds, hours, months, or years.
value (values): Numerical data to be visualized.
visual (visualization): A chart. Examples include bar charts, treemaps, doughnut charts, and maps.
visual interaction: One of the great features of Power BI is the way all visuals on a report page are interconnected. If you select a data point on one of the visuals, all the other visuals on the page that contain that data change based on that selection.
Visualizations pane: The pane of visualization templates that ships with the report canvas in Power BI Desktop and the Power BI service.
workbook: An Excel workbook to be used as a data source. Workbooks can contain a data model with one or more tables of data loaded into it by using linked tables, Power Query, or Power Pivot.
workspace: Containers for dashboards, reports, and datasets in Power BI. Users can collaborate on the content in any workspace except My workspace.
x-axis: The axis along the bottom, the horizontal axis of a line graph.
y-axis: The axis along the side, the vertical axis of a line graph.
Release notes for 0.0.8 are a bit sparse because it was a fairly small update.
Diagnostics Service
Initial development of Tableau dbt model
Bug fixes
Gateway Service
New API implemented to get users based on filter parameters.
Frontend Service
Share a comment by copying its link.
Multiple modifications, bug fixes, feature additions, UI enhancements, and connector-related changes have been made for PowerBI and Tableau
Frontend Service
Modifications made to the UI.
Bug fix and code cleanup.
Redux implemented on all pages.
Fixes and improvements.
Update to the toggleActivateDeactivateConnector and removal of unnecessary text.
Improved functionality to open connector details page by clicking anywhere on the row.
Updates to the ConnectorSetting component props and changes in the Toggle API response handling.
Bug fixes and improvements.
Feature implementation for recommendation side panel.
Alignment adjustment for the connector settings button.
Gateway Service
Added support for SQLAlchemy classes to manage the app database (user, account, environment).
Completed user authentication flow and updated the project structure.
Implemented datasource APIs for DAG Run List and Trigger DAG functionality.
Introduced PowerBI feature.
Developed connector feature.
Improved swagger documentation accuracy.
Implemented Snowflake-related features and bug fixes.
Added functionalities for user invitations and role management.
Completed Tableau connector setup.
Enhanced notifications and issue tracking capabilities.
Diagnostics Service
Added unit tests for secrets clients and updated unit tests for Azure secrets.
Implemented a secrets client for both local secrets and Azure secrets.
Created a skeleton DAG framework with support for multiple clients and task dependencies.
Improved the PowerBI data loading process for testing.
Developed dashboards, datasets, and capacities for PowerBI.
Integrated Tableau and added Tableau DBT models.
Automated Snowflake ingestion for PowerBI.
Optimized activities ingestion for PostgreSQL.
Enhanced error handling and error code implementation.
Refactored and optimized the metrics history integration.
New Features and Enhancements Across Services: Improved UX, Sorting, Filtering, and Formatting in Frontend; Expanded APIs and Azure Integration in Gateway; Performance Boosts and Advanced Diagnostics
Frontend Service
Blocked Status in the Status Dropdown - Now you can easily indicate a blocked status in the Status Dropdown. This feature was contributed by @raj-wadhwa.
Recommendations Table Data Sorting - The recommendations table data is now sorted by priority upon loading. This enhancement was made by @pushkar1701.
Improved Datetime Formatting - Datetime formatting now considers timezone differences, providing a better user experience. Thanks to @arif-js for this improvement.
Flexible Comment Input Box - The side panel's comment input box is now more flexible in height, with the submit button conveniently placed at the lower right corner. This enhancement was contributed by @tom-juntunen.
Connector Inventory Enhancements - Various improvements have been made to the inventory page, including a new connector filter and the core inventory table, thanks to @arif-js.
Color Coordination Enhancements: The frontend now features improved color coordination, contributing to a more visually cohesive design. (Contributed by @pushkar1701)
Carousel Implementation: A new carousel component has been added to the frontend, enhancing the navigation and display of content. (Contributed by @pushkar1701)
Edit and Delete Issue Comments - Now you can easily edit and delete issue comments directly from the side panel. This feature was added by @arif-js.
Activity Card Style Changes - The activity card body has received some style enhancements, thanks to @arif-js.
Connector Status Next to Selection - The connector status is now displayed next to the connector selection, improving visibility. This feature was added by @arif-js.
Improved Inventory Table Columns - The inventory table columns have been updated based on the connector type for a more informative display. This enhancement was made by @arif-js.
Share Comment Feature - Now you can easily share comments with others using the new share comment functionality by @arif-js.
@raj-wadhwa made their first contribution in this repository.
@rcharkowicz made their first contribution in this repository.
Gateway Service
Edit and Delete Comment APIs - You can now perform editing and deletion of comments using our new APIs. This feature was contributed by @zaheeruddinfaizdl.
Azure Mail Proxy Implementation - We have implemented the Azure Mail Proxy, making communication more efficient. Thanks to @zaheeruddinfaizdl.
APIs for User Filtering - New APIs have been added to retrieve users based on specific filter parameters. This enhancement was made by @zaheeruddinfaizdl.
Inventory Data Retrieval APIs - We now have APIs to fetch inventory data using mock data. Integration with the BI proxy layer is still in progress and will be available in the next version.
Additional Enhancements and Bug Fixes - Various other enhancements and bug fixes have been implemented in the Gateway Service. Check out the individual pull requests for more details.
@rcharkowicz made their first contribution in this repository.
Diagnostics Service
Inventory Page Data Model - We've introduced a new and improved data model for the inventory page, enhancing its performance and functionality. Thanks to @JamesRizkallah1 for this contribution.
Tableau Model Update - The Tableau model has been updated for better compatibility and efficiency. This update was provided by @JamesRizkallah1.
MS Graph Department Level Metadata - We're excited to introduce the incorporation of department level metadata into BI usage behavior analysis using MS Graph. This powerful addition provides deeper insights and allows for more granular analysis of BI usage patterns across departments. Thank you to @rcharkowicz and @tom-juntunen for implementing this feature.
Bug Fixes and Performance Improvements - We've addressed various issues and improved the overall performance of the Diagnostics Service. Check out the pull requests for more details.
Cross-tab: Another name for a text table or a table of numbers. It is a tabular representation of data where rows and columns intersect to display values.
Dashboard: A collection of views shown in a single location where you can compare and monitor a variety of data simultaneously. Dashboards provide a consolidated and interactive way to analyze data.
Data source: The underlying data that Tableau Reader is connected to. You can't change the data source in Tableau Reader. It serves as the foundation for creating visualizations and reports in Tableau.
Filter: A control on a view that limits the data shown in a view. For example, a filter on Region that only includes the West. Filters help users focus on specific subsets of data within a visualization.
Marks: A visual representation of one or more rows in a data source. Mark types can be bar, line, square, and so on. Marks are the individual data points or elements displayed on a visualization.
Packaged workbook: A type of workbook created in either Tableau Desktop or Tableau Server. These files contain both the workbook as well as copies of the referenced local file data sources and background images. They allow for easy sharing and collaboration.
Pane: The row and columns areas in a view. Panes divide the view into sections, often used for arranging headers, rows, and columns within a worksheet or dashboard.
Repository: A folder located in your My Documents folder that stores workbooks. The repository is where Tableau stores its files, including workbooks and data sources.
View: The visual representation of your data in a worksheet or dashboard. Views are the charts, graphs, and tables that display data to users for analysis and interpretation.
Workbook: A collection of one or more worksheets and dashboards. Workbooks serve as containers for organizing and presenting data visualizations and analyses.
Worksheet: A single view of data. Each worksheet can be connected to a single data source. Worksheets are where you build and design visualizations and reports.
Dimension: A qualitative field that can be used to categorize, segment, and reveal the details in your data. Examples include dates, customer names, or geographical data. Dimensions provide context for analysis.
Measure: A quantitative field that can be aggregated and is suitable for mathematical operations, such as sums or averages. Measures would be data like sales amount, temperature readings, or counts of events. Measures provide numeric values for analysis.
Calculated Field: A user-defined field created by applying calculations to existing fields in the data source. This allows for more advanced analysis within a Tableau workbook. Calculated fields are created using mathematical, logical, or custom expressions.
Parameter: A dynamic placeholder that allows users to replace a constant value in a calculation, filter, or reference line. For instance, a parameter can let end-users change the threshold value displayed in a view. Parameters enable user interactivity and customization.
Extract: A saved subset of a data source that you can use to improve performance and support offline data analysis. An extract is a snapshot of the data taken at a specific point in time. It can be useful for working with large datasets efficiently.
Live Connection: A direct connection to a data source that allows real-time access to the latest data, but can be slower if the data set is very large or the database is not optimized. Live connections ensure that data is always up-to-date.
Hierarchy: An organizational structure that allows for drilling down into dimensions. Hierarchies are used in Tableau to define levels of data granularity from higher to lower levels of aggregation. They help in organizing and navigating data.
Tooltip: A message that appears when a user hovers over a mark in the view. Tooltips can be customized to display relevant information about the data point. They provide additional context and details about data.
Blending: The ability to combine data from two different data sources on a single sheet and visualize them together, even if they're not joined or related at the database level. Blending allows for integrated analysis of disparate data sources.
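As a rough sketch of the idea (the sources and figures below are invented), blending pairs values from two unjoined sources on a shared dimension:

```python
# Sketch of blending: two independent sources combined on a shared
# dimension (Region) without a database-level join.
# All data here is illustrative.
sales = {"West": 120, "East": 90}      # source A
targets = {"West": 100, "East": 110}   # source B

blended = {
    region: {"sales": sales.get(region), "target": targets.get(region)}
    for region in sorted(set(sales) | set(targets))
}
print(blended)
```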
Sets: Custom fields that define a subset of data based on some conditions. A set can be used for comparative analysis, like comparing the performance of top products against all others. Sets help in creating segments within data.
Bins: User-defined containers of equal size that can be used to divide the dimension data into distinct ranges, which are often used for histograms. Bins are used to group continuous data into discrete intervals for analysis.
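A minimal sketch of binning (the bin size and values are arbitrary examples):

```python
from collections import Counter

def bin_of(value, size=10):
    """Return the lower edge of the equal-size bin containing value."""
    return (value // size) * size

# Group a continuous measure into size-10 bins, as a histogram would.
values = [3, 7, 12, 18, 25, 27]
histogram = Counter(bin_of(v) for v in values)
print(dict(histogram))  # {0: 2, 10: 2, 20: 2}
```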
Story: A sequence of visualizations that work together to show different facets of data and insights. A story can explain how data leads to the conclusions you've made. It allows for storytelling through data visualization.
Data catalog: A component in Data manager and Data load editor that enables you to select and load data from all the datasets to which you have access. It serves as a catalog or repository of available data sources.
Data connection: Used to let data tasks access data sources and external storage and cloud data warehouses used in a data project. Data connections are the links or interfaces that allow data to be transferred or accessed.
Qlik Data Gateway - Data Movement: Allows you to move firewalled data from your enterprise data sources to cloud and on-premises targets over a strictly outbound encrypted and mutually authenticated connection. It facilitates secure data transfer between different environments.
Data Gateway Direct Access: Allows Qlik Sense SaaS applications to securely access firewalled data over a strictly outbound encrypted and mutually authenticated connection. It provides direct access to otherwise restricted data sources.
Data leakage: An undesired phenomenon in machine learning where an algorithm is trained with data that it will use for generating predictions, leading to unrealistically high model performance from memorization rather than actual learning. It can result in biased or overfit models.
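A tiny illustration of the pitfall (the numbers are made up): computing a normalization statistic on the full dataset before splitting lets test data influence the training features:

```python
# Illustrative sketch of data leakage: normalizing with statistics
# computed on the FULL dataset (before splitting) leaks test-set
# information into the training features.
data = [1.0, 2.0, 3.0, 100.0]   # the 100.0 lands in the test split
train, test = data[:3], data[3:]

leaky_mean = sum(data) / len(data)    # uses test data: leakage
clean_mean = sum(train) / len(train)  # uses training data only

leaky_train = [x - leaky_mean for x in train]
clean_train = [x - clean_mean for x in train]
print(leaky_train)  # shifted by the outlier the model should never see
print(clean_train)  # [-1.0, 0.0, 1.0]
```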
Data load editor: A script editor that allows you to build and customize the script that loads data into your app. It provides a way to manipulate and transform data during the loading process.
Data manager: An app component that allows you to load and manage data sources in an app. Data managers are responsible for organizing and maintaining data within the application.
Data mart: Part of your data pipeline containing a subset of data from Storage or Transform data assets, ideally containing summarized data collected for analysis on specific sections or units within an organization. Data marts are specialized databases optimized for specific purposes.
Data model viewer: An app component that allows you to view the structure of the data added to an app and metadata about tables and fields. It provides insights into the organization of data within an application.
Data pipeline: A set of tasks for integrating data in a data project, which can be a simple linear pipeline or a complex one consuming several data sources and generating many outputs. Data pipelines define the flow of data processing within a project.
Data profiling: Displays statistics and information about your data sets. It provides insights into the characteristics and quality of data, helping in data preparation and analysis.
Data project: A workspace where you create your data pipeline using data assets, associated with a data platform used as the target for all outputs. Data projects are where data integration and transformation activities are managed.
Data task: The main unit of work in a data project for moving, storing, transforming data, and creating data marts. Data tasks define specific actions within a data project.
Dataset: Synonymous with table, referring to original source tables, transformed tables, or the fact and dimension tables in a data mart. Datasets are organized collections of data.
Dimension: An entity used in Analytics Services to categorize data in a chart, and in Data Integration, a dataset in a data mart forming part of the star schema. Dimensions provide context for data analysis.
Dynamic views: Allows you to query and view relevant subsets of large datasets from another app in a chart, with the ability to refresh dynamically as selections are made. Dynamic views provide a flexible way to interact with data.
Fact: A table that holds data to be analyzed, working together with dimension tables to store data on the ways in which fact table data can be analyzed. Facts contain the measurable data points in a data model.
Favorites: A section available to all users to add apps, datasets, automations, notes, experiments, and charts from the hub, which are private. Favorites allow users to bookmark and access frequently used items.
Feature (machine learning): A variable in a machine learning problem that can influence the value of the target column, recognized as columns in a dataset within Qlik AutoML. Features are input variables used to make predictions.
Field: Contains values loaded from a data source, corresponding to a column in a table and used to create dimensions and measures in visualizations. Fields represent individual data attributes.
Full load: Refers to the initial replication of data from the data source to the landing in Qlik Cloud Data Integration. It involves transferring all data without incremental updates.
Sheet: A sheet in Qlik Sense is a canvas where you can create a customized view of your data, arranged in a way that tells a story or answers specific questions. Sheets are used for data visualization and analysis.
Sheet objects: Components used to create an interface on a sheet, which can include data visualizations like tables and charts, as well as other objects such as buttons and text objects. Sheet objects are elements placed on sheets for interaction.
Snapshot: Graphical representations of a visualization at a certain point in time, used to create stories. Snapshots capture the state of visualizations for storytelling purposes.
Space data: Governed areas of the Qlik Cloud tenant used to create and store data projects, manage new data connections, and access Data Movement gateways. Space data is where data integration and management activities occur.
Space managed: Controlled spaces used to share apps with a limited group of users. Managed spaces provide a controlled environment for collaborative app development.
Space personal: A private space belonging to users where they can develop apps. Personal spaces are individual workspaces for app development.
Space shared: Areas where apps and data sources can be shared with other users for collaborative development. Shared spaces facilitate teamwork and sharing of resources.
Storage: Part of the data pipeline containing ready-to-consume datasets in Qlik Cloud from data copied from the landing zone. Storage is where data is stored and made available for analysis.
Story: A tool that allows the sharing of data insights and discoveries made in an app with other users, combining reporting, presentation, and exploratory analysis. Stories enable data-driven narratives.
Subscription: Reports that let you schedule recurring emails containing a PDF of selected sheets or charts. Subscriptions automate the delivery of data insights to users.
Synthetic key: A composite key between two tables in the data model, created when two or more tables have common fields, which may need to be reviewed if it results in a data model error. Synthetic keys are generated to link related tables.
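Conceptually (this is a generic sketch, not Qlik's implementation, and the tables are invented), a synthetic key is a composite key built from all the fields two tables share:

```python
# Conceptual sketch: when two tables share more than one field, a
# composite (synthetic) key over the common fields links their rows.
# Tables and fields here are invented examples.
orders = [{"year": 2023, "region": "West", "orders": 40}]
returns_ = [{"year": 2023, "region": "West", "returns": 3}]

common_fields = ("year", "region")

def synthetic_key(row):
    """Build a composite key tuple from the shared fields."""
    return tuple(row[f] for f in common_fields)

index = {synthetic_key(r): r for r in returns_}
joined = [{**o, **index.get(synthetic_key(o), {})} for o in orders]
print(joined)
```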
Tables: ODS, HDS, and Change: Types of tables in a data project such as the Current table (ODS), the Prior table (HDS), and the Change table, serving different purposes within the data architecture. These tables are used to manage historical data changes.
Target: The destination or endpoint where data is intended to be transferred, stored, or loaded, in data movement, migration, or synchronization processes. Targets define where data should be placed.
Tenant: The deployment of Qlik Cloud, holding items such as users, apps, and spaces. Tenants represent the individual environments within Qlik Cloud.
Training dataset: The dataset used to train a machine learning model in Qlik AutoML, designed to learn patterns and make predictions on new data. Training datasets are used to teach models.
Transform: A task that allows creation of reusable data transformations in a data pipeline with rules and custom SQL. Transformations modify data to prepare it for analysis or storage.
Type 1 - Operational Data Store (ODS): In ODS datasets, new information overwrites the original information, i.e., no historical data is kept. Type 1 ODS tables update existing data with new information.
Type 2 - Historical Data Store (HDS): In HDS datasets, a new record representing the new information is added to the table, including both the original and the new record. Type 2 HDS tables maintain historical data.
Variable: A variable in Qlik Sense is a value container which can store a static or a calculated value, like a numeric or an alphanumeric value. Variables hold values that can be used in expressions and calculations.
Views: Virtual representations of physical datasets in data projects, which can query and fetch relevant data dynamically without occupying significant disk space. Views provide efficient data access.
Visualization: Charts, extensions, and other objects that help visualize data for exploration on a sheet. Visualizations are used to represent data graphically.
Vocabulary: A business logic feature in Qlik that allows the addition of synonyms and custom analyses to Insight Advisor Search and Chat. Vocabulary customization enhances business analysis capabilities.
Sheet view: A view in Qlik Sense representing a canvas where users can arrange data visualizations and other objects to tell a story or answer questions. Sheet views are used for creating customized data presentations.
Working in spaces in Qlik Cloud Data Integration: Refers to the governed areas in Qlik Cloud where users can create and store data projects, manage data connections, and access Data Movement gateways. Working in spaces involves data management and integration activities.
Working in managed spaces: Involves utilizing tightly controlled spaces to share applications with a select group of users. Managed spaces ensure controlled access and collaboration.
Working in personal spaces: Related to a user's private space where they can develop applications independently. Personal spaces provide individual workspaces for app development.
Working in shared spaces: Pertains to areas in Qlik Cloud where users can collaborate and share applications and data sources. Shared spaces facilitate teamwork and sharing of resources.
Storing datasets: Involves keeping datasets up-to-date in the Qlik Cloud data pipeline without manual intervention after data is transferred from the landing zone. Data storage ensures data availability for analysis.
Using data storytelling: A feature that enables users to share insights and discoveries from data analysis through a narrative combining reporting, presentation, and exploratory analysis. Data storytelling enhances data communication.
Scheduling reports with subscriptions: Allows users to configure and send recurring reports via email containing selected sheets or charts. Subscriptions automate report delivery.
Synthetic keys: Composite keys in the data model created when common fields between two or more tables exist, which may require review if data model errors are present. Synthetic keys are generated to link related tables.
Machine learning concepts: Encompass general principles of machine learning, such as targets for predictions in Qlik AutoML and the concept of data movement to a target destination. Machine learning concepts provide the foundation for predictive analytics.
Working with visualizations: Encompasses creating and interacting with charts, extensions, and other objects on a sheet to explore and understand data patterns. Working with visualizations is a key aspect of data analysis.
Business logic vocabulary: A feature in Qlik that allows adding synonyms and custom analyses to Insight Advisor Search and Chat, enhancing business analysis capabilities. Vocabulary customization improves data understanding and search capabilities.
The Datalogz system design provides customers a flexible approach for deploying Datalogz according to their preferences.
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads. Customers may choose which database technologies they would like to use for both the application and warehouse databases.
Typical OLTP Workloads include:
- Create and manage Accounts and Environments
- Create and manage Roles, Users, and Permissions
- Create and manage Connectors
- Create and manage BI Activity Dashboard
- Create and manage Operations and Actions
- Create and manage Impact Reports
Typical OLAP Workloads include:
- Transform raw JSON data into enriched dimensional datasets
- Identify issues in the BI environment related to ROI, security, and compliance
- Generate Change History for all BI metadata endpoints
- Produce Context Logs for identifying root causes of Issues
- Generate Recommendations for improving BI environments

The following options are available today:
1. Postgres Only
   - App DB: PostgreSQL (OLTP)
   - BI WH: PostgreSQL (OLAP)
2. Postgres + Snowflake
   - App DB: PostgreSQL (OLTP)
   - BI WH: Snowflake (OLAP)
This option utilizes a single PostgreSQL server with two databases -- one for OLTP workloads and another for OLAP workloads.
This option utilizes a PostgreSQL database for OLTP workloads and a Snowflake database for OLAP workloads.
The Datalogz application uses Apache Airflow for connector management, providing BI Admins with pre-built metadata pipelines that can be scheduled to run on an hourly, daily, or weekly basis. New issues and recommendations are generated after each connector refresh, based on the latest data that has changed.
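The scheduling choices above can be sketched as a mapping to Airflow's built-in schedule presets. The mapping and function name below are illustrative, not Datalogz's actual configuration code:

```python
# Sketch: mapping the connector refresh cadences mentioned above
# (hourly, daily, weekly) to Airflow schedule presets.

REFRESH_SCHEDULES = {
    "hourly": "@hourly",   # run at the top of every hour
    "daily": "@daily",     # run once a day at midnight
    "weekly": "@weekly",   # run once a week at midnight on Sunday
}

def schedule_for(cadence: str) -> str:
    """Return the Airflow schedule preset for a chosen refresh cadence."""
    try:
        return REFRESH_SCHEDULES[cadence.lower()]
    except KeyError:
        raise ValueError(f"Unsupported cadence: {cadence!r}") from None
```

The returned preset would be passed to the DAG's schedule argument when the connector pipeline is created.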
Your connectors will retrieve metadata from the following API endpoints:
- PowerBI: Endpoints listed here.
- Tableau: Endpoints listed here.
- Looker: Endpoints listed here.
Connectors must be configured by BI Admins, who approve the Datalogz application. This grants read-only access to standard and admin-level APIs for a selection of Groups. Groups are generally defined as follows in each system:
- PowerBI: Workspaces
- Tableau: Projects
- Looker: Folders
The admin-level APIs unlock the richest set of Issues and Recommendations Datalogz can provide to your BI Admins. After a new connector is created, BI Admins can use Datalogz RBAC to assign fine-grained permissions so Users only have access to metadata from certain Groups.
The Service Principal you create for Datalogz utilizes a combination of admin and standard endpoints to retrieve the activity, lineage, query expression, and inventory metadata.
The following data flow diagram shows the endpoints from the PowerBI REST API used to build the metadata model for Datalogz BI Ops. This runs on a daily or weekly basis, as defined when creating your Datalogz connector.
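As a rough sketch of how one of these admin endpoints is addressed, the activity-events URL can be assembled as follows. The helper name is hypothetical, though the endpoint path and the quoted-datetime convention come from the Power BI REST API:

```python
# Sketch (not Datalogz's actual pipeline code): building the request URL
# for the Power BI admin activity-events endpoint, one of the admin APIs
# a Service Principal can call with read-only access.
from urllib.parse import quote

POWERBI_API = "https://api.powerbi.com/v1.0/myorg"

def activity_events_url(start_iso: str, end_iso: str) -> str:
    # The admin API expects each datetime wrapped in single quotes.
    start = quote(f"'{start_iso}'")
    end = quote(f"'{end_iso}'")
    return (f"{POWERBI_API}/admin/activityevents"
            f"?startDateTime={start}&endDateTime={end}")
```

The resulting URL would be requested with a bearer token acquired for the Service Principal.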
All warehouse transformations are source-controlled and executed using dbt-core, an open-source technology. Metadata loaded into the BI Warehouse is transformed with dbt-core to produce insights and recommendations that can be used to improve BI operations.
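A scheduler could invoke dbt-core roughly as follows; the project path and target name are placeholder assumptions, while `dbt run` and its `--project-dir`/`--target` flags are standard dbt CLI usage:

```python
# Sketch: how a scheduler might invoke dbt-core to run the warehouse
# transformations. Paths and target names are illustrative placeholders.
import subprocess

def dbt_run_command(project_dir: str, target: str) -> list:
    """Build the `dbt run` invocation for a given project and target."""
    return ["dbt", "run", "--project-dir", project_dir, "--target", target]

# Example (requires dbt-core installed to actually execute):
# subprocess.run(dbt_run_command("/opt/datalogz/warehouse", "prod"), check=True)
```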
The dbt-core service is included as part of the ELT API and shown in the diagram below:
Datalogz uses Azure Key Vault to store sensitive secrets required to run the application.
Throughout this guide there are references to secrets, which should be stored in the Key Vault, and environment variables, which should be stored in an .env file in your project folder. Datalogz supports both the ManagedIdentityCredential and the EnvironmentCredential classes to authenticate to a Key Vault using an App Registration as a Principal.
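For the EnvironmentCredential path, a minimal pre-flight check might look like this. The helper itself is illustrative, but the three variable names are the ones the EnvironmentCredential class reads for a client-secret App Registration:

```python
# Sketch: verify the variables EnvironmentCredential reads when
# authenticating as an App Registration with a client secret.
import os

REQUIRED_ENV_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")

def missing_credential_vars(env=None) -> list:
    """Return the EnvironmentCredential variables absent from `env`."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

If the list is non-empty, EnvironmentCredential will fail, and ManagedIdentityCredential would be the fallback on a VM with a system-assigned identity.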
To set up your Azure Key Vault follow these steps:
Login to Azure Portal
Create a new Key Vault if one has not already been created.
Proceed to either the ManagedIdentity or the EnvironmentCredential identity option.
Navigate to the Virtual Machine you wish to use as a Managed System Identity and select Identity on the left sidebar.
Enable System Assigned identity and add the following role assignments to the VM:
“Key Vault Secrets Officer”
“Virtual Machine Contributor”
"Virtual Machine User Login"
Navigate to Key Vault and select Access configuration on the left sidebar.
Set permission model to “Azure role-based access control” and click Apply.
The IAM permissions were already configured when the role assignments were added above; you can confirm they are present from the Key Vault's IAM page, if desired.
Add the Key vault to the default subnet in the same virtual network as the VM.
You can find this in the Key Vault > Networking tab.
Select "Allow public access from specific virtual networks and IP addresses."
Add the existing virtual network where your VM is located.
A service endpoint will be created for this subnet.
Note: If the subnet cannot take additional service endpoints, a new subnet will be required.
SSH into VM and install the Azure CLI
Login to Azure using 2FA from the SSH terminal, following the prompts
Assign the Identity to the VM and verify
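The assign-and-verify steps above map to Azure CLI calls that can also be driven from a script. A hedged sketch, with the resource group and VM names as placeholders you must replace:

```python
# Sketch: building the `az vm identity` calls used to assign and verify
# a system-assigned identity, suitable for use with subprocess.run().
def az_assign_identity(resource_group: str, vm_name: str) -> list:
    return ["az", "vm", "identity", "assign",
            "--resource-group", resource_group, "--name", vm_name]

def az_verify_identity(resource_group: str, vm_name: str) -> list:
    return ["az", "vm", "identity", "show",
            "--resource-group", resource_group, "--name", vm_name]
```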
Create a new App Registration to act as a Principal to access the Key Vault
Set Access configuration policy to “vault” access control
Create policy and add the Secrets Officer role to the App Principal.
Add the Key Vault to the default subnet in the same virtual network as the VM.
Pull the code and add the following environment variables to a dot file in your project directory named .prod.env, using the correct values based on the examples provided.
Pull the code and add the following environment variables to a dot file in your project directory named .env.prod, using the correct values based on the examples provided.
Navigate to your Key Vault in the Azure Portal and add the following environment variables populating them with the correct values based on the examples provided.
The Datalogz application uses OLTP and OLAP databases to optimize performance and scalability between the two workloads.
Azure Database for PostgreSQL (flexible server)
- Minimum (Memory Optimized)
  - Compute: 2 vCores, 16 GiB Memory, 3200 IOPS
  - Storage: 512 GB
  - Estimated Cost as of 2/1/2023: $500 / mo.
- Recommended (Memory Optimized)
  - Compute: 4 vCores, 32 GiB Memory, 6400 IOPS
  - Storage: 512 GB
  - Estimated Cost as of 2/1/2023: $850 / mo.

Total Estimated Cost: $500 - $850 / mo.
Azure Database for PostgreSQL
- Minimum/Recommended (Memory Optimized)
  - Compute: 2 vCores, 8 GiB Memory, 3200 IOPS
  - Storage: 512 GB
  - Estimated Cost as of 2/1/2023: $200 / mo.

Snowflake
- Minimum
  - Compute: X-Small Warehouse
  - Storage: Pay as you go
  - Estimated Cost as of 2/1/2023 assuming 10 users: $600 / mo.
- Recommended
  - Compute: Small Warehouse
  - Storage: Pay as you go
  - Estimated Cost as of 2/1/2023 assuming 10 users: $1000 / mo.

Total Estimated Cost: $800 - $1200 / mo.
Postgres Only
- App DB (Azure Database for PostgreSQL): datalogz_bi
- Warehouse DB (Azure Database for PostgreSQL): datalogz_wh

Postgres + Snowflake
- App DB (Azure Database for PostgreSQL): datalogz_bi
- Warehouse DB (Snowflake): datalogz_wh
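Assuming the Postgres-only option, connection URLs for the two databases named above might be assembled like this; the user, password, and host values are placeholders, while the database names come from the tables above:

```python
# Sketch: building SQLAlchemy-style connection URLs for the App DB and
# Warehouse DB in the Postgres-only deployment option.
def postgres_url(user: str, password: str, host: str, database: str) -> str:
    return f"postgresql://{user}:{password}@{host}:5432/{database}"

# Placeholder credentials; substitute your own values.
app_db_url = postgres_url("datalogz", "<password>", "<host>", "datalogz_bi")
warehouse_url = postgres_url("datalogz", "<password>", "<host>", "datalogz_wh")
```

In the Postgres + Snowflake option, the warehouse URL would instead use a Snowflake connection string.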
Datalogz supports connecting to the Tableau Cloud or Server API by Service Account and Personal Access Token.
To connect Tableau with Datalogz, you'll need to follow these steps:
Create a Tableau access token:
Log in to your Tableau Online account and click on your profile in the top right corner of the page.
Enter a name for your token and click the "Create new token" button. Save the token name and secret for the next step.
Retrieve your host name, site name, and API version:
You can find your API version by visiting the following page: [Tableau REST API Version Docs]. The latest version is 3.19 (2023.1) and works for both Tableau Cloud and Server.
Use the Personal Access Token name and secret created in the previous step to login and create new Tableau connectors.
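Under the hood, a PAT sign-in against the Tableau REST API posts a credentials payload like the one sketched below. The host, site, and token values are placeholders; the endpoint path and payload shape follow the Tableau REST API sign-in request:

```python
# Sketch: the sign-in request a connector makes against the Tableau
# REST API (v3.19) when authenticating with a Personal Access Token.
def signin_request(host: str, site_name: str, token_name: str, token_secret: str):
    url = f"https://{host}/api/3.19/auth/signin"
    body = {
        "credentials": {
            "personalAccessTokenName": token_name,
            "personalAccessTokenSecret": token_secret,
            # An empty contentUrl targets the default site on Tableau Server.
            "site": {"contentUrl": site_name},
        }
    }
    return url, body
```

The response includes a session token used to authenticate subsequent metadata requests.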
Login using Tableau and create a new Tableau connector.
Choose "My Account Settings" from the dropdown menu.
Scroll down to "Personal Access Tokens".
Your host and site names can be found in the <host_name> and <site_name> parts of the URL as shown below: https://<host_name>/#/site/<site_name>/
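The host and site names can also be pulled out of such a URL programmatically; a small stdlib-only sketch (the helper name is illustrative):

```python
# Sketch: extracting the host and site names from a Tableau URL of the
# form https://<host_name>/#/site/<site_name>/
import re

def parse_tableau_url(url: str):
    match = re.match(r"https://([^/]+)/#/site/([^/]+)/?", url)
    if not match:
        raise ValueError(f"Unrecognized Tableau URL: {url!r}")
    return match.group(1), match.group(2)
```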
Create Tableau Connector