Datalogz Security

Introduction

This document provides an in-depth look at Datalogz, a SaaS-based, read-only metadata application, detailing how it interfaces with various BI tools, handles authentication, and ensures data security.

Overview of Datalogz SaaS Solution

Datalogz is a specialized tool designed to administer BI environments effectively through the analysis of non-proprietary metadata. It operates exclusively in a SaaS environment, ensuring enhanced efficiency and security.

Metadata Read from BI Tools

Connector Pipeline

The Datalogz application uses Apache Airflow for connector management, providing BI Admins with pre-built metadata pipelines that they can schedule to run hourly, daily, or weekly. After each connector refresh, new alerts are generated based on what has changed in the latest data.
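As a rough illustration of the scheduling options above, the sketch below maps a refresh choice to one of Airflow's built-in cron presets. The names `SCHEDULE_PRESETS` and `schedule_for` are hypothetical and not part of Datalogz; only the preset strings `@hourly`, `@daily`, and `@weekly` come from Airflow itself.

```python
# Hypothetical helper: maps a connector refresh choice to an Airflow cron preset.
# The preset strings are standard Airflow values; everything else is illustrative.
SCHEDULE_PRESETS = {
    "hourly": "@hourly",
    "daily": "@daily",
    "weekly": "@weekly",
}


def schedule_for(frequency: str) -> str:
    """Return the Airflow cron preset for a refresh frequency."""
    key = frequency.strip().lower()
    if key not in SCHEDULE_PRESETS:
        raise ValueError(f"Unsupported refresh frequency: {frequency!r}")
    return SCHEDULE_PRESETS[key]


if __name__ == "__main__":
    print(schedule_for("daily"))
```

A string returned by this helper is the kind of value that an Airflow DAG accepts as its schedule; how Datalogz wires this up internally is not described here.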

Your connectors will retrieve metadata from the following API endpoints:

  • PowerBI: Endpoints listed here.

  • Tableau: Endpoints listed here.

  • Qlik: Endpoints listed here.

BI Admins must configure each connector and approve the Datalogz application. Approval grants read-only access to standard and admin-level APIs, scoped to a selection of Groups. Groups are generally defined as follows for each system:

  • PowerBI: Workspaces

  • Tableau: Projects

  • Qlik: Streams

The admin-level APIs give your BI Admins the widest range of Issues and Recommendations that Datalogz can provide. After a new connector is created, BI Admins can use Datalogz RBAC to assign fine-grained permissions, so that Users see only the metadata from the Groups they are meant to access.
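The idea of Group-scoped permissions can be sketched as a simple filter. This is a minimal illustration under assumed names (`Asset`, `visible_assets`, and the sample data are all hypothetical); it does not describe how Datalogz RBAC is actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Asset:
    """Hypothetical metadata record for a BI asset."""
    name: str
    group: str  # a PowerBI Workspace, Tableau Project, or Qlik Stream


def visible_assets(assets: Iterable[Asset], allowed_groups: Iterable[str]) -> List[Asset]:
    """Keep only the assets whose Group the user has been granted."""
    allowed = set(allowed_groups)
    return [asset for asset in assets if asset.group in allowed]


if __name__ == "__main__":
    catalog = [
        Asset("Sales Dashboard", "Finance"),
        Asset("Hiring Funnel", "HR"),
    ]
    for asset in visible_assets(catalog, ["Finance"]):
        print(asset.name)
```

A User granted only the hypothetical "Finance" Group would see metadata for the first asset and nothing from "HR".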

Authentication and Security Measures

Secrets used to authenticate to the BI Metadata APIs are encrypted and stored in a managed-identity Azure Key Vault within a private subnet of the Datalogz private virtual network. Only the Datalogz backend virtual machine used for running data extraction pipelines has network access and adequate privileges to use these secrets for authentication.
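For readers unfamiliar with the pattern, reading a secret from Azure Key Vault with a managed identity typically looks like the sketch below. It assumes the `azure-identity` and `azure-keyvault-secrets` packages and the Azure public cloud URL format; the function names, vault name, and secret name are placeholders, and this is not Datalogz's actual code.

```python
def vault_url(vault_name: str) -> str:
    """Build the Key Vault URL for the Azure public cloud."""
    return f"https://{vault_name}.vault.azure.net"


def fetch_secret(vault_name: str, secret_name: str) -> str:
    """Read one secret using the managed identity of the host it runs on.

    This only succeeds on an Azure resource that has a managed identity
    with permission to read secrets and network access to the vault.
    """
    # Imported here so the module loads even where the Azure SDK is absent.
    from azure.identity import ManagedIdentityCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(
        vault_url=vault_url(vault_name),
        credential=ManagedIdentityCredential(),
    )
    return client.get_secret(secret_name).value


if __name__ == "__main__":
    # Placeholder vault name, for illustration only.
    print(vault_url("example-vault"))
```

Because the credential is tied to the machine's identity rather than a stored password, a call such as `fetch_secret("example-vault", "example-secret")` would fail from any host that lacks the identity or the network path.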

Datalogz as a "Read Only" Metadata Application

Datalogz provides BI Ops insights and recommendations using metadata about BI reports, users, activities, and other BI assets. Generating these recommendations does not require access to the underlying data, so the service principal credentials or personal access tokens you provide do not need that level of permission. As a result, Datalogz cannot query any data used in BI reports. It requires only metadata describing those reports, such as the title, description, configuration, asset lineage, usage patterns, refresh durations, uptime, and governance features. This access model can be described as “read-only access to metadata only.”

Comparison with On-Prem Deployment

Compared to the On-Prem Deployment, a SaaS Deployment has the following benefits:

  • Datalogz monitors and maintains all infrastructure required to run the Datalogz BI Ops platform.

  • Datalogz monitors and maintains all services, code, and images required to run the backend and frontend services.

  • Datalogz upgrades, tests, and deploys all new versions of Datalogz to give you a seamless experience between versions.

  • You do not need to commit resources to the following activities, which are part of an on-prem deployment engagement:

    • Monitoring and maintaining the infrastructure.

    • Monitoring and maintaining the services, code, and images required to run the services.

    • Upgrading and deploying new versions of Datalogz using Docker Desktop and some Windows or Linux commands.

Architecture Diagram

Resource Group: rg-biops-prod-eastus2-001

Resources:

  • Virtual Machines:

    • Frontend VM: Hosting services running in containers using Docker Compose behind an nginx web server in the public subnet.

    • Backend VM: Hosting backend services (ELT API, Fast API, etc.) in containers using Docker Compose behind an nginx web server in the private subnet.

  • Virtual Network:

    • Public Subnet for frontend services.

    • Private Subnet for backend services.

  • Databases:

    • Azure PostgreSQL database for operational data.

    • Snowflake data warehousing service for BI metadata ingestion and analysis using Azure Storage Integration and Snowflake Network Policy.

  • Storage:

    • File Storage configured for secure data ingestion via Snowflake External Stage using Azure blob storage for staging data before loading it into Snowflake.

  • Key Vault:

    • Key Vault for secure storage of keys and passwords, with Managed Identity access exclusively from the backend VM.
