Multi-Cloud Governance — one catalog, one lineage, one tag set¶

Comparative positioning note

This document is written from the perspective of Microsoft Azure, Cloud Scale Analytics, and CSA Loom. Any description of third-party or competing products, services, pricing, or capabilities is derived from publicly available documentation and sources believed accurate at the time of writing, and is provided for general comparison only. We do not claim expertise in, or authority over, any non-Microsoft product or service; the respective vendor's official documentation is the authoritative source for their offerings, which may change over time. Nothing here is intended to disparage any vendor — where a competing product has genuine advantages, we aim to note them honestly. Verify all third-party details against the vendor's current official documentation before making decisions.

Multi-cloud governance fails the same way every time: each cloud has its own catalog (AWS Glue, GCP Dataplex, Azure Purview), each cloud has its own tag namespace, each cloud has its own policy engine. The result is a governance picture stitched from three half-views, with seams visible everywhere.

The defense is federation, not duplication. Purview as the business-glossary + sensitivity + lineage anchor; Unity Catalog as the technical-catalog + access-control anchor for Databricks workloads; IaC-enforced tag propagation across all providers.

The architecture¶

flowchart TB
    subgraph anchors["Catalog anchors"]
        PURVIEW["Microsoft Purview<br/>(business glossary · sensitivity · lineage)"]
        UC["Unity Catalog<br/>(technical catalog · grants · queries)"]
    end

    subgraph sources["Catalog sources"]
        ADLS["ADLS Gen2"]
        S3["S3"]
        GCS["GCS"]
        SQL["Azure SQL · MI"]
        RDS["AWS RDS"]
        BQ["BigQuery"]
        DBX["Databricks (any cloud)"]
    end

    subgraph consumers["Consumers"]
        BI["Power BI · Tableau"]
        APPS["Applications"]
        AI["AI Foundry RAG"]
        ANALYSTS["Analysts"]
    end

    ADLS --> PURVIEW
    S3 --> PURVIEW
    GCS --> PURVIEW
    SQL --> PURVIEW
    RDS --> PURVIEW
    BQ --> PURVIEW

    DBX --> UC
    UC <-->|catalog federation| PURVIEW

    PURVIEW --> BI
    PURVIEW --> APPS
    UC --> AI
    UC --> ANALYSTS

    classDef anchor fill:#0078D4,stroke:#fff,color:#fff,stroke-width:2px
    classDef peer fill:#5C2D91,stroke:#fff,color:#fff,stroke-width:2px
    classDef consumer fill:#107C10,stroke:#fff,color:#fff,stroke-width:2px

    class PURVIEW,UC anchor
    class ADLS,S3,GCS,SQL,RDS,BQ,DBX peer
    class BI,APPS,AI,ANALYSTS consumer

Two catalogs, two roles¶

There is a useful division of labor between Purview and Unity Catalog:

Concern	Purview	Unity Catalog
Business glossary	Yes (primary)	No
Sensitivity labels (PII, PHI, PCI)	Yes (primary)	Inherits from Purview
Cross-source lineage (ADF → Databricks → Power BI)	Yes (primary)	Per-Databricks lineage
Asset discovery + search	Yes (broad)	Yes (Databricks-scoped)
Fine-grained access control (table/row/column)	No (catalog only)	Yes (primary)
Query-time enforcement	No	Yes
Multi-cloud Databricks workspaces	N/A	Yes (any cloud)
Multi-cloud non-Databricks (Snowflake, BigQuery, Athena)	Yes	Via federation

The pattern: Purview is the executive view. Business glossary, sensitivity classifications, end-to-end lineage. Unity Catalog is the engineer view. Granular grants on tables, columns, rows. Federation keeps them coherent.

Catalog federation¶

Purview can register Unity Catalog metastores as scanned sources since Purview 2024 connector v3.0. The scan brings Unity tables into Purview as cataloged assets, with their classifications and column-level lineage. Conversely, Unity Catalog can pull Purview sensitivity tags via the Purview API and enforce them as column masks.

The bidirectional pattern:

sequenceDiagram
    participant Source as Source<br/>(any cloud)
    participant UC as Unity Catalog
    participant Purview
    participant Consumer

    Source->>UC: Databricks writes table
    UC->>Purview: scan registers asset
    Purview->>Purview: classifier tags PII column
    Purview->>UC: API pulls sensitivity tag
    UC->>UC: column-mask policy for PII tag
    Consumer->>UC: SELECT * FROM table
    UC-->>Consumer: row with PII column masked

The result: one classifier tag in Purview becomes an enforced column mask in every Databricks workspace in every cloud, with no per-workspace copy of the policy.

Tag standards that propagate¶

A tag set must be mandatory, propagated by automation, and identical across providers. The recommended minimum:

Tag	Values	Purpose
`Environment`	`prod` \| `nonprod` \| `dev` \| `sandbox`	Lifecycle + cost separation
`DataClass`	`public` \| `internal` \| `confidential` \| `restricted`	Sensitivity-based access
`Owner`	Entra group object ID	Accountability
`CostCenter`	finance code	Chargeback
`Workload`	name from workload registry	FinOps + ops grouping
`Compliance`	`none` \| `pci` \| `hipaa` \| `fedramp-mod` \| `fedramp-high`	Policy enforcement
`Region`	Azure region code	Data residency
`Retention`	duration	Lifecycle automation

Propagation rules:

IaC applies on create. Bicep + Terraform modules require these tags as inputs; build fails without them.
CI/CD enforces on update. A policy job runs after every deployment scanning for resources missing any required tag.
Cross-provider mapping. Azure tags, AWS tags, GCP labels, and OCI tags use the same keys. The values are normalized (Environment=prod not env=production).
Cost reporting joins on tags. The cross-cloud FinOps dashboard groups by Workload + CostCenter + Environment.

Policy enforcement across clouds¶

Each cloud has a policy engine. The recommended pattern is to write the policy intent in one place (Azure Policy + Microsoft Defender for Cloud) and export equivalent policies to peer clouds via tooling:

Cloud	Policy engine	Equivalent
Azure	Azure Policy + Defender for Cloud	Primary
AWS	AWS Config + Security Hub + SCP	Peer
GCP	Organization Policy + Security Command Center	Peer
OCI	Cloud Guard + IAM policies	Peer
Cross-cloud	Open Policy Agent (OPA) via Conftest in CI	Universal pre-deployment gate

The pre-deployment gate (OPA in CI) is the most reliable layer because it catches policy violations before they reach any cloud's runtime engine. The runtime engines are the second line of defense.

Compliance frameworks¶

The right framework taxonomy is:

FedRAMP Moderate — minimum for federal civilian SaaS. Azure has > 100 services in scope; AWS GovCloud and GCP Assured Workloads have parity for core services.
FedRAMP High — federal civilian with sensitive data. Azure Government has a broad in-scope footprint; other government clouds have parity for core services to varying degrees — verify each provider's current authorized-service list.
DoD IL4 / IL5 / IL6 — DoD-specific. Azure Government and AWS GovCloud have IL4 + IL5; IL6 is Azure DoD Government Secret and AWS Top Secret.
HIPAA / HITRUST — healthcare. All three majors meet via BAA.
PCI DSS — payment data. All three majors meet via attestation.
ISO 27001 / 27017 / 27018 — international. All three majors meet.
SOC 1 / 2 / 3 — service organization controls. All three majors meet.

The discipline: map compliance scope at the workload-tag level, not the subscription level. A workload tagged Compliance=fedramp-high triggers different policy enforcement than a workload tagged Compliance=none, even if both run in the same subscription.

Lineage that crosses clouds¶

End-to-end lineage is the hardest cross-cloud governance problem because every engine generates its own lineage events in its own format. The recommended pattern:

Every engine emits OpenLineage events to a central collector. OpenLineage is the open standard supported by Spark, Airflow, dbt, Databricks, Fabric, Trino, Flink.
The collector pushes to Marquez or directly to Purview.
Purview is the lineage UI — graph view that crosses cloud + engine boundaries.

Per-engine lineage that does not flow OpenLineage events into Purview is invisible at the executive level. Treat it as a gap.

Anti-patterns¶

Per-cloud catalog with no federation. Three catalogs, three search experiences, three glossaries. Pick a federation anchor.
Tags applied manually. Tags must be IaC-required. Manually-applied tags drift within weeks.
Policy in only one cloud's engine. Azure Policy alone does not stop AWS misconfigurations. Use OPA in CI as the universal gate; each cloud's engine as the runtime backstop.
Lineage per engine, no aggregation. Per-engine lineage is useless to the executive who needs the full picture.
Sensitivity labels per cloud. PII in AWS Glue, PII in Purview, PII in BigQuery — three different keywords. Pick the Purview sensitivity taxonomy and propagate.