Enable Unified Data Governance Across All Your Data Sources with Azure Purview

In my last Planet Technologies Perspectives blog post five months ago (December 2020), I discussed organizational Enterprise Data Strategy, using Microsoft’s own strategy as an example, around an organization’s Data Estate. As a reminder, the Data Estate is defined as all data, both structured and unstructured, owned and managed across the entire organization no matter where it resides: on-premises, in the cloud (whether Azure or another cloud such as AWS), or in SaaS (Software as a Service) applications. Managing data in all of these locations can be overwhelming, so much so that some organizations establish teams led by a Chief Data Officer (CDO) to ensure the organization knows what data it has and where that data resides, in support of its operations.

This past December, Microsoft announced a new PaaS offering, Azure Purview, to assist in the management and governance of the Data Estate. Those following Microsoft’s products and technologies around enterprise data management and governance may consider this a third iteration of its approach to cataloging enterprise data. Although still in preview, it is a significant offering focused on the challenge of enterprise data governance, and Microsoft has released significant enhancements to the preview platform since the announcement. Because the Azure Purview platform continues to be enhanced, the information provided here may change before general availability (GA) and after release.

What is Azure Purview?

In Microsoft’s own words, “Azure Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software-as-a-service (SaaS) data. Easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. Empower data consumers to find valuable, trustworthy data.”

Challenges Azure Purview Addresses

Azure Purview is meant to address challenges for multiple organizational roles utilizing and managing data: Data Consumers, Data Producers, and Security Administrators.

Discovery Challenges for Data Consumers

Traditionally, discovering enterprise data sources has been an organic process based on communal knowledge. For companies that want the most value from their information assets, this approach presents many challenges:

  • Because there is no central location to register data sources, users might be unaware of a data source unless they come into contact with it as part of another process.
  • Unless users know the location of a data source, they cannot connect to the data by using a client application. Data-consumption experiences require users to know the connection string or path.
  • The intended use of the data is hidden to users unless they know the location of a data source’s documentation. Data sources and documentation might live in several places and be consumed through different kinds of experiences.
  • If users have questions about an information asset, they must locate the expert or team responsible for the data and engage them offline. There is no explicit connection between data and the experts that have perspectives on its use.
  • Unless users understand the process for requesting access to the data source, discovering the data source and its documentation will not help them access the data.

Discovery Challenges for Data Producers

Although data consumers face the previously mentioned challenges, users who are responsible for producing and maintaining information assets face challenges of their own:

  • Annotating data sources with descriptive metadata is often a lost effort. Client applications typically ignore descriptions that are stored in the data source.
  • Creating documentation for data sources can be difficult and it’s an ongoing responsibility to keep documentation in sync with data sources. Users might not trust documentation that’s perceived as being out of date.
  • Creating and maintaining documentation for data sources is complex and time-consuming. Making that documentation readily available to everyone who uses the data source can be even more so.
  • Restricting access to data sources and ensuring that data consumers know how to request access is an ongoing challenge.

When such challenges are combined, they present a significant barrier for companies that want to encourage and promote the use and understanding of enterprise data.

Discovery Challenges for Security Administrators

Users who are responsible for ensuring the security of their organization’s data may have any of the challenges listed above as data consumers and producers, as well as the following additional challenges:

  • An organization’s data is constantly growing, stored, and shared in new directions. The task of discovering, protecting, and governing the sensitive data is one that never ends. The organization wants and needs to make sure that its content is being shared with the correct people and applications with the correct permissions.
  • Understanding the risk levels in the organization’s data requires diving deep into the content, looking for keywords, RegEx patterns, and/or sensitive data types. Sensitive data types can include Credit Card numbers, Social Security numbers, or Bank Account numbers, to name a few. The organization must constantly monitor all data sources for sensitive content, as even the smallest amount of data loss can be critical to the organization.
  • Ensuring the organization continues to comply with corporate security policies is a challenging task as the content grows and changes, and as those requirements and policies are updated for changing digital realities. Security administrators are often tasked with ensuring data security in the quickest time possible.
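
To make the idea of scanning content for RegEx patterns and sensitive data types concrete, here is a minimal Python sketch. It is not how Azure Purview implements classification; the pattern set, the `find_sensitive` function, and the use of a Luhn checksum to filter credit card candidates are all illustrative assumptions.

```python
import re

# Illustrative patterns only (an assumption for this sketch); production
# scanners use stricter rules, supporting keywords, and confidence scoring.
PATTERNS = {
    "Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Credit Card Number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    checksum = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        checksum += digit
    return checksum % 10 == 0


def find_sensitive(text: str) -> dict:
    """Map each sensitive data type to the matching strings found in `text`."""
    hits = {}
    for name, pattern in PATTERNS.items():
        matches = [m.group() for m in pattern.finditer(text)]
        if name == "Credit Card Number":
            # A pattern match alone is weak evidence; keep only Luhn-valid numbers.
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            hits[name] = matches
    return hits


sample = "SSN 123-45-6789, card 4111 1111 1111 1111, other 1234 5678 9012 3456"
print(find_sensitive(sample))
```

Even this toy version shows why the task is ongoing: every new data source has to be run through every pattern, and each pattern needs validation to keep false positives manageable.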

Key Components of Azure Purview

Azure Purview is currently comprised of three key components: Azure Purview Data Map, Purview Data Catalog, and Purview Data Insights. All of these are accessed via the web-based environment called Azure Purview Studio.

Azure Purview Studio

Once the Azure Purview instance has been created within the Azure Portal, Azure Purview Studio is used to work with Azure Purview. It is a web-based central console (https://web.purview.azure.com) with which administrators, developers, and end-users can work with the enterprise data.

Azure Purview Data Map

Azure Purview Data Map provides the foundation for data discovery and effective data governance. Purview Data Map is a cloud-native PaaS service that captures metadata about enterprise data present in analytics and operational systems, both on-premises and in the cloud. Purview Data Map is automatically kept up to date by a built-in automated scanning and classification system. Users within the organization can configure and use the Purview Data Map through an intuitive UI, and developers can programmatically interact with the Data Map using open-source Apache Atlas 2.0 APIs.

Azure Data Map of multiple on-premises, multicloud, and SaaS (Power BI) data sources

Azure Purview Data Map powers the Purview Data Catalog and Purview data insights as unified experiences within Purview Studio.
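
For developers curious about what the Apache Atlas 2.0 APIs mentioned above look like, the following Python sketch only builds request URLs and makes no network calls. The resource paths (`/entity/guid/{guid}` and `/search/basic`) come from the Apache Atlas v2 REST API. The endpoint `https://purview.example.com` is a placeholder, the helper names are hypothetical, and whether the preview service exposes each of these paths is not confirmed here, so check the Purview documentation for your account’s real endpoint and authentication.

```python
from urllib.parse import quote, urlencode

ATLAS_V2_PREFIX = "/api/atlas/v2"  # path prefix used by the Apache Atlas v2 REST API


def entity_by_guid_url(endpoint: str, guid: str) -> str:
    """URL for Apache Atlas v2 'GET /entity/guid/{guid}'."""
    return f"{endpoint.rstrip('/')}{ATLAS_V2_PREFIX}/entity/guid/{quote(guid, safe='')}"


def basic_search_url(endpoint: str, query: str, type_name: str = "", limit: int = 25) -> str:
    """URL for Apache Atlas v2 'GET /search/basic' with optional type filter."""
    params = {"query": query, "limit": limit}
    if type_name:
        params["typeName"] = type_name
    return f"{endpoint.rstrip('/')}{ATLAS_V2_PREFIX}/search/basic?{urlencode(params)}"


# ASSUMPTION: placeholder endpoint, not a real Purview host.
endpoint = "https://purview.example.com"
print(entity_by_guid_url(endpoint, "abc-123"))
print(basic_search_url(endpoint, "sales orders", type_name="hive_table", limit=10))
```

In practice each request would also need an authorization header, which is omitted here because it depends on how the organization has configured access.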

Purview Data Catalog

With the Purview Data Catalog, business and technical users alike can quickly and easily find relevant data using a search experience with filters based on various lenses such as glossary terms, classifications, sensitivity labels, and more. For subject matter experts, data stewards, and officers, the Purview Data Catalog provides data curation features such as business glossary management and the ability to automate tagging of data assets with glossary terms.

Purview Data Catalog search experience

Data consumers and producers can also visually trace the lineage of data assets starting from the operational systems on-premises, through movement, transformation & enrichment with various data storage & processing systems in the cloud to consumption in an analytics system like Power BI.

Purview Data Catalog data source lineage graphical representation
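
Conceptually, lineage is a graph of assets and the assets they were derived from, and tracing it is a graph traversal. The Python sketch below illustrates that idea with a breadth-first walk. The asset names and the `UPSTREAM` mapping are hypothetical, and this is not a description of how Purview stores or renders lineage.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
UPSTREAM = {
    "powerbi_sales_report": ["warehouse.sales_fact"],
    "warehouse.sales_fact": ["lake.sales_clean"],
    "lake.sales_clean": ["onprem.sql.orders", "onprem.sql.customers"],
}


def trace_upstream(asset: str, upstream: dict) -> list:
    """Return every upstream asset of `asset`, nearest first (breadth-first)."""
    seen = {asset}
    ordered = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                ordered.append(parent)
                queue.append(parent)
    return ordered


print(trace_upstream("powerbi_sales_report", UPSTREAM))
```

Starting from the Power BI report, the walk ends at the on-premises operational tables, which mirrors the path from consumption back to source that the lineage view presents graphically.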

Purview Data Insights

Purview Data Insights provides the organization with a single-pane-of-glass view into its catalog and aims to provide specific insights to data source administrators, business users, data stewards, data officers, and security administrators. Currently, the following Insights reports are available to customers in public preview: Asset Insights, Scan Insights, Glossary Insights, Classification Insights, Sensitivity Labeling Insights, and File Extension Insights. Ultimately, Purview Data Insights gives data officers and security officers a bird’s-eye view so they can immediately understand what data is actively scanned, where sensitive data resides, and how it moves through data pipelines (i.e. ETL or ELT).

Classification Insights

Purview Data Insights Classification Overview

In Purview, classifications are similar to subject tags, and are used to mark and identify data of a specific type that’s found within the data estate during scanning. Classifications are matched directly, such as a social security number, which has a classification of Social Security Number.

Beyond classifications, sensitivity labels enable you to state how sensitive certain data is in the organization. For example, a specific project name might be highly confidential within the organization, while that same term is not confidential to other organizations.

Sensitivity Labels Insights

In contrast to classifications, sensitivity labels are applied when one or more classifications and conditions are found together. In this context, conditions refer to all the parameters that you can define for unstructured data, such as proximity to another classification, and % confidence.
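
One way to picture a label that applies only when classifications and conditions are found together is the rule check sketched below in Python. The `Detection` and `LabelRule` types, their fields, and the matching logic are hypothetical and are one possible reading of the description above, not the actual Microsoft 365 or Purview rule engine.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Detection:
    classification: str  # e.g. "Credit Card Number"
    offset: int          # character position where it was found
    confidence: float    # 0.0 to 1.0


@dataclass(frozen=True)
class LabelRule:
    label: str
    required: tuple        # classifications that must all be present
    max_distance: int      # proximity condition, in characters
    min_confidence: float  # confidence condition


def label_applies(rule: LabelRule, detections: list) -> bool:
    """True if every required classification is found, confidently, close together."""
    groups = [
        [d for d in detections
         if d.classification == name and d.confidence >= rule.min_confidence]
        for name in rule.required
    ]
    if any(not group for group in groups):
        return False  # a required classification is missing or too uncertain
    # Look for one detection per classification that all sit within max_distance.
    for combo in product(*groups):
        offsets = [d.offset for d in combo]
        if max(offsets) - min(offsets) <= rule.max_distance:
            return True
    return False


rule = LabelRule("Highly Confidential", ("Credit Card Number", "Person Name"), 100, 0.85)
found = [Detection("Credit Card Number", 10, 0.90), Detection("Person Name", 60, 0.95)]
print(label_applies(rule, found))
```

The same detections placed far apart, or found with low confidence, would not trigger the label, which is the practical difference between a label and a bare classification.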

Purview uses the same classifications, also known as sensitive information types, as Microsoft 365. This enables the organization to extend its existing sensitivity labels across its Azure Purview assets. Microsoft Information Protection (MIP) sensitivity labels are created and managed in the Microsoft 365 Security and Compliance Center. To create sensitivity labels for use in Azure Purview, you must have an active Microsoft 365 E5 license.

Purview Data Insights Sensitivity labels insights

Since sensitivity labels are defined in Microsoft 365, a prerequisite is to extend the organization’s Microsoft 365 sensitivity labels to assets within Azure Purview.

Purview Data Insights Sensitivity labels insight detail view

Azure Purview Process Overview

To successfully implement and provision Azure Purview, there are four primary tasks:

  1. Register Data Sources – Azure Purview provides a cloud-based service into which you can register data sources. During registration, the data remains in its existing location, but a copy of its metadata is added to Azure Purview, along with a reference to the data source location. The metadata is also indexed to make each data source easily discoverable via search and understandable to the users who discover it.

In-Region Data Residency: Azure Purview does not move or store customer data out of the region in which it is deployed.

  2. Enrich the Metadata – After a data source is registered, its metadata can be enriched, either by the user who registered the data source or by another user in the enterprise. Any user can annotate a data source by providing descriptions, tags, or other metadata, such as the process for requesting access to the data source. This descriptive metadata supplements the structural metadata, such as column names and data types, that’s registered from the data source.
  3. Data Catalog Discovery – Discovering and understanding data sources and their use is the primary purpose of registering the sources. Enterprise users might need data for business intelligence, application development, data science, or any other task where the right data is required. They use the data catalog discovery experience to quickly find data that matches their needs, understand the data to evaluate its fitness for the purpose, and consume the data by opening the data source in their tool of choice.
  4. Examine Data Estate Insights – Insights are one of the key components of Purview. The feature provides the organization with a single-pane-of-glass view into its catalog and aims to provide specific insights to data source administrators, business users, data stewards, data officers, and security administrators.
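
The register, enrich, and discover tasks above can be pictured with a small in-memory model. The Python sketch below is a toy illustration only: `ToyCatalog`, `CatalogEntry`, and the example locations are hypothetical and bear no relation to the Purview service or its APIs. What it does show is the key point from the first task, namely that only metadata and a reference to the location are stored, never the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str  # reference to where the data lives; the data is not copied
    columns: dict  # structural metadata: column name -> data type
    description: str = ""                   # descriptive metadata added later
    tags: set = field(default_factory=set)  # descriptive metadata added later


class ToyCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, name: str, location: str, columns: dict) -> CatalogEntry:
        """Task 1: record structural metadata and the source location."""
        entry = CatalogEntry(name, location, dict(columns))
        self._entries[name] = entry
        return entry

    def enrich(self, name: str, description=None, tags=()) -> None:
        """Task 2: any user can add descriptions or tags to a registered source."""
        entry = self._entries[name]
        if description is not None:
            entry.description = description
        entry.tags.update(tags)

    def search(self, term: str) -> list:
        """Task 3: find sources whose metadata mentions `term` (case-insensitive)."""
        term = term.lower()
        matches = []
        for entry in self._entries.values():
            text = " ".join([entry.name, entry.description, *entry.tags, *entry.columns])
            if term in text.lower():
                matches.append(entry.name)
        return sorted(matches)


catalog = ToyCatalog()
catalog.register("orders", "sqlserver://onprem/sales/orders",
                 {"order_id": "int", "customer_email": "nvarchar"})
catalog.enrich("orders", description="Confirmed sales orders", tags={"finance"})
print(catalog.search("finance"))
```

Notice that the search for "finance" succeeds only because someone enriched the entry; structural metadata alone would not have made the source discoverable by its business meaning.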

Many organizations have started their data governance journey by developing individual solutions that cater to specific requirements of isolated groups and data domains across the organization. Although experiences may vary depending on the industry, product, and culture, most organizations find it difficult to maintain consistent controls and policies for these types of solutions.

Identify Data Governance Objectives and Goals

Microsoft recommends identifying objectives and goals in the early phases. Some of the common data governance objectives include:

  • Maximizing the business value of your data
  • Enabling a data culture where data consumers can easily find, interpret, and trust data
  • Increasing collaboration amongst various business units to provide a consistent data experience
  • Fostering innovation by accelerating data analytics to reap the benefits of the cloud
  • Decreasing time to discover data through self-service options for various skill groups
  • Reducing time-to-market for the delivery of analytics solutions that improve service to their customers
  • Reducing the operational risks that are due to the use of domain-specific tools and unsupported technology

The general approach is to break down those overarching objectives into various categories and goals.

Once the organization agrees on the high-level objectives and goals, expect many questions from multiple groups. It is crucial to gather these questions in order to craft a plan to address all of the concerns. Some example questions that should be addressed during the initial phase include:

  • What are the main organization data sources and data systems?
  • For data sources that are not supported yet by Purview, what are my options?
  • How many Purview instances do we need?
  • Who are the users?
  • Who can scan new data sources?
  • Who can modify content inside of Purview?
  • What process can I use to improve the data quality in Purview?
  • How to bootstrap the platform with existing critical assets, glossary terms, and contacts?
  • How to integrate with existing systems?
  • How to gather feedback and build a sustainable process?

While you might not have the answers to most of these questions right away, gathering them can help the organization frame this project and ensure all “must-have” requirements can be met.

Conclusion

Azure Purview can be a very valuable tool to assist your organization in getting a handle on all of its data silos. However, it is important to plan and implement it properly to ensure maximum organizational value. Having spent decades working with multiple data platforms, with a focus on Microsoft’s, I find Azure Purview a truly exciting platform.

As an experienced Microsoft Gold Certified Partner with numerous Data and Cloud specialties and Advanced Specializations, Planet Technologies can assist your organization in planning, designing, and implementing its data modernization and transformation, while ensuring your team’s ongoing ability not only to maintain but also to continue refining the entire Data Estate. We do that by providing industry-leading knowledge transfer throughout every engagement, so your organization can continue to mature in managing its data.