Data Map: Collection Design

Purview collections

Collection design is the one area in Purview where I would warn implementers to be careful. Once assets are ingested, it is difficult to start over. If you start over, you lose the work you have done, and if you scanned assets into the wrong collection, the Purview UI only lets you move or delete 50 items at a time. Imagine scanning a storage account and ingesting 8,000 assets. Yes, you can mitigate this programmatically, and there are tools you can leverage, but in the end this is the one place where it is difficult to do the work again.
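To put a number on that, here is a minimal sketch of the arithmetic, assuming the 50-item limit quoted above and a made-up list of 8,000 asset identifiers. It only splits a list into batches; it does not call Purview, and the names `UI_BATCH_LIMIT`, `batches`, and `asset_ids` are illustrative, not part of any Purview tooling.

```python
from typing import Iterator, Sequence

UI_BATCH_LIMIT = 50  # per-operation limit quoted in this guide (assumption for the sketch)


def batches(items: Sequence[str], size: int = UI_BATCH_LIMIT) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical asset identifiers standing in for a scanned storage account.
asset_ids = [f"asset-{n}" for n in range(8000)]

rounds = list(batches(asset_ids))
print(f"{len(asset_ids)} assets need {len(rounds)} move/delete operations")
```

At 50 items per operation, 8,000 assets work out to 160 separate rounds, which is why getting the target collection right before the first scan matters so much.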

Designing a collection in Microsoft Purview centers around these key considerations:
  • Define data steward access and set clear permission boundaries.
  • Decide where data sources will be registered and how assets are distributed.
  • Identify data owners and stewards responsible for managing and curating information.

Microsoft recommends an access-based structure when setting up collections. Alternative approaches, such as organizing by use case or department, will be covered in the Governance Domains section. This guide covers four topics: how to create a collection in Purview, how to assign roles (you can skip these two if you’re already familiar), key design strategies and business workflows, and the platform limitations to be aware of.
  1. Creating a Collection
  2. Recommended Roles and Assignments for Collections
  3. Collection Design and Considerations
  4. Caveats and Platform Gaps

Collection Design and Considerations

Effective collection design in Microsoft Purview is critical, not just for following best practices but because of platform limitations. For example, you can only delete around 50 assets at a time. Now imagine registering an ADLS container that adds 8,000 files to a single collection. The structure below is based on proven designs and can be tailored to mirror either organizational hierarchies or business functions. This design helps answer key questions such as:

  1. How do I separate production, development, or regional divisions?
  2. Where should I register my sources and scanned assets? (See our scanning guide for best practices.)
  3. How do I assign data owners and stewards to manage specific domains?

This setup is your foundation. As you move into governance domain layers, you’ll be able to abstract and design around specific use cases. At the collection layer, the focus is access-based governance: Who has permission to edit or delete metadata in a domain? Who should respond if PII is discovered and questions arise? By assigning clear ownership to data stewards and domain leads, this model supports scalable, domain-driven governance. It also enables a repeatable process—new business units can be added as domains, complete with access controls and stewardship responsibilities.
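As a way to reason about this on paper before touching the portal, here is a small sketch of a collection hierarchy with stewards attached to each domain. It is a planning model only, not the Purview API: the `Collection` class, the "Contoso" root, and the steward names are all hypothetical, and the rule that an unstaffed sub-collection falls back to its nearest staffed parent is a convention of this sketch rather than documented Purview behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Collection:
    """Planning model of one collection in the hierarchy (hypothetical)."""
    name: str
    parent: Collection | None = None
    stewards: set[str] = field(default_factory=set)
    children: list[Collection] = field(default_factory=list)

    def add_child(self, name: str, stewards: set[str] | None = None) -> Collection:
        """Add a sub-collection, e.g. a new business unit or environment."""
        child = Collection(name, parent=self, stewards=set(stewards or ()))
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the full path from the root collection down to this one."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))

    def responders(self) -> set[str]:
        """Who answers questions here: own stewards, else nearest staffed parent."""
        node = self
        while node is not None:
            if node.stewards:
                return set(node.stewards)
            node = node.parent
        return set()


root = Collection("Contoso", stewards={"governance-team"})
finance = root.add_child("Finance", {"finance-steward"})
finance_prod = finance.add_child("Production")
hr = root.add_child("HR")  # new business unit, steward not yet assigned

print(finance_prod.path(), "->", sorted(finance_prod.responders()))
print(hr.path(), "->", sorted(hr.responders()))
```

Sketching the tree this way makes gaps visible early: any collection whose responders resolve to the central team is a domain that still needs a named steward.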

Creating a Collection


The following is just an example of where to create the collection structure. If you have the Collection Admin role, go to the Data Map and, under Domains, you will be able to start creating the collection structure:
Data Map > Domains > New collection: enter the collection name and administrator.
When you select the collection, you will be able to choose roles and assign access to it.
If you ever need to delete a collection, know that Purview will not let you until you have deleted every object in that collection, including assets, data sources, and even scans. If an asset is used in a data product, Purview won’t let you delete the asset and will tell you that it is part of a data product.
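The deletion rules above can be written down as a simple pre-flight checklist. This is a sketch of the logic as described in this guide, not a call into Purview: `CollectionContents`, `deletion_blockers`, and `can_delete_asset` are hypothetical helpers, and the inventory you feed them would have to come from your own export or tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CollectionContents:
    """Hypothetical snapshot of what is still registered in a collection."""
    assets: list[str] = field(default_factory=list)
    data_sources: list[str] = field(default_factory=list)
    scans: list[str] = field(default_factory=list)


def deletion_blockers(contents: CollectionContents) -> list[str]:
    """List what must be removed before the collection itself can be deleted."""
    blockers = []
    for label, items in (
        ("asset", contents.assets),
        ("data source", contents.data_sources),
        ("scan", contents.scans),
    ):
        if items:
            blockers.append(f"{len(items)} {label}(s)")
    return blockers


def can_delete_asset(asset: str, data_product_assets: set[str]) -> bool:
    """An asset that is part of a data product cannot be deleted."""
    return asset not in data_product_assets


sales = CollectionContents(assets=["orders.parquet"], scans=["weekly-scan"])
print("Blocked by:", deletion_blockers(sales))
print("Asset deletable:", can_delete_asset("orders.parquet", {"orders.parquet"}))
```

An empty blocker list is the only state in which the collection can go, so it is worth running a check like this before you plan a restructure.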

Best Practices: Minimum Roles for a Data Owner in Their Domain

A Data Owner is responsible for the trust, visibility, and quality of data within their business domain. At a minimum, they should be granted:

  • Collection Admin – So they can delegate access, manage sub-collections for functional or regional data segmentation, and oversee structural governance.
  • Data Source Admin – To configure data sources, kick off scans, and refresh metadata as needed without relying on central IT. This is vital for ensuring timely metadata updates and autonomy.
  • Data Curator – So they can enrich asset metadata with business context, such as glossary terms, sensitivity labels, owners, and classifications, ensuring data is well understood and governed.
  • Data Estate Insights Viewer – To monitor scan coverage, data classification adoption, and glossary term usage across their domain, enabling proactive data quality and stewardship.

By assigning all four roles, you’re empowering Data Owners to:

  • Scan and Refresh Their Own Assets – Reducing dependencies on the central team while maintaining governance boundaries.
  • Curate Their Data with Business Context – Enhancing discoverability and trust in the unified data catalog.
  • Monitor Health and Governance Adoption – Using Data Estate Insights to assess progress and identify gaps.
  • Control Delegation and Access – Managing their own collections and teams efficiently, especially in federated environments.
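
If you track role assignments in a spreadsheet or export, the four-role minimum above is easy to audit. The sketch below is one way to do that, assuming role names exactly as written in this guide (the labels in your portal may differ) and a hypothetical `assignments` mapping of owner to granted roles.

```python
from __future__ import annotations

# Role names copied from this guide; treat them as labels, not API identifiers.
MINIMUM_DATA_OWNER_ROLES = frozenset({
    "Collection Admin",
    "Data Source Admin",
    "Data Curator",
    "Data Estate Insights Viewer",
})


def missing_roles(assigned: set[str]) -> list[str]:
    """Return the minimum roles a Data Owner has not been granted yet."""
    return sorted(MINIMUM_DATA_OWNER_ROLES - assigned)


# Hypothetical assignments for two domain owners.
assignments = {
    "finance-owner": {"Collection Admin", "Data Source Admin",
                      "Data Curator", "Data Estate Insights Viewer"},
    "hr-owner": {"Collection Admin", "Data Curator"},
}

for owner, roles in assignments.items():
    gaps = missing_roles(roles)
    print(owner, "OK" if not gaps else f"missing: {', '.join(gaps)}")
```

Running a check like this whenever a new domain is onboarded keeps the model repeatable: every Data Owner starts with the same baseline.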

Caveats and Things to Understand

The roles you assign here are access-based for a reason. If you grant Data Curator on a collection, the people holding that role will be the only ones who can curate its assets, regardless of where those assets are used. For example, when others later create data products, they can group assets as needed, but when they want to review an asset and edit it, only the collection permissions apply to that asset, which denies them the ability to modify it. Conversely, if you grant Data Curator and the asset is later used in multiple data products, many different domains may want to edit it and will ask for access. The idea is to set the permissions that are needed, so that only those who have rights on the collection can edit its assets.
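
The caveat above boils down to one rule: edit rights follow the asset's collection, not the data products that use it. Here is a minimal sketch of that rule, using made-up users, a made-up "Finance" collection, and a hypothetical `can_edit_asset` helper. It models the behavior as described in this guide and is not a Purview API.

```python
from __future__ import annotations

# Hypothetical setup: who holds Data Curator on each collection.
curators_by_collection = {"Finance": {"alice"}}

# Which collection each asset was scanned into.
asset_collection = {"ledger.csv": "Finance"}

# A data product owned by another domain that reuses the same asset.
data_products = {
    "Quarterly Reporting": {"owner": "bob", "assets": {"ledger.csv"}},
}


def can_edit_asset(user: str, asset: str) -> bool:
    """Edit rights come only from Data Curator on the asset's own collection."""
    collection = asset_collection[asset]
    return user in curators_by_collection.get(collection, set())


for user in ("alice", "bob"):
    print(user, "can edit ledger.csv:", can_edit_asset(user, "ledger.csv"))
```

Note that `data_products` is never consulted by the check. Bob owns a data product that contains the asset and still cannot edit it, which is exactly the situation to plan for when you decide who gets Data Curator on a collection.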