Not all data catalogs are created equal

Post by **jrineakter** » Tue Feb 11, 2025 3:49 am

For your migration to deliver a true upgrade, you need an enterprise data catalog with certain key capabilities.

A complete data catalog is essential to a successful migration. To catalog all of your legacy data, you need an enterprise data catalog with a collector that can be deployed on on-premises systems and send what it finds back to the cloud. On-premises collectors are paramount because, more often than not, the data you want to migrate lives in legacy on-premises sources rather than modern cloud platforms. Without them, your catalog will miss all of the on-premises data that needs to be migrated.

You also need a data catalog built on a knowledge graph. A knowledge graph lets you catalog and understand whatever type of data you discover during your migration, and it shows you how that data relates to your other data. A knowledge graph-based data catalog gives your data model open-ended extensibility: it can grow to include resources and relationships from proprietary and legacy systems that may not have been defined before your migration, without costly and time-consuming infrastructure changes. Without this capability, you may not understand your organization's legacy landscape, which can lead to uninformed decisions about what needs to be migrated.
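To make the extensibility point concrete, here is a minimal sketch of a knowledge graph stored as (subject, predicate, object) triples. It is not data.world's implementation; the `KnowledgeGraph` class and every resource name below (`orders_table`, `COBOL_BILLING`, `MainframeDataset`, and so on) are hypothetical. The idea it illustrates is that a newly discovered legacy resource type is just new triples, with no table schema to alter.

```python
class KnowledgeGraph:
    """Hypothetical in-memory triple store, for illustration only."""

    def __init__(self):
        # Every fact is a (subject, predicate, object) triple; there is no fixed schema.
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        # All objects linked to `subject` by `predicate`.
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def subjects_of_type(self, type_name):
        # All subjects declared with the given type.
        return {s for s, p, o in self.triples if p == "type" and o == type_name}


kg = KnowledgeGraph()

# Resources already known from a modern cloud source.
kg.add("orders_table", "type", "Table")
kg.add("sales_dashboard", "type", "Dashboard")
kg.add("sales_dashboard", "readsFrom", "orders_table")

# A legacy resource type discovered mid-migration: no schema change is needed,
# we simply start using a new type and new predicates.
kg.add("COBOL_BILLING", "type", "MainframeDataset")
kg.add("COBOL_BILLING", "hostedOn", "onprem_mainframe_01")
kg.add("orders_table", "derivedFrom", "COBOL_BILLING")
```

Querying `kg.subjects_of_type("MainframeDataset")` now returns the legacy dataset alongside everything else, and `kg.objects("orders_table", "derivedFrom")` shows how it relates to existing cloud data. A production catalog would use a real graph store (for example, an RDF triple store) rather than a Python set.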

Many of the customers that data.world and Snowflake share collect, catalog, and understand vast amounts of disparate data by embracing this functionality.

A data catalog keeps your cloud migration agile
Once your on-premises data is cataloged, you can figure out what data is most important, what data is of the highest business value, and what data sees the most use. And from there, you can create a prioritized backlog of resources to migrate, then iterate through the backlog in an agile manner.

You should prioritize your data using a two-by-two matrix with two axes: business value (high or low) and complexity (high or low).

Start by identifying high-value data. How? Focus on the importance of business use cases: What are the most visible pain points? Which business users complain most about slow data delivery or broken critical dashboards?

Next, identify low-complexity data. Start with data that is both high-value and low-complexity, then move on to more complex data once you have shown quick success and value to your team. Demonstrating this momentum helps your business leaders feel more secure in your organization's investment in a data catalog and makes them more inclined to support future data governance initiatives.
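The matrix and the prioritized backlog can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed scoring method: the 0 to 10 `value` and `complexity` scores, the threshold of 5, and the ordering of the two low-value quadrants are all hypothetical. Only the first two positions (high-value/low-complexity first, then the more complex high-value data) follow directly from the advice above.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    value: float       # hypothetical business-value score, 0-10
    complexity: float  # hypothetical migration-complexity score, 0-10


VALUE_THRESHOLD = 5.0       # assumed cut-off between "low" and "high" value
COMPLEXITY_THRESHOLD = 5.0  # assumed cut-off between "low" and "high" complexity

# Quick wins first, then complex high-value data; the low-value order is an assumption.
QUADRANT_ORDER = [
    ("high", "low"),
    ("high", "high"),
    ("low", "low"),
    ("low", "high"),
]


def quadrant(r: Resource) -> tuple:
    """Place a resource in the two-by-two matrix as (value, complexity)."""
    v = "high" if r.value >= VALUE_THRESHOLD else "low"
    c = "high" if r.complexity >= COMPLEXITY_THRESHOLD else "low"
    return (v, c)


def build_backlog(resources: list) -> list:
    """Sort by quadrant; within a quadrant, highest value then lowest complexity first."""
    return sorted(
        resources,
        key=lambda r: (QUADRANT_ORDER.index(quadrant(r)), -r.value, r.complexity),
    )


if __name__ == "__main__":
    backlog = build_backlog([
        Resource("legacy_hr_extract", value=2, complexity=8),
        Resource("sales_dashboard_sources", value=9, complexity=3),
        Resource("finance_mart", value=8, complexity=7),
        Resource("marketing_lists", value=3, complexity=2),
    ])
    for r in backlog:
        print(r.name, quadrant(r))
```

Iterating through the returned list front to back gives you the agile backlog: each item migrated is a small, demonstrable win before the harder work begins.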

Take the broken dashboard as an example. Your enterprise data catalog's automated lineage viewer, powered by a knowledge graph, shows you which data sources feed the dashboard; these are the data you should prioritize for unraveling, cleaning, and migrating. With any luck, the node representing the dashboard in your knowledge graph receives data from only a few easily viewed and understood data sources, each connected to it by an edge. If so, you can consider this “low complexity” data.
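The lineage check described above amounts to walking the graph upstream from the dashboard node and counting the original sources it reaches. The sketch below assumes a simple adjacency map of hypothetical nodes and treats "a few" as at most three root sources; both the node names and the `max_sources` threshold are illustrative assumptions, not how any particular lineage viewer works.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it receives data from.
UPSTREAM = {
    "broken_sales_dashboard": ["sales_view"],
    "sales_view": ["orders_table", "customers_table"],
    "orders_table": ["COBOL_BILLING"],
    "customers_table": [],
    "COBOL_BILLING": [],
}


def upstream_sources(graph, node):
    """Breadth-first walk over upstream edges; returns every ancestor of `node`."""
    seen = set()
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # guards against cycles and shared ancestors
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def root_sources(graph, node):
    """Ancestors with no upstream edges of their own, i.e. the original data sources."""
    return {n for n in upstream_sources(graph, node) if not graph.get(n)}


def complexity_label(graph, node, max_sources=3):
    # Assumed heuristic: "low" if the node depends on only a few root sources.
    return "low" if len(root_sources(graph, node)) <= max_sources else "high"


if __name__ == "__main__":
    print(root_sources(UPSTREAM, "broken_sales_dashboard"))
    print(complexity_label(UPSTREAM, "broken_sales_dashboard"))
```

Here the dashboard traces back to two root sources, so it would be labeled low complexity. A real assessment would likely weigh more than a source count, such as how many transformations sit between the sources and the dashboard.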