What is Active Metadata?
Metadata is the information used to describe data. Put another way, metadata is summarized data that leads us to the detailed data. In the context of data warehouses, data lakes, and analytics, we can define metadata as follows:
- Metadata sets the roadmap for your informational data stores
- Metadata defines the objects within the data stores
- Metadata acts as a catalog that helps navigate to the right information
- Metadata is the terminology, taxonomy, and ontology about the data.
There are three categories of metadata:
- Business − Business metadata in the data acquisition, data integration, and analytics domain focuses primarily on the business glossary, common definitions, data governance, stewardship, taxonomy, and ontologies. Capturing the common vocabulary, definitions, and formulas in a company's information flow is a key component of a solid information governance program.
- Technical − Includes database system names, table and column names and sizes, data types, and allowed values. Technical metadata also includes structural information such as primary and foreign key attributes and indices.
- Operational − Includes the currency of data, load statistics, and data lineage. Currency of data captures load counts and whether data is active, archived, or purged. Lineage is the history of how data was migrated and which transformations were applied to it.
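To make the three categories concrete, here is a minimal Python sketch of a single catalog entry that carries business, technical, and operational metadata side by side. The class and field names (`CatalogEntry`, `steward`, `upstream`, and so on) and the sample values are illustrative assumptions, not part of any product's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class BusinessMetadata:
    term: str        # business glossary name
    definition: str  # agreed common definition
    steward: str     # person or team accountable for the data


@dataclass
class TechnicalMetadata:
    database: str
    table: str
    column: str
    data_type: str
    is_primary_key: bool = False


@dataclass
class OperationalMetadata:
    last_loaded_at: datetime  # currency of the data
    rows_loaded: int          # load statistics
    upstream: List[str] = field(default_factory=list)  # lineage: where the data came from


@dataclass
class CatalogEntry:
    """One data element described from all three angles (hypothetical model)."""
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: Optional[OperationalMetadata] = None


# Made-up sample values, for illustration only.
entry = CatalogEntry(
    business=BusinessMetadata("Customer ID", "Unique identifier assigned to a customer", "Sales Ops"),
    technical=TechnicalMetadata("dw", "dim_customer", "customer_id", "INTEGER", is_primary_key=True),
    operational=OperationalMetadata(datetime(2024, 1, 1, 2, 0), 1250, ["crm.customers.id"]),
)
```

Keeping the three parts in one record is only one possible design; the point is that a glossary term, its physical column, and its load history can be looked up together.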
In the world of analytics that federates corporate data assets across varied technologies, collecting and utilizing metadata becomes even more critical to the success of any organization. A significant reason why data lakes quickly turn into data swamps is that their metadata is not managed.
For data-driven companies that process analytical data in different data stores across different locations (including, but not limited to, cloud, on-premises, social media, storage, streaming, operational systems, data warehouses, data marts, and analytical environments), managing the metadata is crucial for success.
The Metadata Process
Metadata plays a different role from the data itself, yet an important one. Some of its roles are:
- Acts as a catalog
- This catalog helps the decision support system to locate the data contents
- Helps the decision support system map data when it is transformed
- Translates the summarization between current detailed data and highly summarized data
- Also translates the summarization between lightly summarized data and highly summarized data
- Documents the mapping for the extraction and loading tools
- Leveraged in the query and reporting tools
- Utilized in the data quality process
- Instrumental for troubleshooting
- Permits agile development lifecycles.
Once you understand the part metadata plays in each of these roles, you truly understand the value of collecting real-time metadata and ensuring your development lifecycle incorporates this information in each delivery. In fact, many data projects fail or are fraught with erroneous processes because the metadata is not exposed, understood, or acted on.
Business Challenges
- Business Glossary
There are many tools on the market that collect a business glossary, but an island of business metadata that is not integrated with the information flow or with the technical metadata limits the understanding of corporate data and its potential use. Most companies simply manage the business glossary in separate spreadsheets that are not integrated into the overall data governance and development processes. Combining that business metadata with the technical metadata creates a powerful paradigm.
- Technical Metadata
Technical metadata management tools require a retrospective extract to collect their metadata. At best, these tools play a passive role in the development lifecycle, and their effectiveness is not fully realized during the development and support processes. These standalone metadata management tools also do not keep up with the varied technologies, databases, and processes in the data management space. Often corporations do not have the depth of resources and capital to manage a centralized metadata management team and its processes.
- Operational Metadata
The market requires proactive tools that automatically collect operational statistics as part of the production processes. This collection should be a built-in part of the architecture and not an afterthought. To effectively incorporate operational metadata into the development lifecycle, it must be captured and acted on early in the analytical phases.
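One way to picture operational statistics being collected as part of the production process itself, rather than extracted afterwards, is a wrapper around each load job. The sketch below is a generic Python illustration; `capture_load_stats`, `load_log`, and `load_customers` are hypothetical names, and the in-memory list stands in for whatever operational metadata store is actually used.

```python
import time
from functools import wraps

load_log = []  # stand-in for an operational metadata store (assumption)


def capture_load_stats(job_name):
    """Record runtime, row count, and status every time the wrapped load runs."""
    def decorator(load_fn):
        @wraps(load_fn)
        def wrapper(*args, **kwargs):
            started = time.perf_counter()
            record = {"job": job_name, "started_at": time.time(),
                      "status": "failed", "rows": 0}
            try:
                rows = load_fn(*args, **kwargs)
                record["rows"] = rows
                record["status"] = "succeeded"
                return rows
            except Exception as exc:
                # Keep the error text so it can feed troubleshooting later.
                record["error"] = str(exc)
                raise
            finally:
                # Runs on success and on failure, so every run is logged.
                record["seconds"] = time.perf_counter() - started
                load_log.append(record)
        return wrapper
    return decorator


@capture_load_stats("load_customers")
def load_customers(source_rows):
    # A real load would write to the target; this sketch only counts rows.
    return len(source_rows)


load_customers([{"id": 1}, {"id": 2}])
```

Because the statistics are written in the `finally` clause, failed runs are recorded as well as successful ones, which is what makes the log useful for support.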
From a data governance perspective, the three groupings of metadata above are instrumental in managing a solid data foundation. In particular, identifying the owner (or steward) of the data and the interrelationship of that data asset to other classes of data provides powerful leverage. Transparency of information can only be accomplished if the technical metadata is fully captured and exposed. This builds the necessary lineage to explain the path the data took from the original source to the target analytical models.
Data stewardship, transparency, data accuracy, and data quality all rely on this metadata being managed efficiently and accurately.
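As a rough illustration of lineage built from captured technical metadata, the Python sketch below walks from a target column back to its original source. The `lineage` dictionary, the column names, and the transformation labels are invented sample data, and `trace_to_source` is a hypothetical helper rather than any tool's API.

```python
# Each entry reads: target column <- [(source column, transformation applied)]
lineage = {
    "mart.sales_summary.revenue": [("dw.fact_sales.amount", "SUM by month")],
    "dw.fact_sales.amount": [("staging.orders.amount", "cast to DECIMAL(12,2)")],
    "staging.orders.amount": [("erp.orders.amt", "straight copy")],
}


def trace_to_source(column, edges):
    """Return every hop from a target column back to its original sources."""
    hops = []
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the captured metadata
        seen.add(current)
        for source, transformation in edges.get(current, []):
            hops.append((current, source, transformation))
            stack.append(source)
    return hops


for target, source, transformation in trace_to_source("mart.sales_summary.revenue", lineage):
    print(f"{target} <- {source} ({transformation})")
```

A column with no recorded upstream edges simply yields no hops, which is itself a useful signal that the lineage has a gap.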
How A2B Data™ Collects and Mines Metadata
A2B Data™ actively collects, manages, and acts on your enterprise metadata. The product’s core functionality proactively collects, processes, and manages data based on this metadata. Because the metadata is always managed, A2B Data™ creates error-free code, mitigates project risk, and enforces transparency. More importantly, you now have a tool that integrates and reports on your metadata for the enterprise data collection services.
By utilizing A2B Data™ as the data staging, data integration, and ingestion tool, not only do you get the data where, when, and how you want it, but you also collect and actively utilize the metadata about the data. Your data integration processes become proactive as:
- The collection services extract the source-to-target information (at the database, table, column, and row level)
- The extract and load design patterns are model driven with descriptive metadata
- The impact analysis of the data types is managed in the A2B Data™ Metamodel
- Lineage is reported from source to target
- Transaction logs and error logs are available as a feed
- Runtime and growth statistics are captured.
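To illustrate what column-level source-to-target information and data type impact analysis can look like in general, here is a small Python sketch. It is not the A2B Data™ Metamodel; `Column`, `find_risky_mappings`, and the sample mappings are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    data_type: str
    length: Optional[int] = None  # only meaningful for character types


def find_risky_mappings(mappings):
    """Flag source-to-target pairs where the target may not hold every source value."""
    risky = []
    for source, target in mappings:
        if source.data_type != target.data_type:
            risky.append((source.name, target.name, "data type differs"))
        elif (source.length is not None and target.length is not None
              and target.length < source.length):
            risky.append((source.name, target.name, "target is shorter than source"))
    return risky


# Invented sample mappings: (source column, target column)
mappings = [
    (Column("crm.customers.name", "VARCHAR", 100), Column("dw.dim_customer.name", "VARCHAR", 50)),
    (Column("crm.customers.id", "INTEGER"), Column("dw.dim_customer.customer_id", "INTEGER")),
    (Column("crm.customers.created", "DATE"), Column("dw.dim_customer.created_at", "TIMESTAMP")),
]

for source_name, target_name, reason in find_risky_mappings(mappings):
    print(f"{source_name} -> {target_name}: {reason}")
```

A real impact analysis would need type-compatibility rules per database; the simple equality check here treats any type difference as worth reviewing.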
The technical metadata in the data integration and analytics domain is, in general, information about data: its definitions, its use, its original source, its taxonomy, and the transformation rules applied to it.
Advantages of A2B Data™
- Core functionality and user interface are driven by metadata
- Collects the technical and operational metadata for any source and any target
- Exposes the metadata for end-user reporting, metadata mining and impact analysis
- Offers an export mechanism to download the metamodel to any repository
- 24/7 support services are streamlined as a direct result of mining the metadata