Information Governance efforts within today’s organizations have to deal with potentially tens of thousands of data elements. Our experience in banking shows that it is not uncommon to find 20,000+ uniquely named data elements across an organization when looking at both production systems and the proliferation of end-user computing tools and spreadsheets. Of these, 15,000+ turned out to be pure synonyms (e.g. the many ways that something like Accrued Interest or Outstanding Principal can be spelled and stored). The remaining 5,000 consisted of computed measures (e.g. total income, net profit, total capital) and actual raw, discrete pieces of data (e.g. invoice total, customer number).
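To make the synonym problem concrete, here is a minimal sketch of how differently spelled names might be collapsed onto one business term. The abbreviation map, the function names, and the sample names are all hypothetical illustrations, not taken from any actual bank inventory, and real synonym resolution would need curated rules and human review.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation map; in practice this would be curated by data stewards.
ABBREVIATIONS = {"accr": "accrued", "int": "interest", "cust": "customer", "num": "number"}


def normalize(name: str) -> str:
    """Reduce a field or column name to a canonical key for synonym matching."""
    # Insert a space at camelCase boundaries, e.g. "accruedInterest" -> "accrued Interest".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    # Lower-case, then split on any run of non-alphanumeric characters.
    tokens = re.split(r"[^a-z0-9]+", spaced.lower())
    # Expand known abbreviations and drop empty tokens.
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens if t)


def group_synonyms(names):
    """Group raw names by their normalized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize(raw)].append(raw)
    return dict(groups)


names = ["Accrued Interest", "ACCR_INT", "accruedInterest", "Accr. Int", "Customer Number"]
groups = group_synonyms(names)
for term, variants in groups.items():
    print(term, "<-", variants)
```

In this sketch the five raw names reduce to two business terms, which is the same kind of reduction, at toy scale, as going from 20,000+ names to roughly 5,000 distinct elements.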
While the numbers and the mix of raw, computed, and synonymous elements will vary by company and industry, the fact remains that it is impossible to govern effectively at this level of detail. A classification structure, or taxonomy, of the data and information needs to be defined and implemented. We use the term subject areas of information to represent such a classification structure. Subject areas group similar types of data or information, which can be further classified by type, business intent, security, etc. Classifying content into subject areas is more of an art than a science: the precision of the classification has to be balanced against the maintainability of the governance stakeholder assignments.
Classification structures are governance metadata components, which form the foundation of the Information Governance Architecture. This diagram shows very basic subject areas that are used to classify the various types of trades associated with an investment bank. Subject areas are built from categories, which are then combined into the groupings needed for uniqueness. Criteria for classification by industry would be a whitepaper of its own.
Effective Information Governance, or Data Governance, requires the ability to assign governance roles and responsibilities at the subject area level, and then to trace from the subject area down to the plethora of data, synonyms, and homonyms (the 20,000+). The Dewey Decimal System was established to enable the public to find publications by using approximately 1,000 classes (the Library of Congress alone holds some 155M items). Following this model, we have defined and implemented classification systems that work across finance, insurance, and aspects of healthcare and that enable Information Governance at a workable level. We take this one step further with technology-enabled metadata linkages that provide connectivity and traceability from this governance matrix down to the actual data.
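The idea of assigning roles at the subject area level while keeping traceability down to individual data elements can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, role names, system names, and the "Loan Balances" subject area are hypothetical, and this is not the metadata technology described above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataElement:
    system: str          # production system or end-user spreadsheet (hypothetical names)
    physical_name: str   # the name as it is stored in that system
    business_term: str   # the canonical term this name is a synonym of


@dataclass
class SubjectArea:
    name: str
    owner: str           # accountable business role
    steward: str         # day-to-day governance role
    elements: list = field(default_factory=list)


class GovernanceCatalog:
    """Links subject areas (where roles are assigned) to physical data elements."""

    def __init__(self):
        self._areas = {}   # subject area name -> SubjectArea
        self._index = {}   # (system, physical_name) -> subject area name

    def add_area(self, area):
        self._areas[area.name] = area

    def link(self, area_name, element):
        # Record the element under its subject area and index it for reverse lookup.
        self._areas[area_name].elements.append(element)
        self._index[(element.system, element.physical_name)] = area_name

    def trace_down(self, area_name):
        """From a subject area to every linked physical data element."""
        return list(self._areas[area_name].elements)

    def trace_up(self, system, physical_name):
        """From a physical data element to the subject area that governs it."""
        return self._areas[self._index[(system, physical_name)]]


catalog = GovernanceCatalog()
catalog.add_area(SubjectArea("Loan Balances", owner="Head of Lending", steward="Loan Data Steward"))
catalog.link("Loan Balances", DataElement("core_banking", "ACCR_INT", "Accrued Interest"))
catalog.link("Loan Balances", DataElement("finance_spreadsheet", "Accr. Int", "Accrued Interest"))

area = catalog.trace_up("finance_spreadsheet", "Accr. Int")
print(area.name, "is stewarded by", area.steward)
```

The design choice worth noting is that roles live only on the subject area; individual elements carry no governance assignment of their own and inherit it through the link, which keeps the number of assignments to maintain small.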
Through our blogs and whitepapers we intend to expand upon the components needed for successful and sustainable Information Governance and Data Governance efforts. In this article we point out that it is not possible to govern at the level of detailed data (e.g. the 20,000+). Yet governing at the level of abstraction of a classification system, without connectivity to the actual data and information content, is also doomed to failure. There needs to be governance connectivity and traceability from the business to information (the logical information view) to data assets as implemented in IT systems and end-user computing programs. Without this linkage, real governance of information assets, i.e. the “establishment of policies, and continuous monitoring of their proper implementation”, cannot be accomplished. Herein lies the primary purpose of an Information Governance Architecture and metadata (business rule) engine.