Quick Take: Data management is getting far more visibility, and books like this one are a great starter guide for practitioners who want to understand what goes into initiating a Data Governance program. There’s no secret sauce or magic, and that’s mostly the point.
Detailed Review: There was once a time when people didn’t have enough information. Now there is too much of it. And in a few years we’ll supposedly have smart appliances and talking toasters. Well, maybe not talking, but data is becoming more ubiquitous.
Over the last decade you’ve probably been on vacation and asked “is there a good pizza place around here?” and a friend responded “according to Google, there are 8 pizza places within 5 miles of here.” You picked up the phone and called one but the number was no longer in service. Being persistent, you tried another, ordered a large pepperoni and got it 30 minutes later. Unfortunately, crackers with ketchup would have tasted better.
Companies like Google are working on this, but these were two examples of poor data quality. And data quality is a data management issue. In the case above, the phone number being out of service could mean the pizza place has closed, or it could mean the digits were recorded incorrectly. There is no way to tell from the listing. The taste, or lack of it, shows a failure in relevancy: “is there a good pizza place around here?” is a two-part question.
The author, Sunil Soares, is an IBM Director in the Software Group. He has worked with over 100 clients across multiple industries and has years of consulting experience. I don’t know him, but I’ve worked with a coworker of his, Doris Saad. She did a wonderful job of extending a data governance model with an IBM flavor.
Back to the book. The aesthetics are decent. It’s a paperback consisting of 125 pages of content and another 28 of appendix material. The font is average size and the construction of the chapters is typical of a business book: bullets and concise paragraphs. The front cover is a washed-out blue with an illustration of the Unified Process on it.
The introduction is by another IBM lead, Steven Adler. He gives an example of a time he applied for a refinance. He completed the forms, but there was an error with the type of loan. There was no way to deal with the mistake except to start over, which he did. This small classification issue resulted in much more rework: missing forms, open quotes, and back-and-forth communication. These are the types of inefficiencies a good data management program helps with. I like my pizza example better.
Being a governance person, I especially like how early in the book he frames the role of governance. Many people believe it’s about policing decisions, i.e., exceptions. But it’s about getting stakeholders to make decisions. Soares states:
“Data Governance is the discipline of treating data as an enterprise asset. It involves the exercise of decision rights to optimize, secure, and leverage data as an enterprise asset. It involves the orchestration of people, process, technology, and policy within an organization, to derive the optimal value from enterprise data. Data Governance plays a pivotal role in aligning the disparate, stovepiped, and often conflicting policies that cause data anomalies in the first place.”
I also liked this line:
“Treating data as a strategic enterprise asset implies that organizations need to build inventories of their existing data, just as they would physical assets.”
The reason is that it’s hard to manage what you can’t count. If you don’t have an inventory, how will you know if things have changed? It seems so obvious, but it isn’t. Making a concept like data tangible is vital to getting everyone on board.
He reinforces this point by offering some great questions in the Govern Analytics chapter.
- “How many users do we have for our data, by business area?
- How many reports do we create, by business area?
- Do the users derive value from these reports?
- How many report executions do we have per month?
- How long does it take to produce a new report?
- What is the cost of producing a new report?
- Can we train the users to produce their own reports?
- Would a BI Competency Center help?”
Additional questions I would add:
- Is new data generated by analysts?
- Is that new data incorporated back into the operational processes?
- Are the reports sensitive? How is access to the data handled?
And page 15 offers this realistic picture of why data governance often fails:
“Most organizations with stalled Data Governance programs identify these symptoms:
- “The business does not see any value in Data Governance.”
- “The business thinks that IT is responsible for data.”
- “The business is focused on near-term objectives, and Data Governance is considered a long-term program.”
- “The CIO cut the funding for our Data Governance department.”
- “The business reassigned the data stewards to other duties.””
Once you’ve gotten your bosses on board with doing Data Governance, it’s time to identify an approach. Soares has an IBM Maturity Model (below). It’s not a bad one. I’ve designed a few different governance-related maturity models and I like this one because it eschews the levels and goes with relationships.
- Data Risk Management and Compliance is a methodology by which risks are identified, qualified, quantified, avoided, accepted, mitigated, or transferred out.
- Value Creation is a process by which data assets are qualified and quantified to enable the business to maximize the value created by data assets.
- Organizational Structures and Awareness refers to the level of mutual responsibility between business and IT, and the recognition of fiduciary responsibility to govern data at different levels of management.
- Stewardship is a quality-control discipline designed to ensure the custodial care of data for asset enhancement, risk mitigation, and organizational control.
- Policy is the written articulation of desired organizational behavior.
- Data Quality Management refers to methods to measure, improve, and certify the quality and integrity of production, test, and archival data.
- Information Lifecycle Management is a systematic, policy-based approach to information collection, use, retention, and deletion.
- Information Security and Privacy refers to the policies, practices, and controls used by an organization to mitigate risk and protect data assets.
- Data Architecture is the architectural design of structured and unstructured data systems and applications that enables data availability and distribution to appropriate users.
- Classification and Metadata refers to the methods and tools used to create common semantic definitions for business and IT terms, data models, and repositories.
- Audit Information Logging and Reporting refers to the organizational processes for monitoring and measuring the data value, risks, and effectiveness of data governance.
From here the book dives into each one of these areas with specific actions that need to happen. I noted a few below.
Ultimately, I view this book as a good asset for getting started with Data Governance work. However, it lacks some real best practices beyond suggesting the use of certain IBM tools. Governance is as much about getting people to compromise as it is about whether the metrics are in a red or green status. A playbook outlining the tasks won’t help with the relationships and politics that this often boils down to. Is the pizza good? It just depends on who you ask.
Page 38: This paragraph is critical. The nuance of it can go unheeded.
“It is important to recognize that a “1” rating is not inherently bad, and a “5” rating is not necessarily good. The Data Governance organization has to work with IT and business stakeholders and (preferably) develop a business case to determine whether it is feasible to increase the rating for a given category in the desired future state.”
Page 42: I used to consider a charter pretty self-explanatory, but in reality it isn’t. This is a good recap.
“The Data Governance charter is similar to the Articles of Incorporation of a corporation. The charter spells out the primary objectives of the program and its key stakeholders, as well as roles and responsibilities, decision rights, and measures of success.”
Page 42: The breakdown of the Data Governance structure is pretty good too.
“The optimal organization for Data Governance is a three tier structure. The Data Governance council, at the pinnacle of the organization, includes senior stakeholders. At the next level down, the Data Governance working group consists of members who are responsible for governing data on a fairly regular basis. Finally, the data stewardship community has day-to-day, hands-on responsibility for data.”
“Here are some of the responsibilities of an executive sponsor:
- Have ultimate responsibility for the quality of data within the domain
- Ensure the security and privacy of all sensitive data, such as PII and PHI, within the domain
- Appoint data stewards with day-to-day responsibility for dealing with the data quality, security, and privacy issues within the domain
- Establish and monitor metrics regarding the progress of Data Governance within the domain
- Collaborate with other executive sponsors in situations where business rules collide, to ensure that the enterprise continues to derive maximum value from its data”
“When a data stewardship program reaches maturity, the data steward should report into the business. At this point, it is important to ensure that there is some level of oversight across all the data stewards, to ensure a consistency in roles and responsibilities and to develop a sense of community.”
Some commentary: the notion of a community is important. This data culture change is not just a top-down mandate. You need to get everyone, especially project teams, viewing data differently than they have been.
Page 95: There is a good example of a business rule which establishes which record is authoritative.
“Fortunately, that is where the rules of data survivorship come into play. The Data Governance rules of survivorship state that life insurance is the best source for birth date because that information determines premiums. Similarly, homeowner’s insurance is the best source for address information because that data is directly tied to the entity being insured.”
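To make the survivorship idea concrete, here is a minimal sketch of how such a rule could be expressed in Python. It is not from the book and does not reflect any IBM tool. The source names (`life_insurance`, `homeowners_insurance`), the `SURVIVORSHIP_RULES` table, the `build_golden_record` function, and the sample values are all hypothetical, chosen only to mirror the quote above. Each attribute gets an ordered list of preferred sources, and the first source with a non-empty value wins.

```python
# Hypothetical survivorship rules: for each attribute, the source systems
# are listed from most to least trusted. The ordering mirrors the quote:
# life insurance is preferred for birth date, homeowner's for address.
SURVIVORSHIP_RULES = {
    "birth_date": ["life_insurance", "homeowners_insurance"],
    "address": ["homeowners_insurance", "life_insurance"],
}


def build_golden_record(records_by_source, rules=SURVIVORSHIP_RULES):
    """Pick, per attribute, the value from the most trusted source that has one."""
    golden = {}
    for attribute, preferred_sources in rules.items():
        golden[attribute] = None  # stays None if no source supplies a value
        for source in preferred_sources:
            value = records_by_source.get(source, {}).get(attribute)
            if value:  # skip missing or empty values and fall through
                golden[attribute] = value
                break
    return golden


# Made-up records for one customer, held in two systems that disagree.
records = {
    "life_insurance": {"birth_date": "1970-03-14", "address": "12 Old Road"},
    "homeowners_insurance": {"birth_date": "1970-04-13", "address": "98 New Street"},
}

golden = build_golden_record(records)
print(golden)
```

The point of keeping the rules in a plain table rather than in code branches is that the governance council can review and change them without touching the matching logic. Deciding what that table should say is still the people problem.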