Archive for the ‘Data Governance’ Category
Monday, November 14th, 2011
Ajilitee’s newly-released “Ultimate Guide to Data Governance Metrics for Health Payers” dives into 30+ data governance metrics from a quantitative and qualitative vantage point. But today, I’m exploring a subset topic, which is how to measure the success of a Data Governance Council initiative.
We’ve said this often, but it bears repeating: data governance programs tend to fall short of expectations because they wind up as tactical data quality initiatives that address accuracy and consistency in silos. They also lack an effective governing body to manage data ownership, lineage and accountability across the enterprise.
We believe that establishing a Data Governance Council is the key to transforming a data governance program into real business value. An informed and active Data Governance Council will tackle inaccurate, inconsistent and incomplete data holistically through policies and cultural change at the leadership level. And just as a data governance program establishes metrics on its data quality and performance measurements on the data stewards, it’s equally important to set performance goals for the Data Governance Council.
In the Health Payer space, metrics are driven by corporate drivers and key performance indicators. Typical drivers include the following:
- Cost avoidance and cost containment
- HIPAA, Privacy and/or regulatory compliance
- Fraud detection
- Constraints management
- Products and Plans time-to-market
A Data Governance Council composed of upper management will be engaged and committed if its members are presented with demonstrated successes that address these drivers. This includes continuous data quality that the governance program helps ensure is embedded upstream rather than performed sporadically downstream. Also realized are the benefits of improved transparency, auditability and data lineage, which are essential to compliance with government regulations such as HIPAA.
During its instantiation, the Data Governance Council should be reminded that they are part of this group to serve as proactive change agents. Therefore, this ability to be change agents should be measured. To that end, we recommend these five key metrics to measure the Data Governance Council and its members:
- METRIC 1: Advocacy success measure.
- Getting each Council member to recognize that their role is not a passive one. To remain on the Council, they are expected to be “data integrity proselytizers” – e.g., identifying a steward for their line of business, and speaking at their team meetings about the new policies, progress and changes, and so forth.
- METRIC 2: Meeting success measure.
- Demonstration of commitment. This can be accomplished by an early vote to have a policy that a Council member could and would be “disinvited” for lack of attendance.
- METRIC 3: Each Council Member must bring a Data or Process Issue request to the Council.
- Demonstration that the Council member understands what is an appropriate process and/or data issue that warrants attention from the DG Council. They must be willing to push skeletons in their own business areas in front of their peers for resolution.
- METRIC 4: Number of Policies Established.
- Enterprise Policies serve as the basis for prying systemic data issues away from the silo-minded lines of business. In the first years, typical policies include defining the list of governed data elements; approving Unique Identifier data elements (e.g., Unique Provider, Unique Institution, Unique Member); establishing USPS Address Standardization; conforming Provider Specialty Taxonomy to CMS labels.
- METRIC 5: Maturity Model measure.
- The Data Governance Council should demonstrate proficiency in their role before tackling the more complex topic of a Data Governance 5-Year Maturity Model. But by the end of Year 1, the level of progress on the Maturity Model should be set and tracked for each succeeding year.
At each Council meeting, we advocate reviewing each of these Scorecard metrics. Everyone sees the contributions of their colleagues. It’s important to have this level of visibility and openness – peer pressure works wonders! And we stress not to be “locked-in” to a particular set of members. It’s not uncommon to realize that another representative needs to be added or someone cannot make the necessary commitment and should be replaced.
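To make the scorecard review concrete, here is a minimal sketch of how the member-level metrics (advocacy, meeting attendance, issue requests) might be tracked. The class name, field names and the 75% attendance threshold are illustrative assumptions, not part of any standard; each Council would set its own rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemberScorecard:
    """One Council member's standing on the member-level metrics (hypothetical layout)."""
    name: str
    advocacy_actions: int   # Metric 1: e.g. steward named, team briefings given
    meetings_attended: int  # Metric 2: Council meetings attended
    meetings_held: int      # Metric 2: Council meetings held so far
    issues_raised: int      # Metric 3: data or process issues brought to the Council

    def attendance_rate(self) -> float:
        """Share of meetings attended; 0.0 if no meetings have been held yet."""
        if self.meetings_held == 0:
            return 0.0
        return self.meetings_attended / self.meetings_held

    def flags(self, min_attendance: float = 0.75) -> list[str]:
        """Return the metrics this member is currently missing (threshold is an assumption)."""
        missing = []
        if self.advocacy_actions == 0:
            missing.append("advocacy")
        if self.attendance_rate() < min_attendance:
            missing.append("attendance")
        if self.issues_raised == 0:
            missing.append("issue request")
        return missing


# Made-up members, for illustration only.
council = [
    MemberScorecard("Claims VP", advocacy_actions=3, meetings_attended=5,
                    meetings_held=6, issues_raised=2),
    MemberScorecard("Provider Ops VP", advocacy_actions=0, meetings_attended=3,
                    meetings_held=6, issues_raised=0),
]
for member in council:
    print(member.name, f"{member.attendance_rate():.0%}", member.flags() or "on track")
```

The Council-level metrics (number of policies established, maturity model progress) would sit alongside this as simple counts reviewed at the same meeting.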
Finally, measure the business impact of a Data Governance Council and then publish results on an enterprise data governance internal website or SharePoint. This demonstrates the commitment to improved data integrity at the highest executive ranks.
Tuesday, October 25th, 2011
I was sharing an “October” beer with John Bair (our CTO) last night and I was struck by a few thoughts in our conversation.
Back in college, I always enjoyed reading books by Thomas Kuhn. In 1962, Kuhn authored the book, The Structure of Scientific Revolutions. The book described how “paradigm shifts” occur in the scientific world. The core thought was that new ideas do not always immediately take root and become the new norm. For example, once Einstein proposed relativity, while it was certainly very exciting, it took many confirmations and an extended amount of time for the theory to be accepted. There have to be issues in the existing paradigm to cause people to question the current theories. It takes time to adopt the new, improved theory. A paradigm shift is rarely immediate.
At the same time, I remembered Stephen Jay Gould’s theory of punctuated equilibrium (in evolutionary biology), which came out in the 1970s. In contrast to the idea that evolution was gradual, punctuated equilibrium said that sometimes large, infrequent events shift the slow-moving, evolutionary path. So, the key message is that things may be moving along, something big happens, and suddenly you are in a whole new world.
Well, perhaps data governance is like that. Perhaps data governance is really an underutilized organizational process. When I talk to clients about data governance, the conversation invariably turns to the different aspects of data: Data is missing! Data is dirty! Only the business knows the real business rules! There are data errors!
Those are certainly all good and fun topics to talk about, and those conversations can consume the entire day. However, there is another aspect of data governance we think is important. It’s about program management.
We think of data governance as composed of two tracks: the “data” part of the data governance program and the program management part of the data governance program. The program management part is often the most overlooked. Yes, there is a data governance steering committee and yes there is a “leader” of the overall daily effort who reports to that committee. But the program management aspect of the data governance program is really a management process not unlike other governance programs such as IT portfolio management or “strategic projects” governance. For example, IT governance often helps with priorities, decisioning, budgeting and resolving resource issues, helps communicate to other parts of the organization, and bundles scope to form projects/programs.
In the data governance space, we think people often forget this important aspect. A few areas of issues we have observed include:
- Funding: The data governance program should act as a forum for obtaining funding either directly through itself or through other funding mechanisms such as integrating into other projects or proposing in other governance forums.
- Bundling: Data governance should maintain a list of issues and smartly bundle those into projects to be funded and executed. Either direct data governance funding or other business/IT funding could be used.
- Resourcing: For example, perhaps more training is needed for stewards. What resource can help with that task? Do we need to hire a consultant? Do we need to have metrics in place to track attendance and participation? Does HR need to get involved?
- Communicating: The senior people on the committee need to use their organizational influence to help keep the data governance agenda front and center in other parts of the organization.
- Statusing: Perhaps the data governance program needs rejuvenation. How do you get it back on track? The daily governance lead should be identifying program issues that remain unsolved and escalating them for guidance. There should be “program” type status each month in addition to just talking about data.
We think there are issues today that cannot be easily addressed by the traditional style data governance implementation. Changes are needed. The issues we see, listed above, are starting to push the boundaries of the current model. Perhaps it’s time for a paradigm shift.
Data governance can be used as a way to manage funding for data-related programs. It can be used to do more than just discuss daily data issues. It’s time for data governance to evolve instead of always lingering on just the data. It’s time for a landslide to happen and make data governance a real management force.
If a data governance program is only focused on data, it is probably too local to act as a problem solving capability. Let’s change it. Let’s take the traditional data governance program and make it something more relevant. Let’s ensure that the program aspects of data governance can help fund, help bundle and help communicate to the organization.
So what we have is an old thing, like data governance, playing a new role, becoming more relevant to the business and seeding innovation. We have seen issues in the current “theory” that need to be handled, so let’s change the current theory of data governance and ensure that the new model also emphasizes program management. With the paradigm shift in play, let’s start a landslide to kill off the old data governance programs and disrupt the equilibrium.
Ajilitee can help you do that.
Data governance is the new pink.
Thursday, September 8th, 2011
In my last blog, I discussed the use of Data Governance Metrics to measure program success. That blog included a list of 17 typical quantitative metrics. This blog will venture down a much fuzzier path, the use of qualitative metrics to measure DG program success.
Quantitative metrics are gathered directly through the observation and measurement of data. There is a high degree of transparency and a direct correlation between action and outcome with quantitative metrics. Analytical types like me are easily convinced of the value of Data Governance (DG) using quantitative metrics. Fix the data, increase the quality score.
Qualitative metrics, on the other hand, are not so transparent. Connecting the dots can be challenging, since the points of data capture for qualitative metrics are often two or more degrees of separation from the data. That is, a DG project may address (and fix) the quality of the data, but the measurement of success is not the data that was remediated, but some other outcome like a lift in compliance, healthier customer satisfaction survey results, an increase in industry standard scorecards, and so forth.
Qualitative metrics fit into a number of general categories, including compliance, industry ratings, customer satisfaction measures, and business opportunity, amongst others. Successful programs identify metrics meaningful to both middle management and executive leadership.
Compliance – Data governance programs are well positioned to support data-related compliance efforts. Data Governance guides the implementation of controls to document, institute and monitor compliance with data-related regulations. Cross-functional teams established by DG can look for opportunities to drive cost out of compliance efforts. These regulations include Sarbanes-Oxley, Basel I, Basel II, and HIPAA. Compliance requires formal business and management processes to govern the impacted data subject areas that the DG Council can help manage.
Industry ratings – Data Governance can help improve industry ratings such as HEDIS and NCQA scores. HEDIS enables clients to notify members and providers of the need to obtain necessary services through multiple pathways. Proactive identification of HEDIS care gaps can be directed to DG participants for discussion and targeted improvements. NCQA scores depend in large part on the availability of, and customer satisfaction with, clinical services. The DG team can help address core data that feeds HEDIS and NCQA scores to help drive a lift in overall patient experiences and customer satisfaction scores.
Customer satisfaction levels – Most companies devote huge amounts of resources to track, measure, socialize and lift customer satisfaction levels. Collection of these metrics can be costly and complex, but needs to be considered essential for survival. Commonly used sources range from classic data like phone surveys, customer comment cards, and focus groups to more recent entrants like blogs, Facebook and Twitter. Collection and analysis of these metrics over a period of time will help gather knowledge of exactly how consumers feel about your products and services and can be linked to improvements wrought by DG efforts.
Business Opportunities – The policies and controls implemented by DG help organizations identify direct and indirect business opportunities. Correct and current data drives significant and direct impact to the business. Proposals that carry assumptions based on poor data quality will be at high risk of under- or over-bidding. Common themes among the external regulations center on the need to manage risk. The risks can be financial misstatements, inadvertent release of sensitive data, or poor quality in the data needed to drive key decision making.
Sample Qualitative Metrics – Some of the qualitative measures are easier to substantiate than others. For example, you may find it challenging to quantify increased collaboration between teams. You may struggle to link new control standards and policies to increases in customer satisfaction scores. Stay the course! Each DG team needs to define and measure some indirect or qualitative metrics that indicate program success. Brainstorm on the categories in the table below as a starting point for discussion.
There is no standard body of qualitative metrics. Each organization creates its own metrics based on needs, culture, industry, data availability, and so forth. Here are some representative metrics I have used in the past to measure qualitative DG program success in the health care industry.
| Category | Metrics |
| --- | --- |
| Compliance | Percent of users with access to the PHI data. Access to PHI must be restricted to only those employees who have a need for it to complete their job function. |
| | Number of times data within business critical systems changed or erased in an unauthorized manner |
| | Decrease in risk or cost of regulatory fines |
| | Decrease in latency associated with delivery of compliance data |
| Industry Ratings | Contributions to improvement in NCQA report cards |
| | Improved capture ability of HEDIS |
| Customer Satisfaction Measures | Survey results showing greater collaboration between internal departments |
| | Percent increase or decrease in customer satisfaction survey index |
| Collaboration / Improved Productivity | Percent of times DG council detected and eliminated redundant intra- or inter-departmental projects / initiatives |
| | Number of projects that adopted the enterprise logical data model without creating one from scratch |
| | Number of redundant systems eliminated to create a single definition of customer, product, or other widely shared master data |
| Business Opportunity / Risk | Business opportunities gained due to better data quality |
| | Business opportunities lost or misaligned due to questionable data quality |
| | Increase in precision of analytics and forecasting gains from improved data quality |
| | Increase in competitive analytics due to data availability and data quality improvements |
In my next blog, I’ll explore how to design and implement Data Governance Metrics as foundational building blocks for a successful Data Governance program. If you’ve already walked this path and have some learning to share, let me hear from you.
– Jim Van de Water contributed to this blog.
Category Blog, Dambaru Jena, Data Governance, Jim Van de Water | Tags: Basel I, Basel II, data governance, HEDIS, HIPAA, NCQA, Qualitative Metrics, Quantitative Metrics, Sarbanes-Oxley,
Friday, August 12th, 2011
You made a business case for building a Data Governance organization to solve your Business Intelligence challenges, got your funding, and hired a couple of seasoned consultants to help design the governance strategy, shaped a DG council and stewardship organization, and transcribed policies and procedures to put DG in motion. A couple of years have gone by and you are still facing those key challenges. What happened?
Is your data governance program working? How do you know? Are the sponsors happy? Are the analysts working more effectively? How do you know when the effort has paid big dividends? As Business Intelligence practitioners, we attach metrics to all sorts of activities as best practices. Shouldn’t capture of success metrics at program inception and on an ongoing basis be a best practice for data governance as well?
Many organizations don’t consider the need for metrics to measure the long-term effectiveness of their Data Governance program. Don’t feel bad – if you haven’t quantified DG success, you’re not alone. Companies in all industries struggle to capture relevant and meaningful DG metrics. Thankfully, there is no shortage of data that can help you to measure the impact and effectiveness of your Data Governance efforts. The key is to use Data Governance Metrics to measure and demonstrate program effectiveness.
What are Data Governance Metrics?
Data Governance Metrics are foundational to measuring the success and effectiveness of a Data Governance program. Think of them as the set of metrics designed to measure the effectiveness of Data Governance. Like other business performance measures, these metrics should be managed and tracked at all levels.
What defines a good Data Governance Metric?
Prior to defining the DG metrics, you should understand the key characteristics of a reasonable DG metric and then explore how to map those characteristics to the measurable aspects of data governance. Metrics should be specific, measurable, actionable, realistic and timely. The following list provides some guidance to jump-start the approach:
- Specific – Identify a set of specific metrics that will measure the success of the DG program
- Measurable – Clearly defined, simple to understand and easy to measure
- Actionable – Easy to capture, realistic and practical, and quantifiable
- Realistic – Has business relevance, i.e. is defined within a business context that explains how the metric score correlates to improved business performance
- Timely – Establishes a sense of urgency and is measured over a period of time so that the trend can be analyzed
The acid test for each metric is whether it is clearly defined, capable of measurement, and directly relevant to improving program effectiveness. The Data Governance Council will be a good judge, and they should review and approve all metrics. Metrics are created by either DG Councils or data stewards with input from data analysts.
Key Data Governance Metrics
Data Governance metrics should be identified and tracked prior to implementing the Data Governance program in order to baseline performance. You will want to capture these metrics periodically and store them in a table or database so you can review progress over time.
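As a rough illustration of baselining and periodic capture, the sketch below stores metric snapshots in a SQLite table and reports the change since the first (baseline) reading. The table name, column names, metric name and sample values are all assumptions made up for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per metric per capture date."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metric_snapshot (
               metric      TEXT NOT NULL,
               captured_on TEXT NOT NULL,  -- ISO date, so text order is date order
               value       REAL NOT NULL,
               PRIMARY KEY (metric, captured_on)
           )"""
    )
    return conn


def record(conn, metric, captured_on, value):
    """Save a reading; re-recording the same metric and date overwrites it."""
    conn.execute(
        "INSERT OR REPLACE INTO metric_snapshot VALUES (?, ?, ?)",
        (metric, captured_on, value),
    )
    conn.commit()


def change_since_baseline(conn, metric):
    """Latest value minus the earliest value, or None with fewer than two readings."""
    rows = conn.execute(
        "SELECT value FROM metric_snapshot WHERE metric = ? ORDER BY captured_on",
        (metric,),
    ).fetchall()
    if len(rows) < 2:
        return None
    return rows[-1][0] - rows[0][0]


conn = open_store()
# Hypothetical readings: a baseline taken before the program starts, then quarterly captures.
record(conn, "member_address_completeness_pct", "2011-01-31", 82.0)
record(conn, "member_address_completeness_pct", "2011-04-30", 88.5)
record(conn, "member_address_completeness_pct", "2011-07-31", 93.0)
print(change_since_baseline(conn, "member_address_completeness_pct"))
```

Keeping the baseline row in the same table as later captures means the trend can be reviewed with a single query.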
Commonly measured dimensions of Data Governance include completeness, consistency, timeliness, and uniqueness, although the range of possible dimensions is limited only by the ability to provide a method for measurement. Metrics can be composed of directly measured rules or more complex metrics that are defined as weighted averages of collected scores. Metrics from a business case or ROI for the DG program proposal can provide a starting point.
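A composite metric defined as a weighted average of collected scores can be sketched in a few lines. The dimension scores and weights below are invented for illustration; in practice the weights would be agreed by the DG Council.

```python
def composite_score(scores, weights):
    """Weighted average of dimension scores (each assumed to be on a 0-100 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"no score collected for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


# Hypothetical directly measured scores and Council-assigned weights.
scores = {"completeness": 90.0, "consistency": 80.0, "timeliness": 70.0, "uniqueness": 100.0}
weights = {"completeness": 4, "consistency": 3, "timeliness": 2, "uniqueness": 1}

print(composite_score(scores, weights))  # (360 + 240 + 140 + 100) / 10 = 84.0
```

Dividing by the sum of the weights means they do not have to add up to 1 or 100, which makes it easier to adjust one weight without rebalancing the rest.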
At a high level there are two broad categories of DG Metrics:
- Quantitative Metrics – Data-centric metrics that can be measured as hard benefits like savings in manpower or operational cost savings, etc.
- Qualitative Metrics – Metrics that measure soft benefits like improved customer satisfaction survey results, increased industry standard data quality scores, etc.
Listed below are some sample Quantitative Metrics I have used in the past to measure the program effectiveness for healthcare clients.
| Category | Metrics |
| --- | --- |
| Accuracy | % of time match-merge logic needs manual interventions |
| | % of returned mail due to incorrect address causing reshipments and lost business – Is it going down after implementing the DG program? |
| Completeness | % of Provider addresses that are accurate |
| | % of Member addresses that are filled with required data elements |
| Consistency | % of time data conforms to business rules/policy |
| | % of data values that conform to the code sets/domain values |
| Accessibility | % of Critical Data Elements (CDE) identified by the DG council that are available to business users |
| | % of time sample queries completed within the SLA defined by DG council |
| Uniqueness | % of records having a unique primary key |
| | % of records having duplicate member or provider records |
| Compliance | Number of regulatory noncompliance data issues with HIPAA, PHI policy |
| Efficiency | % of key operational processes that achieved X% of improved efficiency |
| | X% of operational costs reduced after implementing the DG program |
| | Number of DQ issues taken up by the DG Council |
| | Number of DQ issues resolved |
| Timeliness | Time between when information is expected and when it is readily available for use |
| | % of time data load completed as per SLAs |
| | % of time user queries returned results per SLAs |
| | Is Data Warehouse uptime increasing over a period of time? |
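To show how a few of these percentages might be computed from raw records, here is a small sketch covering one completeness, one consistency and one uniqueness metric. The member records, required fields and state code set are fabricated sample data, and the function names are hypothetical.

```python
from collections import Counter

# Fabricated sample records; real data would come from the member system.
MEMBERS = [
    {"member_id": "M001", "street": "1 Main St", "city": "Atlanta", "state": "GA", "zip": "30301"},
    {"member_id": "M002", "street": "", "city": "Tampa", "state": "FL", "zip": "33601"},
    {"member_id": "M003", "street": "9 Oak Ave", "city": "Austin", "state": "Texas", "zip": "73301"},
    {"member_id": "M001", "street": "1 Main St", "city": "Atlanta", "state": "GA", "zip": "30301"},
]
REQUIRED_ADDRESS_FIELDS = ("street", "city", "state", "zip")  # assumed policy
VALID_STATES = {"GA", "FL", "TX"}                             # assumed code set


def pct(part, whole):
    """Percentage helper that returns 0.0 for an empty population."""
    return 100.0 * part / whole if whole else 0.0


def completeness_pct(records, required=REQUIRED_ADDRESS_FIELDS):
    """% of records where every required address element is filled."""
    complete = sum(all(r.get(f) for f in required) for r in records)
    return pct(complete, len(records))


def code_set_conformance_pct(records, field, valid_values):
    """% of records whose value for `field` is in the approved code set."""
    return pct(sum(r.get(field) in valid_values for r in records), len(records))


def duplicate_key_pct(records, key):
    """% of records whose key value appears on more than one record."""
    counts = Counter(r[key] for r in records)
    return pct(sum(1 for r in records if counts[r[key]] > 1), len(records))


print(completeness_pct(MEMBERS))                                 # 75.0 (M002 has no street)
print(code_set_conformance_pct(MEMBERS, "state", VALID_STATES))  # 75.0 ("Texas" is not a code)
print(duplicate_key_pct(MEMBERS, "member_id"))                   # 50.0 (M001 appears twice)
```

Each function returns a plain percentage, so the results can be captured on a schedule and compared against the pre-program baseline.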
In my next blog, I’ll explore indirect metrics as well as how to design and implement Data Governance Metrics as the foundational building blocks for a successful Data Governance program. If you’ve already walked this path and have some learning to share, let me hear from you. Stay tuned.
Monday, January 31st, 2011
I talk with many clients about their data warehouse programs. In some cases, projects run longer and cost more than expected. This is true for clients who have mature data warehouse development processes as well as those with new capabilities. Why is estimation so hard? Let’s ask that question in the context of healthcare analytics and figure out why counting on “hope” is a bad way to run your data warehouse program.
The Path to Reliable Estimates is not Well-Defined
One reason that estimation is hard is that it is difficult to specify analytical requirements. Yes, I know it’s fairly easy to specify that you want a report that runs a cross-tab between admission types and paid claim amounts. But let’s look a little deeper at what’s necessary to get a valid report: the analytics underlying it. What definition do you use for admission types? Are they the same definitions across all of the groups? How do you calculate paid claim amounts, and what do you exclude or include? Do you have all the data? While data governance standards can help you answer these questions, sometimes it’s the next level of detail that makes it difficult to define a path from a report request to the development effort to create that report, that analytical artifact.
The path to reliable estimation is not always well-defined because analytics, using best practices, tries to be predictive. When you use predictive modeling techniques, however, you do not always know what variables are important. For example, when you are forecasting utilization rates, do you want to take last year’s utilization rates and bump them up 10 percent? Why 10 percent? Maybe other factors are influential. Are patient demographics changing? Are providers cost-shifting procedures to achieve higher reimbursement rates in one area versus another? Perhaps governmental factors, such as the increase in Medicaid enrollees, are driving up utilization for basic services.
Because you cannot specify everything, you will need to address basic utilization forecasting before you start. That means you have to play with the data, understand data quality issues, understand proxies for measures, and understand the information content of specific datasets. In other words, you do not know what you need and how you will need it until you solve the problem. That’s why it’s not a tidy, well-defined path to trustworthy analytics underlying your estimates. At the end of the day, however, predictive modeling that takes into account various factors is still a better way to forecast than guessing or wishful thinking.
Just because something is not well-defined does not mean you should give up or just hope that the answer you give is the right one. As we know, “hope” is not a great way to run a company or a program. Here I’ll share some of my thoughts on a more studied approach to estimating.
Use Top-Down Estimation
The most basic approach to estimation that most companies use is top-down estimation. It comes in two forms.
The first form is by saying, “hmmm….my management judgment says it will take 4 weeks to gather requirements, 3 weeks for design and architecture, 6 weeks of build and unit test, 3 weeks of systems integration testing, 2 weeks of UAT and 1 week to deploy. Okay…that’s about right.” That’s management judgment at work. This approach is actually fairly good but is highly variable and not easily reproduced consistently across different managers. This method is good at capturing organizational dynamics–for example, the business is busy this month, so requirements will take longer, or the development team is busy on another release, so they need 2x the amount of time they would normally need. So it’s good and proper to use this form of top-down estimation.
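The arithmetic behind that management-judgment estimate is simple enough to write down, which also makes the organizational adjustments explicit. This is only a sketch: the phase durations come from the example above, while the function name and the idea of per-phase multipliers are assumptions added for illustration.

```python
# Phase durations in weeks, taken from the management-judgment example above.
PHASES = {
    "requirements": 4,
    "design and architecture": 3,
    "build and unit test": 6,
    "systems integration testing": 3,
    "UAT": 2,
    "deploy": 1,
}


def top_down_weeks(phases, multipliers=None):
    """Sum phase durations, scaling any phase that has an adjustment multiplier."""
    multipliers = multipliers or {}
    return sum(weeks * multipliers.get(name, 1.0) for name, weeks in phases.items())


print(top_down_weeks(PHASES))                                # 19.0 weeks as stated
print(top_down_weeks(PHASES, {"build and unit test": 2.0}))  # 25.0 if the build team needs 2x
```

Writing the multipliers down, rather than folding them into a single gut-feel number, is one way to make this form of estimate a little easier to reproduce across managers.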
The second form of top-down estimation uses look-alike comparisons. Project X was really HARD and it took 6 months. Project Y is of the same order of complexity, so therefore it’s 6 months as well +/- 2 weeks. This approach takes into account high-level, structural complexity. For example, I have to master the members (uniquing), or I have to build an organizational view of providers, or I must consider some other aspect of complexity that can be identified at a high level. This is a good way to capture estimates for this type of information.
But top-down estimation is not enough. While it can be good at times and eerily accurate, it’s not always reliable, scalable across many projects and managers, or consistent. Hoping that top-down answers are right is not a strategy. A strategy is a purposeful creation that management should address with purposeful action. In some cases, these top-down estimates are guesses, and guesses are not why I pay managers to work for me. I do not expect perfect estimates, but I expect estimates that I can learn from organizationally, and continually refine during the life of the project. I need estimates that make me better even if they are wrong today.
Use Bottom-Up Estimation
Bottom-up estimation is the second major approach to creating work estimates. Bottom-up estimates do involve management judgment and analytical models and estimation of complexity. But they do so in a reproducible, structured way.
What goes into a bottom-up estimate?
- How many tables are being staged from a source to a staging database, then to an operational data store, then to a data warehouse?
- How many ETL programs must be written assuming the organization’s standard architecture and design for ETL?
- How many sessions with users are needed to gather requirements?
- How many reports are to be created? Are they well understood?
- How many new servers must be procured, installed and configured?
- How many data model changes, dimensions or facts are needed?
- …other factors in your estimating model
Work units represent fundamental steps in data warehousing needed to properly move data from one place to the other in a way that satisfies business requirements. There are many different levels of bottom-up estimation, but generally you need these types of details, in some form, to create a large spreadsheet to perform estimation. For each item, find a count or use a parametric model to convert the count into work effort. Those conversion factors represent your productivity factors.
You cannot always get all of the data in the exact form you want. However, you can detail your estimates and submodels in the bottom-up calculation. If you don’t have the number of tables in the sources, you can estimate based on an organizational average, say 50 or 20 or 30. You can model the average number of attributes per table. You can state your assumption for the number of requirements and reports. All of this can be stated and, most importantly, recorded in a spreadsheet or estimating tool that captures the assumption and allows you to change it as needed throughout the duration of the project.
It’s the ability to be explicit with your assumptions and use an explicit model that makes a good bottom-up estimate.
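The spreadsheet described above can be sketched as a list of work items, each with a count, a productivity factor and a recorded assumption. Every count and hours-per-unit figure below is invented for the example; your own productivity factors would come from past projects.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    name: str
    count: float           # how many units of this work
    hours_per_unit: float  # productivity factor converting the count into effort
    assumption: str = ""   # recorded whenever the count is assumed, not measured


def bottom_up_hours(items):
    """Total effort: each count multiplied by its productivity factor."""
    return sum(item.count * item.hours_per_unit for item in items)


# Hypothetical estimating model for one release.
ITEMS = [
    WorkItem("source tables staged", 30, 6,
             "source table count unknown; organizational average of 30 used"),
    WorkItem("ETL programs", 45, 16),
    WorkItem("requirements sessions", 8, 4),
    WorkItem("reports", 12, 20),
    WorkItem("data model changes", 10, 8),
]

print(bottom_up_hours(ITEMS))  # 180 + 720 + 32 + 240 + 80 = 1252 hours
for item in ITEMS:
    if item.assumption:
        print(f"ASSUMED {item.name}: {item.assumption}")
```

Because the assumption sits next to the number it affects, replacing the organizational average with a real table count later in the project is a one-line change that can be traced.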
Are bottom-up estimates sometimes wrong? You bet. But through a few iterations, the models can become significantly better quite quickly. In addition, you can do things you could not do before, such as engage multiple vendors and understand their assumptions, or build a disciplined set of development steps (the SDLC) into the estimate, which again forces standardization across potentially diverse sets of suppliers.
Putting it All Together
So both top-down and bottom-up estimation are needed and useful. BOTH must be performed to arrive at an estimate that can be triangulated from different points of view. Somewhere in the middle is the work estimate you can use to start your project. Hope is not a strategy when it comes to data warehouse projects–use analytics to help you estimate and get better over time–in a way, that’s the whole point of analytics.
Putting it together also means that you need to run the top-down and bottom-up model across all projects at the same time. This allows you to identify duplication or other types of common costs that can be more efficiently solutioned.
How good can your estimates be? For small-to-large and simple-to-complex projects you can achieve good estimates within 5-10% of actuals. I’ve always considered a good estimate to be one that is within 5% of the actuals, especially for analytical projects. Your target tolerance may vary but you’ll be using a methodology that employs the analytics you are building for others.
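Triangulating the two estimates and checking the result against actuals can be expressed as two small calculations. This is a sketch under stated assumptions: taking the midpoint as the working estimate and flagging a gap above 20% are illustrative choices, and the hour figures are made up.

```python
def triangulate(top_down, bottom_up, max_gap=0.20):
    """Return (midpoint, relative gap, whether the two estimates roughly agree)."""
    midpoint = (top_down + bottom_up) / 2
    gap = abs(top_down - bottom_up) / midpoint
    return midpoint, gap, gap <= max_gap


def estimate_error(estimate, actual):
    """Absolute error of the estimate as a fraction of actual effort."""
    return abs(estimate - actual) / actual


# Hypothetical figures in hours.
working_estimate, gap, agree = triangulate(top_down=1000, bottom_up=1252)
print(working_estimate, round(gap, 3), agree)

# After the project: was the estimate within the 5% target?
print(estimate_error(working_estimate, actual=1180) <= 0.05)
```

A wide gap between the two views is a prompt to revisit the assumptions on both sides before the project starts, rather than a reason to simply average them.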
What strategies do you use to develop reliable estimates for your data warehousing project work efforts? Let me hear from you on this interesting topic.
Category Agile Analytics, Blog, Data Governance, Data Integration, Data Warehousing, Gregory Lampshire | Tags: bottom-up estimation, data governance, Data warehouse, ETL, healthcare analytics, predictive modeling, top-down estimation, utilization forecasting,