Ajilitee

Archive for the ‘Gregory Lampshire’ Category

Big data technologies in healthcare insurance (payers): nosql and MDM–part 1

Friday, April 20th, 2012

We were working with a client around Fraud, Waste and Abuse (FWA) recently and we needed to clean up the client’s Provider data to help us track longitudinal changes in fraud behavior. Some of the published reports suggest that FWA accounts for, minimally, $21B in Medicare payments that never should have been made. That’s a lot of money. I’ll blog more on FWA, analytics and how Ajilitee could help you under our Managed Analytics service offerings, but in this blog I wanted to focus on just one small aspect of managing Provider data at Payer companies.

Many healthcare insurance companies are not known for rapid innovation. But times have changed. The need to manage members using Customer Relationship Management (CRM) technologies has greatly increased and many Payers have started their CRM efforts. These CRM technologies, however, have been in use in other industries for 20 years. The technologies needed to support CRM analytics and member point of care/point of service interactions are different than what many Payer organizations have in place today.

Where do bigdata technologies play? We have been asked that question here at Ajilitee, and as part of working on our FWA products and internal R&D innovation, we have been using and looking at these technologies for a while.

Bigdata technologies span many areas and have interesting names: hadoop, hbase, hive, pig, nosql, mahout and more. Nosql and hadoop in particular are core technologies that other layers build on. For example, hive and pig build on hadoop. Hadoop itself can build on nosql technologies. I will not repeat all of these concepts in this blog because a lot of content has already been written on these technologies. From here on, I’ll assume you are familiar with some of these terms or can look them up easily.

Guiding Thoughts

Here are a few principles that can help guide the conversation:

  • Deep insights: The more you understand your domain-specific problem, the easier it is to adapt new technologies to solve it in novel ways or to recognize trade-offs you are implicitly making with current technologies. You can find novel uses that solve a problem in different ways. Think Geoffrey Moore and disruption.
  • Delamination: By rethinking the layers of technologies you use today, it’s possible to pick and choose the layers and how they combine together to create new solutions. Many nosql packages do not include compute engines within their data access and storage solution. So solutions such as hadoop must be layered on top of the nosql databases in order to obtain compute processing. There are a lot of variations to this thought, but just thinking “delamination” can be helpful around innovation.
  • Do something different: Unless you are really, really smart, a great way to learn about how to use the new technologies is to try something and fail multiple times to solve a problem. This learning by doing is key to innovation. Of course, you have to learn from your mistakes each time; failing by itself is pointless.

Scanning the bigdata Technical Landscape

There are many new technologies in the bigdata world. Let’s look at nosql. Many people question whether the technologies are mature enough for production use and whether they help them solve business problems faster, cheaper or better in some way.

Bigdata and nosql conversations usually start with explaining issues found in the world of managing and serving massive amounts of data needed for websites. But the technologies seem confusing at times and we begin to wonder if they apply to our problems, especially those in the healthcare world.

Nosql technologies are often quoted as having the following properties:

  • No sql: This means there is no sql!
  • Schema-less: There is no schema!
  • Eventually consistent: Forget ACID. Forget transactions. Eventually your data is consistent.
  • Fault tolerant, scalable, distributed: A whole number of really great architectural characteristics that sound good but you are not always sure apply to you.

Once you start looking at the bigdata technologies you are immediately struck by the fact that:

  • With some nosql technologies you actually do define a schema and indicate what a column’s type and name are. Some nosql technologies also need to know how to sort columns so they need knowledge of how to compare keys or values. That sounds like a schema!
  • With nearly all bigdata technologies, someone has already written a [insert technology name here] Query Language so you can run queries. This sounds like everyone still wants *SQL.
  • With some nosql technologies, you have to specify properties such as coherency, which indicates that you can get the value you just wrote back out of the database. While not the full definition of ACID, it starts sounding close!
  • There is a lot of parallelism everywhere. The file system is parallelized! Don’t forget that the job flow is parallelized as well! (assuming it fits a specific data parallel processing model). Everything is about scale-out—add more nodes and everything keeps running, the system stays up, and everyone has fast, guaranteed access!
  • There appear to be several interfaces into the database, some of which require programming in a language you may not be familiar with, e.g. not SQL! How do you even load data?! You almost feel like you are programming at the lowest level of database programming possible. Wait a second, which layer are you programming? The filesystem level? The map-reduce level? Or both?
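
For readers who have not seen the map-reduce level mentioned in the last bullet, here is a minimal sketch in plain Python. It is not hadoop code; the three in-memory lists stand in for data living on three nodes, and the field names and values are made up for illustration. The point is only the shape of the data parallel model: a small computation runs against each partition, and the partial results are merged afterwards.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical provider rows spread over three "nodes".
# Field names and values are illustrative only.
partitions = [
    [{"npi": "1", "state": "IL"}, {"npi": "2", "state": "TX"}],
    [{"npi": "3", "state": "IL"}],
    [{"npi": "4", "state": "TX"}, {"npi": "5", "state": "TX"}],
]


def map_partition(rows):
    """Map step: runs where the data lives and returns a small partial result."""
    counts = defaultdict(int)
    for row in rows:
        counts[row["state"]] += 1
    return dict(counts)


def merge_counts(left, right):
    """Reduce step: merge two partial results into one."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


# Each partition is processed independently, then the partials are combined.
partials = [map_partition(p) for p in partitions]
providers_by_state = reduce(merge_counts, partials, {})
print(providers_by_state)
```

In a real cluster the map step would be shipped to the nodes holding the data and only the small partial results would travel over the network, which is the part a framework like hadoop takes care of.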

This seems very confusing. So let’s think through the issues. We want to avoid having everything look like a nail because we have a bigdata hammer.

Healthcare Example

In the Payer space, Master Data Management (MDM) is finally becoming a component of the business and architectural landscape.  In the Payer world, MDM means managing Providers, Members, Contracts and Products and other business entities that you can often touch and feel or that are really considered non-claims data.  Other types of Payer data include “event” data. This is data generated by interactions with Members from sales, service and care oriented interactions. Of course, there is also claims data. Claims data is the largest source of data, followed by “event” data then MDM. Are some of the bigdata technologies relevant for even the small datasets such as MDM datasets? Small in this case means a few million rows and typically much less. Of course, this links back to our FWA problem statement at the beginning of the blog and the need to clean up our Provider list to perform Fraud analysis. We’ll illustrate some of our points with industry specifics.

  • Schema-less:
    • The NPPES Provider list from CMS changes multiple times a year. The list contains all of the Medicare Providers and some, but not all, of their demographics. The columns of data change, although not frequently. Providers are added or removed as appropriate. Provider data inside a Payer typically originates in several systems, sometimes up to 20. So it would seem that a technology that claims to be “schema-less” would be useful. But schema-less does not mean that you do not need to specify the data types of the data. You have to specify the format somewhere so external tools can use the data. The NPPES file has several sub-entities in it like addresses and other codes indicating the Provider’s specialty or whether the Provider is an individual or an organization. Shouldn’t we pull these sub-entities out and make them their own table? Shouldn’t we also try to specify where and how the data should be loaded to be efficiently accessed, perhaps by using table partitions, or striped volumes or other typical database designs? These are normal database design concepts.
    • Part of the value of being schemaless is that you tend to concentrate data together into denormalized structures and use it to answer a smaller set of business questions such as “what data changed between NPPES files each month?” And the data you load may be very dirty, so let’s load it all as strings, then convert the data in the database itself. We don’t have to work too hard to specify types, but we must specify some. We can also ignore doing detailed table design because most nosql databases are designed to scale out. Bigdata can help us push aside this operational complexity.
    • MDM data changes over time. For example, you may choose to append one external data vendor’s Provider demographic data one year, then switch the vendor the next. That’s a whole new set of data structures in the traditional world. Being schemaless allows us to manage data changing over time without having to reload or re-baseline to achieve acceptable performance. Hence, you can evolve the schema more easily and that’s a great reduction in operational complexity.
  • No-sql: We clearly need to write a query to determine what changed between different NPPES file releases. If we have to write a query, the entire claim of nosql must be false! The answer is more subtle than that. The claim of nosql is really one of not having many characteristics of traditional RDBMS databases built into the database layer. For example, you will not see nosql databases implementing referential integrity through sql constructs such as foreign keys. You do have to specify primary keys for some nosql databases just to help with managing the data. In fact, many systems today, whether a data warehouse or a transactional system, actually implement integrity in the processing layer above the database. This is neither wrong nor right, but just where it is often happening. Hence by saying it’s nosql, you are really saying that the data architecture is one where the data is more concentrated, where integrity is implemented in a processing layer and not the database, and where the data access interface makes as few assumptions about the data as possible in terms of its structure.
    • There is often another implication of schemaless that is less often recognized. Because nosql databases essentially delaminate the database stack to some degree, mathematical processing occurs outside the data layer. While nosql creates uniform access performance by keeping the interface simple and scaling out, it also does not allow computations to be automatically pushed down to an individual node for parallel processing. That’s where hadoop steps in. By teasing apart the computing part from the data access layer, you have to now choose where computing occurs. In the case of hadoop, that processing can occur on a node where the data lives (there is a Cassandra+Hadoop integration layer) or you can process the data in a controlled way, using uniform access performance to avoid overloading the compute server. This also means there are really not any stored procedures in nosql databases.
  • Eventually Consistent: In our specific case, eventually consistent is fine. Since we are loading the data and deduping and cleansing it initially, that’s not a big deal. But…
    • Let’s also think through the case of Provider MDM. In an enterprise MDM system, all transactional systems should reference, in real time, the MDM system to obtain authoritative data when they need it. The MDM data should be consistent. However, even in Payers today, the MDM data is not immediately consistent. There is an acceptable lag between one transactional system authoring some master data and another system being able to access it. Typically, the lag is a few hours or a day or a few days. In fact, if we look at nosql databases like Cassandra, it’s quite possible to improve the time to consistency at a significantly lower cost structure. For example, a social media site can tune its consistency, which means it can tune how fast you can see your new “friend’s” post. You will also want to tune the consistency you want for MDM and scale it up or down. You can do this in nosql technologies without incurring additional development time or complexity, all using the same database. That’s huge and compelling. Because enterprise MDM makes the MDM system an operational imperative, you have almost immediately solved some very vexing architecture problems at an incredibly inexpensive cost point.
  • Cool architecture: The previous bullets have already pointed out the need for scale and robustness so I will not repeat that here. But an additional thought may be worth pointing out. If you think about the data access patterns, where the architecture is concentrated to have an MDM hub that serves up authoritative data to many, many consuming applications (many reads, few writes) and this is all happening in relatively real-time, then scaling and fault-tolerance are actually key. In the MDM vision landscape, cool architecture is actually really important and your MDM hub does look more and more like a website serving up data. You need a transactional ODS. It’s also important to realize that you can only take advantage of the cool architecture if the other parts of your architecture are also simplified. Your mileage may vary if all you are doing is plugging in new technology into the same old landscape without any changes anywhere.
    • Because a cool architecture can be delaminated, we have to plan for how computations (queries) will be executed. You cannot automatically have the computations pushed down to a node without using hadoop or something similar. Otherwise all computation and IO gets throttled on the node you issue the query from. That’s one area of review and choice you have to think through and one place where hadoop, hive and pig try to help you think through. Other database engines that distribute the data and computations might make this much easier but you may have to make other architecture choices to use those tools. Think deeply and carefully about cool architecture.
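
To make the “tune the consistency” point from the Provider MDM bullet a bit more concrete, here is a small sketch of the arithmetic behind tunable consistency in a replicated store such as Cassandra. It does not talk to a database. The function names are my own, and the replication factor of 3 is an assumption for illustration. The rule it encodes is the usual one: if the number of replicas acknowledging a write (W) plus the number consulted on a read (R) exceeds the replication factor (N), every read overlaps the latest write.

```python
def replicas_for(level, replication_factor):
    """Translate a consistency level name into a replica count."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    w = replicas_for(write_level, replication_factor)
    r = replicas_for(read_level, replication_factor)
    return r + w > replication_factor


# Assumed replication factor of 3, purely for illustration.
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    ok = reads_see_latest_write(write_level, read_level, replication_factor=3)
    print(write_level, read_level, ok)
```

The trade-off is the one described above: an MDM hub that can tolerate a lag might read and write at the cheaper level, while a consumer that needs the authoritative value right away can ask for a stronger level on the same database.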

So based on some deeper thinking, it appears that the bigdata and nosql world can offer something of value even for a Provider MDM problem which seems like an ill-fit to begin with.
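
As a rough illustration of the “what changed between NPPES files each month?” question, here is a minimal sketch that loads two snapshots with every value kept as a string and compares them by provider id. The two inline snapshots, the column names and the function names are all made up for this example and are not the real NPPES layout. Note that the second snapshot has gained a column, which the comparison simply tolerates.

```python
import csv
import io

# Two tiny, made-up monthly snapshots. The column names are illustrative
# and are not the real NPPES layout.
MARCH = """npi,name,specialty
1001,Alice Smith,Cardiology
1002,Bob Jones,Dermatology
1003,Carol White,Oncology
"""
APRIL = """npi,name,specialty,entity_type
1001,Alice Smith,Cardiology,individual
1002,Bob Jones,Pediatrics,individual
1004,Dan Brown,Radiology,individual
"""


def load_snapshot(text, key="npi"):
    """Load every value as a string, keyed by provider id."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def diff_snapshots(old, new):
    """Return added ids, removed ids, and per-column changes for shared ids."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for npi in sorted(old.keys() & new.keys()):
        # Compare only columns present in both snapshots so that a newly
        # added column does not flag every provider as changed.
        shared = old[npi].keys() & new[npi].keys()
        delta = {
            col: (old[npi][col], new[npi][col])
            for col in sorted(shared)
            if old[npi][col] != new[npi][col]
        }
        if delta:
            changed[npi] = delta
    return added, removed, changed


added, removed, changed = diff_snapshots(load_snapshot(MARCH), load_snapshot(APRIL))
print(added, removed, changed)
```

Nothing here depends on declaring types or a table design up front, which is the operational simplification the schema-less discussion is getting at. At real file sizes you would want this comparison to run inside the data store or a compute layer rather than in one Python process.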

Summary

It appears that if the problem you are trying to solve is important enough to use these other technologies, there is some benefit to using them in the right mix and in the right proportions to your existing architecture. They are viable and, based on our experience at Ajilitee, can be made production ready. In some cases, they can dramatically reduce operating complexity despite their seeming lack of maturity around tooling. In many Payers, reducing operating complexity is a huge win.

In the next blog we will demonstrate learning by doing using bigdata technologies on larger Provider datasets and common healthcare processing analytical patterns. I’ll also return to the FWA theme.

As a treat, John Bair, our CTO, is speaking at TDWI’s Cool BI Forum in Chicago on May 8. He’ll be talking about these technologies and how they can help you. His talk is based on direct experiences from building products and solutions for our clients.

Thomas Kuhn, Stephen J. Gould and Why Data Governance is the New Pink

Tuesday, October 25th, 2011

I was sharing an “October” beer with John Bair (our CTO) last night and I was struck by a few thoughts in our conversation.

Back in college, I always enjoyed reading books by Thomas Kuhn. In 1962, Kuhn authored the book, The Structure of Scientific Revolutions. The book described how “paradigm shifts” occur in the scientific world. The core thought was that new ideas do not always immediately take root and become the new norm. For example, once Einstein proposed relativity, while it was certainly very exciting, it took many confirmations and an extended amount of time for the theory to be accepted. There have to be issues in the existing paradigm to cause people to question the current theories. It takes time to adopt the new, improved theory. A paradigm shift is rarely immediate.

At the same time, I remembered Stephen J. Gould’s theory of punctuated equilibrium (in evolutionary biology) that came out in the 1970s. In contrast to the idea that evolution was gradual, punctuated equilibrium said that sometimes large, infrequent events shift the slow-moving, evolutionary path. So, the key message is that things may be moving along, something big happens, and suddenly you are in a whole new world.

Well, perhaps data governance is like that. Perhaps data governance is really an underutilized organizational process. When I talk to clients about data governance, the conversation invariably turns to the different aspects of data: data is missing! Data is dirty! Only the business knows the real business rules! There are data errors!

Those are certainly all good and fun topics to talk about and those conversations can consume the entire day. However, there is another aspect of data governance we think is important. It’s about program management.

We think of data governance as composed of two tracks: the “data” part of the data governance program and the program management part of the data governance program. The program management part is often the most overlooked. Yes, there is a data governance steering committee and yes, there is a “leader” of the overall daily effort who reports to that committee. But the program management aspect of the data governance program is really a management process not unlike other governance programs such as IT portfolio management or “strategic projects” governance. For example, IT governance often helps with priorities, decisioning, budgeting and resolving resource issues, helps communicate to other parts of the organization, and bundles scope to form projects/programs.

In the data governance space, we think people often forget this important aspect.  A few areas of issues we have observed include:

  • Funding: The data governance program should act as a forum for obtaining funding either directly through itself or through other funding mechanisms such as integrating into other projects or proposing in other governance forums.
  • Bundling: Data governance should maintain a list of issues and smartly bundle those into projects to be funded and executed. Either direct data governance funding or other business/IT funding could be used.
  • Resourcing: For example, perhaps more training is needed for stewards. What resource can help with that task? Do we need to hire a consultant? Do we need to have metrics in place to track attendance and participation? Does HR need to get involved?
  • Communicating: The senior people on the committee need to use their organizational influence to help keep the data governance agenda front and center in other parts of the organization.
  • Statusing: Perhaps the data governance program needs rejuvenation. How do you get it back on track? The daily governance lead should be identifying program issues that remain unsolved and escalating them for guidance. There should be “program” type status each month in addition to just talking about data.

We think there are issues today that cannot be easily addressed by the traditional style data governance implementation. Changes are needed. The issues we see, listed above, are starting to push the boundaries of the current model. Perhaps it’s time for a paradigm shift.

Data governance can be used as a way to manage funding for data-related programs. It can be used to do more than just discuss daily data issues. It’s time for data governance to evolve. Instead of always lingering on just the data, it’s time for a landslide to happen and make data governance a real management force.

If a data governance program is only focused on data, it is probably too local to act as a problem solving capability. Let’s change it. Let’s take the traditional data governance program and make it something more relevant. Let’s ensure that the program aspects of data governance can help fund, help bundle and help communicate to the organization.

So what we have is an old thing, like data governance, playing a new role, becoming more relevant to the business and seeding innovation. We have seen issues in the current “theory” that need to be handled, so let’s change the current theory of data governance and ensure that the new model also emphasizes program management. With the paradigm shift in play, let’s start a landslide to kill off the old data governance programs and disrupt the equilibrium.

Ajilitee can help you do that.

Data governance is the new pink.

ACOs and aggregated data: Payer analytics comes to ACOs

Thursday, September 22nd, 2011

I was reading through 42 CFR 425, Medicare Program: Medicare Shared Savings Program: Accountable Care Organizations, yesterday in response to a client asking us about our Managed Services. I was specifically thinking about II(C)(2) and data sharing.

The rule discusses data sharing between ACO participants and the government.  Generally, the gist of the following few sections suggests allowing the Secretary to share claims information on Medicare beneficiaries both at the point of enrollment as well as on an ongoing basis. Why?

First, having claims data at the point of enrollment helps ACOs satisfy other parts of the rule around creating personalized care plans, helping with care coordination and other changes in care processes that need to be made for that beneficiary–all focused on patient centered principles. The data could be personally identifiable of course.

Finally, on an ongoing basis, the data would be provided in detailed as well as aggregated form to help ACOs understand the total universe of care. Since an ACO may not actually provide all care services for a beneficiary (although the goal is to do as much as possible), obtaining beneficiary level claims data helps ACOs understand the complete picture. The rule goes on to describe how this data would be used in aggregate and in detail (Sections 5 and 6):

  • Financial performance modeling
  • Utilization management
  • Clinical management
  • Quality reporting
  • Care management/Care coordination

For example, if the ACO’s beneficiary population has a high rate of readmission, then a program could be put in place to improve discharge coordination to reduce readmissions.
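
Here is a rough sketch of how a readmission rate like that might be computed from beneficiary-level stays. The stays are made up, the 30-day window is my assumption rather than something taken from the rule, and real quality measures apply exclusions and risk adjustment that this ignores.

```python
from datetime import date

# Made-up inpatient stays: (beneficiary id, admit date, discharge date).
stays = [
    ("A", date(2011, 1, 3), date(2011, 1, 7)),
    ("A", date(2011, 1, 20), date(2011, 1, 25)),  # 13 days after prior discharge
    ("B", date(2011, 2, 1), date(2011, 2, 4)),
    ("B", date(2011, 6, 1), date(2011, 6, 3)),    # well outside the window
    ("C", date(2011, 3, 10), date(2011, 3, 12)),
]


def readmission_rate(stays, window_days=30):
    """Share of discharges followed by another admission within the window."""
    by_beneficiary = {}
    for bene, admit, discharge in stays:
        by_beneficiary.setdefault(bene, []).append((admit, discharge))

    discharges = 0
    readmitted = 0
    for episodes in by_beneficiary.values():
        episodes.sort()  # order each beneficiary's stays by admit date
        for i, (_, discharge) in enumerate(episodes):
            discharges += 1
            if i + 1 < len(episodes):
                next_admit = episodes[i + 1][0]
                if (next_admit - discharge).days <= window_days:
                    readmitted += 1
    return readmitted / discharges if discharges else 0.0


rate = readmission_rate(stays)
print(f"{rate:.0%}")
```

The useful part for an ACO is that the calculation needs stays from every facility a beneficiary used, not only the ACO’s own, which is why the beneficiary-level claims feed matters.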

Looking over the intended uses, none of which are new per se, it struck me that a lot of Payer claims data analytics is now transitioning to the Provider community. Since ACOs are inherently about Providers, not Payers (which is the primary reason an ACO is not an HMO), the ACOs now need to do Payer analytics.

That’s what Ajilitee is really good at. And we can do this on a Managed Analytics basis so we can help ACOs rapidly get to market on ACO analytics at the beneficiary level. Most importantly, since we build our analytics environments on cloud computing platforms like Amazon, you only pay for what you need and it can grow to the size you want as the ACO grows.

That’s what struck me about Managed Analytics and the client conversation. With Ajilitee, Managed Analytics has great promise to deliver, cost-efficiently, the capabilities that Payers have today to the Provider community tomorrow.

Blues IM Symposium around the corner

Friday, September 16th, 2011

It’s September again and the Blue Cross Blue Shield Information Management (IM) Symposium is upon us September 25-28 in Dallas.  This year’s conference will bring 140+ IM executives together to explore a range of topics across IM, including best practices in data warehousing, information architecture, business intelligence, informatics and analytics.

Ajilitee is proud to be a Platinum sponsor at this year’s event, the highest level possible. We founded Ajilitee 18 months ago with the idea that exceptional execution and results coupled with innovative (dare I say agile!) thinking would help us win and keep clients.

We’ve had a great year so far and are serving some really great clients, one of whom will join us on the podium at IM Symposium.  Our client at Horizon Blue Cross Blue Shield of New Jersey will co-speak with my colleague and Ajilitee CTO John Bair on “Driving Immediate Value Through a Phased MDM Program.”   Ajilitee founder Tina McCoppin also will co-speak with Horizon BCBS in a breakfast talk, “Establish a Data Governance Council for Better, Faster Decision Making.”  Both case histories demonstrate how agile thinking leads to agile information solutions.

If you are in the Blues and attending, please stop by and say hello to the great team we’ll have there, including Tina McCoppin, John Bair, Diann Bilderback, David Grice, Michael Pooley, and Paul Vosters. I’ll be there as well.

Are system integrators poised to play a new role as analytics innovator?

Wednesday, May 18th, 2011

We were briefing Gartner analysts John Hagerty and Kurt Schlegel on Ajilitee strategy, plans, and client work. We shared with them our belief that markets, especially mid-markets, are moving towards a world of managed analytics–analytics that are performed by third parties like Ajilitee and which are tuned to a specific business process and contracted through a managed services agreement.

Our sister division Discovery Health Partners (DHP) is a great proof-point on how managed analytics can deliver a fully outsourced business process in a SaaS model, in this case for healthcare payers who want to recover claim overpayments and subrogate third-party liability. The DHP Intelligent Cost Containment platform runs completely on Amazon Web Services. We also shared that Ajilitee will deliver the entire information management stack to one healthcare company and will provide process-specific reference data to support business rule processing in “smart” claims processing to a second—both using Amazon’s cloud.

Near the end of the conversation, Gartner posed some questions for us. Do we think that traditional system integrators, such as the large IT consultancies, are all poised to become a new type of analytics provider to companies?  Well, we think the answer is no, and of course, not surprisingly, it’s also a yes. Here’s why.

NO

On the one hand, systems integrators have been trying to integrate analytics all along as part of building an annuity revenue stream and as an extension of their efforts to integrate applications and business process. In neither case have these efforts changed the game. Let me explain.

First of all, large consultancies have thrived on large packaged-application implementations: SAP, Oracle and the like.  These are huge, complex projects that typically span years, continents, and literally hundreds of millions of dollars. This is revenue they can count on, year after year—the concept of a revenue annuity. Instead of selling a new consulting project each year, which is a lot of work, why not have something locked in for a multi-year contract?

The packaged-application projects have almost always been about streamlining, standardizing, centralizing or otherwise improving business processes. To answer client demands in these multi-year commitments, the large application vendors have spent millions, perhaps billions, developing footprint extensions in the analytics space to try and help clients understand the business process metrics as well as the end results of the business process, e.g., how much money did you make last quarter? Some packages have even integrated more advanced data mining components. My old company, Thinking Machines, had a data mining product called Darwin that was sold to Oracle for just that reason.

So the idea of analytics closely coordinated with the business process software has always been an area of interest, an intention. However, while the analytics packages are okay, they are not always best-of-breed or useful in standalone situations. They are often considered footprint extensions of the main ERP application–they are not a focus.

Secondly, system integrators have always tried to build “solution sets” representing preintegrated business process solutions.  After all, system integrators are in a great position to understand the nuances of a client’s business process and inject analytics at the right place, in the right way, using the right tools. For years, going back to the time that I was at PricewaterhouseCoopers (PwC), producing these solution sets was a goal. Since I was in the management analytics (data mining) practice at the time, the idea was to produce analytical products that could be resold. We did not have a lot of tools, infrastructure was always hard to get allocated, and generally there were logistical issues that prevented us from realizing our dream.  The solution sets didn’t get built.

So, in one sense, the answer is no, system integrators are not a new type of analytics provider in the market. They have been trying to do this all along, both for the annuity revenue and as an extension of what they had always been working on with applications and the business process. Analytics has been on the agenda for a while.

YES

Alright, because we are also consultants, we also have to say yes. Here’s why.

System integrators are also a new analytics provider to companies because the tools and technologies, whether open source or commercial, are now becoming available at a scale that allows a system integrator to create, evolve and manage complex applications more easily and cheaply. In short, creating analytic solutions has become a viable business for them.

Several enablers make this possible. Cloud computing has greatly enhanced the capability to create sandboxes, analysis environments and scalable analytical production systems, like scoring engines. Open source and commercial analytics software have more flexible licenses or architectures–there is probably even a hadoop app for your iPhone coming out soon that taps into your top-10 friends’ smartphones. And of course, the web has opened new analytical output delivery mechanisms that were not envisioned a decade ago. Taken together, these evolutions make it easier to deliver results.

So there is also a “yes” answer to the question because, at long last, new technologies allow a system integrator to engage in this area in a way that makes economic sense to the clients.

Inside client organizations, challenges may be creating more opportunities for system integrators to step in as analytics partner. Analytical talent is still a scarce commodity and is shifting overseas. Creating analytics capabilities inside a company has been harder than clients envisioned, given budget, skillsets, and focus. At the same time, the speed of delivery (time to market) required to make an improvement in a company’s top or bottom line has increased. Data integration and data management, which are core competencies to power industrial-sized analytics, are hard work.

Considering Gartner’s question makes me smile. Yes, system integrators are making major noise in this space, and they are buying some companies here and there and forming internal groups to harvest their clients’ intellectual property and produce products. While they haven’t succeeded on a large scale in the past, things may be different in our evolving environment.

Ajilitee is already in this space and we are innovating.  We do this in the healthcare market today where security and privacy are paramount and we do it on a large scale across multiple clients. We do it with cloud computing technologies and our information management stack.  We do it because we have the business and technical talent to advise clients on how to make analytics work in their business process. We do it because we passionately believe that this approach helps our clients compete and improve their results. That’s what we are about.