Ajilitee

Archive for the ‘Information Management’ Category

Big data technologies in healthcare insurance (payers): nosql and MDM–part 1

Friday, April 20th, 2012

We were working with a client around Fraud, Waste and Abuse (FWA) recently and we needed to clean up the client’s Provider data to help us track longitudinal changes in fraud behavior. Some of the published reports suggest that FWA accounts for, minimally, $21B in Medicare payments that never should have been made. That’s a lot of money. I’ll blog more on FWA, analytics and how Ajilitee could help you under our Managed Analytics service offerings, but in this blog I wanted to focus on just one small aspect of managing Provider data at Payer companies.

Many healthcare insurance companies are not known for rapid innovation. But times have changed. The need to manage members using Customer Relationship Management (CRM) technologies has greatly increased and many Payers have started their CRM efforts. But these CRM technologies have been in use in other industries for 20 years. The technologies needed to support CRM analytics and member point of care/point of service interactions are different than what many Payer organizations have in place today.

Where do bigdata technologies play? We have been asked that question here at Ajilitee and, as part of working on our FWA products and internal R&D innovation, we have been using and looking at these technologies for a while.

Bigdata technologies span many areas and have interesting names: hadoop, hbase, hive, pig, nosql, mahout and more. Nosql and hadoop in particular are core technologies that other layers build on. For example, hive and pig build on hadoop. Hadoop itself can build on nosql technologies. I will not repeat all of these concepts in this blog because a lot of content has already been written on these technologies. From here on, I’ll assume you are familiar with some of these terms or can look them up easily.

Guiding Thoughts

Here are a few principles that can help guide the conversation:

  • Deep insights: The more you understand your domain-specific problem, the easier it is to adapt new technologies to solve it in novel ways or to recognize trade-offs you are implicitly making with current technologies. You can find novel uses that solve a problem in different ways. Think Geoffrey Moore and disruption.
  • Delamination: By rethinking the layers of technologies you use today, it’s possible to pick and choose the layers and how they combine to create new solutions. Many nosql packages do not include compute engines within their data access and storage solution. So solutions such as hadoop must be layered on top of the nosql databases in order to obtain compute processing. There are a lot of variations on this thought, but just thinking “delamination” can be helpful for innovation.
  • Do something different: Unless you are really, really smart, a great way to learn how to use the new technologies is to try something and fail multiple times to solve a problem. This learning by doing is key to innovation. Of course, you have to learn from your mistakes each time; failing by itself is pointless.

Scanning the bigdata Technical Landscape

There are many new technologies in the bigdata world. Let’s look at nosql. Many people question whether the technologies are mature enough for production use and whether they help them solve business problems faster, cheaper or better in some way.

Bigdata and nosql conversations usually start with explaining issues found in the world of managing and serving massive amounts of data needed for websites. But the technologies seem confusing at times, and we begin to wonder if they apply to our problems, especially those in the healthcare world.

Nosql technologies are often quoted as having the following properties:

  • No sql: This means there is no sql!
  • Schema-less: There is no schema!
  • Eventually consistent: Forget ACID. Forget transactions. Eventually your data is consistent.
  • Fault tolerant, scalable, distributed: A whole host of really great architectural characteristics that sound good but that you are not always sure apply to you.

Once you start looking at the bigdata technologies you are immediately struck by the fact that:

  • With some nosql technologies you actually do define a schema and indicate what a column’s type and name is. Some nosql technologies also need to know how to sort columns so they need knowledge of how to compare keys or values. That sounds like a schema!
  • With nearly all bigdata technologies, someone has already written a [insert technology name here] Query Language so you can run queries. This sounds like everyone still wants *SQL.
  • With some nosql technologies, you have to specify properties such as coherency, which indicates that you can get the value you just wrote back out of the database. While not the full definition of ACID, it starts sounding close!
  • There is a lot of parallelism everywhere. The file system is parallelized! Don’t forget that the job flow is parallelized as well! (assuming it fits a specific data parallel processing model). Everything is about scale-out—add more nodes and everything keeps running, the system stays up, and everyone has fast, guaranteed access!
  • There appear to be several interfaces into the database, some of which require programming in a language you may not be familiar with, e.g., not SQL! How do you even load data?! You almost feel like you are programming at the lowest level of database programming possible. Wait a second, which layer are you programming? The filesystem level? The map-reduce level? Or both?

This seems very confusing. So let’s think through the issues. We want to avoid having everything look like a nail because we have a bigdata hammer.

Healthcare Example

In the Payer space, Master Data Management (MDM) is finally becoming a component of the business and architectural landscape. In the Payer world, MDM means managing Providers, Members, Contracts, Products and other business entities that you can often touch and feel or that are really considered non-claims data. Other types of Payer data include “event” data. This is data generated by sales, service and care-oriented interactions with Members. Of course, there is also claims data. Claims data is the largest source of data, followed by “event” data, then MDM. Are some of the bigdata technologies relevant even for small datasets such as MDM datasets? Small in this case means a few million rows and typically far fewer. Of course, this links back to our FWA problem statement at the beginning of the blog and the need to clean up our Provider list to perform Fraud analysis. We’ll illustrate some of our points with industry specifics.

  • Schema-less:
    • The NPPES Provider list from CMS changes multiple times a year. The list contains all of the Medicare Providers and some, but not all, of their demographics. The columns of data change, although not frequently. Providers are added or removed as appropriate. Provider data inside a Payer typically originates in several systems, sometimes up to 20. So it would seem that a technology that claims to be “schema-less” would be useful. But schema-less does not mean that you do not need to specify the data types of the data. You have to specify the format somewhere so external tools can use the data. The NPPES file has several sub-entities in it, like addresses and other codes indicating the Provider’s specialty or whether the Provider is an individual or an organization. Shouldn’t we pull these sub-entities out and make them their own table? Shouldn’t we also try to specify where and how the data should be loaded to be efficiently accessed, perhaps by using table partitions, striped volumes or other typical database designs? These are normal database design concepts.
    • Part of the value of being schemaless is that you tend to concentrate data together into denormalized structures and use it to answer a smaller set of business questions such as “what data changed between NPPES files each month?” And the data you load may be very dirty, so let’s load it all as strings, then convert the data in the database itself. We don’t have to work too hard to specify types, but we must specify some. We can also skip detailed table design because most nosql databases are designed to scale out. Bigdata can help us push aside this operational complexity.
    • MDM data changes over time. For example, you may choose to append one external data vendor’s Provider demographic data one year, then switch the vendor the next. That’s a whole new set of data structures in the traditional world. Being schemaless allows us to manage data changing over time without having to reload or re-baseline to achieve acceptable performance. Hence, you can evolve the schema more easily and that’s a great reduction in operational complexity.
  • No-sql: We clearly need to write a query to determine what changed between different NPPES file releases. So the entire claim of nosql must be false! The answer is more subtle than that. The claim of nosql is really one of not having many characteristics of traditional RDBMS databases built into the database layer. For example, you will not see nosql databases implementing referential integrity through sql constructs such as foreign keys. You do have to specify primary keys for some nosql databases just to help with managing the data. In fact, many systems today, whether a data warehouse or a transactional system, actually implement integrity in the processing layer above the database. This is neither wrong nor right, but just where it is often happening. Hence by saying it’s nosql, you are really saying that the data architecture is one where the data is more concentrated, where integrity is implemented in a processing layer and not the database, and where the data access interface makes as few assumptions about the structure of the data as possible.
    • There is often another implication of schemaless that is less often recognized. Because nosql databases essentially delaminate the database stack to some degree, mathematical processing occurs outside the data layer. While nosql creates uniform access performance by keeping the interface simple and scaling out, it also does not allow computations to be automatically pushed down to an individual node for parallel processing. That’s where hadoop steps in. Because the computing part has been teased apart from the data access layer, you now have to choose where computing occurs. In the case of hadoop, that processing can occur on a node where the data lives (there is a Cassandra+Hadoop integration layer) or you can process the data elsewhere, using the uniform access performance to control the load and avoid overloading the compute server. This also means there are really not any stored procedures in nosql databases.
  • Eventually Consistent: In our specific case, eventually consistent is fine. Since we are loading the data and deduping and cleansing it initially, that’s not a big deal. But…
    • Let’s also think through the case of Provider MDM. In an enterprise MDM system, all transactional systems should reference, in real time, the MDM system to obtain authoritative data when they need it. The MDM data should be consistent. However, even in Payers today, the MDM data is not immediately consistent. There is an acceptable lag between one transactional system authoring some master data and another system being able to access it. Typically, the lag is a few hours or a day or a few days. In fact, if we look at nosql databases like Cassandra, it’s quite possible to improve the time to consistency at a significantly lower cost structure. For example, a social media site can tune its consistency, which means it can tune how fast you can see your new “friend’s” post. You will also want to tune the consistency you want for MDM and scale it up or down. You can do this in nosql technologies without incurring additional development time or complexity, all using the same database. That’s huge and compelling. Because enterprise MDM makes the MDM system an operational imperative, you have almost immediately solved some very vexing architecture problems at an incredibly inexpensive cost point.
  • Cool architecture: The previous bullets have already pointed out the need for scale and robustness so I will not repeat that here. But an additional thought may be worth pointing out. If you think about the data access patterns, where the architecture is concentrated to have an MDM hub that serves up authoritative data to many, many consuming applications (many reads, few writes) and this is all happening in relatively real-time, then scaling and fault-tolerance are actually key. In the MDM vision landscape, cool architecture is actually really important and your MDM hub does look more and more like a website serving up data. You need a transactional OD. It’s also important to realize that you can only take advantage of the cool architecture if the other parts of your architecture are also simplified. Your mileage may vary if all you are doing is plugging new technology into the same old landscape without any changes anywhere.
    • Because a cool architecture can be delaminated, we have to plan for how computations (queries) will be executed. You cannot automatically have the computations pushed down to a node without using hadoop or something similar. Otherwise all computation and IO gets throttled on the node you issue the query from. That’s one area of review and choice you have to think through and one place where hadoop, hive and pig try to help. Other database engines that distribute the data and computations might make this much easier but you may have to make other architecture choices to use those tools. Think deeply and carefully about cool architecture.
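
The “what data changed between NPPES files each month?” question from the schema-less discussion can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not our production approach: the column names (NPI, Provider Name, Specialty, State) and the two tiny in-memory releases are hypothetical stand-ins for real NPPES files, and every value is kept as a string, as suggested above.

```python
import csv
import io

# Hypothetical stand-ins for two monthly releases; real files have many more
# columns and rows. Note that the second release adds a "State" column.
MARCH = """NPI,Provider Name,Specialty
1000000001,ALPHA CLINIC,FAMILY
1000000002,BETA LABS,LAB
"""
APRIL = """NPI,Provider Name,Specialty,State
1000000001,ALPHA CLINIC,GENERAL,IL
1000000003,GAMMA CARE,FAMILY,TX
"""


def load_release(text, key="NPI"):
    """Load a release as {key: row}, keeping every value as a string."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def diff_releases(old, new):
    """Return added keys, removed keys, and the changed columns per key."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = {}
    for npi in old.keys() & new.keys():
        # Compare over the union of columns so a new column is not an error.
        columns = old[npi].keys() | new[npi].keys()
        delta = {
            c: (old[npi].get(c), new[npi].get(c))
            for c in columns
            if old[npi].get(c) != new[npi].get(c)
        }
        if delta:
            changed[npi] = delta
    return added, removed, changed


added, removed, changed = diff_releases(load_release(MARCH), load_release(APRIL))
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Because rows are plain dictionaries of strings, a column that appears in a later release (State in this made-up data) simply shows up as a change from a missing value, with no table redesign or reload. That is the operational simplification the bullets above describe, at toy scale.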

So based on some deeper thinking, it appears that the bigdata and nosql world can offer something of value even for a Provider MDM problem, which at first seems like a poor fit.
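
The idea of tuning consistency up or down, mentioned in the Provider MDM discussion, can be illustrated with the usual quorum rule for replicated stores: with N replicas, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The sketch below is plain Python arithmetic, not the Cassandra API; the function names and the example settings are assumptions made for illustration.

```python
def read_sees_latest_write(replicas, write_acks, read_acks):
    """True when every read quorum must overlap every write quorum (R + W > N)."""
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("acks must be between 1 and the replica count")
    return read_acks + write_acks > replicas


def quorum(replicas):
    """Smallest majority of the replicas."""
    return replicas // 2 + 1


N = 3  # hypothetical replication factor for the MDM hub
settings = {
    "fast, eventually consistent": (1, 1),
    "quorum writes and reads": (quorum(N), quorum(N)),
    "write all, read one": (N, 1),
}
for name, (w, r) in settings.items():
    immediate = read_sees_latest_write(N, w, r)
    print(f"{name}: W={w} R={r} read-your-write guaranteed={immediate}")
```

The same data can be read with the first setting for bulk cleansing jobs and with a stricter setting for systems that need authoritative Provider data right away, which is the sense in which consistency is tuned rather than redesigned.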

Summary

It appears that if the problem you are trying to solve is important enough to use these other technologies, there is some benefit to using them in the right mix and in the right proportions to your existing architecture. They are viable and, based on our experience at Ajilitee, can be made production ready. In some cases, they can dramatically reduce operating complexity despite their seeming lack of maturity around tooling. In many Payers, reducing operating complexity is a huge win.

In the next blog we will demonstrate learning by doing using bigdata technologies on larger Provider datasets and common healthcare processing analytical patterns. I’ll also return to the FWA theme.

As a treat, John Bair, our CTO, is speaking at TDWI’s Cool BI Forum in Chicago on May 8. He’ll be talking about these technologies and how they can help you. His talk is based on direct experiences from building products and solutions for our clients.

Breaking the Wall Between Business and IT

Tuesday, March 13th, 2012

“Mr. Gorbachev, tear down this wall!”

When the Great Communicator made that proclamation, a wall fell. Within months, the world became a different place.

In many organizations, a more resilient wall stands strong. Leveling the divide between business and IT is no mean feat. In fact, technology created and defined that divide. In somewhat of a twist of fate, technology has surprisingly become the enabler, the communicator. Technology available now can help drive collaboration in the business intelligence arena in ways we couldn’t imagine five years ago.

Data profiling is one area where tools get business and IT teams talking. My last blog discussed conducting joint data quality review sessions using data profiling tools as an accelerator. That approach enables a rich conversation between business and IT – a groupthink that creates a high degree of collaboration and trust between the parties that produce and consume data.

Similar collaboration tools are available for monitoring ongoing data quality, maintaining master data, building business glossaries, validating business rules, and other tasks. The tools leverage maturing technology and clever design to enable and enrich the tasks of stewards. These stewardship activities have been difficult to define and harder to implement – until now.

The clarion call here is not about simply deploying new technology – we’ve all been down that road before. Neither do our goals include making every business user a tool jockey. We’ve missed the point if we haven’t grasped the incredible attention to detail required to smooth over the business-IT divide. Tools used with patience and care gracefully handle these kinds of details. Our goal is to obscure the line between technology and business, while eliciting the responsibilities of stewardship.

Using such tools sets the organization into a virtuous cycle of continuous data quality improvement. The business users work with intuitive interfaces that mask the underlying complexities of data, master data and metadata. The tools simplify presentation of information, evaluation of options, and acceptance of inputs. IT gets the feedback needed from the business to improve data knowledge and quality. Players on both sides of the wall gain a deeper appreciation for the need to communicate effectively, while managing data as a corporate asset.

We can turn our organizations into entirely different places by leveraging these capabilities to systematically chip away the divide between business and IT.

This wall, too, is destined to fall.

Blues IM Symposium around the corner

Friday, September 16th, 2011

It’s September again and the Blue Cross Blue Shield Information Management (IM) Symposium is upon us September 25-28 in Dallas.  This year’s conference will bring 140+ IM executives together to explore a range of topics across IM, including best practices in data warehousing, information architecture, business intelligence, informatics and analytics.

Ajilitee is proud to be a Platinum sponsor at this year’s event, the highest level possible. We founded Ajilitee 18 months ago with the idea that exceptional execution and results coupled with innovative (dare I say agile!) thinking would help us win and keep clients.

We’ve had a great year so far and are serving some really great clients, one of whom will join us on the podium at IM Symposium.  Our client at Horizon Blue Cross Blue Shield of New Jersey will co-speak with my colleague and Ajilitee CTO John Bair on “Driving Immediate Value Through a Phased MDM Program.”   Ajilitee founder Tina McCoppin also will co-speak with Horizon BCBS in a breakfast talk, “Establish a Data Governance Council for Better, Faster Decision Making.”  Both case histories demonstrate how agile thinking leads to agile information solutions.

If you are in the Blues and attending, please stop by and say hello to the great team we’ll have there, including Tina McCoppin, John Bair, Diann Bilderback, David Grice, Michael Pooley, and Paul Vosters. I’ll be there as well.

Are system integrators poised to play a new role as analytics innovator?

Wednesday, May 18th, 2011

We were briefing Gartner analysts John Hagerty and Kurt Schlegel on Ajilitee strategy, plans, and client work. We shared with them our belief that markets, especially mid-markets, are moving towards a world of managed analytics–analytics that are performed by third parties like Ajilitee and which are tuned to a specific business process and contracted through a managed services agreement.

Our sister division Discovery Health Partners (DHP) is a great proof-point on how managed analytics can deliver a fully outsourced business process in a SaaS model, in this case for healthcare payers who want to recover claim overpayments and subrogate third-party liability. The DHP Intelligent Cost Containment platform runs completely on Amazon Web Services. We also shared that Ajilitee will deliver the entire information management stack to one healthcare company and will provide process-specific reference data to support business rule processing in “smart” claims processing to a second—both using Amazon’s cloud.

Near the end of the conversation, Gartner posed some questions for us. Do we think that traditional system integrators, such as the large IT consultancies, are all poised to become a new type of analytics provider to companies?  Well, we think the answer is no, and of course, not surprisingly, it’s also a yes. Here’s why.

NO

On the one hand, systems integrators have been trying to integrate analytics all along as part of building an annuity revenue stream and as an extension of their efforts to integrate applications and business process. In neither case have these efforts changed the game. Let me explain.

First of all, large consultancies have thrived on large packaged-application implementations: SAP, Oracle and the like.  These are huge, complex projects that typically span years, continents, and literally hundreds of millions of dollars. This is revenue they can count on, year after year—the concept of a revenue annuity. Instead of selling a new consulting project each year, which is a lot of work, why not have something locked in for a multi-year contract?

The packaged-application projects have almost always been about standardizing, centralizing or otherwise improving business processes. To answer client demands in these multi-year commitments, the large application vendors have spent millions, perhaps billions, developing footprint extensions in the analytics space to try to help clients understand the business process metrics as well as the end results of the business process, e.g., how much money did you make last quarter? Some packages have even integrated more advanced data mining components. My old company, Thinking Machines, had a data mining product called Darwin that was sold to Oracle for just that reason.

So the idea of analytics closely coordinated with the business process software has always been an area of interest, an intention. However, while the analytics packages are okay, they are not always best-of-breed or useful in standalone situations. They are often considered footprint extensions of the main ERP application–they are not a focus.

Secondly, system integrators have always tried to build “solution sets” representing preintegrated business process solutions.  After all, system integrators are in a great position to understand the nuances of a client’s business process and inject analytics at the right place, in the right way, using the right tools. For years, going back to the time that I was at PricewaterhouseCoopers (PwC), producing these solution sets was a goal. Since I was in the management analytics (data mining) practice at the time, the idea was to produce analytical products that could be resold. We did not have a lot of tools, infrastructure was always hard to get allocated, and generally there were logistical issues that prevented us from realizing our dream.  The solution sets didn’t get built.

So, in one sense, the answer is no, system integrators are not a new type of analytics provider in the market. They have been trying to do this all along–for the annuity revenue and as an extension of what they had always been working on with applications and the business process. Analytics has been on the agenda for a while.

YES

Alright, because we are also consultants, we also have to say yes. Here’s why.

System integrators are also a new analytics provider to companies because the tools and technologies, whether open source or commercial, are now becoming available at a scale that allows a system integrator to create, evolve and manage complex applications more easily and cheaply. In short, creating analytic solutions has become a viable business for them.

Several enablers make this possible. Cloud computing has greatly enhanced the capability to create sandboxes, analysis environments and scalable analytical production systems, like scoring engines. Open source and commercial analytics software have more flexible licenses or architectures–there is probably even a hadoop app for your iPhone coming out soon that taps into your top-10 friends’ smartphones. And of course, the web has opened new analytical output delivery mechanisms that were not envisioned a decade ago. Taken together, these evolutions make it easier to deliver results.

So there is also a “yes” answer to the question because, at long last, new technologies allow a system integrator to engage in this area in a way that makes economic sense to the clients.

Inside client organizations, challenges may be creating more opportunities for system integrators to step in as analytics partner. Analytical talent is still a scarce commodity and is shifting overseas. Creating analytics capabilities inside a company has been harder than clients envisioned, given budget, skillsets, and focus. At the same time, the speed of delivery (time to market) required to make an improvement in a company’s top or bottom line has increased. Data integration and data management, which are core competencies needed to power industrial-sized analytics, are hard work.

Considering Gartner’s question makes me smile. Yes, system integrators are making major noise in this space, and they are buying some companies here and there and forming internal groups to harvest their clients’ intellectual property and produce products. While they haven’t succeeded on a large scale in the past, things may be different in our evolving environment.

Ajilitee is already in this space and we are innovating.  We do this in the healthcare market today where security and privacy are paramount and we do it on a large scale across multiple clients. We do it with cloud computing technologies and our information management stack.  We do it because we have the business and technical talent to advise clients on how to make analytics work in their business process. We do it because we passionately believe that this approach helps our clients compete and improve their results. That’s what we are about.

Pragmatic Data Governance—Let’s Get Real, People

Tuesday, January 11th, 2011

I’ve had a number of clients and colleagues comment on how difficult it is to make Data Governance programs achieve the expected traction in the first year. The learning curve is tough for those new to the concept – with data stewards bearing the brunt of the organization’s expectations for “making things happen.” The pundits, industry experts and Internet community have white papers, blogs, and articles a-plenty providing advice and direction on best practices, things to do and things to avoid, but there exists a void in specific, actionable steps to create a strong foundation. Without these steps, our big ideas and enthusiasm fade in the wake of disappointment and frustration. What’s needed is an approach I call “Pragmatic Data Governance.” Let’s keep the grand plan in our long view but let’s start with some good, solid basics we can build on. I believe Pragmatic Data Governance solutions have some of these characteristics:

• Low cost “must have” tools for Data Governance
• Data Governance service level agreements
• Measuring (and telling leadership about) effectiveness of Data Governance
• Certified data (or what exactly is the data you should certify?)

I’d like to explore how these pragmatic approaches to Data Governance serve as the foundational building blocks for a successful Data Governance program. If you’ve walked this path and have some learnings to share, let me hear from you. Meanwhile, stay tuned for posts on my top four actionable steps to Pragmatic Data Governance.