Ajilitee

Archive for the ‘Steve Knutson’ Category

Rethinking How to System Test Your BI Project, Part 7: Pass Functional Canary Testing Before Moving Code to the Test Platform

Friday, December 2nd, 2011

The best-executed system testing happens prior to the project ‘test phase.’ If you think that’s a catch-22, you’re right. How can system testing happen prior to the completion of development, before the ‘test phase’ even begins? In fact, if you read my last six blogs and followed my lead, you already have the prerequisites in the form of carefully crafted system test sets built for rapid execution and validation. For more guidelines, read on.

Move the code into the test environment after it has been thoroughly system tested.

I’ve never understood why projects rush to get code into the test environment. Think of the constraints on our working environments. Development is owned and controlled by the developers. Test, on the other hand, is locked down, typically by the QA team (and by all rights should be).  Working through functional canary tests in development is quicker, easier, and cheaper.

Execute the vast majority of system tests on the development integration platform.

A common practice is to migrate work from developer “sandboxes” to a development “integration” area, where code is assembled into a BI solution. The integration area is the ideal test platform. That move is at least an order of magnitude easier than moving the code out of your freedom-loving development environment to the lockdown on test.

Make system testing a formality.

Development of system test materials, scripts, and the test harness needs to coincide with the arrival of unit code into the development integration environment. Require the development team to demonstrate successful system testing within the development integration area BEFORE moving the code into the test area. The goal is to make system testing a formality. (Mindful readers will note that I stated just the opposite in part #5. The rule only applies if you complete the majority of system testing on development.)

Regression test by design.

Too many projects focus on passing failed tests while ignoring previously passed tests. The fallacy is that testing is complete when code fixes pass a retest. The reality is that code fixes can impact code that ran successfully in the past, causing test failure. Every set of code fixes mandates a test rerun. Fortunately, our automated and fast-running functional canary test data sets mean you can afford lots of test cycles (see part #6).

Every incremental code change requires the addition of a small set of tests to the functional canary test dataset. The entire test dataset runs together to verify that the incremental change and all prior code perform as expected. Each functional canary test set run validates the entire code base. This ‘black box’ treatment of the code (see part #2) ensures that every logic path – old and new – is retested each time the code is changed.
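
Here is a minimal sketch of that idea in Python. The rule under test (`classify_order`), the case list, and every other name are hypothetical stand-ins of mine, not part of any real project. The point is only that the case list grows with each change and the whole list is replayed on every run.

```python
from typing import Callable, NamedTuple


class CanaryCase(NamedTuple):
    name: str          # which business rule the case exercises
    source_row: dict   # input record fed to the code under test
    expected: object   # value the rule should produce


def classify_order(row: dict) -> str:
    """Hypothetical stand-in for the code under test (treated as a black box)."""
    if row["amount"] < 0:
        return "RETURN"
    return "LARGE" if row["amount"] >= 1000 else "STANDARD"


# The canary set only grows: release 2 appended a case, nothing was removed.
CANARY_CASES = [
    CanaryCase("standard order", {"amount": 50}, "STANDARD"),   # release 1
    CanaryCase("large order", {"amount": 1000}, "LARGE"),       # release 1
    CanaryCase("negative amount", {"amount": -5}, "RETURN"),    # release 2
]


def run_canary_suite(rule: Callable[[dict], object], cases: list) -> list:
    """Replay every case, old and new, and return the names of the failures."""
    return [c.name for c in cases if rule(c.source_row) != c.expected]


if __name__ == "__main__":
    print("failures:", run_canary_suite(classify_order, CANARY_CASES))
```

A fix that breaks an older rule shows up in the same run as the new cases, because nothing is ever tested in isolation.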

This is regression testing by design.

System testing expands to encompass new rules as the code base matures. Our functional canary data sets and scripts expand in scope and precision as the code base matures. The code base and system testing mature in lockstep, all outside the confines of the test platform.

Rerun your test scripts on the system test environment to validate code promotion.

Typical project plans deliver system test scripts and data sets just in time for system testing. Initial migration from development to test uncovers defects related to migration. Errors are found in the setup of system test data sets and test scripts – even if the code is working perfectly. This happens when the migration process, the test setup, and the code base are not isolated from one another. Discovering the source of issues is going to take time – and yet more migrations. This is an expensive way to validate testing and migration.

Let’s say the code passes all system tests prior to migration to the system test environment. You will know (in development) how well the code is working. The test scripts will be debugged before they reach the test environment. After promotion, repoint the test harness to the new environment and the right datasets and you’re ready to rerun system testing on the test box. The system tests will now focus on the validity of migration and setup of the test environment.  Any differences can be attributed to those issues, not bugs in the code or test scripts. This can reduce from weeks to days the time it takes to validate that the migration process works correctly. This is a cheap and expeditious way to validate migration.

Success is in the timing and coordination.

System testing done this way requires some TLC and more attention from management. Development of the test scripts and the test harness needs to be timed with the development and integration of the solution components. More coordination is required among the people writing code, creating test cases, and building test data.

Rethinking system testing

The testing process described in the seven parts of this blog may seem counterintuitive, given that what is proposed is the completion of most system testing before code touches the test box. You’ll need to rethink how your team conducts testing, to consider the content and timing of test scripts and data sets, to offer guidance to the development and testing teams, and to find a skilled developer to tie the tests and validations together with appropriate automation (see part #6). None of these is beyond the skills of an experienced development team working with a good project manager.

This approach to system testing can help hit project objectives and deliver the project sooner, to the kudos of management and the end user community.

Good luck with your system testing efforts.

Please share your thoughts about this blog, or relate your own experiences.

- Jim Van de Water contributed to this blog.

Rethinking How to System Test Your BI Project, Part 6: Design Testing to be Evaluated Automatically

Monday, November 7th, 2011

Best practice tells us that complex solutions require comprehensive system testing. The idea that one or two system test cycles will validate the solution is incorrect, worth no more than a view into an opaque crystal ball.

In other words, here comes the painful part. Well, then again, maybe not.

Here is what you might expect to see for a well-vetted solution:

  • 20 to 100 system test executions during development
  • 3-5 system test executions during testing
  • 2-3 system test executions during deployment to production

Let’s rephrase it: a BI project should plan for 25 to 100+ system test cycles.

I know what you’re thinking – that’s a LOT of tests.

This is not only possible, it will also help ensure your longevity at the company. You need to design system testing to be executed and evaluated quickly – in fact, very quickly. That’s why functional canary testing is so vital – it’s quick and thorough and it needs to occur in every system test cycle. That’s also why we’re going to build automation into our system testing routine.

Test counts might look something like this:

  • 100 Functional canary
  • 20   Large data sets
  • 60   Incremental
      • 50 Incremental functional canary
      • 10 Incremental large data

You may also have noticed that the system testing is heavily skewed into development. Is this your concept of system testing? Read on.

Most system testing cycles should occur while the code is in the development environment. System testing executed in the development environment mitigates the conditions we discussed in the last blog, “The Case for Thorough System Testing”. When system testing occurs after development, your code will be subject to a vicious cycle of unexpected new rules, recoding, and retesting.

The concept of system testing is often tightly coupled with Quality Assurance. QA generally starts their efforts when development is completed. Old school! Your project must find a way to have QA teams develop and/or approve system test data sets and methods to execute during development. This is a critical concept that project leadership typically struggles with or simply does not understand.

Target functional canary test data set runs to complete in thirty minutes or less. Test cycles include setup, execution, and validation of test results. The big challenge is to complete the result validations in that timeframe. Traditional system tests are time consuming as they require humans to compare results against expectations. Manual test validations can take upwards of two days to complete. Sorry, that simply won’t meet our timeline when we have dozens of tests to execute. Efficient and fast system testing and validation require automation.

One test concept automates system test setup and cleanup using a “test harness.” A test harness includes logic that differentiates between all of our test conditions – functional canary data set and large data set runs, as well as initial and incremental data set runs. A good test harness lets the tester select the type of test run and the locations of source and target data.
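
To make that concrete, here is a rough Python sketch of the choices such a harness might expose. The class names, the step wording, and the locations (`dev_src`, `dev_mart`) are all my own assumptions for illustration. A real harness would call your ETL tool instead of returning strings.

```python
from dataclasses import dataclass
from enum import Enum


class DataSetKind(Enum):
    FUNCTIONAL_CANARY = "functional_canary"
    LARGE = "large"


class LoadKind(Enum):
    INITIAL = "initial"
    INCREMENTAL = "incremental"


@dataclass(frozen=True)
class RunConfig:
    """Everything the tester selects for one run (hypothetical fields)."""
    data_set: DataSetKind
    load: LoadKind
    source_location: str
    target_location: str


def plan_run(config: RunConfig) -> list:
    """Return the ordered setup, execution, and cleanup steps for one test run."""
    steps = []
    if config.load is LoadKind.INITIAL:
        # An initial load must start from an empty target.
        steps.append(f"truncate target {config.target_location}")
    steps.append(f"stage {config.data_set.value} data from {config.source_location}")
    steps.append(f"execute {config.load.value} load into {config.target_location}")
    steps.append("validate results")
    steps.append("clean up staged data")
    return steps


if __name__ == "__main__":
    cfg = RunConfig(DataSetKind.FUNCTIONAL_CANARY, LoadKind.INITIAL, "dev_src", "dev_mart")
    for step in plan_run(cfg):
        print(step)
```

Because the source and target locations are plain parameters, pointing the same harness at a different environment is a configuration change rather than a rewrite.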

Test validation is done using a preconfigured setup of test target data sets. This could be actual data processing results (e.g. a data mart schema) or a test view generated on top of the actual data processing results (e.g. summary derived calculations from the data mart). A data set defines the test, the table/view, row/column, expected value, when the test executes, the test run, actual data values and test results. Manual reads are replaced with an algorithm that reads and writes to the above dataset, finds the comparison value, records the test cycle and results, and calls out variances. It’s an elegant system that captures all the required metadata and results for our developers to analyze just the unexpected values.
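
One possible shape for that algorithm is sketched below, using Python’s built-in `sqlite3` module and an in-memory database. The table and column names (`test_expectation`, `test_result`, `sales_mart`) are invented for this example, so treat it as an illustration of the pattern and not as a prescribed schema.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create a tiny data mart plus the expectation and result tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_mart (customer_id TEXT, total_sales REAL);
        INSERT INTO sales_mart VALUES ('C1', 150.0), ('C2', 75.5);
        CREATE TABLE test_expectation (
            test_name TEXT, target_table TEXT, key_column TEXT,
            key_value TEXT, value_column TEXT, expected_value REAL);
        INSERT INTO test_expectation VALUES
            ('C1 total', 'sales_mart', 'customer_id', 'C1', 'total_sales', 150.0),
            ('C2 total', 'sales_mart', 'customer_id', 'C2', 'total_sales', 80.0);
        CREATE TABLE test_result (
            run_id INTEGER, test_name TEXT, expected_value REAL,
            actual_value REAL, passed INTEGER);
    """)
    return conn


def validate(conn: sqlite3.Connection, run_id: int) -> list:
    """Compare expected and actual values, record every result, return variances."""
    expectations = conn.execute(
        "SELECT test_name, target_table, key_column, key_value, "
        "value_column, expected_value FROM test_expectation ORDER BY rowid"
    ).fetchall()
    variances = []
    for name, table, key_col, key_val, val_col, expected in expectations:
        # Table and column names come from the trusted expectation table,
        # never from user input, so they are interpolated directly.
        row = conn.execute(
            f"SELECT {val_col} FROM {table} WHERE {key_col} = ?", (key_val,)
        ).fetchone()
        actual = None if row is None else row[0]
        passed = actual == expected
        conn.execute(
            "INSERT INTO test_result VALUES (?, ?, ?, ?, ?)",
            (run_id, name, expected, actual, int(passed)),
        )
        if not passed:
            variances.append((name, expected, actual))
    return variances


if __name__ == "__main__":
    for name, expected, actual in validate(build_demo(), run_id=1):
        print(f"VARIANCE {name}: expected {expected}, got {actual}")
```

Every comparison is written to `test_result` with its run id, so passes are on record too, while only the variances are handed back for a developer to look at.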

Your team has a lot of work ahead planning and executing a series of system test cycles in development, QA and production. How do you plan to get all of this done in a reasonable timeframe?

My next blog will discuss additional guidelines for successful system test execution.

- Jim Van de Water contributed to this blog.

Rethinking How to System Test Your BI Project, Part 5: The Case for Thorough System Testing

Wednesday, November 2nd, 2011

My last couple of blogs set the stage for building test data sets of the right size and content. It’s time to take a breather to understand why system testing is so critical to our BI solutions.

Imagine if you will a crystal ball on the desk in front of you.

Inside the orb, you can view your project performance.  You see a project that gathers all the business rules correctly and understands and mitigates data quality issues. You see developers implement complex rules and properly integrate and test their code. At completion, a satisfied business user community praises the BI team.

Unlike the crystal ball, where all the business rules are known and data quality is predictable and resolved flawlessly, reality can be opaque and imperfect.  System testing can help us address both the substantial and subtle obstacles.

System testing highlights shortcomings in our business rules and brings unforeseen data quality issues to light.

The validity and scope of the business rules defined for a BI effort may not be apparent until system testing starts. This maxim doesn’t prevent our diligent business analysts and development teams from offering up their best attempts to fill out incomplete, vague, or poorly written business rules. Challenges with data quality can appear anywhere. Unit testing rarely exposes such problems. Neither does cursory system testing. No business analyst or user can anticipate all possible data conditions, so the rules defined to address data quality will come up incomplete at best, or even incorrect.

System testing authenticates our designs.

Complexity motivates our designers to break solutions apart. Similar logic is consolidated rather than distributed for the sake of consistency and maintenance.  Performance considerations mandate certain expensive logic be performed first. The design solution was parsed out amongst the development team. Our competent developers completed and unit tested their code components, but assembly concealed unexpected ‘system’ behaviors. Unit testing does not address these anomalies.  The interplay of independently developed code sets within a designed solution can cause our work to miss the mark. The business goals can become anything but transparent in the designed solution.

Forget the idea that system testing is just a formal validation of development. It’s not. Done right, system testing will validate your business rules, elicit acceptable data quality, and prove out the solution design. All those steps are essential if you want to deliver a trustworthy solution to your end users.

What about that crystal ball sitting on your desk?

Use it to magnify, not mask, project issues.

My next blog will explain how to automate your system data sets.

- Jim Van de Water contributed to this blog.

Rethinking How to System Test Your BI Project, Part 4: Build Initial and Incremental Data Sets

Wednesday, October 19th, 2011

Many BI efforts focus on testing the initial data load. Only the initial data load. We know BI solutions can behave differently depending on the presence or absence of data in the target. Incremental testing recognizes these varied load conditions.

Are you willing to risk defects after just a couple of incremental production runs? Skip incremental data testing and operations may stop soon after deployment to production. How will your team react to that news? What about the end user community?

The test team must make sure that the functional canary data sets include data suited to incremental testing. The test sets need the right data to act against an empty target and a target that already contains data. System test runs will include a one-two punch consisting of an initial test run followed immediately by an incremental test run.
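
The one-two punch can be sketched in a few lines of Python. The upsert-style `load` function and the in-memory dictionary standing in for the target table are assumptions of mine. Your real load logic will differ, but the sequence of runs is the point.

```python
def load(target: dict, batch: list) -> dict:
    """Hypothetical upsert-style load: insert new keys, update existing ones."""
    counts = {"inserted": 0, "updated": 0}
    for row in batch:
        key = row["customer_id"]
        counts["updated" if key in target else "inserted"] += 1
        target[key] = row["balance"]
    return counts


def initial_then_incremental(initial_batch: list, incremental_batch: list):
    """Run an initial load on an empty target, then an incremental load on top."""
    target = {}                                # punch one starts from an empty target
    first = load(target, initial_batch)
    second = load(target, incremental_batch)   # punch two: target already has data
    return target, first, second


if __name__ == "__main__":
    initial = [{"customer_id": "A", "balance": 10}, {"customer_id": "B", "balance": 20}]
    incremental = [{"customer_id": "B", "balance": 25}, {"customer_id": "C", "balance": 5}]
    print(initial_then_incremental(initial, incremental))
```

Note that the incremental batch deliberately mixes a record that already exists in the target with one that does not, so both code paths are exercised in a single run.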

The best practice is to execute incremental test runs using functional canary and large data sets selected to test multiple target conditions.

The last three blogs answered the question – “What records do we use for system testing?” Functional canary data sets to vet the rules and logic. Large data sets to prove out volume and build user confidence. Incremental data sets to validate the code under working conditions. These data sets collectively demonstrate the determination of your BI team to find problems before the business finds them during UAT. Support the efficacy of your team – and your job – by using all three types of testing.

My upcoming blogs will answer the questions - “How and when is system testing done using these carefully crafted data sets?” I’ll focus on the process of running system testing, system test automation, and timing system test setup and execution as part of the development lifecycle.

- Jim Van de Water contributed to this blog.

Rethinking How to System Test Your BI Project, Part 3: Building Large Test Data Sets

Friday, October 14th, 2011

Functional canary testing ensures that the solution exercises every business rule, but not that your ETL will work under the pressure of production loads. For the next set of tests, we need to think BIG DATA, as in large numbers of records. We also need to start thinking in terms of aggregates. The confidence of end users is gained when the additive records passing through your processing plant are proven against expected totals.

Your development team will use the so-called “sniff test,” comparing additive totals from the data loads against totals from other production systems. The test queries against these large test data sets will be summary in nature and will help to identify aberrant code and other issues. Later, the business testers will run similar tests during user acceptance testing, but don’t defer issue discovery to the business.
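
A sniff test can be as simple as the Python sketch below. The measure names, the sample totals, and the 0.1% tolerance are placeholders I picked for illustration. Your own tolerance depends on how closely the two systems are expected to agree.

```python
import math


def sniff_test(source_totals: dict, target_totals: dict, rel_tol: float = 0.001) -> dict:
    """Return measures whose totals are missing or differ by more than rel_tol."""
    issues = {}
    for measure in sorted(set(source_totals) | set(target_totals)):
        src = source_totals.get(measure)
        tgt = target_totals.get(measure)
        if src is None or tgt is None or not math.isclose(src, tgt, rel_tol=rel_tol):
            issues[measure] = (src, tgt)
    return issues


if __name__ == "__main__":
    legacy = {"revenue": 1_000_000.0, "units": 52_000}   # totals from the current system
    loaded = {"revenue": 1_000_400.0, "units": 51_000}   # totals from the new data load
    print(sniff_test(legacy, loaded))
```

With these sample numbers, revenue is within tolerance and units is flagged, which is exactly the kind of difference you want to explain before the business finds it.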

Here’s another reason not to delay large data set processing until the system moves to production. You don’t want to surprise users with the differences between what they know and what they don’t know – that is, their skepticism about the information in your new system against their almost evangelical faith in the current one. Your team needs to understand the differences and advise on them before UAT begins. If you let the business users find these issues first, convincing them that the new system is right (and the old one wrong or just plain different) is going to take a Herculean effort. You may have been there. I have to confess that I have been. Avoid it if at all possible.

Big data sets also demonstrate the scalability of your code and test environment(s). Code that performs poorly can be tuned and retested using these large test data sets. You’ll need to ferret out code issues from infrastructure issues, as in processors, memory, I/O, and the size of both the development and test environments. Projects can have long data processing windows when production-sized loads are tested on shared or undersized environments. Take note of the scaling factor of your test versus production environments.

Let’s restate the best practices we just walked through:

  • Our project team needs to find the ‘tipping point’ for the quantity of data required to build user faith in the new system.
  • Our ETL testers need to work proactively to find the issues with the new system before our business users find them.
  • Our project team needs to work proactively to identify the differences between legacy (proven) systems and new (unproven) systems before discovery by our business users.
  • Our technical team needs to understand the influence of code versus environment on test performance.

Here are two techniques I use to help establish a baseline of big data.

Dimensional subsetting can be used when there are big history requirements (e.g., year over year comparison). Consider loading a dimensional subset of data. For example, if you have 5 large and 50 small customers, load the data for 1 large customer and 5 small customers for your large data test.  The output will show a full set of metrics for a subset of customers. The business will eagerly sign off on those results.
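
Here is a small Python sketch of the 1-large, 5-small example. The customer ids, the `size` attribute, and the three years of fact rows are made up for the illustration, and picking the first customers by id is just one deterministic way to choose the subset.

```python
def pick_subset(customers: list, n_large: int = 1, n_small: int = 5) -> set:
    """Pick the first n_large large and n_small small customer ids (deterministic)."""
    large = sorted(c["id"] for c in customers if c["size"] == "large")[:n_large]
    small = sorted(c["id"] for c in customers if c["size"] == "small")[:n_small]
    return set(large) | set(small)


def subset_facts(fact_rows: list, customer_ids: set) -> list:
    """Keep the full history, but only for the selected customers."""
    return [r for r in fact_rows if r["customer_id"] in customer_ids]


# Hypothetical dimension: 5 large and 50 small customers.
CUSTOMERS = (
    [{"id": f"L{i}", "size": "large"} for i in range(1, 6)]
    + [{"id": f"S{i:02d}", "size": "small"} for i in range(1, 51)]
)
# Hypothetical facts: one row per customer per year, three years of history.
FACTS = [
    {"customer_id": c["id"], "year": y, "sales": 100.0}
    for c in CUSTOMERS
    for y in (2009, 2010, 2011)
]

if __name__ == "__main__":
    ids = pick_subset(CUSTOMERS)
    rows = subset_facts(FACTS, ids)
    print(len(FACTS), "fact rows reduced to", len(rows))
```

The subset keeps every year for the chosen customers, so year-over-year metrics still compute in full while the volume drops sharply.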

The cycle loading concept considers conflicting performance needs for historical versus cycle load processing. For example, most BI processing systems do not scale linearly. Hourly, daily or weekly load processes may work within service level agreement parameters, but a three-year run of data may simply fail. A good practice is to design to run in cycles, both historical and ongoing. The designers will work to make the processing cycle perform well, and the BI team will have a good sense of how the BI application will perform in production. Tuning the BI application for a single megalithic history build will require extra time and coding that may actually cause regular cyclical processes to perform poorly.
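
One way to picture cycle loading is a history build that simply replays the regular cycle over and over. The Python sketch below only splits a date range into fixed-size windows. The function name and the weekly cycle are my assumptions, and the actual load call is left out.

```python
from datetime import date, timedelta


def cycle_windows(start: date, end: date, days_per_cycle: int):
    """Yield (window_start, window_end) pairs covering [start, end) in fixed cycles."""
    if days_per_cycle <= 0:
        raise ValueError("days_per_cycle must be positive")
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past the end date.
        window_end = min(cursor + timedelta(days=days_per_cycle), end)
        yield cursor, window_end
        cursor = window_end


if __name__ == "__main__":
    # Three years of history loaded as a series of weekly cycles.
    windows = list(cycle_windows(date(2009, 1, 1), date(2012, 1, 1), 7))
    print(len(windows), "weekly cycles, first:", windows[0], "last:", windows[-1])
```

Each window would be handed to the same load process that runs in production, so tuning the cycle benefits the history build and the ongoing loads alike.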

Large test data sets enable your ETL team to reduce the time required to build confidence in the new system and will help propel your solution through UAT. Planning and creating these test data sets will take time, but ultimately you will gain acceptance of your solution much more quickly.

- Jim Van de Water contributed to this blog.