Making Sense of Test Results

The goal of Quality Assurance is to ensure that your product meets requirements, is debugged, and delivers happy customers. To obtain satisfied customers your testing and QA processes must provide meaningful information during the development lifecycle and after that to efficiently assist engineers and decision-makers in informing choices.

Dashboards as a Tool

The challenge for many is making sense of the data. As a tool, dashboards serve as a functional way to deliver pretty information with a simple overview. Unfortunately, they can also provide too much data, hide essential details, or create easily ignorable aggregates.

Let’s look at a TFS dashboard from Visual Studio Reporting Tools:

This particular dashboard looks great and provides meaningful readings: things are good or bad (green or red), and it is easy to see that things are ok. In reality, dashboards rarely look this nice. Many teams see prominent displays of yellow and orange with little substantial evidence of green and red.

Where Dashboards Go Wrong

Flakey tests: When there are a lot of tests, with concurrency issues for instance, that fail sometimes, there will inevitably be the occasional failing test. This situation results in teams whose dashboard delivers too much unhelpful orange.

False Sense of Security: Dashboards can also provide a false sense of security. If you set a goal to keep errors under 1%, but then you only look at the percentage number of errors, and you fail to check out what is failing; you will likely miss that the same key defects may have been failing for quite some time.

A Call to Action

In the end, the goal of a dashboard is not to convey information, but rather to get people to act (assuming the data reported indicate something bad). The dashboard when thoughtfully constructed will thus be a warning flag and a call to action, but only if it says something that causes people to take action.

If the dashboard is not practical, people will ignore what it displays.

It is like the threat level announcements at the airport or the nightly news, you know all is not well, but you either learn to tune it out or find another way to numb the information. Developers that get told every day that there are some failures will do the same, sure they spend some time chasing unreproducible failures, but after awhile they will stop paying attention.

A Good Dashboard Generates an Emotional Response

The genius of a thoughtfully constructed dashboard is that it generates an emotional response.

Yay! Good!


Oh. No, not again!

So how do we get people back to the emotional reaction of a simple read / green dashboard in a world where things are noisy?

This question leads to the creation of a new dashboard that focuses on the history of defects and failures. In this case, we categorize failures that are new as orange, consecutive failures over multiple days as red, with red failures that exist for several days then going black.

Createing Useful Dashboards

Screenshot © Possum Labs 2018

It is easy to see from the chart that even though there are some flay tests, there are also a few consistent failures. And it is easy to see that we have a few regular failures over the last few weeks.

This occurrence creates an opportunity to start drawing lines in the sand, for example, setting the goal to eliminate any black on the charts. Even when getting to a total pass rate seems infeasible for a project, it should be more manageable to at least fix the tests that fail consistently.

This type of goal is useful because it is something that can get an emotional response, people can filter the signal from the noise visually, so they get to the heart of the information, which is that there are indeed failures that matter. When we make it evident how things change over time, if things get worse, people will notice.

Dashboards are a Tool — not a Solution

Each organization is different, and each organization has its challenges, getting to a 100% pass rate is much easier when it is an expectation from the beginning of the project, but often systems were designed years before the first tests crept in. In those scenarios, the best plan is to create a simple chart and listen to people when they tell you it is meaningless. These are the people that give you your actual requirements; these are the people who will spell out where and what to highlight to get an emotional response to the dashboard data.  

Dashboards do not always convey meaningful data about your tests. If they lack thoughtful construction, it is likely they may fail at getting people to act when the information is bad. Bad results on your panel should be a call to action, a warning flag, but the results need to mean something or your people simply ignore and move on.

At Possum Labs, meaningful dashboards are not a solution, but instead just one of many tools available to assure that your QA delivers effective and actionable data to your teams.

How to Effortlessly Take Back Control of Third Party Services Testing

Tools of the Trade: Testing in with SaaS subsystems.

For the last few years, the idea has been floating around that every company is a software company, regardless of the actual business mission. Concurrently even more companies are dependent upon 3rd party services and applications. From here it is easy to extrapolate that the more we integrate, the more likely it is that at some point, every company will experience problems with updates: from downtime, uptime, and so on.

Predicting when downtime will happen and or forcing 3rd party services to comply with our needs and wishes is difficult if not impossible. One solution to these challenges is to build a proxy. The proxy allows us to regain a semblance of control and to test 3rd party failures. It won’t keep the 3rd parties up, but we can simulate failures whenever we want to.

As an actual solution, this is a bit like building a chisel with a hammer and an anvil. And yet, despite the rough nature of the job, it remains a highly useful tool that facilitates your work as a Quality Assurance Professional.

The Problem

Applications increasingly use and depend upon a slew of 3rd party integrations. In general, these services tend to maintain decent uptime and encounter only rare outages.

The level of reliability of these services leads us to continue to add more services to applications with little thought to unintended consequences or complications. We do this because it works: it is cheap, and it is reliable.

The problem (or problems) that arise stem from the simple nature of combining all of these systems. Even if each service maintains good uptimes, errors and discordant downtime may result in conditions where the time that all your services are concurrently up is not good enough.

The compounding uptimes

Let’s look at a set of services that individually boast at least 95% uptime. Let’s say we have a service for analytics, another for billing, another for logging, another for maps, another for reverse IP, and yet another for user feedback. Individually they may be up 95% of the time, but let’s say that collectively the odds of all of them being up at the same time is less than 75%.

As we design our service, working with an assumption of around a 95% up-time scenario feels a lot better than working with a chance of only 75% uptime. To exacerbates this issue, what happens when you need to test how these failures interact with your system?

Automated Testing is Rough with SaaS

To create automated tests around services being down is not ideal. Even if the services consumed are located on site, it is likely difficult to stop and start them using tests. Perhaps you can write some PowerShell and make the magic work. Maybe not.

But what happens when your services are not even located on the site? The reality is that a significant part of the appeal of third-party services is that businesses don’t really want onsite services anymore. The demand is for SaaS services that remove us from the maintenance and upgrade loop. The downside to SaaS means that suddenly turning a service off becomes much more difficult.

The Solution: Proxy to the Rescue

What we can do is to use proxies. Add an internal proxy in front of every 3rd party service, and now there is an “on / off switch” for all the services and a way to do 3rd party integration testing efficiently. This proxy set-up can also be a way to simulate responses under certain conditions (like a customer Id that returns a 412 error code).

Build or buy a proxy with a simple REST API for administration and it should be easy to run tests that simulate errors from 3rd party providers. Now we can simulate an entire system outage in which the entire provider is down.

By creating an environment isolated by proxies, test suites can be confidently run under different conditions, providing us with valuable information as to how various problems with an application might impact our overall business service.

Proxy Simulations

Upon building a proxy in between our service and the 3rd party service, we can also put in the service-oriented analog of a mock. This arrangement means we can create a proxy that generates response messages for specific conditions.

We would not want to do this for all calls, but for some. For example, say we would like to tell it that user “Bob” has an expired account for the next test. This specification would allow us to simulate conditions that our 3rd party app may not readily support.

Customer specific means better UX

By creating our own proxy, we can return custom responses for specific conditions. Most potential conditions can be simulated and tested before they happen in production. And we can see how various situations might affect the entire system. From errors to speed, we can simulate what would happen if a provider slows down, a provider goes down, or even a specific error, such as a problem that arises when closing out the billing service every Friday evening.

Beyond Flexibility

Initially, the focus might be on scenarios where you simulate previous failures of your 3rd party provider; but you can also test for conditions for which your 3rd party may not offer to you in a test environment. For instance, expiring accounts in mobile stores. With a proxy, you can do all of this by merely keying off of the specific data that you know will come in the request.

In Conclusion: Practical Programming for the Cloud

This proxy solution is likely not listed in your requirements. At the same time, it is a solution to a problem that in reality is all too likely to arise once you head to production.

In an ideal world, we wouldn’t need to worry about 3rd party services.

In an ideal world, our applications would know what failures to ignore.

In the real world, the best course is preparation: this allows us to test and prevent outages.

In reality, we rarely test for the various conditions that might arise and cause outages or slowdowns. Working with and through a 3rd party to simulate an outage is likely very difficult, if not impossible. You can try and call Apple Support to request that they turn off their test environment for the App store services, but they most likely won’t.

This is essentially a side-effect from the Cloud. The Cloud makes it is easy to add new and generally reliable services, which also happen to be cheap and makes the Cloud an all-around good business decision. It should not then be surprising that when you run into testing problems, an effective solution will also come from the Cloud. Spinning up small, lightweight proxies for a test environment is a practical solution for a problem in the Cloud.

Predicting the Future: The Big Data Quandary

Predicting the Future: the Big Data Quandary

The Role of Testing in Indeterminate Systems: should the humans be held accountable?

Big data is a hot topic that introduces us to incredible possibility and potentially terrible consequences.

Big data essentially means that engineers can harness and analyze traditionally unwieldy quantities of data and then create models that predict the future. This is significant for a variety of reasons, but primarily because accurate prediction of the future is worth a lot of money and it has the potential to have an effect on the lives of everyday citizens.

Good Business

On one level big data allows us to essentially reinvent the future by allowing software to encourage individuals to do something new that they as of yet have not considered (and might never have), such as recommendations for viewing on Netflix or buying on Amazon. Big Data can also provide daily efficiencies in the little things that make life better by saving us time or facilitating decision making. For businesses, Big Data can give deeper meaning to credit scores, validate mortgage rates, or guide an airline as to how much they should overbook their planes or vary their fares.


Optimizing algorithms based on data are even more powerful when we consider that their effectiveness is reliably better than actual humans attempting to make the same types of decisions and predictions. In addition to the computing power of big data, one advantage algorithms have over human predictions is that they are efficient. Algorithms do not get sidelined or distracted by bias and so avoid getting hung up by ethical judgments that humans are required to consider, either by law or by social code. Unfortunately, this doesn’t mean that algorithms won’t avoid making predictions that present ethical consequences.

Should algorithms be held to the same moral standards as people?

Optimization algorithms look for correlations and any correlation that improves a prediction may be used. Some correlations will inevitably incorporate gender, race, age, orientation, geographic location or proxies for those values. Variables of this sort are understandably subject to ethical considerations and this is where the science of big data gets awkward. A system that looks at a user’s purchasing history might end up associating a large weight (significance) to certain merchants. And those merchants might then happen to be hair care providers, which then means that there is a good chance that the algorithm has found an efficient proxy for race or gender. Similarly, the identification of certain types of specialty grocers or specialty stores, personal care vendors or clothing stores might reveal other potentially delicate relationships.

As these rules are buried deep inside a database it is hard to determine when the algorithms have managed to build a racist, sexist, or anything-ist system. To be fair, neither the system nor its developers or even the business analyst in charge of the project, had to make any conscious effort for an algorithm to identify and use these types of correlations. As a society, we implicitly know that many of these patterns exist. We know that women get paid less than men for the same work; we know that women and men have different shopping behaviours when it comes to clothing; we know that the incarceration rate for minorities is higher; and we know that there will be differences in shopping behaviors between different populations based on age, race, sex and so on.

Can algorithms be held to the same moral standards as their human developers or should the developers be held responsible for the outcomes? If the answer to either or both of these questions is “yes,” then how can this be achieved both effectively and ethically? When ethically questionable patterns are identified by an algorithm, we need to establish an appropriate response. For example, would it be acceptable predict a lower acceptable salary to a female candidate than a male candidate, when the software has determined that the female candidate will still accept the job at the lower rate? Even if the software did not know about gender, it may determine it based on any number of proxies for gender. One could argue that offering the lower salary isn’t a human judgment, it is simply following sound business logic as determined by the data. Despite the “logic” behind the data (and the fact that business makes this kind of decision all the time), hopefully, your moral answer to the question is still “no, it is not okay to offer the female candidate a lower suggested salary.”

Immoral or Amoral: What is a moral being to do?

If we label the behavior of the algorithm in human terms, we’d say that it was acting immorally; however, the algorithm is actually amoral, it does not comprehend morality. To comprehend and participate in morality (for now) we need to be human. If we use big data, the software will find patterns, and some of these patterns will present ethical dilemmas and the potential for abuse. We know that even if we withhold certain types of information from the system (such as avoiding direct input of information like race, gender, age, etc.) the system may still find proxies for that data. And, the resulting answers to that data may continue to create ethical conflicts.

Testing for Moral Conflicts

There are ways to test the software and determine if it is developing gender or race biases with controlled data. For instance, we could get create simulations of individuals that we see as equivalent for the purpose of a question and then test to see how the software evaluates them as individuals. Take the instance of two job candidates with the same general data package but vary one segment of the data, say spending habits. We then look at the results and see how the software treated the variance in the data. If we see that the software is developing an undesired sensitivity for certain data we can go to the drawing board make an adjustment, such as removing that data from the model.

In the end, an amoral system will find the optimal short-term solution; however, as history has shown, despite humanity’s tendency towards the occasional atrocities, we are moral critters. Indeed, modern laws, rules, and regulations generally exist, because as a society we see that the benefits of morality outweigh the costs. Another way to look at the issue is to consider that for the same reasons we teach morality to our children, sooner or later we will likely also have to impart morality into our software. We can teach software to operate according to rules of morality and we can also test for compliance, thereby ensuring that our software abides by society’s rules.

Responsibility: So why should you care?

Whose responsibility is it (or will it be) to make sure this happens? My prediction is that given the aforementioned conundrum, in the near future (at most the next decade) we will see the appearance of a legal requirement to verify that an algorithm is acting and will continue to act morally. And, of course, it is highly probable that this type of quality assurance will be handed to testing and QA. It will be our responsibility to verify the morality of indeterminate algorithms. So for those of us working in QA, it pays to be prepared. And for everyone else, it pays to be aware. Never assume that a natively amoral piece of technology will be ethical, verify that it is.

If you enjoyed this piece please discuss and share!

Click here to learn more about Bas and Possum Labs.

Navigating Babylon Part II

How to Introduce DomainSpeak in Testing

First, let’s start with a quick overview of the problem I discussed in Navigating Babylon Part I. Microservices create efficiencies in development in a world dependent on remote work environments and teams. Unfortunately, the separation of workers and teams results in the tendency for microservices to encourage the development of multiple languages or dialects that obfuscate communication and further complicating testing. We have our anti-corruption layer and we don’t want to pollute our code by spilling in sub-system language.

A Domain-specific Vocabulary for Testing: DomainSpeak

There is, however, a pragmatic solution: we can build on the anti-corruption layer by creating tests in a specific language that has been created to clearly describe business concepts. We can and should create DomainSpeak, a domain-specific vocabulary or language, to be used for testing. Once we publish information in this language it can be shared across microservices and thus influence the workflow. Periodically, as is done in the English language, we may need to improve definitions of certain vocabulary, by re-defining usage and disseminating it widely, thus influence its meaning.

How will this DomainSpeak improve testing?

For integration tests, all the different dialects should not permeate your integration tests. You need to be very clear that a word can only have a single meaning. This requires a two-part process:

  1. You need to verify that you are not inconsistently naming anything inside an actual test; and,
  2. You need to do translations in an anti-corruption layer so everything inside is consistent.

What does DomainSpeak look like in a practical sense?

When you consider how brands influence pop culture, it is through language.

In the business world, marketing professionals use domain specific languages to create a brand vocabulary or a BrandSpeak. All major and influential brands and even smaller yet influential brands have a specific vocabulary, with specific definitions and meanings, to communicate their brand to the public. All communications materials are integrated into this system.

Brand specific, intentional vocabulary, has the ability to invade and permeate. Many people are completely unaware that it was a DeBeers commercial in the 1940s that created the cultural tradition “a diamond is forever.” Other examples, “Don’t mess with Texas” came from an anti-litter campaign and although we know it’s a marketing ploy, just about everyone is on board with the idea that “What happens in Vegas, stays in Vegas.” On an international level, if you order a “coke” you will most likely get a carbonated beverage, but you won’t necessarily get a Coca-cola.

As I referenced in my first discussion on Navigating Babylon, I recommend implementing a mapping layer between the microservices and the test cases. Next, when deciding to address the language used, we take it a step further. Now focus in on the language or the DomainSpeak and how this domain-specific vocabulary improves the associated output of the test cases. This means that for example, a Customer, a User, and a Client all have specific meanings and that they cannot be interchanged.

What is the process to create this language?

The initial process is an exploratory step. To create your own DomainSpeak your testing department will need to communicate regularly with the business owners and developers. Their goal will not be to dictate what words the business owners and developers use, but to learn what words already have meanings and to document these usages. The more your communicate, recognize and document adopted meanings, the more you will discover how, where and why meanings differentiate.

For instance, the business may see a Customer as a User with an active subscription, whereas a microservice might use the words interchangeably as they do not have the concept of a subscription. You will also notice that sometimes situations may give rise to conflicting meanings. A developer may have picked up the word “Client” from a third party API he integrated with for “User,” whereas the business may use “Client” for a specific construct in a billing submodule for “customers of their customer.” In such situations, to avoid confusion and broken stuff, you will need to specify which definition is to be used and possibly introduce a new concept or word to account for the narrowing of the definition. Perhaps the “customers of their customer” will now be a “vendee” instead of a “client.” Don’t dismay if there is not an existing word that accurately matches your concept, you can always create a new word or make a composite word to meet your needs.

Indeed, by being consistent and by distributing your use of language to a wide audience you can introduce new words and shape the meaning of existing words. This means that your tests have to be very close to a formal and structured form of English. This can be accomplished by using Aspect-oriented testing or by creating fluid API wrappers on top of the microservices. Aspect-oriented testing would look like this (cucumber syntax):

Given a User
When the user adds a Subscriptions
Then User is a Customer
Whereas a fluid API would be something like this (C# syntax)
User user1 = UserManager.CreateNewUser();

This creates a lot of focus on syntactic sugar* and writing code behind the scenes to ensure that your code incorporates your business logic (your test) and looks the way you want it to. Every language has their own way to solve this challenge. Even in C, you could use macros to take a pseudo code and turn it into something that would compile, and chances are that your results would be far superior to that of your current language usage.

For my uses, the cucumber syntax, with a side of extra syntactic sugar that allows me to define variables, is very effective. (I will get into this in more detail another day.) Whichever language you use, keep in mind that the goal of creating a DomainSpeak vocabulary is not to make your code look pretty, but rather to ensure that your code communicates clearly to the business and developers and that meanings are defined with precise and consistent language.

The End Goal is Efficient Quality Assurance

The goal, after all, is to improve productivity and deliver a quality product. Clear communication will not only benefit your team internally, it will also influence other teams. By communicating your results in consistently clear and concise language to a wide audience, you will influence their behavior. You will be able to efficiently respond to questions along the lines of “we use ‘customer’ for all our ‘users.’” You will also be able to easily define and answer where the rule may not hold and why you use the word you use. Again, the goal is not to dictate to folks what words to use, but to explain the meanings of words and to encourage consistent usage. Adoption will follow slowly, and over time usage will become a matter of habit. Over time services will be rewritten and you should be able to delete a few lines from your anti-corruption layer every time one gets rewritten.

*syntactic sugar allows you to do something more esthetically, but not in a necessarily new way. Different ways of saying the same thing. Just looks different. Not important because it’s not new, but significant because it makes things more readable and understandable. Clean up language/clearer code and therefore easier to find a bug.

If you enjoyed this piece please share it with your colleagues! If you have something to add, please join the discussion!

A Look at Why I can’t follow advice and neither can you…

Anyone who has tried to lose weight knows there is a significant gap between “knowing” and “doing.”

We can read the books. Follow the science. Listen to the advice. The professional advice and of course the “helpful” information from well-meaning friends, coworkers and even the cashier at the grocery store. Weight-loss is so ubiquitous in our society that everyone feels confident sharing their expert advice and experience. And yet we are a good decade or more into an obesity epidemic…

Lousy Advice or Lack of Questioning?

Software testing ironically has many of the same issues. Testing is part of every organization, and most any developer can tell you how get things done (usually without having done it themselves) and there is plenty of advice, numerous books on best practices that tell you what should be doing. And yet, testing doesn’t always go the way it should nor does it consistently deliver the results that we expect. This is not due to a lack of standard rules or even expert advice, but it is indicative of the fact that each situation has unique conditions. Success is thus much more complicated than simply the following of advice. For example, telling a vegan that the perfect way to lose weight is to cut out carbs is perhaps more complicated and a more significant sacrifice than for someone working (and eating!) at a BBQ shack.

In many situations testing does not work for similar reasons to diet advice: it is not appropriate to the specific organization and the implementing individuals. You need a solution that works for you, for your group, for your particular advantages, and for your limitations. Drugs (pharmaceuticals) are another parallel. Not every drug is right for every person. Some people experience side effects or allergies, where others don’t. Some drugs are only effective for some people. Some drugs make grand claims, but it is unclear whether they are even actually any more effective than a placebo.

Let’s consider the disclaimer on Forbes for Chantix a drug that is supposed to help folks quit smoking.

Purpose: A prescription medicine that contains no nicotine, Chantix can help adults stop smoking. Studies have shown that after 12 weeks of use, 44% of users were able to quit smoking.

Side Effect: Some of the most common side effects of using Chantix include trouble sleeping, as well as vivid, unusual or increased dreaming. In February, the FDA also released a public health advisory stating that severe changes in mood and behavior may be related to use of the drug.

Only 44% effective. Trouble sleeping. Vivid dreams. Possible unknown changes to behavior. Wow. Are we sure this drug is worth the investment and risk? Is it the optimal method to quit smoking?

We should be asking similar questions when we select testing methods.

Time and money can be spent quickly on ineffective methods. Ineffective testing creates frustration, scheduling problems, budget problems and it often results in a lack of morale for the various stakeholders. And so, the next time you consider implementing the latest advice on engineering productivity or a different idea for reducing testing cycles, think carefully about what its side effects might be for your specific situation. Is the testing method truly optimal for your organization? Similarly, if you just completed a testing cycle and your result was moody engineers and not meaningful data, consider writing up a disclaimer for the next group or project that might fall into a similar trap.

One-size-fits-all testing disclosure:

Problems may occur if testing is poorly integrated or a one-size-fits-all approach is taken. Some of the most common side effects of a “one-size-fits-all approach” include unrealistic expectations, inadequate identification of defects and overextended budgets. A NIST study indicated that 25% to 90% of software development budgets might be spent on testing.

For a field where we have a digital track record for everything from requirements to code to bugs to releases, we don’t have much information about what works and what does not work. Everyone tries to share the advice that he or she believes is right; however, before implementing the newest tip make sure to ask questions. Ask the person sharing the advice, if she has any real world experience with the method. And, then ask yourself or your team, if the advice is truly appropriate to and optimal for your testing situation. 

Remember your organization is unique and not a one-size-fits-all.

Logic vs Emotion in Testing

There is a significant focus on efficient testing through the use of the testing pyramid. The testing pyramid focuses on finding a defect in as small of a segment as possible. This focus, in turn, creates efficient coverage of the functionality.

This is logically correct.

This is also emotionally flawed.

Dale Carnegie.png

The Challenge: relating failures

The challenge with low-level tests is that it is hard to connect failures to their end-user effects. Looking hard at failed unit tests never provides a straightforward answer. Rarely do we ask, “so what does this result mean for our users?” and as a result, it is all too easy to comment out and ignore specific failures. This is in part because determining user effects requires us to stop and make an effort to decide what commenting this out (or marking the test as ignored) would actually mean to the end user.

Most software development teams either have some failing tests or all passing tests. (And, we ignore some tests.) Ideally, we would want nothing commented out, and nothing fails. I’m sure that there are teams who can make this claim. However, the reality is that testing exists because it provides value. And despite our best intentions and our best developers, we are all only human, and sooner or later we find ourselves in a situation where we have failing tests or the need to comment them out.

The testing pyramid is a logical solution that optimizes the problem of test coverage. It would be rational to respect all tests under these conditions because you are trying to minimize the duplication of coverage. It is correct. There is just one relatively significant problem, and that is that we have humans, living breathing, imperfect people implementing the testing pyramid.


Humans love to assign meaning.

Astrology exists because of this perpetual search for meaning. We want to be connected. Associated. The more abstract something is, the more distant it becomes. Testing pyramids create detached results in many different jargons. People don’t do well with the anonymity; it doesn’t bring out the best in anyone. By making tests anonymous, we make it easy to comment them out or ignore their failure. We, in fact, promote the avoidance of finding the meaning in our results. This should not be surprising, given that we are asking humans to assess the results.

Engineering Quality and Positive Results

On the other hand, successful practices are those that deliver desired results (this is in contrast to those that should get desired results, but don’t necessarily). Positive (desired) results come from the prescription and adherence to said prescription. If there is a problem with adherence of the prescription, it might be that you need to adjust the prescription (simplify, modify, require less, etc.). In turn, you may have a less efficient prescription, but you will compensate with better adherence.

Logic vs. Emotion: Remember that you are working with humans.

There has been a lot of thought put into adherence, and there are plenty of books talking about how to create change in organizations. One of the common elements is that people need to have a visceral reaction when making the decision either to follow or to deviate from the prescription. A solution to reducing anonymity and improving the meaning in our testing is thus to make test failures visceral.

How can we do this? We must find a way to recreate a real-life scenario that demonstrates what happens when a test fails. As a practice, this proposal comes into direct conflict with the testing pyramid. The pyramid focuses on individual components, component interactions, API’s, UI, and finally users in the case of integration tests. The tests that are easiest to relate to are those that we have fewest of, while unfortunately the tests that are the least relatable, are the most common. Furthermore, the chances are that the business analyst does not know the names of the classes and interfaces referenced in unit tests. All of which contributes to making it difficult for decision-makers to apply meaning to the results of a testing pyramid functionally.

What about Best Practices?

When it comes to best practices, it bears reminding that they are intended for a generic company with generic developers. In contrast, your goal is to improve your unique organization with the individuals inside. Best practices are certainly worthwhile recommendations that can help you; however, they will not always accommodate the specifics for your particular scenario. If, for example, unit test are not resilient (get deleted, commented out, remain broken) then the unit test might not be the right solution for you and integration tests would be more valuable. You can choose to change the organization, but this is not always practical. The pragmatic choice is to optimize your solution to meet your organization where it is. Meaningful tests are ideally suited for scenarios where there is not enough time to attain the aspired end-quality. Often we must deliver imperfect solutions, but we need a solid understanding of acceptable versus unacceptable imperfections.

All this does not diminish the testing pyramid as the optimal coverage from a logical perspective; it just means that if you work with humans; your organization may find better long-term results from a different breakdown of types of tests.