Making Sense of Test Results

The goal of Quality Assurance is to ensure that your product meets requirements, is debugged, and leaves your customers happy. To get satisfied customers, your testing and QA processes must provide meaningful information, both during the development lifecycle and afterward, that helps engineers and decision-makers make informed choices.

Dashboards as a Tool

The challenge for many is making sense of the data. As a tool, dashboards are a practical way to present information attractively in a simple overview. Unfortunately, they can also provide too much data, hide essential details, or create aggregates that are easy to ignore.

Let’s look at a TFS dashboard from Visual Studio Reporting Tools:

This particular dashboard looks great and provides meaningful readings: things are good or bad (green or red), and it is easy to see that things are ok. In reality, dashboards rarely look this nice. Many teams see prominent displays of yellow and orange with little substantial evidence of green and red.

Where Dashboards Go Wrong

Flaky tests: When a large test suite includes tests that fail intermittently, because of concurrency issues for instance, there will inevitably be an occasional failing test on any given run. This leaves teams with a dashboard that shows too much unhelpful orange.

False Sense of Security: Dashboards can also provide a false sense of security. If you set a goal to keep errors under 1% but then look only at the percentage of errors, without checking what is actually failing, you will likely miss that the same key tests may have been failing for quite some time.
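To make that risk concrete, here is a minimal sketch of a run history that stays under a 1% failure goal on every run while one test fails every single time. The data layout, the test names, and both function names are hypothetical and chosen only for illustration.

```python
from collections import Counter


def failure_rate(run):
    """Fraction of tests that failed in one run (dict: test name -> passed?)."""
    return sum(1 for passed in run.values() if not passed) / len(run)


def always_failing(runs):
    """Names of tests that failed in every run of the history."""
    counts = Counter(
        name for run in runs for name, passed in run.items() if not passed
    )
    return sorted(name for name, n in counts.items() if n == len(runs))


# Hypothetical history: 5 runs of 200 tests each; "test_checkout" never passes.
history = []
for day in range(5):
    run = {f"test_{i}": True for i in range(199)}
    run["test_checkout"] = False
    history.append(run)

rates = [failure_rate(run) for run in history]
stuck = always_failing(history)
print(rates)  # each run has 1 failure out of 200 tests
print(stuck)
```

The percentage alone looks healthy on every run; only the second query shows that the failure is always the same test.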

A Call to Action

In the end, the goal of a dashboard is not to convey information, but rather to get people to act (assuming the data reported indicate something bad). The dashboard when thoughtfully constructed will thus be a warning flag and a call to action, but only if it says something that causes people to take action.

If the dashboard is not practical, people will ignore what it displays.

It is like the threat level announcements at the airport or on the nightly news: you know all is not well, but you either learn to tune it out or find another way to numb yourself to the information. Developers who are told every day that there are some failures will do the same. Sure, they will spend some time chasing unreproducible failures, but after a while they will stop paying attention.

A Good Dashboard Generates an Emotional Response

The genius of a thoughtfully constructed dashboard is that it generates an emotional response.

Yay! Good!


Oh. No, not again!

So how do we get people back to the emotional reaction of a simple red / green dashboard in a world where things are noisy?

This question leads to the creation of a new dashboard that focuses on the history of defects and failures. In this case, we categorize failures that are new as orange, consecutive failures over multiple days as red, with red failures that exist for several days then going black.
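The colour rule described above can be sketched as a small function over a test's daily pass/fail history. The thresholds (`red_after=2`, `black_after=5`) and the function name are assumptions; the text only says "multiple days" and "several days", so tune them to your own project.

```python
def classify(history, red_after=2, black_after=5):
    """Colour a test from its daily history (oldest first, True = passed).

    The thresholds are illustrative assumptions, not values from the article.
    """
    # Count how many of the most recent days failed in a row.
    streak = 0
    for passed in reversed(history):
        if passed:
            break
        streak += 1

    if streak == 0:
        return "green"
    if streak < red_after:
        return "orange"   # new failure
    if streak < black_after:
        return "red"      # failing for multiple consecutive days
    return "black"        # red that has persisted for several days


print(classify([True, True, False]))                # new failure
print(classify([True, False, False, False]))        # consecutive failures
print(classify([False] * 6))                        # long-standing failure
print(classify([False, True, False, True, False]))  # flaky test
```

Under this rule a flaky test keeps returning to orange, while a test that never recovers escalates to red and then black, which is the visual separation of noise from signal that the chart relies on.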

Creating Useful Dashboards

Screenshot © Possum Labs 2018

It is easy to see from the chart that even though there are some flaky tests, there are also a few consistent failures that have persisted over the last few weeks.

This visibility creates an opportunity to start drawing lines in the sand, for example, by setting the goal to eliminate any black on the charts. Even when getting to a total pass rate seems infeasible for a project, it should be more manageable to at least fix the tests that fail consistently.

This type of goal is useful because it can provoke an emotional response. People can visually filter the signal from the noise, so they get to the heart of the information, which is that there are indeed failures that matter. When we make it evident how things change over time, people will notice if things get worse.

Dashboards are a Tool — not a Solution

Each organization is different, and each has its own challenges. Getting to a 100% pass rate is much easier when it is an expectation from the beginning of the project, but often systems were designed years before the first tests crept in. In those scenarios, the best plan is to create a simple chart and listen to people when they tell you it is meaningless. These are the people who give you your actual requirements; they will spell out where and what to highlight to get an emotional response to the dashboard data.

Dashboards do not always convey meaningful data about your tests. If they lack thoughtful construction, they will likely fail to get people to act when the information is bad. Bad results on your dashboard should be a call to action, a warning flag, but the results need to mean something or people will simply ignore them and move on.

At Possum Labs, meaningful dashboards are not a solution, but instead just one of many tools available to assure that your QA delivers effective and actionable data to your teams.

Effective QA is Not an Option it’s a Necessity: Here’s How to do it Right

As has long been the case in the software and technology industry, immigration is an important source of Quality Assurance talent.

Part of the reason is that, as per usual, QA isn’t always a job that attracts US-based developers. The US job market is already competitive for developers and moving developers on your team to quality assurance is often undesirable for a number of reasons. The result is that relocating QA developers to the USA is in many cases the most satisfactory solution for the industry.

Unfortunately, barriers to immigration continue to drive up demand for qualified QA, which only further exacerbates the shortage of people already in the US with an appropriate development background.


Quality Assurance is not Optional

Compounding the problems caused by the shortage of qualified developers for QA is the fact that some decision makers continue to consider QA an “optional” line item in their budget. Unfortunately for them, QA and the surrounding processes are anything but “optional,” and a good quality assurance engineer is a key player on any team.

In companies where QA is an accepted need, it is still often considered more of a necessary evil than a definite benefit. QA is too easily blamed when a project that’s been on schedule suddenly gets hung up in testing. Another common problem appears when development teams pick up speed after implementing workflow changes, only to discover that QA is still a bottleneck keeping them from delivering a product to their customer.


A Look at the Current Situation

As we’ve seen, numerous factors feed the myths surrounding Quality Assurance, and those myths create a questionable climate that makes some engineers shy away from the moniker or even the field. Moreover, we are experiencing an actual shortage of qualified engineers, which means that QA in many instances ends up being not just an afterthought, but a luxury.

Immigration and work status have been hot topics for the last few years. And regardless of where you fall on the political spectrum, if you work in software, you’ve likely experienced firsthand the effect on the job market.

Those of you who have been lucky enough to hire engineers may also have been unlucky enough to discover that said engineer has to go back to India (or wherever his or her country of origin might be). What started out as a short trip quickly devolves into a lengthy process, sometimes six months long, to renew or regularize an H-1B visa, following changes made to the requirements in 2017.

Whether your experience with the dwindling QA applicant pool is firsthand or anecdotal, here are some statistics for you to munch on:

The US Bureau of Labor Statistics expects the market for software developers to grow by 24% from 2016 to 2026. That means a need for approximately 294,000 additional software developers in an already tight market. If you think it’s hard to convince an engineer to join your QA team now, just wait and see what it will be like in 2026.

We can’t know for sure the number of H-1B visas currently held up due to changes in requirements, but this article does a decent job of discussing the demand and actual need for H-1B visas in the USA, with a focus on the state of Massachusetts. If you’d like to know more, I’d suggest taking a look; however, for our purposes, I don’t think importing QA staff is necessarily the answer.

So, you need to have QA, but you can’t hire qualified staff to take care of the work. What can you do?

The Foundations of Quality Assurance

Before I answer this question, let’s take a look at why Quality Assurance exists in the first place. From a business perspective, there are a few things that pretty much every customer expects from a software product. The following three expectations are at the heart of why quality assurance is not optional:

  1. Customers expect that the programs will work as requested and designed, and within the specified environments;
  2. They hope that software will be user-friendly;
  3. And, they assume that software will have been successfully debugged: meaning that QA must deliver a product that is at the least free of the bugs that would result in numbers 1 or 2 becoming false.

Historically, software teams were small, but since the early 80s, due to the need to scale quickly and keep up with changing requirements and other advances in technology and globalization, we’ve experienced rapid growth in the size of development teams and companies. This growth has led to implementing a wide variety of tactics from workflow solutions, think Waterfall or Agile, to different methods of increasing productivity and efficiencies, such as offshore teams and microservices.

Take a look at this simple chart approximating the increase in development team size from the original release of Super Mario Brothers in 1985 to Super Mario World in 1990 and Super Mario 64 in 1996. (Note that by 1996 the credits occasionally thank entire teams, not just individuals, so the actual number is likely even higher.)

Super Mario Release Team Size

© Possum Labs 2018

Where we haven’t (at least in the USA) been able to keep up is in the training of new software engineers. QA departments, regardless of size, are challenged to carry out all the different processes required to follow software through the development lifecycle to delivery and maintenance (updates), while also keeping abreast of changes to technology and integrations.

An unfortunate result of this shortage of QA engineers is that the point in the development cycle where most companies fall short is testing. And yet, the ability to provide useful and meaningful testing is crucial to the successful delivery of quality assurance to one’s client, whether building an in-house product, such as for a financial institution, or a commercial product for the public market.

While offshore teams may be a solution for some companies, many companies are too small to make building offshore teams practical or cost-effective.

What’s more, many engineers tend to be good at one thing, development, and may not have a good sense of your organization’s business goals or even an understanding of what makes a good customer experience. And while your highly paid development staff might excel at building clever solutions, that doesn’t necessarily mean they also excel at testing their own work. And do you really want to pay them to do your testing, when their time could be better invested in innovations and features? At Possum Labs we’ve determined that it is often most efficient to design workflows and teams to adjust to the people you have.

This disconnect between development requirements and a full understanding of business goals is in fact often the culprit in a pervasive disconnect between testing and business outcomes. What do I mean by disconnect? Let’s consider the four following statements and then talk about some real-life examples:

  • Users prefer seamless interfaces, intuitive commands, and technology that makes them feel smart, not dumb.
  • Businesses prefer software that assures their business goals are met and that technology allows their employees to work smarter with greater efficiency thus promoting growth and profit.
  • Today the average person is adequately adept and familiar with technology to know when your software exhibits even moderately lousy UX. And companies can also experience public shaming via social media when they make a particularly dumb or inopportune mistake.
  • And then there are the security risks.

In 2017 we saw several major security snafus at large corporations that, judging from the few details publicized, were the direct result of inaction on the part of the decision-makers, despite their being notified by engineering.

One might think that the decision makers, while moderately acknowledging the risk, may have simply gambled that nothing would come of the risks while taking steps to protect themselves.

I would like to go a step further. I’d wager that everyone involved fell victim to a set of common testing pitfalls.

Indeed, one of the most challenging aspects of testing is figuring out not only how to effectively and efficiently create and run tests, but most importantly to figure out how to confidently deliver meaningful results to the decision makers. Whether you are a software business or a business that uses software, successful quality assurance is crucial to the long-term health and success of your business.


Let’s do a quick recap of what we’ve covered so far:

  1. There is a shortage of qualified test engineers.
  2. Users want products they can rely on and that are friendly to use.
  3. Companies want products they can trust that improve efficiencies, their bottom line and that, of course, make their clients happy.
  4. It is difficult to create tests that deliver meaningful results when the testing is done by engineers who don’t necessarily understand the business’s end goals.
  5. Decision makers don’t want to understand the tests; they want to have meaningful results so that they can make effective decisions.


So what if we could solve all of these problems at once?

This type of solution is what Possum Labs achieves through the clever use of solutions and tools that integrate with your existing systems, processes, and people. We build out quality assurance so that anyone who understands the business goals can successfully carry out testing and efficiently use the results.

Not only does this solve problems 1 to 5 above, but it is also, in fact, a super solution in that it prevents most companies from having to hire or train new developers. Instead, you need to hire people with a keen understanding of your business who can be trained to work with your tools. Possum Labs’ methods allow you to implement and upgrade your quality assurance, sometimes even reducing your staffing load, while delivering better and more meaningful results, so that the people you serve get better services or better products than before.


How does Possum Labs do this?

Our solutions vary a bit from company to company, but in general, we use several tools, including proxies and modules (think Lego), so that existing tests can be modified and new tests written by simply reorganizing the “bricks.” This focus on custom solutions allows a non-technical individual with a solid understanding of business goals to generate tests that deliver meaningful results, which he or she can share with decision makers with confidence.

The result is that testing bottlenecks open up, allowing for a more efficient flow of information and better feedback through all channels. Products are delivered faster. Information flows smoothly. Better decisions are made, and efficiencies are gained. Developers can focus on development and decision makers on achieving their strategic goals. Meanwhile, you’ve got happy customers, and everyone can get a good night’s rest.

3 Risks to Every Team’s Progress and How to Mitigate

When looking at improving performance, the first thought is often to increase the size of our development team; however, a larger group is not necessarily the only or the best solution. In this piece, I suggest several reasons to keep teams small and why to stop them from getting too tiny. I also look at several types of risk to consider when looking at team size: how team size affects communication, and the possibility of individual risk and systematic risk.

Optimal Team Size for Performance

The question of optimal team size is a perpetual debate in software organizations. To adjust, grow and develop different products we must rely on various sizes and makeups of teams.

We often assume that fewer people get less done, which leads us to add people to our teams so that we can get more done. Unfortunately, this solution often has unintended consequences and unforeseen risks.

When deciding how big of a team to use, we must take into consideration several different aspects and challenges of team size. The most obvious and yet most often overlooked is communication.

Risk #1: Communication Costs Follow Geometric Growth

The main argument against big teams is communication. Each added team member increases the number of communication paths, and the opportunities for problems, much faster than the headcount itself grows. This increase in communication pathways is most easily illustrated by a visual representation of team members and communication paths.

Geometric Growth of Communication Paths

Bigger teams increase the likelihood that we will have a communication breakdown.
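As a rough illustration of that growth, the sketch below counts the one-to-one communication paths in a team, using the standard pair-counting formula n(n-1)/2. Strictly speaking this growth is quadratic rather than geometric, and counting only pairwise paths is a simplifying assumption; the function name is hypothetical.

```python
def communication_paths(team_size):
    """Number of distinct pairs of people in a team of the given size."""
    return team_size * (team_size - 1) // 2


for size in (2, 4, 8, 16):
    print(size, communication_paths(size))
# 2 1
# 4 6
# 8 28
# 16 120
```

Doubling a team from 8 to 16 people roughly quadruples the number of paths, which is why each added person costs more in coordination than the one before.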

From the standpoint of improving communication, one solution that we commonly see is the creation of microservices to reduce complexity and decrease the need for constant communication between teams. Unfortunately, the use of microservices and distributed teams is not a “one size fits all” solution, as I discuss in my blog post on Navigating Babylon.

Ultimately, when it comes to improving performance, keep in mind that bigger is not necessarily better. 

Risk #2: Individual Risk & Fragility

Now a larger team seems like it would be less fragile because after all, a bigger team should be able to handle one member winning the lottery and walking out the door pretty well. This assumption is partially correct, but lottery tickets are usually individual risks (unless people pool tickets, something I have seen in a few companies).

When deciding how small to keep your team, make sure that you build in consideration for individual risk and be prepared to handle the loss of a team member.

Ideally, we want to have the smallest team possible while limiting our exposure to any risk tied to an individual. Unfortunately, fewer people tend to get less work done than more people (leaving skill out of it for now).

Risk #3: Systematic Risk & Fragility

Systematic risk relates to events that will affect multiple people in the team. Fragility is the concept of how well a structure or system can handle hardship (or changes in general). Systematic risks come from aspects shared across the organization, such as leadership, shared space, or shared resources.

Let’s look at some examples:

  • Someone brings the flu to a team meeting.
  • A manager/project manager/architect has surprise medical leave.
  • An affair between two coworkers turns sour.

All of these events can grind progress to a halt for a week (or much more). Events that impact morale can be incredibly damaging as lousy morale can be quite infectious.

In the Netherlands, we have the concept of a Baaldag (roughly translated as an irritable day) where team members limit their exposure to others when they know they won’t interact well. In the US with the stringent sick/holiday limits, this is rare.

Solutions to Mitigate Risk 

Now there are productive ways to minimize risk and improve communication. One way to do this is by carefully looking at your structure and goals and building an appropriate team size while taking additional actions to mitigate risk. Another effective technique for risk mitigation is through training. You shouldn’t be surprised, however, that my preferred method to minimize risk is by developing frameworks and using tests that are readable by anyone on your team.

The case for continuing education

Do you have job security? The surprising value in continuing education.

If the last 2 decades have taught us anything about change, they’ve shown that while software development may be one of the most rapidly growing and well-paid industries, it can also be highly unstable.

You may already invest in professional development in your free time. In this piece, I’ll show you how to convince your employer to invest in professional development as part of your job.

My Personal Story

I started my first software job at the height of the dot-com boom. I’d yet to finish my degree, but this didn’t matter because the demand for developers meant that just about anyone who merely knew what HTML stood for could get hired. Good developers could renew contract terms 2 or 3 times per year. Insanity reigned and some developers financially made out like bandits.

Of course, then came the crash. The first crash happened just about the time I finished my degrees. By the time I graduated, I’d gone through three rounds of layoffs during my internships. By the time I actually started full-time work, things had stabilized a bit, with layoffs settling down to once-a-year events in most companies. In 2007 we saw an uptick in a twice-a-year layoff habit for some companies, but then it quieted down again.

Of late, in most companies and industries software developer layoffs are less frequent. The more significant problem is, in fact, finding competent brains and bodies to fill open positions adequately.

My initial move into consultancy stemmed from a desire to take success into my own hands. Contracting and fulfilling specific project needs leaves me nimble and in control of my own destiny. My success is the happiness of my customer, and that is within my power.  Indeed, I am not immune to unexpected misfortune, but I rarely risk a sense of false security.  And I particularly enjoy the mentoring aspect of working as a consultant.

Despite the growth, I’d say software is still a boom and bust cycle.

Despite the relative calm (for the developers, not the companies), I think that as a software developer it is wise to accept that our work can vanish overnight or our salaries can be cut in half next month. Some people even leave the industry in hopes of better job security, while others deny the possibility that misfortune will ever knock on their door.

Not everyone has the desire, personality or the aptitude to be a consultant. However, everyone does have the ability to plan for and expect change. I wager that in any field it is wise to always have your next move in the back of your mind. This need to be prepared is particularly true in the area of software development. And while some people keep their resume fresh and may even make a habit of annual practice interviews, others have no idea which steps they’ll need to take to land their next job.

Landing that next job takes several steps, and while the most straightforward step may be to make sure your resume and your LinkedIn profile are fresh with the right keywords (and associated skills) sprinkled throughout, it is even more important to stay on top of your game professionally.

Position yourself correctly, and you will fly through the recruiters’ hands into your next company’s lap. For many companies, keywords are not enough; they also need to know that you have experience with the most current versions and recent releases. Recruiters may not be able to tell you the difference between .NET 3.5 and 4.0, but if their client asks for only 4.0, they will filter out the 3.5 candidates. Versions are tricky: Angular 1 to 2 is a pretty big change, while Angular 2 to 4 is tiny (and no, there is no Angular 3). It is not reasonable to expect recruiters to make heads or tails of these versions.

Constant Change Means Constant Learning

So how do you position yourself to leap if and when you need to? In the field of software development, new tools, methods, and practices are continually appearing. Software developers frequently work to improve and refine the trade and their products.

The result of this constant change is that software engineers who maintain legacy products are at risk of losing their competitive edge. Staying at one job often results in developers becoming experts in software that will eventually be phased out.

Not surprisingly, the companies that rely on software to get their work done, but that are not actually software companies by trade, tend to overlook professional development for their employees. The decision makers at these companies concern themselves with their costs more than the competitiveness of their employees, and so they often remain entirely ignorant of the realities for their software engineers.

In some companies, the decision makers don’t see any logic in investing in training their employees or upgrading their software when what they have works just fine. It’s easy to make a budget for a software upgrade; what is less evident is the cost of reduced marketplace competitiveness of their employees. Even worse, in some companies, there is an expectation that instead of investing in training, they’ll simply hire new people with the skills they need when their existing staff gets dated.

I once met a brilliant mathematician in Indianapolis who had worked on a legacy piece of software. One day, after 40 years of loyal employment, he found himself without a job due to a simple technology upgrade. With a skill set frozen circa 1980, he ended up working the remainder of his career in his neighborhood church doing administrative tasks and errands. Most people do not want to find themselves in that position, and they want to keep their economic prospects safe.

Maintain Your Own Competitive Edge

Another reason that many software engineers (and developers) move jobs every few years is to maintain their competitive edge and increase their pay. Indeed, earlier last year Forbes published a study showing that employees who stay longer than two years in a position tend to make 50% less than their peers who hop jobs.

“There is often a limit to how high your manager can bump you up since it’s based on a percentage of your current salary. However, if you move to another company, you start fresh and can usually command a higher base salary to hire you. Companies competing for talent are often not afraid to pay more when hiring if it means they can hire the best talent.”

More Important than Pay is the Software Engineer’s Fear of Irrelevance

As a software engineer working for a company that uses software (finance, energy, entertainment, you name it) there is nothing worse than seeing version 15 arrive on the scene when your firm remains committed to version 12.

Your fear is not that version 12 technology will phase out tech support, as these support windows are often a good decade in length. You fear that this release means that your expertise continues to become outdated and the longer you stay put, the harder it will be to get an interview, let alone snag a job. You feel a sinking dread that your primary skill-set is suddenly becoming irrelevant.  

Your dated skill-set has real financial implications and will eventually negatively impact your employability.

A Balancing Act

For companies, the incentive is to develop software cheaply, and cheap means that it is easy to use, quick to develop, and, let’s be realistic here, that you can Google the error message and copy your code from Stack Exchange.

A problem in software can often gobble up a few days when you are on the bleeding edge. All too often I stumble upon posts on Stack Exchange where people answer their own question, often days later; or even worse, I see questions answered months after the person asked for help. It makes sense that companies want to avoid the costs of implementing new releases.

Why would companies jump on the latest and greatest when the risk of these research problems is amplified in the latest version?

Companies are motivated to maintain old software, while employees are motivated to remain competitive.

This balancing act is a cost transfer problem: the latest framework is a cost to companies due to the research aspect, whereas an older framework is a cost to developers because it reduces their marketability. At a moment when it is hard to hire good people, it will be hard to convince developers to bear the costs of letting their skills fall out of date.

New language and framework features can add value, but the gains are often minor; frequently they are just ways to do something people can already do, only better and faster. Even that is only true after the learning curve, and even then the benefits rarely live up to expectations (see No Silver Bullet). Chances are that the benefit of a new version of a framework will often outweigh the costs of learning the new framework, especially for existing code bases.

It seems like there should be some room for the middle ground; in the past, there was a middle ground. This was called the training budget.

Corporate Costs

With software developers jumping ship every few years to maintain their competitive edge, it is understandable that some management might find it difficult to justify or even expect a return on investment on training staff. In many cases, you’d need to break even on your training investment in less than a year.

At the same time, the need for developers to keep learning will never go away. Developers are acutely aware that having out of date skills is a direct threat to their economic viability.

For the near future, developers will remain in high demand, and refusing to provide on-the-job continuing education will only backfire. Developers are in demand, and they want to learn on the job. Today we do our learning on the production code, and companies pay the price (quite likely with interest). Whereas before developers were shipped off to conferences once a year, now they Google, search Stack Overflow, and read through the source code of the new framework for months as they try to solve a performance issue.

In Conclusion: Investing in Continuing Education Pays Off

The industry has gone through a lot of changes. In the dot-com boom, developers were hopping jobs at an incredible speed, and companies reacted by changing how they treated developers, cutting back on training as they saw tenures drop. This all makes perfect sense. Unfortunately, it has led developers to promote aggressive and early adoption of frameworks so that they can keep their skills up-to-date with the market. And as more and more companies adapt to frequent updates, the pressure to do so will only increase.

Training provides a way to break the cycle and establish an unspoken agreement that companies will, through regular training, leave developers as competitive as they were when they were hired. So how do you support continuing education and maintain a stable and loyal development pool? Send your developers to conferences, host in-house training, lunch and learns, and so on to ensure that they feel both technically competitive and financially secure.

Despite their reluctance, in the end, there is a real opportunity and a financial incentive for companies to go back to the training budget approach. Companies want to have efficient development; developers want to feel economically secure. If developers are learning, then they feel like they are improving their economic prospects and remaining competitive. Certainly, some will still jump ship when it suits their professional goals, but many will choose to stay put if they feel they remain competitive.

“Advancement occurs through the education of practitioners at least as much as it does by the advancement of new technologies.” Improving Software Practice Through Education



How To Play with LEGO® When You Should be Testing

There is a gap between automating your testing and implementing automated testing. The first is taking what you do and automating it; the second is writing your tests in an executable manner. You might wonder about the distinction that makes these activities different. And indeed, some testers will view these two events as parts of a single process.

However, for someone who reads test cases, for instance your customer support team, it is a big difference; what is a transparent process to one party is often an opaque process to another. In fact, if you are a nontechnical person reading this explanation, you may very well be scratching your head trying to see the difference and understand why it is crucial.

Writing Tests versus Reading Tests

Automating involves looking at your interactions and your processes, both upstream and downstream, and then automating the part in between while maintaining those interactions. This automation process should mean that nontechnical people remain able to understand what your tests do, while also providing your developers the necessary detail of how to reproduce bugs.

The second piece, communicating the detailed information to your developers, is relatively easy, as you can log any operation with the service or browser; however, explaining these results to nontechnical people is often significantly more difficult.

When it comes to communicating the information, you have a few options; the first is to write detailed descriptions for all tests. Writing descriptions works, at least initially.

Regrettably, what tends to happen is that these descriptions can end up inaccurate without any complaints from the people who read your tests. If testing goes smoothly, then nothing will happen, and no one will notice the discrepancy. The problems only arise when something goes wrong.

Meaningful Tests Deliver Confidence

The worst problem an automated test implementation can have is one that erodes confidence in the results. When a test fails and the description does not match the implementation, you have suddenly undermined confidence in the entire black box that certifies how the software works.

And now, the business analyst cannot sleep at night after release testing, because there is a nagging suspicion something might not be right.

You might respond that keeping the descriptions updated accurately should be easy, and technically that should be true. However, the reality is that description writing (and updating) is often only one aspect of someone’s job.

Breaking Down the Problem

As a project progresses and accumulates tasks, particularly as it falls behind schedule or goes over budget, the individuals writing the descriptions are often too worried about many other details to give their descriptions proper attention.

And whether we like it or not, when we move to automated testing we become prone to hacks and shortcuts just like everyone else. The simple reality is that testing a new feature appears to be a more pressing priority than updating the comments on some old test cases.

Moreover, above and beyond your ability to remember or accurately update the comments, there is another point to consider. There is an underlying problem with these detailed descriptions as they violate one of the more useful rules of programming: DRY (Don’t Repeat Yourself).

One of the most important reasons to practice DRY is that duplicates all too easily get out of sync and cause systems to behave inconsistently. In other words, two bits that should do the same thing now do slightly different things. Oops. Or, in the case of documentation and automated tests, two bits that should mean the same thing are now out of sync. Double oops.

How do we avoid duplication and implement DRY?

We can use technologies that solve this problem, such as a natural language API, so that your tests read like English.

For Example:

This example is readable and executable in just about every language, but regrettably, to get to this form you will have to write a lot of wrappers and helpers so that the syntax is easy to follow.
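As a minimal sketch of the kind of wrappers such an API needs, here is a small fluent interface in Python. The LoanScenario class and its method names are hypothetical, invented purely for illustration; they do not come from any real library or from the system discussed here.

```python
class LoanScenario:
    """Hypothetical fluent wrapper; each method records one readable step."""

    def __init__(self):
        self.steps = []
        self.state = "none"

    def open_a_loan(self, amount):
        self.steps.append(f"open a loan for {amount}")
        self.state = "open"
        return self  # returning self is what lets the calls chain

    def assign_to(self, officer):
        self.steps.append(f"assign to {officer}")
        return self

    def close_the_loan(self):
        if self.state != "open":
            raise RuntimeError("cannot close a loan that is not open")
        self.steps.append("close the loan")
        self.state = "closed"
        return self


# The test itself now reads close to a sentence.
scenario = LoanScenario().open_a_loan(250_000).assign_to("Pat").close_the_loan()
print(scenario.steps)
```

The readable line at the bottom is cheap; the cost sits in the class above it, which someone has to design and maintain.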

And this means that you will likely need to rely on writers with significant experience in your specific programming language, and you may not have someone with the right expertise on hand.

An alternative is to create a domain specific language in which you write your tests. A domain specific language means that you create tests in something like Gherkin/Cucumber or you write a proprietary parser and lexer. Of course, this path again relies on someone who has a lot of experience with API / Language design.

The Simplest Solution

To me the preferred method is to use Gherkin, mostly because it is easier to maintain after your architect wins the lottery and moves to Bermuda. With Gherkin, when you run into a problem you can Google the answer or hire a specialist. There is a sense of mystery and adventure about working on issues where you can’t Google the answer, but it’s not necessarily a reliable business practice.

The most significant benefit that I’ve discovered is that for many people, this method no longer feels like programming. This statement undoubtedly seems odd coming from a programmer, but hear me out as there is a method to my madness.

Solutions for the People You Have

To begin, let’s acknowledge that there is a shortage of programmers, especially programmers that are experts in your particular business. Imagine if you had a tool that you could hand to anyone who knows your field and that would allow this individual to understand and define tests? Wouldn’t that be grand?

How would this tool look? What would it require? To accomplish readability (understanding) and functionality (definition) you’d need to be able to hand off something that is concrete (versus abstract) and applicable to the business that you are in, but most importantly it needs to have a physical appearance.


Imagine if you could design tests with LEGO®? There is nothing you can build out of Lego bricks that you couldn’t create in a wood shop. Unfortunately, most people are too intimidated to make anything in a woodshop. Woodworking is the domain of woodworkers. On the flipside, give anyone a box of Lego bricks, and they will get to work building out your request.

Software development runs into the woodworker conundrum: programming is the domain of developers. Give a layperson C#, Java or JavaScript and assign them to a project to build and they’ll get so flustered they won’t even try. Give them Lego bricks, and they will at the least try to build out their assignment.

Reducing the barrier to accomplishing something new is extremely important for adoption; we know this is a barrier to getting customers to try our software, but we often forget the same rule applies to adopting new things for our teams.

This desire for something concrete to visualize is why we come across people who would “like to learn to program” while building complicated Excel sheets filled with basic visual formulas. These folks can see Excel so they naturally can use this formula thing, and don’t even realize that they are programming. My method is similar in concept.

To successfully automate our testing we need to reduce the barriers to trying to use the technology we plan to produce tomorrow.  As an organization adopts change, we need to find ways to make changes transparent and doable; we need to convince our people that they will succeed or they might not even try.

Gherkin Lego bricks

As I wrote in my post Variables in Gherkin:

“The purpose of Cucumber is to provide a clear and readable language for people (the humans) who need to understand a test’s function. Cucumber is designed to simplify and clarify testing.

For me, Cucumber is an efficient and pragmatic language to use in testing, because my entire development team, including managers and business analysts, can read and understand the tests.

Gherkin is the domain-specific language of Cucumber. Gherkin is significant because as a language, it is comprehensible to the nontechnical humans that need to make meaning out of tests.

In simpler terms: Gherkin is business readable.”

This explanation shows how Gherkin is the perfect language for our Lego bricks, to build testing “building blocks.” To create an infrastructure that is readable by and functional for the people you have on hand, I like to develop components that I then provide to the testers.

Instead of needing a specific technical understanding, such as a particular programming language, the testers now just need to know which processes they need to test.

For example, I would like to:

  • Open an order = a blue 2×4 brick
  • Open a loan = a blue 2×6 brick
  • Close a loan = a red 2×4 brick
  • Assign a ticket = a green 2×4 brick

This method addresses the issue of DRY because each “brick” has its own process, so if you need to change the process, you change out the brick. If a process is broken, you pass it back to the development team with a very precise report of which brick failed. This makes testing concrete and removes a lot of the abstract parts inherent in software development.

⇨ This method addresses the issue of readability because each brick is a concrete process. Your testers can be less technical while producing meaningful automated tests.

⇨ This method solves the problem of confidence because problems are isolated to bricks. If one brick is broken, it doesn’t hint at the possibility that all bricks are broken.

⇨ This method also solves the problem of people, because it’s much easier to find testers who understand your business process and goals, such as selling and closing mortgages, without also having to understand the abstract nature of the underlying software that makes it all work.
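To make the brick idea concrete, the following is a toy sketch in Python of how readable steps can each map to exactly one implementation. It is not Cucumber and does not use any Gherkin library; the brick decorator, the run function, and the loan steps are all hypothetical names chosen for illustration.

```python
import re

BRICKS = []  # (compiled pattern, function) pairs; one entry per "brick"


def brick(pattern):
    """Register a function as the single implementation of a readable step."""
    def register(func):
        BRICKS.append((re.compile(pattern), func))
        return func
    return register


@brick(r"I open a loan for (\d+)")
def open_loan(ctx, amount):
    ctx["loan"] = {"amount": int(amount), "status": "open"}


@brick(r"I close the loan")
def close_loan(ctx):
    ctx["loan"]["status"] = "closed"


def run(scenario):
    """Execute each line of a plain-text scenario with its matching brick."""
    ctx = {}
    for line in scenario.strip().splitlines():
        # Drop the leading Given/When/Then/And keyword before matching.
        text = re.sub(r"^\s*(Given|When|Then|And)\s+", "", line)
        for pattern, func in BRICKS:
            match = pattern.fullmatch(text)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no brick matches: {text!r}")
    return ctx


result = run("""
    Given I open a loan for 250000
    When I close the loan
""")
print(result)
```

Because each step text has one implementation, a change to how a loan is closed happens in one place, and a line that matches no brick fails loudly instead of being silently skipped.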

The reality is that in this age every company has software while few are software companies.  Companies rely on software to deliver something concrete to their customers. My job as an automation and process improvement specialist is to make the testing of the software piece as transparent and as painless as possible so that your people can focus on your overarching mission.

“LEGO® is a trademark of the LEGO Group of companies which does not sponsor, authorize or endorse this site.”

The Cost of Software Bugs: 5 Powerful Reasons to Get Upset

If you read the PossumLabs blog regularly, you know already that I am focused on software quality assurance measures and why we should care about implementing better and consistent standards. I look at how the software quality assurance process affects outcomes and where negligence or the effects of big data might come into play from a liability standpoint. I also consider how software testing methodologies may or may not work for different companies and situations.

If you are new here, I invite you to join us on my quest to improve software quality assurance standards.

External Costs of Software Bugs

As an automation and process improvement specialist, I am somewhat rare in my infatuation with software defects, but I shouldn’t be. The potential repercussion of said bugs is enormous.

And yet you ask, why should YOU care?

Traditional testing focuses on where in the development lifecycle a bug is found and how to reduce costs. This is the debate of Correction vs. Prevention and experience demonstrates that prevention tends to be significantly more budget-friendly than correction.

Most development teams and their management have a singular focus when it comes to testing: they want to deliver a product that pleases their customer as efficiently as possible. This self-interest, of course, focuses on internal costs. In the private sector profit is king, so this is not surprising.

A few people, but not many, think about the external costs of software defects. Most of the studies and interested parties tend to be government entities or academic researchers.

In this article, I discuss five different reasons that you as a consumer, a software developer or whomever you might be, should be concerned with the costs of software bugs to society.

#1 No Upper Limit to Financial Cost

The number one reason that we should all be concerned is that the costs of software defects, misuse, or crime likely have no upper limit.

In 2002 NIST compiled a detailed study looking at the costs of software bugs and what we could do to both prevent and reduce costs, not only within our own companies but also external societal costs. The authors attempted to estimate how much software defects cost different industries. Based on these estimates they then proposed some general guidelines.

Although an interesting and useful paper, the most notable black swan events over the last 15 years demonstrate that these estimates provide a false sense of security.

For example, when a bug caused $500 million US in damage in the Ariane 5 rocket launch failure, observers treated it like a freak incident. Little did we know at the time that the price tag of a “freak incident” would grow by more than an order of magnitude in the years that followed.

This behavior goes by many names, Black Swans, long tails, etc. What it means is that there will be extreme outliers. These outliers will defy any bell curve models, they will be rare, they will be unpredictable, and they will happen.


A Black Swan is an unpredictable event, so named by Nassim Nicholas Taleb in his book The Black Swan: The Impact of the Highly Improbable. It is predicted that the next Black Swan will come from cyberspace.

Long tail refers to a statistical distribution in which most events fall within a specific range, whereas a few rare events occur far out at the end of the tail. https://en.wikipedia.org/wiki/Long_tail

Of course, it is human nature always to try and assemble the clues that might lead to predicting a rare event.

Let’s Discuss Some Examples:

4 June 1996

A 64-bit floating-point value is converted to a 16-bit signed integer, and 500 million dollars goes up in flames. As you can see in Table 6-14 (page 135) of the previously mentioned NIST study, the estimated annual cost of software defects for an aerospace company of this size was only $1,289,167. And so, 500 million blows that estimate right out of the water.

This single bug cost nearly 400 times the expected annual cost of defects for a company.
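The hazard of squeezing a 64-bit value into 16 bits can be illustrated with a short Python sketch. It does not reproduce the flight software; the function names and the sample value 40000.0 are assumptions chosen only to show what an unguarded narrowing does compared with a checked one.

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(value: float) -> int:
    """Keep only the low 16 bits, the way an unguarded narrowing would."""
    return ctypes.c_int16(int(value)).value


def checked_to_int16(value: float) -> int:
    """Refuse values that do not fit instead of silently corrupting them."""
    as_int = int(value)
    if not INT16_MIN <= as_int <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return as_int


print(unchecked_to_int16(40000.0))  # -25536: a plausible-looking but wrong number
print(checked_to_int16(1234.0))     # 1234: values in range pass through unchanged
```

The checked version turns a silent corruption into an explicit error that a test can provoke on the ground, which is the point of covering boundary values.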

May 2007

A startup routine for engine warm-up is released with some new conditions. According to Table 6-14 (page 135), “Company Costs Associated with Software Errors and Bugs,” of the same NIST study, the 2002 estimate of the automotive industry’s cost of software bugs, per company per year, for a company in the over-10,000 size class was only $2,777,868. That is not even a dent in the cost to Volkswagen: this code cost Volkswagen $22 billion.

That equates to roughly 8,000 times the expected costs of defects per company per year.
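Dividing the quoted losses by the quoted NIST estimates gives the multiples directly. The short Python check below uses only the four dollar figures cited above; the variable names are mine.

```python
ariane_loss = 500_000_000          # Ariane 5 launch failure, US dollars
aerospace_estimate = 1_289_167     # NIST per-company annual estimate, aerospace

vw_cost = 22_000_000_000           # cost to Volkswagen, US dollars
automotive_estimate = 2_777_868    # NIST per-company annual estimate, automotive

ariane_multiple = ariane_loss / aerospace_estimate
vw_multiple = vw_cost / automotive_estimate

print(round(ariane_multiple))  # 388
print(round(vw_multiple))      # 7920
```

Either way, a single event cost hundreds to thousands of years' worth of the expected annual defect budget.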

Unfortunately, when it comes to liability, it seems only academics are interested in predicting these rare events, but given the possibility of exponential costs to a single company, shouldn’t we all be concerned?

#2 Data Leaks: Individual Costs of Data Loss?

Data leaks affecting 10 to 100 million customers are becoming routine. These leaks are limited by the size of the datasets and are thus unlikely to grow much more, in large part because few companies hold enough data for a breach to reach billions of records.

Facebook has ~2 billion users, so a breach at the theoretical limit could only happen at Facebook or in a Chinese or Indian government system. We only have 7.5 billion people on earth, so to have a breach of 10 billion users we first need more people.

Security Breaches are limited by the Human Population Factor

That is what makes security breaches different: population size tells us only that we will approach the theoretical limit of how bad a breach could be. The Equifax breach affected 143 million users.

When it comes to monetary damages for the cost of the data breach, there is not a limiting factor, such as population size.

As we saw with Yahoo and more recently Equifax, cyber security software incidents show a similar pattern of exponential growth when it comes to costs. Direct financial costs are trackable, but the potential for external costs and risks should concern everyone.

#3 Bankrupt Companies and External Social Costs

From its inception no one would have predicted that this simple code pasted below might cost VW $22 billion US:

if (-20 /* deg */ < steeringWheelAngle && steeringWheelAngle < 20 /* deg */)
{
    lastCheckTime = 0;
    cancelCondition = false;
}
else
{
    if (lastCheckTime < 1000000 /* microsec */)
    {
        lastCheckTime = lastCheckTime + dT;
        cancelCondition = false;
    }
    else cancelCondition = true;
}


Even if you argue that this is not an example of a software defect, but rather deliberate fraud, it’s unlikely you’d have predicted the real cost. Certainly, this one was different, unexpected, not conforming to our expectations of a software defect. But that is the definition of a Black Swan. Black Swans do not conform to expectations, and as happened here, the software did not act according to expectations. The result is that it cost billions.

How many companies can survive a 22 billion dollar hit? Not many. What happens when a company we rely heavily on suddenly folds? Say the company that manages medical records in 1/5th of US states? Or a web-based company that provides accounting systems to clients in 120 countries and just turns off?

#4 Our National Defense is at Risk

The significance of this one doesn’t take a lot to understand, and it is an issue currently in the limelight. Software defects, faults, errors, etc. have the potential to produce extreme costs, despite infrequent occurrences. Furthermore, the origins of the costs of long tail events may not always be predictable.

After all what possible liability would Facebook have for real-world damages regarding international tampering in an election? It is all virtual, just information; until that information channel is misused.

There is very little chance that when actuaries for Facebook assessed the company’s risks, they flagged election interference as one of them. Sure, they considered liability for people live broadcasting horrible and inhumane things, but did they contemplate foreign election interference? And even if they did consider the possibility, how would they have been able to predict or monitor the entry point?

And that is the long tail effect; it is not what we know, or can imagine, it is the unexpected. It is the bug that can’t be patched, as the rocket exploded, it is the criminal misuse of engine optimization routines or the idea that an election could be swayed due to misinformation. These events are so costly that we can’t assume that we know how bad it could be because the nature of software means that things will be as bad as they possibly can get.

#5 Your Death or Mine

Think of the movie IT, based on Stephen King’s book of the same name: a clown that deceives children and leads them to death and destruction. What happens when a piece of equipment runs haywire, masquerading as one thing and doing yet another? Software touches so many aspects of our lives that, from the hospital setting to self-driving cars, a software defect could undoubtedly lead to death.

We’ve already had a case, presumably settled out of court, where a Therac-25 radiation therapy machine irradiated people to death. What happens when a cloud update to a control system removes fail-safes on hundreds or thousands of devices in hospitals or nursing homes? Who will be held liable for those deaths?

Mitigation is often an attempt at Prediction

A large part of software quality assurance is risk mitigation as an overlapping safety net to look for unexpected behaviors. Mitigation is an attempt to make it less likely that your company unintentionally finds the next “unexpected event.”

There has been a lot written about how there is an optimal way to get test coverage on your application. Most of this comes down to testing the system at the lowest level (unit test) that is feasible and has resulted in the testing pyramid. This is mathematically true. Unfortunately, the pyramid assumes that there are no gaps in coverage. Less overlap means that a gap in coverage at a lower level is less likely to be caught at a higher level.

The decision about test coverage and overlapping coverage can be approximated using Bernoulli trials, each of which delivers one of two results: success or failure.
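As a rough sketch of that approximation, treat each test layer as one Bernoulli trial that either catches a given defect or misses it. The catch rates below (90%, 60%, 50%) are made-up numbers, and the model assumes the layers fail independently, which real test suites only approximate.

```python
from math import prod


def escape_probability(catch_rates):
    """Chance a defect slips past every layer, assuming independent layers."""
    return prod(1.0 - rate for rate in catch_rates)


# One layer (unit tests only) versus three overlapping layers.
single = escape_probability([0.90])
overlapping = escape_probability([0.90, 0.60, 0.50])

print(f"single layer:       {single:.3f}")
print(f"overlapping layers: {overlapping:.3f}")
```

Under these assumptions the overlapping layers cut the escape chance to a fifth of the single-layer figure, which is the value that has to be weighed against the cost of maintaining the extra layers.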

Prioritizing the Magnitude Of Errors and their Effects

When we look at the expected chance of a defect and multiply that with the cost of a defect, we can compare that to the chance of a defect with overlapping coverage, multiplied by the cost.

We are usually looking at the cost of reducing the chance of a defect slipping through and comparing that to our estimated cost of a defect.

Unfortunately, the likelihood that we underestimate the cost of a defect due to long tail effects is very high. Yes, it is improbable that your industry will have a billion dollar defect discovered this year; but how about in the next 10 years? Now the answer becomes a maybe. Let us call it a 10% chance over those 10 years, or roughly 1% per year, and let us say that there are 100 companies in your industry. What is the cost of one of those outlier defects per year?

$1,000,000,000 × 0.01 (1% chance per year) × 0.01 (1% chance of it hitting your company) = $100,000 as the expected cost of outlier defects per year.
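The same arithmetic, written out as a small Python sketch so that the inputs can be varied. The function names are mine, and the mitigation example at the end uses invented numbers purely to show the comparison described above.

```python
def expected_annual_cost(defect_cost, p_industry_per_year, companies):
    """Expected yearly cost to one company of an industry-wide outlier defect."""
    p_company = 1 / companies  # equal chance of it hitting any one company
    return defect_cost * p_industry_per_year * p_company


def mitigation_pays_off(mitigation_cost, cost_before, cost_after):
    """True if the yearly spend is less than the expected cost it removes."""
    return mitigation_cost < cost_before - cost_after


# 10% chance over ten years is roughly 1% per year; 100 companies in the industry.
baseline = expected_annual_cost(1_000_000_000, 0.10 / 10, 100)
print(f"${baseline:,.0f}")

# Hypothetical: extra coverage halves the yearly chance and costs $30,000 a year.
reduced = expected_annual_cost(1_000_000_000, 0.005, 100)
print(mitigation_pays_off(30_000, baseline, reduced))
```

If the true cost of the outlier is underestimated, as the long tail suggests it usually is, the same comparison tips further toward spending on mitigation.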

The problem with outlier events is that despite their rare nature, even with a significantly small probability that your company might be the victim, the real outliers have the potential to be so big and expensive that it may, in fact, be worth your time investing in considering the possibility.

Enduring the Effect of a Black Swan

In reality, companies might use bankruptcy law to shield themselves from the full cost of one of these defects. VW’s financial burden for their expensive defect stems from the fact that they could afford it without going bankrupt. The reality is that most companies couldn’t afford to pay the costs of this type of event and would ultimately be forced to dissolve.

We cannot continue to ignore that software defects, faults, errors, etc. have the potential to produce extreme costs, despite infrequent occurrences. Furthermore, the origins of the costs of long tail events may not always be predictable.

The problem with treating the “rarity of an event” as an insurance policy is that the risk of a significant black swan bug goes beyond the financial costs borne by individual companies. The weight of these long tail events is borne by society.

And so the question is, for how long and to what extent will society continue to naively or begrudgingly bear the cost of software defects? Sooner or later the law will catch up with software development. And software development will need to respond with improved quality assurance standards and improved software testing methodologies.

What do you think about these risks? How do you think we should address the potential costs?


Tassey, G. (2002, May). The Economic Impacts of Inadequate Infrastructure for Software Testing (Planning Report 02-3) [PDF]. Gaithersburg, MD: RTI for the National Institute of Standards and Technology.

A Crucial Look at the Unethical Risks of Artificial Intelligence

Artificial Intelligence Pros and Cons:

As much as we wonder at the discoveries of AI and prediction engines and their benefits to society, we also recoil at some of their findings. We can’t make the correlations that this software discovers go away, and we can’t stop the software from re-discovering the associations in the future. As decent human beings, we certainly wish to avoid our software making decisions based on unethical correlations.

Ultimately, what we need is to teach our AI software lessons to distinguish good from bad…

Unintended results of AI: an example of the disadvantage of artificial intelligence.

A steady stream of findings already makes it clear that AI efficiently uses data to determine characteristics of people. Simplistically speaking, all we need is to feed a bunch of data into a system and then that system figures out formulas from that data to determine an outcome.

For example, more than a decade ago in university classes, we ran some tests on medical records trying to find people who had cancer. We coded the presence of disease onto our training data, which we then scanned for correlations to other medical codes present.

The algorithm ran for about 26 hours. In the end, we scanned the data for accuracy, and needless to say, the system returned fantastic results. The system reliably homed in on a medical code that predicted cancer; more specifically, the presence of tumors.

Of course, at the outset, we’d like to assume this data will go to productive, altruistic uses. At the same time, I’d like to emphasize that the result drew the response “well, of course, that is the case,” which demonstrates that such a program can discover correlations without being explicitly told what to look for…

Researchers might develop such a program with the intention to cure cancer, but what happens if it gets into the wrong hands? As we well know, not everyone, especially when driven by financial gain, is altruistically motivated. Realistically speaking, if we use a program looking for correlations to guide research leading to scientific discoveries for good intent, it can also be used for bad.

The Negative Risk: Unethical Businesses

By function and design, algorithms naturally discriminate. They distinguish one population from another. The basic principles can be used to determine a multitude of characteristics: sick from healthy; gay from straight; and, black from white.

Periodically the news picks up an article that illustrates this facility. Lately, it’s been a discussion of facial recognition. A few years ago the big issue revolved around Netflix recommendations.

The risk is that this kind of software can likely figure out, for example, if you are gay with varying levels of certainty. Depending on the data available, AI software can figure out all sorts of other information that we may or may not want it to know or that we may not intend for it to understand.

When it comes to the ethics and adverse effects of artificial intelligence, it’s all too easy to toss our hands in the air and have excited discussions around the water cooler or over the dinner table. What we can’t do is simply make it go away. This concern is a problem that we must address.

Breakthrough: The Problem is its own Solution

Up to this point, my arguments may sound depressing. The good news is that the source of the problem is also the source of the solution.

If this kind of software can determine from data sets the factors (such as the presence of tumors) that we associate with a discrimination (such as the presence of cancer), we can then take these same algorithms and tell our software to ignore the results.

If we don’t want to know this kind of information, we simply ignore this type of result. We can then test to verify that our directives are working and that our software is not relying on the specified factors in our other algorithms.

For instance, say we determine that as part of a determination of the risk of delinquent payment for a mortgage, we know that our algorithm can also determine gender, race or sexual orientation. Rather than using this data, which is likely a wee bit racist, sexist, and bigoted, when calculating a mortgage rate recommendation, we could ask it to ignore said data.
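One way to test such a directive is to vary only the protected fields and confirm the answer does not move. The Python sketch below is a minimal illustration under stated assumptions: mortgage_rate is a toy rule standing in for a trained model, and the field names and values are hypothetical.

```python
from itertools import product

# Hypothetical protected fields and the values to vary them over.
PROTECTED = {"gender": ["f", "m"], "race": ["a", "b"]}


def mortgage_rate(applicant):
    """Toy scoring rule standing in for a trained model (an assumption)."""
    rate = 4.0
    rate += 1.5 if applicant["credit_score"] < 650 else 0.0
    rate += 0.5 if applicant["debt_to_income"] > 0.4 else 0.0
    return rate


def biased_rate(applicant):
    """Deliberately bad variant, used to show that the check can fail."""
    return mortgage_rate(applicant) + (0.25 if applicant["gender"] == "f" else 0.0)


def ignores_protected_fields(model, applicant):
    """True if the output is identical for every protected-field combination."""
    baseline = model(applicant)
    names = list(PROTECTED)
    for values in product(*(PROTECTED[name] for name in names)):
        variant = {**applicant, **dict(zip(names, values))}
        if model(variant) != baseline:
            return False
    return True


applicant = {"credit_score": 700, "debt_to_income": 0.3, "gender": "m", "race": "a"}
print(ignores_protected_fields(mortgage_rate, applicant))
print(ignores_protected_fields(biased_rate, applicant))
```

A check like this only catches direct use of the listed fields; other fields that happen to correlate with them would need a separate analysis.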

In fact, we could go even further. Just as we have equal housing and equal employment legislation, we could carry over to legislate that if a set of factors can be used to discriminate, then software should be instructed to disallow the combining of those elements in a single algorithm.

Discussion: Let’s look at an analogy.

Generally speaking, US society legislates that Methamphetamine is bad, and people should not make it, but at the same time the recipe is known, and we can’t uninvent meth.

An unusual tactic is to publicise the formula and tell people not to mix the ingredients into their bathtub “accidentally.” If we find people preparing to combine the known ingredients, we can then, of course, take legal action.

For software, I’d recommend that we take similar steps and implement a set of rules. If and when we determine the possible adverse outcomes of our algorithms, we can require that the users (business entities) cannot combine the said pieces of data into a decision algorithm, of course making an exception for those doing actual constructive research into data-ethical issues.

The Result: Constructing and or Legislating a Solution

Over time our result would be the construction of a dataset of ethically sound and ethically valid correlations that could be used to teach software what it is allowed to do. This learning would not happen overnight, but it also might not be as far down the line as we first assume.

The first step would be to create a standard data dictionary where people and companies would be able to share what data they use, similar to elements on the chemical periodic table. From there we would be ready to look for the good and the bad kinds of discrimination. We can take the benefits of the good while removing the penalties from the bad.

This process might mean that some recommendation engines would have to ask whether they are allowed to use data that could be used to discriminate based upon an undesirable metric (like race). And it might mean that in some cases it would be illegal to combine specific pieces of data, such as for a mortgage rate calculation.
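A rule of this kind could be checked mechanically before an algorithm ever runs. The sketch below is only an illustration: the rule table, the purposes, and the field names are invented, and no actual legislation or data dictionary is being quoted.

```python
# Hypothetical rule table: for each purpose, sets of fields that must not
# all be present together in one decision algorithm.
FORBIDDEN_COMBINATIONS = {
    "mortgage_rate": [
        frozenset({"race"}),
        frozenset({"zip_code", "surname"}),
    ],
}


def violations(purpose, fields_used):
    """Return every forbidden combination fully contained in the fields used."""
    used = set(fields_used)
    rules = FORBIDDEN_COMBINATIONS.get(purpose, [])
    return [combo for combo in rules if combo <= used]


print(violations("mortgage_rate", ["income", "credit_score"]))
print(violations("mortgage_rate", ["income", "zip_code", "surname"]))
```

An empty list means the combination is allowed for that purpose; anything else names exactly which rule was broken, which keeps the objection concrete.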

No matter what we choose to do, we can’t close Pandora’s box. It is open; the data exists, the algorithms exist; we can’t make that go away. Our best bet is to put in the effort to teach software ethics, first by hard rules, and then hopefully let it figure some things out on its own. If Avinash Kaushik’s predictions are anywhere near accurate, maybe we can teach software to actually be better than humans at making ethical decisions. Only the future will tell!

If you’re curious about the subject of AI and Big Data read more in my piece Predicting the Future.

Why Negligence in Software should be of Urgent Concern to You

The future of Liability in Software:

Many things set software companies apart from other businesses. A significant but often overlooked difference is that the manufacturers of software exhibit little fear of getting sued for negligence, including defects or misuse of their software. For the moment, the legal consequences of defective software remain marginal.

After more than a decade, even efforts to reform the Uniform Commercial Code (UCC) to address particularities of software licensing and software liabilities remain frozen in time. As Jane Chong discusses in We Need Strict Laws, the courts consistently rule in favor of software corporations over consumers, due to the nature of contract law, tort law, and economic loss doctrine. In general, there is little consensus regarding the question: should software developers be liable for their code?

Slippery When Wet

If you go to your local hardware store, you’ll find warning signs on the exterior of an empty bucket. Look around your house or office, and you will see warning labels on everything from wet floors to microwaves and mattresses.

In software, if you are lucky, you might find an EULA buried somewhere behind a link on some other page. Most products have a user manual; why is it not enough to print your “warning” inside? That would be easy to find and easy to implement as a standard.

Legal issues in software development: why is there no fear?

Fear is socially moderated and generated. Hence the term “mass hysteria.” We fear things that we’ve already experienced or that have happened to personal connections. We all too easily imagine horrible events that have befallen others. In an age of multimedia, this is a rather flawed system that has gone berserk or as they say “viral.” What we see has left us with some pretty weak estimates on the odds of events, like shark attacks or kidnappings.

One reason we don’t fear lawsuits around software is that we don’t see them in the public sphere. They do happen, but all too often the cases never make it very far. Or the judge rules in favor of the software developer. Interpretation of the laws makes it difficult to prove or attribute harm to a customer.

To date, we’ve yet to see a Twittergate on negligence in software development. This lack of noise doesn’t mean that no one has reasons to sue. Instead, it is more of an indicator that for the moment, the law and the precedent are not written to guide software litigation. And with little news or discussion in the public sphere, no one has the “fear.”


A Matter of Time

Frankly, it is not a matter of whether it will happen, but when. What is the likelihood that in the next five years there will be successful suits brought against software firms? Should we make a bet?

Software development is an odd industry. We leave an incredible electronic trail of searchable data for everything that we do. We track defects, check-ins, test reports, audit trails, and logs. And then we back up all of them. Quite often these records last forever, or at least as long as, if not longer than, the company that created them.

Even when we change from one source control system to another, we try to make sure that we keep a detailed record intact just in case we want to go back in time.

This level of record keeping is impressive. The safety it provides and the associated forensic capabilities can be incredibly useful. So what is the catch? There is a potential for this unofficial policy of infinite data retention to backfire.

Setting the Standard

Most companies follow standard document retention policies, which ensure that communications and artifacts are saved for a required period to meet fiscal or regulatory requirements and are then purged after some years.

Ironically, even in companies that follow conservative document retention policies, the source control and bug tracking systems are often completely overlooked, if not flat out ignored. From a development perspective, this makes sense: data storage isn’t expensive, so why not keep it all?

The Catch & The Cause

The reason that document retention policies exist is not merely to keep companies organized and the IRS happy; it’s because of the potential for expensive lawsuits. Companies want to make sure that they can answer categorically why certain documents do or do not exist.

For instance, let’s say your company makes widgets and tests these before shipping them on; you don’t want to say that “we don’t know why we don’t have those test results.” By creating a documented process around the destruction of data (and following it) you can instead point to the document and say the data does not exist — it’s been destroyed according to “the policy.”

Policy? What Policy?

This example takes us back to the story of liability in software. In the software business we often keep data forever, but then we also delete data in inconsistent bursts. Maybe we need to change servers, or we are running out of space, or we find random backups floating around on tapes in vaults. So we delete it or shred it or decide to move to the Cloud to reduce the cost of storage.

This type of data doesn’t tend to carry any specific policy or instructions for what we should and shouldn’t include, or how best to store the data, and so on.

What’s more, when we document our code and record check-ins, we don’t really think about our audience. Our default audience is likely ourselves or the person(s) in the cube next to us. And our communication skills on a Friday night after a 60-hour week won’t result in the finest or most coherent check-in comments, especially if our audience ends up being someone besides our cubemate.

The reason that this type of mediocre record keeping persists is that it remains difficult to sue over software. There really are no clear-cut ways to bring a suit for defective services provided. If you live in the US you know the fear of lawsuits over slips and falls; this fear does not exist for creating a website and accepting user data.

Walking a Thin Line

My guess is that this indifference to record keeping and data retention will persist as long as potential plaintiffs must do all the groundwork before seeing any money, and as long as judges continue to side with corporations and leave plaintiffs in the lurch. However, as soon as that first case sneaks through the legal system and sets a precedent, anyone and everyone may reference that case.

Ironically, patent and copyright protection don’t apply to theories presented in a trial, which means that once a case makes it through the system, the case only needs to be referenced. In other words, if one lawyer learns how to sue us, they all do. Think of it as an open source library: once it exists, anyone gets to use it.

I expect that there will be a gold rush; we are just waiting for the first prospector to come to town with a baggie of nuggets.

As to what companies can do? For now, create an inventory of what data you keep and how it compares to any existing policies. This may involve sitting down in a meeting that will be an awkward mix of suits and shorts where there likely will be a slide titled “What is source control?” There is no right answer, and this is something for every company to decide for themselves.
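
As a starting point, the inventory described above can be as simple as a list of where data lives, how far back it goes, and which policy (if any) covers it. The sketch below is only an illustration: the store names, dates, and retention periods are invented, and the DataStore and review_inventory names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DataStore:
    """One place where the development process leaves a data trail."""
    name: str
    oldest_record: date
    retention_years: Optional[int]  # None: no written policy covers this store


def review_inventory(stores: List[DataStore], today: date) -> List[str]:
    """Return one finding per store that has no policy or has outlived it."""
    findings = []
    for store in stores:
        age_years = (today - store.oldest_record).days / 365.25
        if store.retention_years is None:
            findings.append(f"{store.name}: no retention policy, {age_years:.0f} years kept")
        elif age_years > store.retention_years:
            findings.append(f"{store.name}: older than its {store.retention_years}-year policy")
    return findings


# Invented inventory for illustration only.
inventory = [
    DataStore("source control", date(2005, 3, 1), None),
    DataStore("bug tracker", date(2010, 6, 15), 7),
    DataStore("email", date(2021, 1, 1), 7),
]

for finding in review_inventory(inventory, today=date(2024, 1, 1)):
    print(finding)
```

The output is a list of stores to discuss, not a decision about what to delete; that part still belongs in the meeting.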

Where does your development process leave a data trail? Has your company had discussions about document retention and source control?

How to Effortlessly Take Back Control of Third Party Services Testing

Tools of the Trade: Testing with SaaS subsystems.

For the last few years, the idea has been floating around that every company is a software company, regardless of the actual business mission. Concurrently, even more companies are dependent upon 3rd party services and applications. From here it is easy to extrapolate that the more we integrate, the more likely it is that at some point every company will experience problems with updates, downtime, uptime, and so on.

Predicting when downtime will happen and/or forcing 3rd party services to comply with our needs and wishes is difficult if not impossible. One solution to these challenges is to build a proxy. The proxy allows us to regain a semblance of control and to test 3rd party failures. It won’t keep the 3rd parties up, but we can simulate failures whenever we want to.

As an actual solution, this is a bit like building a chisel with a hammer and an anvil. And yet, despite the rough nature of the job, it remains a highly useful tool that facilitates your work as a Quality Assurance Professional.

The Problem

Applications increasingly use and depend upon a slew of 3rd party integrations. In general, these services tend to maintain decent uptime and encounter only rare outages.

The level of reliability of these services leads us to continue to add more services to applications with little thought to unintended consequences or complications. We do this because it works: it is cheap, and it is reliable.

The problem (or problems) that arise stem from the simple nature of combining all of these systems. Even if each service maintains good uptimes, errors and discordant downtime may result in conditions where the time that all your services are concurrently up is not good enough.

The compounding uptimes

Let’s look at a set of services that individually boast at least 95% uptime. Let’s say we have a service for analytics, another for billing, another for logging, another for maps, another for reverse IP, and yet another for user feedback. Individually they may be up 95% of the time, but if their outages are independent of one another, the odds of all six being up at the same time are 0.95 multiplied by itself six times, or roughly 73.5%, which is less than 75%.

As we design our service, working with an assumption of around a 95% uptime scenario feels a lot better than working with a chance of only 75% uptime. To exacerbate this issue, what happens when you need to test how these failures interact with your system?
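
The arithmetic behind the compounding can be sketched in a few lines of Python. It assumes that each service fails independently of the others, which is a simplifying assumption rather than anything a provider promises; the function name and the service list are illustrative.

```python
from math import prod


def combined_uptime(uptimes):
    """Probability that every service is up at once, assuming independent outages."""
    return prod(uptimes)


# Six hypothetical services, each advertising 95% uptime.
services = {
    "analytics": 0.95,
    "billing": 0.95,
    "logging": 0.95,
    "maps": 0.95,
    "reverse_ip": 0.95,
    "user_feedback": 0.95,
}

all_up = combined_uptime(services.values())
print(f"All six up at once: {all_up:.1%}")  # prints "All six up at once: 73.5%"
```

Swapping in the uptimes your providers actually publish gives a quick feel for how much each additional integration costs you.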

Automated Testing is Rough with SaaS

To create automated tests around services being down is not ideal. Even if the services consumed are located on site, it is likely difficult to stop and start them using tests. Perhaps you can write some PowerShell and make the magic work. Maybe not.

But what happens when your services are not even located on site? The reality is that a significant part of the appeal of third-party services is that businesses don’t really want onsite services anymore. The demand is for SaaS services that remove us from the maintenance and upgrade loop. The downside of SaaS is that suddenly turning a service off becomes much more difficult.

The Solution: Proxy to the Rescue

What we can do is use proxies. Add an internal proxy in front of every 3rd party service, and now there is an “on / off switch” for all the services and a way to do 3rd party integration testing efficiently. This proxy set-up can also be a way to simulate responses under certain conditions (like a customer ID that returns a 412 error code).

Build or buy a proxy with a simple REST API for administration and it should be easy to run tests that simulate errors from 3rd party providers. Now we can simulate an entire system outage in which the entire provider is down.
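
To make the “on / off switch” concrete, here is a minimal in-process sketch. It is not a real network proxy: the upstream provider is a plain function, and the turn_off and turn_on methods stand in for what a small REST administration API would call. All names (SwitchableProxy, fake_billing) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


@dataclass
class SwitchableProxy:
    """Sits between our application and its third-party services."""
    upstreams: Dict[str, Callable[[str], Response]]
    disabled: set = field(default_factory=set)

    # Administration: what a REST admin endpoint would invoke.
    def turn_off(self, service: str) -> None:
        self.disabled.add(service)

    def turn_on(self, service: str) -> None:
        self.disabled.discard(service)

    # Data path: the application calls this instead of the provider.
    def call(self, service: str, path: str) -> Response:
        if service in self.disabled:
            # Simulated outage: the real provider is never contacted.
            return 503, f"{service} unavailable (simulated)"
        return self.upstreams[service](path)


def fake_billing(path: str) -> Response:
    """Stand-in for the real billing provider in this sketch."""
    return 200, f"billing ok: {path}"


proxy = SwitchableProxy(upstreams={"billing": fake_billing})
print(proxy.call("billing", "/invoices/42"))  # normal pass-through
proxy.turn_off("billing")
print(proxy.call("billing", "/invoices/42"))  # simulated outage
proxy.turn_on("billing")
```

A test suite would flip the switch in its setup, run the scenario, and flip it back in its teardown.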

By creating an environment isolated by proxies, test suites can be confidently run under different conditions, providing us with valuable information as to how various problems with an application might impact our overall business service.

Proxy Simulations

Upon building a proxy in between our service and the 3rd party service, we can also put in the service-oriented analog of a mock. This arrangement means we can create a proxy that generates response messages for specific conditions.

We would not want to do this for all calls, but for some. For example, say we would like to tell it that user “Bob” has an expired account for the next test. This specification would allow us to simulate conditions that our 3rd party app may not readily support.
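
The “Bob” example might look something like the sketch below: a list of rules, each pairing a condition on the incoming request with a canned response, and everything else forwarded untouched. The class name, the request shape, and the choice of a 403 status for an expired account are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
Response = Tuple[int, str]  # (HTTP status code, body)


class MockingProxy:
    """Returns canned responses for matching requests, forwards the rest."""

    def __init__(self, upstream: Callable[[Request], Response]):
        self.upstream = upstream
        self.rules: List[Tuple[Callable[[Request], bool], Response]] = []

    def when(self, predicate: Callable[[Request], bool], response: Response) -> None:
        """Register a canned response for requests that satisfy the predicate."""
        self.rules.append((predicate, response))

    def clear(self) -> None:
        """Drop all rules, for example between tests."""
        self.rules.clear()

    def call(self, request: Request) -> Response:
        for predicate, response in self.rules:
            if predicate(request):
                return response
        return self.upstream(request)


def real_accounts_service(request: Request) -> Response:
    """Stand-in for the 3rd party account service."""
    return 200, f"account active for {request['user']}"


proxy = MockingProxy(real_accounts_service)
proxy.when(lambda r: r.get("user") == "Bob", (403, "account expired"))

print(proxy.call({"user": "Bob"}))    # canned response
print(proxy.call({"user": "Alice"}))  # forwarded to the upstream
```

Calling clear() after each test keeps one scenario from leaking into the next.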

Customer specific means better UX

By creating our own proxy, we can return custom responses for specific conditions. Most potential conditions can be simulated and tested before they happen in production. And we can see how various situations might affect the entire system. From errors to speed, we can simulate what would happen if a provider slows down, a provider goes down, or even a specific error, such as a problem that arises when closing out the billing service every Friday evening.
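
Slowdowns and time-dependent errors can be sketched the same way. In the example below the clock and the sleep function are passed in, so a test can pretend it is Friday evening and record the delay instead of actually waiting. The class name, the 18:00 cutoff, and the 503 response are assumptions for illustration.

```python
import time
from datetime import datetime
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


class DegradingProxy:
    """Adds latency and a Friday-evening outage in front of one provider."""

    def __init__(self, upstream: Callable[[str], Response],
                 now: Callable[[], datetime] = datetime.now,
                 sleep: Callable[[float], None] = time.sleep):
        self.upstream = upstream
        self.now = now
        self.sleep = sleep
        self.extra_latency = 0.0          # seconds added to every call
        self.friday_evening_outage = False

    def call(self, path: str) -> Response:
        if self.extra_latency:
            self.sleep(self.extra_latency)
        moment = self.now()
        # datetime.weekday() returns 4 for Friday.
        if self.friday_evening_outage and moment.weekday() == 4 and moment.hour >= 18:
            return 503, "billing closing out (simulated)"
        return self.upstream(path)


delays = []
proxy = DegradingProxy(
    upstream=lambda path: (200, "ok"),
    now=lambda: datetime(2024, 1, 5, 19, 0),  # a Friday at 19:00
    sleep=delays.append,                      # record the delay, do not wait
)
proxy.extra_latency = 2.5
proxy.friday_evening_outage = True
print(proxy.call("/close-out"), delays)
```

In a real test environment the defaults (the system clock and time.sleep) would apply, and the application under test would experience the delay for real.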

Beyond Flexibility

Initially, the focus might be on scenarios where you simulate previous failures of your 3rd party provider, but you can also test for conditions that your 3rd party may not offer in a test environment, for instance, expiring accounts in mobile stores. With a proxy, you can do all of this by merely keying off of the specific data that you know will come in the request.

In Conclusion: Practical Programming for the Cloud

This proxy solution is likely not listed in your requirements. At the same time, it is a solution to a problem that in reality is all too likely to arise once you head to production.

In an ideal world, we wouldn’t need to worry about 3rd party services.

In an ideal world, our applications would know what failures to ignore.

In the real world, the best course is preparation: this allows us to test and prevent outages.

In reality, we rarely test for the various conditions that might arise and cause outages or slowdowns. Working with and through a 3rd party to simulate an outage is likely very difficult, if not impossible. You can try and call Apple Support to request that they turn off their test environment for the App store services, but they most likely won’t.

This is essentially a side effect of the Cloud. The Cloud makes it easy to add new and generally reliable services, which also happen to be cheap, and that makes the Cloud an all-around good business decision. It should not then be surprising that when you run into testing problems, an effective solution will also come from the Cloud. Spinning up small, lightweight proxies for a test environment is a practical solution for a problem in the Cloud.

Unrealistic Expectations: The Missing History of Agile

Unrealistic expectations or “why we can’t replicate the success of others…”

Let’s start with a brain teaser to set the stage for questioning our assumptions.

One day a man visits a church and asks to speak with the priest. He asks the priest for proof that God exists. The priest takes him to a painting depicting a group of sailors, safely washed up on the shore following a shipwreck.

The priest tells the story of the sailors’ harrowing adventure. He explains that the sailors prayed faithfully to God and that God heard their prayers and delivered them safely to the shore.

Therefore God exists.

This is well and good as a story of faith. But what about all the other sailors who have prayed to God, and yet still died? Who painted them?

Are there other factors that might be at play?

When we look for answers, it’s natural to automatically consider only the evidence that is easily available. In this case, we know that the sailors prayed to God. God listened. The sailors survived.

What we fail to do is look for less obvious factors.

Does God only rescue sailors that pray faithfully? Surely other sailors that have died, also prayed to God? If their prayers didn’t work, perhaps this means that something other than faith is also at play?

If our goal is to replicate success, we also need to look at what sets the success stories apart from the failures. We want to know what the survivors did differently from those that did not. We want to know what not to do, what mistakes to avoid.

In my experience, this is a key problem in the application of agile. Agile is often presented as the correct path; after all, lots of successful projects use it. But what about the projects that failed? Did they use Agile, or did they not implement Agile correctly? Or maybe Agile is not actually that big a factor in the success of the project?

Welcome to the history of what is wrong with Agile.

Consider this: a select group of Fortune 500 companies, including several technology leaders, decides to conduct an experiment. They hand-pick some people from across their organization to complete a very ambitious task, a task an order of magnitude different from anything they’d previously attempted and with an aggressive deadline.

Question 1: How many do you think succeeded?

Answer 1: Most of them.

Question 2: If your team followed the same practices and processes that worked for these teams do you think your team would succeed?

Answer 2: Probably not.

The Original Data

In 1986, Hirotaka Takeuchi and Ikujiro Nonaka published a paper in the Harvard Business Review titled “The New New Product Development Game.” In this paper, Takeuchi and Nonaka tell the story of businesses that conduct experiments with their personnel and processes to innovate new ways to conduct product development. The paper introduces several revolutionary ideas and terms, most notably the practices that later developed into what we now know as agile (and scrum).

The experiments, run by large companies and designed for product development (not explicitly intended for software development), addressed common challenges of the time regarding delays and waste in traditional methods of production. At the root of the problem, the companies saw the need for product development teams to deliver more efficiently.

The experiment and accompanying analysis focused on a cross-section of American and Japanese companies, including Honda, Epson, and Hewlett-Packard. To maintain their competitive edge each of these companies wished to rapidly and efficiently develop new products. The paper looks at commonalities in the production and management processes that arose across each company’s experiment.

These commonalities coalesced into a style of product development and management that Takeuchi and Nonaka compared to the rugby scrum. They characterized this “scrum” process with a set of 6 holistic activities. When taken individually, these activities may appear insignificant and may even be ineffective. However, when they occur together as part of cross-functional teams, they result in a highly effective product development process.

The 6 Characteristics (as published):

  1. Built-in instability;
  2. Self-organizing project teams;
  3. Overlapping development phases;
  4. Multilearning;
  5. Subtle control;
  6. And, organizational transfer of learning.

What is worth noting is what is NOT pointed out in great detail.

For instance, the companies hand-picked these teams out of a large pool of, most likely, above average talent. These were not random samples, and they were not even companies converting their process; these were experiments with teams inside of companies. The companies also never bet the farm on these projects. The projects were large, but if they failed, the company would likely not go under.

If we implement agile, will we be guaranteed success?

First, it is important to note that all the teams discussed in the paper delivered positive results. This means that Takeuchi and Nonaka did not have the opportunity to learn from failed projects. As there were no failures in the data set, they did not have the opportunity to compare failures with successes, to see what might have separated the successes from failures.

Accordingly, it is important to consider that the results of the study, while highly influential and primarily positive, can easily deceive you into believing that if your company implements the agile process, you are guaranteed to be blessed with success.

After years in the field, I think it is vitally important to point out that success with an agile implementation is not necessarily guaranteed. I’ve seen too many project managers, team leads, and entire teams banging their heads up against brick walls, trying to figure out why agile just does not work for their people or their company. You, unlike the experiments, have a random set of people that you start with, and agile might not be suited for them.

To simplify this logical question: if all marbles are round, are all round things marbles? The study shows that these successful projects implemented these practices; it did not claim that these practices brought success.

What is better: selecting the right people or the right processes for the people you have?

Consider that your company may not have access to the same resources available to the companies in this original experiment. These experiments took place in large companies with significant resources to invest. Resources to invest in their people. Resources to invest in training. Resources to invest in processes. Resources to cover any losses.

At the outset, it looks like the companies profiled by Takeuchi and Nonaka took big gambles that paid off as a result of the processes they implemented. However, it is very important to realize that they, in fact, took very strategic and minimal risk, because they made sure to select the best people, and did not risk any of their existing units. They spun up an isolated experiment at an arm’s length.

If you look at it this way, consider that most large multinational companies already have above average people, and then they cherry pick the best suited for the job. This is not your local pick-up rugby team, but rather a professional league. As large companies with broad resources, the strategic risks they took may not be realistic for your average small or medium-sized organization.

The companies profiled selected teams that they could confidently send to the Olympics or World Cup. How many of us have Olympians and all-star players on our teams? And even if we have one or two, do we have enough to complete a team? Generally, no.

The Jigsaw Puzzle: If one piece is missing, it will never feel complete.

Takeuchi and Nonaka further compare the characteristics of their scrum method to a jigsaw puzzle. They acknowledge that having only a single piece of the puzzle, or missing a piece, means that your project will likely fail. You need all the pieces for the process to work. They neglect to emphasize that this also means that you need the right people to correctly assemble the puzzle.

The only mention they make regarding the people you have is the following:

“The approach also has a set of ‘soft’ merits relating to human resource management. The overlap approach enhances shared responsibility and cooperation, stimulates involvement and commitment, sharpens a problem-solving focus, encourages initiative taking, develops diversified skills, and heightens sensitivity toward market conditions.”

In other words, the solution to the puzzle is not only the six jigsaw puzzle pieces, but it is also your people. These “soft merits” mean that if your people are not able to share responsibility and cooperate, focus, take the initiative, develop diverse skills and so on, they aren’t the right people for an agile implementation.

If you don’t have all the pieces, you can’t complete the puzzle. And if you don’t have the right people, you can’t put the pieces together in the right order. Again, you might be round, but you might not be a marble.

Human-Centered Development for the People You HAVE

As with any custom software development project, the people who implement are key to your project’s success. Implementing agile changes the dynamics of how teams communicate and work. It changes the roles and expectations of all aspects of your project from executive management to human resources and budgeting.

Agile may work wonders for one company or team, but that success doesn’t mean that it will work wonders for YOUR team. Especially if all stakeholders do not understand the implications and needs of the process or they lack the appropriate aptitudes and skills.

In other words, if these methods don’t work for your people, don’t beat up yourself or everyone else. Instead, focus on finding a method that works for you and for your people.

Agile is not the only solution …

Why do people select agile? People implement agile because they have a problem to solve. However, with the agile approach managers need to step back and let people figure things out themselves. And that is not easy. Especially when managers are actively vested in the outcome. Most people are not prepared to step back and let their teams just “go.”

Maybe you have done the training, received the certifications, and theoretically “everyone” is on board. And yet, your company has yet to see all-star success. Are you the problem? Is it executive management? Is it your team? What is wrong?

I cannot overemphasize that the answer is as simple as the people you have. Consider that the problem is unrealistic expectations. The assumption when using agile and scrum is that it is the best way to do development, but what if it is not the best way for you?

If you don’t have the right people or the right resources to implement agile development correctly, then you should probably do something else. At the same time, don’t hesitate to take the parts of agile that work for you. 


Takeuchi, H., & Nonaka, I. (2014, August 01). The New New Product Development Game. Retrieved July 19, 2017, from https://hbr.org/1986/01/the-new-new-product-development-game

Process Design for the Team You Have: Surgical Team

Maximizing Productivity and Creating Value Series

Human-centered Development Strategy: Article I

The Brooks Surgical Team: Archaic or Cutting Edge?

The surgical team as described by Frederick Brooks in The Mythical Man-Month admittedly feels a little archaic to the modern development team. To be fair, Brooks’s audience probably saw medical science as a bit sexier back then.

Sexy or not, the surgical team concept remains an effective and pragmatic tactic when implemented as part of a human-centered development strategy. Effective implementation of the “surgical team” increases productivity and creates value. The concept is based on the likelihood that one particular developer is a lot, even 10x, more efficient than your other developers. Rather than a team of equals working on a project, your team will instead support this most efficient individual, with the result being an all-around increase in efficiency.

Whether or not the 10x number is precisely accurate, the concept remains a viable way to take advantage of the people you have on hand. In my experience, and likely in yours as well, it is rare to find a team in which all developers demonstrate equivalent abilities and output. There is always an outlier or two. Some people communicate better, some see the big picture, some are generalists and so on.

As I discussed in “You don’t work at Google and neither do I,” it is important to note that the surgical team is a non-egalitarian system designed so that your average contributors support and augment your best contributor(s). It may not work for every team or every individual. And, even if it is good for your team overall, some people may choose to leave rather than work under this arrangement. This may actually be a gift in disguise, given that any such individual is likely not an ideal team player under any condition.

What exactly is a “Surgical team” in custom software development?

And how can we develop an effective surgical team to maximize productivity for your custom software development process?

Let’s talk about what a surgical team might look like in a contemporary workplace, starting with a hypothetical team: a bunch of people trying to make a deadline work without any particular or official hierarchical structure.

The Organic Surgical Team

Many activities will take place concurrently and spontaneously.

A common event is for one individual to start making tools. These tools will then help others to get their work done faster, test existing code, set up some code generation, rig up frameworks that deploy code and so on.

The tools may take on many different shapes, but ultimately the end result is that they change how work gets done while improving the possibility that the project is successful.

This is an organic example of the surgical team. In all probability, the organic surgical team is already a familiar pattern of work for your people.

The Problem with Organic Surgical Teams

Embracing this system creates an unofficial surgical team, where one individual takes care of a large percentage of the original work, and the others follow along to work in a system that is of the “lead” individual’s design.

Ultimately, this is likely to happen in any project that is subject to adequate chaos and enough people. It happens because it works. Unfortunately, it is also a pattern that can give rise to potentially significant problems, depending on a variety of variables and on the interplay between management and team members.

The most common problem that I have experienced is the creation of a gap between the formal and informal organizational structure. If managers address problems early on, the problems generally resolve with positive or only minimal negative effects.

If the existing structures do not address the problems, friction within the team will develop and potentially cascade out of control prior to resolving. This friction is often rooted in the perception of unequal expectations and unfair privileges across the team.

Human-centered Development Strategy

Working with the people you have, how can you intentionally structure an effective surgical team and what will this process look like?

To begin with, you need to know your team. Your team is made up of people. And everyone is different. Unique. You need to know individuals’ strengths and weaknesses.

Questions to ask yourself or your project management team:

  • Who is the person who always takes ownership? Is this person a tool maker because s/he sees the big picture? Or because s/he simply sees that the team needs tools to proceed?
  • Who is best at communication?
  • Who is a natural manager?
  • Who understands the whole organization and the need for a system?
  • Who do you have that is a generalist?

Once you identify everyone’s skills and strengths, you need to look at how they can be combined and maximized. Can the tool maker fulfill the communication and team management roles? Or should there be a second team member who delegates and supports the tool maker? Or perhaps this is the role of an outside project manager?

Working with the team you have…

Whatever you do and whatever the makeup of your unique development team, keep in mind the surgical team only works if you honestly acknowledge, recognize and accept the team you have on hand. This is why I call this human-centered development.

Many factors from geography to budget to the type of company will have an effect on the types of people at your disposal. You very well may not have your ideal team on hand.

Furthermore, in today’s world, not every project will require the same skill sets, and yet most of us will not have the liberty to hire those skills; we must make do with the people we already have on hand.

The US Bureau of Labor Statistics predicts that the software development field will experience a 17% increase in jobs from 2014 to 2024. With an existing workforce of 1,114,000 software developers, that means an increase of almost 190,000 new jobs (17% of 1,114,000 is 189,380), a number that can likely just barely be covered by new graduates.

Add to the mix the number of people continuing to exit the workforce in the next decade and there is a high likelihood that there will be a surplus of software development jobs. Finding good people is already difficult, with entire recruiting and outsourcing industries capitalizing on and dedicated to alleviating the problem. And it looks like the future will only be more of a challenge.

In other words, learning how to best work with the people you have is a much more likely path to success than hoping to hire the perfect solution.

Cutting Edge: to Maximize Productivity, Maximize your People

In answer to the original question, the idea of the surgical team is, in fact, cutting edge as an intentional and pragmatic development tactic. Human-centered development strategy means implementing the appropriate tactics for the people you have.

To maximize productivity and create value with the team you have, you need to be honest. If you try and force people into roles you will generate stress and friction, and eventually, something somewhere will break.

In situations that naturally promote the organic development of surgical teams, by all means create them intentionally; doing so will improve productivity. An effective use of the surgical team as a tactic means that you can engineer productivity, maximizing your people and your resources for the best possible outcome.

Keep in mind that the idea of the surgical team is not to put one team member on a pedestal while relegating other team members into menial jobs. Instead, the goal is to maximize the contribution of individual skill sets and personalities, as needed for each project. It may not work for every team or every situation. But when applied intentionally and thoughtfully the surgical team can be a highly effective solution for your people.

Correction Vs. Prevention

Correction vs Prevention in Software Development

The desire to prevent adversity is a natural instinct.

As humans, as individuals, we generally do what we can to avoid something going wrong. This is especially true when we invest a lot of our time and effort in a project.

Say you spend 40 hours per week on a particular project for a year. The project may become a part of your identity. At the least, you will be personally invested in a positive outcome.

The more time we invest, the greater our fear that something might go wrong. In this way, investing time and resources in a project is almost like raising a kid.

We do everything we can to set our project “children” up to avoid and prevent problems, with the intent that they achieve success. Similarly, we do everything possible within reach of our finances and power (and sometimes beyond), from schooling to extracurricular activities, to ensure that our kids have the best possible chance in life.

Defect Prevention

Investing in our kids is a bit like defect prevention in software process improvement. Just as many theories of parenting exist, many methods of defect prevention exist. Some of them, such as Six Sigma, look for the root causes of failures with the intention of preventing failures before they occur.

In prevention, we must continually ask why something might break and then fix the underlying causes. On one hand, this is very much like parenting.

A challenge in prevention work is that to be effective, detailed and good requirements are a necessity. For example, if 6 months ago someone rushed the requirements and left out a key step or valuable information, you will likely soon discover a slew of bugs in your project.

This kind of noise easily distracts us from the underlying issue that the problem came from faulty requirements. Either way, your project is delayed. And, if you have to hand over issues with requirements to another department, your work on a feature may suddenly grind to a halt.

At this point, the best you can do is note why the problem happened and resolve to do better next time. Prevention work is not always timely.

Appropriate applications for prevention work…

Prevention work generally means lessons learned for the next iteration. We might learn how to more accurately prepare requirements during implementation. Or we might learn to focus better attention on our code after we’ve already deployed.

In prevention, we learn from problems we encounter so that we can prevent those problems in the future. By paying more attention to difficult steps or stages, we learn what to avoid next time. Effective prevention work, in effect, creates a system that can produce successful future projects and, it follows, a company that can consistently launch successful products.

Shortcomings of Defect Prevention: Defect Prevention and Rare Events

Let’s return to the parenting analogy. Defect prevention, when applied to parenting, shifts our target goal away from our current child so that our end goal becomes setting up a system to be better parents to our future children.

I’ve chosen the parenting analogy because it clearly highlights a shortcoming of the defect prevention method. In real life, humans are highly invested in carrying out tasks that will ensure the success of already existing offspring. Unlike in prevention work, most parents don’t (intentionally) use their first child as a test case, to learn how to be successful with their future offspring.

What’s more, many families may choose only to have a single child or they may space their children 5 or 10 years apart. Defect prevention is a waste of resources if you lack a future (and immediate) application.

Prevention work is not appropriate for rare events.

Lots of small and medium-sized companies are not software companies, but companies that also do software. Software for them is a necessity, but not a business. For these companies, a software project is likely a rare event.

In prevention the entire feedback mechanism is reactive. If you want to use prevention, you will make your next project better with the lessons you learn, not your current project.

As we know, many companies may support only one or two software products, or they may only have a software development project come about every few years. Defect prevention that demonstrates how requirements could have been better 6 months ago may give comfort in the form of an explanation, but it will not solve existing and relevant problems.

Please, have a seat on the couch: explanations vs. solutions

Prevention methods may help you change your perception through the receipt of explanations, but they won’t give you solutions to an active problem. Let’s say the parent of a college student visits a psychologist to discuss problems they encountered raising their now young adult. The discussion and insight may help the parent to understand and accept what they may have done right or wrong. But this explanation and acceptance will not fix the problem; it will simply change the parent’s perception of the problem. Of course, perception is significant, but it’s not a solution.

Resilience through rapid corrections…

An alternative is to simply pursue resilience through rapid corrections. Think of this like driving. Seated behind the wheel of a moving car you constantly process multiple pieces of information and feedback, adjusting your speed, direction, and so on. Driving requires constant attention and focus.

It’s a given that the closer we pay attention when driving, the better our results. Paying less attention, such as texting while driving, often results in the occasional need for bigger or last-minute corrections. Sometimes the information arrives too late or not at all, and the result is an accident.

Attention + Discipline = Positive Results

This method again applies to raising children. Children and custom software development projects both require close attention. In child rearing there is a saying that “discipline is love.” Pay attention to your children, apply the rules consistently and thoughtfully, and generally speaking, they will grow up without too many bugs!

In correction (versus prevention) you pay constant attention to your custom software development projects. This focus allows you to react to problems and correct in a timely manner. Consistent attention and disciplined application of requirements result in a better end product. Rapid correction builds resilience.

Focus on reactive resiliency…

How we change as we go through production is a function of lessons learned, but also of our ability to adapt, such as turning an outage into a new test case. Or perhaps we add some automation that emails us on failures, so that we have the tools to pay closer attention and fine-tune our responses as needed.
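The idea of automating an email on failures can be sketched in a few lines of Python. This is a minimal illustration and not a description of any particular team's tooling: the function name, the test names, and the example.com addresses are all hypothetical, and the sketch only builds the message rather than sending it.

```python
from email.message import EmailMessage


def build_failure_alert(failed_tests, sender, recipient):
    """Return an EmailMessage listing the failed tests, or None if nothing failed."""
    if not failed_tests:
        return None  # no failures, no noise
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{len(failed_tests)} test(s) failed"
    # One line per failing test, sorted so repeated alerts are easy to compare.
    msg.set_content("\n".join(f"- {name}" for name in sorted(failed_tests)))
    return msg


# Hypothetical test names and addresses, for illustration only.
alert = build_failure_alert(
    ["test_checkout", "test_login"], "ci@example.com", "team@example.com"
)
print(alert["Subject"])
```

Actually delivering the message would require a mail server, for example through the standard library's smtplib, which is left out here. Naming the failing tests in the body, rather than reporting only a count, keeps the attention on what broke.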

In correction, we maintain the goal of improving for future projects. We will still change systems and processes to fix defects, but we are less interested in learning how and why our problems exist. Instead, we focus on continually improving and resolving for our near future. In correction, our immediate goal is to make today better than yesterday, and then tomorrow’s production a little better than today. The ability to react quickly and appropriately builds resilience.

Deciding on Prevention or Correction

Know your company, know your goals: are you in the business of software development or are you a company that sometimes develops software?

In custom software development we come across different types of companies and goals. Some companies, say a mortgage company or an insurance company, require software to function, but their business is not software. If your team is only involved in the occasional software project, then it is likely more efficient for you to focus on resilience through correction over prevention.

From an industry perspective, it would be awesome if we could benefit from group intelligence and expect continuous improvement on every custom software project. Prevention across teams and companies is an ideal goal. Unfortunately, for the moment lessons learned on unique and rare projects cannot be easily shared. Concrete results or lessons learned are seldom shared across companies by anything other than word of mouth.

For infrequent projects, correction is best …

Until there is a method to consistently record and move beyond oral histories of software projects, we are most likely better off focusing on correcting problems in the projects we work on rather than preventing problems in future projects.

You can do both, but neither is free, and prevention is most effectively applied to future projects. Decision makers must thus be careful to prioritize the solutions that are best suited to their particular situation, project, and company.

If you are interested to know more about how we use correction at Possum Labs please feel free to contact me or start a discussion below.

Staying up all night to get Lucky: The Importance of Retrospectives

Retrospectives: every custom software development project should conclude with a retrospective to determine how and why luck contributed to its success or its failure.

My grandmother always said, “Better lucky than good.”

“Unless it is my money that you are spending.”

Grandmothers and many others happily share advice that interestingly promotes getting lucky over doing good. In custom software development the line between lucky and good is often quite blurry. And yet, when asked how a project succeeded, the answer is all too often, “well, I think we just got lucky!”

Is it better to be lucky or good?

It is true that often the answer depends on luck. Sometimes a well-managed project may simply encounter bad luck. Sometimes your project may suffer from bad management and yet it succeeds purely on dumb luck.

Does staying up all night to meet a deadline show dedication or is it a fool’s errand? Personally, I advocate for good over lucky. Consistent success depends upon an ability to look back and learn from your successes and from your failures.

Why conduct retrospectives? Because luck is not a strategy.

Outside of Agile development, it is rare in the quality assurance and custom software development industries that we conduct retrospectives.

It is an even rarer event that we document the retrospectives of successful projects. When things go right we are more likely to pat ourselves on the back and move on. Often, it is only when things go wrong that we stop to take a deeper look.

Conducting retrospectives should be standard operating procedure.

I find the lack of retrospectives exceptionally shocking in the field of custom software development and quality assurance. Given that projects frequently deliver late and over budget, the keys to success seem overly taken for granted.

Everyone knows that it is generally more expensive to fix mistakes than to avoid them. And understanding where things have gone wrong in the past is an excellent way to anticipate future problems. Why then, is it not a given that every team should invest in a simple tactic that is proven to deliver continuous improvement?

Retrospectives are an opportunity, not a burden.

Retrospectives provide an opportunity to effectively record experiences and events. All too often the information exists, but only in people’s heads and if a key “head” leaves the organization then the lessons learned are lost overnight.

And even without employee turnover, once a project is completed and we move on to the next task, it is all too easy to fall into the same familiar patterns of development even if they don’t work or they are not the most efficient.

Furthermore, if we pass off our projects or our teams as simply “lucky” or “unlucky” we cannot effectively harness our capabilities. Perhaps we could be luckier. Or perhaps one day our luck will turn if we don’t understand why it exists.

Do you really want to gamble on every project?

Even when a successful custom software project can be chalked up to a fair bit of luck, we should take a careful look at how and why luck contributed in a positive manner. Maybe the actual contributing factor leads back to a particular developer or tactic and not to “pure luck.”

Or, maybe we will see that at a particular point in time we skirted disaster: if component A had been implemented, as intended, before component B, our project might have outright failed. In other words, a mistake contributed to our “good luck,” when the project really ought to have failed. Can we really count on two wrongs making a right in the future?

There is high value in conducting retrospectives.

I strongly believe in the value of retrospectives. To me, the most valuable training a company can engage in is: “when and how to conduct a retrospective.” There is nothing more applicable to learn at a company than how to record and quantify experiences of the company.

A successful retrospective is one that is thoughtfully conducted and carefully documented. In this manner, the retrospective is sure to provide all stakeholders the opportunity to improve, while also providing the opportunity to learn. Retrospectives must be documented so that they can be shared widely and over time.

The benefits of conducting retrospectives are extensive:

  • Improved communication between all stakeholders
  • Honest communication that demonstrates a commitment to transparency
  • Building a company-specific repository of best practices
  • Improved ability to identify possible pain points for customers
  • Faster delivery of better-quality solutions
  • More accurate development of requirements and adherence to schedule

When should we conduct a retrospective?

Retrospectives are valuable both during and at the end of a project. Conducting periodic smaller retrospectives during a project will certainly help to assess a project’s progression and make it easier to redirect or change paths before it is too late. Final retrospectives provide a valuable long-term tool and method to document lessons learned.

Periodic Retrospectives

Periodic retrospectives are a key component of Agile development, but they are a useful tactic across pretty much every domain. If you think about it, successful football coaches conduct retrospectives every time they call a timeout. And in accounting, monthly profit and loss statements provide a retrospective or a snapshot of where the company is on the last day of the month.

Periodic retrospectives may be less formal than an end-of-project retrospective, but they can still be extremely useful. That said, although I believe in periodic retrospectives, what I focus on in this piece is the importance of the final retrospective.

Final Retrospective

Every single custom software development project, successful or not, should terminate with a complete and thoughtful retrospective.

The most valuable and necessary retrospective takes place at the end of a project. Again using the accounting analogy, an end of project retrospective is a bit like an end of year financial audit. This article focuses on the value of conducting final retrospectives.

Who benefits from a retrospective?

The IRS requires nonprofit organizations with budgets over a certain dollar value to conduct an annual financial audit. Audits are not only useful to the IRS, they are exceptionally useful to all stakeholders, from decision makers, such as upper management and boards of directors, to the employees. An audit assesses not only the financial state of the organization but how and why certain decisions or actions took place. Audits conclude with recommendations to ameliorate or improve decision making and record keeping in the future.

Like audits in nonprofits, retrospectives in custom software development are valuable at multiple levels. Everyone from business analysts to developers to QA testers will benefit from a retrospective. I cannot think of any viable reason that any software development team should not conduct a retrospective at the end of each and every project.

What does a retrospective look like?

Retrospectives have three key components: the people involved, the questions, and the final analysis or report. The goal of the retrospective is to review how a project has progressed from start to end, with a final assessment that includes lessons learned, and future recommendations. Ideally, all team members participate and share their answers to a set of questions.

1) The People: Who participates in the Retrospective?

Depending on the size and location of your development teams, ideally, everyone involved in a project should participate. If possible, local management should sit with each team and conduct a live assessment.

Hearing co-workers and talking together is often more productive than simply filling out a written survey. Given the opportunity to speak, members of your development and QA teams will feel that their opinion is both important and valued.

2) The Questions: What to ask during a retrospective?

(Don’t worry, it is not rocket science!)

The first three questions are fairly standard, but the final three entail an evolved process to effectively create and implement lessons learned from a retrospective.

  1. During this project, what did you feel worked well?
  2. During this project, what did you feel did not work well?
  3. In the future, what actions would you recommend taking to improve future projects?
  4. Retrospective ideas for identifying luck:
    1. Where did we get lucky?
    2. Was this really “luck” or a “lucky coincidence”?
    3. Could we replicate this “lucky” success in the future (or avoid this failure)?
  5. The final step is to work with all stakeholders to identify:
    1. Lessons Learned
    2. Future Recommendations
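
One way to keep the answers to these questions from living only in people’s heads is to record them in a consistent structure. The following is a minimal sketch in Python, assuming a team that wants a simple, shareable record. The class names, field names, and sample entries are hypothetical and simply mirror the questions above.

```python
from dataclasses import dataclass, field


@dataclass
class LuckItem:
    description: str   # question 4.1: where did we get lucky?
    truly_luck: bool   # question 4.2: really luck, or a lucky coincidence?
    repeatable: bool   # question 4.3: could we replicate it on purpose?


@dataclass
class Retrospective:
    project: str
    worked_well: list[str] = field(default_factory=list)          # question 1
    did_not_work: list[str] = field(default_factory=list)         # question 2
    recommended_actions: list[str] = field(default_factory=list)  # question 3
    luck: list[LuckItem] = field(default_factory=list)            # question 4
    lessons_learned: list[str] = field(default_factory=list)      # step 5.1
    future_recommendations: list[str] = field(default_factory=list)  # step 5.2

    def unexplained_luck(self):
        """Lucky events we could not repeat on purpose; each deserves a lesson learned."""
        return [item for item in self.luck if not item.repeatable]


# Hypothetical entries, for illustration only.
retro = Retrospective(project="Project X")
retro.worked_well.append("Weekly demos kept stakeholders aligned")
retro.luck.append(LuckItem("A vendor fix shipped the week we needed it", True, False))
retro.luck.append(LuckItem("Early load test caught a bottleneck", False, True))

for item in retro.unexplained_luck():
    print("Needs a plan next time:", item.description)
```

Whatever form the record takes, the point is that each “lucky” event ends up attached to a lesson learned or a recommendation rather than being forgotten.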

3) Documentation

Without a little analysis and documentation to record the lessons learned and to make future recommendations, a retrospective quickly loses its value. Sure, giving everyone the opportunity to talk about their experience is of value. However, as we already discussed, memories are short and people move on.

To truly implement a valuable process you need to put some effort into conducting and concluding an effective retrospective. Maybe you need to bring in an outside consultant. Maybe you need to create a cohort or assign a thoughtful member of your team who writes well to type it up into a report. Whatever you do, make sure that you have a clear objective, an outline, and a process in place before you start.

Every time you conduct and document a retrospective, your team will get better at the process and the results will increase in value. Over time your lessons learned and your future recommendations will become more precise and targeted.

Retrospectives in action — an example of “where we got lucky.”

Let’s say that right as project X commences, you hire Bob to replace a retiring developer. Bob has followed his wife to a research position at the local university. Previously, Bob worked in a tech hub that also happened to be an early adopter of a new technology stack. Purely by coincidence, just as you hire Bob, another part of your team decides to implement this same technology stack (no one knows about Bob).

Your team has little to zero reasonable expectation that there will even be a “Bob” in town to hire when deciding to use the technology stack. Nor does anyone consider the possibility that you might need a “Bob.”

A few weeks in, when the new technology proves to have a few more rough edges than expected, weird crashes, flaky performance and so on, Bob is your “lucky” salvation.

How could a retrospective help avoid Bob’s situation in the future?

To start with, your team needs to identify that this is a “lesson learned.” Next time your company decides on using a new technology they should intentionally plan on securing a “Bob.” If you can’t find a Bob locally, perhaps this is an indicator that this is not an appropriate time to bring this tech to your company. Or maybe you need to set up an external recruitment plan.

Without retrospectives it doesn’t get better, it gets real.

Over the years I’ve worked on a number of projects that have fallen victim to the “Bob” scenario. To solve a problem, the company adopts a piece of tech touted by a big company (e.g., Microsoft, Google, LinkedIn). And then, only a few months into the project, we’ve found ourselves knee-deep in problems.

Each time I’ve watched a similar evolution take place. The first assumption is usually that the configuration of the technology is incorrect. Next, there is a search for bugs to be fixed in the next release. Finally, there is the decision to hire “Bob” or a team of “Bobs,” thereby disrupting operations and budgets by bringing on one or more individuals who actually know the ins and outs of this particular piece of tech.

In the end, only about two-thirds of these projects actually made it into production. Ouch. That is the reality.

Maybe it’s not the lack of retrospective, it’s the quest for new technology that is the real problem?

No, I’m not against new technology. Sometimes we need it and in the appropriate situations, new technology can work really well. Unfortunately, implementing new or custom technology is not an across the board success story.

And yet, this scenario often repeats itself within the very same companies that have already experienced a failure. This is why retrospectives are vital.

Learning from past mistakes

Companies need the ability to learn from and document past mistakes. Development teams need a method to retain lessons learned and manage the turnover in both staff and technology.

The best bet for everyone is to conduct and widely share honest retrospectives. When this happens, we see what went well, what went wrong, and where we got lucky. And we can do our best to avoid, duplicate, or prepare in the future.

Successful Development is Intentional

Frankly, staying up all night to get lucky doesn’t work much on the social scene nor does it work on the professional level. What does work? Preparation. Analysis. Lessons Learned. Retrospectives.

At the end of every project, you should conduct an honest retrospective so that next time you’ll know what you can do to be prepared. Make sure that your success comes from intentional acts and not simply because “you got lucky.”

Retrospectives significantly improve the software testing process. Period.

The custom software development and software testing process is at odds with almost all other technologies. Can you imagine a caterer baking a custom order wedding cake and then testing it for flavor and attempting to go back and “fix” the flavor? Once the cake is baked, it’s done. It’s over.

Most industries test first and then build. Conducting retrospectives delivers a bit of this logic into the custom software development process and improves the software testing process. Retrospectives allow us to integrate lessons learned and avoid repeating the same or similar mistakes. Why fix it, if you can do it right the first time?

Ok. You convinced me to conduct a retrospective. Now What?

As I mentioned above, the most valuable training you can provide your team is how to conduct a retrospective. Fortunately, I am not the only one who is knowledgeable about retrospectives and many resources exist. If you would like to look in detail at what it is like to conduct a retrospective, I recommend this article: The 7 Step Agenda for an Effective Retrospective.

My last piece of advice is this: keep the environment surrounding a retrospective positive. If a retrospective turns negative, some people become defensive and afraid. Others simply tune out. Negative retrospectives lack the transparency and clear communication that are vital to constructing effective lessons learned.

Keep in mind that the purpose of a retrospective is not to critique or diminish your team or any one individual. The goal is to develop a culture of continuous improvement that benefits all stakeholders.

If you enjoyed this piece, please share and discuss!

prepare for failure, you will make mistakes

I made a mistake. Now what?

In the United States, we often find our motivation to achieve fueled by a culture that rewards success, while commonly denying the realities of failure. As a culture, we feel terrified to fail, despite the reality that true long-term success is often built upon the ability to recognize our failures and to learn from our mistakes.

You don't work at Google and neither do I!

Improving Software Development Outcomes when you don’t work at Google

How to Improve Software Quality: Tips for Improving Your Development Process and Outcomes

Be Realistic About Your Team, Your Company Culture, and Your Development Needs

Software is long-lived and widgets have effectively infinite lifespans. Broken things tend to stay broken for years, so if our goal is to improve our processes and develop better software we must optimize tactics with our specific organizations in mind.

Much has been written about Software Process Improvement (SPI) Best Practices, but sometimes more can be achieved by avoiding known mistakes than by trying to emulate other successful organizations and their best practices. In this piece, I will look at why it can be more effective to look at what not to do rather than to focus on best practices. To start, instead of trying to copy Google’s best practices, let’s look at the idea that better outcomes might actually be achieved by copying what successful companies like Google ARE NOT doing.

Why? You don’t work at Google.

And, neither do I.

For more than 99.7% of developers out there, this is an accurate statement (18 million vs 40,000). And for me, it is a 100% accurate statement. By extension, your colleagues are also not Googlers. And, although everyone at your company is surely above average, it is highly unlikely that the majority of your developers could successfully navigate the complex and often frustrating Google interview process.
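
The percentage can be checked with a short calculation. This sketch assumes, as the parenthetical suggests, that 18 million is the total number of developers and 40,000 the number working at Google; the variable names are illustrative.

```python
# Counts taken from the parenthetical in the text; their interpretation is an assumption.
total_developers = 18_000_000
googlers = 40_000

share_elsewhere = 1 - googlers / total_developers

print(f"Developers who do not work at Google: {share_elsewhere:.2%}")
```

With these counts the share comes out a little above the 99.7% quoted in the text.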

So, we don’t work at Google. Does that mean we are relegated to a life of mediocrity?

We may not work at Google (or even at one of the other “talent magnets”), but we do know a fair amount about how top-tier companies, like Google, develop software. Their methods are considered “state of the art,” and as an industry we often set the bar high, hoping to implement their practices and achieve their success.

Development Teams around the country dream of being Googlers.

Of course, this is in many ways comparable to your softball league trying to implement the Yankees’ training regimen and expecting to end up at the World Series or at least in the playoffs. If you think this sounds ludicrous then you are correct, it is ludicrous.

Just because it works at Google doesn’t mean it will work for you…

Why then, do we set the bar ridiculously high, despite knowing that the goals we are setting are ludicrous? We want to emulate Google because we see that what they do works. From the sidelines, we have watched their success over the years. Sometimes we catch a glimpse of what it takes, not just at the level of the players, but at the level of the organization, for Google to innovate, lead, and succeed.

Google doesn’t host open spring training or publish a “spring training guide.”

We can observe Google from the outside, but most companies and their business analysts, quality assurance teams, and software developers don’t actually get to see inside Google. For those of us that are not Googlers, we never truly get to see how the machine works from inside of Google. We are not in their meetings and we don’t get to watch how the support structures put in place play out.

All we see is the final score, which generally looks pretty good.

People like to study Google and give presentations on their successes, encouraging us to try to replicate Google’s success. And let’s be fair, on one hand, this is a great idea. At the same time, what if we try to implement these “best practices” and they don’t work? Do we keep trying and devolve into insanity.dev? No. What I advise is that instead of looking at Google’s best practices, we should instead look at what not to do. Learn from development team failures.

The Pragmatic Solution

The pragmatic solution is actually to start looking at successful companies that also do software but that are not software companies. We should then look not only at these companies’ successes, but we should also identify what we can learn from their failures.

Working with the People we Have

Your company’s real path to success is to figure out how to work with the people already on board in the organization. We can’t work like Google when we are not Google. And we shouldn’t necessarily borrow ideas from the companies that have their pick of the talent pool when we work at companies made up of everyone else. This doesn’t mean our people are inferior, but it does mean that they are unique to our company.

We need development and project management plans that work for REAL people, for OUR people.

Brooks and the Surgical Team

This brings us to one idea that has been around for a few years, but that is often overlooked. In The Mythical Man-Month, Brooks talked about the surgical team approach. This approach means finding a good developer or tester and then supporting this individual with a team of other staff. Brooks bases this technique on the idea that out of a semi-random set of developers, a few will be 10 times more efficient and effective than the others. Given that the average company has not developed Google’s hiring practices, it is quite likely that your company will align with Brooks’s numbers and that a handful of your developers will have a work history demonstrating that they are more effective than their peers.

The surgical team is inherently a non-egalitarian system…

Should the few developers who have proven track records lead “surgical teams,” and how would this create an opportunity for your company to improve outcomes? For better or worse, the surgical team is inherently a non-egalitarian system that embraces the view that developers are not created equal. Instead, it focuses on building a team around your best contributors and then supporting them with the rest of the staff.

Chances are that some people will not be willing (or able) to play second fiddle to a coworker they used to be equal with, and eventually, they may need to be replaced or given a different role. In general, however, management is never easy and this is a structure that exploits and plays the hand you’ve been dealt.

Playing the hand you’ve been dealt

The reality is that most of us don’t work at Google, but instead with your average little league aka small or medium sized company. Most of us don’t have our pick of staff, we don’t work necessarily with the best and brightest. And to be honest, you and I may not fall into the category of the best and brightest ourselves. However, this reality does not mean that we won’t be successful. There is still much we can do to develop effective procedures and processes, and develop our talent to maximize the skills and the resources that we have at hand. What we can learn from Google may indeed be of value to our teams, especially if we are able to avoid the same mistakes and pitfalls that Google has learned to avoid.

Improve your outcomes by focusing on the tactics that will work for you, for your company and for your team.

Predicting the Future: The Big Data Quandary

The Role of Testing in Indeterminate Systems: should the humans be held accountable?

Big data is a hot topic that introduces us to incredible possibility and potentially terrible consequences.

Big data essentially means that engineers can harness and analyze traditionally unwieldy quantities of data and then create models that predict the future. This is significant for a variety of reasons, but primarily because accurate prediction of the future is worth a lot of money and it has the potential to have an effect on the lives of everyday citizens.

Good Business

On one level big data allows us to essentially reinvent the future by allowing software to encourage individuals to do something new that they as of yet have not considered (and might never have), such as recommendations for viewing on Netflix or buying on Amazon. Big Data can also provide daily efficiencies in the little things that make life better by saving us time or facilitating decision making. For businesses, Big Data can give deeper meaning to credit scores, validate mortgage rates, or guide an airline as to how much they should overbook their planes or vary their fares.


Optimizing algorithms based on data are even more powerful when we consider that their predictions are reliably better than those of humans attempting to make the same types of decisions. In addition to the computing power of big data, one advantage algorithms have over human predictions is that they are efficient. Algorithms do not get sidelined or distracted, and they do not get hung up on the ethical judgments that humans are required to consider, either by law or by social code. Unfortunately, this doesn’t mean that algorithms will avoid making predictions that carry ethical consequences.

Should algorithms be held to the same moral standards as people?

Optimization algorithms look for correlations and any correlation that improves a prediction may be used. Some correlations will inevitably incorporate gender, race, age, orientation, geographic location or proxies for those values. Variables of this sort are understandably subject to ethical considerations and this is where the science of big data gets awkward. A system that looks at a user’s purchasing history might end up associating a large weight (significance) to certain merchants. And those merchants might then happen to be hair care providers, which then means that there is a good chance that the algorithm has found an efficient proxy for race or gender. Similarly, the identification of certain types of specialty grocers or specialty stores, personal care vendors or clothing stores might reveal other potentially delicate relationships.

As these rules are buried deep inside a database, it is hard to determine when the algorithms have managed to build a racist, sexist, or anything-ist system. To be fair, neither the system, nor its developers, nor even the business analyst in charge of the project had to make any conscious effort for an algorithm to identify and use these types of correlations. As a society, we implicitly know that many of these patterns exist. We know that women get paid less than men for the same work; we know that women and men have different shopping behaviors when it comes to clothing; we know that the incarceration rate for minorities is higher; and we know that there will be differences in shopping behaviors between different populations based on age, race, sex and so on.

Can algorithms be held to the same moral standards as their human developers, or should the developers be held responsible for the outcomes? If the answer to either or both of these questions is “yes,” then how can this be achieved both effectively and ethically? When ethically questionable patterns are identified by an algorithm, we need to establish an appropriate response. For example, would it be acceptable to predict a lower acceptable salary for a female candidate than for a male candidate, when the software has determined that the female candidate will still accept the job at the lower rate? Even if the software did not know about gender, it may infer it from any number of proxies for gender. One could argue that offering the lower salary isn’t a human judgment; it is simply following sound business logic as determined by the data. Despite the “logic” behind the data (and the fact that business makes this kind of decision all the time), hopefully your moral answer to the question is still “no, it is not okay to offer the female candidate a lower suggested salary.”

Immoral or Amoral: What is a moral being to do?

If we label the behavior of the algorithm in human terms, we’d say that it was acting immorally; however, the algorithm is actually amoral: it does not comprehend morality. To comprehend and participate in morality (for now) we need to be human. If we use big data, the software will find patterns, and some of these patterns will present ethical dilemmas and the potential for abuse. We know that even if we withhold certain types of information from the system (such as avoiding direct input of information like race, gender, age, etc.) the system may still find proxies for that data. And the answers derived from those proxies may continue to create ethical conflicts.

Testing for Moral Conflicts

There are ways to test the software with controlled data and determine if it is developing gender or race biases. For instance, we could create simulations of individuals that we see as equivalent for the purpose of a question and then test to see how the software evaluates them as individuals. Take the instance of two job candidates with the same general data package, but vary one segment of the data, say spending habits. We then look at the results and see how the software treated the variance in the data. If we see that the software is developing an undesired sensitivity to certain data, we can go back to the drawing board and make an adjustment, such as removing that data from the model.
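The paired-candidate test described above can be sketched roughly as follows. Everything here is hypothetical: `suggest_salary` is a deliberately biased toy standing in for whatever model is under test, and the field names, merchant categories and tolerance are assumptions made purely for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    years_experience: int
    current_salary: int
    frequent_merchant: str  # hypothetical spending-habit feature


def suggest_salary(candidate: Candidate) -> float:
    """Toy stand-in for the model under test (not a real model)."""
    offer = candidate.current_salary * 1.10
    # The toy model has "learned" a weight for one merchant category,
    # which could act as a proxy for a protected attribute.
    if candidate.frequent_merchant == "hair salon":
        offer *= 0.95
    return offer


def sensitivity(base: Candidate, field: str, alternative) -> float:
    """Relative change in the output when only one field is varied."""
    variant = replace(base, **{field: alternative})
    baseline = suggest_salary(base)
    return abs(suggest_salary(variant) - baseline) / baseline


# Two candidates that are identical except for their spending habits.
base = Candidate(years_experience=5, current_salary=60000,
                 frequent_merchant="hardware store")
delta = sensitivity(base, "frequent_merchant", "hair salon")

TOLERANCE = 0.01  # assumed threshold for an acceptable difference
flagged = delta > TOLERANCE
print(f"relative change: {delta:.3f}, flagged: {flagged}")
```

A check like this only covers the pairs you think to construct, so in practice it would be run over many simulated candidates and many varied fields rather than a single pair.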

In the end, an amoral system will find the optimal short-term solution; however, as history has shown, despite humanity’s tendency towards the occasional atrocity, we are moral critters. Indeed, modern laws, rules, and regulations generally exist because as a society we see that the benefits of morality outweigh the costs. Another way to look at the issue is to consider that, for the same reasons we teach morality to our children, sooner or later we will likely also have to impart morality to our software. We can teach software to operate according to rules of morality and we can also test for compliance, thereby ensuring that our software abides by society’s rules.

Responsibility: So why should you care?

Whose responsibility is it (or will it be) to make sure this happens? My prediction is that, given the aforementioned conundrum, in the near future (at most the next decade) we will see the appearance of a legal requirement to verify that an algorithm is acting and will continue to act morally. And, of course, it is highly probable that this type of quality assurance will be handed to testing and QA. It will be our responsibility to verify the morality of indeterminate algorithms. So for those of us working in QA, it pays to be prepared. And for everyone else, it pays to be aware. Never assume that a natively amoral piece of technology will be ethical; verify that it is.

Click here to learn more about Bas and Possum Labs.

A call for an "FAA" of Software Development

Mayday! Bad UX in Airplanes

Software development teams constantly learn lessons. Some of these lessons are more significant than others. Because there is no universal method or process for sharing lessons learned, many lessons are learned in parallel within different companies. Often, different teams in our very own companies make the same mistakes over and over again, simply because there is no shared repository of knowledge and experience.

Even when a developer or quality assurance professional attempts to research when, where and why things have gone wrong, it is very difficult to find documented and pertinent information.

These unnecessary mistakes are avoidable expenses for both consumers and companies. Beyond a certain cost, especially a cost borne by the public, it becomes very useful to have a public method for accessing “lessons learned.”

Such a record should not just report the problematic lines of code, but should also include an analysis of the effects of that code (who, what, when, where, why, and how).

What’s more, in addition to the time and financial costs of problematic code, there is also risk and liability to consider. From privacy to financial health to business wealth, the risk is great enough that I propose the creation of an organization, similar to the FAA, for documenting, reporting and making software “travel” safer.

There are many examples of bad UX to be found. Just for fun, let’s look at some real-life examples of lessons learned in software code with regard to shareholder value and liability in airline travel.

Mayday: Bad UX in Airplanes

As often happens in life, not so long ago I decided I needed a change of pace in my evening routine, and one way or another I stumbled upon Mayday, a show about air crash investigations. My natural human curiosity about bad things that happen to other people drew me in at first, but after a few episodes, the in-depth analysis of all the various factors that cause airplane crashes really caught my attention. As a testing and QA expert, I found it disheartening to see the frequency with which bad UX is a contributing factor to airplane crashes. In most cases, the bad UX is not the instigating cause (phew!), but the stories in the show make it evident that bad UX can easily make a bad situation worse.

Scenario #1: Meaningful Warnings

For instance, on a Ground Control display, there is a 7-character field next to each aircraft indicating its expected altitude in comparison to its published height (theoretically its actual altitude). In the case of one episode, an airplane intended to be at an altitude of 360 reported flying at 370, as indicated by the display, which read “360-370.”

If the broadcast stopped, the display would read “360Z370.” This would indicate a final broadcast of 370 versus an expected broadcast of 360. If the broadcast stopped with this discrepancy shown, the display did not set off an alarm or even change color; it showed only the character “Z” in the middle of a 7-character string, implying that half of the remaining numbers are garbage.

This piece of information on its own is not terribly exciting nor is it something that could on its own cause an aircraft to go down. Furthermore, there is not much of a reason for a system tracking altitude to simply go off.

A bad UX design process uncovered by “hidden” warnings

That is, of course, unless the button to activate or deactivate the system is placed behind the pilot’s footrest. And the display for the error message is then placed next to the button; presumably down below the foot.

No audible alarm, no flashing lights, nothing else of note to catch a pilot’s attention. In this scenario (based on a true story) the system can easily be turned off accidentally and without warning. Then just add to the mix another plane flying in the same vicinity and the unfortunate result is many dead people spread out over the jungle. The resulting system update is the addition of an audible alarm if and when the system is switched off.

Scenario #2: How to Handle Bugs

Another episode profiles an airplane crash precipitated by an airspeed sensor that was bugged, in the literal sense: an insect appears to have built a nest in the sensor. In this situation, the system created contradictory warnings, while also leaving out expected warnings.

For instance, the plane went from warning of excess speed to stall warnings. The inconsistencies sadly managed to confuse the pilots into their deaths.

Now it is required (standard) flight training to include how to respond when the cockpit says: “Hey! We are receiving conflicting airspeeds!”

Scenario #3: Half-Full vs. Empty

Another show profiles a UX design process failure that came about on the maintenance side. Somehow, two similar looking modules for gauging fuel reserves came into use for similar, but different models of an airplane.

Initially, it appeared that the gauges could be installed and worked interchangeably, even going so far as to update readings. The problem is that the readings would be a bit off; well, let’s call it like it is: completely off.

If you put the gauge into the wrong model of plane, the gauge will read half full when the tank is actually empty. An easy fix turned out to be putting a key into the socket so that the two gauges are no longer interchangeable. This is proof that good UX is not just about design, but also about tracking lessons learned.

Of course, the fix, unfortunately, did not get implemented until a plane crashed into the sea before reaching land (and an airport).

Documenting Problems and Solutions

When planes go down there is a loss of human life and great financial expense. This means that all of these issues have been addressed and fixed, and they likely won’t happen again. Documentation and prevention are among the many reasons that airplanes really don’t go down very often these days, or at least they don’t go down very often in the markets where shareholder interest and liability make downed craft unacceptable and where significant investment is made in the UX design process.

From my perspective, the most interesting aspect of the show Mayday is that it highlights many small UX problems discovered only because people died. The death of a person is something that has a near infinite cost associated with it in Western civilization and therefore causes a very large and detailed process to get down to root causes and implement changes. Especially when it is the death of 100 people at once. Mayday is a great show for learning how to analyze problems, while also giving viewers a deep appreciation for good UX design.

Looking at the small issues that are the root causes of these airline crashes, and at the unremarkable UX changes made to prevent them, really drives home how many small, non-life-threatening losses take place every day due to lousy UX. Adding up all of these small losses might actually result in a quite significant financial cost for the various businesses involved.

And although the majority of them don’t directly cause the loss of life, life might be better (safer, less stressful, more financially secure, etc.) if these small UX design flaws could be reliably flagged and a system put in place to prevent them from recurring.

Standard tracking and reporting of software failures

This brings us back to my statement at the beginning of this piece regarding the need for a body or a system to track and report lessons learned. In airline travel in the USA, we have the Federal Aviation Administration (FAA) to make sure that airline travel is safe.

The purpose of the FAA is the following: The Federal Aviation Administration (FAA) is the agency of the United States Department of Transportation responsible for the regulation and oversight of civil aviation within the U.S., as well as operation and development of the National Airspace System. Its primary mission is to ensure the safety of civil aviation.

Now imagine that we had a Federal Software Administration whose primary mission was to ensure the safety and reliability of software. What if we held ourselves accountable for reporting and documenting not only the cases where bad UX precipitated an airplane crash, but all kinds of software defects that cause significant loss?

Software Asset Management (SAM) already exists as a business practice within some organizations, but not in enough of them. And there is still no central organization to pull together the information documented by businesses with successful SAM practices.

In 2016, the Software Fail Watch identified over a billion dollars in losses just from software failures mentioned in English-language news sources and they estimate that this is only a “scratch on the surface” of actual software failures worldwide. There is much to be debated here, but if a US based or even an international agency simply started to record failures and their causes, without bothering to get into the official business of issuing of guidelines, the simple acts of investigation and reporting could create an opportunity for significant and widespread improvements in design.

I think we can all agree that our industry could greatly benefit from an efficient and productive repository for sharing lessons learned, lessons that today are often relegated to oral histories shared between developers.

Companies may not initially be motivated to report details of their problems, however, the long-term benefits would surely outweigh any perceived negative costs. As an industry, software development can only benefit from sharing knowledge and lessons learned.

Group intelligence is known to exceed that of individual members, and in a world of increasingly unanticipated scenarios and risks, we need to be able to effectively anticipate and solve the security and uptime challenges faced by many companies. As an industry, perhaps instead of fearing judgment, we can instead focus on the benefits of embracing our imperfect nature, while facilitating and co-creating a more efficient and productive future.

Variables in Gherkin: Readable by Humans

Clear and Readable Language

The purpose of Cucumber is to provide a clear and readable language for people (the humans) who need to understand a test’s function. Cucumber is designed to simplify and clarify testing.

For me, Cucumber is an efficient and pragmatic language to use in testing, because my entire development team, including managers and business analysts, can read and understand the tests.

Gherkin is the domain-specific language of Cucumber. Gherkin is significant because as a language, it is comprehensible to the nontechnical humans that need to make meaning out of tests.

In simpler terms: Gherkin is business readable.

Why add variables to Gherkin & Cucumber?

An unfortunate side effect of Cucumber is that in order to keep things readable, especially when dealing in multiples, it is all too easy to explode the number of behind-the-scenes steps. To avoid the confusion caused by an exorbitant number of steps, the simplest fix is to create user variables in the Gherkin. Variables are not natively supported by Gherkin, but that does not mean they cannot be used or added. Adding variables allows you to reduce steps and maintain readability.

In the years since the development of Cucumber, many tips and tricks have proven useful. The usage of variables is by far the most valuable of these tricks. Variables are a way to communicate data from one step to the next, to pass a reference that other steps can act upon. This is most useful when dealing with data that has hierarchical aspects. This can range from machines that have parts, to customers that have orders to make, to blogs that have posts to be posted. The idea is that sooner or later you have multiples of something, let’s call them widgets, in a single test, and you need a way to communicate which widget you are referring to in your step. A simple way to solve this is to give them names, and hence variables were born.

Efficient use of variables in Gherkin keeps your Cucumber in its intended state: clear and readable.

Let us consider a human resources application for this example. Say we want to create different data setups to simulate the retirement of a person in the hierarchy. When a person retires we want to make sure that any reports of this person are moved to report to their manager. First, we need to define the organization:

Given a tiny organization


Given a Team Lead with 3 employees

Or, we could use some history:

Given a Team Lead
And add 3 employees

All three of the above options have their perks and can handle the scenario simply enough. However, if we want to add a second level, an additional employee, the scenario quickly becomes more complicated. Let’s look at the 3 options with another layer of complexity.

Given a small organization

Chances are that people have to dig into the code to figure out the precise meaning. And once you have to look at the code behind the Gherkin, the efficiency value of using Cucumber is lost, as the goal is to communicate clearly and concisely.

Given a Director with a Team Lead with 3 Employees

This would be a new step, which is not ideal; let’s look at the third option:

Given a director
And add a Team Lead
And add 3 employees

This becomes unclear, as we no longer know where the employees are being added: it could be under the team lead or under the director.

Now let’s look at what we could do with variables:

Given the employees
| Var       |
| director  |
| team lead |
| report1   |
| report2   |
| report3   |
And 'director' has reports 'team lead'
And 'team lead' has reports 'report1, report2, report3'

This is a lot more verbose: it is 9 lines versus the previous 3 lines. But there is a big difference, as these 3 steps can model any hierarchy, no matter how deep. And if we wanted to, we could make the first step implicit (creating employees once they are referred to by another step).

At the start, for instance, we could do the following:

Given the employees
| Var       |
| director  |
| team lead |
| report1   |
| report2   |
| report3   |
And 'director' has reports 'team lead'
And 'team lead' has reports 'report1, report2, report3'
When 'team lead' retires
Then 'director' manages 'report1, report2, report3'

As you can see, the “Then” verification step is very easy to reuse. We can refer to specific employees by name, which allows us to do detailed verifications of specific points in the hierarchy. Not only that, now that the language is clear, there is no reason to open up the steps to see what exactly is happening behind the scenes.
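To give a feel for how little machinery the named steps need behind the scenes, here is a minimal sketch in Python. It is not the code of any particular Cucumber binding; the `Organization` class, its method names and the `parse_names` helper are all hypothetical, and a real step library would wire these calls to step definitions.

```python
class Organization:
    """Hypothetical test-side model: employees are looked up by variable name."""

    def __init__(self):
        # employee name -> manager name (None for the top of the hierarchy)
        self.manager_of = {}

    def add(self, name):
        # Implicit creation: referring to a name is enough to create it.
        self.manager_of.setdefault(name, None)

    def has_reports(self, manager, reports):
        self.add(manager)
        for report in reports:
            self.add(report)
            self.manager_of[report] = manager

    def reports_of(self, manager):
        return sorted(n for n, m in self.manager_of.items() if m == manager)

    def retire(self, name):
        # Reports of the retiree move up to the retiree's own manager.
        new_manager = self.manager_of.pop(name)
        for report, manager in self.manager_of.items():
            if manager == name:
                self.manager_of[report] = new_manager


def parse_names(text):
    """Split a step argument such as 'report1, report2' into names."""
    return [part.strip() for part in text.split(",")]


# The scenario above, expressed as direct calls.
org = Organization()
org.has_reports("director", parse_names("team lead"))
org.has_reports("team lead", parse_names("report1, report2, report3"))
org.retire("team lead")
print(org.reports_of("director"))
```

Because every step takes names rather than positions, the same three methods cover a hierarchy of any depth, which is the point the scenario is making.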

As we refer to data entities by name, we can also treat them as variables; for instance, if we want to check the “totalReportsCount” and “directReportsCount” properties we could say:

Given the employees
| Var       |
| director  |
| team lead |
| report1   |
And 'director' has reports 'team lead'
And 'team lead' has reports 'report1'
When 'team lead' retires
Then 'director.totalReportsCount' equals '1'
And 'director.directReportsCount' equals '1'

To implement this we would need to build a resolver, a class that knows about all the different objects. When the resolver is called it checks all the different repositories, in this case, employees, and asks for the variable by name, in this case, ‘director.’ When the resolver finds the variable it uses reflection to look for the property ‘totalReportsCount’ and it evaluates the expression. We now have the generic capability to validate variables and their properties.
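A resolver along these lines can be quite small. The sketch below is an assumption about its shape, not the actual implementation: the `Resolver` and `Employee` classes are hypothetical, repositories are modeled as plain dictionaries, and Python's built-in `getattr` plays the role of reflection.

```python
class Employee:
    """Hypothetical entity with the two properties used in the scenario."""

    def __init__(self, total_reports_count, direct_reports_count):
        self.totalReportsCount = total_reports_count
        self.directReportsCount = direct_reports_count


class Resolver:
    """Looks up 'variable.property' expressions across all repositories."""

    def __init__(self, *repositories):
        # Each repository maps a variable name to an object.
        self.repositories = repositories

    def resolve(self, expression):
        name, _, path = expression.partition(".")
        for repository in self.repositories:
            if name in repository:
                value = repository[name]
                break
        else:
            raise KeyError(f"unknown variable: {name!r}")
        # Walk the property path; getattr is Python's form of reflection.
        for attribute in filter(None, path.split(".")):
            value = getattr(value, attribute)
        return value


employees = {"director": Employee(total_reports_count=1, direct_reports_count=1)}
resolver = Resolver(employees)

# A generic "equals" step compares the resolved value with the text from the step.
assert str(resolver.resolve("director.totalReportsCount")) == "1"
assert str(resolver.resolve("director.directReportsCount")) == "1"
```

Keeping the lookup in one class means a single generic "equals" step can validate any property of any named object, instead of one step per property.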

Adding variables in the Gherkin allows your testers to create reusable steps with a minor increase to the infrastructure of the test framework. By using the resolver before evaluating a table you can even refer to variable properties inside of tables. And, naming the objects you deal with allows you to refer to them later on, making steps more generic and keeping the language clear and complete.

Meaningful Tests and Confidence

With this usage of variables in Gherkin, Cucumber remains an efficient and pragmatic language to use for your tests. Your development team, from the managers to the business analysts will be able to understand and gather value from the tests. And, as we all know, meaningful tests create confidence and prove the value in your quality assurance efforts.

Navigating Babylon Part II

How to Introduce DomainSpeak in Testing

First, let’s start with a quick overview of the problem I discussed in Navigating Babylon Part I. Microservices create efficiencies in development in a world dependent on remote work environments and teams. Unfortunately, the separation of workers and teams means that microservices tend to encourage the development of multiple languages or dialects that obfuscate communication and further complicate testing. We have our anti-corruption layer and we don’t want to pollute our code by spilling in sub-system language.

A Domain-specific Vocabulary for Testing: DomainSpeak

There is, however, a pragmatic solution: we can build on the anti-corruption layer by creating tests in a specific language that has been created to clearly describe business concepts. We can and should create DomainSpeak, a domain-specific vocabulary or language, to be used for testing. Once we publish information in this language it can be shared across microservices and thus influence the workflow. Periodically, as is done in the English language, we may need to improve the definitions of certain vocabulary by redefining its usage and disseminating it widely, thus influencing its meaning.

How will this DomainSpeak improve testing?

The different dialects should not permeate your integration tests. You need to be very clear that a word can only have a single meaning. This requires a two-part process:

  1. You need to verify that you are not inconsistently naming anything inside an actual test; and,
  2. You need to do translations in an anti-corruption layer so everything inside is consistent.

What does DomainSpeak look like in a practical sense?

When you consider how brands influence pop culture, it is through language.

In the business world, marketing professionals use domain specific languages to create a brand vocabulary or a BrandSpeak. All major and influential brands and even smaller yet influential brands have a specific vocabulary, with specific definitions and meanings, to communicate their brand to the public. All communications materials are integrated into this system.

Brand-specific, intentional vocabulary has the ability to invade and permeate. Many people are completely unaware that it was a De Beers campaign in the 1940s that created the cultural tradition “a diamond is forever.” To take other examples, “Don’t mess with Texas” came from an anti-litter campaign, and although we know it’s a marketing ploy, just about everyone is on board with the idea that “What happens in Vegas, stays in Vegas.” On an international level, if you order a “coke” you will most likely get a carbonated beverage, but you won’t necessarily get a Coca-Cola.

As I referenced in my first discussion on Navigating Babylon, I recommend implementing a mapping layer between the microservices and the test cases. When we address the language used, we take this a step further and focus on the DomainSpeak itself, and on how this domain-specific vocabulary improves the output of the test cases. This means, for example, that a Customer, a User, and a Client all have specific meanings and that they cannot be interchanged.

What is the process to create this language?

The initial process is an exploratory step. To create your own DomainSpeak, your testing department will need to communicate regularly with the business owners and developers. Their goal will not be to dictate what words the business owners and developers use, but to learn what words already have meanings and to document these usages. The more you communicate, recognize and document adopted meanings, the more you will discover how, where and why meanings differ.

For instance, the business may see a Customer as a User with an active subscription, whereas a microservice might use the words interchangeably because it does not have the concept of a subscription. You will also notice that sometimes situations give rise to conflicting meanings. A developer may have picked up the word “Client” for “User” from a third-party API he integrated with, whereas the business may use “Client” for a specific construct in a billing submodule, meaning “customers of their customer.” In such situations, to avoid confusion and broken stuff, you will need to specify which definition is to be used and possibly introduce a new concept or word to account for the narrowing of the definition. Perhaps the “customers of their customer” will now be a “vendee” instead of a “client.” Don’t be dismayed if there is no existing word that accurately matches your concept; you can always create a new word or make a composite word to meet your needs.
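The “Client” conflict above can be sketched as a small glossary inside the anti-corruption layer. This is only an illustration under stated assumptions: the service names, the `GLOSSARIES` table and both helper functions are hypothetical, and a real mapping layer would translate whole payloads rather than single words.

```python
# Hypothetical per-service glossaries: service dialect word -> DomainSpeak term.
GLOSSARIES = {
    "accounts-service": {"client": "User", "user": "User"},
    "billing-service": {"client": "Vendee", "customer": "Customer"},
}


def to_domainspeak(service, term):
    """Translate one word from a service's dialect into DomainSpeak."""
    glossary = GLOSSARIES[service]
    try:
        return glossary[term.lower()]
    except KeyError:
        # An undocumented word is a finding to discuss, not something to guess at.
        raise ValueError(f"{service!r} uses undocumented term {term!r}") from None


def ambiguous_terms(glossaries):
    """Dialect words that mean different things in different services."""
    meanings = {}
    for glossary in glossaries.values():
        for word, domain_term in glossary.items():
            meanings.setdefault(word, set()).add(domain_term)
    return {word: terms for word, terms in meanings.items() if len(terms) > 1}


print(to_domainspeak("billing-service", "Client"))
print(ambiguous_terms(GLOSSARIES))
```

The list of ambiguous words doubles as an agenda for the conversations with business owners and developers: each entry is a word whose meaning still has to be narrowed.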

Indeed, by being consistent and by distributing your use of language to a wide audience you can introduce new words and shape the meaning of existing words. This means that your tests have to be very close to a formal and structured form of English. This can be accomplished by using Aspect-oriented testing or by creating fluid API wrappers on top of the microservices. Aspect-oriented testing would look like this (cucumber syntax):

Given a User
When the User adds a Subscription
Then the User is a Customer

Whereas a fluid API would be something like this (C# syntax):

User user1 = UserManager.CreateNewUser();

This creates a lot of focus on syntactic sugar* and on writing code behind the scenes to ensure that your code incorporates your business logic (your test) and looks the way you want it to. Every language has its own way to solve this challenge. Even in C, you could use macros to take pseudocode and turn it into something that would compile, and chances are that your results would be far superior to those of your current language usage.
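To make the idea of a fluid wrapper (more commonly called a fluent interface) concrete, here is a small sketch. It is written in Python rather than C#, and every name in it is hypothetical; a real wrapper would call the microservices instead of keeping state in memory.

```python
class User:
    """Hypothetical DomainSpeak wrapper around whatever service owns users."""

    def __init__(self):
        self.subscriptions = []

    def adds_subscription(self, plan):
        self.subscriptions.append(plan)
        return self  # returning self is what lets the calls chain

    @property
    def is_customer(self):
        # Business definition from the text: a Customer is a User
        # with an active subscription.
        return bool(self.subscriptions)


class UserManager:
    @staticmethod
    def create_new_user():
        return User()


# Reads close to: Given a User, when the User adds a Subscription,
# then the User is a Customer.
user1 = UserManager.create_new_user()
assert not user1.is_customer
assert user1.adds_subscription("monthly").is_customer
```

The wrapper adds no new capability; its only job is to make the test read in the agreed vocabulary, so that a word like Customer means the same thing in every test.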

For my uses, the cucumber syntax, with a side of extra syntactic sugar that allows me to define variables, is very effective. (I will get into this in more detail another day.) Whichever language you use, keep in mind that the goal of creating a DomainSpeak vocabulary is not to make your code look pretty, but rather to ensure that your code communicates clearly to the business and developers and that meanings are defined with precise and consistent language.

The End Goal is Efficient Quality Assurance

The goal, after all, is to improve productivity and deliver a quality product. Clear communication will not only benefit your team internally, it will also influence other teams. By communicating your results in consistently clear and concise language to a wide audience, you will influence their behavior. You will be able to efficiently respond to questions along the lines of “we use ‘customer’ for all our ‘users.’” You will also be able to easily define and answer where the rule may not hold and why you use the word you use. Again, the goal is not to dictate to folks what words to use, but to explain the meanings of words and to encourage consistent usage. Adoption will follow slowly, and over time usage will become a matter of habit. Over time services will be rewritten and you should be able to delete a few lines from your anti-corruption layer every time one gets rewritten.

*Syntactic sugar allows you to express something more aesthetically, but not necessarily in a new way: it is a different way of saying the same thing. It is not important because it is new, but it is significant because it makes things more readable and understandable. Cleaner language means clearer code, and clearer code makes bugs easier to find.

If you enjoyed this piece please share it with your colleagues! If you have something to add, please join the discussion!

Navigating Babylon: Part I

“What do you mean?” is a common phrase. It is the communication equivalent of a checksum: a way of making sure that the words are interpreted correctly. This is what testing does: it ensures that the business concepts are interpreted correctly. There are bugs where the code does not do what the developer intended; these have been addressed by tools and methodologies and are becoming rare. The bugs that we want to talk about here are the ones where the code does what the developer intended, but not what the business desired.

What better tooling has not solved is the misinterpretation of requirements.

Again we are back to the problem of working with humans. Errors that result from misinterpretations are becoming the dominant target in testing, but to find them, it is imperative that the language that describes the tests is unambiguous. This is becoming increasingly challenging as distributed teams create isolated dialects.

Let’s look at two different ways to overcome communication problems. One option is to isolate ourselves from dialects using the anti-corruption layer pattern. A second option is to make language across teams cohesive by sharing information with clear and specific language. Effective, clear communication flows naturally and increases efficiency, which is our ultimate goal.

Microservices suit today’s remote workforce, and they are increasingly used because of their efficiency. On the development end, microservices reduce complexity and decrease the need for constant communication between teams. An unfortunate side effect is that, over time, individual microservices naturally go their own way, and as a byproduct each one’s code develops its own language, dialect, or slang. This tendency to create distinct vocabularies and meanings increases the likelihood of mistakes, bugs, and broken tests.

Improving on the Monoliths of the Past

One of the benefits of monolithic architecture is that everyone spoke the same language. Yes, the user object was enormous and was far more complex than necessary, but everyone spoke the same language because everyone used the same object. Everyone agreed that User.Id was the identifier.

The intention behind microservices is to make life simpler and reduce complexity, but at the same time, they have increased complexity by creating many teeny distinct monoliths that naturally encourage speaking slightly different languages or dialects. We have gone from the Pyramids to the Microservices Towers of Babel. Another analogy might be to say that we have created geographically isolated human populations that encourage the development of distinct dialects and regionalisms. Where we once had User.Id, we may now find one microservice has chosen User.Identifier, while the next microservice has settled on Customer.Id, and yet in another one we find Agent.Uid.

Despite this drawback, microservices remain a natural source of efficiency.

The solution is not to eliminate microservices, as they fit the increasingly popular corporate structure of remote work. Conway’s Law states that companies design software that mirrors their internal communication structure, and as individual developers become more isolated by remote work, it makes sense to use an architecture that supports the build-out of small, isolated components. Microservices arose as a way to accommodate the distributed nature of teams. However, as each service is built, we start to see slightly different terminology, slightly different assumptions, different methods for error handling, different exception behaviors, and so on.

The Dangers of Misinterpretation

Writing the code for integration tests within microservices, we often end up with something like this:

Assert.Equal(a.SerialNumber, b.AssetId);

One service calls it a serial number, the other an AssetId. Technically, it is just a small issue that is easily understood in a conversation, but potentially grounds for a larger problem. Problems like this are only amplified when the respective developers work in different parts of the world and several different time zones. Sure, the company can clock development time 24 hours per day, but developer A has to wait 12 hours for developer B to get in and by that time developer A is already back in bed. And so, developers put in their “one line fix,” but the test cases that integrate these services tend to repeat this incongruent language.

There are patterns that address this: the Adapter design pattern and the Anti-Corruption Layer concept from domain-driven design both aim to solve these issues. The core idea is that you take the service that speaks a different dialect and wrap it in code that takes care of all of your mapping for you, effectively creating a layer where you put all your “one line fixes.”
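
A minimal sketch of such a wrapper, written in Python for brevity. The adapter classes, the `Asset` test object, and the sample payloads are all hypothetical, chosen only to mirror the SerialNumber/AssetId mismatch in the assertion above; a real layer would wrap your actual service clients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """The test suite's own, consistently named view of an asset."""
    asset_id: str


# Hypothetical payloads standing in for two services with different dialects.
inventory_response = {"SerialNumber": "SN-1001"}
billing_response = {"AssetId": "SN-1001"}


class InventoryAdapter:
    """Wraps the service that says SerialNumber; its 'one line fix' lives here."""

    @staticmethod
    def to_asset(payload: dict) -> Asset:
        return Asset(asset_id=payload["SerialNumber"])


class BillingAdapter:
    """Wraps the service that says AssetId; no renaming needed, but it is
    still isolated behind its own layer."""

    @staticmethod
    def to_asset(payload: dict) -> Asset:
        return Asset(asset_id=payload["AssetId"])


a = InventoryAdapter.to_asset(inventory_response)
b = BillingAdapter.to_asset(billing_response)

# The test now compares like with like, in a single vocabulary.
assert a.asset_id == b.asset_id
```

The test itself no longer mentions either dialect. If one service is later rewritten to use the shared name, only its adapter changes.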

If one service thinks a field is a GUID and the other a String, you convert it there to a common format used across all tests. The same goes for nullable fields, or fields named slightly differently (Id, id, Uid, UserId, Identifier, UserIdentifier, uIdentifier, etc.). Create a layer for each service, even if there is no special mapping, to isolate your test cases from all the different dialects. Then create your own version of the message objects with consistent naming, and write code that maps your test objects to the microservice objects. If you use reflection to map the fields that match, you should be able to achieve this with relatively little code.
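
One way the reflection-based mapping might look, sketched in Python. The `CanonicalUser` object, the `ALIASES` table, and the two sample service records are assumptions made for illustration; the alias spellings are taken from the list above. Only GUID objects are converted here, so string identifiers are assumed to arrive already in lowercase hyphenated form.

```python
import uuid
from dataclasses import dataclass, fields
from types import SimpleNamespace
from typing import Any, Optional


@dataclass
class CanonicalUser:
    """Test-side object holding the one agreed spelling of each field."""
    user_id: str
    email: Optional[str] = None


# Hypothetical alias table: each dialect spelling (lowercased) maps to the
# canonical field name used by the tests.
ALIASES = {
    "id": "user_id", "uid": "user_id", "userid": "user_id",
    "identifier": "user_id", "useridentifier": "user_id",
    "uidentifier": "user_id",
}


def normalize(value: Any) -> Any:
    """Convert dialect-specific types to the common test format."""
    if isinstance(value, uuid.UUID):
        return str(value)  # GUID object becomes a lowercase hyphenated string
    return value


def to_canonical_user(service_obj: Any) -> CanonicalUser:
    """Map a service object onto CanonicalUser by inspecting its attributes."""
    wanted = {f.name for f in fields(CanonicalUser)}
    mapped = {}
    for name, value in vars(service_obj).items():
        key = name.lower()
        canonical = ALIASES.get(key, key)
        if canonical in wanted:  # attributes the tests do not know are skipped
            mapped[canonical] = normalize(value)
    # Raises TypeError if no identifier was found, so gaps fail loudly.
    return CanonicalUser(**mapped)


# Two hypothetical services describing the same person in different dialects.
agent = to_canonical_user(SimpleNamespace(
    Uid=uuid.UUID("3f2504e0-4f89-11d3-9a0c-0305e82c3301"), Email=None))
customer = to_canonical_user(SimpleNamespace(
    Identifier="3f2504e0-4f89-11d3-9a0c-0305e82c3301", Email=None))

assert agent == customer
```

Because matching is driven by attribute names, a new service usually needs at most a new entry in the alias table rather than a new mapping function.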

Now you might look at this process and think: “Wouldn’t it be easier if they just all named it the same thing?” Unfortunately, the answer is actually “no.” This problem is the direct result of the communication structure of an organization. To change the structure of communication you will have to change the organization, which is rarely a pragmatic solution.

Harnessing Babylon

Let’s look at two plausible solutions that are significantly more pragmatic than changing your overall organizational structure:

  1. Create a Department of Naming Stuff (Dons): Need a new field? Ask the Dons for the name to use. They will look to see if it is already in the big book of names for stuff and, if not, they will add it and name it.
  2. Direct and consistent communication: Have developers communicate directly. If people talk to each other, they start to adopt the same words for the same things. The closer the interaction, the more closely their use of words correlates.

Option 1

Human languages (as opposed to programming languages) have this same problem. The Dons is comparable to the Oxford English Dictionary. Of course, we may have the problem that the US and UK don’t see eye to eye on what dictionary to use, especially for common informal words. The result is that in most scenarios, the experience of the Dons is likely to be heavy-handed. And the day the Dons makes a poor naming decision, the whole concept will become an object of ridicule.

By the time the problem has advanced to the stage where this looks like a good idea, you will find that retrofitting to a standard language is cost-prohibitive, especially as the problem has by then left the project buggy, late, and over budget.

Option 2

Given our preference for remote work, the second option is inherently impractical, as developers work in different locations and across time zones. Option two might also continue to promote the regionalisms, slang, and dialects that develop when people in close proximity start to adapt words to have specific meanings, meanings which are often not shared across the wider organization.

Breaking out of Babylon

Fortunately, we are no longer limited to two solutions. We have a third solution that first entered the scene with radio and television, but that has now become nearly universal thanks to the digital age. This third solution is a unifying language. In daily life, it is known to us as Pop Culture.

A Unifying Language

Pop culture targets media for consumption by large segments of the American population. By favoring words with specific meanings, pop culture ensures that widely distributed words and meanings are adopted not only into American English but around the world. Netflix has members located in 190 countries around the world and Facebook has nearly 2 billion worldwide members. We can google truthiness, take a selfie, and Facebook the results. Pop culture introduces new words, redefines words, and narrows the meaning of words through repeated exposure to specific audiences.

We can do the same thing with microservices: we can create our own culture. We can create and distribute test results with consistent specific meanings. By creating our own uses and definitions, we can appropriate the language that we need and define it for our specific purposes. QA then becomes the company’s “pop” culture influencer and over time effectively influences the meanings that people associate with specific words. This is not a quick process, and measuring change will be difficult.

To be continued next week: How to create a domain-specific vocabulary, or DomainSpeak.

Please share this article and join the discussion in the comments!