Testing

Making Sense of Test Results


The goal of Quality Assurance is to ensure that your product meets requirements, is properly debugged, and leaves customers happy. To get there, your testing and QA processes must provide meaningful information throughout the development lifecycle and beyond, so that engineers and decision-makers can make informed choices efficiently.

Dashboards as a Tool

The challenge for many is making sense of the data. Dashboards are a practical way to present information at a glance. Unfortunately, they can also drown you in data, hide essential details, or roll everything up into aggregates that are easy to ignore.

Let’s look at a TFS dashboard from Visual Studio Reporting Tools:

This particular dashboard looks great and provides meaningful readings: things are either good or bad (green or red), and it is easy to tell at a glance which is which. In reality, dashboards rarely look this nice. Many teams see prominent displays of yellow and orange with very little clear green or red.

Where Dashboards Go Wrong

Flaky tests: When a suite contains many tests that fail intermittently, because of concurrency issues for instance, there will inevitably be an occasional failing test on every run. The result is a dashboard that delivers a steady stream of unhelpful orange.

False sense of security: Dashboards can also provide a false sense of security. If you set a goal to keep failures under 1% but only ever look at the percentage, and never at which tests are failing, you will likely miss that the same key defects have been failing for quite some time.

A Call to Action

In the end, the goal of a dashboard is not merely to convey information, but to get people to act when the data indicate something bad. A thoughtfully constructed dashboard is a warning flag and a call to action, but only if what it says actually prompts people to act.

If the dashboard is not practical, people will ignore what it displays.

It is like the threat-level announcements at the airport or the nightly news: you know all is not well, but you either learn to tune it out or find another way to numb yourself to the information. Developers who are told every day that there are some failures will do the same. Sure, they will spend some time chasing unreproducible failures, but after a while they will stop paying attention.

A Good Dashboard Generates an Emotional Response

The genius of a thoughtfully constructed dashboard is that it generates an emotional response.

Yay! Good!

Or

Oh. No, not again!

So how do we get people back to the emotional reaction of a simple red/green dashboard in a world where things are noisy?

This question led us to create a new dashboard that focuses on the history of defects and failures. In this case, we categorize new failures as orange, failures that persist over consecutive days as red, and red failures that linger for several more days as black.
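As a rough illustration of that categorization, here is a minimal C# sketch; the type names and the thresholds (two consecutive days for red, five for black) are assumptions for illustration, not the actual dashboard logic.

using System;
using System.Collections.Generic;
using System.Linq;

public enum FailureStatus { Green, Orange, Red, Black }

public static class FailureHistory
{
    // dailyResults: one entry per day, oldest first; true = passed, false = failed.
    public static FailureStatus Classify(IEnumerable<bool> dailyResults,
                                         int redThreshold = 2,
                                         int blackThreshold = 5)
    {
        // Count how many days in a row the test has failed, ending with the latest run.
        int consecutiveFailures = dailyResults.Reverse().TakeWhile(passed => !passed).Count();

        if (consecutiveFailures == 0) return FailureStatus.Green;               // passing
        if (consecutiveFailures >= blackThreshold) return FailureStatus.Black;  // stuck failure
        if (consecutiveFailures >= redThreshold) return FailureStatus.Red;      // failing for days
        return FailureStatus.Orange;                                            // new failure
    }
}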

Creating Useful Dashboards

Screenshot © Possum Labs 2018

It is easy to see from the chart that even though there are some flaky tests, there are also a few consistent failures that have recurred over the last few weeks.

This creates an opportunity to start drawing lines in the sand, for example, setting the goal of eliminating any black on the chart. Even when a 100% pass rate seems infeasible for a project, it should be manageable to at least fix the tests that fail consistently.

This type of goal is useful because it can provoke an emotional response. People can visually filter the signal from the noise and get to the heart of the information: there are indeed failures that matter. And when we make it evident how things change over time, people will notice if things get worse.

Dashboards are a Tool — not a Solution

Each organization is different, and each has its own challenges. Getting to a 100% pass rate is much easier when it is an expectation from the beginning of the project, but often systems were designed years before the first tests crept in. In those scenarios, the best plan is to create a simple chart and listen to people when they tell you it is meaningless. These are the people who give you your actual requirements; they will spell out where and what to highlight to get an emotional response to the dashboard data.

Dashboards do not always convey meaningful data about your tests. If they lack thoughtful construction, they will likely fail at getting people to act when the information is bad. Bad results on your panel should be a warning flag and a call to action, but the results need to mean something or your people will simply ignore them and move on.

At Possum Labs, meaningful dashboards are not a solution, but instead just one of many tools available to assure that your QA delivers effective and actionable data to your teams.

Effective QA is Not an Option, it's a Necessity: Here's How to Do it Right

As has long been the case in the software and technology industry, immigration is an important source of Quality Assurance talent.

Part of the reason is that QA isn't always a job that attracts US-based developers. The US job market for developers is already competitive, and moving developers on your team over to quality assurance is often undesirable for a number of reasons. The result is that relocating QA developers to the USA is, in many cases, the most practical solution for the industry.

Unfortunately, barriers to immigration continue to drive up demand for qualified QA, which only exacerbates the shortage of people in the US who already have an appropriate development background.

 

Quality Assurance is not Optional

Compounding the shortage of qualified developers for QA is the fact that some decision makers continue to consider QA an "optional" line item in their budget. Unfortunately for them, QA and the surrounding processes are anything but "optional," and a good quality assurance engineer is a key player on any team.

In companies where QA is an accepted need, it is still often considered more of a necessary evil than a definite benefit. QA is too easily blamed when a project that’s been on schedule suddenly gets hung up in testing. Another common problem appears when development teams pick up speed after implementing workflow changes, only to discover that QA is still a bottleneck keeping them from delivering a product to their customer.

 

A Look at the Current Situation

As we've seen, numerous factors feed the myths surrounding Quality Assurance and create a climate that makes some engineers shy away from the title, or even the field. Moreover, we are experiencing an actual shortage of qualified engineers, which means that QA in many instances ends up being not just an afterthought, but a luxury.

Immigration and work status have been hot topics for the last few years. Regardless of where you fall on the political spectrum, if you work in software, you've likely experienced their effect on the job market firsthand.

Those of you who have been lucky enough to hire engineers may also have been unlucky enough to discover that said engineer has to go back to India (or wherever his or her country of origin might be). What started out as a short trip quickly devolves into a lengthy process, sometimes six months or more, to renew or regularize an H-1B visa, following the changes made to the requirements in 2017.

Whether your experience with the dwindling QA applicant pool is firsthand or anecdotal, here are some statistics for you to chew on:

The US Bureau of Labor Statistics expects the market for software developers to grow by 24% from 2016 to 2026. That means a need for approximately 294,000 additional software developers in an already tight market. If you think it's hard to convince an engineer to join your QA team now, just wait and see what it will be like in 2026.

We can't know for sure how many H-1B visas are currently held up due to the changed requirements, but this article does a decent job of discussing the demand and actual need for H-1B visas in the USA, with a focus on the State of Massachusetts. If you'd like to know more, I'd suggest taking a look; however, for our purposes, I don't think importing QA staff is necessarily the answer.

So, you need to have QA, but you can’t hire qualified staff to take care of the work. What can you do?

The Foundations of Quality Assurance

Before I answer this question, let's take a look at why Quality Assurance exists in the first place. From a business perspective, there are a few things that pretty much every customer expects from a software product. The following three expectations are the crucial reasons why quality assurance is not optional:

  1. Customers expect that the programs will work as requested and designed, and within the specified environments;
  2. They hope that software will be user-friendly;
  3. And, they assume that software will have been successfully debugged: meaning that QA must deliver a product that is at the least free of the bugs that would result in numbers 1 or 2 becoming false.

Historically, software teams were small, but since the early 80s, due to the need to scale quickly and keep up with changing requirements and other advances in technology and globalization, we’ve experienced rapid growth in the size of development teams and companies. This growth has led to implementing a wide variety of tactics from workflow solutions, think Waterfall or Agile, to different methods of increasing productivity and efficiencies, such as offshore teams and microservices.

Take a look at this simple chart approximating the increase in development team size from the original release of Super Mario Brothers in 1985 to Super Mario World in 1990 and Super Mario 64 in 1996. (Note that by 1996 the credits occasionally thank entire teams, not just individuals, so the actual number is likely even higher.)

Super Mario Release Team Size

© Possum Labs 2018

Where we haven't been able to keep up (at least in the USA) is in the training of new software engineers. QA departments, regardless of size, are challenged to carry out all the processes required to follow software through the development lifecycle to delivery and maintenance (updates), while also keeping abreast of changes to technology and integrations.

An unfortunate result of this shortage of QA engineers is that testing is the point in the development cycle where most companies fall short. And yet the ability to provide useful and meaningful testing is crucial to delivering quality assurance to one's client, whether you are building an in-house product, such as for a financial institution, or a commercial product for the public market.

While offshore teams may be a solution for some companies, many are too small to make building offshore teams practical or cost-effective.

What's more, many engineers tend to be good at one thing, development, and may not have a good sense of your organization's business goals or even an understanding of what makes a good customer experience. And while your highly paid development staff might excel at building clever solutions, that doesn't necessarily mean they also excel at testing their own work. Do you really want to pay them to do your testing, when their time could be better invested in innovations and features? At Possum Labs we've found that it is often most efficient to design workflows and teams around the people you have.

This gap between development requirements and a full understanding of business goals is in fact often the culprit behind a pervasive disconnect between testing and business outcomes. What do I mean by disconnect? Let's consider the following four statements and then talk about some real-life examples:

  • Users prefer seamless interfaces, intuitive commands, and technology that makes them feel smart, not dumb.
  • Businesses prefer software that assures their business goals are met and that technology allows their employees to work smarter with greater efficiency thus promoting growth and profit.
  • Today the average person is sufficiently adept with technology to notice when your software exhibits even moderately lousy UX. And companies can also experience public shaming via social media when they make a particularly dumb or inopportune mistake.
  • And then there are the security risks.

In 2017 we saw several major security snafus experienced by large corporations that from the few details publicized were the direct result of inaction on the part of the decision-makers, despite being notified by engineering.

One might think that the decision makers, while moderately acknowledging the risk, may have simply gambled that nothing would come of the risks while taking steps to protect themselves.

I would like to go a step further. I’d wager that everyone involved fell victim to a set of common testing pitfalls.

Indeed, one of the most challenging aspects of testing is figuring out not only how to effectively and efficiently create and run tests, but most importantly to figure out how to confidently deliver meaningful results to the decision makers. Whether you are a software business or a business that uses software, successful quality assurance is crucial to the long-term health and success of your business.

 

Let’s do a quick recap of what we’ve covered so far:

  1. There is a shortage of qualified test engineers.
  2. Users want products they can rely on and that are friendly to use.
  3. Companies want products they can trust that improve efficiencies, their bottom line and that, of course, make their clients happy.
  4. It is difficult to create tests that deliver meaningful results when the testing is done by engineers who don't necessarily understand the business's end goals.
  5. Decision makers don’t want to understand the tests; they want to have meaningful results so that they can make effective decisions.

 

So what if we could solve all of these problems at once?

This is what Possum Labs achieves through the clever use of solutions and tools that integrate with your existing systems, processes, and people. We build out quality assurance so that anyone who understands the business goals can successfully carry out testing and efficiently use the results.

Not only does this solve problems 1 to 5 above, it also spares most companies from having to hire or train new developers. Instead, you hire people with a keen understanding of your business who can be trained to work with your tools. Possum Labs methods allow you to implement and upgrade your quality assurance, sometimes even reducing your staffing load, while delivering better and more meaningful results, so that the people at the end of the chain get better services or better products than before.

 

How does Possum Labs do this?

Each of our solutions varies a bit from company to company, but in general we use several tools, including proxies and modules (think Lego), so that existing tests can be modified and new tests written simply by reorganizing the "bricks." This focus on custom solutions allows a non-technical individual with a solid understanding of business goals to generate tests that deliver meaningful results, which he or she can confidently share with decision makers.
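To make the "brick" idea concrete, here is a minimal C# sketch of how reusable test modules might be recombined into new scenarios; the step names and the dictionary-based wiring are illustrative assumptions, not the actual Possum Labs toolkit.

using System;
using System.Collections.Generic;

public static class Bricks
{
    // A "brick" is a named, reusable test step that shares state through a simple context.
    public static readonly Dictionary<string, Action<Dictionary<string, object>>> Library =
        new Dictionary<string, Action<Dictionary<string, object>>>
        {
            ["create a user"] = ctx => ctx["user"] = "user-001",          // stand-in for a real call
            ["add a subscription"] = ctx => ctx["subscribed"] = true,
            ["verify the user is a customer"] = ctx =>
            {
                if (!ctx.ContainsKey("subscribed") || !(bool)ctx["subscribed"])
                    throw new Exception("Expected the user to be a customer.");
            }
        };

    // A scenario is just an ordered list of brick names, so anyone who knows
    // the business language can rearrange the bricks into a new test.
    public static void Run(params string[] scenario)
    {
        var context = new Dictionary<string, object>();
        foreach (var step in scenario)
            Library[step](context);
    }
}

// Bricks.Run("create a user", "add a subscription", "verify the user is a customer");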

The result is that testing bottlenecks open up, allowing for a more efficient flow of information and better feedback through all channels. Products are delivered faster. Information flows smoothly. Better decisions are made, and efficiencies are gained. Developers can focus on development, and decision makers on achieving their strategic goals. Meanwhile, you've got happy customers, and everyone can get a good night's rest.

3 Risks to Every Team's Progress and How to Mitigate Them

When looking at improving performance, the first thought is often to increase the size of our development team; however, a larger group is not necessarily the only or the best solution. In this piece, I suggest several reasons to keep teams small, and why to stop them from getting too small. I also look at several types of risk to consider when sizing a team: how team size affects communication, individual risk, and systematic risk.

Optimal Team Size for Performance

The question of optimal team size is a perpetual debate in software organizations. To adjust, grow, and develop different products, we must rely on teams of various sizes and makeups.

We often assume that fewer people get less done, which leads us to add people to our teams so that we can get more done. Unfortunately, this solution often has unintended consequences and unforeseen risks.

When deciding how big of a team to use, we must take into consideration several different aspects and challenges of team size. The most obvious and yet most often overlooked is communication.

Risk #1: Communication Costs Follow Geometric Growth

The main argument against big teams is communication. Adding team members results in a geometric growth of communication paths and, with them, problems. This increase in communication pathways is most easily illustrated by a visual representation of team members and communication paths.

Geometric Growth of Communication Paths

Bigger teams increase the likelihood that we will have a communication breakdown.
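The growth is easy to quantify: a team of n people has n(n-1)/2 possible one-to-one communication paths, so doubling the team roughly quadruples the number of paths. A quick sketch in C#:

public static class TeamMath
{
    // Pairwise communication paths for a team of n people: n * (n - 1) / 2.
    public static int CommunicationPaths(int n) => n * (n - 1) / 2;
}

// TeamMath.CommunicationPaths(3)  ->   3
// TeamMath.CommunicationPaths(5)  ->  10
// TeamMath.CommunicationPaths(10) ->  45
// TeamMath.CommunicationPaths(20) -> 190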

From the standpoint of improving communication, one solution that we commonly see is the creation of microservices to reduce complexity and decrease the need for constant communication between teams. Unfortunately, the use of microservices and distributed teams is not a “one size fits all” solution, as I discuss in my blog post on Navigating Babylon.

Ultimately, when it comes to improving performance, keep in mind that bigger is not necessarily better. 

Risk #2: Individual Risk & Fragility

Now, a larger team seems like it would be less fragile; after all, a bigger team should be able to handle one member winning the lottery and walking out the door. This assumption is partially correct, but a lottery ticket is usually an individual risk (unless people pool tickets, something I have seen in a few companies).

When deciding how small to keep your team, make sure that you build in consideration for individual risk and be prepared to handle the loss of a team member.

Ideally, we want the smallest team possible while limiting our exposure to any risk tied to an individual. Unfortunately, fewer people tend to get less work done than more people (leaving skill out of it for now).

Risk #3: Systematic Risk & Fragility

Systematic risk relates to events that affect multiple people on the team. Fragility describes how well a structure or system can handle hardship (or change in general). Systematic risks stem from things shared across the organization: leadership, shared space, or shared resources.

Let’s look at some examples:

  • Someone brings the flu to a team meeting.
  • A manager/project manager/architect has surprise medical leave.
  • An affair between two coworkers turns sour.

All of these events can grind progress to a halt for a week (or much more). Events that impact morale can be incredibly damaging as lousy morale can be quite infectious.

In the Netherlands, we have the concept of a Baaldag (roughly, an "irritable day"), where team members limit their exposure to others when they know they won't interact well. In the US, with its stringent sick-day and holiday limits, this is rare.

Solutions to Mitigate Risk 

There are productive ways to minimize risk and improve communication. One is to look carefully at your structure and goals, build an appropriately sized team, and take additional actions to mitigate risk. Another effective technique is training. You shouldn't be surprised, however, that my preferred method for minimizing risk is developing frameworks and tests that are readable by anyone on your team.

Unrealistic Expectations: The Missing History of Agile

Unrealistic expectations, or "why we can't replicate the success of others…"

Let’s start with a brain teaser to set the stage for questioning our assumptions.

One day a man visits a church and asks to speak with the priest. He asks the priest for proof that God exists. The priest takes him to a painting depicting a group of sailors, safely washed up on the shore following a shipwreck.

The priest tells the story of the sailors’ harrowing adventure. He explains that the sailors prayed faithfully to God and that God heard their prayers and delivered them safely to the shore.

Therefore God exists.

This is well and good as a story of faith. But what about all the other sailors who have prayed to God, and yet still died? Who painted them?

Are there other factors that might be at play?

When we look for answers, it’s natural to automatically consider only the evidence that is easily available. In this case, we know that the sailors prayed to God. God listened. The sailors survived.

What we fail to do, is look for less obvious factors.

Does God only rescue sailors that pray faithfully? Surely other sailors that have died, also prayed to God? If their prayers didn’t work, perhaps this means that something other than faith is also at play?

If our goal is to replicate success, we also need to look at what sets the success stories apart from the failures. We want to know what the survivors did differently from those that did not. We want to know what not to do, what mistakes to avoid.

In my experience, this is a key problem in the application of agile. Agile is often presented as the correct path; after all, lots of successful projects use it. But what about the projects that failed? Did they use Agile, or did they not implement it correctly? Or maybe Agile is not actually that big a factor in a project's success?

Welcome to the history of what is wrong with Agile.

Consider this: a select group of Fortune 500 companies, including several technology leaders, decides to conduct an experiment. They hand-pick people from across their organizations to complete a very ambitious task, an order of magnitude beyond anything they'd previously attempted, with an aggressive deadline.

Question 1: How many do you think succeeded?

Answer 1: Most of them.

Question 2: If your team followed the same practices and processes that worked for these teams do you think your team would succeed?

Answer 2: Probably not.

The Original Data

In 1986, Hirotaka Takeuchi and Ikujiro Nonaka published a paper in the Harvard Business Review titled "The New New Product Development Game." In it, Takeuchi and Nonaka tell the story of businesses that experimented with their personnel and processes to find new ways of conducting product development. The paper introduced several revolutionary ideas and terms, most notably the practices that we now know as agile (and scrum).

The experiments, run by large companies and designed for product development (not explicitly intended for software development), addressed common challenges of the time regarding delays and waste in traditional methods of production. At the root of the problem, the companies saw the need for product development teams to deliver more efficiently.

The experiment and accompanying analysis focused on a cross-section of American and Japanese companies, including Honda, Epson, and Hewlett-Packard. To maintain their competitive edge each of these companies wished to rapidly and efficiently develop new products. The paper looks at commonalities in the production and management processes that arose across each company’s experiment.

These commonalities coalesced into a style of product development and management that Takeuchi and Nonaka compared to the rugby scrum. They characterized this "scrum" process with a set of six holistic activities. Taken individually, these activities may appear insignificant or even ineffective. However, when they occur together in cross-functional teams, they result in a highly effective product development process.

The 6 Characteristics (as published):

  1. Built-in instability;
  2. Self-organizing project teams;
  3. Overlapping development phases;
  4. Multilearning;
  5. Subtle control;
  6. And, organizational transfer of learning.

What is worth noting is what is NOT pointed out in great detail.

For instance, the companies hand-picked these teams from a large pool of, most likely, above-average talent. These were not random samples, and these were not companies converting their whole process; they were experiments with teams inside companies. Nor did the companies ever bet the farm on these projects: the projects were large, but if they failed, the company would not go under.

If we implement agile, will we be guaranteed success?

First, it is important to note that all the teams discussed in the paper delivered positive results. This means that Takeuchi and Nonaka did not have the opportunity to learn from failed projects: with no failures in the data set, they could not compare failures with successes to see what separated the two.

Accordingly, it is important to consider that the results of the study, while highly influential and primarily positive, can easily deceive you into believing that if your company implements the agile process, you are guaranteed to be blessed with success.

After years in the field, I think it is vitally important to point out that success with an agile implementation is not guaranteed. I've seen too many project managers, team leads, and entire teams banging their heads against brick walls, trying to figure out why agile just does not work for their people or their company. Unlike the experiments, you start with the people you happen to have, and agile might not be suited to them.

To put the logical question simply: if all marbles are round, are all round things marbles? The study shows that these successful projects implemented these practices; it does not claim that these practices brought success.

What is better: selecting the right people or the right processes for the people you have?

Consider that your company may not have access to the same resources available to the companies in this original experiment. These experiments took place in large companies with significant resources to invest. Resources to invest in their people. Resources to invest in training. Resources to invest in processes. Resources to cover any losses.

At first glance, it looks like the companies profiled by Takeuchi and Nonaka took big gambles that paid off because of the processes they implemented. However, it is important to realize that they in fact took very strategic, minimal risks: they made sure to select the best people, and they did not risk any of their existing units. They spun up an isolated experiment at arm's length.

Look at it this way: most large multinational companies already have above-average people, and they then cherry-pick those best suited for the job. This is not your local pick-up rugby team, but a professional league. The strategic risks these large, well-resourced companies took may not be realistic for your average small or medium-sized organization.

The companies profiled selected teams that they could confidently send to the Olympics or World Cup. How many of us have Olympians and all-star players on our teams? And even if we have one or two, do we have enough to complete a team? Generally, no.

The Jigsaw Puzzle: If one piece is missing, it will never feel complete.

Takeuchi and Nonaka further compare the characteristics of their scrum method to the pieces of a jigsaw puzzle. They acknowledge that having only a single piece, or missing a piece, means that your project will likely fail; you need all the pieces for the process to work. What they neglect to emphasize is that you also need the right people to assemble the puzzle correctly.

The only mention they make regarding the people you have is the following:

“The approach also has a set of ‘soft’ merits relating to human resource management. The overlap approach enhances shared responsibility and cooperation, stimulates involvement and commitment, sharpens a problem-solving focus, encourages initiative taking, develops diversified skills, and heightens sensitivity toward market conditions.”

In other words, the solution to the puzzle is not only the six jigsaw puzzle pieces, but it is also your people. These “soft merits” mean that if your people are not able to share responsibility and cooperate, focus, take the initiative, develop diverse skills and so on, they aren’t the right people for an agile implementation.

If you don’t have all the pieces, you can’t complete the puzzle. And if you don’t have the right people, you can’t put the pieces together in the right order. Again, you might be round, but you might not be a marble.

Human-Centered Development for the People You HAVE

As with any custom software development project, the people who implement are key to your project’s success. Implementing agile changes the dynamics of how teams communicate and work. It changes the roles and expectations of all aspects of your project from executive management to human resources and budgeting.

Agile may work wonders for one company or team, but that success doesn't mean it will work wonders for YOUR team, especially if stakeholders do not understand the implications and demands of the process, or lack the appropriate aptitudes and skills.

In other words, if these methods don't work for your people, don't blame yourself or everyone else. Instead, focus on finding a method that works for you and for your people.

Agile is not the only solution …

Why do people select agile? They implement it because they have a problem to solve. However, the agile approach requires managers to step back and let people figure things out themselves, and that is not easy, especially when managers are actively invested in the outcome. Most people are not prepared to step back and let their teams just "go."

Maybe you have done the training, received the certifications, and theoretically "everyone" is on board. And yet your company has yet to see all-star success. Are you the problem? Is it executive management? Is it your team? What is wrong?

I cannot overemphasize that the answer is as simple as the people you have. Consider that the problem is unrealistic expectations: the assumption behind agile and scrum is that they are the best way to do development, but what if they are not the best way for you?

If you don’t have the right people or the right resources to implement agile development correctly, then you should probably do something else. At the same time, don’t hesitate to take the parts of agile that work for you. 

Citations:

Takeuchi, H., & Nonaka, I. (1986, January). The New New Product Development Game. Harvard Business Review. Retrieved July 19, 2017, from https://hbr.org/1986/01/the-new-new-product-development-game

A call for a "FAA" of Software Development


Software development teams constantly learn lessons. Some of these lessons are more significant than others.

Because there is no universal method or process for sharing lessons learned, many lessons are learned in parallel at different companies. Often, different teams within our very own companies make the same mistakes over and over again, simply because there is no shared repository of knowledge and experience.

Even when a developer or quality assurance professional attempts to research when, where, and why things have gone wrong, it is very difficult to find documented, pertinent information.

These unnecessary mistakes represent avoidable expenses for both consumers and companies. Beyond a certain price point, especially a public one, it would clearly be useful to have a public way to access "lessons learned."

Not just a report of the problematic lines of code, but an analysis of the effects of that code (who, what, when, where, why, and how).

What’s more, in addition to the time and financial costs of problematic code, there is also risk and liability to consider. From privacy to financial health to business wealth, the risk is great enough that I propose the creation of an organization, similar to the FAA, for documenting, reporting and making software “travel” safer.

There are many examples of bad UX to be found. Just for fun, let's look at some real-life examples of lessons learned in software, with regard to shareholder value and liability in airline travel.

Mayday: Bad UX in Airplanes

As often happens in life, not so long ago I decided I needed a change of pace in my evening routine, and one way or another I stumbled upon Mayday, a show about air crash investigations. My natural human curiosity about bad things happening to other people hooked me at first, but after a few episodes, the in-depth analysis of all the factors behind airplane crashes really caught my attention. As a testing and QA expert, I found it disheartening to see how often bad UX is a contributing factor in airplane crashes. In most cases, the bad UX is not the instigating cause (phew!), but the stories in the show make it evident that bad UX can easily make a bad situation worse.

Scenario #1: Meaningful Warnings

For instance, on a ground control display, there is a seven-character field next to each aircraft showing its expected altitude alongside its reported altitude (theoretically its actual altitude). In one episode, an airplane intended to be at an altitude of 360 was reported flying at 370, as indicated by the display, which read "360-370."

If the broadcast stopped, the display would read "360Z370," indicating a final broadcast of 370 versus an expected 360. When the broadcast stopped with this discrepancy showing, the system did not set off an alarm or even change color; the only cue was the character "Z" in the middle of a seven-character string, implying that half of the remaining digits were garbage.

This piece of information on its own is not terribly exciting, nor is it something that could by itself cause an aircraft to go down. After all, there is not much reason for a system tracking altitude to simply be switched off.

A bad UX design process uncovered by “hidden” warnings

That is, of course, unless the button to activate or deactivate the system is placed behind the pilot's footrest, and the display for the error message is placed next to the button, presumably down by the pilot's feet.

No audible alarm, no flashing lights, nothing else of note to catch a pilot's attention. In this scenario (based on a true story), the system can easily be turned off accidentally and without warning. Add another plane flying in the same vicinity, and the unfortunate result is many dead people spread out over the jungle. The resulting system update was the addition of an audible alarm if and when the system is switched off.
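To make the failure mode concrete, here is a hypothetical C# sketch of the seven-character field and of the post-incident fix in miniature; the field layout is paraphrased from the episode, and the names and alarm hook are illustrative assumptions, not real ATC or avionics software.

using System;

public static class AltitudeDisplay
{
    // Hypothetical rendering of the seven-character field: expected altitude,
    // a status character, then the last reported altitude (in hundreds of feet).
    // '-' means the aircraft is still broadcasting; 'Z' means the broadcast stopped.
    public static string Render(int expected, int lastReported, bool broadcasting)
        => string.Format("{0:D3}{1}{2:D3}", expected, broadcasting ? '-' : 'Z', lastReported);

    // The lesson in miniature: a stopped broadcast should raise an explicit,
    // audible warning rather than a silent one-character change on the display.
    public static void CheckBroadcast(bool broadcasting)
    {
        if (!broadcasting)
            Console.Beep(); // stand-in for a real audible alarm
    }
}

// Render(360, 370, true)  -> "360-370"
// Render(360, 370, false) -> "360Z370"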

Scenario #2: How to Handle Bugs

Another episode profiles an airplane crash precipitated by an airspeed sensor that bugged out; as in, a bug appears to have built a nest in the sensor. In this situation, the system produced contradictory warnings while also omitting expected ones.

For instance, the plane went from warning of excess speed to stall warnings. The inconsistencies sadly managed to confuse the pilots into their deaths.

Standard flight training is now required to include how to respond when the cockpit says: "Hey! We are receiving conflicting airspeeds!"

Scenario #3: Half-Full vs. Empty

Another episode profiles a UX design process failure on the maintenance side. Somehow, two similar-looking modules for gauging fuel reserves came into use for similar, but different, models of an airplane.

Initially, it appeared that the gauges could be installed and would work interchangeably, even updating their readings. The problem is that the readings would be a bit off; well, let's call it like it is: completely off.

If you put the wrong model of gauge in the wrong model of plane, it will read half full when the tank is actually empty. An easy fix turned out to be adding a key to the socket so that the two gauges are no longer interchangeable. Proof that good UX is not just about design, but also about tracking lessons learned.

Of course, the fix unfortunately did not get implemented until a plane had crashed into the sea before reaching land (and an airport).

Documenting Problems and Solutions

When planes go down, there is a loss of human life and great financial expense. This is why all of these issues were addressed and fixed, and why they likely won't happen again. Documentation and prevention are among the many reasons that airplanes really don't go down very often these days, or at least not in the markets where shareholder interest and liability make downed aircraft unacceptable and significant investment is made in the UX design process.

From my perspective, the most interesting aspect of Mayday is that it highlights many small UX problems discovered only because people died. In Western civilization, a death carries a near-infinite cost, and it therefore triggers a very large and detailed process to get down to root causes and implement changes, especially when it is the death of 100 people at once. Mayday is a great show for learning how to analyze problems, while also giving viewers a deep appreciation for good UX design.

Looking at the small issues at the root of these airline crashes, and the unremarkable UX changes made to prevent them, really drives home how many small, non-life-threatening losses take place every day because of lousy UX. Added up, all of these small losses might represent quite a significant financial cost for the businesses involved.

And although the majority of them don’t directly cause the loss of life, life might be better (safer, less stressful, more financially secure, etc.) if these small UX design flaws could be reliably flagged and a system put in place to prevent them from recurring.

Standard tracking and reporting of software failures

This brings us back to my statement at the beginning of this piece regarding the need for a body or a system to track and report lessons learned. For airline travel in the USA, we have the Federal Aviation Administration (FAA) to make sure that air travel is safe.

The purpose of the FAA is the following: the Federal Aviation Administration (FAA) is the agency of the United States Department of Transportation responsible for the regulation and oversight of civil aviation within the U.S., as well as the operation and development of the National Airspace System. Its primary mission is to ensure the safety of civil aviation.

Now imagine we had a Federal Software Administration whose primary mission is to ensure the safety and reliability of software. What if we held ourselves to a reporting standard, documenting not only when bad UX precipitated an airplane crash, but all kinds of software defects that cause significant loss?

Software Asset Management (SAM) already exists as a business practice within some organizations, but not enough of them. And there is still no central organization to pull together the information documented by businesses with successful SAM practices.

In 2016, the Software Fail Watch identified over a billion dollars in losses just from software failures mentioned in English-language news sources, and its authors estimate that this is only a "scratch on the surface" of actual software failures worldwide. There is much to debate here, but if a US-based or even an international agency simply started to record failures and their causes, without getting into the official business of issuing guidelines, the simple acts of investigation and reporting could create an opportunity for significant and widespread improvements in design.

I think we can all agree that our industry could greatly benefit from an efficient and productive repository for sharing lessons learned, lessons that today are too often relegated to oral histories passed between developers.

Companies may not initially be motivated to report the details of their problems; however, the long-term benefits would surely outweigh the perceived costs. As an industry, software development can only benefit from sharing knowledge and lessons learned.

Group intelligence is known to exceed that of individual members, and in a world of increasingly unanticipated scenarios and risks, we need to be able to effectively anticipate and solve the security and uptime challenges faced by many companies. As an industry, perhaps instead of fearing judgment, we can focus on the benefits of embracing our imperfect nature while facilitating and co-creating a more efficient and productive future.


If you enjoyed this piece, please share it. 

If you have something to say, please join the discussion!

Navigating Babylon Part II

How to Introduce DomainSpeak in Testing

First, let's start with a quick overview of the problem I discussed in Navigating Babylon Part I. Microservices create efficiencies in development in a world dependent on remote work environments and teams. Unfortunately, the separation of workers and teams means microservices tend to encourage multiple languages or dialects that obfuscate communication and further complicate testing. We have our anti-corruption layer, and we don't want to pollute our code by letting sub-system language spill in.

A Domain-specific Vocabulary for Testing: DomainSpeak

There is, however, a pragmatic solution: we can build on the anti-corruption layer by writing tests in a language created specifically to describe business concepts clearly. We can and should create DomainSpeak, a domain-specific vocabulary or language, to be used for testing. Once we publish information in this language, it can be shared across microservices and thus influence the workflow. Periodically, as happens in English itself, we may need to improve the definitions of certain vocabulary by redefining usage and disseminating it widely, thus influencing its meaning.

How will this DomainSpeak improve testing?

The different dialects should not permeate your integration tests. You need to be very clear that a word can have only a single meaning. This requires a two-part process (a sketch follows the list below):

  1. You need to verify that you are not inconsistently naming anything inside an actual test; and,
  2. You need to do translations in an anti-corruption layer so everything inside is consistent.
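As a minimal sketch of step 2, assuming a hypothetical microservice that speaks in terms of "clients" where the business means "users," the anti-corruption layer might look like this in C#:

// Hypothetical DTO in the dialect of one microservice.
public class ClientDto
{
    public string Id { get; set; }
    public bool HasActiveSubscription { get; set; }
}

// Domain vocabulary used inside the tests: one word, one meaning.
public class User
{
    public string Id { get; }
    public bool IsCustomer { get; }   // a Customer is a User with an active subscription
    public User(string id, bool isCustomer) { Id = id; IsCustomer = isCustomer; }
}

// The anti-corruption layer: the only place where the service dialect ("Client")
// is translated into the domain language ("User" / "Customer").
public static class ServiceTranslator
{
    public static User ToDomain(ClientDto dto) => new User(dto.Id, dto.HasActiveSubscription);
}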

What does DomainSpeak look like in a practical sense?

When you consider how brands influence pop culture, you find that they do it through language.

In the business world, marketing professionals use domain-specific languages to create a brand vocabulary, or BrandSpeak. All major brands, and even smaller yet influential ones, have a specific vocabulary, with specific definitions and meanings, for communicating their brand to the public. All communications materials are integrated into this system.

Brand-specific, intentional vocabulary has the ability to invade and permeate. Many people are completely unaware that it was a De Beers campaign in the 1940s that created the cultural tradition "a diamond is forever." To give other examples, "Don't mess with Texas" came from an anti-litter campaign, and although we know it's a marketing ploy, just about everyone is on board with the idea that "What happens in Vegas, stays in Vegas." On an international level, if you order a "coke" you will most likely get a carbonated beverage, but you won't necessarily get a Coca-Cola.

As I mentioned in my first discussion of Navigating Babylon, I recommend implementing a mapping layer between the microservices and the test cases. When we decide to address the language used, we take it a step further: we focus on the language, the DomainSpeak, and on how this domain-specific vocabulary improves the output of the test cases. This means, for example, that a Customer, a User, and a Client all have specific meanings and cannot be used interchangeably.

What is the process to create this language?

The initial step is exploratory. To create your own DomainSpeak, your testing department will need to communicate regularly with the business owners and developers. Their goal is not to dictate what words the business owners and developers use, but to learn which words already have meanings and to document those usages. The more you communicate, recognize, and document adopted meanings, the more you will discover how, where, and why meanings diverge.

For instance, the business may see a Customer as a User with an active subscription, whereas a microservice might use the words interchangeably because it has no concept of a subscription. You will also notice that some situations give rise to conflicting meanings. A developer may have picked up the word "Client" for "User" from a third-party API he integrated with, whereas the business may use "Client" for a specific construct in a billing submodule, the "customers of their customer." In such situations, to avoid confusion and breakage, you will need to specify which definition is to be used and possibly introduce a new concept or word to account for the narrowing of the definition. Perhaps the "customers of their customer" will now be "vendees" instead of "clients." Don't despair if there is no existing word that accurately matches your concept; you can always create a new word or a composite word to meet your needs.

Indeed, by being consistent and by distributing your use of language to a wide audience, you can introduce new words and shape the meaning of existing ones. This means that your tests have to be very close to a formal, structured form of English. This can be accomplished by using aspect-oriented testing or by creating fluent API wrappers on top of the microservices. Aspect-oriented testing would look like this (cucumber syntax):

Given a User
When the user adds a Subscription
Then the User is a Customer

Whereas a fluent API would look something like this (C# syntax):

User user1 = UserManager.CreateNewUser();
SubscriptionManager.AddNewSubscriptionFor(user1);
Assert(UserManager.Get(user1).IsCustomer());

This creates a lot of focus on syntactic sugar* and on writing code behind the scenes to ensure that your code incorporates your business logic (your test) and reads the way you want it to. Every language has its own way to solve this challenge. Even in C, you could use macros to take pseudocode and turn it into something that compiles, and chances are the result would still be far superior to your current use of language.
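For instance, assuming a SpecFlow-style binding (one common way to wire cucumber syntax to C#), the code behind the scenes for the steps above might look roughly like this, reusing the hypothetical manager classes from the fluent example:

using TechTalk.SpecFlow;

[Binding]
public class CustomerSteps
{
    private User _user;

    [Given(@"a User")]
    public void GivenAUser() => _user = UserManager.CreateNewUser();

    [When(@"the user adds a Subscription")]
    public void WhenTheUserAddsASubscription() => SubscriptionManager.AddNewSubscriptionFor(_user);

    [Then(@"the User is a Customer")]
    public void ThenTheUserIsACustomer()
    {
        if (!UserManager.Get(_user).IsCustomer())
            throw new System.Exception("Expected the user to be a Customer.");
    }
}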

For my uses, the cucumber syntax, with a side of extra syntactic sugar that allows me to define variables, is very effective. (I will get into this in more detail another day.) Whichever language you use, keep in mind that the goal of creating a DomainSpeak vocabulary is not to make your code look pretty, but rather to ensure that your code communicates clearly to the business and developers and that meanings are defined with precise and consistent language.

The End Goal is Efficient Quality Assurance

The goal, after all, is to improve productivity and deliver a quality product. Clear communication will not only benefit your team internally, it will also influence other teams. By communicating your results in consistently clear and concise language to a wide audience, you will influence their behavior. You will be able to respond efficiently to statements along the lines of "we use 'customer' for all our 'users,'" and to explain where the rule may not hold and why you use the words you use. Again, the goal is not to dictate to folks what words to use, but to explain the meanings of words and to encourage consistent usage. Adoption will follow slowly, and usage will become a matter of habit. Over time, services will be rewritten, and you should be able to delete a few lines from your anti-corruption layer every time one is.

*Syntactic sugar lets you express something more aesthetically, not in a fundamentally new way: it is a different way of saying the same thing. It matters not because it is new, but because it makes code more readable and understandable; cleaner language means clearer code, and clearer code makes it easier to find a bug.

If you enjoyed this piece please share it with your colleagues! If you have something to add, please join the discussion!