3 Risks to Every Team’s Progress and How to Mitigate Them

When looking at improving performance, the first thought is often to increase the size of the development team; however, a larger group is not necessarily the only, or the best, solution. In this piece, I suggest several reasons to keep teams small, as well as reasons not to let them get too small. I also look at several types of risk to consider when choosing team size: how team size affects communication, individual risk, and systematic risk.

Optimal Team Size for Performance

The question of optimal team size is a perpetual debate in software organizations. To adjust, grow, and develop different products, we rely on teams of various sizes and makeups.

We often assume that fewer people get less done, which leads us to add people to our teams so that we can get more done. Unfortunately, this solution often has unintended consequences and unforeseen risks.

When deciding how big a team to use, we must take into consideration several different aspects and challenges of team size. The most obvious, and yet most often overlooked, is communication.

Risk #1: Communication Costs Follow Geometric Growth

The main argument against big teams is communication. Adding team members results in a geometric growth of communication paths and, with them, communication problems. This increase is most easily illustrated by a visual representation of team members and communication paths.

Geometric Growth of Communication Paths

Bigger teams increase the likelihood that we will have a communication breakdown.
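
To make the growth concrete, here is a minimal sketch in Python (my own illustration, not part of the original post) that counts the pairwise communication paths for a few team sizes using the n(n − 1) / 2 formula; going from 5 to 10 people, for example, takes you from 10 paths to 45.

```python
# Minimal sketch: pairwise communication paths grow as n * (n - 1) / 2,
# because each new member adds a path to every existing member.

def communication_paths(team_size: int) -> int:
    """Number of distinct pairwise communication paths in a team of `team_size` people."""
    return team_size * (team_size - 1) // 2

if __name__ == "__main__":
    for n in (3, 5, 8, 12, 20):
        print(f"{n:>2} people -> {communication_paths(n):>3} paths")
```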

From the standpoint of improving communication, one solution that we commonly see is the creation of microservices to reduce complexity and decrease the need for constant communication between teams. Unfortunately, the use of microservices and distributed teams is not a “one size fits all” solution, as I discuss in my blog post on Navigating Babylon.

Ultimately, when it comes to improving performance, keep in mind that bigger is not necessarily better. 

Risk #2: Individual Risk & Fragility

Now, a larger team seems like it would be less fragile because, after all, a bigger team should be able to handle one member winning the lottery and walking out the door pretty well. This assumption is partially correct, but lottery tickets are usually individual risks (unless people pool tickets, something I have seen in a few companies).

When deciding how small to keep your team, make sure that you build in consideration for individual risk and be prepared to handle the loss of a team member.

Ideally, we want the smallest team possible while limiting our exposure to any risk tied to an individual. Unfortunately, fewer people tend to get less work done than more people (leaving skill out of it for now).

Risk #3: Systematic Risk & Fragility

Systematic risk relates to events that will affect multiple people in the team. Fragility is the concept of how well a structure or system can handle hardship (or change in general). Systematic risks stem from aspects shared across the organization: leadership, shared space, or shared resources.

Let’s look at some examples:

  • Someone brings the flu to a team meeting.
  • A manager/project manager/architect has surprise medical leave.
  • An affair between two coworkers turns sour.

All of these events can grind progress to a halt for a week (or much more). Events that impact morale can be incredibly damaging, as lousy morale is quite infectious.

In the Netherlands, we have the concept of a Baaldag (roughly translated as an irritable day), where team members limit their exposure to others when they know they won’t interact well. In the US, with its stringent sick and holiday limits, this is rare.

Solutions to Mitigate Risk 

There are productive ways to minimize risk and improve communication. One is to look carefully at your structure and goals, build an appropriately sized team, and take additional actions to mitigate risk. Another effective technique for risk mitigation is training. You shouldn’t be surprised, however, that my preferred method to minimize risk is developing frameworks and using tests that are readable by anyone on your team.

Why Negligence in Software Should Be of Urgent Concern to You

The Future of Liability in Software

Many things set software companies apart from other businesses. A significant, but often overlooked, difference is that the manufacturers of software exhibit little fear of getting sued for negligence, including over defects in or misuse of their software. For the moment, the legal consequences of defective software remain marginal.

After more than a decade, even efforts to reform the Uniform Commercial Code (UCC) to address the particularities of software licensing and software liability remain frozen in time. As Jane Chong discusses in We Need Strict Laws, the courts consistently rule in favor of software corporations over consumers, due to the nature of contract law, tort law, and the economic loss doctrine. In general, there is little consensus on the question: should software developers be liable for their code?

Slippery When Wet

If you go to your local hardware store, you’ll find warning signs on the exterior of an empty bucket. Look around your house or office, and you will see warning labels on everything from wet-floor signs to microwaves and mattresses.

In software, if you are lucky, you might find an EULA buried behind a link on some other page. Most products have a user manual; why is it not enough to print the “warning” inside? It would be easy to find and easy to implement as a standard.

Legal issues in software development: why is there no fear?

Fear is socially moderated and generated; hence the term “mass hysteria.” We fear things that we’ve already experienced or that have happened to personal connections, and we all too easily imagine horrible events that have befallen others. In an age of multimedia, this is a rather flawed system that has gone berserk or, as they say, “viral.” What we see has left us with some pretty weak estimates of the odds of events like shark attacks or kidnappings.

One reason we don’t fear lawsuits around software is that we don’t see them in the public sphere. They do happen, but all too often the cases never make it very far. Or the judge rules in favor of the software developer. Interpretation of the laws makes it difficult to prove or attribute harm to a customer.

To date, we’ve yet to see a Twittergate on negligence in software development. This lack of noise doesn’t mean that no one has reasons to sue. Instead, it is more of an indicator that for the moment, the law and the precedent are not written to guide software litigation. And with little news or discussion in the public sphere, no one has the “fear.”

Yet.

A Matter of Time

Frankly, it is not a matter of whether it will happen, but when. What is the likelihood that in the next five years there will be successful suits brought against software firms? Should we make a bet?

Software development is an odd industry. We leave an incredible electronic trail of searchable data for everything that we do. We track defects, check-ins, test reports, audit trails, and logs. And then we back up all of them. Quite often these records last forever, or at least as long as, if not longer than, the company that created them.

Even when we change from one source control system to another, we try to make sure that we keep a detailed record intact just in case we want to go back in time.

This level of record keeping is impressive. The safety it provides and the associated forensic capabilities can be incredibly useful. So what is the catch? There is a potential for this unofficial policy of infinite data retention to backfire.

Setting the Standard

Most companies follow standard document retention policies that ensure the business saves both communications and artifacts to meet fiscal or regulatory requirements for a required period and then eventually purges them after some years.

Ironically, even in companies that follow conservative document retention policies, the source control and bug tracking systems are often completely overlooked, if not flat-out ignored. From a development perspective, this makes sense: data storage isn’t expensive, so why not keep it all?
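
As a concrete starting point for that conversation, here is a hypothetical sketch in Python (the repository path and the seven-year window are illustrative assumptions, not a recommendation) that compares the age of the oldest commit in a Git repository against a nominal retention window.

```python
# Hypothetical sketch: compare the oldest commit in a Git repository against a
# nominal document retention window. The path and the 7-year window are
# illustrative assumptions only.
import subprocess
from datetime import datetime, timezone

RETENTION_YEARS = 7   # assumed policy window, for illustration
REPO_PATH = "."       # assumed repository location

def oldest_commit_date(repo: str) -> datetime:
    """Return the author date of the very first commit in the repository."""
    out = subprocess.run(
        ["git", "log", "--reverse", "--format=%aI"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return datetime.fromisoformat(out.splitlines()[0])

if __name__ == "__main__":
    oldest = oldest_commit_date(REPO_PATH)
    age_years = (datetime.now(timezone.utc) - oldest).days / 365.25
    print(f"Oldest commit: {oldest:%Y-%m-%d} ({age_years:.1f} years old)")
    if age_years > RETENTION_YEARS:
        print("History extends beyond the assumed retention window; "
              "worth raising with whoever owns the retention policy.")
```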

The Catch & The Cause

The reason that document retention policies exist is not merely to keep companies organized and the IRS happy; it’s the potential for expensive lawsuits. Companies want to make sure that they can answer categorically why certain documents do or do not exist.

For instance, let’s say your company makes widgets and tests them before shipping; you don’t want to have to say, “we don’t know why we don’t have those test results.” By creating a documented process around the destruction of data (and following it), you can instead point to the document and say the data does not exist: it was destroyed according to “the policy.”

Policy? What Policy?

This example takes us back to the story of liability in software. In the software business we often keep data forever, but then we also delete data in inconsistent bursts. Maybe we need to change servers, or we are running out of space, or we find random backups floating around on tapes in vaults. So we delete it or shred it or decide to move to the Cloud to reduce the cost of storage.

This type of data doesn’t tend to carry any specific policy or instructions for what we should and shouldn’t include, or how best to store the data, and so on.

What’s more, when we document our code and record check-ins, we don’t really think about our audience. Our default audience is likely ourselves or the person(s) in the cube next to us. And our communication skills on a Friday night after a 60-hour week won’t result in the finest or most coherent check-in comments, especially if our audience ends up being someone besides our cubemate.

The reason that this type of mediocre record keeping persists is that it remains difficult to sue over software. There really are no clear-cut ways to bring a suit for defective services rendered. If you live in the US, you know the fear of lawsuits over slips and falls; no such fear exists for creating a website and accepting user data.

Walking a Thin Line

My guess is that this indifference to record keeping and data retention will persist as long as potential plaintiffs must do all the groundwork before seeing any money, and as long as judges continue to side with corporations and leave plaintiffs in the lurch. However, as soon as that first case sneaks through the legal system and sets a precedent, anyone and everyone just may reference that case.

Ironically, the theories presented in a trial carry no patent or copyright protection, which means that once a case makes it through the system, it only needs to be referenced. In other words, if one lawyer learns how to sue us, they all do. Think of it as an open source library you can reference: once it exists, anyone gets to use it.

I expect that there will be a gold rush; we are just waiting for the first prospector to come to town with a bag of nuggets.

As to what companies can do: for now, create an inventory of what data you keep and how it compares to any existing policies. This may involve sitting down in a meeting that will be an awkward mix of suits and shorts, where there likely will be a slide titled “What is source control?” There is no right answer, and this is something for every company to decide for itself.

Where does your development process leave a data trail? Has your company had discussions about document retention and source control?

Prepare for Failure: You Will Make Mistakes

I made a mistake. Now what?

In the United States, we often find our motivation to achieve fueled by a culture that rewards success, while commonly denying the realities of failure. As a culture, we feel terrified to fail, despite the reality that true long-term success is often built upon the ability to recognize our failures and to learn from our mistakes.

Predicting the Future: The Big Data Quandary


The Role of Testing in Indeterminate Systems: should the humans be held accountable?

Big data is a hot topic that introduces us to incredible possibilities and potentially terrible consequences.

Big data essentially means that engineers can harness and analyze traditionally unwieldy quantities of data and then create models that predict the future. This is significant for a variety of reasons, but primarily because accurate prediction of the future is worth a lot of money and has the potential to affect the lives of everyday citizens.

Good Business

On one level, big data allows us to essentially reinvent the future by letting software encourage individuals to do something new that they have not yet considered (and might never have), such as recommendations for viewing on Netflix or buying on Amazon. Big data can also provide daily efficiencies in the little things that make life better by saving us time or facilitating decision making. For businesses, big data can give deeper meaning to credit scores, validate mortgage rates, or guide an airline as to how much it should overbook its planes or vary its fares.

Efficient

Optimization algorithms based on data are even more powerful when we consider that they are reliably more effective than humans attempting to make the same types of decisions and predictions. In addition to the computing power behind big data, one advantage algorithms have over human predictions is that they are efficient. Algorithms do not get sidelined or distracted by bias, and so avoid getting hung up on the ethical judgments that humans are required to consider, whether by law or by social code. Unfortunately, this doesn’t mean that algorithms will avoid making predictions that carry ethical consequences.

Should algorithms be held to the same moral standards as people?

Optimization algorithms look for correlations, and any correlation that improves a prediction may be used. Some correlations will inevitably incorporate gender, race, age, orientation, geographic location, or proxies for those values. Variables of this sort are understandably subject to ethical considerations, and this is where the science of big data gets awkward. A system that looks at a user’s purchasing history might end up assigning a large weight (significance) to certain merchants. Those merchants might happen to be hair care providers, which means there is a good chance that the algorithm has found an efficient proxy for race or gender. Similarly, the identification of certain types of specialty grocers, personal care vendors, or clothing stores might reveal other potentially delicate relationships.

Because these rules are buried deep inside a database, it is hard to determine when the algorithms have managed to build a racist, sexist, or anything-ist system. To be fair, neither the system nor its developers, nor even the business analyst in charge of the project, had to make any conscious effort for an algorithm to identify and use these types of correlations. As a society, we implicitly know that many of these patterns exist. We know that women get paid less than men for the same work; we know that women and men have different shopping behaviors when it comes to clothing; we know that the incarceration rate for minorities is higher; and we know that there will be differences in shopping behaviors between different populations based on age, race, sex, and so on.

Can algorithms be held to the same moral standards as their human developers, or should the developers be held responsible for the outcomes? If the answer to either or both of these questions is “yes,” then how can this be achieved both effectively and ethically? When ethically questionable patterns are identified by an algorithm, we need to establish an appropriate response. For example, would it be acceptable to suggest a lower salary for a female candidate than for a male candidate, when the software has determined that the female candidate will still accept the job at the lower rate? Even if the software did not know about gender, it may infer it from any number of proxies. One could argue that offering the lower salary isn’t a human judgment; it is simply following sound business logic as determined by the data. Despite the “logic” behind the data (and the fact that businesses make this kind of decision all the time), hopefully your moral answer to the question is still “no, it is not okay to offer the female candidate a lower suggested salary.”

Immoral or Amoral: What is a moral being to do?

If we label the behavior of the algorithm in human terms, we’d say that it was acting immorally; however, the algorithm is actually amoral: it does not comprehend morality. To comprehend and participate in morality (for now) we need to be human. If we use big data, the software will find patterns, and some of these patterns will present ethical dilemmas and the potential for abuse. We know that even if we withhold certain types of information from the system (such as avoiding direct input of race, gender, age, and so on), the system may still find proxies for that data. And the resulting answers may continue to create ethical conflicts.

Testing for Moral Conflicts

There are ways to test the software with controlled data and determine whether it is developing gender or race biases. For instance, we could create simulated individuals that we see as equivalent for the purposes of a question and then test how the software evaluates them. Take two job candidates with the same general data package but vary one segment of the data, say spending habits. We then look at the results and see how the software treated the variance in the data. If we see that the software is developing an undesired sensitivity to certain data, we can go back to the drawing board and make an adjustment, such as removing that data from the model. A minimal sketch of such a paired test follows.
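
Here is that sketch, in Python. The scoring model, the feature names, and the tolerance are hypothetical stand-ins for whatever system is under test; the point is only the shape of the test: score two profiles that differ in a single suspected proxy attribute and flag the model if the scores diverge.

```python
# Minimal sketch of a paired bias test. `score_candidate` and the feature
# names are hypothetical placeholders; a real test would call the system
# under test instead.
from copy import deepcopy

def score_candidate(profile: dict) -> float:
    """Stand-in for the model under test (placeholder logic for illustration)."""
    score = 50.0 + 2.0 * profile["years_experience"]
    # An unintended sensitivity we want the paired test to catch:
    if "specialty_hair_care" in profile["merchant_history"]:
        score -= 5.0
    return score

def paired_bias_test(base: dict, field: str, variant_value, tolerance: float = 1.0) -> bool:
    """Score two profiles identical except for one proxy field; flag large gaps."""
    variant = deepcopy(base)
    variant[field] = variant_value
    gap = abs(score_candidate(base) - score_candidate(variant))
    return gap <= tolerance  # True means no undesired sensitivity detected

if __name__ == "__main__":
    base = {"years_experience": 6, "merchant_history": ["grocery", "electronics"]}
    ok = paired_bias_test(base, "merchant_history", ["grocery", "specialty_hair_care"])
    print("paired test passed" if ok else "model is sensitive to a suspected proxy attribute")
```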

In the end, an amoral system will find the optimal short-term solution; however, as history has shown, despite humanity’s tendency toward the occasional atrocity, we are moral critters. Indeed, modern laws, rules, and regulations generally exist because, as a society, we see that the benefits of morality outweigh the costs. Another way to look at the issue is to consider that, for the same reasons we teach morality to our children, sooner or later we will likely also have to impart morality to our software. We can teach software to operate according to rules of morality, and we can also test for compliance, thereby ensuring that our software abides by society’s rules.

Responsibility: So why should you care?

Whose responsibility is it (or will it be) to make sure this happens? My prediction is that, given the aforementioned conundrum, in the near future (at most the next decade) we will see a legal requirement to verify that an algorithm is acting, and will continue to act, morally. And, of course, it is highly probable that this type of quality assurance will be handed to testing and QA. It will be our responsibility to verify the morality of indeterminate algorithms. So for those of us working in QA, it pays to be prepared. And for everyone else, it pays to be aware. Never assume that a natively amoral piece of technology will be ethical; verify that it is.


A Call for an "FAA" of Software Development

Mayday! Bad UX in Airplanes

Software development teams constantly learn lessons. Some of these lessons are more significant than others.
Because there is no universal method or process for sharing lessons learned, many lessons are learned in parallel at different companies. Many times, different teams within our very own companies make the same mistakes over and over again, simply because there is no shared repository of knowledge and experience.

Even when a developer or quality assurance professional attempts to research when, where, and why things have gone wrong, it is very difficult to find documented, pertinent information.

These unnecessary mistakes represent avoidable expenses for both consumers and companies, and at a certain price point, especially a public price point, they make a strong case for a public method of accessing “lessons learned.”

Not just a report of the problematic lines of code, but an analysis of the effects of that code (who, what, when, where, why, and how).

What’s more, in addition to the time and financial costs of problematic code, there is also risk and liability to consider. From privacy to financial health to business wealth, the risk is great enough that I propose the creation of an organization, similar to the FAA, for documenting, reporting and making software “travel” safer.

There are many examples of bad UX to be found. Just for fun, let’s look at some real-life examples of lessons learned in software code with regard to shareholder value and liability in airline travel.

Mayday: Bad UX in Airplanes

As often happens in life, not so long ago I decided I needed a change of pace in my evening routine, and one way or another I stumbled upon Mayday, a show about air crash investigations. My natural human curiosity about bad things that happen to other people caught me at first, but after a few episodes, the in-depth analysis of all the various factors that cause airplane crashes really held my attention. As a testing and QA expert, I found it disheartening to see the frequency with which bad UX is a contributing factor to airplane crashes. In most cases, the bad UX is not the instigating cause (phew!), but the stories in the show make it evident that bad UX can easily make a bad situation worse.

Scenario #1: Meaningful Warnings

For instance, on a ground control display, there is a 7-character field next to each aircraft indicating its expected altitude alongside its published altitude (theoretically its actual altitude). In one episode, an airplane intended to be at an altitude of 360 reported flying at 370, as indicated by the display, which read “360-370.”

If the broadcast stopped, the display would read “360Z370,” indicating a final broadcast of 370 versus an expected altitude of 360. When the broadcast stopped with this discrepancy shown, the display did not set off an alarm or even change color; there was just the character “Z” in the middle of a 7-character string, implying that half of the remaining digits were garbage.
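
To show the kind of check a friendlier UX could have surfaced, here is a hypothetical sketch in Python. The 7-character format is taken from the description above, but the parsing and the alert rule are my own assumptions, not the real ground control system.

```python
# Hypothetical sketch: interpret a 7-character field such as "360-370"
# (still broadcasting) or "360Z370" (broadcast stopped), and make the
# dangerous case loud instead of hiding it behind a single character.

def check_altitude_field(field: str) -> str:
    """Interpret an expected/last-reported altitude field like '360Z370'."""
    expected, separator, reported = field[:3], field[3], field[4:]
    broadcasting = separator != "Z"
    if not broadcasting and expected != reported:
        return f"ALERT: broadcast stopped, last reported {reported} vs expected {expected}"
    if not broadcasting:
        return f"Warning: broadcast stopped at {reported}"
    return f"Broadcasting: expected {expected}, reporting {reported}"

if __name__ == "__main__":
    for sample in ("360-370", "360Z370", "360Z360"):
        print(sample, "->", check_altitude_field(sample))
```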

This piece of information on its own is not terribly exciting, nor is it something that could on its own cause an aircraft to go down. Furthermore, there is not much reason for a system tracking altitude to simply switch off.

A bad UX design process uncovered by “hidden” warnings

That is, of course, unless the button to activate or deactivate the system is placed behind the pilot’s footrest, and the display for the error message is placed next to the button, presumably down below the foot.

No audible alarm, no flashing lights, nothing else of note to catch a pilot’s attention. In this scenario (based on a true story), the system can easily be turned off accidentally and without warning. Add to the mix another plane flying in the same vicinity, and the unfortunate result is many dead people spread out over the jungle. The resulting system update was the addition of an audible alarm if and when the system is switched off.

Scenario #2: How to Handle Bugs

Another episode profiles an airplane crash precipitated by an airspeed sensor that bugged, as in a bug appears to have built a nest in the sensor. In this situation, the system created contradictory warnings while also leaving out expected warnings.

For instance, the plane went from warning of excess speed to stall warnings. The inconsistencies sadly confused the pilots, with fatal results.

Standard flight training now includes how to respond when the cockpit says: “Hey! We are receiving conflicting airspeeds!”

Scenario #3: Half-Full vs. Empty

Another episode profiles a UX design process failure that came about on the maintenance side. Somehow, two similar-looking modules for gauging fuel reserves came into use for similar, but different, models of an airplane.

Initially, it appeared that the gauges could be installed and would work interchangeably, even going so far as to update their readings. The problem is that the readings would be a bit off; well, let’s call it like it is: completely off.

If you put the wrong gauge in the wrong model of plane, it will read half full when the tank is actually empty. An easy fix turned out to be putting a key into the socket so that the two gauges are no longer interchangeable. Proof that good UX is not just about design, but also about tracking lessons learned.

Of course, the fix, unfortunately, did not get implemented until a plane crashed into the sea before reaching land (and an airport).

Documenting Problems and Solutions

When planes go down, there is a loss of human life and great financial expense. This means that all of these issues have been addressed and fixed, and they likely won’t happen again. Documentation and prevention are among the many reasons that airplanes really don’t go down very often these days, or at least not in the markets where shareholder interest and liability make downed craft unacceptable and where significant investment is made in the UX design process.

From my perspective, the most interesting aspect of the show Mayday is that it highlights many small UX problems discovered only because people died. The death of a person is something that has a near infinite cost associated with it in Western civilization and therefore causes a very large and detailed process to get down to root causes and implement changes. Especially when it is the death of 100 people at once. Mayday is a great show for learning how to analyze problems, while also giving viewers a deep appreciation for good UX design.

Looking at the small issues that are the root causes of these airline crashes, and the unremarkable UX changes made to prevent them, really drives home the number of small, non-life-threatening losses that take place every day due to lousy UX. Adding up all of these small losses might reveal a quite significant financial cost for the various businesses involved.

And although the majority of them don’t directly cause the loss of life, life might be better (safer, less stressful, more financially secure, etc.) if these small UX design flaws could be reliably flagged and a system put in place to prevent them from recurring.

Standard tracking and reporting of software failures

This brings us back to my statement at the beginning of this piece regarding the need for a body or a system to track and report lessons learned. In airline travel in the USA, we have the Federal Aviation Administration (FAA) to make sure that airline travel is safe.

The purpose of the FAA is the following: the Federal Aviation Administration (FAA) is the agency of the United States Department of Transportation responsible for the regulation and oversight of civil aviation within the U.S., as well as operation and development of the National Airspace System. Its primary mission is to ensure the safety of civil aviation.

Now imagine we had a Federal Software Administration whose primary mission was to ensure the safety and reliability of software. What if we held ourselves accountable to a reporting standard, documenting not only when bad UX precipitated an airplane crash, but all kinds of software defects that cause significant loss?

Software Asset Management (SAM) already exists as a business practice within some organizations, but not in enough of them. And there is still no central organization to pull together the information documented by businesses with successful SAM practices.

In 2016, the Software Fail Watch identified over a billion dollars in losses just from software failures mentioned in English-language news sources, and its authors estimate that this is only a “scratch on the surface” of actual software failures worldwide. There is much to be debated here, but if a US-based or even an international agency simply started to record failures and their causes, without getting into the official business of issuing guidelines, the simple acts of investigation and reporting could create an opportunity for significant and widespread improvements in design.

I think we can all agree that our industry could greatly benefit from an efficient and productive repository for sharing lessons learned, lessons that today are often relegated to oral histories passed between developers.

Companies may not initially be motivated to report the details of their problems; however, the long-term benefits would surely outweigh any perceived costs. As an industry, software development can only benefit from sharing knowledge and lessons learned.

Group intelligence is known to exceed that of individual members, and in a world of increasingly unanticipated scenarios and risks, we need to be able to effectively anticipate and solve the security and uptime challenges faced by many companies. As an industry, perhaps instead of fearing judgment, we can focus on the benefits of embracing our imperfect nature, while facilitating and co-creating a more efficient and productive future.

