Keep track of IT complexity – as you do with black swans and fat tails

If we want to increase the quality of IT deliveries, we must become better at understanding complexity and treat our IT systems as complex systems, something that is rarely done today.
This is part one in the series on complexity.

As early as the end of the 1960s, people began to discuss the so-called “software crisis”, with problems such as unreliable software, growing maintenance costs, poor adaptability and deliveries that did not stay on budget. These problems are all too familiar even today. A number of approaches and methods have been applied to address problems such as these, without meaningfully changing the rate of successful IT deliveries. A recurring problem is complexity. New and improved ways of working are not able to compensate for the increased complexity that comes with technical development, which makes ever more advanced systems possible. Often, delivery problems and failures in security or functionality are directly connected to complexity.

Complexity
To understand how complex problems can be handled, we must look more closely at what complexity is. A common model for describing and classifying different types of problems is the Cynefin framework.

The framework divides problems or situations into five categories: complex, complicated, chaotic, simple and disorder (in the middle). It describes certain characteristics of each category and the approach that can be used to handle the problem or situation. Based on this model we can think about how we work with IT systems and realise that we often work with complex systems in the wrong way. Or, more precisely, we do not deal with complexity at all.

Complicated systems
According to the Cynefin framework, complicated problems are problems with clear rules and limitations and simple dependencies. It is possible to understand causal relationships, and events are predictable. These problems (together with the simple ones) are also called linear, since there is a linear relationship between cause and effect. The best way to tackle such a problem is through analysis. This is the classic engineering approach, and within IT and systems theory it is almost exclusively in these problem domains that we work. We have well-developed methods for setting requirements, analysing, testing and developing according to this principle. The problem is that all too often we are not working with complicated systems but with complex systems. This is very important to understand, since it explains to a large degree why IT projects with very comprehensive requirement specifications, analysis phases and implementation projects can still fail to deliver what was expected.

Complex systems
What are complex systems, then? There is no unambiguous definition, and the research on the subject is spread widely, but complex systems are often described as systems that consist of many different interacting components. The sheer number of interactions and possible outcomes makes it impossible to predict the system’s behaviour or to model it fully.
It is easy to see that more and more IT systems are complex systems simply because of their size and their number of dependencies, which cannot be fully analysed. It is also easy to see this in the number of unexpected problems and incidents, or in the range of possibilities that complex systems create. Another indication of a complex system is that no matter how much analytical work we do in advance to secure it, it is never enough. Working from a linear perspective is good for linear problems, but for non-linear problems it is completely wrong. Unexpected incidents will occur.
This leads to two characteristics of complex systems that are central to understanding how we need to work with them: Black Swan events and Fat Tails.
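To get a feel for why the number of interactions quickly defeats analysis, here is a minimal sketch in Python. It rests on simplifying assumptions that are not from any particular system: every component can interact with every other one, and each component has only two possible states. The function names are hypothetical and chosen for illustration only.

```python
from math import comb


def pairwise_interactions(n: int) -> int:
    """Number of distinct component pairs among n components: n * (n - 1) / 2."""
    return comb(n, 2)


def joint_states(n: int, states_per_component: int = 2) -> int:
    """Number of combined system states if each component has k possible states."""
    return states_per_component ** n


# Assumed, illustrative system sizes.
for n in (5, 20, 100):
    print(f"{n} components: {pairwise_interactions(n)} pairs, "
          f"{joint_states(n)} combined states")
```

Under these assumptions, 100 components already give 4,950 possible pairwise interactions and roughly 1.27 × 10^30 combined states, far more than any test suite or analysis phase could enumerate.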

Unlikely incidents
Black Swan events are unlikely incidents with large consequences. For example, optimising the time a certain process takes in a factory is normal work that all manufacturers need to do. But compared to an incident where the whole factory burns down, that work is largely irrelevant. For complex systems, it is evident that these incidents are more common than we think, and by definition they are impossible to predict. Risk management for complex systems is therefore unusually difficult, since it is not actually possible to estimate which problems could occur or how serious they could become.

Characteristics of complex systems
Fat Tails is a characteristic of complex systems which says that unlikely incidents are much more probable than we believe, count on, model, prove or design for. This is a direct consequence of the many interacting components of complex systems, which make the number of outcomes or states in the system much larger than can be analysed. The term comes from the shape of the distribution curve of incidents for complex systems: incidents at the ends of the curve are more common than expected, so the ends of the curve are thicker than normal. In other words, this is not a normal distribution.
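The difference between a normal tail and a fat tail can be illustrated with a small Python sketch. The Cauchy distribution is used here only as a convenient textbook example of a fat-tailed distribution; it is an assumption made for illustration and not a claim about how incidents in any real IT system are distributed. The function names are likewise hypothetical.

```python
from math import atan, erfc, pi, sqrt


def normal_tail(k: float) -> float:
    """P(X > k) for a standard normal distribution."""
    return 0.5 * erfc(k / sqrt(2))


def cauchy_tail(k: float) -> float:
    """P(X > k) for a standard Cauchy distribution (illustrative fat-tailed case)."""
    return 0.5 - atan(k) / pi


# Compare how quickly the two tails thin out as we move away from the centre.
for k in (2, 5, 10):
    n, c = normal_tail(k), cauchy_tail(k)
    print(f"k={k:>2}  normal={n:.3e}  cauchy={c:.3e}  ratio={c / n:.1e}")
```

The two distributions are not directly comparable in scale (the Cauchy distribution has no finite variance), so the exact ratios should not be read as meaningful. The point is the shape: the normal tail collapses towards zero within a few units, while the fat tail keeps a probability of a few percent even far out.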

Nuclear plants break more often than we think
A clear and perhaps slightly controversial example of this is nuclear plants. Sometimes you hear from manufacturers or researchers that the risk of an accident may be, for example, one in a million. But nuclear plants are complex systems, and the model used to calculate that probability cannot take into account everything that can happen, for example a large tsunami (Fukushima) or several mutually independent sources of error (Chernobyl). By comparing the number of nuclear plants in operation with the number of nuclear plant accidents that have happened, you can easily see that the risk models are wrong by several orders of magnitude. The number of unpredicted incidents is much larger than we believe, and Black Swan incidents that no one thought could occur keep happening. These are basic characteristics of complex systems.
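The comparison described above can be sketched as simple arithmetic in Python. Only the "one in a million" figure and the two named accidents come from the text. Reading that figure as a probability per reactor per year is an assumption, and the number of reactor-years is a hypothetical placeholder, not real operating data. Anyone repeating the calculation should substitute actual figures.

```python
from math import exp


def expected_events(prob_per_unit_year: float, unit_years: float) -> float:
    """Expected number of events if the claimed probability were correct."""
    return prob_per_unit_year * unit_years


def prob_at_least(k: int, mean: float) -> float:
    """Poisson probability of seeing at least k events given the expected mean."""
    term, below_k = exp(-mean), 0.0
    for i in range(k):
        below_k += term          # adds P(N = i)
        term *= mean / (i + 1)   # advances to P(N = i + 1)
    return 1.0 - below_k


CLAIMED_PROB = 1e-6      # "one in a million", assumed to mean per reactor-year
REACTOR_YEARS = 10_000   # HYPOTHETICAL placeholder, not real data
OBSERVED = 2             # the two accidents named in the text

mean = expected_events(CLAIMED_PROB, REACTOR_YEARS)
print(f"expected accidents under the claim: {mean:.4f}")
print(f"chance of seeing at least {OBSERVED}: {prob_at_least(OBSERVED, mean):.2e}")
print(f"observed rate / claimed rate: {OBSERVED / REACTOR_YEARS / CLAIMED_PROB:.0f}x")
```

With these placeholder inputs, the claimed risk would make two accidents extremely unlikely, which is the kind of mismatch between model and observation that the paragraph points to.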

The same applies to complex IT systems. We must begin to think about complexity when we develop systems. The methods we traditionally use are often meant for managing complicated systems with linear relationships: systems where we can, in advance, specify functionality, model behaviour, record all dependencies and create the test cases needed to capture all errors. This does not work for complex systems, since change and unexpected incidents need to be taken into consideration.

Johannes Jansson, consultant and associate partner, Tagore

PS: The next blog entry is about how we manage complexity in practice.