Black Swan

I'm currently reading "The Black Swan" by Nassim Nicholas Taleb, and it made me realize that we deal with a lot of Black Swan events in technology.

Most incidents in tech are Black Swans:

  1. the event is a surprise (to the observer)
  2. the event has a major (negative) effect
  3. the event is rationalized by hindsight, as if it could have been expected

Point 3 is especially interesting, because postmortems and RCAs often try to "explain" and rationalize events. This leads to action items that address the specific situation and its contributing factors ("root causes"), as if we could expect the same situation again soon. Maybe Taleb's advice should be followed more often: instead of trying to predict the unpredictable, build robustness against negative events.

Or in other words: MTRS (mean time to restore service) is more important to optimize than MTBF (mean time between failures). Failures (Black Swans) are certain to come, but we can't know their probability or what they will look like.
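
To make the trade-off a bit more concrete, here is a rough sketch using the common steady-state availability formula, availability = MTBF / (MTBF + MTRS), where MTRS is the mean time to restore service and MTBF the mean time between failures. The numbers and the function name are made up for illustration and are not from any real system.

```python
def availability(mtbf_hours: float, mtrs_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mtrs_hours)


# Hypothetical baseline: one failure every 1000 hours, 4 hours to restore.
baseline = availability(1000.0, 4.0)

# Option A: double the time between failures.
# This assumes we can predict and prevent half of all future failures.
fewer_failures = availability(2000.0, 4.0)

# Option B: halve the time to restore service.
# This needs no prediction, only faster detection and recovery.
faster_recovery = availability(1000.0, 2.0)

print(f"baseline:        {baseline:.5f}")
print(f"fewer failures:  {fewer_failures:.5f}")
print(f"faster recovery: {faster_recovery:.5f}")
```

With these made-up numbers, both options end up at the same availability, so the formula alone doesn't favor either one. The difference is in what each option requires: doubling MTBF means knowing in advance which failures to prevent, while halving MTRS can be practiced and measured without knowing what the next failure will look like.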