The O2 outage: how task automation could help

Andy Harris

We learn from here that the root cause of the O2 outage was expired certificates on SGSN-MME software from Ericsson.
Ericsson and O2 engineers would have been flat out updating either certificates or software. They would have needed to perform a risk analysis on moving to the new software and on how it would work with the various versions of hardware that O2 have installed.
Once the fix recipe was determined, it would be a case of rolling it out to all affected units. Doing this manually across many units will almost always introduce some human error. We'd also hope that not all the units had the same credentials (this would make the job easier, but it's not good security practice!)
At these moments, the core value of Privileged Task Automation is that you can delegate the recipe as a task to other people. In a crisis, that means you get more done, faster, and with less human error. For example, by building a task that checks the software version before updating, you can reduce the risk of updating nodes that don't need it.
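To make the version-check idea concrete, here is a minimal sketch in Python. It is not how any particular automation product or Ericsson tool works: the `Node` type, the `update_affected_nodes` function, the version strings, and the `apply_fix` callback are all hypothetical names chosen for illustration. The task only applies the fix to nodes whose reported software version is in the affected set, and it records which nodes were skipped.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass
class Node:
    """Hypothetical inventory record for one network node."""
    name: str
    software_version: str


def update_affected_nodes(
    nodes: Iterable[Node],
    affected_versions: Set[str],
    apply_fix: Callable[[Node], None],
) -> Tuple[List[str], List[str]]:
    """Apply the fix only to nodes running an affected version.

    Returns two lists of node names: those updated and those skipped.
    `apply_fix` stands in for whatever the real fix recipe does.
    """
    updated: List[str] = []
    skipped: List[str] = []
    for node in nodes:
        if node.software_version in affected_versions:
            apply_fix(node)
            updated.append(node.name)
        else:
            # Version is not affected, so leave the node untouched.
            skipped.append(node.name)
    return updated, skipped
```

Because the check is part of the task itself, whoever runs it in a crisis does not have to remember to verify each version by hand, and the returned lists give a simple record of what was and was not touched.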