Joyent and Adobe had the outages for the month of May.
Joyent posted its post-mortem.
Adobe posted an apology.
Both of these outages occurred during maintenance, when someone typed a command that did exactly what it was supposed to do and, in doing so, took down a service. The human perception problem is that the person who typed the command could not see the system-wide impact of that command.
Adobe doesn’t provide any details on what they plan to do. Joyent does.
You can imagine the lengths people will go to in creating safeguards that eliminate the possibility of this type of outage. Unfortunately, those safeguards will most likely also put a burden on day-to-day operations.
Another way to solve the problem is to give people the ability to see the impact of their actions before they take them. No one in their right mind would knowingly execute a command to reboot all the servers at Joyent. And no one in their right mind would knowingly delete a directory with all user records.
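As a rough illustration of what "seeing the impact" could look like, here is a minimal Python sketch of a preview step that resolves a command's target against a host inventory and reports how much of the fleet it would touch before anything runs. Everything in it is hypothetical: the function names (`preview_impact`, `needs_confirmation`, `describe`), the inventory list, and the 25% threshold are assumptions made for the example, not a description of the tooling at Joyent or Adobe.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass
class Impact:
    """What a command would touch, computed without touching anything."""
    matched: list[str]  # hosts the target pattern resolves to
    total: int          # size of the whole inventory

    @property
    def fraction(self) -> float:
        return len(self.matched) / self.total if self.total else 0.0


def preview_impact(pattern: str, inventory: list[str]) -> Impact:
    # Hypothetical helper: resolve a shell-style pattern against the inventory.
    matched = [host for host in inventory if fnmatch(host, pattern)]
    return Impact(matched=matched, total=len(inventory))


def needs_confirmation(impact: Impact, max_fraction: float = 0.25) -> bool:
    # Assumed policy: anything touching more than 25% of hosts gets a second look.
    return impact.fraction > max_fraction


def describe(action: str, impact: Impact) -> str:
    return (
        f"{action} will affect {len(impact.matched)} of {impact.total} "
        f"hosts ({impact.fraction:.0%})"
    )


if __name__ == "__main__":
    inventory = ["web-01", "web-02", "db-01", "db-02", "cache-01"]
    impact = preview_impact("*", inventory)
    print(describe("reboot", impact))
    if needs_confirmation(impact):
        print("This looks system-wide. Type the host count to continue.")
```

The point of the sketch is that the operator sees "5 of 5 hosts (100%)" before pressing Enter, while a routine single-host reboot passes through without extra friction.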