This is a curated list of news related to serious system incidents.
Nowadays, I hear more and more news about system crashes that last hours (even days). These crashes sometimes affect millions of users and many apps. The purpose of this list is to learn from these failures in order to build more reliable systems that empower the community. Maybe this will turn into a book that covers the history of system failures and outages.
A production database was deleted by mistake. After nearly 19 hours of hard work, broadcast live on YouTube (yes, they live-streamed the entire recovery effort), the system was back online, with six hours of database data lost. Read details
Instapaper hit the file size limit of the ext3 file system with its huge 2TB database. The system started rejecting newly saved articles, and until the database was moved to another instance, users had trouble accessing their saved articles. Read details
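One lesson from the Instapaper entry is that a per-file size limit can be reached with no warning until writes fail. The following is a minimal monitoring sketch, not Instapaper's actual tooling. It assumes ext3 with a 4 KiB block size, where a single file is capped at 2 TiB. The names `usage_ratio`, `needs_alert`, and `check_file`, along with the 80% alert threshold, are hypothetical choices made for illustration.

```python
import os
import sys

# Assumption: ext3 with a 4 KiB block size limits a single file to 2 TiB.
EXT3_MAX_FILE_BYTES = 2 * 1024**4


def usage_ratio(size_bytes: int, limit_bytes: int = EXT3_MAX_FILE_BYTES) -> float:
    """Return the fraction of the per-file size limit already consumed."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return size_bytes / limit_bytes


def needs_alert(size_bytes: int, limit_bytes: int = EXT3_MAX_FILE_BYTES,
                threshold: float = 0.8) -> bool:
    """Return True once a file has grown past `threshold` of the limit."""
    return usage_ratio(size_bytes, limit_bytes) >= threshold


def check_file(path: str, limit_bytes: int = EXT3_MAX_FILE_BYTES,
               threshold: float = 0.8) -> bool:
    """Stat a file on disk and report whether it is close to the limit."""
    return needs_alert(os.path.getsize(path), limit_bytes, threshold)


if __name__ == "__main__":
    # Pass your own (hypothetical) database file paths as arguments.
    for path in sys.argv[1:]:
        ratio = usage_ratio(os.path.getsize(path))
        status = "ALERT" if check_file(path) else "ok"
        print(f"{status}: {path} uses {ratio:.1%} of the ext3 file size limit")
```

Running a check like this from a scheduled job leaves time to migrate or split a large file before writes start failing. The threshold is deliberately conservative, because moving a multi-terabyte database can take a long time.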
- Feb 28, 2017: Amazon S3 outage
- Add a table of contents.
- Search for historical incidents in other domains, such as water and electricity outages.
- Read the Wikipedia article: List of major power outages
- Look for the details of the Akbank outages
- 2012 Netflix Christmas Eve outage
- Review this existing collection of outages
- Office 365 outage
- GitHub incident
- Facebook, Instagram and Messenger outage
- Cloudflare outage
Contributions to the list are welcome and encouraged. Embrace the system incidents ;). You can follow this guide.
Content is licensed under the Creative Commons Attribution 4.0 License.