Discover the history of system failures, from the Blue Screen of Death to rare Linux exploits

by Ady Stokes and Rabi'a Brown | [Read online at Ministry of Testing](

Blue screens are only the beginning

As much of the world recovers from last Friday's [CrowdStrike / Azure]( [black swan event]( the Ministry of Testing folks thought we'd offer up a few light-hearted accounts of prized devices (and underpowered work laptops) in their death throes or just playing dead.

Note: Some of you may never have seen a Windows "blue screen of death," which we'll sometimes call BSoD! Read on for more details.

Some of these system failures are [classic Windows blue-screen "Abandon All Hope Ye Who Enter Here" incidents]( that have dismayed Windows users since back before a few of you were born. And some, well, have more to do with heedless end users than anything else (one of the authors of this article raises her hand).

By the way, Linux users who are chuckling smugly to yourselves: you are NOT immune from a black swan event of your own. If you haven't heard of the [xz exploit-via-social-engineering saga]( now's your chance to read up. Take one overworked, unpaid engineer maintaining critical code, add at least one shadowy committer / savvy, talented developer who was probably trained in [Le Carré-level spycraft]( and what follows could have been the result. From the linked article:

"Their grand scheme was:
- sneakily backdoor the release tarballs, but not the source code
- use sockpuppet accounts to convince the various Linux distributions to pull the latest version and package it
- once those distributions shipped it, they could take over any downstream user/company system/etc"

If that prospect is too scary to contemplate right now, read on for lighter tales of silicon gone wrong from Ady Stokes and Rabi'a Brown.

Birth of the BSoD!

According to Wikipedia, the first BSoD, or Blue Screen of Death, occurred in Windows NT 3.1, the first version of Windows NT, in 1993. The screen was designed to inform, not to be a "crash" screen. It would come up if there was a DOS (Disk Operating System) error, and it was supposed to display an error message. However, an inherent bug caused random characters to be shown instead:

The original Blue Screen of Death

Probably the most famous and public BSoD appeared as Bill Gates was presenting Windows 98 to a live television audience. It was an embarrassing but funny moment for him and the company, including a "Whoooooa" response when it appeared, and [you can watch it on YouTube](.

TestBash is our annual conference happening in September in Brighton, UK. As we like to say, our network is your network, and we'd love for you to join us. We will have 2 days of learning and community as testing professionals share what they know about testing, AI and more! [Explore the TestBash Experience](

Are red and purple the new blue?

While many people who work in technology are familiar with blue screens of death, in researching this issue I found there are also red and purple versions. In fact, there are a number of different screens of death. Let's look at a few.

Red screen of death

Associated with Windows Vista, whose original project name was "Longhorn," this screen appeared when a reset failed. The term "red screen of death" has also been used to refer to PlayStation error messages.
[Screenshot of a red screen of death error message]

Purple screen of death

This purple diagnostic screen occurs when VMware's VMkernel catches a machine check exception and the system "cannot continue."

[Screenshot of a purple screen of death]

Black screen of death

From Windows 3 onwards, if the operating system failed to boot up, you'd see a black screen. Boot failure could be caused by a number of issues, only some of which actually generated an error message. This often left the user stranded with a black screen and a flashing cursor.

[Screenshot of black screen of death] An example of an error message display with instructions for what to do next.

No operating system is perfect

Early Apple Macintosh systems used images of "happy" and "sad" Macs. While the operating system was loading, a smiling Mac OS logo appeared.

[Happy MacOS screenshot message]

But if something went wrong, the sad Mac would be shown. Simpler times indeed.

[Sad mac error message]

Amazon Web Services (AWS) has failed multiple times!

Because software is so interdependent with other services these days, if one part goes down, it can bring many others with it. AWS is no exception. In 2017 a typing mistake when patching a billing problem brought down the cloud and all the services using it. From Apple to storage and payments, several hours were lost before the problem was addressed.

December 2021 was not the best month for AWS: there were THREE major outages, affecting many services. From device problems to power cuts, it's estimated that millions of users were affected, calling into question the risk of having so many companies rely on so few providers.

There's humour in failure

What could be more human than reacting to last week's outage, reportedly the largest IT outage in history, by making jokes about it?
And we made all kinds of jokes, from the simple, "it's late but the Y2K (year 2000 problem) is finally here," to people saying they thought their first day at Microsoft or CrowdStrike went well. There was so much comic invention, it seems only fair to highlight some.

Small changes have catastrophic effects

X user [@itsfoss2 shared a modified video]( of someone removing a blue Smartie from an art installation and it all falling down. The person is a CrowdStrike intern and what is falling down is the BSoD.

Have a break

Incredibly speedy marketing post from KitKat

[KitKat screen of death]

You can't stop some people

X user Leo Skelly said, "Meh. An IT outage ain't gonna stop me from working" and showed how he could still write binary code.

[Todo 19/7/24 written on lined paper as binary code]

If it's not a BSoD, then what is it?

There are times when what is happening doesn't fit into your range of knowledge. Turning on my laptop recently to see gibberish on one of my three monitors fit that category for me.

Being logical, the first thing I did was check that the cables were all inserted as they should be. They were. Next, I checked my display settings, trying different configurations for my three screens. Two were fine, the other definitely was not. Next came a restart, because who doesn't love turning devices off and on again? No luck. A screen resolution change was my next attempt, also with no luck. Next up came investigating the graphics driver settings and any other settings I could think of. More frustration.

Finally, I took the monitor out of the configuration completely, removed the cable, and spent the rest of the morning on two screens. After dinner (don't @ me, dinner is in the middle of the day because I grew up with dinner ladies!) I set up the monitor anew. Bingo, it worked great. For about a week. I finally resorted to trying new equipment, and the least expensive solution was new cables. Fortunately, that finally put the problem to rest.
I'm not sure what the moral of the story is. Persistence, perhaps: keep trying. Or maybe it is just that software and hardware can and always will be weird.

[A monitor malfunctioning]

Core meltdown, or why the word "laptop" shouldn't be taken literally

A few years ago, I "inherited" a lovely, more than adequately powered Dell Latitude from a former employer who didn't seem to care if I returned it. I wiped it with Darik's Boot & Nuke and installed Debian with an encrypted hard disk. (More about those later.)

A bit later, I took it with me on my "year abroad" outside the US, where it traveled with me from country to country. I didn't have a proper desk of my own anymore, and rarely used coworking spaces or tables at cafés. Instead, I usually ensconced the laptop on my lap, which on cold days was amply covered with blankets.

Problem was: the vents were on the bottom of the laptop. You can see where this is going. After a few months of this heedless abuse, the Dell gave up the ghost unceremoniously. When it finally occurred to me to look at the bottom and back edge of the laptop, I saw a smaller-scale, less dramatic version of this:

[Image Title: TMI-2 Core End-State Configuration Description: The diagram illustrates the internal structure and damage of the TMI-2 nuclear reactor core. It highlights various components and damage areas as follows: 1A and 2B inlets: Located on opposite sides of the reactor core. Upper grid damage: Indicated at the upper part of the core. Coating of previously-molten material on bypass region interior surfaces: Found in the upper-mid section of the core. Cavity: Located in the central part of the core. Loose core debris: Found in the central cavity. Crust: Located just below the loose core debris. Previously molten material: Present beneath the crust layer. Hole in baffle plate: Shown towards the middle-lower section. Ablated incore instrument guide: Positioned towards the bottom of the core. Lower plenum debris: Accumulated at the very bottom.
Possible region depleted in uranium: Shown in the lower section of the core. Each of these labels is indicated with lines pointing to the respective areas within the reactor core structure.]

This was what the core of one of the Three Mile Island nuclear reactors looked like after quite a while without cooling water. Had there been a full-on meltdown, we would have had our own Chernobyl in Pennsylvania. I'm lucky I didn't electrocute myself or set myself on fire.

[A photo of burnt laptop and keyboard] Look upon my works, ye mighty, and despair.

Grubby encryption lockouts: a thick brick wall you can scale

I've run Linux at home for nearly two decades, and have worked with many of the major FOSS distributions (distros) like Ubuntu, Debian, Arch, and Manjaro. Issues come up sometimes, but a good web search or two generally yields a solution. It helps to work with a user-friendly, well-maintained distro like Manjaro, of course, especially if the user community is active and willing to help.

Nearly 10 years ago, I installed Debian for the first time on the late lamented Dell (the one that melted). I also decided to encrypt my hard disk for the first time, using [Linux Unified Key Setup (LUKS)](. I wasn't yet using 1Password, so I committed that encryption password to muscle memory. (I don't mess around anymore: my latest encryption password is in 1Password.)

Muscle memory is one thing, typing too fast is another. The GRUB bootloader emphatically does NOT echo your password by default (or perhaps at all) as you type it. So when you first encounter the cryptomount error below, you're likely to think, "my hard drive got fried." Not: "I mistyped my password."

    Enter passphrase for hd0.gpt1(uuid):
    Attempting to decrypt master key...
    error: access denied
    error: disk 'cryptouuid/uuid' not found.
    Entering rescue mode...
    grub rescue >

The first few times I encountered this error, I simply rebooted.
And magically, after a reboot (and a correctly entered password), the error disappeared. Hurrah! But it kept recurring. I started to suspect that a mistyped encryption password was to blame, but either I couldn't find any relevant information online or I simply didn't search. Finally, YEARS later (I don't want to admit how many), I tested my hypothesis on a new device. Yep, an incorrect password was to blame. A web search quickly revealed how to address the error: three commands in GRUB will do it.

    grub rescue > cryptomount (hd0,gpt1)   // you take the device names from the first line above. Note the comma.
    Enter passphrase for hd0.gpt1(uuid)
    Slot 0 opened   // this means your disk can now be accessed
    grub rescue > insmod normal
    grub rescue > normal   // your usual list of OS boot options should appear shortly

A green screen of... the uncanny

Manjaro is by far my favorite Linux distro for reasons I mentioned above: ease of use, ease of finding answers, and a good tradeoff between configurability and working "out-of-the-box." (Arch left me in tears frequently, Ubuntu eats too many resources, and Debian isn't terribly well-maintained.)

This attractive minimalist desktop greets me upon a successful boot. But the operating system and running software do work themselves into a state sometimes, and I don't yet know why. Sometimes when I boot up, usually after a recent shutdown, I get this alternate-universe desktop image:

This seriously scared me at first. I'm running Manjaro on a six-year-old Intel NUC that was in storage for about two years. It's mortal, just like I am. And let's face it: that colour is just EERIE. Is someone trying to tell me something?

The first time it happened, I tried a simple logout, for lack of any other ideas. Logging back in, the alien green haze was gone. I still see green every so often, but the trick continues to work. For now.
A sparkling screen of death

After the Dell tragedy, I was still without a permanent address, and I needed a new laptop for short money that would ship to a location outside the United States. I bought a budget Windows 10 laptop from one of the big companies that make such things. "This time," I said to myself, "I won't be so careless. I'll keep the laptop's vents free of obstructions." I bought a Roost laptop stand, giving those vents plenty of breathing room, and ran the laptop in the stand 99 percent of the time.

I hadn't bought a Windows device for myself in several years, and much to my surprise, the user experience was good to great. I didn't run into much bloatware, and most programs ran reliably without slowing down the laptop or crashing. And I saw none of the blue screens of death I remembered so vividly from my pre-2006 Windows devices.

Two years passed, I found myself a permanent address at last, and I installed the laptop in its Roost on my desk. Now, I did keep it running most of the time, but I figured the sleep mechanism would take care of any overheating problem. I also left the laptop unplugged and switched on at times to drain the battery a bit.

One day this past spring, I logged in as usual. Right away I noticed a strange pattern of sparkling lights on the desktop image. And then... nothing. Blank screen. I tried plugging it back in. Nothing: deadwood. I tried leaving it plugged in for a while. Nothing.

[sparkling screen of death]

I said to myself: dead battery. I did a web search on my options, all of which, given the policies of the laptop's manufacturer, involved me buying a brand new laptop. And that purchase was definitely not within my budget at the time.

Oddly enough, a week or so earlier, I had remembered the 2018-era Intel NUC I'd brought from my previous home. And I'd said to myself: how about trying out a new-to-you Linux distro?
Within about two hours, Manjaro was chugging away nicely on the NUC, plugged into my living room TV as a display. Thank goodness. I wouldn't need to buy a new laptop after all. I'm typing this on the Manjaro box right now.

However, the NUC seems to run "hot" too, even though it's adequately ventilated. So I've taken to shutting it down and leaving it that way unless I absolutely need to type a lot of material quickly. I've opened the tiny box to see if I can replace components, but I found out quickly I'd need doll-sized hands to do so.

So I'm enjoying Manjaro on my "old" NUC for now. Until the next... mysterious... incident, at least.

For more information

- [10 historical software bugs with extreme consequences]( SolarWinds Pingdom blog
- [Black Swan Events and Their Impact on Investments]( Brian J. Bloch, Investopedia
- [Backgrounder on the Three Mile Island Accident]( United States Nuclear Regulatory Commission
- [Blue screen of death]( Wikipedia

Learn More with Ministry of Testing

- [Crowdstrike Mass Global IT Outage]( - AJ Wilson

"I found a community in the Ministry of Testing, a form of belonging which helped me grow personally and in my career." - Kim Knup

[Upgrade your software testing career with Ministry of Testing](

Copyright © 2024 Ministry of Testing, All rights reserved.
Our mailing address is: Ministry of Testing, 19 New Road, Brighton, East Sussex BN1 1UF
United Kingdom