
Lessons learned when my SSD died

So there I was, happily typing away, when my screen turned black and was replaced with this confused-looking flashing icon:

A folder with a question mark in it, which is the Apple symbol for “Your SSD died”.

To their credit, Apple Support quickly took my computer in, diagnosed the issue, and returned it to me in just over a week — not bad. To their discredit, they haven’t figured out which parts of a computer you’re not supposed to make inseparable:

The boot issue was confirmed to be caused by a hardware fault with the machine’s Solid State Drive which will require replacement of the Logic Board to address the issue.

While the machine was away, I wasn’t too worried, because I knew I had a backup. And here’s what went right: I still have all my files, I’m still in control of all my internet accounts, and I didn’t tear my hair out during the recovery process. I back up all my files to a remote server, and while the (long, auto-generated) server password lives on my machine, I had a paper copy hidden in my house that finally proved useful.

So what went wrong?

Something’s got to go wrong, right? When you read data recovery stories, they go in one of two directions: either everything goes smoothly without a hitch, or there’s no backup or it’s corrupted or something and everything you’ve ever worked on is lost. When I saw the flashing icon, I expected one of these two things to happen. Either my backup procedure worked, or I’d have to start from scratch.

Instead, it… kind of worked? Here’s a list of everything that went wrong:

  • Because I was only backing up certain folders, I lost a lot of ‘transient’ files, like in ~/Desktop or ~/Downloads. Most of that stuff probably isn’t important, but I’d still like to know what it was.

  • Also missing from the backup were all my SSH keys. Whoops!

  • While restoring, I messed up one of the options, and every file that had been an executable 755 came back as a non-executable 644 (there’s a cleanup sketch after this list).

  • Despite only backing up certain folders, I wasn’t ignoring compiler output or node_modules folders, so the recovery took three times as long as it should have while it downloaded hundreds of tiny files.

  • It happened while I was at work, and I didn’t have the adapter needed to plug my USB flash drive into this USB-C-only computer.

  • I panicked and disabled 2-Factor Authentication for all the accounts I happened to have open on my work machine at the time.
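
The permissions mix-up, at least, is fixable after the fact: since I no longer know exactly which files used to be executable, a rough heuristic is to put the execute bit back on anything that starts with a shebang line. This is only a sketch, and the directories are examples rather than a definitive list:

    # Re-mark scripts as executable after a restore that dropped the exec bit.
    # The paths are examples; point it at wherever your own scripts live.
    find ~/bin ~/.local/bin -type f -print0 |
      while IFS= read -r -d '' f; do
        # Anything beginning with "#!" was almost certainly meant to be run.
        head -c 2 "$f" | grep -q '^#!' && chmod 755 "$f"
      done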

So, not great, but not a disaster either. Even though it could have gone a lot worse, the overall problem was that I’d only ever thought about the upload (getting files onto the server), never the download (getting them back off it).

I thought that just because my backup script ran without error each time, I’d be fine in an emergency. I never considered that running it once a day was only half the battle!
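
The other half is trying the download every so often, before it becomes an emergency. Here’s the sort of drill I mean, sketched with rsync; the server name and paths below are placeholders, and the same idea applies to whatever tool the backup actually uses:

    # Restore drill: pull one backed-up folder into a scratch directory and
    # compare it with the live copy. "backupserver" and the paths are made up.
    scratch=$(mktemp -d)
    rsync -a backupserver:backups/Documents/ "$scratch/Documents/"
    diff -rq ~/Documents "$scratch/Documents" | head
    rm -rf "$scratch"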

Of course, it’ll happen again

Life is nothing without learning. What should I do for next time?

  • I’m just going to find the ten or so largest space sinks (~/Library/Caches, node_modules), ignore them, and back up everything else (there’s a sketch of the new script after this list). It’s just not worth being economical about this!

  • The script now runs exa ~/Desktop ~/Downloads --long --tree and backs up the result, so next time I’ll at least know what was in those folders.

  • I got one of those cheap Android phones. It’s got my e-mail password and is signed in to all of my accounts, and it stays locked in my drawer, so even if I lose everything else I still have that one way back in.

  • A USB-C-to-A adapter now lives in my bag.
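
Put together, the new nightly script looks something like this. It’s a sketch rather than the real thing: the server name, the exclude list, and the manifest path are all illustrative.

    # Record what the transient folders held, so losing them only costs their
    # contents, not the knowledge of what was in them.
    exa ~/Desktop ~/Downloads --long --tree > ~/backup-manifest.txt

    # Back up the whole home directory, minus the big space sinks.
    # "backupserver" and the exclude patterns are placeholders.
    rsync -a --delete \
        --exclude 'Library/Caches/' \
        --exclude 'node_modules/' \
        --exclude 'target/' \
        "$HOME/" backupserver:backups/home/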

I don’t think I’ll be able to test the entire process until I’m forced to, the next time a hard drive gives up the ghost. But making all these mistakes means I’ll hopefully remember them then.