A brush with death?


kode54


I just had an incredible experience. I received a notification from my Tower that my disk 1 was triggering read errors. It reported that it could not read the SMART data without spinning the drive up. The array stats were reporting some 140 trillion read errors for that drive.

 

So I took the array offline and powered the machine off. I attempted to coax the drive awake using its power cables, feeling for some sign of life. It didn't even spin up. So I removed it and chucked it aside.

 

Then I proceeded to power the machine up, and initiated New Config, retaining the cache pool settings. Then I noticed: One of my SSDs was missing as well.

 

I opened the case again, remembering that SSD shared its power cable with the drive I just mercilessly chucked aside onto the carpeted floor. I traced it through a haphazard attempt at cable routing I made when I first installed that power cable, to find that the power cable had become disconnected from the power supply.

 

Whoops.

 

So, as coolly as I removed the drive and set things aside, I put it back in, adjusted the power cables into a more sensible configuration, closed it all up, and booted it once again.

 

Upon booting with the array set to not auto start, I popped into maintenance mode, mounted the drive to replay its journal, unmounted it, and checked it. It came back all clean. I also ran a btrfsck on the cache pool, since half of that was disconnected while the VM inside was running. No problems there, either.
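
For anyone wanting to repeat the cache pool check, here is a minimal Python sketch that only builds the command line for a read-only btrfs check (`btrfsck` is the older name for `btrfs check`). The device path `/dev/sdX1` and the helper name `btrfs_check_command` are placeholders made up for this example, not anything from my setup, and the check generally expects the filesystem to be unmounted.

```python
import shlex


def btrfs_check_command(device: str) -> list[str]:
    """Build (but do not run) a read-only btrfs check command.

    --readonly asks the tool to inspect the filesystem without
    attempting any repairs.
    """
    return ["btrfs", "check", "--readonly", device]


# Hypothetical device path; substitute the real cache pool member.
cmd = btrfs_check_command("/dev/sdX1")

# Print the command so it can be reviewed before running it by hand.
print(shlex.join(cmd))
```

Printing the command instead of executing it keeps the sketch harmless; running it for real would need root and the correct device.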

 

Live and learn. Maybe now I'll set aside some money for a 4TB or 5TB parity drive, before something really does die.

Link to comment

Interesting experience  :)

 

I'm surprised you didn't check your cables when you opened the case -- especially since the drive wasn't spinning.

 

If I understand correctly, it had not "disabled" the drive (since no writes had been attempted) ... so simply reconnecting the cable would have resolved everything with no further action needed.

 

This is one of the disadvantages of modular power supplies => in electronics EVERY connection is a potential point of failure ... as you've just learned firsthand.

 

Link to comment

If I understand correctly, it had not "disabled" the drive (since no writes had been attempted) ... so simply reconnecting the cable would have resolved everything with no further action needed.

Only, I have hotplug disabled for my SATA ports in the BIOS settings, so I don't think it would have remounted.

Link to comment

If I understand correctly, it had not "disabled" the drive (since no writes had been attempted) ... so simply reconnecting the cable would have resolved everything with no further action needed.

Only, I have hotplug disabled for my SATA ports in the BIOS settings, so I don't think it would have remounted.

 

Possibly. But I'd think if you'd shut down, fixed the cables, then rebooted, you'd have been fine. In the worst case, a New Config without changing ANY of the assignments, and checking the "parity is already valid" box, would likely have done the trick.

 

Link to comment

... I'd think if you'd shut down, fixed the cables, then rebooted, you'd have been fine. In the worst case, a New Config without changing ANY of the assignments, and checking the "parity is already valid" box, would likely have done the trick.

I interpreted this

Live and learn. Maybe now I'll set aside some money for a 4TB or 5TB parity drive, before something really does die.

to mean he is running without parity.
Link to comment

... to mean he is running without parity.

 

Actually I hadn't even considered that.  It amazes me that anyone would build a fault-tolerant NAS system and not bother with the fault tolerance  :)

 

Especially with so little storage in the system -- I'd just pop a single large drive in my primary desktop if all I wanted was a few TB of storage without parity.

 

Link to comment

It amazes me that anyone would build a fault-tolerant NAS system and not bother with the fault tolerance  :)

...

No Parity? Eggadddsss

 

...I myself haven't yet assigned a parity drive on my secondary unRAID box (which backs up my primary)... So I don't consider that so amazing. However, my use case might be on the fringe.

Link to comment

It amazes me that anyone would build a fault-tolerant NAS system and not bother with the fault tolerance  :)

...

No Parity? Eggadddsss

 

...I myself haven't yet assigned a parity drive on my secondary unRAID box (which backs up my primary)... So I don't consider that so amazing. However, my use case might be on the fringe.

 

Depends on just how much storage you have.  If you have a small server with only a few TB of space, and don't need to parity protect it, it's far less expensive to just buy a single external drive.  e.g. an 8TB Seagate external drive is $189.99  [ http://www.newegg.com/Product/Product.aspx?Item=N82E16822178951&cm_re=8TB_Seagate_external-_-22-178-951-_-Product ]

 

On the other hand, if you have a significant amount of storage, and simply don't feel the need to parity protect it -- perhaps because it's a backup and it's "no big deal" if you lose a drive -- then unRAID is still a reasonable way to combine the storage capacities of a bunch of drives.

 

Link to comment

I also cannot afford additional storage. Even if I could, I would need to upgrade my unRAID license to fit it into this machine, which has already almost maxed out its SATA ports. Then I'd be looking to buy interface cards, and I couldn't use anything that's x16, since this particular board lumps that slot into the same unbreakable IOMMU group as the video card I am passing through to my Windows installation.

 

Basically, I'm doing everything, flying by the seat of my pants, using what I've already got.

 

It's incredibly convenient being able to create arbitrary file separation points or shares within a large merged storage set, such as the 8TB I have now, and sharing it with my whole network. It's also very convenient having Windows running under a hypervisor, and having all the Docker services for random things I choose to run.

 

I'm not quite yet a Mr. Moneybags Data Hoarder. I was lucky enough to scrape together the money for that second 4TB drive when I needed to convert data from one drive partition format to another. I've just been lucky that I haven't even gone near capacity with what I have now.

Link to comment

... Beg, borrow or steal for that parity drive.... or make sure the data you are storing is copied elsewhere.

 

Absolutely agree ... EXCEPT => those aren't mutually exclusive events.  You want BOTH a parity drive (to provide fault tolerance)  AND  to "make sure the data ... is copied elsewhere" (for backup).
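
For anyone wondering how a single parity drive provides that fault tolerance, here is a toy sketch of XOR parity in Python. It is a general illustration of the idea, not unRAID's actual code; the three tiny "disks" and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(
        reduce(lambda a, b: a ^ b, column) for column in zip(*blocks)
    )


# Three made-up data "disks" of three bytes each.
data_disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]

# The parity "disk" is the XOR of all data disks.
parity = xor_blocks(data_disks)

# Simulate losing the second disk, then rebuild it from the
# surviving disks plus parity.
survivors = [data_disks[0], data_disks[2]]
rebuilt = xor_blocks(survivors + [parity])

print(rebuilt == data_disks[1])
```

Note that this only covers the loss of one disk. If the whole box dies, or a file is deleted by mistake, parity rebuilds nothing useful, which is why the backup copy is still needed.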

 

Link to comment

Kode,

 

I have been using unRAID for ages... I cannot tell you how many HDDs have failed on me. More than one would think. Parity is what has saved me every time. I am a dual parity user now.

 

Beg, borrow or steal for that parity drive.... AND make sure the data you are storing is copied elsewhere.

 

I cannot say I disagree with you. Perhaps I shall ask for a 5TB hard drive for Christmas, and employ it as a parity drive. And maybe also remove this 640GB drive, as it's kind of a joke to waste a port with that.

Link to comment
