
help please - 2 drive issues


tiwing

Hi, I have found that my parity disk is offline and another disk has read errors. Both are 10TB Reds, both were purchased about 18 months ago, and both have been flawless until now. This is a lightly used home server.

 

All disks are connected to an LSI SAS controller (LSI00301 SAS 9207-8i) which has been perfect for a year.

 

Both the disabled parity drive and the drive with read errors test fine on a quick SMART test.

 

I followed other advice on the forums after the successful SMART test: shut down the server, re-seated the card, powered back up, unassigned the parity drive, started the array, stopped the array, re-assigned the parity drive, started the array ... and it's still disabled.

 

I currently have the webGUI working, my VMs have started (running on cache drive), and all my dockers have started fine. Unassigned Devices is working fine. 

 

I'm a rookie, please treat me like I'm 9 years old :) What do I do next? What other info can I provide for the kind members here to help out?

 

thank you!

 

Tiwing

 

edit: attaching diagnostic zip

kscs-fvm2-diagnostics-20201111-1620.zip

Edited by tiwing
Link to comment

Figured I'd post another reply, since I see this morning that nothing has changed in terms of errors and offline drives. But this picture ...

image.thumb.png.b3a1c17a8f2884fd8c1c51af6b88deb2.png

 

There are a lot of reads and writes for a drive that is supposedly offline (parity), and both drives are now showing up in UD (Unassigned Devices) ... ??? That is 16 hours after the reboot.

 

Edit: I also ran the extended test as above, then for giggles started an extended test on my other 10TB drive. It's still running 8 hours later. So I assume the extended test on drive 7 above failed, but it didn't report a failure.
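For anyone following along, one way to check whether an extended test really finished is to read the drive's self-test log (the kind of text `smartctl -l selftest /dev/sdX` prints) rather than relying on the GUI. Below is a minimal Python sketch that parses that log. The sample text, the function names, and the exact column layout are assumptions for illustration only; the values are made up and are not from my server.

```python
import re

# Sample text in the layout that `smartctl -l selftest` typically prints.
# ASSUMPTION: these rows are invented for illustration, not real output.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     13460         123456789
# 2  Short offline       Completed without error       00%     13452         -
"""

# One log row: number, description, status, % remaining, power-on hours, first bad LBA.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in the log text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "test": m["test"],
                "status": m["status"],
                "remaining_pct": int(m["remaining"]),
                "hours": int(m["hours"]),
                # "-" means no error LBA was recorded for that test
                "first_error_lba": None if m["lba"] == "-" else int(m["lba"]),
            })
    return rows


def not_clean(rows):
    """Rows whose status is anything other than a clean completion."""
    return [r for r in rows if not r["status"].startswith("Completed without error")]


if __name__ == "__main__":
    for row in not_clean(parse_selftest_log(SAMPLE_LOG)):
        print(row["test"], "->", row["status"], "at LBA", row["first_error_lba"])
```

A test that is still running, was aborted, or hit a read failure would all show up in `not_clean`, which is the distinction I couldn't see from the dashboard.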

Edited by tiwing
Link to comment

Hi, update: swapped in a new set of SAS-to-SATA cables. Same issue. Swapped cables between SAS ports. Same issue. Swapped cables on the drives themselves. Same issue. Swapped SAS cards. Same issue.

 

Data loss is not a concern here, since I have local backups of somewhat important or hard-to-replace stuff on a 22TB backup Unraid box, plus another local backup and a cloud backup of critical stuff. The rest I can get again over time. The backup Unraid box is built from an identical mobo, CPU, and power supply, so I can swap out bits and pieces if there is more testing to be done, and I can fill it up from whatever drives are still good... Or pull out the failed drives and copy as much as I can get to the backup by mounting them on the backup box in a USB enclosure....

 

What are my next steps? Move the drives from the SAS card to the mobo itself? Try a new power supply (hard to find, since the mobo has a 10-slot plus 4-slot connector)? Or how do I test whether both drives are actually bad and went bad at exactly the same time... especially since both drives were purchased on the same day?? Both 10TB drives with read errors are still under warranty and can be RMA'd at WD, but I have to know they have actually gone bad, right?

 

For anyone reading this, two drives can go bad at the same time. I always thought "what are the chances", yet here I am. Please, please do local and cloud backups of the stuff you care about. Period. I've seen so many "I'm freaking out right now" threads. Thankfully my situation is more "it's gonna cost some money but meh"...

 

Thanks for help everyone. 

 

Edit: forgot to mention that once in a while another drive also shows read errors, a much older 4TB Red. It's occasional but also concerning. I have to think it's related...

Edited by tiwing
Link to comment

Tried moving the drives to the mobo SATA ports. Same issue.

 

So I've used unBALANCE to clear the data drive (frigging awesome plugin!), confirmed there is no data remaining on the drive, done a new config without the parity or the data drive, and removed the two 10TB drives from my array. Now that they're sitting on my desk, I'm trying to figure out the best way to test them before going the RMA route. I have a powered USB3-to-SATA adaptor that I can plug into Windows, and I can easily download a Linux live CD on my desktop machine... what's the best way to test the drives? And if they test "good" .... is a pre-clear all that's needed to put them back into my array? (makes me nervous as f to do this though)
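One thing I can do from a Linux live CD is dump the SMART attribute table (the kind of text `smartctl -A /dev/sdX` prints) and look at the handful of counters people usually watch. Here is a minimal Python sketch of that check. The sample table, the `WATCHED` list, and the function names are my own assumptions for illustration; the numbers are invented, and a non-zero counter is a hint to investigate, not proof a drive is bad.

```python
# Sample attribute table in the usual `smartctl -A` column layout.
# ASSUMPTION: these values are invented for illustration, not real output.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   118   098   000    Old_age   Always       -       34 (Min/Max 20/45)
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       27
"""

# Attribute IDs commonly watched on SATA drives, with a rough reading of each.
WATCHED = {
    5: "reallocated sectors (points at the disk surface)",
    197: "sectors pending reallocation (points at the disk surface)",
    198: "uncorrectable sectors found offline (points at the disk surface)",
    199: "interface CRC errors (often cable, connector, or power)",
}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for rows with a plain integer raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def suspicious(attrs):
    """Watched attributes whose raw counter is above zero."""
    return {i: attrs[i][1] for i in WATCHED if i in attrs and attrs[i][1] > 0}


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE_ATTRS)
    for attr_id, raw in suspicious(attrs).items():
        print(f"{attr_id:>3} {attrs[attr_id][0]}: raw={raw} -> {WATCHED[attr_id]}")
```

Splitting the counters this way at least separates "the platters are failing" from "something between the drive and the controller is flaky", which matters when two drives bought on the same day act up together.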

 

I'm also running a read check on the remaining disks in the array to see if anything else has issues.

 

cheers

Link to comment

My (old) PC does have free SATA ports. However, I can't accept a config change when it boots because my wireless keyboard isn't recognized until after boot... argh. And my USB-to-SATA adaptor doesn't recognize the 10TB drives in Windows 10.

 

So I've stuck the drives back in Unraid and am running binhex's preclear docker on them. I'll report back in a couple of days :)

 

thanks

Link to comment
