tiwing Posted November 11, 2020 (edited November 11, 2020)
Hi, I have found that my parity disk is offline and another disk has read errors. Both are 10TB Reds, both were purchased about 18 months ago, and both have been flawless. This is a lightly used home server. All disks are connected to an LSI SAS controller (LSI00301 SAS 9207-8i), which has been perfect for a year. Both the disabled drive and the drive with read errors pass a quick SMART test.
Following other advice on the forums after a successful SMART test, I shut down the server, re-seated the card, powered back up, unassigned the parity, started the array, stopped the array, assigned the parity, started the array... and it's still disabled.
I currently have the webGUI working, my VMs have started (running on the cache drive), and all my dockers have started fine. Unassigned Devices is working fine.
I'm a rookie, please treat me like I'm 9 years old. What do I do next? What other info can I provide for the kind members here to help out?
Thank you! Tiwing
Edit: attaching diagnostic zip kscs-fvm2-diagnostics-20201111-1620.zip
trurl Posted November 11, 2020
5 minutes ago, tiwing said: "What other info can I provide"
You should always go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
tiwing (Author) Posted November 11, 2020
LOL, I attached it above first. Not so good at following directions, apparently (9 year old?? LOL, I'm actually 47...). Thanks!
kscs-fvm2-diagnostics-20201111-1620.zip
trurl Posted November 11, 2020
23 minutes ago, tiwing said: "attached above first"
The reason I said NEXT is so you would make a new post. Otherwise I wouldn't have noticed this thread again. Looks like disk7 may be a problem. Run an extended SMART test on it.
tiwing (Author) Posted November 12, 2020
Ran the extended test, but I'm unsure where to get the results. All I see is this, but I still have read errors showing on drive 7 (sdf) and parity is still disabled (sdg). Now drive 2 (sdi) is showing one read error as well. Does that indicate a controller failure, perhaps?
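For anyone else unsure where the extended test results end up: the drive keeps its own self-test log, and from a terminal `smartctl -l selftest /dev/sdX` prints it (with `smartctl -t long /dev/sdX` starting the extended test). The snippet below is only a minimal sketch of how that log could be read programmatically. It assumes the usual column layout smartctl prints for ATA drives; the `SAMPLE_LOG` text, the `parse_selftest_log` and `latest_extended` names, and all the values in the sample are made up for illustration and are not taken from the diagnostics in this thread.

```python
import re

# Illustrative sample only: laid out like `smartctl -l selftest` output for an
# ATA drive, but the hours and LBA values are invented for this example.
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     13245         123456789
# 2  Short offline       Completed without error       00%     13230         -
"""

# One log entry: columns are separated by runs of two or more spaces.
ENTRY = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test entry found in the log text."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            entries.append({
                "num": int(match["num"]),
                "test": match["test"],
                "status": match["status"],
                "remaining": int(match["remaining"]),
                "hours": int(match["hours"]),
                "lba": match["lba"],
            })
    return entries


def latest_extended(entries):
    """Return the most recent extended test (entry #1 is the newest), or None."""
    for entry in entries:
        if entry["test"].startswith("Extended"):
            return entry
    return None


if __name__ == "__main__":
    result = latest_extended(parse_selftest_log(SAMPLE_LOG))
    print(result["status"], "| first error LBA:", result["lba"])
```

A status of "Completed without error" means the test passed; anything mentioning a failure, together with a number in the last column, is the kind of evidence an RMA usually asks for.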
tiwing (Author) Posted November 12, 2020 (edited November 12, 2020)
Figured I'd post another reply, since I see this morning that nothing has changed in terms of errors and offline drives. But this picture... There are a lot of reads and writes for a drive that is supposedly offline (parity), and both drives are now showing up in UD. That is 16 hours since reboot.
Edit: also, I ran the extended test as above, then for giggles started an extended test on my other 10TB drive. It's still running 8 hours later. So I assume the extended test on drive 7 above failed but didn't report a failure.
JorgeB Posted November 12, 2020
A crazy number of reads/writes usually happens after a device drops offline, and that is usually a cable issue. Replace both cables on those disks.
tiwing (Author) Posted November 12, 2020
They are SAS cables. I will order a new set today and report back once they're installed... *keeping fingers crossed* Thank you!
trurl Posted November 12, 2020
Read, Write, and Error counts will all reset when you reboot (or you can reset them at Main - Array Operation - Clear Stats). Your parity disk will have to be rebuilt to get it enabled again.
tiwing (Author) Posted November 15, 2020 (edited November 15, 2020)
Hi, update: swapped in a new set of SAS to SATA cables. Same issue. Swapped cables between SAS ports. Same issue. Swapped cables on the drives themselves. Same issue. Swapped SAS cards. Same issue.
Data loss is not a concern here, since I have local backups of the somewhat important or hard-to-replace stuff on a 22 TB backup Unraid box, and another local backup plus a cloud backup of the critical stuff. The rest I can get again over time. The backup Unraid box is built from an identical mobo, CPU, and power supply, so I can swap out bits and pieces if there is more testing to be done, and I can fill it up from whatever drives are still good... Or pull out the failed drives and copy as much as I can get to the backup by mounting them on the backup box in a USB enclosure...
What are my next steps? Swap drives from the SAS card to the mobo itself? Try a new power supply (hard to find, since the mobo has a 10-slot plus 4-slot connector)? Or how do I test whether both drives are actually bad and went bad at exactly the same time... especially since both drives were purchased on the same day?? Both 10TB drives with read errors are still under warranty and can be RMA'd at WD, but I have to know they have actually gone bad, right?
For anyone reading this, two drives can go bad at the same time. I always thought "what are the chances", yet here I am. Please, please do local and cloud backups of the stuff you care about. Period. I've seen so many "I'm freaking out right now" threads. Thankfully my situation is more "it's gonna cost some money but meh"... Thanks for the help, everyone.
Edit: forgot to mention that once in a while another drive shows read errors too, a much older 4TB Red. It's occasional but also concerning. I have to think it's related...
JorgeB Posted November 16, 2020
18 hours ago, tiwing said: "Swap drives from the sas card to the mobo itself? Try a New Power supply"
Both good options.
tiwing (Author) Posted November 17, 2020
Tried swapping the drives to the mobo SATA ports. Same issue. So I've used unbalance to clear the data drive (frigging awesome plugin!), confirmed there is no data remaining on the drive, done a new config without parity or the data drive, and removed the two 10TB drives from my array.
Now that they're sitting on my desk, I'm trying to figure out the best way to test them before going the RMA route. I have a powered USB3 to SATA adaptor that I can plug into Windows, and I can easily download a Linux live CD on my desktop machine... what's the best way to test the drives? And if they test "good"... is a pre-clear all that's needed to put them back into my array? (Makes me nervous as f to do this, though.)
I'm also running a read check on the remaining disks in the array to see if anything else has issues. Cheers
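On the "best way to test" question from a Linux live CD: one very rough option, alongside the extended SMART test, is a plain read-only pass over the whole device that notes where reads fail. The sketch below is only that, a sketch under assumptions: `read_scan` is a hypothetical helper name, `/dev/sdX` is a placeholder for whichever device the drive shows up as, it needs root to open a raw device, and it is not a substitute for the manufacturer's diagnostic tool or a preclear. It never writes to the drive.

```python
import os


def read_scan(path, chunk_size=1 << 20):
    """Read `path` from start to end without writing anything.

    Returns (bytes_read, bad_offsets), where bad_offsets lists the starting
    byte offset of every chunk the OS refused to read.
    """
    bytes_read = 0
    bad_offsets = []
    # buffering=0 so every read() goes straight to the OS
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)  # works for files and Linux block devices
        dev.seek(0)
        offset = 0
        while offset < size:
            want = min(chunk_size, size - offset)
            try:
                data = dev.read(want)
            except OSError:
                # Unreadable region: note where it starts and skip past it.
                bad_offsets.append(offset)
                offset += want
                dev.seek(offset)
                continue
            if not data:
                break  # unexpected end of device
            bytes_read += len(data)
            offset += len(data)
    return bytes_read, bad_offsets


if __name__ == "__main__":
    import sys

    # Usage (as root): python3 read_scan.py /dev/sdX
    total, bad = read_scan(sys.argv[1])
    print(f"read {total} bytes, {len(bad)} unreadable chunk(s)")
    for off in bad:
        print(f"  read error near byte offset {off}")
```

A clean pass does not prove the drive is healthy, but repeatable read errors at the same offsets on a different machine, cable, and controller would point at the drive rather than the server.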
trurl Posted November 17, 2020
The disk manufacturers usually have free downloads that run on Windows, which you can use for testing. If you have an extra SATA port on your desktop, that might work better than a USB adaptor.
tiwing (Author) Posted November 17, 2020
My (old) PC does have a free SATA port. However, I can't accept a config change when it boots, because my wireless keyboard isn't recognized until after boot... argh. And my USB to SATA adaptor doesn't recognize the 10TB drives in Windows 10. So I've stuck the drives back in Unraid and am running binhex's preclear docker on them. I'll report back in a couple of days. Thanks