drmit Posted September 3, 2023

Hi all,

First off, thanks to everyone who contributes to this community. I started with Unraid less than two years ago after never having used a Linux-based system before, and while I haven't contributed to these forums much, I've found them immensely helpful.

My Unraid server is currently running 6.12.3 with an array of 3 x 10TB WD Red Plus drives with single parity (i.e. 2 x 10TB array disks and one 10TB parity drive). All drives are attached directly to the motherboard, so no controllers are in use.

I run parity checks once per month on the first of the month. Last month my parity drive ended up disabled, and a SMART check of the drive found one CRC error. I assumed at the time it might be a one-off, so I checked my cabling, which all looked fine, and rebuilt parity onto the same drive. A SMART extended test completed fine, apart from CRC error count = 1.

All seemed fine until this month's parity check. This time the parity drive became disabled, but I can't seem to read the SMART data on it (possibly because I can't figure out how to get it to spin up??). When I try to run smartctl -a /dev/sdg from the terminal, I get the response:

Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

I tried to stop the array, remove the disk from the array, restart in maintenance mode, stop the array, and re-add the disk to the array as a parity drive. While it appears in the drop-down list initially (sdg), after selecting it the UI refreshes and the option for sdg is then gone from the list. The drive is listed under Unassigned Devices (so I was able to copy out the disk log information), but I can't seem to do anything with it.

Could anyone provide a suggestion on what my next step should be? The drives are only about 1.5 years old so should still be under warranty, but if there is something simple I'm missing here I'd like to get to the bottom of it rather than leave my array unprotected while the RMA runs its course (I'm in New Zealand and bought the drives from Amazon, so who knows how long the RMA will take). Could a SATA/power cable or a SATA port on the motherboard be at fault here (they appear fine visually)?

I've attached the syslog and disk log, but if any other data would be useful I can add that too. Thanks for your help!

Attachments: syslog.txt, disk log information sdg.txt
trurl Posted September 3, 2023

Attach diagnostics to your NEXT post in this thread.
drmit Posted September 3, 2023 (Author)

Thanks for the tip trurl, diagnostics attached.

Attachment: sirpleximus-diagnostics-20230903-1425.zip
trurl Posted September 3, 2023

Connection problems with parity. Check connections, both ends, including power. Do you have power splitters?
drmit Posted September 3, 2023 (Author)

No power splitters, and I'm only using 3 max of the 4 SATA power connectors on any one branch from the PSU (which is new as well, a Seasonic Prime PX-650). I will attempt to check the cables with a multimeter, though those pins are quite narrow. Any idea on expected resistance?
drmit Posted September 3, 2023 (Author)

SATA power and data cables to the problem drive are now checked, and all seem to have good continuity on all of the pins. Resistance varies, but no one pin seemed much worse than another.

I was finally able to figure out how to 'detach' the drive from the Settings menu in Unassigned Devices. I then re-attached it, after which I was able to add it back into the array as a parity drive (it auto-populated the first parity drive slot once I re-attached it). I then rebuilt parity overnight and the check completed with no errors.

After 're-attaching' the drive and starting the array I was able to download the SMART log (attached), which states that the CRC error count is now 2 (it was 1 after the last failed parity check). It also says (for both 'errors'):

When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

Why would it say that when the error occurred during a parity check, not a SMART test? Another extended SMART test is now underway.

I'm a bit unsure what to do next. Should I RMA the drive? Replace the cables with new ones and hope parity continues to remain valid? My data is all backed up, but ironically my backup server (TrueNAS) seems to be having hardware issues right now, so I'd prefer to not take any chances.

What is anyone's experience in RMAing a drive with a few CRC errors? Will WD just replace it with a new one, or would they reject it if one of their tests seems to indicate the drive is fine?

Attachment: WDC_WD101EFBX-68B0AN0_VCJW6NVP-20230904-0806.txt

Edited September 3, 2023 by drmit: added bit about extended SMART test underway
itimpi Posted September 3, 2023

13 minutes ago, drmit said: "What is anyone's experience in RMAing a drive with a few CRC errors"

They may not accept it, as CRC errors are connection errors and rarely indicate a disk problem. As long as the value is not constantly increasing, you can ignore it. The value never resets to 0, so being steady is fine.
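
The "steady is fine, increasing is not" rule above can be checked by comparing the CRC count between two SMART reports. The following is only a minimal sketch: it assumes the report uses the usual smartctl attribute-table layout, where attribute 199 is the CRC error count and the raw value is the tenth column. The function names crc_error_count and crc_trend are hypothetical, and the sample table row is illustrative rather than copied from the attached logs; only the raw counts 1 and 2 come from this thread.

```python
import re

# Illustrative text in the attribute-table layout printed by `smartctl -A`.
# The flag/value/worst/thresh columns are placeholders, not the poster's data.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       2
"""


def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Match on the attribute ID, since the attribute name varies by vendor.
        if len(fields) >= 10 and fields[0] == "199":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def crc_trend(previous, current):
    """Classify the change between two readings of the CRC error count."""
    return "increasing" if current > previous else "steady"


if __name__ == "__main__":
    current = crc_error_count(SAMPLE)
    # 1 was the count reported after the first failed parity check.
    print(current, crc_trend(1, current))
```

Feeding it a saved report after each parity check would show whether the count is still climbing, which points back at cabling or ports, or has settled at its current value.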