banangineer Posted December 13, 2021

General system info:
- 3x 10TB drives in the array
- 1x 10TB drive for parity
- 2x 2TB SSDs in the cache pool

Issue:
I was logging into my UnRAID panel to start a parity check, but on the Dashboard one of my drives (dev2) showed as Disabled, with its contents being emulated. I kept seeing sector errors, but couldn't pin down the exact cause. All of my 10TB drives were purchased around March 2021 and the server was built in July, so they're nowhere near end of life. The only thing I can think of is a forced shutdown from when my cat sat on the power button too long and the array wasn't able to shut down gracefully.

Attempts to resolve:
1. I found this thread in the UnRAID forum and followed Squid's advice. However, after stopping the array and attempting to re-add the drive, it throws an error and stays unassigned, and the problem drive no longer appears in the dropdown.
2. I tried again, this time following the official UnRAID docs, to remount the drive and rebuild it from parity.
3. Restarted the UnRAID system a few times as a sanity check.
4. Checked the SATA connections to the motherboard and drives, unplugged and replugged them; nothing changed.

I've done everything short of pulling the drive, plugging it into my main PC, and running further tests on it there. Before I do that, I wanted to forward my diagnostics and see if someone else has a better idea of what's going on and how to fix it.

zawarudo-diagnostics-20211213-0403.zip
JorgeB Posted December 13, 2021

Was this disk2?

Model Family: Seagate Enterprise Capacity 3.5 HDD
Device Model: ST10000NM0086-2AA101
Serial Number: ZA2E1Z0F

If yes, it appears to be failing and needs replacing with a new one.
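For anyone following along: the identification and SMART data being quoted here come from smartmontools, which you can run yourself from the UnRAID console. A minimal sketch (the device path `/dev/sdX` is a placeholder, not from the thread; match it to the disk shown under Main → Array Devices):

```shell
# Sketch: pull the full SMART report (identity, attributes, error log) for one disk.
# /dev/sdX is a placeholder device -- substitute the real device for disk2.
DEV="${DEV:-/dev/sdX}"

if command -v smartctl >/dev/null && [ -e "$DEV" ]; then
    smartctl -x "$DEV"
else
    echo "smartctl not available or $DEV not present; skipping"
fi
```

UnRAID's diagnostics zip already bundles these reports per disk, so this is mainly useful for a quick manual check.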
banangineer Posted December 13, 2021

25 minutes ago, JorgeB said:
Was this disk2? If yes, it appears to be failing and needs replacing with a new one.

Yes, that's the one that's currently failing. Hopefully I can still RMA it with Seagate since it was purchased earlier this year.

In the meantime, I have another drive registered in the array as disk3. That drive is not attached to either share, and I'm fine with dropping its contents. Could I shrink the array from 3 + 1 parity to 2 + 1 parity, moving disk3 to replace disk2, and re-add my RMA'd or replacement drive as disk3 later? Or is that not how shrinking the array works?

Also, for my own education, can you tell that's the failing drive because of the sector errors? Or are there other useful logs that tell you that?
ChatNoir Posted December 13, 2021

Some SMART attributes do not look good, in particular #187, 197, 198.

ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--  054   049   044    -    66767508
  3 Spin_Up_Time            PO----  092   091   000    -    0
  4 Start_Stop_Count        -O--CK  100   100   020    -    43
  5 Reallocated_Sector_Ct   PO--CK  100   100   010    -    2576
  7 Seek_Error_Rate         POSR--  080   060   045    -    4395031003
  9 Power_On_Hours          -O--CK  097   097   000    -    2837
 10 Spin_Retry_Count        PO--C-  100   100   097    -    0
 12 Power_Cycle_Count       -O--CK  100   100   020    -    42
184 End-to-End_Error        -O--CK  100   100   099    -    0
187 Reported_Uncorrect      -O--CK  001   001   000    -    282
188 Command_Timeout         -O--CK  100   097   000    -    1 9 9
189 High_Fly_Writes         -O-RCK  071   071   000    -    29
190 Airflow_Temperature_Cel -O---K  062   047   040    -    38 (Min/Max 37/38)
191 G-Sense_Error_Rate      -O--CK  100   100   000    -    268
192 Power-Off_Retract_Count -O--CK  100   100   000    -    10
193 Load_Cycle_Count        -O--CK  098   098   000    -    5469
194 Temperature_Celsius     -O---K  038   053   000    -    38 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-  078   064   000    -    66767508
197 Current_Pending_Sector  -O--C-  099   094   000    -    880
198 Offline_Uncorrectable   ----C-  099   094   000    -    880
199 UDMA_CRC_Error_Count    -OSRCK  200   200   000    -    0
200 Pressure_Limit          PO---K  100   100   001    -    0
240 Head_Flying_Hours       ------  100   253   000    -    2809h+04m+29.660s
241 Total_LBAs_Written      ------  100   253   000    -    17757413192
242 Total_LBAs_Read         ------  100   253   000    -    80081644090

Possibly other things too.
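As a side note for anyone reading later: the attributes flagged here (plus #5, reallocated sectors) can be picked out of `smartctl -A` output mechanically. A minimal sketch; the watch list is illustrative, not an official set of thresholds, and vendors interpret raw values differently:

```python
# Sketch: flag worrying SMART attributes from `smartctl -A` output.
# The watch list is illustrative -- these are the usual "bad sector" counters.
WATCH = {5: "Reallocated_Sector_Ct", 187: "Reported_Uncorrect",
         197: "Current_Pending_Sector", 198: "Offline_Uncorrectable"}

def flag_attributes(smart_table: str):
    """Return (id, name, raw_value) for watched attributes with nonzero raw values."""
    flagged = []
    for line in smart_table.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isdigit():
            attr_id = int(parts[0])
            if attr_id in WATCH:
                raw = parts[-1]  # raw value is the last column for plain counters
                if raw.isdigit() and int(raw) > 0:
                    flagged.append((attr_id, parts[1], int(raw)))
    return flagged

sample = """\
  5 Reallocated_Sector_Ct   PO--CK  100 100 010 - 2576
187 Reported_Uncorrect      -O--CK  001 001 000 - 282
197 Current_Pending_Sector  -O--C-  099 094 000 - 880
199 UDMA_CRC_Error_Count    -OSRCK  200 200 000 - 0
"""
print(flag_attributes(sample))
# -> [(5, 'Reallocated_Sector_Ct', 2576), (187, 'Reported_Uncorrect', 282),
#     (197, 'Current_Pending_Sector', 880)]
```

On this disk, #5, #187, #197 and #198 are all nonzero, which is exactly the pattern that earns a "failing, replace it" verdict.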
JorgeB Posted December 13, 2021

34 minutes ago, banangineer said:
Could I shrink the array from 3 + 1 parity to 2 + 1 parity, moving disk3 to replace disk2, and re-add my RMA'd or replaced drive back as disk3 later?

No, but you could move any data from the emulated disk2 to the other disk(s) and then do a new config and re-sync parity.

35 minutes ago, banangineer said:
Also, for my own education, can you tell that's the failing drive because of the sector errors?

The disk has a lot of pending sectors.
banangineer Posted December 13, 2021

10 minutes ago, JorgeB said:
No, but you could move any data from the emulated disk2 to the other disk(s) and then do a new config and re-sync parity.

You make a good point. In the meantime I'm going to replace the drive with another drive of similar spec and hope for the best. I'll avoid shrinking the array and causing further issues.

13 minutes ago, JorgeB said:
The disk has a lot of pending sectors.

Thanks, I'll do some research to get a better idea; I appreciate the direction.

14 minutes ago, ChatNoir said:
Some SMART attributes do not look good, in particular #187, 197, 198.

I remember #187 showing up as a warning in UnRAID, so that checks out. I'll get a replacement drive in ASAP and ship off my dud for an RMA. I'll update this thread when the new drive is in and the bad drive has been replaced.
trurl Posted December 13, 2021

You can see SMART warnings on the Dashboard page. Set up Notifications to alert you immediately by email or another agent as soon as a problem is detected.
banangineer Posted December 15, 2021

Just to update the situation: I bought an IronWolf NAS drive and swapped it in. Everything seemed fine up until the 19% mark of the rebuild, when I started getting tons of errors on disk1 (one of the drives that had been working). I had started the rebuild and gone out to get food; several hours later I came back to the attached image and tons of chirping from disk1.

Can I assume I have a second failed drive as well? My parity and disk3 are still fine, but disk1 has several errors and they keep climbing. I was able to grab most of what I needed from it, albeit slowly.
banangineer Posted December 15, 2021

I'm updating this again to include the diagnostics. Even though the SMART check passed on disk1, the seek error rate seems extremely high. I can also confirm that I began losing files while the disk was rebuilding and erroring.

In case anyone would like to hear it for themselves: https://imgur.com/a/QUrhx7z

zawarudo-diagnostics-20211214-2302.zip
JorgeB Posted December 15, 2021

Disk1 is also failing, which means single parity can't help; you can try using ddrescue with both failing disks.
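For reference, a typical GNU ddrescue run looks like the sketch below. Everything here is a placeholder, not from the thread: `/dev/sdX` stands for the failing disk, `/dev/sdY` for an equal-or-larger healthy target, and the map file path is just an example. Both disks must be unassigned/unmounted while this runs:

```shell
# Sketch of a typical GNU ddrescue recovery (placeholders throughout).
FAILING="${FAILING:-/dev/sdX}"    # failing source disk
TARGET="${TARGET:-/dev/sdY}"      # healthy destination, same size or larger
MAPFILE="${MAPFILE:-/boot/ddrescue-disk1.map}"  # lets interrupted runs resume

if command -v ddrescue >/dev/null && [ -e "$FAILING" ]; then
    # Pass 1: grab everything readable quickly, skipping bad areas (-n = no scraping).
    ddrescue -n "$FAILING" "$TARGET" "$MAPFILE"
    # Pass 2: go back and retry the bad areas a few times (-r3 = 3 retry passes),
    # resuming from the map file written by pass 1.
    ddrescue -r3 "$FAILING" "$TARGET" "$MAPFILE"
else
    echo "ddrescue not installed or $FAILING not present; skipping"
fi
```

The two-pass approach matters on a dying disk: the fast first pass rescues the bulk of the data before the drive degrades further, and only then does the slow retry pass hammer the bad regions.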
banangineer Posted December 15, 2021

5 hours ago, JorgeB said:
Disk1 is also failing, which means single parity can't help; you can try using ddrescue with both failing disks.

At this point I was able to gather everything I needed from the failing drive, albeit slowly. It didn't kill the share completely, so I was lucky it could still move the data somewhat. Both drives are essentially dead to me now. As long as I can keep my cache drives, so I at least have my Docker and VM data, I'm fine with nuking the array.

What are the next steps? Can I remove the second failed drive, create a new config with the three remaining drives, and add back the other drives when they're back from RMA?
JorgeB Posted December 15, 2021

12 minutes ago, banangineer said:
Can I remove the second failed drive, create a new config with the three remaining drives, and add back the other drives when they're back from RMA?

Yes. Drives added later will need to be cleared, but they will be empty, so that's not a problem.
banangineer Posted December 15, 2021

1 minute ago, JorgeB said:
Yes. Drives added later will need to be cleared, but they will be empty, so that's not a problem.

That's fine with me. If I go with a new config for my array, does that reset my cache drives? They're pooled, and even though I've backed up the most important parts, can I assume that since they're detached from the array they're unaffected if I rebuild the array from scratch?
trurl Posted December 15, 2021

10 minutes ago, banangineer said:
If I went with a new config for my array, does that reset my cache drives?

New Config only resets your disk assignments. It doesn't change anything on any disk, except that parity is rebuilt on any disk assigned to a parity slot.
banangineer Posted December 15, 2021

2 minutes ago, trurl said:
New Config only resets your disk assignments.

I think that may be what I have to do for the time being. I don't really want to drop more cash on drives until after I've RMA'd the two failing ones. Assuming the new config only affects the array, I'll go ahead and remove the second failed drive for RMA, then build a new config using my three (hopefully) working drives. Once the others are back from RMA, I'll add a second parity disk and just keep building out the array from there. Is this the recommended approach?
trurl Posted December 16, 2021

New Config has a checkbox to say parity is already valid, but it won't be valid if any disks were removed, so don't check the box and parity will rebuild.
trurl Posted December 16, 2021

Also, if any disk with data on it shows as unmountable, DON'T format it; ask for advice. And make sure you double-check connections when mucking about inside.