Couch (posted January 7, edited):

Hello, I am fairly new to troubleshooting on Linux-based OSes. I have had an Unraid box running for about 2 years with little to no issues, but now disk 8 in my array keeps being disabled.

The following steps have been tried:
- New Config
- new controller/cables
- replaced the disk (I only have very used disks, but plan to replace them as soon as I can afford it)
- reformatted/precleared the disk

I have tried looking at the log files, but I am a little lost as to where to find the right ones and what to look for. Please help, and please let me know how I can make helping easier.

trurl (posted January 7):

I wish you had asked before doing anything. Most of the things you tried were the wrong things to do.

Looks like disk 5 is empty or nearly so. Is it supposed to be like that?

Attach Diagnostics to your NEXT post in this thread.

Couch (Author, posted January 7):

1 hour ago, trurl said:
> Wish you had asked before doing anything. Most things you tried were the wrong things to do. Looks like disk 5 is empty or nearly so. Is it supposed to be like that? Attach Diagnostics to your NEXT post in this thread.

No worries, it fixed the issue the last time I had it. As mentioned, all my disks are heavily used and I have plenty of them.

Can you elaborate on why those were the wrong steps?

Yes, I am nearing my storage cap and will need to expand soon.

Attached are my diagnostics. Thank you for helping.

hera-diagnostics-20240107-2006.zip

trurl (posted January 7):

Is disk5 supposed to be nearly empty? And nothing assigned as disk6?

Since you rebooted after disk8 became disabled, we can't see anything in syslog about why. The SMART report for disk8 looks mostly OK, except for a large number of UDMA CRC errors. That is probably the reason: a bad connection.

Take a look at your Dashboard page. It should be showing you SMART (👎) warnings for several of your disks. Post a screenshot of that.

trurl (posted January 7):

I have reviewed SMART for each of your array disks. Fortunately, all their SMART warnings are UDMA CRC errors, which indicate bad connections. In particular, disks 5, 7, and 8.

How are these disks powered? Any power splitters involved?

The first thing you should do is check all disk connections, SATA and power, at both ends, including splitters. Then reboot and post new diagnostics, and we can continue to work on disk8 as well as a few other problems you have.

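For anyone who wants to watch the CRC counter themselves, here is a minimal sketch, assuming the attribute table printed by `smartctl -A /dev/sdX` has already been saved as text. The helper name `udma_crc_errors` and the sample values are made up for illustration and are not taken from the diagnostics in this thread. The raw count is cumulative, so what matters is whether it keeps rising after the cables have been reseated.

```python
from typing import Optional

# Made-up sample in the usual `smartctl -A` table layout (not real data from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1423
"""


def udma_crc_errors(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last of them.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


print(udma_crc_errors(SAMPLE))
```

Comparing the number before and after checking the connections gives a rough idea of whether the cabling problem is still there.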
Couch (Author, posted January 7):

Thank you, I will go through these steps tomorrow.

trurl (posted January 7):

1 hour ago, Couch said:
> Can you elaborate on it being the wrong steps?

6 hours ago, Couch said:
> New config

This forces Unraid to accept the disks just as they are, and (optionally) rebuild parity. If the disk had truly failed, there would have been no way to rebuild it, so all of its data would be lost. Rebuilding data disks is the whole reason you have parity. If you're going to New Config every time a disk gets disabled, you might as well not have parity at all and forget about recovering any data.

And even if the physical disk could still be used with its contents in the New Config, it probably wouldn't agree with parity, so if you didn't let parity rebuild, the array would be out of sync.

And, while a disk is disabled, Unraid is still emulating it from parity. It is possible many files could have been written to the emulated disk. By rebuilding parity instead of rebuilding the data disk, all those writes would be lost, since they were not on the physical disk. It's even possible that some of the lost writes would be filesystem metadata, which could result in corruption of the filesystem of the physical disk.

6 hours ago, Couch said:
> reformat disk/preclear it

Format is a write operation. If you format a disk while it's still in the array, parity is updated just as with any write operation. So, after formatting a disk in the array, parity agrees that it is a formatted disk with an empty filesystem, and any data it might have had can't be recovered from a parity rebuild.

If instead you did this outside the array, it was mostly pointless. Assuming you did it in the order stated, formatted then precleared, the format would be totally pointless, since the disk would then be cleared. If instead you precleared then formatted, the only way the disk would be accepted into the array that way is if you did New Config to force it. If you want a formatted disk in the array, you should format it in the array.

Unraid only requires a clear disk when adding it to a new slot in an array that already has valid parity. This is so parity will remain valid, since a clear disk (all zeros) has no effect on parity. If you add a disk to a new slot in the parity array, Unraid will clear it if it hasn't been precleared, and then you can format it in the array so it can accept files.

Clearing a disk has nothing at all to do with disks that are already part of the array, though some will use preclear to test a new disk before using it as a replacement.

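To illustrate why a cleared disk leaves parity untouched, here is a minimal sketch, assuming single parity computed as a byte-wise XOR across the data disks. The function name `xor_parity` and the tiny three-byte "disks" are invented for the example; this is not Unraid's actual code.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized disks (toy model of single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Two tiny made-up "disks" that already hold data.
disk1 = bytes([0x12, 0x34, 0x56])
disk2 = bytes([0xAB, 0xCD, 0xEF])
parity_before = xor_parity([disk1, disk2])

# A cleared disk is all zeros, and x ^ 0 == x for every byte.
cleared = bytes(3)
parity_after = xor_parity([disk1, disk2, cleared])

print(parity_before == parity_after)
```

A disk holding anything other than zeros would change the XOR result, which is why a non-cleared disk cannot simply be dropped into a new slot without updating parity.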
Couch (Author, posted January 9):

On 1/7/2024 at 8:32 PM, trurl said:
> I have reviewed SMART for each of your array disks. Fortunately all their SMART warnings are UDMA CRC, which indicate bad connections. In particular, disks 5, 7, and 8. How are these disks powered? Any power splitters involved? First thing you should do is check all disk connections, SATA and power, both ends, including splitters. Then reboot, post new diagnostics, and we can continue to work on disk8 as well as a few other problems you have.

Okay, I checked the connectors and they all seem to be plugged in correctly. I am using 1 splitter for 2 disks, but they are not the disks with issues. The power supply is an older used one, so it might have bad connectors.

Attached are the new diagnostics: hera-diagnostics-20240109-1029.zip

Again, thank you for your help and your guidance.

trurl (posted January 9):

Disks 1, 2, 3, 4, and 7 all mount and have data. Disk5 also mounted but is empty or nearly so. Nothing is assigned as disk6. And disabled/emulated disk8 mounts and has data, so that is good.

You didn't answer this question:

On 1/7/2024 at 2:20 PM, trurl said:
> Is disk5 supposed to be nearly empty? And nothing assigned as disk6?

It is not clear from your initial description whether or not you let parity rebuild when you did New Config, but I assume parity must be valid, since disk8 is being emulated just fine. The contents of emulated disk8 will be the result of rebuilding disk8.

Unraid disables a disk when a write to it fails for any reason. This is because it is no longer in sync with the array. After a disk becomes disabled, it isn't used again until rebuilt (or you force it with New Config). Instead, Unraid emulates the disk. Any reads of the disabled disk instead read all other disks and get its data from the parity calculation. Any writes to the disabled disk instead update parity as if the disk had been written. The initial failed write, and any subsequent writes to the disabled disk, can be recovered by rebuilding.

If we are reasonably confident everything is working well, usually we will just say rebuild on top of the same disk. But it might be safer to rebuild to a new disk and keep the original with its existing contents in case of problems rebuilding. Do you have another disk you can use for rebuilding disk8?

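The emulation described above can be sketched in a few lines, assuming single parity is a byte-wise XOR of the data disks. The helper `xor_blocks` and the byte values are hypothetical and only model the idea; they are not Unraid's implementation.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR any number of equally sized byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Toy array: two healthy data disks plus the disk that later gets disabled.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0a\x0b\x0c"
disk8_physical = b"\x55\x66\x77"
parity = xor_blocks(disk1, disk2, disk8_physical)

# Read while disk8 is disabled: its contents come from parity and the other disks.
emulated = xor_blocks(parity, disk1, disk2)
assert emulated == disk8_physical

# Write while disk8 is disabled: only parity changes, as if disk8 had been written.
new_contents = b"\x99\x88\x77"
parity = xor_blocks(disk1, disk2, new_contents)

# A rebuild recovers the new contents; the physical disk still holds the old ones.
rebuilt = xor_blocks(parity, disk1, disk2)
print(rebuilt == new_contents, disk8_physical == new_contents)
```

In this toy model, rebuilding parity from the stale physical disk (the New Config route) would discard `new_contents`, while rebuilding the disk from parity keeps it.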
Couch (Author, posted January 9):

55 minutes ago, trurl said:
> Disks 1,2,3,4,7 all mount and have data. Disk5 also mounted but is empty or nearly so. Nothing assigned as disk6. And, disabled/emulated disk8 mounts and has data so that is good. You didn't answer this question: Not clear from your initial description whether or not you let parity rebuild when you did New Config, but I assume parity must be valid since disk8 is being emulated just fine. The contents of emulated disk8 will be the result of rebuilding disk8. Unraid disables a disk when a write to it fails for any reason. This is because it is no longer in sync with the array. After a disk becomes disabled, it isn't used again until rebuilt (or you force it with New Config). Instead, Unraid emulates the disk. Any reads of the disabled disk instead read all other disks and get its data from the parity calculation. Any writes to the disabled disk instead update parity as if the disk had been written. The initial failed write, and any subsequent writes to the disabled disk, can be recovered by rebuilding. If we are reasonably confident everything is working well, usually we will just say rebuild on top of the same disk. But, it might be safer to rebuild to a new disk and keep the original with its existing contents in case of problems rebuilding. Do you have another disk you can use for rebuilding disk8?

Very sorry, I do not know why I did not answer that.

Disk 5 has been replaced recently, so I had to move stuff away from it temporarily. Disk 6 missing is on me: I put the disks in slots 7 and 8 by mistake, but have not bothered moving them.

Yes, I let it rebuild every time there is anything, and it runs parity checks weekly.

Yes, I will try rebuilding, hopefully over the weekend when I have a bit of time.

Thank you very much!

itimpi (posted January 9):

27 minutes ago, Couch said:
> runs parity checks weekly

This is probably excessive. It is more common to run them monthly, or even quarterly.

trurl (posted January 9):

40 minutes ago, Couch said:
> disk 5 has been replaced recently, so i had to move stuff away from it temporarily.

There is no good reason to move data off a disk you are going to rebuild. The whole point of a rebuild is that the replacement will have all the data the original had. And if the original disk was actually disabled, there are some good reasons not to move data off the emulated disk, since all the other disks have to get involved in emulating the data for that disk.

40 minutes ago, Couch said:
> it runs parity checks weekly

Most only do this monthly or even less frequently. Parity is updated in real time, so a check just verifies that parity is still in sync. But a 4TB parity check should take much less than a day.

trurl (posted January 9):

12 minutes ago, trurl said:
> And if the original disk was actually disabled, there are some good reasons to not move data off the emulated disk, since all the other disks have to get involved emulating the data for the disk.

If you are concerned for the data and worried about the rebuild, then the better approach would be to COPY, not MOVE, the data somewhere OFF the array. That way you don't modify the original disk, and you don't modify parity and the other array disks while your array is compromised.

And if you are concerned for the data, the best approach is to have another copy of anything important and irreplaceable on another system. Parity is not a substitute for backup.

Couch (Author, posted January 9):

22 minutes ago, itimpi said:
> This is probably excessive. It is more common to run them monthly, or even quarterly.

Ah, well, I might have to adjust a bit then.

20 minutes ago, trurl said:
> No good reason to move data off a disk you are going to rebuild. The whole point of rebuild is so the replacement will have all the data the original had. And if the original disk was actually disabled, there are some good reasons to not move data off the emulated disk, since all the other disks have to get involved emulating the data for the disk. Most only do monthly or even less frequently. Parity is realtime so check is just a check that parity is still in sync. But 4TB parity check should take much less than a day.

I moved the data because I did not have spare disks at the time, so I had to consolidate my data across disks. And I am aware parity does not work as a backup.

Thank you ❤️

Couch (Author, posted January 14):

Okay, my backup is taking way longer than expected. There are currently 39 hours remaining for the total backup, so I'll return as soon as it has had a chance to finish.

Couch (Author, posted January 14):

On 1/9/2024 at 5:15 PM, trurl said:
> If you are concerned for the data and worried about rebuild, then the better approach would be to COPY not MOVE the data somewhere OFF the array. That way you don't modify the original disk, and you don't modify parity and the other array disks while your array is compromised. And if you are concerned for the data, best approach is to have another copy of anything important and irreplaceable on another system. Parity is not a substitute for backup.

Just making sure I have the steps correct here, since my other moves were not the greatest:

1. back up data
2. shut down the array
3. remove the bad disk, add the other disk, and start the array
4. let it rebuild
5. shut down the array
6. create a New Config and assign disks back where they belong
7. rebuild the array

Am I missing something?

trurl (posted January 15):

Maybe there is some confusion; it could be me.

When you assign the disk to the same slot as the removed disk and start the array, the new disk is rebuilt so that it contains the contents of the missing disk.

I have no idea why you think another New Config and rebuild is needed after that.

trurl (posted January 15):

https://docs.unraid.net/unraid-os/manual/storage-management/#replacing-faileddisabled-disks

trurl (posted January 15):

1 minute ago, trurl said:
> Maybe there is some confusion, could be me.

Maybe new diagnostics would clarify your current situation.

Couch (Author, posted January 16):

On 1/15/2024 at 2:09 PM, trurl said:
> Maybe there is some confusion, could be me. When you assign the disk to the same slot as the removed disk and start the array, the new disk is rebuilt so it contains the contents of the missing disk. Have no idea why you think another New Config and rebuild is needed after that.

Ah, I might have misunderstood then. So I just replace the disk and let it rebuild, nothing else?

Also, nothing has changed yet: my backup is currently at 95%, and I am waiting for it to finish before making changes.

Couch (Author, posted January 17, edited), marked as Solution:

Okay, the backup is done and the solution seems to have worked.

Steps taken:

1. backed up the system and shut it down
2. replaced the unresponsive disk with another one
3. started up the system and ran preclear on the disk
4. assigned the disk to the now-empty slot and started the array
5. rebuilt the disk

So far it looks to be solved. Thank you so much for the help! ❤️