timekiller Posted January 6, 2021

I am in the process of replacing drives. I removed a 4TB drive and replaced it with a 12TB drive. I'm 32% into a data rebuild and now Unraid is reporting that a drive is disabled, contents emulated. My logs show a ton of messages like:

Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979624
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979632
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979640
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979648
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979656
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979664
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979672
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979680
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979688
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979696
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979704
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979712
Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979720

I know I need to address this, but I'm nervous about doing anything while the data rebuild is running. Fortunately I do have 2 parity drives, so I should not lose any data, but I'll feel a lot better when the data rebuild is complete (in 2 days!).

What has me concerned right now is that I can't write to any share under /mnt/user. I can read from the array, and if I write to a specific disk at /mnt/disk#/<share> I can see the new content when I access the share through /mnt/user/<share>.

What I need to know is:

Is it safe to let the data rebuild continue?
Am I better off shutting down and seeing what is up with disk7?
If I shut down and it turns out disk7's SATA cable is loose or something else like that, would I create more issues with the disk suddenly coming back?

The nightmare scenario I'm imagining is that a new file is written while disk7 is offline, then I shut down and get disk7 back online, and when I boot back up the parity drives have the wrong calculations based on disk7 and the data rebuild is corrupted. Is this a valid concern?

Diagnostics attached: storage-diagnostics-20210106-1339.zip
timekiller Posted January 6, 2021

Hmm, actually I just realized disk8 is the one that is disabled/emulated, not disk7. Now I'm even more concerned.
trurl Posted January 6, 2021

Looks like you were rebuilding disk9 and started getting errors on disks 7 and 8. Disk8 became disabled, but Unraid can't disable disk7 as well because it can't disable more disks than you have parity.

Totally pointless to continue the rebuild.

Looks like connection issues. You probably disturbed the connections when you replaced the drive.

Shut down and check all connections on all disks, both power and SATA, including splitters. Reboot and post new diagnostics.

You will have to rebuild 8 and 9 now.
trurl Posted January 6, 2021

Or maybe you have a controller problem with heat or something, since it took several hours before the errors started.
trurl Posted January 6, 2021

Looks like a lot of problems with that Highpoint controller reported on the forum.
trurl Posted January 6, 2021

1 hour ago, timekiller said: I just realized disk8 is the one that is disabled/emulated, not disk7

25 minutes ago, trurl said: Looks like you were rebuilding disk9 and started getting errors on disk 7 and 8. Disk 8 became disabled, but it can't disable disk7 because it can't disable more disks than you have parity.

Most of what I said here was based on reading syslog. I can't see your Main page, but you should be able to see all of it very clearly in the webUI. The rebuilding / emulated / invalid disk9 is marked by a yellow triangle, the disabled / emulated disk8 is marked by a red X, and all the disks getting errors have a nonzero value in the Errors column.
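Editor's note: the syslog reading described above can be roughly approximated with a short script. This is only a sketch. It assumes the read-error lines follow the `md: diskN read error, sector=...` pattern quoted in the first post, and the function name `tally_read_errors` is hypothetical, not part of Unraid.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979624
READ_ERROR = re.compile(r"md: disk(\d+) read error, sector=(\d+)")


def tally_read_errors(lines):
    """Return a Counter mapping disk number -> number of read-error lines."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Three of the lines quoted in the first post.
sample = [
    "Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979624",
    "Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979632",
    "Jan 6 10:31:30 Storage kernel: md: disk7 read error, sector=5384979640",
]
print(tally_read_errors(sample))  # Counter({7: 3})
```

Fed the lines of a full syslog, a tally like this should show at a glance which disks are logging read errors, which is the same information the Errors column on the Main page presents.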
trurl Posted January 6, 2021

Do you have backups of everything important and irreplaceable? Might be a good time to copy things off to somewhere else before attempting the rebuild again.
trurl Posted January 6, 2021

37 minutes ago, trurl said: Totally pointless to continue rebuild.

The reason I say this is that Unraid must be able to reliably read all the other disks in order to reliably rebuild a disk. With dual parity, that relaxes to all the other disks except one. You have two that can't be reliably read, so the rebuild you were hoping to continue can't be good.
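Editor's note: the counting argument above can be written down as a tiny check. This is a simplified sketch, not how Unraid itself decides anything. It assumes a disk is either fully readable or not, whereas disk7 in this thread was only failing on some sectors, and the function name `can_rebuild` is hypothetical.

```python
def can_rebuild(parity_disks, unavailable_disks):
    """Return True if the missing disks can still be reconstructed.

    unavailable_disks: every data disk that cannot be read reliably,
    including the disk(s) currently being rebuilt.
    """
    return len(unavailable_disks) <= parity_disks


# Situation in this thread: dual parity, disk9 being rebuilt,
# disk8 disabled, disk7 returning read errors.
print(can_rebuild(2, {"disk9", "disk8", "disk7"}))  # False

# If disk7 were readable again, only two disks would be missing.
print(can_rebuild(2, {"disk9", "disk8"}))  # True
```

With two parity disks, the disk being rebuilt already uses up one of the two allowed gaps, so only one other disk may be unreadable at the same time.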
trurl Posted January 6, 2021

Since you haven't visited since I began posting to your thread, I will ping you just in case you get email notifications from the forum. @timekiller
timekiller Posted January 7, 2021

Thanks @trurl, I was dealing with some other (non-Unraid) stuff. Everything you said makes total sense. Going to shut down and check cabling now. Fortunately I am backed up. I sync everything to Google Cloud regularly, and the replaced drives have not been wiped yet, so worst case I can swap them in and create a fresh array and I shouldn't lose anything.

Also, yes, I can see everything you were talking about on my Main screen.
timekiller Posted January 7, 2021

Quick update: I powered down and reseated cables. Powered up and immediately disks 7 and 8 had problems again. Same as before: disk8 disabled, disk7 read errors. I powered down again and connected disk7 and disk8 to different ports on the SATA card. Powered up and disk8 is immediately offline again, but disk7 looks OK. The data rebuild can now continue, as there is enough parity information to rebuild. Fingers crossed that disk7 stays OK; only time will tell. If disk9 completes the data rebuild (in about 3 days) then I can power down, swap out disk8 and see what's up (probably have to RMA it).
trurl Posted January 7, 2021

6 hours ago, trurl said: Looks like a lot of problems with that Highpoint controller reported on the forum.
trurl Posted January 7, 2021

3 hours ago, timekiller said: swap out disk8 and see what's up (probably have to RMA it).

I doubt there is anything wrong with that disk; SMART in diagnostics looks OK. Same for disks 7 and 9. I guess you could run an extended SMART test on them, but since you are having problems with multiple disks, I think it is more likely that the disks are not the cause.
timekiller Posted January 7, 2021

2 minutes ago, trurl said: I doubt there is anything wrong with that disk, SMART in diagnostics looks OK. Same for disk7 and 9. I guess you could run an extended SMART test on them, but since you are having multiple disk problems I think it more likely that the disks are not the cause.

Hmm, I suppose I could try swapping in my old controllers to test. If it's the Highpoint, that would be unfortunate since I bought it on eBay a few months ago. Though at least that would give me an excuse to get something that will be supported in the next version of Unraid. Thanks @trurl