SavellM, posted February 4, 2021 (edited February 4, 2021):
Hey guys,
So I built a new unRAID box (coming from TrueNAS) and it's been running for a few days. I ran a preclear on ALL drives and it passed. I restarted and it's now showing 2 drives disabled. I tried restarting again and got the same thing.
Right now there is no data on these drives, so it's not the end of the world, but I wanted to check whether these drives are actually dead or not. They may be, I won't deny that, but like I say, I just want to double check before I chuck them.
If they really are dead, can I just remove them from the pool? Would I need to do a new config and just take out these 2 drives? They aren't in use, so I don't need to replace them just yet.
Diagnostics attached: odin-diagnostics-20210204-2041.zip
JorgeB, posted February 5, 2021:
The diags were taken after rebooting, so we can't see what happened, but both disks look fine, and multiple disks getting disabled at the same time suggests a power/connection problem. Replace/swap cables/slots to rule that out, and if it happens again, post new diags before rebooting.
SavellM, posted February 5, 2021:
Thanks @JorgeB
They were actually disabled at different times, and only after a reboot: it was only once the server had started again that each one showed as disabled. All drives, including the SSDs, are on a backplane, SM bpn-846-sas3-el1. The first one was disabled a few days ago; then I restarted yesterday and this second one was disabled. I also can't really see any fault with them. Is there any way to re-enable and re-add them to the pool?
JorgeB, posted February 5, 2021:
2 hours ago, SavellM said: "and only after a reboot."
That suggests they are getting disabled on shutdown.
2 hours ago, SavellM said: "is there any way to re-enable and re-add them to the pool?"
Yes, see here: https://wiki.unraid.net/Troubleshooting#Re-enable_the_drive
You can rebuild both at the same time.
SavellM, posted February 5, 2021 (edited February 5, 2021):
Awesome, thanks. I ended up removing them from the array. I am now doing another full preclear with erase and clear, so hopefully it will stress them enough to show any further issues. If they pass, I'll re-add them.
I also got my SAS3 HBA this morning, so I replaced the SAS2 one. The backplane now has new cables and a new HBA, and it's SAS3 to SAS3 instead of going through a SAS2 HBA.
When you say that suggests it is getting disabled on shutdown, is there any significance to that? @JorgeB
JorgeB, posted February 5, 2021:
2 minutes ago, SavellM said: "When you say that suggests it is getting disabled on shutdown, is there any significance to that?"
Not really, but Unraid only disables a disk when a write fails. One thing you can do, instead of clicking shutdown/reboot in the GUI, is to stop the array first. Disks can't get disabled after the array is stopped, so if a disk gets disabled when you stop the array, grab diags then; if not, you can reboot or shut down safely.
SavellM, posted February 5, 2021:
7 minutes ago, JorgeB said: "Not really, but Unraid only disables a disk when a write fails. One thing you can do, instead of clicking shutdown/reboot in the GUI, is to stop the array first. Disks can't get disabled after the array is stopped, so if a disk gets disabled when you stop the array, grab diags then; if not, you can reboot or shut down safely."
Cool, I'll do that going forward if I remember. I reboot it so infrequently. Thanks for your help.
Do you know of any way to properly stress a drive to ascertain whether it is dead or dying?
JorgeB, posted February 5, 2021:
An extended SMART test is usually a good way to test.
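For readers who prefer the command line, an extended SMART test can also be started with `smartctl` from smartmontools. The Python sketch below is only an illustration and not something posted in the thread: the helper names are hypothetical, `/dev/sdX` is a placeholder for the real device, and it assumes `smartctl` is installed and run with root privileges. The functions only build the commands; `run` executes them when you call it.

```python
import subprocess


def extended_test_command(device):
    """Build the smartctl command that starts an extended (long) self-test."""
    return ["smartctl", "-t", "long", device]


def selftest_log_command(device):
    """Build the smartctl command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a command and return its stdout (assumes smartctl is installed)."""
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.stdout


# Example usage (placeholder device name, uncomment on a real system):
# print(run(extended_test_command("/dev/sdX")))
# ...wait for the test to finish, which can take many hours on large disks...
# print(run(selftest_log_command("/dev/sdX")))
```

The test runs inside the drive in the background, so the first command returns quickly and the result has to be read from the self-test log afterwards.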
SavellM, posted February 5, 2021:
42 minutes ago, JorgeB said: "An extended SMART test is usually a good way to test."
Thanks, I'll run that once the preclear/erase is complete in a few days, lol.
SavellM, posted February 9, 2021 (edited February 9, 2021):
So I stopped the array after those 2 drives passed SMART and preclear again. One of the other drives has just been set to disabled, so that makes 3 now.
@JorgeB any ideas? I wonder if it's because the drive is sleeping and doesn't wake up quickly enough for the stop command?
Diagnostics attached: odin-diagnostics-20210209-0920.zip
JorgeB, posted February 9, 2021:
10 minutes ago, SavellM said: "I wonder if it's because the drive is sleeping and doesn't wake up quickly enough for the stop command?"
I would say it's a strong possibility, assuming the disk was spun down:
Feb 9 09:19:37 Odin emhttpd: shcmd (11123): umount /mnt/disk11
Feb 9 09:19:37 Odin kernel: XFS (dm-8): Unmounting Filesystem
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=8589967488
Feb 9 09:19:37 Odin kernel: sd 1:0:17:0: Power-on or device reset occurred
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=8589967488
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=32768
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=32768
Feb 9 09:19:37 Odin emhttpd: shcmd (11124): rmdir /mnt/disk11
The error happened during unmount, although it was immediate, i.e., it's not like the disk took long to respond, so it could be a compatibility issue. Try spinning those disks up before shutdown or, if possible, connect them to a different controller, like the onboard SATA ports.
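To check a longer syslog for the same pattern, the `md: diskN read/write error` lines can be pulled out with a short script. This is a minimal Python sketch, not a tool from the thread: the function name and regular expression are assumptions based only on the line format in the excerpt above, and the sample text is that excerpt.

```python
import re
from collections import defaultdict

# Matches lines such as: "kernel: md: disk11 write error, sector=32768"
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error, sector=(\d+)")


def summarize_md_errors(log_text):
    """Return {disk: {"read": [sectors], "write": [sectors]}} for md error lines."""
    summary = defaultdict(lambda: {"read": [], "write": []})
    for line in log_text.splitlines():
        match = MD_ERROR.search(line)
        if match:
            disk, kind, sector = match.groups()
            summary[disk][kind].append(int(sector))
    return dict(summary)


SAMPLE = """\
Feb 9 09:19:37 Odin emhttpd: shcmd (11123): umount /mnt/disk11
Feb 9 09:19:37 Odin kernel: XFS (dm-8): Unmounting Filesystem
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=8589967488
Feb 9 09:19:37 Odin kernel: sd 1:0:17:0: Power-on or device reset occurred
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=8589967488
Feb 9 09:19:37 Odin kernel: md: disk11 read error, sector=32768
Feb 9 09:19:37 Odin kernel: md: disk11 write error, sector=32768
Feb 9 09:19:37 Odin emhttpd: shcmd (11124): rmdir /mnt/disk11
"""

result = summarize_md_errors(SAMPLE)
for disk, errors in result.items():
    print(disk, "read errors:", len(errors["read"]), "write errors:", len(errors["write"]))
```

Since Unraid only disables a disk when a write fails, the write-error entries are the ones to look at first when working out which disk got disabled and when.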
SavellM, posted February 9, 2021:
OK, will do.
It also seems to be only the Seagate drives running into this. The WD drives haven't been disabled so far, only Seagate. They were both sleeping at the time. As this is a new build, I wonder if it's something to do with RC2 and any sleep issues?
JorgeB, posted February 9, 2021:
2 hours ago, SavellM said: "it's something to do with RC2 and any sleep issues?"
I would suspect more an issue with the combination of LSI + those Seagate drives + sleep.
SavellM, posted February 9, 2021:
I'm sure that others must be running something similar, no? Those Supermicro backplanes are pretty common, as is my HBA. I posted in the prerelease channel too, so let's see.
Thank you so much for your help. After my next parity check finishes, I'll try spinning up all the disks and then stopping the array to check if that works.