TechTitus Posted May 15, 2021
Hi all, a few days ago my dockers started showing as unable to update, and my SSD now drops out of the cache array at random times, killing everything cache related. Does anyone have any insight into this? Diagnostics are attached to this thread.
homesrv01-diagnostics-20210515-0112.zip
JorgeB Posted May 15, 2021
This sometimes helps, since some NVMe devices have issues with power states on Linux. On the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0
Reboot and see if it makes a difference.
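After the reboot it can be worth confirming that the option actually reached the kernel. The sketch below is only an illustration, not something from JorgeB's post: it assumes a Python 3 interpreter is available (a stock Unraid install may not include one) and that the text after "append" ends up in /proc/cmdline, as it normally does on Linux. The helper name kernel_param is hypothetical.

```python
from pathlib import Path

# Option name taken from the post above.
PARAM = "nvme_core.default_ps_max_latency_us"


def kernel_param(cmdline: str, name: str):
    """Return the value of `name` in a kernel command line, or None if absent.

    A bare flag without "=" yields an empty string. If the option is
    repeated, this helper returns the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


def main():
    path = Path("/proc/cmdline")  # assumption: running on the Linux host itself
    if not path.exists():
        print("No /proc/cmdline found; run this on the server itself.")
        return
    value = kernel_param(path.read_text(), PARAM)
    if value is None:
        print(f"{PARAM} is not on the kernel command line")
    else:
        print(f"{PARAM}={value}")


if __name__ == "__main__":
    main()
```

If the option is reported as missing, the most likely cause is that it was added to a boot entry other than the default one.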
TechTitus Posted May 15, 2021
I added that and now my docker isn't starting at all.
TechTitus Posted May 15, 2021 (edited May 15, 2021)
Also, the app store shows an error and says to post in the community forums: https://pastebin.com/LTk294hd
TechTitus Posted May 15, 2021
Does anyone think a new install on the boot drive would help? It looks like the cache drive is dropping in and out every few seconds.
JorgeB Posted May 16, 2021
TechTitus said: "I added that and now my docker isn't starting at all."
Please post new diags.
TechTitus Posted May 16, 2021
JorgeB said: "Please post new diags."
I completely wiped the install and transferred my key to a new USB. It seems to have fixed the issue, but now that I'm on 6.9.2 I can see everything except the files in the Media folder.
Update: I set new permissions and I can see most files, but it seems to get stuck on certain disks. Not sure what's going on. I've attached the newest diag to this post.
nas01-diagnostics-20210516-1223.zip
itimpi Posted May 16, 2021
According to your diagnostics you seem to be getting read errors on multiple disks. I would carefully check the power and SATA cables on all drives, and also run a file system check on each of the array drives.
TechTitus Posted May 17, 2021 (edited May 17, 2021)
itimpi said: "According to your diagnostics you seem to be getting read errors on multiple disks. I would carefully check the power and SATA cables on all drives, and also run a file system check on each of the array drives."
Maybe I need a new backplane, because when I run the file system check it reports an I/O error on at least half the disks. Does anyone know how I can test whether the disks are still any good outside of Unraid?
ETA: Is there a way I can test each controller to make sure they're functioning properly? I assume they are if Unraid is able to see the drives to mount, but I want to be sure.
TechTitus Posted May 20, 2021
Now the array is taking an extremely long time to mount and I'm getting UDMA CRC errors. Does anyone know where I can start to narrow down what's causing this?
nas01-diagnostics-20210520-0134.zip
itimpi Posted May 20, 2021
TechTitus said: "getting UDMA CRC errors."
These are connection issues, normally power or SATA cable related, although it can also be the controller to which the disks are attached.
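Since the CRC counter is cumulative, what matters when narrowing this down is whether it keeps climbing after a cable, slot or controller swap. The sketch below is an illustration under stated assumptions, not part of itimpi's reply: it assumes the attribute table has been saved from `smartctl -A` for the disk in question, that the drive reports the counter as SMART attribute 199 (the usual ID, though names and IDs vary by vendor), and the sample values are made up. The function names are hypothetical.

```python
def udma_crc_count(smart_attributes: str):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text.

    Returns None if the attribute row is missing or its raw value is not
    a plain integer.
    """
    for line in smart_attributes.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def crc_increased(before: str, after: str) -> bool:
    """True if the CRC counter rose between two saved readings."""
    old, new = udma_crc_count(before), udma_crc_count(after)
    return old is not None and new is not None and new > old


# Made-up sample in the usual `smartctl -A` layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

if __name__ == "__main__":
    print(udma_crc_count(SAMPLE))
```

Taking one reading before a swap and another after some disk activity shows whether the change made any difference; a counter that stays flat suggests the link is now clean, even though the old total never resets.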
TechTitus Posted May 20, 2021 (edited May 20, 2021)
itimpi said: "These are connection issues - normally power or SATA cable related although it can also be the controller to which the disks are attached."
I think this might be a PSU issue. I took one of the drives that was getting gap errors in DiskSpeed and hooked it up directly using the SATA cable and power connector that had been connected to my cache disk, and Unraid could not detect it. I then put it back into its slot in the backplane and it was detected. I also noticed these errors only happen on disks over 5TB, and in DiskSpeed the only disks with gap issues were likewise over 5TB.
TechTitus Posted May 21, 2021
To give an update: I've swapped disks to other backplanes, used different SAS cables, connected to different controllers, and swapped out the PSU for a brand new one. The only disks I'm having issues with are ones that I've shucked. None of the non-shucked disks are giving me these issues. I'm going to purchase a disk from the store and place it in the exact slot where a disk is giving me issues, to determine whether it's the disk.
TechTitus Posted May 24, 2021
I have removed all shucked disks from the system. Mounting now completes instantly (it used to take hours) and I'm no longer getting any UDMA errors. All physical hardware has remained the same apart from the removal of those disks.