JorgeB Posted July 14, 2021
The 2nd device wasn't added to the pool. Try again: stop array, unassign sdi, start array, stop array, re-assign sdi, start array, and post new diags.
Kosmos Posted July 14, 2021
1 hour ago, JorgeB said: The 2nd device wasn't added to the pool. Try again: stop array, unassign sdi, start array, stop array, re-assign sdi, start array, and post new diags.
I tried this previously, as suggested by another post (that is what I meant by "When setting up the pool (again) everything works fine"), but it didn't make the pool consistent across reboots or over time. The last time I tried this, the problem recurred after a while. I checked the SSD, but it seems OK: data is stored persistently for a week even without power.
server-diagnostics-20210714-1857-newpool.zip
JorgeB Posted July 15, 2021
11 hours ago, Kosmos said: The last time I tried this, the problem recurred after a while.
Now it's working correctly. If it stops working, I'd need to see diags showing the problem, taken before rebooting.
Kosmos Posted July 16, 2021
On 7/15/2021 at 8:30 AM, JorgeB said: Now it's working correctly. If it stops working, I'd need to see diags showing the problem, taken before rebooting.
Alright, it just started happening again. Here is a log. The thing is, it only occurs after restarts, not while running, so I can't really provide logs from before. One thing I noticed is that the encryption symbol (green lock, to the left of the pool name) was there before the reboot and is now gone. So maybe it's some issue with the drive header being corrupted?
server-diagnostics-20210716-1300_again.zip
Edited July 16, 2021 by Kosmos
JorgeB Posted July 16, 2021
19 minutes ago, Kosmos said: The thing is, it only occurs after restarts, not while running, so I can't really provide logs from before.
That is strange. I do see some data corruption detected, so you should run memtest and then a scrub on the pool. Other than that, next time grab diags before rebooting (stop the array, grab diags, then reboot), and new diags after rebooting if the same thing happens.
Kosmos Posted July 16, 2021
3 minutes ago, JorgeB said: That is strange. I do see some data corruption detected, so you should run memtest and then a scrub on the pool. Other than that, next time grab diags before rebooting (stop the array, grab diags, then reboot), and new diags after rebooting if the same thing happens.
Thanks for looking into this 😃 I will try to get the logs in this order (after my vacation). From what you can see so far, do you think it's a hardware failure?
Best regards
JorgeB Posted July 16, 2021
21 minutes ago, Kosmos said: do you think it's a hardware failure?
Corruption being detected is usually a RAM problem, but I'm not sure this is related to the other problem.
Kosmos Posted August 23, 2021
Hey everyone, thanks for sticking with me. In the meantime, I moved my hard drives to completely new hardware. The problem persists: sometimes after reboots, the encryption symbol of the second cache disk goes missing, and when the array starts, it throws the "missing disk" errors. Furthermore, I have a feeling that it happens more often when data was written to the disk before rebooting, though that may be a coincidence rather than the cause.
When I changed the disk order in the pool, the disk went missing as well, unfortunately losing all its data.
This leads me to the conclusion that the SSD (controller) is damaged. I will keep trying to grab logs before and after a reboot in which the problem occurs.
Kosmos Posted August 30, 2021
I managed to get logs before and after rebooting when the problem occurred (attached). This time, I created a new pool (different name) from the same SSDs just before it happened. With the old name, it didn't happen in the last 10 ± 2 reboots. So maybe it's a cache management issue after all?
Best regards
after-reboot.zip before_reboot.zip
Edited August 30, 2021 by Kosmos (files were missing)
JorgeB Posted August 31, 2021
Problem is that this device wasn't decrypted after the reboot:
Aug 30 20:40:37 Server emhttpd: import 32 cache device: (sdc) SanDisk_SSD_PLUS_240GB_184302A005B3
It's strange since the device was there, but since it wasn't decrypted it can't be used by btrfs, so it was as if the device wasn't present and the pool balanced to single. Not sure why this is happening. If you have a spare, try replacing that SSD with a different one; if it still happens, it's likely a bug.
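For anyone comparing the before and after diagnostics by hand, the import lines can be pulled out of each syslog with a short script. This is only a minimal Python sketch: the regular expression is modelled on the single log line quoted above, so the exact format of other emhttpd import lines is an assumption, and `imported_cache_devices` is a hypothetical helper name, not part of Unraid. It only lists what was imported; whether each listed device was also decrypted still has to be checked in the log separately.

```python
import re

# Pattern modelled on the one log line quoted in the thread; the format of
# other emhttpd import lines is an assumption.
IMPORT_RE = re.compile(
    r"emhttpd: import (?P<slot>\d+) cache device: \((?P<dev>\w+)\) (?P<model>\S+)"
)


def imported_cache_devices(syslog_text):
    """Return {slot: (device, model)} for every cache import line found."""
    found = {}
    for line in syslog_text.splitlines():
        match = IMPORT_RE.search(line)
        if match:
            found[int(match["slot"])] = (match["dev"], match["model"])
    return found
```

Running it on the syslog text from the before-reboot and after-reboot archives gives two dictionaries that can be compared side by side.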
Kosmos Posted September 8, 2021
It should be decrypted only after entering the password and starting the array, right? However, Unraid already fails to show the encrypted volume properly before the array is started (right after boot).
Also, when using the "failing" SSD in another pool (single, not raid), it works properly. The combination of the "failing" SSD with a different HDD continued to fail, but the combination of the other, "working" SSD with the other HDD did not fail after many reboots. (logs attached)
So it seems to me that this particular disk is not working properly in an (encrypted) btrfs raid1 pool. Could it be due to the PCIe-to-SATA add-on card they (all 3) are attached to?
I may try changing ports or using the pool without encryption.
1a_before-reboot-diagnostics-20210830-2036.zip 1b_after-reboot-diagnostics-20210830-2100.zip 2a_before-reboot-diagnostics-20210901-1613.zip 2b_after-reboot-diagnostics-20210901-1735.zip 3a_before-reboot-diagnostics-20210901-2343.zip 3b_after-reboot-diagnostics-20210901-2351.zip
JorgeB Posted September 8, 2021
5 minutes ago, Kosmos said: It should be decrypted only after entering the password and starting the array, right?
Correct.
Not sure why that device is not being decrypted. It's being detected, so it should also be decrypted, but I've never used encryption, so I'm not familiar with how it works. It could be an Unraid bug. If you have a different device, test with that: if it works, it was likely a device problem; if it's the same, it's likely a bug.
Kosmos Posted September 15, 2021
On 9/8/2021 at 6:26 PM, JorgeB said: Correct. Not sure why that device is not being decrypted. It's being detected, so it should also be decrypted, but I've never used encryption, so I'm not familiar with how it works. It could be an Unraid bug. If you have a different device, test with that: if it works, it was likely a device problem; if it's the same, it's likely a bug.
In my opinion, it could be both. It appears that the partition information is lost at some point, so Unraid does not detect that there is a cache partition on the SSD. As a consequence, nothing can be decrypted. However, the problem appears without encryption as well. I also tried changing the SATA controller and cable, but that didn't help either.
Anyway, I wonder why this particular disk is sometimes recognized correctly after a reboot and sometimes not. So either Unraid is not reading the disk correctly, or the SSD is sometimes resetting/deleting its partition (headers) for unknown reasons...
I attached a screenshot of the unassigned drives after the problem occurred (encryption lock symbol and partition gone) and a SMART report of this SSD as well.
server-smart-20210915-1213.zip
Edited September 15, 2021 by Kosmos
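One way to test the "header is gone" theory directly is to check whether the partition still starts with the LUKS signature after a bad reboot. The following is a minimal Python sketch, assuming the encrypted pool member is a LUKS container on the partition; `has_luks_header` is a hypothetical helper name, and the device path in the usage note is a placeholder, not a value taken from the diagnostics.

```python
# First six bytes of a LUKS1/LUKS2 primary header.
LUKS_MAGIC = b"LUKS\xba\xbe"


def has_luks_header(path):
    """Return True if the file or block device at `path` starts with the LUKS magic."""
    with open(path, "rb") as handle:
        return handle.read(len(LUKS_MAGIC)) == LUKS_MAGIC
```

Called as `has_luks_header("/dev/sdX1")` (placeholder path, needs root to read a block device), a `False` result right after a failed boot would point to the signature itself being missing rather than Unraid misreading it. If the partition table is gone, the partition device will not exist at all and the call raises `FileNotFoundError`, which is also informative.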
JorgeB Posted September 15, 2021
I suspect that it's a device problem.
Kosmos Posted September 15, 2021
2 hours ago, JorgeB said: I suspect that it's a device problem.
Probably, yes. I will ask SanDisk about this. Thanks again for your continued help! See you on the next one 😛
P.S.: you may flag this topic as solved. I cannot do it myself, because I didn't create a new topic and instead took over this one from johnsanc (😇)