June 24, 2022 Hi, This morning I received a notification that a cache pool device is missing: Samsung 980 1TB. The strange part is that the main dashboard section shows all three 1TB devices (1x Samsung 980 1TB, 2x Teamgroup 1TB), and the one that looks off is one of the Teamgroup 1TB SSDs, which is no longer reporting temperature readings. So why did Unraid tell me the Samsung 1TB is missing? I'm really confused here. I had an unplanned power outage 2 days ago, and I'm guessing this could be a result of that, but I'm not sure. Any idea what I should do here?
June 24, 2022 13 minutes ago, chaosclarity said: Any idea what I should do here? You should probably start by attaching your diagnostics to your next post.
June 24, 2022 Author Just now, ChatNoir said: You should probably start by attaching your diagnostics to your next post. Attached: tower-diagnostics-20220624-0831.zip
June 24, 2022 29 minutes ago, chaosclarity said: but the one that looks strange is one of the Teamgroup 1TB ssd's as it's not reporting temperature readings any more.

Yep, that's the one, it dropped offline:

Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 171 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 172 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 173 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 174 QID 2 timeout, aborting
Jun 24 04:50:37 Tower kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
Jun 24 04:50:39 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, reset controller
Jun 24 04:53:40 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1

The below can sometimes help; if not, try a different PCIe/M.2 slot if available, or a different model device.

Some NVMe devices have issues with power states on Linux. To try this, on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
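For anyone who wants to spot this pattern in their own syslog, here is a minimal Python sketch that counts NVMe timeout messages per controller and flags controllers whose reset was aborted. The regular expressions are written against the exact wording of the lines quoted above; the function name `summarize_nvme_events` is made up for this example, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... kernel: nvme nvme0: I/O 170 QID 2 timeout, aborting"
#   "... kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller"
TIMEOUT_RE = re.compile(
    r"kernel: nvme (?P<ctrl>nvme\d+): I/O \d+ QID \d+ timeout, "
    r"(?:aborting|reset controller)"
)
# Matches: "... kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1"
FAILED_RESET_RE = re.compile(
    r"kernel: nvme (?P<ctrl>nvme\d+): Device not ready; aborting reset"
)


def summarize_nvme_events(lines):
    """Return (timeouts per controller, controllers whose reset was aborted)."""
    timeouts = Counter()
    failed_reset = set()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group("ctrl")] += 1
            continue
        match = FAILED_RESET_RE.search(line)
        if match:
            failed_reset.add(match.group("ctrl"))
    return dict(timeouts), failed_reset


# The eight syslog lines quoted in the post above.
SAMPLE = """\
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 171 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 172 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 173 QID 2 timeout, aborting
Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 174 QID 2 timeout, aborting
Jun 24 04:50:37 Tower kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
Jun 24 04:50:39 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, reset controller
Jun 24 04:53:40 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
""".splitlines()

if __name__ == "__main__":
    timeouts, failed_reset = summarize_nvme_events(SAMPLE)
    print("timeouts per controller:", timeouts)
    print("controllers that failed to reset:", sorted(failed_reset))
```

On a live server the same function could be fed the lines of the syslog file instead of `SAMPLE`. A controller that shows up in the second result did not recover on its own, which matches the device dropping offline here.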
June 24, 2022 Author 5 minutes ago, JorgeB said: Yep, that's the one, it dropped offline: [...] Reboot and see if it makes a difference.

Thank you. One question: in its current state, as it's running now, is the cache pool operating OK?
June 24, 2022 Yes, because there's redundancy, but you should bring the dropped device back online and run a scrub to sync the pool. It's also good to monitor the pool to catch any future issues, more info here.
June 24, 2022 Author 51 minutes ago, JorgeB said: Yes, because there's redundancy, but you should bring the dropped device back online and run a scrub to sync the pool. It's also good to monitor the pool to catch any future issues, more info here.

Do you think it's a good idea, then, to set up a scrub schedule for the cache pool? Or should I just run one when I have issues like this?
June 24, 2022 Monthly scrub is a good idea, but much more important is to monitor the pool for any errors, since the GUI currently doesn't show that.
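As a rough illustration of what monitoring the pool for errors could look like, the Python sketch below parses the output of `btrfs device stats` and reports devices with nonzero error counters. It assumes the cache pool is btrfs and mounted at `/mnt/cache`, neither of which is stated in this thread. The function names and the sample output are made up for the example and are not taken from the attached diagnostics.

```python
import re
import subprocess

# Each line of `btrfs device stats` looks like:
#   [/dev/nvme0n1p1].write_io_errs    0
STAT_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_device_stats(text):
    """Return {device: {counter: value}} parsed from `btrfs device stats` output."""
    stats = {}
    for line in text.splitlines():
        match = STAT_RE.match(line.strip())
        if match:
            counters = stats.setdefault(match.group("dev"), {})
            counters[match.group("counter")] = int(match.group("value"))
    return stats


def nonzero_counters(stats):
    """Keep only devices that have at least one nonzero error counter."""
    result = {}
    for dev, counters in stats.items():
        bad = {name: count for name, count in counters.items() if count != 0}
        if bad:
            result[dev] = bad
    return result


def read_pool_stats(mount_point="/mnt/cache"):
    """Run the command on a live system (assumed mount point; needs root)."""
    out = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        check=True, capture_output=True, text=True,
    )
    return parse_device_stats(out.stdout)


# Made-up sample output, for illustration only.
EXAMPLE_OUTPUT = """\
[/dev/nvme0n1p1].write_io_errs    1432
[/dev/nvme0n1p1].read_io_errs     87
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  0
[/dev/nvme0n1p1].generation_errs  0
[/dev/nvme1n1p1].write_io_errs    0
[/dev/nvme1n1p1].read_io_errs     0
[/dev/nvme1n1p1].flush_io_errs    0
[/dev/nvme1n1p1].corruption_errs  0
[/dev/nvme1n1p1].generation_errs  0
"""

if __name__ == "__main__":
    print(nonzero_counters(parse_device_stats(EXAMPLE_OUTPUT)))
```

A script like this could be run on a schedule and send a notification whenever the result is not empty. A device that dropped and came back would typically show write and read errors, as in the sample.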
June 24, 2022 Author 1 hour ago, JorgeB said: Monthly scrub is a good idea, but much more important is to monitor the pool for any errors since the GUI currently doesn't show that.

I added that config line as you described and then rebooted. Unfortunately the Teamgroup 1TB SSD never came back, so the array was started without it and Unraid removed it from the cache pool. Is it safe to say this SSD is dead?
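Before concluding the setting made no difference, it may be worth confirming that the kernel actually booted with it. The Python sketch below checks a kernel command line for the parameter. The helper name `kernel_param` and the example command line are assumptions for illustration; on a live system the string would come from `/proc/cmdline`.

```python
PARAM = "nvme_core.default_ps_max_latency_us"


def kernel_param(cmdline, name):
    """Return the value of `name` on a kernel command line, or None if absent."""
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == name:
            # A bare flag without "=" is reported as an empty string.
            return value if sep else ""
    return None


# Illustrative command line; the real one may contain other options.
EXAMPLE_CMDLINE = "BOOT_IMAGE=/bzimage initrd=/bzroot nvme_core.default_ps_max_latency_us=0"

if __name__ == "__main__":
    value = kernel_param(EXAMPLE_CMDLINE, PARAM)
    if value is None:
        print(f"{PARAM} is not set on the kernel command line")
    else:
        print(f"{PARAM}={value}")
```

If the parameter is missing, the edit was probably made to a boot entry other than the default one. The active value should also be readable from `/sys/module/nvme_core/parameters/default_ps_max_latency_us`.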
June 24, 2022 Power cycle the server, since rebooting might not be enough. If it still doesn't come back, switch slots with another device; if it still doesn't show up, it's likely dead.
June 28, 2022 Author On 6/24/2022 at 1:24 PM, JorgeB said: Power cycle the server, rebooting might not be enough, if it still doesn't come back switch slots with another device, if still no it's likely dead.

I finally got around to power cycling the server. The M.2 came back this time around and was added to the cache pool again. Hopefully no more issues...
July 12, 2022 Author On 6/24/2022 at 1:24 PM, JorgeB said: Power cycle the server, rebooting might not be enough, if it still doesn't come back switch slots with another device, if still no it's likely dead.

Well, yesterday it dropped again. Diagnostics attached. I'm starting to think the drive is faulty. tower-diagnostics-20220712-0727.zip
July 12, 2022 Author 4 minutes ago, chaosclarity said: Well, yesterday it dropped again. Attached diagnostics. But I'm almost thinking the drive is faulty. tower-diagnostics-20220712-0727.zip

Doh, I powered off the server completely and unplugged it. When I went to add the dropped drive back to the cache pool, it was gone for good.