July 8, 2023
Hi folks, I've been trying to diagnose what on earth is plaguing my Unraid install lately, and I think I'm starting to narrow down the culprit. I got an error message that said something about my NVMe drive, and it mentioned __btrfs_free_ext. I've attached my diagnostics file too; if anyone can offer any insight I'd be so appreciative. I'm at a total loss right now. Thanks in advance.
armaserverv2-diagnostics-20230708-1647.zip
July 9, 2023
NVMe device dropped offline, this can sometimes help with that:
On the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Reboot and see if it makes a difference.
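After the reboot it can be worth confirming that the two parameters actually reached the kernel. The following is a minimal sketch, assuming a Linux host where the running kernel's command line is readable at /proc/cmdline; the helper name `missing_params` is made up for illustration and is not part of Unraid.

```python
from pathlib import Path

# The two parameters suggested in the post above.
WANTED = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_params(cmdline: str, wanted=WANTED) -> list[str]:
    """Return the wanted parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")  # command line of the running kernel
    if proc_cmdline.exists():
        missing = missing_params(proc_cmdline.read_text())
        if missing:
            print("not active:", ", ".join(missing))
        else:
            print("both parameters are active")
    else:
        print("/proc/cmdline not found; is this a Linux host?")
```

If a parameter is reported as not active, the edit probably went into a boot entry other than the default one.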
July 11, 2023 Author
On 7/9/2023 at 10:12 AM, JorgeB said: NVMe device dropped offline, this can sometimes help with that: On the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot": nvme_core.default_ps_max_latency_us=0 pcie_aspm=off e.g.: append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off Reboot and see if it makes a difference.
Tried this, and tried reseating the drive in another slot. Still getting the same issue. I've ordered a replacement drive that should be arriving today, so here's hoping that fixes the problem. I'm not hopeful though.
July 11, 2023 Author
35 minutes ago, JorgeB said: I would try with a different brand/model device.
The old one is a Seagate Barracuda, the new one is a Kioxia Exceria. Fingers crossed.
July 11, 2023 Author
5 hours ago, JorgeB said: I would try with a different brand/model device.
Okay, so I've put the new drive in and used Clonezilla to duplicate the old drive block for block. Unraid says it's unmountable, great. Any ideas?
July 11, 2023 Author
Just now, JorgeB said: Is it the exact same capacity? What's the unmountable error specifically?
All it's telling me is "Unmountable: Unsupported partition layout". It's a totally cloned drive, so I don't get why it's unable to mount.
July 11, 2023
5 minutes ago, Zeragonii said: "Unmountable: Unsupported partition layout"
Likely the new device is not the exact same capacity. If that's the case a clone won't work; you can mount the old device with UD (Unassigned Devices), for example, and copy the data.
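The capacity question can be checked directly before or after cloning. This is a rough sketch, assuming a Linux host where each block device exposes its size in 512-byte units at /sys/block/<device>/size; the device names `nvme0n1` and `nvme1n1` are placeholders, not the names from this system.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte units


def capacity_bytes(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read a block device's capacity in bytes from sysfs."""
    sectors = int((Path(sysfs_root) / device / "size").read_text().strip())
    return sectors * SECTOR_BYTES


def compare_capacity(old_bytes: int, new_bytes: int) -> str:
    """Describe how the new device's capacity relates to the old one."""
    if new_bytes == old_bytes:
        return "identical"
    return "new is larger" if new_bytes > old_bytes else "new is smaller"


if __name__ == "__main__":
    old_dev, new_dev = "nvme0n1", "nvme1n1"  # placeholder device names
    try:
        result = compare_capacity(capacity_bytes(old_dev), capacity_bytes(new_dev))
        print(f"{old_dev} vs {new_dev}: {result}")
    except OSError as exc:
        print(f"could not read device size: {exc}")
```

Anything other than "identical" would fit the explanation given in the post above.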
July 11, 2023 Author
1 minute ago, JorgeB said: Likely the new device is not the exact same capacity, if that's the case a clone won't work, you can mount the old device with UD for example and copy the data.
I fixed it by doing literally nothing other than frustratedly updating to 6.12.2 to install ZFS and manually shifting all the data rather than cloning the drive. And the cache is working... all the data is there... I don't understand it, nor do I want to question the voodoo magic that fixed everything. Thanks for trying though; the Unraid community is awesome and I'm eternally grateful for you trying to help me solve this headache. Might be back if this drive does the same as the last one though. Fingers crossed you never see my name again.
July 12, 2023 Author
22 hours ago, JorgeB said: Likely the new device is not the exact same capacity, if that's the case a clone won't work, you can mount the old device with UD for example and copy the data.
Turns out it might be something else dying. Now the system just hard crashes with no error in the log. I'm thinking the CPU may be bad, as I had issues with it when it was in my gaming PC for a while. I thought it would be under less stress as a server workhorse, but I guess bad batches are always going to be bad. Here's the syslog, if anyone thinks they can find a hint at what's wrong.
syslog
July 12, 2023
Don't see anything relevant logged, but this is not uncommon if it's a hardware issue. Did you take care of this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
July 12, 2023 Author
53 minutes ago, JorgeB said: Don't see anything relevant logged, but this is not uncommon if it's a hardware issue, did you take care of this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Trying now, will report back after I've tried everything, if necessary. Thanks again!
July 15, 2023 Author
On 7/12/2023 at 2:08 PM, JorgeB said: Don't see anything relevant logged, but this is not uncommon if it's a hardware issue, did you take care of this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Tried everything in that article, no dice. Swapped the CPU and motherboard, no dice. Currently running another round of memtest to rule out a possible memory issue. Next port of call after that is the GPU; after that I honestly don't know what it could be.
July 17, 2023 Author
Okay, so a bit of an update: after a full round of hardware testing I've ended up putting my problematic GPU back in but installing an older driver, and that seems to have resolved this issue. I honestly want to break something at this point, so much frustration over a driver update... Gotta love it. Will post full solution details once I've fully confirmed it's the GPU driver.
July 20, 2023 Author
Back again. The original error has returned: "nvme nvme0: controller is down". I have isolated the issue to one single Docker container, Tdarr; however, after scrutinising its logs I can't find anything to indicate what's causing the issue. I've added the power state parameters to the bzroot line and I've tried a different physical drive. I just can't for the life of me figure out why my drives are crapping out so hard 🙃
July 21, 2023 Author
On 7/12/2023 at 2:08 PM, JorgeB said: Don't see anything relevant logged, but this is not uncommon if it's a hardware issue, did you take care of this: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
Ran out of random Linux kernel threads to defer to. This is my current syslinux config, and I'm still getting the same error: "nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10". Is there any recourse at all for this? The error doesn't happen if I keep my Tdarr Docker container off, so for now that's what I'm doing to keep the rest of my services running.
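To line the resets up against Tdarr activity, the syslog can be scanned for this message. Below is a minimal sketch that matches only the exact wording quoted in the post above; the function name `find_resets` and the `all_ones` flag are illustrative. An all-ones CSTS value is commonly read as a sign that the register read itself failed (the device stopped answering on the PCIe bus), but that interpretation is an assumption here and not something confirmed in this thread.

```python
import re

# Matches the kernel message quoted in the post above.
RESET_PATTERN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+), PCI_STATUS=(?P<pci>0x[0-9a-fA-F]+)"
)


def find_resets(log_text: str) -> list[dict]:
    """Return one record per 'controller is down' message found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = RESET_PATTERN.search(line)
        if match is None:
            continue
        csts = int(match["csts"], 16)
        events.append({
            "controller": match["ctrl"],
            "csts": csts,
            "pci_status": int(match["pci"], 16),
            # Assumption: all-ones suggests the register read itself failed.
            "all_ones": csts == 0xFFFFFFFF,
            "line": line,
        })
    return events


if __name__ == "__main__":
    sample = ("nvme nvme0: controller is down; will reset: "
              "CSTS=0xffffffff, PCI_STATUS=0x10")
    for event in find_resets(sample):
        print(event["controller"], hex(event["csts"]), event["all_ones"])
```

Running it over the full syslog, with the timestamps preserved in `line`, would show whether each reset falls inside a window when the container was transcoding.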
July 21, 2023
Look for a BIOS update, but the best bet would be to use a different brand/model device (or board).