Seems one of my NVMe drives threw up on itself overnight. Help? (Diagnostics attached)


Solved by JorgeB


I'm seeing a bunch of these errors in the log. Any ideas what happened and how to fix it? Thanks!

 

kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 268, rd 5154780, flush 1, corrupt 0, gen 0

kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 268, rd 5154780, flush 1, corrupt 0, gen 0
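For anyone reading along: the `errs:` fields are BTRFS's cumulative per-device error counters (write, read, flush, corruption, generation), the same ones `btrfs device stats` reports. Large wr/rd/flush counts with corrupt 0 and gen 0 suggest the device stopped completing I/O, rather than returning bad data. Below is a minimal Python sketch that pulls those counters out of a log line. It assumes the message format shown above, and the regex and `parse_btrfs_errs` are hypothetical helpers, not part of any tool.

```python
import re

# Assumed format (from the log above):
#   "... bdev /dev/nvme0n1p1 errs: wr 268, rd 5154780, flush 1, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a BTRFS errs line, or None if it doesn't match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    counters = {k: int(m.group(k)) for k in ("wr", "rd", "flush", "corrupt", "gen")}
    return m.group("dev"), counters


if __name__ == "__main__":
    sample = ("kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 "
              "errs: wr 268, rd 5154780, flush 1, corrupt 0, gen 0")
    dev, errs = parse_btrfs_errs(sample)
    # Read/write/flush failures without checksum (corrupt) or generation errors
    # point at I/O not completing rather than at damaged data.
    print(dev, errs)
```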

 

edit: For what it's worth, all the file shares on this drive seem to be there; it's just that all the Docker containers are offline. I'm hoping I can just recreate the docker.img, but I'll wait for some input, as I'd rather not make things worse.

truffle-diagnostics-20230301-0900.zip

Edited by flyize
  • Solution
Feb 28 23:25:01 Truffle kernel: nvme nvme0: I/O 102 QID 3 timeout, aborting
Feb 28 23:25:01 Truffle kernel: nvme nvme0: I/O 34 QID 4 timeout, aborting
Feb 28 23:25:04 Truffle kernel: nvme nvme0: I/O 65 QID 2 timeout, aborting
Feb 28 23:25:04 Truffle kernel: nvme nvme0: I/O 66 QID 2 timeout, aborting
Feb 28 23:25:31 Truffle kernel: nvme nvme0: I/O 102 QID 3 timeout, reset controller
Feb 28 23:25:34 Truffle kernel: nvme nvme0: I/O 8 QID 0 timeout, reset controller
Feb 28 23:27:04 Truffle kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
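Read top to bottom, the log shows several I/O commands timing out with the driver trying to abort them, then the driver trying to reset the controller, and finally the reset being abandoned because the device never became ready. A minimal Python sketch that tallies those events per controller from syslog lines follows. It assumes the line format shown above, and the labels (`io_timeout_abort`, etc.) and function names are hypothetical.

```python
import re
from collections import Counter

# Assumed format: "<timestamp> <host> kernel: nvme nvme0: <message>",
# as in the log excerpt above.
NVME_RE = re.compile(r"kernel: nvme (?P<ctrl>nvme\d+): (?P<msg>.+)$")


def classify(msg):
    """Map an nvme driver message to a coarse, hypothetical event label."""
    if "timeout, aborting" in msg:
        return "io_timeout_abort"
    if "timeout, reset controller" in msg:
        return "io_timeout_reset"
    if "Device not ready; aborting reset" in msg:
        return "reset_failed"
    return "other"


def summarize(lines):
    """Count nvme driver events per (controller, event label)."""
    counts = Counter()
    for line in lines:
        m = NVME_RE.search(line)
        if m:
            counts[(m.group("ctrl"), classify(m.group("msg")))] += 1
    return counts


if __name__ == "__main__":
    log = [
        "Feb 28 23:25:01 Truffle kernel: nvme nvme0: I/O 102 QID 3 timeout, aborting",
        "Feb 28 23:25:31 Truffle kernel: nvme nvme0: I/O 102 QID 3 timeout, reset controller",
        "Feb 28 23:27:04 Truffle kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1",
    ]
    for (ctrl, event), n in sorted(summarize(log).items()):
        print(ctrl, event, n)
```

A `reset_failed` entry is the point at which the drive has effectively dropped off the bus, which matches the advice that a full power cycle, not a reboot, is needed.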

 

The device dropped offline. A power cycle (not just a reboot) should bring it back. If it does, the following can sometimes help with these issues:

 

On the main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


See if it helps.
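For context on the two parameters: `nvme_core.default_ps_max_latency_us=0` disables NVMe APST (autonomous power-state transitions), and `pcie_aspm=off` disables PCIe Active State Power Management. They are commonly suggested because some drive/platform combinations stop responding when entering or leaving low-power states; this thread doesn't establish that as the cause here. Kernel parameters are separated by spaces only, with no commas. The minimal Python sketch below builds the modified line and assumes the default boot entry's line starts with `append`. `add_kernel_params` is a hypothetical helper, and it only builds the string; it does not touch any file.

```python
# The two parameters suggested above:
#   nvme_core.default_ps_max_latency_us=0 -> disables NVMe APST
#   pcie_aspm=off                          -> disables PCIe ASPM
EXTRA_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def add_kernel_params(append_line, extra=EXTRA_PARAMS):
    """Return a syslinux 'append' line with each parameter in `extra` added once.

    Parameters are joined with single spaces (no commas), and parameters
    already present are not duplicated. This only builds the string.
    """
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    for param in extra:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


if __name__ == "__main__":
    print(add_kernel_params("append initrd=/bzroot"))
```

Run against `append initrd=/bzroot`, it prints `append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`, which is the same line as the example above. Running it a second time on its own output changes nothing.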

  • 10 months later...
On 3/1/2023 at 9:41 AM, JorgeB said:
The device dropped offline. A power cycle (not just a reboot) should bring it back.

You're a godsend. Thank you for the power-cycle tip; I had rebooted multiple times without any results. Two of the three NVMe drives in my ZFS pool were down. What causes this error? Would I need a comma between the boot option items, such as "append initrd=/bzroot, nvme_core.default,_ps_max_latency_us=0 pcie_aspm=off"?

I would really like to prevent this in the future, so any help would be appreciated. Sorry to revive an old thread.

Edited by bs.king
  • 3 weeks later...

Running 6.12.6, I too just came across this when my Docker containers were unavailable this morning. I ran diagnostics before power cycling, as I couldn't get Docker or the web GUI to respond, and found the same errors in my logs as the OP. I will add this to my config, but I'm also curious why this type of issue occurs in the first place.
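After adding the parameters and rebooting, one way to confirm they actually reached the kernel is to read `/proc/cmdline`, which on Linux holds the command line the running kernel was booted with (`cat /proc/cmdline` shows the same thing). A minimal Python sketch of that check is below. `missing_params` and `WANTED` are hypothetical names, not part of any tool.

```python
from pathlib import Path

# Parameters expected after rebooting with the new append line.
WANTED = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_params(cmdline, wanted=WANTED):
    """Return the entries of `wanted` not present as whole tokens in `cmdline`."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


if __name__ == "__main__":
    proc = Path("/proc/cmdline")  # command line of the currently running kernel (Linux)
    if proc.exists():
        missing = missing_params(proc.read_text())
        print("all parameters active" if not missing else f"missing: {missing}")
    else:
        print("/proc/cmdline not found; not running on Linux?")
```

This confirms only that the settings are in effect. It says nothing about whether they prevent the drive from dropping out again.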

