BTRFS bdev /dev/nvme0n1p1 error


February 2, 2024

Hi guys, for the last several days my system has been flooded with BTRFS errors. I used to have one cache device in my Unraid pool, and one day I decided to add another one as BTRFS, expecting the new cache to mirror the existing one. Everything seemed to run perfectly for a couple of weeks, until one day I noticed that the temperature of the new cache was shown as "*". I then checked the log and it was flooded with BTRFS errors. At first I just unmounted the disk, re-formatted it, and put it back in the pool, but the errors came back. I tried the other NVMe slots on the motherboard, but that didn't fix the issue. I also tried another NVMe stick, but the issue was still there.

The other thing I noticed is that every time it happens, the second cache seems to crash: I can't see the identity or capabilities details when I click on that cache's icon. Also, after I stop the array, the other cache always goes missing until I reboot the system.

I need some guidance here, since I am very new to Unraid.

Thanks.

My most recent diagnostics are attached.

Example of the recent logs:

Feb 3 00:49:25 ASAS kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme0n1p1 errs: wr 11126892, rd 1855, flush 30442, corrupt 0, gen 0
Feb 3 00:49:25 ASAS kernel: BTRFS warning (device nvme1n1p1): lost page write due to IO error on /dev/nvme0n1p1 (-5)
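For anyone reading these counters later, the errs: fields in the BTRFS error line can be pulled apart with a small script. This is only a minimal Python sketch, not an Unraid tool: the function name parse_btrfs_errs and the regular expression are assumptions written to match the one line quoted above, and other kernel versions may format the message differently.

```python
import re

# Matches the per-device counters in a kernel "BTRFS error ... bdev ... errs:" line.
# The pattern is an assumption based on the single log line quoted in this thread.
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<fs_dev>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def parse_btrfs_errs(line):
    """Return the counters from one log line as a dict, or None if it does not match."""
    m = ERRS_RE.search(line)
    if m is None:
        return None
    out = {"fs_dev": m["fs_dev"], "bdev": m["bdev"]}
    for key in ("wr", "rd", "flush", "corrupt", "gen"):
        out[key] = int(m[key])
    return out

# The log line from the post above.
line = ("Feb 3 00:49:25 ASAS kernel: BTRFS error (device nvme1n1p1): "
        "bdev /dev/nvme0n1p1 errs: wr 11126892, rd 1855, flush 30442, corrupt 0, gen 0")
counters = parse_btrfs_errs(line)
print(counters["bdev"], counters["wr"], counters["flush"], counters["corrupt"])
```

The wr, rd and flush fields count failed I/O operations, while corrupt and gen count checksum and generation mismatches. Here the write and flush counters are large and corrupt and gen are 0, which looks more like I/O to /dev/nvme0n1p1 failing outright than like bad data being read back.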

asas-diagnostics-20240203-0109.zip

Edited February 2, 2024 by febriantoarbi


Solved by JorgeB

February 8, 2024


mrpainnogain changed the title to BTRFS bdev /dev/nvme0n1p1 error

Replies 50
Views 7.1k

February 2, 2024

Community Expert

NVMe device dropped offline almost immediately after mounting the pool:

Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 134 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 135 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 136 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 137 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 138 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 134 QID 5 timeout, reset controller
Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 19 QID 0 timeout, reset controller

Try swapping the devices and see if the problem follows the device.
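To compare runs before and after swapping slots, it can help to count the timeout messages per controller. The following is a rough Python sketch under stated assumptions: the regular expression is written only against the log lines quoted above, and summarize_timeouts is a hypothetical helper name, not part of any existing tool.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "nvme nvme0: I/O 134 (I/O Cmd) QID 5 timeout, aborting"
#   "nvme nvme0: I/O 134 QID 5 timeout, reset controller"
TIMEOUT_RE = re.compile(
    r"kernel: nvme (?P<ctrl>nvme\d+): I/O \d+ (?:\(.*?\) )?QID \d+ timeout, (?P<action>.+)$"
)

def summarize_timeouts(lines):
    """Count (controller, action) pairs across an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line.rstrip())
        if m:
            counts[(m["ctrl"], m["action"])] += 1
    return counts

# Four of the lines quoted above.
log = [
    "Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 134 (I/O Cmd) QID 5 timeout, aborting",
    "Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 135 (I/O Cmd) QID 5 timeout, aborting",
    "Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 134 QID 5 timeout, reset controller",
    "Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 19 QID 0 timeout, reset controller",
]
summary = summarize_timeouts(log)
for (ctrl, action), n in sorted(summary.items()):
    print(ctrl, action, n)
```

Keep in mind that the controller name (nvme0, nvme1) can change when devices are moved or added, so after a swap the counts should be matched to the drive's serial number rather than to the name.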

P.S. in case you're not aware the pool is missing another device, besides this one that dropped now:

Feb  2 18:45:12 ASAS emhttpd:  Total devices 3 FS bytes used 634.53GiB
Feb  2 18:45:12 ASAS emhttpd:  devid    1 size 953.87GiB used 651.03GiB path /dev/nvme1n1p1
Feb  2 18:45:12 ASAS emhttpd:  devid    4 size 953.87GiB used 6.00GiB path /dev/nvme0n1p1
Feb  2 18:45:12 ASAS emhttpd:  *** Some devices missing
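The arithmetic behind "some devices missing" is simply the total reported for the filesystem minus the devid lines that actually list a path. A small Python sketch of that check follows; it assumes the emhttpd lines keep the layout shown above, and missing_device_count is a made-up helper name.

```python
import re

TOTAL_RE = re.compile(r"Total devices (\d+)")
DEVID_RE = re.compile(r"devid\s+(\d+)\s+size\s+\S+\s+used\s+\S+\s+path\s+(\S+)")

def missing_device_count(lines):
    """Return (number of missing devices, {devid: path}) for one pool listing."""
    total = None
    present = {}
    for line in lines:
        if (m := TOTAL_RE.search(line)):
            total = int(m.group(1))
        elif (m := DEVID_RE.search(line)):
            present[int(m.group(1))] = m.group(2)
    if total is None:
        raise ValueError("no 'Total devices' line found")
    return total - len(present), present

# The listing quoted above.
listing = [
    "Feb  2 18:45:12 ASAS emhttpd:  Total devices 3 FS bytes used 634.53GiB",
    "Feb  2 18:45:12 ASAS emhttpd:  devid    1 size 953.87GiB used 651.03GiB path /dev/nvme1n1p1",
    "Feb  2 18:45:12 ASAS emhttpd:  devid    4 size 953.87GiB used 6.00GiB path /dev/nvme0n1p1",
    "Feb  2 18:45:12 ASAS emhttpd:  *** Some devices missing",
]
missing, present = missing_device_count(listing)
print(missing, present)
```

For the listing above this gives 3 total devices, 2 present (devid 1 and devid 4), so 1 device is missing from the pool.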


February 2, 2024

Author

18 minutes ago, JorgeB said:

NVMe device dropped offline almost immediately after mounting the pool:

Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 134 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 135 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 136 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 137 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:11 ASAS kernel: nvme nvme0: I/O 138 (I/O Cmd) QID 5 timeout, aborting
Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 134 QID 5 timeout, reset controller
Feb  2 18:47:41 ASAS kernel: nvme nvme0: I/O 19 QID 0 timeout, reset controller

Try swapping the devices and see if the problem follows the device.

P.S. in case you're not aware the pool is missing another device, besides this one that dropped now:

Feb  2 18:45:12 ASAS emhttpd:  Total devices 3 FS bytes used 634.53GiB
Feb  2 18:45:12 ASAS emhttpd:  devid    1 size 953.87GiB used 651.03GiB path /dev/nvme1n1p1
Feb  2 18:45:12 ASAS emhttpd:  devid    4 size 953.87GiB used 6.00GiB path /dev/nvme0n1p1
Feb  2 18:45:12 ASAS emhttpd:  *** Some devices missing

Do you mean I need to stop the array, assign cache 1 to nvme0n1 and cache 2 to nvme1n1, and start the array again? Will I get "wrong" cache notifications?

The 2L082xxx is my first cache, the one I have had from the beginning.

Regarding the missing device, maybe it was the first NVMe that I thought was broken, but I am pretty sure that I unmounted that NVMe and formatted it. I also removed it from the motherboard.

Edited February 2, 2024 by febriantoarbi


February 2, 2024

Community Expert

Power cycle the server to see if the device comes back online, just rebooting will likely not be enough, if it does try mounting the pool again.


February 2, 2024

Author

4 minutes ago, JorgeB said:

Power cycle the server to see if the device comes back online, just rebooting will likely not be enough, if it does try mounting the pool again.

I already did that a couple of hours ago. The cache came back online and I re-mounted it to the pool, but the issue happened again.


February 2, 2024

Community Expert

The device may be failing, to confirm swap m.2 slots with the other one and see where the issue follows.


February 2, 2024

Author

8 minutes ago, JorgeB said:

The device may be failing, to confirm swap m.2 slots with the other one and see where the issue follows.

Are you saying my first cache is failing, or cache 2? Is a BTRFS scrub necessary in this case?

Just a friendly reminder: I did try different M.2 slots with the previous NVMe that I thought was broken, but I did not move the NVMe of the first cache at all.

Edited February 2, 2024 by febriantoarbi


February 3, 2024

Community Expert

Device with serial ending in SYH is the one that dropped, swap slots and see if the same device drops again, if yes it may be failing.


February 3, 2024

Author

4 hours ago, JorgeB said:

Device with serial ending in SYH is the one that dropped, swap slots and see if the same device drops again, if yes it may be failing.

Okay, I will try that later. In the meantime, would it be possible to just unmount the problem device and run the pool with only the healthy device? The last time I did this, I got the "too many missing disks" error when trying to start the array.


February 4, 2024

Community Expert

If the pool is redundant it should mount with just one device, post new diags if it doesn't.


February 4, 2024

Author

2 hours ago, JorgeB said:

If the pool is redundant it should mount with just one device, post new diags if it doesn't.

OK, I am currently trying to start the array with only the first cache (the one I had been using long before I added another cache), and I am getting this: "cache - too many missing/wrong devices". How do I resolve this?

asas-diagnostics-20240204-1938.zip

Edited February 4, 2024 by mrpainnogain


February 4, 2024

Community Expert

Cache profile is set to single, edit /boot/config/pools/cache.cfg and change diskFsProfile="single" to diskFsProfile="raid1", then try again, not sure if you need to reboot first for the change to be seen.
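If you would rather script the change than edit the file by hand, here is a minimal Python sketch of the same one-line edit. It assumes the key appears on its own line exactly as diskFsProfile="single"; the helper names set_fs_profile and update_cfg_file are made up for this example, and the script keeps a .bak copy before writing anything.

```python
import re
import shutil
from pathlib import Path

def set_fs_profile(cfg_text, new_profile):
    """Return cfg_text with the diskFsProfile value replaced; raise if the key is absent."""
    pattern = re.compile(r'^(diskFsProfile=")[^"]*(")', flags=re.MULTILINE)
    new_text, n = pattern.subn(
        lambda m: m.group(1) + new_profile + m.group(2), cfg_text
    )
    if n == 0:
        raise KeyError("diskFsProfile not found")
    return new_text

def update_cfg_file(path, new_profile):
    """Copy the file to <name>.bak, then rewrite it with the new profile."""
    path = Path(path)
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(set_fs_profile(path.read_text(), new_profile))

# Demonstration on a string only; nothing on disk is touched here.
sample = 'diskFsProfile="single"\n'
print(set_fs_profile(sample, "raid1"), end="")
```

On the server itself the call would be update_cfg_file("/boot/config/pools/cache.cfg", "raid1"), run with the array stopped. Only the value between the quotes changes; every other line in the file is left as it was.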


February 4, 2024

Author

4 minutes ago, JorgeB said:

Cache profile is set to single, edit /boot/config/pools/cache.cfg and change diskFsProfile="single" to diskFsProfile="raid1", then try again, not sure if you need to reboot first for the change to be seen.

Do I need to do this via the terminal?

I got "permission denied" when trying to access that file as root.


February 4, 2024

Community Expert

1 minute ago, mrpainnogain said:

Do I need to do this via the terminal?

I got "permission denied" when trying to access that file as root.

Easiest way is via Dynamix File Manager


February 4, 2024

Author

21 minutes ago, JorgeB said:

Cache profile is set to single, edit /boot/config/pools/cache.cfg and change diskFsProfile="single" to diskFsProfile="raid1", then try again, not sure if you need to reboot first for the change to be seen.

Somehow I still got the same warning after changing it to raid1 in the terminal with sudo nano ......

I also rebooted the system.

asas-diagnostics-20240204-1959.zip

Edited February 4, 2024 by mrpainnogain


February 4, 2024

Community Expert

Assuming you rebooted after the change, try reimporting the pool with the single device, stop array, unassign all pool devices, start array to reset the pool, stop array, reassign the single pool device (leave the slots set to 2 or how it was, don't set to 1), start array.


February 4, 2024

Author

19 minutes ago, JorgeB said:

Assuming you rebooted after the change, try reimporting the pool with the single device, stop array, unassign all pool devices, start array to reset the pool, stop array, reassign the single pool device (leave the slots set to 2 or how it was, don't set to 1), start array.

I unassigned all devices, started the array, then stopped it again. I am about to mount the cache, and it says the FS is auto. Is it safe to start the array again? The FS was BTRFS previously.

Do I have to change the file system type to BTRFS manually?

Edited February 4, 2024 by mrpainnogain


February 4, 2024

Author

Okay, I started the array, but it says:

Unmountable: Unsupported or no file system.

asas-diagnostics-20240204-2034.zip


February 4, 2024

Author

Okay, somehow it managed to work again after I removed the other NVMe, the one considered to be "broken". Now I remember that the cache with serial 2L028LXX used to be labeled "nvme0n1" and somehow changed to nvme1n1 after the 2nd NVMe was added, and that may be causing the issue with the array. All good for now, but I am still figuring out how to do the BTRFS with the other cache.


February 5, 2024

Community Expert

18 hours ago, mrpainnogain said:

Okay, somehow it managed to work again after I removed the other NVMe, the one considered to be "broken"

Yep, sorry, forgot to mention that device would need to be disconnected or wiped.

18 hours ago, mrpainnogain said:

but I am still figuring out how to do the BTRFS with the other cache.

Do you mean add the other NVMe device back?


February 6, 2024

Author

19 hours ago, JorgeB said:

Yep, sorry, forgot to mention that device would need to be disconnected or wiped.

Do you mean add the other NVMe device back?

Yes, since I am afraid that in my case there is something wrong with the way I added the 2nd NVMe to the cache pool to be configured as BTRFS. I have already sent the NVMe back to the shop and they did not find anything wrong with it. In the meantime I want to test the NVMe as an "unassigned device" and install a VM on it, just to make sure that the device is actually in good condition. Other than the BTRFS format, is there any way I can mirror one cache to the other? Maybe by copying the contents of one cache to the other on a regular basis, while services like Docker and the VMs are off?


February 6, 2024

Community Expert

You can create a mirror with btrfs or zfs, you can also manually sync one to the other, with rsync and a user script for example.
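As an illustration of the manual sync option, here is a rough Python sketch that builds and runs an rsync command. The two paths are placeholders, not the real pool names, and the helper names are made up. Note this is a periodic one-way copy, not a live mirror, and --delete removes files from the destination that no longer exist in the source, so the sketch defaults to a dry run.

```python
import subprocess

def build_rsync_command(src, dst, dry_run=True):
    """Build an rsync argument list that copies the contents of src into dst."""
    cmd = ["rsync", "-a", "--delete"]  # -a: archive mode; --delete: drop extra files in dst
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    # A trailing slash on the source means "the contents of src", not the directory itself.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd

def sync(src, dst, dry_run=True):
    """Run rsync and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_rsync_command(src, dst, dry_run), check=True)

# Hypothetical mount points, for illustration only; this line just prints the command.
cmd = build_rsync_command("/mnt/cache", "/mnt/cache_backup")
print(" ".join(cmd))
```

Calling sync(src, dst, dry_run=False) from a scheduled user script, with Docker and the VMs stopped first, would give the kind of regular copy asked about above.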


February 7, 2024

Author

21 hours ago, JorgeB said:

You can create a mirror with btrfs or zfs, you can also manually sync one to the other, with rsync and a user script for example.

Hey, I have another question. I just put another NVMe in the system, but now the cache that was supposed to be labelled nvme0n1 became "nvme1n1", which triggered the disk missing warning again, and the cache became "unmountable: unsupported file system". How do I force the name of the first cache to stay "nvme0n1"?

asas-diagnostics-20240207-1402.zip

Edited February 7, 2024 by mrpainnogain


February 7, 2024

Community Expert

3 hours ago, mrpainnogain said:

nvme0n1

That is the device name and is not used by Unraid to identify drives (instead Unraid uses the drive serial number). Something else must be going on.
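To see why the kernel name is not a stable identifier, it can help to list which persistent ID currently points at which name. The sketch below relies on the generic Linux /dev/disk/by-id symlinks, where the link names normally include the model and serial number. That is an assumption about the underlying OS, not a description of how Unraid does its own matching, and names_by_id is a made-up helper name.

```python
import os

def names_by_id(by_id_dir="/dev/disk/by-id"):
    """Map each persistent ID symlink name to the kernel device name it currently points to."""
    mapping = {}
    for entry in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, entry)
        if os.path.islink(path):
            mapping[entry] = os.path.basename(os.path.realpath(path))
    return mapping

# Print only the NVMe entries, if the directory exists on this machine.
if os.path.isdir("/dev/disk/by-id"):
    for ident, kernel_name in names_by_id().items():
        if ident.startswith("nvme-"):
            print(ident, "->", kernel_name)
```

Running this before and after adding the second drive should show the same ID now pointing at a different nvmeXn1 name, which is expected and harmless on its own.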


February 7, 2024

Author

14 minutes ago, itimpi said:

That is the device name and is not used by Unraid to identify drives (instead Unraid uses the drive serial number). Something else must be going on.

Hmm, any idea what might be the issue here? With only one NVMe everything runs smoothly, but the issue comes up every time I add another NVMe (no matter which slot on the motherboard).
