Azxiana Posted May 7, 2022

This is the second time in a week that this has happened with this drive, a Crucial P2 500GB. I thought it was an issue with the Hyper M.2 PCI-Express card, which had previously been suspect, so I ended up rebuilding the server with a fresh motherboard so the M.2 drives could sit on the motherboard itself. It has now failed like this on both the old hardware configuration and a fresh one. Is it time to replace this drive, or could there be something else that I am missing?

May 7 04:44:46 Emilia kernel: nvme nvme1: I/O 24 QID 5 timeout, aborting
May 7 04:45:17 Emilia kernel: nvme nvme1: I/O 24 QID 5 timeout, reset controller
May 7 04:45:47 Emilia kernel: nvme nvme1: I/O 10 QID 0 timeout, reset controller
May 7 04:48:29 Emilia kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
May 7 04:48:29 Emilia kernel: nvme nvme1: Abort status: 0x371
May 7 04:50:37 Emilia kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
May 7 04:50:37 Emilia kernel: nvme nvme1: Removing after probe failure status: -19
May 7 04:52:45 Emilia kernel: nvme nvme1: Device not ready; aborting reset, CSTS=0x1
May 7 04:52:45 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
May 7 04:52:45 Emilia kernel: blk_update_request: I/O error, dev nvme1n1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
May 7 04:52:45 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 0, flush 1, corrupt 0, gen 0
May 7 04:52:45 Emilia kernel: BTRFS warning (device nvme1n1p1): chunk 393078636544 missing 1 devices, max tolerance is 0 for writable mount
May 7 04:52:45 Emilia kernel: BTRFS: error (device nvme1n1p1) in write_all_supers:3845: errno=-5 IO failure (errors while submitting device barriers.)
May 7 04:52:45 Emilia kernel: BTRFS info (device nvme1n1p1): forced readonly
May 7 04:52:45 Emilia kernel: BTRFS warning (device nvme1n1p1): Skipping commit of aborted transaction.
May 7 04:52:45 Emilia kernel: BTRFS: error (device nvme1n1p1) in cleanup_transaction:1942: errno=-5 IO failure
May 7 04:52:45 Emilia kernel: BTRFS warning (device nvme1n1p1): Skipping commit of aborted transaction.
May 7 04:52:45 Emilia kernel: BTRFS: error (device nvme1n1p1) in cleanup_transaction:1942: errno=-5 IO failure
May 7 04:52:45 Emilia kernel: nvme nvme1: failed to set APST feature (-19)
May 7 04:54:01 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 1, flush 1, corrupt 0, gen 0
May 7 04:54:01 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 2, flush 1, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 125920 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 650208 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 125664 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 125824 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 125984 op 0x1:(WRITE) flags 0x1800 phys_seg 48 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 649952 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 650112 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 650272 op 0x1:(WRITE) flags 0x1800 phys_seg 48 prio class 0
May 7 04:54:06 Emilia kernel: BTRFS: error (device loop2) in btrfs_commit_transaction:2377: errno=-5 IO failure (Error while writing out transaction)
May 7 04:54:06 Emilia kernel: BTRFS info (device loop2): forced readonly
May 7 04:54:06 Emilia kernel: BTRFS warning (device loop2): Skipping commit of aborted transaction.
May 7 04:54:06 Emilia kernel: BTRFS: error (device loop2) in cleanup_transaction:1942: errno=-5 IO failure
May 7 04:54:06 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 27288 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
May 7 04:54:07 Emilia kernel: btrfs_dev_stat_print_on_error: 11 callbacks suppressed
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 3, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de6a78 len 4096 err no 10
May 7 04:54:07 Emilia kernel: blk_update_request: I/O error, dev loop2, sector 1297496 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 4, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 5, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7228 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 6, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7230 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7238 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 7, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 8, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 9, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7300 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7308 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7310 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 10, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7310 len 4096 err no 10
May 7 04:54:07 Emilia kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 19, rd 1, flush 0, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS error (device nvme1n1p1): bdev /dev/nvme1n1p1 errs: wr 1, rd 11, flush 1, corrupt 0, gen 0
May 7 04:54:07 Emilia kernel: BTRFS warning (device nvme1n1p1): direct IO failed ino 263 rw 0,0 sector 0x3de7310 len 4096 err no 10
JorgeB Posted May 7, 2022

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on the flash device, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
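After rebooting, it's worth confirming the parameter actually reached the kernel. A minimal sketch — the boot line below is only a sample string; on the live server you would grep /proc/cmdline directly:

```shell
# Extract the NVMe power-state parameter from a boot command line.
# Sample string shown here; on the server itself use:
#   grep -o 'nvme_core\.default_ps_max_latency_us=[0-9]*' /proc/cmdline
cmdline='append initrd=/bzroot nvme_core.default_ps_max_latency_us=0'
echo "$cmdline" | grep -o 'nvme_core\.default_ps_max_latency_us=[0-9]*'
```

If nvme-cli happens to be installed, `nvme get-feature /dev/nvme1 -f 0x0c -H` can additionally show whether the APST feature is disabled (device name taken from the log above).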
Azxiana Posted May 7, 2022 (Author)

1 minute ago, JorgeB said:
Some NVMe devices have issues with power states on Linux. Try adding nvme_core.default_ps_max_latency_us=0 to your default boot option, after "append initrd=/bzroot". Reboot and see if it makes a difference.

I will give it a try, thanks! This drive and its companion have been in service in this server for almost two years without issue until this past week. I actually have to power the entire server off to get the drive to come back.
Azxiana Posted May 8, 2022 (Author)

I just got back from Best Buy with two replacement SSDs. It happened again. ¯\_(ツ)_/¯
Froberg Posted June 8, 2022

I'm having it too, with standard SSDs. Did you ever find a cause?

Jun 8 03:18:34 FortyTwo kernel: BTRFS error (device sdb1): block=455832535040 write time tree block corruption detected
Jun 8 03:18:34 FortyTwo kernel: BTRFS: error (device sdb1) in btrfs_commit_transaction:2438: errno=-5 IO failure (Error while writing out transaction)
Jun 8 03:18:34 FortyTwo kernel: BTRFS info (device sdb1): forced readonly
Jun 8 03:18:34 FortyTwo kernel: BTRFS warning (device sdb1): Skipping commit of aborted transaction.
Jun 8 03:18:34 FortyTwo kernel: BTRFS: error (device sdb1) in cleanup_transaction:2011: errno=-5 IO failure
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 29152 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 458944 op 0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 983232 op 0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 459392 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 983680 op 0x1:(WRITE) flags 0x1800 phys_seg 2 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 460448 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 984736 op 0x1:(WRITE) flags 0x1800 phys_seg 8 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS: error (device loop2) in free_log_tree:3451: errno=-5 IO failure
Jun 8 03:18:36 FortyTwo kernel: BTRFS info (device loop2): forced readonly
Jun 8 03:18:36 FortyTwo kernel: BTRFS warning (device loop2): Skipping commit of aborted transaction.
Jun 8 03:18:36 FortyTwo kernel: BTRFS: error (device loop2) in cleanup_transaction:2011: errno=-5 IO failure
Jun 8 03:18:36 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 32928 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
Jun 8 03:18:36 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Jun 8 03:28:38 FortyTwo root: Restoring original turbo write mode
Jun 8 03:28:38 FortyTwo kernel: mdcmd (129): set md_write_method auto
Jun 8 03:44:08 FortyTwo kernel: blk_update_request: I/O error, dev loop2, sector 29152 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jun 8 03:44:08 FortyTwo kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 11, rd 0, flush 0, corrupt 0, gen 0

BTRFS has been the single most unstable part of my Unraid experience thus far.
JorgeB Posted June 8, 2022

3 hours ago, Froberg said:
I'm having it too, with standard SSDs. Did you ever find a cause?

Please post the complete diagnostics.
Froberg Posted June 9, 2022

On 6/8/2022 at 12:37 PM, JorgeB said:
Please post the complete diagnostics.

It seems like it had another crash during the night.

fortytwo-diagnostics-20220609-1637.zip
Froberg Posted June 9, 2022 (edited)

Yeah, the BTRFS pool is completely unmountable now. I've never had it this bad before, and I even set up a script to regularly monitor for corruption. The damn thing just keeps screwing up. It now says the filesystem is unmountable, like it's all gone. I'm thinking I'll just switch to XFS from now on; BTRFS hasn't been stable for me despite changing SATA connections, cables, and power delivery, and even switching to new SSDs entirely. Annoying. Please advise; otherwise I think I'll just have to recover appdata from backup.

Edited June 9, 2022 by Froberg
JorgeB Posted June 9, 2022

The errors you're having suggest a possible RAM issue. Start by running memtest; after that, there are some recovery options here.
Froberg Posted June 9, 2022

50 minutes ago, JorgeB said:
The errors you're having suggest a possible RAM issue. Start by running memtest; after that, there are some recovery options here.

What makes you suspect a memory issue? It's ECC memory and it's been put through memtest before. I'll usually get 8-12 months out of the BTRFS cache before it implodes.
JorgeB Posted June 9, 2022

On 6/8/2022 at 8:19 AM, Froberg said:
write time tree block corruption detected

This means btrfs detected the corruption before writing the data to the disk; it usually indicates RAM or other kernel memory corruption.
Froberg Posted June 9, 2022 (edited)

2 hours ago, JorgeB said:
This means btrfs detected the corruption before writing the data to the disk; it usually indicates RAM or other kernel memory corruption.

Seems fine so far. I think it's BTRFS itself that self-corrupts. I've tried all the variables by now, including getting the motherboard replaced at one point. Eh, I think I'll just go for XFS and get started on recovery. I haven't tried recovering using backup/restore before, so it'll be a nice test at least.

edit: pass complete, no errors.

Edited June 9, 2022 by Froberg
ChatNoir Posted June 9, 2022

You might want to try a more recent memtest: https://www.memtest86.com/download.htm

The built-in one is older and does not work well with ECC memory.
Froberg Posted June 9, 2022

Surely other issues would have cropped up during six years of use, other than this BTRFS issue? Surely?
ChatNoir Posted June 9, 2022

8 minutes ago, Froberg said:
Surely other issues would have cropped up during six years of use, other than this BTRFS issue? Surely?

Things work until they don't anymore. Maybe it's not that, but I wouldn't want to run any computer on faulty RAM.
Froberg Posted June 9, 2022

Just now, ChatNoir said:
Things work until they don't anymore. Maybe it's not that, but I wouldn't want to run any computer on faulty RAM.

Yes, obviously. I probably just shouldn't have upgraded the OS; it was running fine until then. I'll try the other memtest once I'm done recovering. Plex takes literal ages.
JorgeB Posted June 10, 2022

13 hours ago, Froberg said:
I think it's BTRFS itself that self-corrupts.

Btrfs is very susceptible to RAM or any other hardware corruption issue, much more so than other filesystems, so if there's a problem it's where you'll see it first. But there are many users, not just on Unraid, running very large btrfs filesystems for years without issues. I myself have roughly 200 btrfs filesystems in use for about 5 or 6 years and have only had issues with one: it got trashed twice in a couple of months, and I traced it to a bad disk.
Froberg Posted June 11, 2022

23 hours ago, JorgeB said:
I myself have roughly 200 btrfs filesystems in use for about 5 or 6 years and have only had issues with one: it got trashed twice in a couple of months, and I traced it to a bad disk.

A bad disk shouldn't really cause the loss of an entire RAID setup though, ideally, surely?
JorgeB Posted June 11, 2022

39 minutes ago, Froberg said:
A bad disk shouldn't really cause the loss of an entire RAID setup though, ideally, surely?

It was a single-disk filesystem, an array disk.
Froberg Posted October 2, 2022

On 6/11/2022 at 10:09 AM, JorgeB said:
It was a single-disk filesystem, an array disk.

So thread necrophilia is a thing. I just switched to a new system, and I am putting the old one through its paces before deciding whether to use it as an upgrade for my backup server, which only runs intermittently. One thing I did notice recently was the log drive filling up quite rapidly, but I couldn't see any immediate issues. I switched to running a single-disk cache, and I'm now back to running BTRFS RAID1 in the new system. Here's the new system:

Uptime is close to five days. The old one would usually rise to 30% log usage within a day. With the BTRFS issue I usually found out when the log was full and related issues started to occur; rebooting fixed the log issue, and then I'd be able to tell that BTRFS was FUBAR'ed. It's been happening with varying frequency.

I did discover just now that since I'm using Dynamix I was supposed to increase the size of the log, so I changed it to 512 MB with mount -o remount,size=512m /var/log - maybe that will help with the issue I was having.

Can BTRFS corrupt if the system runs out of memory somehow? Memtest is currently running on my test bench and is showing no issues, running the latest version of memtest. I'm going to let it complete regardless. Any specific memtest config or anything else you want me to try to be sure? I don't want to rely on this board and memory for my backup box if they're the cause of my issues after all.
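Since a filling /var/log was the first visible symptom here, a small cron-able check can give a warning before the tmpfs fills completely. A sketch — the 90% threshold and the messages are arbitrary choices, not anything Unraid ships:

```shell
#!/bin/sh
# Report /var/log usage and warn past a threshold (90% is an arbitrary pick).
threshold=90
usage=$(df --output=pcent /var/log | tail -n 1 | tr -dc '0-9')
if [ "$usage" -gt "$threshold" ]; then
    echo "WARNING: /var/log is ${usage}% full"
else
    echo "OK: /var/log is ${usage}% full"
fi
```

Run from cron (or the User Scripts plugin) this gives a heads-up before logging stops, instead of finding out after the fact.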
JorgeB Posted October 2, 2022

A 24-hour memtest, while not definitive, will catch most issues.
Froberg Posted October 2, 2022

1 hour ago, JorgeB said:
A 24-hour memtest, while not definitive, will catch most issues.

Four hours to complete the test; all passed. Memtest free won't let you run more than four passes, and I'm not sure how I'd accomplish a 24-hour test from looking at the settings. I'm running another test now, though.
Froberg Posted October 3, 2022

22 hours ago, JorgeB said:
A 24-hour memtest, while not definitive, will catch most issues.

I've re-run the tests five times now, still going strong with zero errors. Is there anything more I can do to rule out a hardware fault?
JorgeB Posted October 3, 2022

Use the server normally and monitor the pool for errors.

https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582
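"Monitor the pool for errors" can be scripted around `btrfs device stats`, which exposes the same wr/rd/flush/corrupt/gen counters seen in the syslog excerpts above. A sketch that flags any nonzero counter — the here-string is sample output, and /mnt/cache is an assumed mount point; in real use substitute `stats=$(btrfs device stats /mnt/cache)`:

```shell
#!/bin/sh
# Flag nonzero btrfs device-stat counters; exit nonzero if any are found.
# Sample stats shown below; for live use replace with:
#   stats=$(btrfs device stats /mnt/cache)
stats='[/dev/nvme1n1p1].write_io_errs   1
[/dev/nvme1n1p1].read_io_errs    11
[/dev/nvme1n1p1].flush_io_errs   1
[/dev/nvme1n1p1].corruption_errs 0
[/dev/nvme1n1p1].generation_errs 0'
echo "$stats" | awk '$2 > 0 { bad++; print "nonzero counter:", $1, $2 }
                     END { exit bad > 0 }'
```

The nonzero exit status makes it easy to hook into a scheduled job that only sends a notification when a counter has actually moved.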