Cache drives in BTRFS pool keep dropping out


l92

Devices in my cache pool keep disappearing from the system. This is a new server build from December. I first got a replacement M.2 NVMe drive to see if that would fix it, but it did not, and both of the other SSDs have also dropped out multiple times.

 

Next I had a replacement motherboard sent and just got it set up, and within 2 hours one of my SSDs dropped out again...

 

Jan 24 19:14:55 Media ntpd[1957]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
Jan 24 19:15:38 Media kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Jan 24 19:15:38 Media kernel: caller _nv000907rm+0x1bf/0x1f0 [nvidia] mapping multiple BARs
Jan 24 19:39:53 Media kernel: nvme nvme0: I/O 514 QID 4 timeout, aborting
Jan 24 19:40:23 Media kernel: nvme nvme0: I/O 514 QID 4 timeout, reset controller
Jan 24 19:40:53 Media kernel: nvme nvme0: I/O 22 QID 0 timeout, reset controller
Jan 24 19:41:45 Media kernel: nvme nvme0: Device not ready; aborting reset
Jan 24 19:41:45 Media kernel: nvme nvme0: Abort status: 0x7
Jan 24 19:41:57 Media nginx: 2020/01/24 19:41:57 [error] 6086#6086: *6391 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.103, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.7", referrer: "http://192.168.0.7/Dashboard"
Jan 24 19:42:06 Media kernel: nvme nvme0: Device not ready; aborting reset
Jan 24 19:42:06 Media kernel: nvme nvme0: Removing after probe failure status: -19
Jan 24 19:42:12 Media nginx: 2020/01/24 19:42:12 [error] 6086#6086: *6872 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.103, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.7", referrer: "http://192.168.0.7/Dashboard"
Jan 24 19:42:27 Media nginx: 2020/01/24 19:42:27 [error] 6086#6086: *6229 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.0.103, server: , request: "POST /webGui/include/DashUpdate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.0.7", referrer: "http://192.168.0.7/Dashboard"
Jan 24 19:42:27 Media kernel: nvme nvme0: Device not ready; aborting reset
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 104580456
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 97666840
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 96589704
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 97655216
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 136028264
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 97654272
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 97655440
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 104523792
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 96588568
Jan 24 19:42:27 Media kernel: print_req_error: I/O error, dev nvme0n1, sector 104555408
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:27 Media kernel: nvme nvme0: failed to set APST feature (-19)
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:27 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
### [PREVIOUS LINE REPEATED 2 TIMES] ###
Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
### [PREVIOUS LINE REPEATED 47 TIMES] ###
Jan 24 19:42:32 Media kernel: btrfs_dev_stat_print_on_error: 1806 callbacks suppressed
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1759, rd 5, flush 53, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1760, rd 5, flush 53, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1761, rd 5, flush 53, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1761, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: btrfs_end_buffer_write_sync: 67 callbacks suppressed
Jan 24 19:42:32 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1762, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1763, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1764, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1765, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:32 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1766, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1767, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1768, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1769, rd 5, flush 54, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1769, rd 5, flush 55, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1770, rd 5, flush 55, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1771, rd 5, flush 55, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1772, rd 5, flush 55, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1772, rd 5, flush 56, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1773, rd 5, flush 56, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 1774, rd 5, flush 56, corrupt 0, gen 0
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:37 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:39 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:39 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:41 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:41 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:42 Media kernel: BTRFS warning (device nvme0n1p1): lost page write due to IO error on /dev/nvme0n1p1
Jan 24 19:42:42 Media kernel: BTRFS error (device nvme0n1p1): error writing primary super block to device 3
Jan 24 19:42:42 Media kernel: btrfs_dev_stat_print_on_error: 42 callbacks suppressed
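
For anyone reading a log like the one above: the `errs: wr ..., rd ..., flush ..., corrupt ..., gen ...` values are cumulative per-device counters, so the last such line for a device shows the running total. A minimal Python sketch for pulling those totals out of syslog text, assuming lines in exactly the format shown in this post (the function name `latest_btrfs_counters` is made up for illustration):

```python
import re

# Matches the per-device counters btrfs prints, e.g.
# "bdev /dev/nvme0n1p1 errs: wr 7, rd 0, flush 0, corrupt 0, gen 0"
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_btrfs_counters(lines):
    """Return {device: counters}, keeping the last counter line seen per device."""
    latest = {}
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


# Two lines copied from the log in this post.
sample = [
    "Jan 24 19:42:27 Media kernel: BTRFS error (device nvme0n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 10, rd 0, flush 0, corrupt 0, gen 0",
    "Jan 24 19:42:37 Media kernel: BTRFS error (device nvme0n1p1): "
    "bdev /dev/nvme0n1p1 errs: wr 1774, rd 5, flush 56, corrupt 0, gen 0",
]
print(latest_btrfs_counters(sample))
```

In this log the counters only start climbing after the kernel reports `nvme nvme0: Removing after probe failure status: -19`, which fits the NVMe controller dropping first and the filesystem errors following from that.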

 

media-diagnostics-20200124-1948.zip

Edited by l92

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append":

 

nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference.
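
To make the edit concrete, here is a rough Python sketch of what the change does to the config text. The sample config layout below is an assumption about what a typical Unraid syslinux.cfg looks like, not a copy of anyone's actual file, and `add_boot_param` is a hypothetical helper. The GUI edit described above is the normal way to do this.

```python
PARAM = "nvme_core.default_ps_max_latency_us=0"


def add_boot_param(cfg_text, param=PARAM):
    """Add `param` to the 'append' line of the entry marked 'menu default'."""
    out = []
    in_default = False
    done = False
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("label "):
            in_default = False  # a new boot entry starts here
        elif stripped == "menu default":
            in_default = True  # this entry is the default boot option
        elif (in_default and not done and stripped.startswith("append")
              and param not in stripped.split()):
            line = line.rstrip() + " " + param
            done = True
        out.append(line)
    return "\n".join(out) + "\n"


# Assumed example layout; check your own file before changing anything.
sample_cfg = """default menu.c32
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS Safe Mode
  kernel /bzimage
  append initrd=/bzroot unraidsafemode
"""

updated = add_boot_param(sample_cfg)
print(updated)
```

Only the default entry's `append` line gets the parameter, and the other boot entries are left alone. After rebooting, `cat /proc/cmdline` should list the parameter if the edit took effect.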

 

 

 

 


This is what I'm seeing. This is my second set of SSDs, since at first I thought the SSDs themselves were the problem. I've re-seated the cables and made sure that my cache drives are connected directly to the motherboard, not through an HBA. After a second restart I'm still getting this. The following is just a snippet of the disk log.

 

 

Jan 26 10:32:01 Starswirl emhttpd: import 30 cache device: (sdf) SPCC_Solid_State_Disk_A472079B020C00262699
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): disk space caching is enabled
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): has skinny extents
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): bdev /dev/sdh1 errs: wr 263018, rd 23717, flush 2409, corrupt 0, gen 0
Jan 26 10:34:01 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73693298688 have 0
Jan 26 10:34:01 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73693331456 have 0
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): enabling ssd optimizations
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): resizing devid 1
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): new size for /dev/sdf1 is 128035643392
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): resizing devid 2
Jan 26 10:34:01 Starswirl kernel: BTRFS info (device sdf1): new size for /dev/sdh1 is 128035643392
Jan 26 10:34:12 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73693413376 wanted 69368 found 68134
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693413376 (dev /dev/sdh1 sector 28417056)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693417472 (dev /dev/sdh1 sector 28417064)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693421568 (dev /dev/sdh1 sector 28417072)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693425664 (dev /dev/sdh1 sector 28417080)
Jan 26 10:34:12 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73693282304 have 0
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693282304 (dev /dev/sdh1 sector 28416800)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693286400 (dev /dev/sdh1 sector 28416808)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693290496 (dev /dev/sdh1 sector 28416816)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693294592 (dev /dev/sdh1 sector 28416824)
Jan 26 10:34:12 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73693315072 have 0
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693315072 (dev /dev/sdh1 sector 28416864)
Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693319168 (dev /dev/sdh1 sector 28416872)
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 176128 csum 0x8acff355 expected csum 0xb8b17d18 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73673310208 wanted 69339 found 68104
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 180224 csum 0xd11a6c45 expected csum 0xfe22cdb7 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 184320 csum 0x057f09ad expected csum 0x0702dba3 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 188416 csum 0xf937f2c0 expected csum 0x4dda44e4 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 176128 csum 0x8acff355 expected csum 0xb8b17d18 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 180224 csum 0xd11a6c45 expected csum 0xfe22cdb7 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 184320 csum 0x057f09ad expected csum 0x0702dba3 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 188416 csum 0xf937f2c0 expected csum 0x4dda44e4 mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 196608 csum 0x7fac8920 expected csum 0xf5a14b2a mirror 1
Jan 26 10:34:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 225280 csum 0xe58eeb77 expected csum 0x5323f039 mirror 1
Jan 26 10:34:13 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73675341824 wanted 69339 found 68104
Jan 26 10:34:13 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73675440128 have 0
Jan 26 10:34:13 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73690021888 wanted 69364 found 68131
Jan 26 10:34:13 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73692643328 have 0
Jan 26 10:34:16 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73925115904 wanted 67698 found 66117
Jan 26 10:34:16 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73924788224 wanted 67698 found 66113
Jan 26 10:36:12 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73675456512 have 0
Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73675456512 (dev /dev/sdh1 sector 28381984)
Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73675460608 (dev /dev/sdh1 sector 28381992)
Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73675464704 (dev /dev/sdh1 sector 28382000)
Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73675468800 (dev /dev/sdh1 sector 28382008)
Jan 26 10:36:12 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157138944 csum 0xb4a334e4 expected csum 0xba584add mirror 1
Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157138944 (dev /dev/sdh1 sector 2461520)
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157220864 csum 0xd944f41f expected csum 0x13533a17 mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157237248 csum 0x23f5ee5c expected csum 0x4e3b7d58 mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157290496 csum 0xded640ab expected csum 0x183e388b mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157274112 csum 0x7ce00715 expected csum 0x762b032e mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157278208 csum 0x50e30ace expected csum 0x070295be mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157253632 csum 0x2455a7a9 expected csum 0x627fc70a mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157249536 csum 0x746a5ce1 expected csum 0x5a44453c mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157220864 (dev /dev/sdh1 sector 25926976)
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157274112 csum 0x7ce00715 expected csum 0x762b032e mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 157278208 csum 0x50e30ace expected csum 0x070295be mirror 1
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157237248 (dev /dev/sdh1 sector 25929592)
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157290496 (dev /dev/sdh1 sector 25697328)
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157249536 (dev /dev/sdh1 sector 25929616)
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157253632 (dev /dev/sdh1 sector 25925432)
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157274112 (dev /dev/sdh1 sector 25700232)
Jan 26 10:55:27 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157278208 (dev /dev/sdh1 sector 25700240)
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 60366848 csum 0x65eca540 expected csum 0x85b3eb5a mirror 2
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 60366848 (dev /dev/sdh1 sector 30345872)
Jan 26 17:09:13 Starswirl kernel: BTRFS error (device sdf1): parent transid verify failed on 73690988544 wanted 69368 found 68131
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73690988544 (dev /dev/sdh1 sector 28412320)
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73690992640 (dev /dev/sdh1 sector 28412328)
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73690996736 (dev /dev/sdh1 sector 28412336)
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73691000832 (dev /dev/sdh1 sector 28412344)
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 161538048 csum 0x81d538a7 expected csum 0xe92d2060 mirror 1
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 161538048 (dev /dev/sdh1 sector 2554376)
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 161529856 csum 0x98f94189 expected csum 0x5f2f3f20 mirror 1
Jan 26 17:09:13 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 161529856 (dev /dev/sdh1 sector 2599296)
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 34754560 csum 0xcf367eb5 expected csum 0x97e1b66c mirror 2
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 34770944 csum 0x0601253f expected csum 0x2ba6b50c mirror 2
Jan 26 17:09:13 Starswirl kernel: BTRFS warning (device sdf1): csum failed root 5 ino 275 off 34750464 csum 0x2b6854a2 expected csum 0x05cb278a mirror 2
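
One way to see which pool member the repairs are landing on is to count the "read error corrected" lines per device. A small Python sketch, assuming log lines in the same format as the snippet above (`corrected_reads_by_device` is a made-up name for illustration):

```python
import re
from collections import Counter

# Matches e.g. "read error corrected: ino 0 off 73693413376 (dev /dev/sdh1 sector 28417056)"
CORRECTED_RE = re.compile(
    r"read error corrected: ino \d+ off \d+ \(dev (\S+) sector \d+\)"
)


def corrected_reads_by_device(lines):
    """Count 'read error corrected' messages per underlying device."""
    counts = Counter()
    for line in lines:
        match = CORRECTED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the snippet above.
sample = [
    "Jan 26 10:34:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 0 off 73693413376 (dev /dev/sdh1 sector 28417056)",
    "Jan 26 10:36:12 Starswirl kernel: BTRFS info (device sdf1): read error corrected: ino 275 off 157138944 (dev /dev/sdh1 sector 2461520)",
    "Jan 26 10:34:12 Starswirl kernel: BTRFS error (device sdf1): bad tree block start, want 73693282304 have 0",
]
print(corrected_reads_by_device(sample))
```

In the snippet above every corrected read names /dev/sdh1, and the mount-time counter line also shows large wr/rd/flush totals for /dev/sdh1, so that device appears to be the one that dropped and is now being repaired from the other copy.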

starswirl-diagnostics-20200127-1737.zip


Even after 2 reboots the system would not let me assign the drive (it is not listed in the UI).

 

Jan 30 14:47:09 Media kernel: nvme nvme0: Device not ready; aborting initialisation
Jan 30 14:47:09 Media kernel: nvme nvme0: Removing after probe failure status: -19

 

lspci output shows that only one of the drives has a driver loaded (the first device below has no "Kernel driver in use" line):

 

01:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5762 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Realtek Semiconductor Co., Ltd. Device 5762
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 54
	NUMA node: 0
	Region 0: Memory at f7e00000 (64-bit, non-prefetchable) [size=16K]
	Region 5: Memory at f7e04000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 512 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message/WAKE#
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [148 v1] Device Serial Number 00-00-00-01-00-4c-e0-00
	Capabilities: [158 v1] Secondary PCI Express <?>
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 1048576ns
		Max no snoop latency: 1048576ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=60us PortTPowerOnTime=60us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=32768ns
		L1SubCtl2: T_PwrOn=60us
	Kernel modules: nvme, rsnvme

 

04:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5762 (rev 01) (prog-if 02 [NVM Express])
	Subsystem: Realtek Semiconductor Co., Ltd. Device 5762
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 32
	NUMA node: 0
	Region 0: Memory at f2100000 (64-bit, non-prefetchable) [size=16K]
	Region 5: Memory at f2104000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 8GT/s (ok), Width x4 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Via message/WAKE#
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
			 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [100 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [148 v1] Device Serial Number 00-00-00-01-00-4c-e0-00
	Capabilities: [158 v1] Secondary PCI Express <?>
	Capabilities: [178 v1] Latency Tolerance Reporting
		Max snoop latency: 1048576ns
		Max no snoop latency: 1048576ns
	Capabilities: [180 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=60us PortTPowerOnTime=60us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=32768ns
		L1SubCtl2: T_PwrOn=60us
	Kernel driver in use: nvme
	Kernel modules: nvme, rsnvme
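
With many PCI devices it is easy to miss a missing "Kernel driver in use" line. A minimal Python sketch that lists devices without a bound driver, assuming verbose lspci text in the layout pasted above with short `bus:device.function` addresses (`devices_without_driver` is a hypothetical helper name):

```python
import re

# A device block starts with an unindented address such as "01:00.0 ".
ADDRESS_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]) ")


def devices_without_driver(lspci_text):
    """Return PCI addresses whose block has no 'Kernel driver in use' line."""
    missing = []
    current = None
    has_driver = False
    for line in lspci_text.splitlines():
        match = ADDRESS_RE.match(line)
        if match:
            # A new device block begins; record the previous one if needed.
            if current is not None and not has_driver:
                missing.append(current)
            current = match.group(1)
            has_driver = False
        elif line.strip().startswith("Kernel driver in use:"):
            has_driver = True
    if current is not None and not has_driver:
        missing.append(current)
    return missing


# Shortened version of the two device blocks from this post.
sample = """01:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5762 (rev 01) (prog-if 02 [NVM Express])
\tSubsystem: Realtek Semiconductor Co., Ltd. Device 5762
\tKernel modules: nvme, rsnvme

04:00.0 Non-Volatile memory controller: Realtek Semiconductor Co., Ltd. Device 5762 (rev 01) (prog-if 02 [NVM Express])
\tSubsystem: Realtek Semiconductor Co., Ltd. Device 5762
\tKernel driver in use: nvme
\tKernel modules: nvme, rsnvme
"""
print(devices_without_driver(sample))
```

For the output in this post that flags 01:00.0, which also shows `BusMaster-` and `MSI-X: Enable-`, while 04:00.0 has the nvme driver bound.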

 

media-diagnostics-20200130-1457.zip

Edited by l92
9 hours ago, l92 said:

Back to the original thread topic. I just noticed that yesterday I had a cache drive drop out again. Attached are the logs. I would really appreciate any help to fix this as it has been very frustrating switching out all this hardware trying to resolve this

media-diagnostics-20200130-1330.zip

If disabling power states didn't help, it's likely a hardware/BIOS problem: either the board or the NVMe device.
