BrandonK, October 23, 2022 (edited)

I woke up this morning to find my Plex server unable to play media. I tried restarting the container, but it would not start back up. I then tried restarting the Docker service, but that would not start either. Finally I resorted to restarting the server, and now my NVMe drive shows "Unmountable: No file system". This is the drive where I keep my Docker image and appdata. (My appdata and system shares are both set to "Prefer : cache_nvme", which is the failed drive.)

This error is over my head, so if anyone can make sense of it I would really appreciate it! I have an appdata backup from 2 days ago, but if it is possible to recover the drive, that would be ideal. Diags are attached.

btv-diagnostics-20221023-1134.zip

Edit: I was probably hasty in my initial post, so here are a few more details.

1. The most recent change I made to the system was a few weeks ago, when I swapped out a few drives in the array. I replaced the parity drive with a new 16TB drive, then removed an old 3TB data drive and replaced it with the old 10TB parity drive. That process took a long time but finished a week or two ago.
2. This is a cache pool drive, not anything in my data array.
3. This drive has been in use for about a year. When I look at the disk log, I see things I don't understand, but they seem to point to corruption:
Oct 23 15:43:56 BTV kernel: nvme0n1: p1
Oct 23 15:43:56 BTV kernel: BTRFS: device fsid 63fe8b2f-5a89-489a-97c6-e1b4bd246b5b devid 1 transid 575326 /dev/nvme0n1p1 scanned by udevd (2050)
Oct 23 15:44:45 BTV emhttpd: Samsung_SSD_970_EVO_Plus_1TB_S6S1NJ0RB20890Y (nvme0n1) 512 1953525168
Oct 23 15:44:45 BTV emhttpd: import 30 cache device: (nvme0n1) Samsung_SSD_970_EVO_Plus_1TB_S6S1NJ0RB20890Y
Oct 23 15:44:45 BTV emhttpd: read SMART /dev/nvme0n1
Oct 23 15:44:56 BTV kernel: BTRFS info (device nvme0n1p1): turning on async discard
Oct 23 15:44:56 BTV kernel: BTRFS info (device nvme0n1p1): using free space tree
Oct 23 15:44:56 BTV kernel: BTRFS info (device nvme0n1p1): has skinny extents
Oct 23 15:44:56 BTV kernel: BTRFS info (device nvme0n1p1): enabling ssd optimizations
Oct 23 15:44:56 BTV kernel: BTRFS info (device nvme0n1p1): start tree-log replay
Oct 23 15:44:56 BTV kernel: blk_update_request: critical medium error, dev nvme0n1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Oct 23 15:44:56 BTV kernel: BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0
Oct 23 15:44:56 BTV kernel: BTRFS warning (device nvme0n1p1): chunk 1048576 missing 1 devices, max tolerance is 0 for writable mount
Oct 23 15:44:56 BTV kernel: BTRFS: error (device nvme0n1p1) in write_all_supers:3845: errno=-5 IO failure (errors while submitting device barriers.)
Oct 23 15:44:56 BTV kernel: BTRFS warning (device nvme0n1p1): Skipping commit of aborted transaction.
Oct 23 15:44:56 BTV kernel: BTRFS: error (device nvme0n1p1) in cleanup_transaction:1942: errno=-5 IO failure
Oct 23 15:44:56 BTV kernel: BTRFS: error (device nvme0n1p1) in btrfs_replay_log:2279: errno=-5 IO failure (Failed to recover log tree)
Oct 23 15:44:57 BTV root: mount: /mnt/cache_nvme: can't read superblock on /dev/nvme0n1p1.
Oct 23 15:44:57 BTV kernel: BTRFS error (device nvme0n1p1): open_ctree failed

4. SMART tests, etc. show no errors with the drive.
5. I've not had any power failures or reboots recently. The server is on a new (few weeks old) APC UPS.

Edited October 23, 2022 by BrandonK: additional information added
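For anyone triaging a similar "Unmountable: No file system" pool, the two log lines that matter most are the block-layer "critical medium error" and the btrfs per-device error counters; the other btrfs errors in the excerpt appear to be knock-on effects of the failed write. A minimal sketch of a hypothetical helper (not an Unraid or kernel tool) that pulls both out of a syslog, assuming the message formats are exactly as quoted above; other kernel versions may word them differently:

```python
import re

# Hypothetical triage helper. The patterns match the message formats quoted
# in this thread; other kernel versions may word these lines differently.
MEDIUM_ERROR = re.compile(
    r"critical medium error, dev ([^,\s]+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)"
)
BTRFS_DEV_ERRS = re.compile(
    r"BTRFS error \(device ([^)\s]+)\): bdev (\S+) errs: "
    r"wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def scan_kernel_log(text):
    """Return (medium_errors, btrfs_counters) found in syslog text.

    medium_errors: one dict per block-layer "critical medium error" line.
    btrfs_counters: {bdev path: {counter name: value}}; last occurrence wins.
    """
    medium_errors = []
    btrfs_counters = {}
    for line in text.splitlines():
        m = MEDIUM_ERROR.search(line)
        if m:
            medium_errors.append(
                {"dev": m.group(1), "sector": int(m.group(2)), "op": m.group(3)}
            )
        b = BTRFS_DEV_ERRS.search(line)
        if b:
            # groups()[2:] are the five counters in COUNTER_NAMES order
            values = (int(v) for v in b.groups()[2:])
            btrfs_counters[b.group(2)] = dict(zip(COUNTER_NAMES, values))
    return medium_errors, btrfs_counters


if __name__ == "__main__":
    # Two lines copied from the log excerpt above.
    sample = (
        "Oct 23 15:44:56 BTV kernel: blk_update_request: critical medium error, "
        "dev nvme0n1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0\n"
        "Oct 23 15:44:56 BTV kernel: BTRFS error (device nvme0n1p1): bdev "
        "/dev/nvme0n1p1 errs: wr 0, rd 0, flush 1, corrupt 0, gen 0\n"
    )
    medium, counters = scan_kernel_log(sample)
    print(medium)
    print(counters)
```

A WRITE medium error reported by the block layer points at the device itself rather than at btrfs metadata, which is why it is worth checking for before attempting any filesystem-level repair.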
JorgeB, October 24, 2022 (marked as solution)

This suggests a problem with the NVMe device:

Oct 23 11:02:46 BTV kernel: blk_update_request: critical medium error, dev nvme0n1, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0

And the SMART report confirms it:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
- available spare has fallen below threshold
- media has been placed in read only mode

You need to replace it.
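The two lines under "FAILED!" correspond to individual bits of the Critical Warning byte in the drive's NVMe SMART / Health Information log, which `smartctl -a` also prints as a hex value on its "Critical Warning:" line. A minimal sketch that decodes that byte, assuming you already have the raw value; the function name is hypothetical, and the 0x09 example is illustrative because the thread does not show the raw byte:

```python
# Bits 0-4 of the NVMe SMART / Health Information "Critical Warning" byte,
# as defined in the NVMe base specification. Higher bits were added in later
# revisions or are reserved, so this sketch reports them generically.
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below a threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value):
    """Return the conditions flagged in an NVMe Critical Warning byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Critical Warning is a single byte (0x00-0xFF)")
    flagged = []
    for bit in range(8):
        if value & (1 << bit):
            flagged.append(
                CRITICAL_WARNING_BITS.get(bit, f"bit {bit} set (not decoded here)")
            )
    return flagged


if __name__ == "__main__":
    # 0x09 = bit 0 (spare below threshold) + bit 3 (read-only media),
    # i.e. the two conditions listed in the SMART excerpt above.
    for message in decode_critical_warning(0x09):
        print("-", message)
```

Any nonzero value here makes the overall health assessment fail. The read-only bit in particular means the controller has stopped accepting writes, which is consistent with the WRITE medium error in the kernel log.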
BrandonK, October 24, 2022 (author)

Thanks. Kind of the answer I was expecting but not hoping for. I ordered a second one last night, and I'll (hopefully) RMA the one that failed. It's possible to do a mirrored cache pool (RAID 1), right? If so, that's the route I'll go.
JorgeB, October 24, 2022

BrandonK said: "It's possible to do a mirrored cache pool (RAID 1), right?"

Yes, but you should still keep backups of anything important. Any type of RAID adds redundancy, but it's not a backup.
BrandonK, October 24, 2022 (author)

JorgeB said: "Yes, but you should still keep backups of anything important. Any type of RAID adds redundancy, but it's not a backup."

Absolutely! I have CA Appdata Backup configured and running every other day. A true lifesaver here: I only lost about 48 hours of appdata.