TheSystemAdmin Posted January 28, 2022

Hello unRAID Community!

I was watching Plex when it disconnected on me. I hopped onto my webGUI and had received no notifications, but it did not look good.

1. Several (not all) containers were stopped.
2. All VMs are gone ("No Virtual Machines installed").
3. Several TBs of data are not showing up in Windows or through the "Shares" tab, but the utilization on the disks appears to be correct.

Logs are spamming this:

Jan 28 11:37:55 TSA-NAS01 kernel: blk_update_request: I/O error, dev sdk, sector 73447704 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdk1 errs: wr 52, rd 8464053, flush 0, corrupt 0, gen 0
Jan 28 11:37:55 TSA-NAS01 kernel: sd 1:0:0:0: [sdf] tag#31 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Jan 28 11:37:55 TSA-NAS01 kernel: sd 1:0:0:0: [sdf] tag#31 CDB: opcode=0x88 88 00 00 00 00 00 00 3e ae 20 00 00 00 20 00 00
Jan 28 11:37:55 TSA-NAS01 kernel: blk_update_request: I/O error, dev sdf, sector 4107808 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdf1 errs: wr 54, rd 10210651, flush 0, corrupt 0, gen 0
Jan 28 11:37:55 TSA-NAS01 kernel: sd 2:0:0:0: [sdk] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Jan 28 11:37:55 TSA-NAS01 kernel: sd 2:0:0:0: [sdk] tag#18 CDB: opcode=0x88 88 00 00 00 00 00 00 3e 0e 20 00 00 00 20 00 00
Jan 28 11:37:55 TSA-NAS01 kernel: blk_update_request: I/O error, dev sdk, sector 4066848 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdk1 errs: wr 52, rd 8464054, flush 0, corrupt 0, gen 0
Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS info (device sdf1): no csum found for inode 72150 start 1000931328
Jan 28 11:37:55 TSA-NAS01 kernel: sd 1:0:0:0: [sdf] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00 cmd_age=0s
Jan 28 11:37:55 TSA-NAS01 kernel: sd 1:0:0:0: [sdf] tag#22 CDB: opcode=0x88 88 00 00 00 00 00 04 61 59 18 00 00 00 08 00 00
Jan 28 11:37:55 TSA-NAS01 kernel: blk_update_request: I/O error, dev sdf, sector 73488664 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdf1 errs: wr 54, rd 10210652, flush 0, corrupt 0, gen 0

From what I can tell from my quick (panicked) Google searches, there is something wrong with my cache. I have a pool of 2 SSDs that show 0 errors. If I try to scrub them, I get an aborted status:

UUID: bdbe2a64-9dd0-40b4-82fb-75fba1b30eca
Scrub started: Fri Jan 28 11:21:47 2022
Status: aborted
Duration: 0:00:00
Total to scrub: 178.97GiB
Rate: 0.00B/s
Error summary: no errors found

Also getting this on the Balance Status: (screenshot not preserved)

Before I start ripping things apart and re-seating cables, I wanted to make sure I'm headed in the right direction. While losing data is not the end of the world, I would rather not have to rebuild everything.

Both SSDs are connected straight to the motherboard, while the rest of my data disks go through an HBA.

I do have backups utilizing the CloudBerry app to a Backblaze B2 bucket, which does show data (woo!). I also have backups via the CA Backup / Restore Appdata plugin, which appears to have run today at 3am, though it currently reports it has no backup sets since that data is now missing on the unRAID side. (Again, also in Backblaze.)

Any help would be really appreciated! Thank you.

tsa-nas01-diagnostics-20220128-1140.zip
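For anyone reading along, the "errs: wr ..., rd ..." lines in that log are running per-device error counters, so the last line seen for each device is the one that matters. A minimal Python sketch, assuming the syslog lines have the exact shape shown in the post; the function name `latest_btrfs_counters` and the `SAMPLE_LOG` list are made up for illustration and are not part of unRAID or btrfs:

```python
import re

# Matches kernel lines such as:
# "BTRFS error (device sdf1): bdev /dev/sdk1 errs: wr 52, rd 8464053, flush 0, corrupt 0, gen 0"
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<fs>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTER_NAMES = ("wr", "rd", "flush", "corrupt", "gen")


def latest_btrfs_counters(lines):
    """Return {bdev: {counter: value}}, keeping the last line seen per device."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            latest[match.group("bdev")] = {
                name: int(match.group(name)) for name in COUNTER_NAMES
            }
    return latest


# Hypothetical input: three counter lines copied from the log above.
SAMPLE_LOG = [
    "Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdk1 errs: wr 52, rd 8464053, flush 0, corrupt 0, gen 0",
    "Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdk1 errs: wr 52, rd 8464054, flush 0, corrupt 0, gen 0",
    "Jan 28 11:37:55 TSA-NAS01 kernel: BTRFS error (device sdf1): bdev /dev/sdf1 errs: wr 54, rd 10210652, flush 0, corrupt 0, gen 0",
]

if __name__ == "__main__":
    for bdev, counters in sorted(latest_btrfs_counters(SAMPLE_LOG).items()):
        print(bdev, counters)
```

Read errors in the millions on both pool members at once, with zero corruption errors, would be consistent with both devices becoming unreachable rather than with bad data on either SSD.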
JorgeB Posted January 28, 2022 (Solution)

Jan 28 10:35:32 TSA-NAS01 kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xb0010000 flags=0x0000]

Problem with the onboard SATA controller; both cache devices dropped offline because of that:

Jan 28 10:36:33 TSA-NAS01 kernel: ata1.00: disabled
Jan 28 10:37:30 TSA-NAS01 kernel: ata2.00: disabled

This is quite common with some Ryzen boards. Rebooting should bring the pool back, but if it keeps happening it is best to use an add-on controller (or replace the board).
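The two signatures pointed out above (an AMD-Vi IO_PAGE_FAULT on the AHCI controller, followed by ATA ports being disabled) can be searched for in a saved syslog. A minimal Python sketch, assuming the lines look exactly like the ones quoted in this thread; `controller_events` and `SAMPLE_SYSLOG` are hypothetical names used only for this example:

```python
import re

# "ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT ..."
PAGE_FAULT = re.compile(
    r"ahci (?P<pci>[0-9a-f:.]+): AMD-Vi: Event logged \[IO_PAGE_FAULT"
)
# "ata1.00: disabled"
ATA_DISABLED = re.compile(r"(?P<port>ata\d+\.\d+): disabled")


def controller_events(lines):
    """Return (kind, detail) tuples in log order for the two patterns above."""
    events = []
    for line in lines:
        match = PAGE_FAULT.search(line)
        if match:
            events.append(("io_page_fault", match.group("pci")))
            continue
        match = ATA_DISABLED.search(line)
        if match:
            events.append(("ata_disabled", match.group("port")))
    return events


# Hypothetical input: the three lines quoted in the reply above.
SAMPLE_SYSLOG = [
    "Jan 28 10:35:32 TSA-NAS01 kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xb0010000 flags=0x0000]",
    "Jan 28 10:36:33 TSA-NAS01 kernel: ata1.00: disabled",
    "Jan 28 10:37:30 TSA-NAS01 kernel: ata2.00: disabled",
]

if __name__ == "__main__":
    for kind, detail in controller_events(SAMPLE_SYSLOG):
        print(kind, detail)
```

A page fault on the controller followed shortly by "disabled" on every port attached to it would fit the controller explanation; this sketch only flags the lines and does not prove the cause.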
Squid Posted January 28, 2022

Cabling certainly appears to be the prime suspect (the drive isn't even showing any SMART report at all).

6 minutes ago, TheSystemAdmin said:
"Though it currently reports it has no backup sets since that data is now missing on the unRAID side"

Since it looks like you sync the backup from the plugin to Backblaze, it's probably not a major issue, but I don't recommend storing a backup of the drive you're backing up on the drive itself.
TheSystemAdmin Posted January 28, 2022 (Author)

2 minutes ago, Squid said:
"Cabling certainly appears to be the prime suspect (the drive isn't even showing any SMART report at all). Since it looks like you sync the backup from the plugin to Backblaze, it's probably not a major issue, but I don't recommend storing a backup of the drive you're backing up on the drive itself."

True, I have been debating plugging in an external drive and having it back up to that for a local copy, but the data footprint is so small that pulling from the cloud wouldn't take more than an hour or two.
TheSystemAdmin Posted January 28, 2022 (Author)

8 minutes ago, JorgeB said:
"Jan 28 10:35:32 TSA-NAS01 kernel: ahci 0000:03:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000e address=0xb0010000 flags=0x0000] Problem with the onboard SATA controller; both cache devices dropped offline because of that: Jan 28 10:36:33 TSA-NAS01 kernel: ata1.00: disabled Jan 28 10:37:30 TSA-NAS01 kernel: ata2.00: disabled This is quite common with some Ryzen boards. Rebooting should bring the pool back, but if it keeps happening it is best to use an add-on controller (or replace the board)."

A reboot appears to have resolved it: the data is back, the containers started up and the VMs are showing again. I will definitely consider replacing the board if this issue occurs a second time. I've been debating switching to Intel, but the wife won't approve any more tech spending for a few months. Haha
TheSystemAdmin Posted January 28, 2022 (Author)

Since my "system" share is on the cache and I lost both drives, unRAID just ran with what it had on the array? Would that account for data not reflecting, performance being terrible and VMs missing?
JorgeB Posted January 28, 2022

5 minutes ago, TheSystemAdmin said:
"Would that account for data not reflecting, performance being terrible and VMs missing?"

Correct, all pool data became inaccessible.