jaso
Posted June 3, 2020 (edited)

My Unraid server had some trouble earlier:

Unraid Cache disk message: 03-06-2020 19:06
Warning [TOWER] - Cache pool BTRFS missing device(s)
Samsung_SSD_860_EVO_500GB_S4BENG0KC05104W (sdg)

Unraid Disk 6 error: 03-06-2020 19:07
Alert [TOWER] - Disk 6 in error state (disk dsbl)
WDC_WD40EZRX-00SPEB0_WD-WCC4E52UR3RJ (sdh)

Unraid array errors: 03-06-2020 19:07
Warning [TOWER] - array has errors
Array has 1 disk with read errors

I used Tools > Diagnostics > Download to grab all the logs and config. Then I thought I'd shut down the array to do some troubleshooting. Unfortunately I am now stuck in a constant loop of "Array Stopping • Retry unmounting disk share(s)...".

From the syslog:

Jun 3 17:55:35 Tower kernel: mdcmd (772): spindown 6
Jun 3 19:05:39 Tower kernel: ata5.00: exception Emask 0x52 SAct 0xfc0 SErr 0xffffffff action 0xe frozen
Jun 3 19:05:39 Tower kernel: ata5: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Jun 3 19:05:39 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun 3 19:05:39 Tower kernel: ata5.00: cmd 60/20:30:40:d9:08/00:00:16:00:00/40 tag 6 ncq dma 16384 in
Jun 3 19:05:39 Tower kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun 3 19:05:39 Tower kernel: ata5.00: status: { DRDY }
Jun 3 19:05:39 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun 3 19:05:39 Tower kernel: ata5.00: cmd 60/08:38:d8:b5:6c/00:00:04:00:00/40 tag 7 ncq dma 4096 in
Jun 3 19:05:39 Tower kernel: res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun 3 19:05:39 Tower kernel: ata5.00: status: { DRDY }

Then a bit later in the syslog:

Jun 3 19:05:39 Tower kernel: ata5.00: status: { DRDY }
Jun 3 19:05:39 Tower kernel: ata5: hard resetting link
Jun 3 19:05:39 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:40 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun 3 19:05:40 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 3 19:05:46 Tower kernel: ata5: hard resetting link
Jun 3 19:05:46 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:47 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun 3 19:05:47 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 3 19:05:47 Tower kernel: ata5: limiting SATA link speed to <unknown>
Jun 3 19:05:52 Tower kernel: ata5: hard resetting link
Jun 3 19:05:52 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:53 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun 3 19:05:53 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 3 19:05:53 Tower kernel: ata5.00: disabled
Jun 3 19:05:53 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 Sense Key : 0x5 [current]
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 ASC=0x21 ASCQ=0x4
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 CDB: opcode=0x28 28 00 16 08 d9 40 00 00 20 00
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 369678656
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 Sense Key : 0x5 [current]
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 ASC=0x21 ASCQ=0x4
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 CDB: opcode=0x28 28 00 04 6c b5 d8 00 00 08 00
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 74233304
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 Sense Key : 0x5 [current]
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 ASC=0x21 ASCQ=0x4
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 CDB: opcode=0x28 28 00 05 20 56 88 00 00 08 00
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 86005384

And then a little bit later:

Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#11 CDB: opcode=0x2a 2a 00 01 de 5e 08 00 02 00 00
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 31350280
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 3, rd 2, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: ata5: EH complete
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 86005384
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 3, rd 3, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 75279920
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 4, rd 3, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: ata5.00: detaching (SCSI 4:0:0:0)
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 75281280
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 5, rd 3, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 27140976
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 6, rd 3, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 7, rd 3, flush 0, corrupt 0, gen 0
Jun 3 19:05:53 Tower kernel: BTRFS: error (device sdg1) in btrfs_commit_transaction:2267: errno=-5 IO failure (Error while writing out transaction)
Jun 3 19:05:53 Tower kernel: BTRFS info (device sdg1): forced readonly
Jun 3 19:05:53 Tower kernel: BTRFS warning (device sdg1): Skipping commit of aborted transaction.
Jun 3 19:05:53 Tower kernel: BTRFS: error (device sdg1) in cleanup_transaction:1860: errno=-5 IO failure
Jun 3 19:05:53 Tower kernel: BTRFS info (device sdg1): delayed_refs has NO entry
Jun 3 19:05:53 Tower kernel: loop: Write error at byte offset 14237696, length 4096.
Jun 3 19:05:53 Tower kernel: loop: Write error at byte offset 20107264, length 4096.
Jun 3 19:05:53 Tower kernel: loop: Write error at byte offset 2207744000, length 4096.
Jun 3 19:05:53 Tower kernel: BTRFS warning (device loop2): chunk 13631488 missing 1 devices, max tolerance is 0 for writeable mount
Jun 3 19:05:53 Tower kernel: BTRFS: error (device loop2) in write_all_supers:3716: errno=-5 IO failure (errors while submitting device barriers.)

I grabbed the syslog again, in an attempt to see what was causing the "unmounting loop":

Jun 3 20:06:47 Tower kernel: print_req_error: I/O error, dev loop2, sector 2969408
Jun 3 20:06:50 Tower emhttpd: Unmounting disks...
Jun 3 20:06:50 Tower emhttpd: shcmd (91679): umount /mnt/disk4
Jun 3 20:06:50 Tower root: umount: /mnt/disk4: target is busy.
Jun 3 20:06:50 Tower emhttpd: shcmd (91679): exit status: 32
Jun 3 20:06:50 Tower emhttpd: shcmd (91680): umount /mnt/cache
Jun 3 20:06:50 Tower root: umount: /mnt/cache: target is busy.
Jun 3 20:06:50 Tower emhttpd: shcmd (91680): exit status: 32
Jun 3 20:06:50 Tower emhttpd: Retry unmounting disk share(s)...
Jun 3 20:06:52 Tower kernel: btrfs_dev_stat_print_on_error: 110 callbacks suppressed
Jun 3 20:06:52 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 42, rd 38010, flush 0, corrupt 0, gen 0

I'd prefer a graceful shutdown rather than a hard restart.
Anyone got any ideas how to unmount disk4 and my cache?

Kind Regards,
jaso

Edited June 3, 2020 by jaso: update title to mark as solved
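For anyone reading a similar syslog, the two things worth pulling out of the excerpts above are the latest BTRFS per-device error counters and the mount points that `umount` reports as busy. The following is a minimal Python sketch of that, assuming the log lines have the same shape as the ones posted here; the function names `latest_btrfs_counters` and `busy_mounts` are made up for illustration and are not part of Unraid.

```python
import re

# Matches kernel lines such as:
#   BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 42, rd 38010, flush 0, corrupt 0, gen 0
BTRFS_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)\s]+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# Matches lines such as:
#   umount: /mnt/cache: target is busy.
BUSY_RE = re.compile(r"umount: (?P<target>\S+): target is busy")


def latest_btrfs_counters(log_text):
    """Return {device: counters} using the last counter line seen per device."""
    latest = {}
    for m in BTRFS_RE.finditer(log_text):
        counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
        latest[m.group("dev")] = counters  # later lines overwrite earlier ones
    return latest


def busy_mounts(log_text):
    """Return the sorted, de-duplicated mount points that umount called busy."""
    return sorted({m.group("target") for m in BUSY_RE.finditer(log_text)})


# A few lines copied from the post above.
SAMPLE = """\
Jun 3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 3 20:06:50 Tower root: umount: /mnt/disk4: target is busy.
Jun 3 20:06:50 Tower root: umount: /mnt/cache: target is busy.
Jun 3 20:06:52 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 42, rd 38010, flush 0, corrupt 0, gen 0
"""

if __name__ == "__main__":
    print(latest_btrfs_counters(SAMPLE))
    print(busy_mounts(SAMPLE))
```

Counters that keep climbing on one device while its mount point stays busy suggest the filesystem is waiting on I/O that can no longer complete, rather than on an ordinary open file.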
itimpi
Posted June 3, 2020

I do not think you can achieve a graceful shutdown as you are getting hardware errors (quite likely cable type connection issues) on some drives so that the drives will never finish unmounting.
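One way to tell a single bad link from a failing controller is to count, per ATA port and per PCI device, how often the kernel reports a link going down or the AHCI controller being unavailable. This is a rough Python sketch under the assumption that the messages look like the ones quoted in this thread; `hardware_failure_counts` is a hypothetical helper name, and a count is only a hint, not a diagnosis.

```python
import re
from collections import Counter

# e.g. "ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)"
LINK_DOWN_RE = re.compile(r"(ata\d+): SATA link down")
# e.g. "ahci 0000:02:00.0: AHCI controller unavailable!"
AHCI_RE = re.compile(r"ahci ([0-9a-f:.]+): AHCI controller unavailable")


def hardware_failure_counts(log_text):
    """Count link-down events per ATA port and 'unavailable' events per PCI device."""
    links = Counter(LINK_DOWN_RE.findall(log_text))
    controllers = Counter(AHCI_RE.findall(log_text))
    return dict(links), dict(controllers)


# Lines copied from the syslog in the first post.
SAMPLE = """\
Jun 3 19:05:39 Tower kernel: ata5: hard resetting link
Jun 3 19:05:39 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:40 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun 3 19:05:46 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun 3 19:05:47 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
"""

if __name__ == "__main__":
    links, controllers = hardware_failure_counts(SAMPLE)
    print("link down per port:", links)
    print("controller unavailable per PCI device:", controllers)
```

Repeated "AHCI controller unavailable" messages for one PCI address, as in the log above, point at the card or its slot at least as much as at a cable.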
jaso (Author)
Posted June 3, 2020

1 minute ago, itimpi said:
I do not think you can achieve a graceful shutdown as you are getting hardware errors (quite likely cable type connection issues) on some drives so that the drives will never finish unmounting.

Thanks itimpi. Hard reboot time :-(
jaso (Author)
Posted June 3, 2020 (edited)

Figured out what the problem was: a dead RAID card. /mnt/disk6 and /mnt/cache were both being served by a generic 2x SATA card. It just up and died after 5 years of top-notch service.

I'll have to wait a few days for a new RAID card to arrive. In the meantime I've moved my cache SSD to another SATA slot, and /mnt/disk6 is being emulated for now...

Cheers,
jaso

Edited June 7, 2020 by jaso: typo
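When two disks fail at the same moment, it helps to check whether they hang off the same controller. On Linux, `/sys/block/<disk>` resolves to a path that includes the PCI address of the controller, so grouping disks by that address shows shared hardware. The sketch below is a hedged illustration in Python: the example paths are hypothetical (only `sdg`, `sdh`, `ata5`, and `0000:02:00.0` come from this thread; the bridge address, `ata6`, and host numbers are invented), and `controller_of`, `group_by_controller`, and `live_paths` are illustrative names.

```python
import os
import re
from collections import defaultdict

# A PCI address looks like "0000:02:00.0" (domain:bus:device.function).
PCI_ADDR_RE = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address in a resolved sysfs path, or None."""
    matches = [part for part in sysfs_path.split("/") if PCI_ADDR_RE.fullmatch(part)]
    return matches[-1] if matches else None


def group_by_controller(paths):
    """Map each controller PCI address to the sorted disks attached to it."""
    groups = defaultdict(list)
    for disk, path in sorted(paths.items()):
        groups[controller_of(path)].append(disk)
    return dict(groups)


def live_paths(disks):
    """Resolve /sys/block/<disk> symlinks on a running Linux system."""
    return {d: os.path.realpath(os.path.join("/sys/block", d)) for d in disks}


# Hypothetical resolved paths, shaped like what realpath typically returns.
EXAMPLE_PATHS = {
    "sdg": "/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/ata5/host4/target4:0:0/4:0:0:0/block/sdg",
    "sdh": "/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/ata6/host5/target5:0:0/5:0:0:0/block/sdh",
}

if __name__ == "__main__":
    print(group_by_controller(EXAMPLE_PATHS))
```

On a live server, `group_by_controller(live_paths(["sdg", "sdh"]))` would use the real paths instead of the example ones. Spreading the cache and array disks across different controllers limits how much a single card failure can take down at once.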