
[6.8.3] Stuck in loop "retry unmounting disk shares" (SOLVED: dead RAID card)



Posted (edited)

My unraid server had some trouble earlier:

 

Unraid Cache disk message: 03-06-2020 19:06

Warning [TOWER] - Cache pool BTRFS missing device(s)
Samsung_SSD_860_EVO_500GB_S4BENG0KC05104W (sdg)

+

Unraid Disk 6 error: 03-06-2020 19:07

Alert [TOWER] - Disk 6 in error state (disk dsbl)
WDC_WD40EZRX-00SPEB0_WD-WCC4E52UR3RJ (sdh)

+

Unraid array errors: 03-06-2020 19:07

Warning [TOWER] - array has errors
Array has 1 disk with read errors

 

I used Tools > Diagnostics > Download to grab all the logs and config, then thought I'd stop the array to do some troubleshooting. Unfortunately I am now stuck in a constant loop of "Array Stopping • Retry unmounting disk share(s)...".

 

From the Syslog

Jun  3 17:55:35 Tower kernel: mdcmd (772): spindown 6
Jun  3 19:05:39 Tower kernel: ata5.00: exception Emask 0x52 SAct 0xfc0 SErr 0xffffffff action 0xe frozen
Jun  3 19:05:39 Tower kernel: ata5: SError: { RecovData RecovComm UnrecovData Persist Proto HostInt PHYRdyChg PHYInt CommWake 10B8B Dispar BadCRC Handshk LinkSeq TrStaTrns UnrecFIS DevExch }
Jun  3 19:05:39 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun  3 19:05:39 Tower kernel: ata5.00: cmd 60/20:30:40:d9:08/00:00:16:00:00/40 tag 6 ncq dma 16384 in
Jun  3 19:05:39 Tower kernel:         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun  3 19:05:39 Tower kernel: ata5.00: status: { DRDY }
Jun  3 19:05:39 Tower kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun  3 19:05:39 Tower kernel: ata5.00: cmd 60/08:38:d8:b5:6c/00:00:04:00:00/40 tag 7 ncq dma 4096 in
Jun  3 19:05:39 Tower kernel:         res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x56 (ATA bus error)
Jun  3 19:05:39 Tower kernel: ata5.00: status: { DRDY }

 

then a bit later in the syslog:

Jun  3 19:05:39 Tower kernel: ata5.00: status: { DRDY }
Jun  3 19:05:39 Tower kernel: ata5: hard resetting link
Jun  3 19:05:39 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun  3 19:05:40 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun  3 19:05:40 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun  3 19:05:46 Tower kernel: ata5: hard resetting link
Jun  3 19:05:46 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun  3 19:05:47 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun  3 19:05:47 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun  3 19:05:47 Tower kernel: ata5: limiting SATA link speed to <unknown>
Jun  3 19:05:52 Tower kernel: ata5: hard resetting link
Jun  3 19:05:52 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun  3 19:05:53 Tower kernel: ata5: failed to resume link (SControl FFFFFFFF)
Jun  3 19:05:53 Tower kernel: ata5: SATA link down (SStatus FFFFFFFF SControl FFFFFFFF)
Jun  3 19:05:53 Tower kernel: ata5.00: disabled
Jun  3 19:05:53 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 Sense Key : 0x5 [current] 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 ASC=0x21 ASCQ=0x4 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#6 CDB: opcode=0x28 28 00 16 08 d9 40 00 00 20 00
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 369678656
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 Sense Key : 0x5 [current] 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 ASC=0x21 ASCQ=0x4 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#7 CDB: opcode=0x28 28 00 04 6c b5 d8 00 00 08 00
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 74233304
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 0, rd 2, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 Sense Key : 0x5 [current] 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 ASC=0x21 ASCQ=0x4 
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#8 CDB: opcode=0x28 28 00 05 20 56 88 00 00 08 00
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 86005384

and then a little bit later:

Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: [sdg] tag#11 CDB: opcode=0x2a 2a 00 01 de 5e 08 00 02 00 00
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 31350280
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 3, rd 2, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: ata5: EH complete
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 86005384
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 3, rd 3, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: sd 4:0:0:0: rejecting I/O to offline device
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 75279920
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 4, rd 3, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: ata5.00: detaching (SCSI 4:0:0:0)
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 75281280
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 5, rd 3, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: print_req_error: I/O error, dev sdg, sector 27140976
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 6, rd 3, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 7, rd 3, flush 0, corrupt 0, gen 0
Jun  3 19:05:53 Tower kernel: BTRFS: error (device sdg1) in btrfs_commit_transaction:2267: errno=-5 IO failure (Error while writing out transaction)
Jun  3 19:05:53 Tower kernel: BTRFS info (device sdg1): forced readonly
Jun  3 19:05:53 Tower kernel: BTRFS warning (device sdg1): Skipping commit of aborted transaction.
Jun  3 19:05:53 Tower kernel: BTRFS: error (device sdg1) in cleanup_transaction:1860: errno=-5 IO failure
Jun  3 19:05:53 Tower kernel: BTRFS info (device sdg1): delayed_refs has NO entry
Jun  3 19:05:53 Tower kernel: loop: Write error at byte offset 14237696, length 4096.
Jun  3 19:05:53 Tower kernel: loop: Write error at byte offset 20107264, length 4096.
Jun  3 19:05:53 Tower kernel: loop: Write error at byte offset 2207744000, length 4096.
Jun  3 19:05:53 Tower kernel: BTRFS warning (device loop2): chunk 13631488 missing 1 devices, max tolerance is 0 for writeable mount
Jun  3 19:05:53 Tower kernel: BTRFS: error (device loop2) in write_all_supers:3716: errno=-5 IO failure (errors while submitting device barriers.)
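
For anyone digging through a syslog like the one above, here is a minimal Python sketch (not part of the original post) that pulls out two things: the latest BTRFS error counters per device, and the PCI address of any controller the kernel reported as unavailable. The function names `latest_btrfs_counters` and `controllers_reported_unavailable` are made up for illustration, and the regexes only assume the line shapes quoted in this thread.

```python
import re

# Matches kernel lines of the form quoted above, e.g.
#   BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 7, rd 3, flush 0, corrupt 0, gen 0
BTRFS_ERRS = re.compile(
    r"BTRFS error \(device (?P<dev>\w+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

# Matches lines such as: ahci 0000:02:00.0: AHCI controller unavailable!
AHCI_GONE = re.compile(r"ahci (\S+): AHCI controller unavailable!")


def latest_btrfs_counters(lines):
    """Return {device: {counter: value}} taken from the last matching line per device."""
    latest = {}
    for line in lines:
        match = BTRFS_ERRS.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


def controllers_reported_unavailable(lines):
    """Return the set of PCI addresses seen in 'AHCI controller unavailable!' lines."""
    found = set()
    for line in lines:
        match = AHCI_GONE.search(line)
        if match:
            found.add(match.group(1))
    return found


# Three lines copied from the syslog excerpts in this thread.
sample = [
    "Jun  3 19:05:39 Tower kernel: ahci 0000:02:00.0: AHCI controller unavailable!",
    "Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 5, rd 3, flush 0, corrupt 0, gen 0",
    "Jun  3 19:05:53 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 7, rd 3, flush 0, corrupt 0, gen 0",
]

if __name__ == "__main__":
    print(latest_btrfs_counters(sample))
    print(controllers_reported_unavailable(sample))
```

A controller address showing up in the second result points at the controller rather than the disk or cable as the thing to check, although the log alone does not prove it.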

 

 

I grabbed the syslog again, in an attempt to see what was causing the "unmounting loop":

Jun  3 20:06:47 Tower kernel: print_req_error: I/O error, dev loop2, sector 2969408
Jun  3 20:06:50 Tower emhttpd: Unmounting disks...
Jun  3 20:06:50 Tower emhttpd: shcmd (91679): umount /mnt/disk4
Jun  3 20:06:50 Tower root: umount: /mnt/disk4: target is busy.
Jun  3 20:06:50 Tower emhttpd: shcmd (91679): exit status: 32
Jun  3 20:06:50 Tower emhttpd: shcmd (91680): umount /mnt/cache
Jun  3 20:06:50 Tower root: umount: /mnt/cache: target is busy.
Jun  3 20:06:50 Tower emhttpd: shcmd (91680): exit status: 32
Jun  3 20:06:50 Tower emhttpd: Retry unmounting disk share(s)...
Jun  3 20:06:52 Tower kernel: btrfs_dev_stat_print_on_error: 110 callbacks suppressed
Jun  3 20:06:52 Tower kernel: BTRFS error (device sdg1): bdev /dev/sdg1 errs: wr 42, rd 38010, flush 0, corrupt 0, gen 0
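
The `target is busy` result from `umount` means something still has the filesystem in use. As a rough illustration only, this Python sketch walks `/proc` looking for processes whose working directory or open file descriptors point under a mount point, which is similar in spirit to what `lsof` or `fuser` report. The function name `processes_using` is hypothetical, it assumes the usual Linux `/proc/<pid>/cwd` and `/proc/<pid>/fd/` layout, and it will not see other causes such as nested mounts or loop devices backed by a file on the disk. With a controller that has dropped off the bus, the unmount may never complete even if nothing shows up here.

```python
import os


def processes_using(mount_point, proc_root="/proc"):
    """Return {pid: [paths]} for processes whose cwd or open fds are under mount_point."""
    base = mount_point.rstrip("/")
    prefix = base + "/"
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd")]
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            links += [os.path.join(fd_dir, fd) for fd in sorted(os.listdir(fd_dir))]
        except OSError:
            pass  # process exited, or we lack permission to list its fds
        for link in links:
            try:
                target = os.readlink(link)
            except OSError:
                continue  # link vanished or is unreadable
            if target == base or target.startswith(prefix):
                holders.setdefault(int(entry), []).append(target)
    return holders


if __name__ == "__main__":
    # Run as root to see every process; the mount points are the two from the log above.
    for mount in ("/mnt/disk4", "/mnt/cache"):
        for pid, paths in processes_using(mount).items():
            print(mount, pid, paths)
```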

 

I'd prefer a graceful shutdown rather than a hard restart. Anyone got any ideas how to unmount disk4 and my cache?

 

Kind Regards,

jaso

 

 

 

 

 

 

Edited by jaso
update title to mark as solved
Posted

I do not think you can achieve a graceful shutdown. You are getting hardware errors on some drives (quite likely cable or connection issues), so those drives will never finish unmounting.

Posted
1 minute ago, itimpi said:

I do not think you can achieve a graceful shutdown. You are getting hardware errors on some drives (quite likely cable or connection issues), so those drives will never finish unmounting.

Thanks itimpi.

Hard reboot time  :-(

Posted (edited)

Figured out what the problem was. Dead RAID card.

 

/mnt/disk6 and /mnt/cache were both being served by a generic 2x SATA card. It just up and died after 5 years of top-notch service.

 

I'll have to wait a few days for a new RAID card to arrive. In the meantime I've moved my cache SSD to another SATA port, and /mnt/disk6 is being emulated for now...
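
In case it helps someone spot this kind of shared point of failure sooner: on Linux the resolved sysfs path of a block device normally includes the PCI address of the controller it hangs off, so disks can be grouped by controller. The Python sketch below is an assumption-laden illustration, not something from the thread. The names `controller_of` and `group_by_controller` are hypothetical, and the example paths are invented to show the shape; only the address `0000:02:00.0` and the `sdg`/`sdh` device names come from the logs above.

```python
import os
import re

# PCI address in domain:bus:device.function form, e.g. 0000:02:00.0
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")


def controller_of(sysfs_path):
    """Return the last PCI address in a resolved sysfs path (closest to the disk), or None."""
    matches = PCI_ADDR.findall(sysfs_path)
    return matches[-1] if matches else None


def group_by_controller(disks, resolve=os.path.realpath):
    """Group disk names such as 'sdg' by the PCI address of their controller."""
    groups = {}
    for disk in disks:
        path = resolve(os.path.join("/sys/block", disk))
        groups.setdefault(controller_of(path), []).append(disk)
    return groups


# Hypothetical resolved paths, for illustration only; real ones come from /sys/block.
EXAMPLE_PATHS = {
    "/sys/block/sda": "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda",
    "/sys/block/sdg": "/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/ata5/host4/target4:0:0/4:0:0:0/block/sdg",
    "/sys/block/sdh": "/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0/ata6/host5/target5:0:0/5:0:0:0/block/sdh",
}

if __name__ == "__main__":
    # Swap resolve=EXAMPLE_PATHS.get for the default to query the real system.
    print(group_by_controller(["sda", "sdg", "sdh"], resolve=EXAMPLE_PATHS.get))
```

Two devices landing in the same group, as the cache SSD and disk 6 apparently did here, means one add-in card failing can take both out at once.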

 

Cheers,

jaso

Edited by jaso
typo
