
3rd time having cache drive hose up!


NeoMatrixJR


Ok, I've run to the end of my last extension.  Oddly enough it seems my cache always dies just past my trial license...requiring a reboot. :(

Debating if I should purchase unRAID now or move on because of this issue.  It causes my dockers and VMs to die and I have a fair few of them.

 

The really odd thing in the logs is the BTRFS errors...and I'm not using BTRFS.  All drives (except boot) are XFS. (boot is vfat)

 

Diagnostics are attached.  My plan, over the weekend, was to move to a new USB stick, buy a license, and then upgrade to 6.4.  Now I'm not sure what I'm going to do.  There's a good chance this might just fix again after a reboot or two (last time one reboot ended up with all my shares missing...then coming back on reboot again??? Dunno what happened there...) but I can't keep doing this.

 

Hardware is a Dell R710 with dual hex-core Xeon CPUs & 24GB RAM.  Running a PERC H310 in IT mode as the HBA.  Array is partially populated by various SATA disks...whatever I could scrounge up that worked, with a shiny new 4TB drive for parity and another 4TB + others for data.  Cache is an Intel 180GB SSD in an optical tray in the server for now, until I can get around the limited power hookups in the R710.

theconstruct-diagnostics-20180119-1026.zip

Link to comment

These are hardware errors. The SSD was disabled and the filesystem shut down in the end. Possibly a bad cable, but there's no SMART report since the drive dropped offline:

 

Jan 19 00:59:24 THECONSTRUCT kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jan 19 00:59:24 THECONSTRUCT kernel: ata1.00: failed command: FLUSH CACHE EXT
Jan 19 00:59:24 THECONSTRUCT kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Jan 19 00:59:24 THECONSTRUCT kernel:         res 40/00:01:00:00:00/04:00:00:00:00/e0 Emask 0x4 (timeout)
Jan 19 00:59:24 THECONSTRUCT kernel: ata1.00: status: { DRDY }
Jan 19 00:59:24 THECONSTRUCT kernel: ata1: hard resetting link
Jan 19 00:59:29 THECONSTRUCT kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 19 00:59:34 THECONSTRUCT kernel: ata1: SRST failed (errno=-16)
Jan 19 00:59:34 THECONSTRUCT kernel: ata1: hard resetting link
Jan 19 00:59:39 THECONSTRUCT kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 19 00:59:44 THECONSTRUCT kernel: ata1: SRST failed (errno=-16)
Jan 19 00:59:44 THECONSTRUCT kernel: ata1: hard resetting link
Jan 19 00:59:49 THECONSTRUCT kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 19 01:00:19 THECONSTRUCT kernel: ata1: SRST failed (errno=-16)
Jan 19 01:00:19 THECONSTRUCT kernel: ata1: limiting SATA link speed to 1.5 Gbps
Jan 19 01:00:19 THECONSTRUCT kernel: ata1: hard resetting link
Jan 19 01:00:24 THECONSTRUCT kernel: ata1: SRST failed (errno=-16)
Jan 19 01:00:24 THECONSTRUCT kernel: ata1: reset failed, giving up
Jan 19 01:00:24 THECONSTRUCT kernel: ata1.00: disabled
Jan 19 01:00:24 THECONSTRUCT kernel: ata1: EH complete
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 175860562
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x35 35 00 00 00 00 00 00 00 00 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 175860576
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 175860584
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 175860619
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 175860622
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 05 dd d6 80 00 00 20 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 98424448
Jan 19 01:00:24 THECONSTRUCT kernel: XFS (sdb1): metadata I/O error: block 0xa7b6b12 ("xlog_iodone") error 5 numblks 64
Jan 19 01:00:24 THECONSTRUCT shfs/user: err: shfs_ftruncate: ftruncate: (5) Input/output error
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 10 44 64 98 00 00 08 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 272917656
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 10 77 e5 a8 00 00 88 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 276293032
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536621, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536622, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536623, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536624, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 12 3c 9c c0 00 00 40 00
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536625, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 305962176
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536626, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 38245264, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536627, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536628, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: Buffer I/O error on dev sdb1, logical block 34536629, lost async page write
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 10 44 64 98 00 00 08 00
Jan 19 01:00:24 THECONSTRUCT kernel: blk_update_request: I/O error, dev sdb, sector 272917656
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 05 ff 92 d0 00 00 08 00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 05 ff 92 f0 00 00 08 00
Jan 19 01:00:24 THECONSTRUCT kernel: XFS (sdb1): xfs_do_force_shutdown(0x2) called from line 1200 of file fs/xfs/xfs_log.c.  Return address = 0xffffffff812b4a91
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 19 01:00:24 THECONSTRUCT kernel: sd 2:0:0:0: [sdb] tag#0 CDB: opcode=0x2a 2a 00 06 0f 5a 58 00 00 08 00
Jan 19 01:00:24 THECONSTRUCT kernel: XFS (sdb1): Log I/O Error Detected.  Shutting down filesystem
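For anyone reading along, the "CDB:" lines in the log above can be decoded by hand to see which commands were failing. Here is a minimal Python sketch, assuming the standard 10-byte SCSI command layout (opcode in byte 0, big-endian LBA in bytes 2-5, block count in bytes 7-8). The helper name `decode_cdb10` is made up for illustration, and only the three opcodes that appear in this log are mapped.

```python
# Opcode names from the SCSI command set; only the ones seen in this log.
OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x35: "SYNCHRONIZE CACHE(10)",
}


def decode_cdb10(hex_bytes: str) -> dict:
    """Decode a 10-byte CDB given as space-separated hex (hypothetical helper)."""
    raw = bytes.fromhex(hex_bytes.replace(" ", ""))
    if len(raw) != 10:
        raise ValueError(f"expected 10 bytes, got {len(raw)}")
    return {
        "opcode": OPCODES.get(raw[0], f"unknown (0x{raw[0]:02x})"),
        "lba": int.from_bytes(raw[2:6], "big"),     # starting sector
        "blocks": int.from_bytes(raw[7:9], "big"),  # transfer length
    }


if __name__ == "__main__":
    # CDB strings copied from the syslog excerpt above.
    for cdb in (
        "35 00 00 00 00 00 00 00 00 00",
        "28 00 05 dd d6 80 00 00 20 00",
        "2a 00 10 77 e5 a8 00 00 88 00",
    ):
        print(cdb, "->", decode_cdb10(cdb))
```

The decoded LBAs line up with the "I/O error, dev sdb, sector ..." lines that follow each CDB (for example, 0x05ddd680 is sector 98424448), so cache flushes, reads, and writes were all failing once the device was disabled, which fits a drive that dropped off the bus rather than a few bad sectors.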

 

Link to comment

Why haven't you let us help you before now? I see another post you made about this on another thread, and it was replied to, but you didn't follow up.

 

The btrfs being referred to in your syslog is the docker.img. It is a virtual disk that contains your dockers. Most likely you keep corrupting it because you have misconfigured dockers filling it up, or possibly it is just too small. From your diagnostics it looks like you have only given it 1GB to work with, and you say you have a lot of dockers.
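If you want to keep an eye on how full the docker image is getting, something like the sketch below could help. It assumes docker.img is loop-mounted at /var/lib/docker (check that on your own system), and the 80% warning threshold and the function names are arbitrary choices for illustration, not anything from unRAID itself.

```python
import shutil


def usage_percent(path: str) -> float:
    """Return how full the filesystem containing `path` is, in percent."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total if total else 0.0


def check_mount(path: str = "/var/lib/docker", warn_at: float = 80.0) -> str:
    """Build a one-line status message; path and threshold are assumptions."""
    pct = usage_percent(path)
    state = "WARNING" if pct >= warn_at else "ok"
    return f"{state}: {path} is {pct:.1f}% full"
```

Calling `print(check_mount())` from a scheduled script would give a quick heads-up before the image fills completely. Note that this reports what the mounted filesystem says, and btrfs free-space figures can be approximate.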

 

Have you read the docker FAQ?

 

Link to comment

Thanks!  Sorry I didn't follow up on the last one...most likely it's because I didn't realize that topics you start aren't auto-subscribed, so I lost track of it after I got things working.  I caught that this time around and will watch for it in the future.

1.) Hardware issue... I've got an idea to help with this, but I'm running a Dell R710 with proprietary cabling.  I'm working on a solution for this though.  Is there any way to get the system to re-check/reinitialize the drive while running?  A reboot tends to fix this and it happens rarely (so oddly inconsistent for a hardware issue IMHO...), but I wish I didn't have to shut down to fix it.  Granted, I'm now out of trial extensions, so I have to buy a license anyway and reboots won't matter so much.

2.) docker FAQ - Hadn't seen that, but will dig in ASAP.

Link to comment

4th, 5th, and 6th time!  All over the weekend.  Definitely a hardware issue, and not the drive.  I plugged it into my PC via USB and everything is fine on the drive.  My options were either to build a really janky power cable to keep my 2.5" SSDs and try to make that work, or to find a better solution.  Finding out that the R710's onboard SATA is ATA only, not AHCI, sealed the deal.  I bought a PCIe -> 2x M.2 + 2x SATA board, 2x 256GB SSDs, and a 2x eSATA -> SATA bracket (for my external Blu-ray drive).  Going to fix this the right way and get redundancy set up for cache.

Link to comment

