TBSCamCity Posted May 17, 2019

I recently upgraded from 6.3.3 (which I had been running stable for over a year) to 6.5.3. One drive was unmountable after the upgrade, so I ran xfs_repair. That didn't resolve the issue, so I rebuilt onto a new drive. About a day later I now have 4 unmountable drives and a parity drive offline. On top of that, 6 of my 14 drives now show SMART errors. I have 12 WD Red/White label 8 TB drives and 2 Seagate IronWolf Pro 8 TB drives (parity). It seems highly unlikely that half my drives are actually dying all at once, and I have already swapped cables and the HBA, so I don't think it's hardware related. I also did not back up my 6.3.3 USB drive before updating, which I know I should have. I don't have an option to downgrade back to 6.3.3 either (it just says downgrade to 6.5.3, which I'm already on). I have included the 3 log files and SMART data from diagnostics. Any help you could offer would really help me out. For now all my data is inaccessible, and I don't know what to do about this.
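For anyone following along: the usual sequence on Unraid for an unmountable XFS disk is to start the array in Maintenance mode and run xfs_repair against the md device, dry run first. A minimal sketch, with the device number as an example only; the helper just builds the command line so you can review it before running anything destructive:

```shell
# Hypothetical helper: print the xfs_repair command for Unraid array disk N.
# Default flag is -n (check only, writes nothing); pass another flag
# explicitly for the real repair. Assumes array disks appear as /dev/mdN
# when the array is started in Maintenance mode.
repair_cmd() {
  printf 'xfs_repair %s /dev/md%s\n' "${2:--n}" "$1"
}

repair_cmd 1       # dry run:      xfs_repair -n /dev/md1
repair_cmd 1 -v    # real repair:  xfs_repair -v /dev/md1
# Use -L (zero the log) only if xfs_repair refuses to run without it and
# you accept losing the last in-flight transactions.
```

Running the repair against the md device (rather than the raw sdX device) keeps parity in sync while the filesystem is modified.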
itimpi Posted May 17, 2019

If you only recently upgraded, what is the rationale for going with the 6.5.3 release, which is nearly a year old? The current stable release is 6.7. BTW: if you just provide the diagnostics zip file, it gives all the information you posted in a single file that is much easier to work with.
TBSCamCity Posted May 17, 2019

I upgraded to 6.5.3 because that was marked as the stable release in the GUI. Here is the full zip: main-diagnostics-20190516-2239.zip
itimpi Posted May 17, 2019

The problem is that very few people will be running 6.5.3, so support could be difficult. You should ideally upgrade to 6.6.7 (the previous stable release) or 6.7 (the current stable release), as whatever problem you are encountering on 6.5.3 may well have been fixed in one of the releases made since then.
JorgeB Posted May 17, 2019

Disk6 is failing. Other than that, there are a few CRC errors on other disks, which usually indicate a cable problem (though SMART for disk2 is missing). These errors on two disks could also suggest a cable/connection issue:

May 16 22:39:48 Main kernel: sd 1:0:2:0: attempting task abort! scmd(ffff882fffaea548)
May 16 22:39:48 Main kernel: sd 1:0:2:0: [sdd] tag#2 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e f8 40 00 00 04 00 00 00
May 16 22:39:48 Main kernel: scsi target1:0:2: handle(0x000c), sas_address(0x50030480005b0f48), phy(8)
May 16 22:39:48 Main kernel: scsi target1:0:2: enclosure_logical_id(0x50030480005b0f7f), slot(4)
May 16 22:39:51 Main kernel: sd 1:0:2:0: task abort: SUCCESS scmd(ffff882fffaea548)
May 16 22:39:51 Main kernel: sd 1:0:7:0: attempting task abort! scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: [sdi] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e fc 40 00 00 04 00 00 00
May 16 22:39:51 Main kernel: scsi target1:0:7: handle(0x0011), sas_address(0x50030480005b0f4f), phy(15)
May 16 22:39:51 Main kernel: scsi target1:0:7: enclosure_logical_id(0x50030480005b0f7f), slot(11)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_block, handle(0x0011)
May 16 22:39:51 Main kernel: sd 1:0:7:0: task abort: SUCCESS scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0011)

There are also a few disks needing a filesystem check, ideally done after resolving these connection issues.
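The CRC errors mentioned above show up as SMART attribute 199 (UDMA_CRC_Error_Count), which increments on bad transfers over the cable or backplane rather than on media failures. A hedged sketch of checking it with smartmontools; the awk filter is just an illustration of pulling the raw value out of `smartctl -A` output:

```shell
# Hypothetical helper: read the raw value of SMART attribute 199
# (UDMA_CRC_Error_Count) from `smartctl -A` output on stdin. A count
# that keeps climbing across reboots usually means a cable/backplane
# problem, not a dying disk.
crc_count() {
  awk '$1 == "199" { print $NF }'
}

# Typical usage on a live system (run as root, device name is an example):
#   smartctl -A /dev/sdd | crc_count
```

Since CRC counts never reset, it's the trend that matters: note the value, reseat or replace the cabling, and see whether it rises again.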
TBSCamCity Posted May 17, 2019

Thanks, my first thought was a connection issue of some sort. I have tried multiple cables and HBAs with no success. I just upgraded to 6.7 and now every disk is unmountable and shows SMART errors. Perhaps my backplane is the issue (it's a Supermicro 846 chassis).
JorgeB Posted May 17, 2019

13 minutes ago, TBSCamCity said: Perhaps my backplane is the issue (it's a Supermicro 846 chassis)

Could be; it could also be the HBA, PSU, etc.
TBSCamCity Posted May 17, 2019

9 minutes ago, johnnie.black said: Could be, could also be the HBA, PSU, etc.

Yeah, I have spare HBAs, PSUs, and cables from other Supermicro servers, so I ruled all those out. I don't have a spare backplane for the 4U chassis to test, though. I also figured it was very odd timing for one of these to die right when I upgraded to 6.5.3, but I guess it could just be coincidence! I will try to find the culprit this weekend and check back if I can rule out the hardware.