TBSCamCity

6.5.3 Upgrade Unmountable drives


Posted (edited)

I recently upgraded from 6.3.3 (which had been running stable for over a year) to 6.5.3. One drive was unmountable after the upgrade, so I ran xfs_repair. That didn't resolve the issue, so I rebuilt onto a new drive. About a day later I now have 4 unmountable drives and a parity drive offline. On top of that, 6 of my 14 drives now show SMART errors. I have 12 WD Red/White-label 8TB drives and 2 Seagate IronWolf Pro 8TB drives (parity). It seems highly unlikely that half my drives are actually dying all at once, and I have already tried changing the cables and the HBA, so I don't think it's hardware related. I also did not back up my 6.3.3 USB drive before updating, which I know I should have, and I don't have an option to downgrade back to 6.3.3 (the GUI only offers "downgrade to 6.5.3", which I'm already on). I have included the 3 log files and SMART data from diagnostics. Any help you could offer would really help me out; for now all my data is inaccessible, and I don't know what to do about this.
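For reference, the usual way to check an XFS array disk on Unraid is to start the array in maintenance mode and run xfs_repair against the md device (so parity stays in sync), first with the no-modify flag. This is a minimal sketch based on common Unraid usage, not taken from this thread; the `/dev/mdX` naming and the helper function are illustrative assumptions.

```shell
# Hypothetical helper: build the xfs_repair check command for a given
# Unraid array disk number. Assumes the array is started in maintenance
# mode and that array disks are exposed as /dev/md1, /dev/md2, ...
# (common Unraid convention; an assumption, not from this thread).
xfs_check_cmd() {
    local disk_number="$1"
    # -n = no-modify: report problems without writing to the filesystem
    echo "xfs_repair -n /dev/md${disk_number}"
}

xfs_check_cmd 1   # prints: xfs_repair -n /dev/md1
```

Dropping the `-n` flag would perform the actual repair once the dry run looks sane.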

 

Edited by TBSCamCity

Posted (edited)

If you only recently upgraded, what is the rationale for going with the 6.5.3 release, which is nearly a year old? The current stable release is 6.7.

 

BTW: if you just provide the diagnostics zip file, it gives all the information you posted as a single file that is much easier to work with.

 

 

Edited by itimpi


The problem is that very few people will be running 6.5.3, so support could be difficult. You should ideally upgrade to 6.6.7 (the previous stable release) or 6.7 (the current, recently released stable release), as whatever problem you are encountering on 6.5.3 may well have been fixed in one of the releases made since then.


Disk6 is failing. Other than that, there are a few CRC errors on other disks, which usually indicate a cable problem (though SMART for disk2 is missing). These errors on two disks could also suggest a cable/connection issue:

May 16 22:39:48 Main kernel: sd 1:0:2:0: attempting task abort! scmd(ffff882fffaea548)
May 16 22:39:48 Main kernel: sd 1:0:2:0: [sdd] tag#2 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e f8 40 00 00 04 00 00 00
May 16 22:39:48 Main kernel: scsi target1:0:2: handle(0x000c), sas_address(0x50030480005b0f48), phy(8)
May 16 22:39:48 Main kernel: scsi target1:0:2: enclosure_logical_id(0x50030480005b0f7f), slot(4)
May 16 22:39:51 Main kernel: sd 1:0:2:0: task abort: SUCCESS scmd(ffff882fffaea548)
May 16 22:39:51 Main kernel: sd 1:0:7:0: attempting task abort! scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: [sdi] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e fc 40 00 00 04 00 00 00
May 16 22:39:51 Main kernel: scsi target1:0:7: handle(0x0011), sas_address(0x50030480005b0f4f), phy(15)
May 16 22:39:51 Main kernel: scsi target1:0:7: enclosure_logical_id(0x50030480005b0f7f), slot(11)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_block, handle(0x0011)
May 16 22:39:51 Main kernel: sd 1:0:7:0: task abort: SUCCESS scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0011)

There are a few disks needing a filesystem check, ideally done after resolving these connection issues.
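The kernel lines above can be summarized with a quick one-liner to see whether the task aborts cluster on particular SCSI targets (and so on particular backplane slots or HBA ports). This is a sketch; it assumes the exact log format shown above, where the SCSI target ID follows `sd ` in the abort lines.

```shell
# Count "attempting task abort!" syslog lines per SCSI target, reading
# the syslog on stdin. Output: one "<target> <count>" pair per line.
abort_summary() {
    grep 'attempting task abort!' \
        | sed -n 's/.*sd \([0-9:]*\): attempting.*/\1/p' \
        | sort | uniq -c | awk '{print $2, $1}'
}
```

Feeding the snippet above through it (e.g. `abort_summary < /var/log/syslog`) would show one abort each on targets 1:0:2:0 (sdd) and 1:0:7:0 (sdi); aborts spread across several targets on the same enclosure tend to implicate the shared path rather than individual drives.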


Thanks, my first thought was a connection issue of some sort. I have tried multiple cables and HBAs with no success. I just upgraded to 6.7 and now every disk is unmountable and shows SMART errors. Perhaps my backplane is the issue (it's a Supermicro 846 chassis).
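The CRC errors mentioned above live in SMART attribute 199 (UDMA_CRC_Error_Count), and a climbing 199 count points at the link (cable, backplane slot, HBA port) rather than the drive itself. A quick way to pull just that counter from `smartctl -A` output is sketched below; the sample line is illustrative, not from this thread's diagnostics.

```shell
# Extract SMART attribute 199 (UDMA CRC error count) from "smartctl -A"
# output on stdin; the raw value is the last field of the attribute row.
crc_count() {
    awk '$1 == 199 { print $NF }'
}

# Illustrative sample attribute row (not from this thread's diagnostics):
sample='199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 12'
echo "$sample" | crc_count   # prints: 12
```

Running this against each drive (e.g. `smartctl -A /dev/sdd | crc_count`) and noting which slots the growing counts map to can help distinguish a bad backplane row from a bad HBA cable.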

13 minutes ago, TBSCamCity said:

Perhaps my backplane is the issue (it's a Supermicro 846 chassis)

Could be; it could also be the HBA, PSU, etc.

9 minutes ago, johnnie.black said:

Could be, could also be the HBA, PSU, etc,

Yeah, I have spare HBAs, PSUs, and cables from other Supermicro servers, so I've ruled all those out. I don't have a spare backplane for the 4U chassis to test, though. I also figured it was very odd timing for one of these to die right when I upgraded to 6.5.3, but I guess it could just be coincidence. I will try to find the culprit this weekend and check back if I can rule out the hardware.

