TBSCamCity Posted May 17, 2019

I recently upgraded from 6.3.3 (which I had been running stable for over a year) to 6.5.3. One drive was unmountable after the upgrade, so I ran xfs_repair. That didn't resolve the issue, so I rebuilt onto a new drive. About a day later I now have 4 unmountable drives and a parity drive offline. On top of that, 6 of my 14 drives now show SMART errors. I have 12 WD Red/White label 8 TB drives and 2 Seagate IronWolf Pro 8 TB drives (parity). It seems highly unlikely that half my drives are actually dying all at once, and I have already swapped cables and the HBA, so I don't think it's hardware related. I also did not back up my 6.3.3 USB drive before updating, which I know I should have. I don't have an option to downgrade back to 6.3.3 either (it just says downgrade to 6.5.3, which I'm already on). I have included the 3 log files and SMART data from diagnostics. Any help you could offer would really help me out. For now all my data is inaccessible, and I don't know what to do about this.
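For anyone following along: the usual sequence on Unraid for an unmountable XFS disk is to start the array in Maintenance mode and run xfs_repair against the md device, dry run first. A minimal sketch, with the device number as an example only; the helper just builds the command line so you can review it before running anything destructive:

```shell
# Hypothetical helper: print the xfs_repair command for Unraid array disk N.
# Default flag is -n (check only, writes nothing); pass another flag
# explicitly for the real repair. Assumes array disks appear as /dev/mdN
# when the array is started in Maintenance mode.
repair_cmd() {
  printf 'xfs_repair %s /dev/md%s\n' "${2:--n}" "$1"
}

repair_cmd 1       # dry run:      xfs_repair -n /dev/md1
repair_cmd 1 -v    # real repair:  xfs_repair -v /dev/md1
# Use -L (zero the log) only if xfs_repair refuses to run without it and
# you accept losing the last in-flight transactions.
```

Running the repair against the md device (rather than the raw sdX device) keeps parity in sync while the filesystem is modified.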
itimpi Posted May 17, 2019

If you only recently upgraded, what is the rationale for going with the 6.5.3 release, which is nearly a year old? The current stable release is 6.7. BTW: if you just provide the diagnostics zip file, it gives all the information you posted in a single file that is much easier to work with.
TBSCamCity Posted May 17, 2019

I upgraded to 6.5.3 because that was marked as the stable release in the GUI. Here is the full zip: main-diagnostics-20190516-2239.zip
itimpi Posted May 17, 2019

The problem is that very few people will be running 6.5.3, so support could be difficult. You should ideally upgrade to 6.6.7 (the previous stable release) or 6.7 (the current stable release), as whatever problem you are encountering on 6.5.3 may well have been fixed in one of the releases made since then.
JorgeB Posted May 17, 2019

Disk6 is failing. Other than that, there are a few CRC errors on other disks, which usually indicate a cable problem (though SMART for disk2 is missing). These errors on two disks could also suggest a cable/connection issue:

May 16 22:39:48 Main kernel: sd 1:0:2:0: attempting task abort! scmd(ffff882fffaea548)
May 16 22:39:48 Main kernel: sd 1:0:2:0: [sdd] tag#2 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e f8 40 00 00 04 00 00 00
May 16 22:39:48 Main kernel: scsi target1:0:2: handle(0x000c), sas_address(0x50030480005b0f48), phy(8)
May 16 22:39:48 Main kernel: scsi target1:0:2: enclosure_logical_id(0x50030480005b0f7f), slot(4)
May 16 22:39:51 Main kernel: sd 1:0:2:0: task abort: SUCCESS scmd(ffff882fffaea548)
May 16 22:39:51 Main kernel: sd 1:0:7:0: attempting task abort! scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: [sdi] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 00 0e fc 40 00 00 04 00 00 00
May 16 22:39:51 Main kernel: scsi target1:0:7: handle(0x0011), sas_address(0x50030480005b0f4f), phy(15)
May 16 22:39:51 Main kernel: scsi target1:0:7: enclosure_logical_id(0x50030480005b0f7f), slot(11)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_block, handle(0x0011)
May 16 22:39:51 Main kernel: sd 1:0:7:0: task abort: SUCCESS scmd(ffff882fffaea948)
May 16 22:39:51 Main kernel: sd 1:0:7:0: device_unblock and setting to running, handle(0x0011)

There are also a few disks needing a filesystem check, ideally done after resolving these connection issues.
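The CRC errors mentioned above show up as SMART attribute 199 (UDMA_CRC_Error_Count), which increments on bad transfers over the cable or backplane rather than on media failures. A hedged sketch of checking it with smartmontools; the awk filter is just an illustration of pulling the raw value out of `smartctl -A` output:

```shell
# Hypothetical helper: read the raw value of SMART attribute 199
# (UDMA_CRC_Error_Count) from `smartctl -A` output on stdin. A count
# that keeps climbing across reboots usually means a cable/backplane
# problem, not a dying disk.
crc_count() {
  awk '$1 == "199" { print $NF }'
}

# Typical usage on a live system (run as root, device name is an example):
#   smartctl -A /dev/sdd | crc_count
```

Since CRC counts never reset, it's the trend that matters: note the value, reseat or replace the cabling, and see whether it rises again.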
TBSCamCity Posted May 17, 2019

Thanks, my first thought was a connection issue of some sort. I have tried multiple cables and HBAs with no success. I just upgraded to 6.7 and now every disk is unmountable and shows SMART errors. Perhaps my backplane is the issue (it's a Supermicro 846 chassis).
JorgeB Posted May 17, 2019

13 minutes ago, TBSCamCity said: Perhaps my backplane is the issue (it's a Supermicro 846 chassis)

Could be; it could also be the HBA, PSU, etc.
TBSCamCity Posted May 17, 2019

9 minutes ago, johnnie.black said: Could be, could also be the HBA, PSU, etc.

Yeah, I have spare HBAs, PSUs, and cables from other Supermicro servers, so I ruled all those out. I don't have a spare backplane for the 4U chassis to test, though. I also figured it was very odd timing for one of these to die right when I upgraded to 6.5.3, but I guess it could just be coincidence! I will try to find the culprit this weekend and check back if I can rule out the hardware.