
[SOLVED] Consistent Read/Write Errors after moving to a backplane



Posted

I just upgraded my setup by moving everything from a tower into a Supermicro chassis. Before the upgrade, everything on my system was working perfectly. Besides the chassis, the only real changes are that I am now using the Supermicro PWS-1K28P-SQ dual power supplies and I've installed an LSI 9300-8i HBA to connect the backplanes (BPN-SAS3-846EL1/BPN-SAS3-826EL1), which hold all of my array drives. I'm running a dual parity setup with 6 array drives.

 

This is my first time working with a backplane setup, so I suspect my issue stems from the HBA or the backplane itself. However, this may also be a classic case of "I don't know what I don't know", which is why I would love to get your insights. It may get a little wordy, but I'll just describe everything I've done so far.

On first startup, everything went pretty smoothly. I made sure to turn off array auto-start before the migration. When I booted the new system, it initially told me all of my drives were missing, but I could see all of them, so after reassigning them to the appropriate slots I was able to start the array and run a full parity check with no problems. After a day, it told me that two of my drives had failed (Disks 2 and 3). I suspected that this was not the case, but they were the two smallest drives in my array, so I decided to replace them with larger ones anyway. As I replaced the drives, I reseated all of the power cables and the SFF-8644 cables to the HBA and the front backplane (I haven't checked the rear backplane yet, since it only holds my cache drive and I haven't experienced any issues with it). I was able to get through the full rebuild without any problems and the system ran well for another day, after which it told me that two other drives had gone bad (Disks 5 and 6).

 

At this point there have also been several warnings about read errors on multiple drives. It seemed unlikely to me that these drives were all actually going bad at the same time, so I did a little research on my setup and decided to double-check that the HBA is in IT mode. Sure enough, it is running in IT mode, although I'm not sure if it's on the most recent firmware. (I'll edit in the version once I get back to my system.) I did notice that the PCI slot in the SAS Configuration Utility was FF, with a message stating that it's an invalid PCI slot, so I moved the card to a different slot and it now shows an actual number. Hoping that the slot was the issue, I booted my system, stopped my array, set the "bad" drives to no device, started it, stopped it again, and started the rebuild with the same drives. The rebuild succeeded, although I was still getting warnings about read errors on Disk 3, which is one of my brand new drives. After that, the system worked fine for a day, and now it's telling me that Disks 4 and 5 are bad.

 

I have gotten a couple of S.M.A.R.T. error messages on several different drives, which seem to be related to the read errors, and I'm currently running extended tests on those drives.
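For anyone reading the S.M.A.R.T. reports alongside those tests, here is a rough sketch of one way to separate "cable/backplane" symptoms from "disk surface" symptoms. It assumes SATA drives and the attribute table printed by `smartctl -A`; the sample text is made up for illustration and is not taken from my diagnostics. Treating `UDMA_CRC_Error_Count` as a link indicator and reallocated/pending sectors as media indicators is a common rule of thumb, not a guarantee.

```python
# Attribute names as they appear in `smartctl -A` output for ATA drives.
# The link/media split below is a rule of thumb, not a diagnosis.
LINK_ATTRS = {"UDMA_CRC_Error_Count"}
MEDIA_ATTRS = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def classify(smart_text):
    """Return 'link', 'media', 'both', or 'clean' for one drive's attribute table."""
    link = media = False
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if not raw.isdigit() or int(raw) == 0:
            continue
        if name in LINK_ATTRS:
            link = True
        elif name in MEDIA_ATTRS:
            media = True
    if link and media:
        return "both"
    if link:
        return "link"
    if media:
        return "media"
    return "clean"


# Hypothetical sample, shaped like `smartctl -A` output.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""

print(classify(SAMPLE))
```

A drive that comes back as "link" with clean media counters would point more toward cabling, the backplane, or the HBA than toward the disk itself.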

 

I tried to search for everything I could on this subject and from what I've found it looks like I just need to start testing individual hardware components to see if I can find what's causing the issue. As I mentioned, I am pretty new to this type of setup so I figured I would put up my diagnostics and see if anyone can provide some ideas about things I may not be familiar with. If you need any more information from me please let me know. Any help is appreciated! Thank you!
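In case it helps with narrowing things down, this is a small sketch of how the kernel log could be tallied per device. It assumes the syslog contains lines with the usual kernel wording `I/O error, dev sdX, sector N`; the sample lines and the `min_devices` threshold are hypothetical. The idea is simply that errors spread across many devices behind the same controller suggest a shared component rather than several bad disks.

```python
import re
from collections import Counter

# Matches the device name in kernel messages such as
# "... I/O error, dev sdc, sector 123456 ..."
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector")


def errors_per_device(log_text):
    """Count I/O error lines per block device name."""
    return Counter(IO_ERROR.findall(log_text))


def looks_shared(counts, min_devices=3):
    """Heuristic: errors on several devices hint at a shared cause (HBA, cable, backplane)."""
    return len(counts) >= min_devices


# Hypothetical log excerpt for illustration only.
SAMPLE_LOG = """\
May 26 07:01:12 babel kernel: print_req_error: I/O error, dev sdc, sector 1953514584
May 26 07:01:13 babel kernel: print_req_error: I/O error, dev sdc, sector 1953514592
May 26 07:03:40 babel kernel: print_req_error: I/O error, dev sdf, sector 8822016
May 26 07:05:02 babel kernel: print_req_error: I/O error, dev sdg, sector 77120
"""

counts = errors_per_device(SAMPLE_LOG)
print(dict(counts), looks_shared(counts))
```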

babel-diagnostics-20200526-0818.zip

Posted

Note that because of the old firmware you might need to do a new config, since all disk identifier strings will likely change. This would be easier to do with all disks enabled, if you can get them like that.
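As a rough way to check which firmware the HBA is actually running, the version can be pulled out of the syslog line the mpt3sas driver prints at boot. This is only a sketch: the sample line and the `WANTED` version are placeholders, so compare against whatever the vendor currently lists for your card.

```python
import re

# The mpt3sas driver logs a line containing "FWVersion(aa.bb.cc.dd)".
FW_LINE = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")


def firmware_version(log_text):
    """Return the first firmware version found as a tuple of ints, or None."""
    match = FW_LINE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older(found, wanted):
    """Tuples compare field by field, so (12, 0, 2, 0) < (16, 0, 10, 0)."""
    return found < wanted


# Hypothetical log line and target version, for illustration only.
SAMPLE_LINE = (
    "mpt3sas_cm0: LSISAS3008: FWVersion(12.00.02.00), "
    "ChipRevision(0x02), BiosVersion(08.29.01.00)"
)
WANTED = (16, 0, 10, 0)  # placeholder, not a recommendation from this thread

found = firmware_version(SAMPLE_LINE)
print(found, "needs update" if is_older(found, WANTED) else "up to date")
```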

Posted

Oh! Good to know. I was informed when I got the card that it should be on the latest firmware. I will get this updated and report back to see if the issue is resolved. Thank you for the quick response!

  • 2 weeks later...
Posted

Update: A week has gone by and that seemed to do the trick. It turns out one of the drives really is giving me read errors, which didn't help. I'm in the process of replacing that drive now, but everything else seems to be working perfectly! Thank you johnnie.black!

  • JorgeB changed the title to [SOLVED] Consistent Read/Write Errors after moving to a backplane
