August 23, 2023

Hello gurus, can someone point me in the right direction, or maybe help me solve this issue?

I had a 12TB drive (Drive 6) go disabled with a red X. I tried to re-enable it and get it back online but couldn't get it back to green. I believe the drive has read errors, so I replaced it with a 20TB WD Red that I purchased. Unfortunately, in my noobness and stupidity I did something while trying to re-enable the disabled 12TB, and UNRAID forced me to preclear and format the new drive, thus losing 12TB of data. I rebooted before running diagnostics, which was dumb of me, but honestly I was worried about and focused on the 12TB of TV shows I was about to lose. Yes, I know I did something wrong and Drive 6 should have rebuilt itself.

So, I was running a parity check, and after 12 hours the parity check paused and Drive 4 became disabled. Drives 1-6 are in the same disk shelf (an EMC KTN-STL3 15-bay, connected to my server by an SFF-8088 to SFF-8088 cable), so I thought maybe it was a hardware issue. I pulled the HBA, replaced the expander with a spare that I had, and replaced the SFF-8088 to SFF-8087 card and the cables. I moved the expander to a different PCIe slot and reattached the SFF-8087 cables in different positions. Long story short, I ran another parity check; it ran for roughly 20 hours, then paused, and Disk 4 went disabled again.

I would like to get my 12TB of shows rebuilt, but as a parity check usually takes 3 days to complete, I would like to resolve the issue before I completely corrupt 129TB of movies and TV shows. I'm assuming it's either the HBA or the expander, like the HBA is resetting itself all the time. Any suggestions? Is there a type or quality of cable I should buy?

tower-diagnostics-20230823-1527.zip

Edited August 24, 2023 by Haroldkidd: No need for 1 attachment
August 24, 2023

Aug 22 13:34:41 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
Aug 22 13:34:41 Tower kernel: mpt2sas_cm0: fault_state(0x7e23)!
Aug 22 13:34:41 Tower kernel: mpt2sas_cm0: sending diag reset !!

The HBA keeps crashing. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available.
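For anyone wanting to check how often the controller is faulting across a long parity check, here is a rough Python sketch that counts the `fault_state` and `sending diag reset` messages in a saved syslog. The regular expressions are based only on the three lines quoted above, so other message formats from the driver are not covered, and the function names and the file path in the usage note are just placeholders.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this thread;
# this is an assumption about the format, not a full list of driver messages.
FAULT_RE = re.compile(r"(mpt[23]sas_cm\d+): fault_state\((0x[0-9a-fA-F]+)\)")
RESET_RE = re.compile(r"(mpt[23]sas_cm\d+): sending diag reset")


def summarize_hba_faults(lines):
    """Count fault_state codes and diag resets per controller name."""
    faults = Counter()  # keyed by (controller, fault code)
    resets = Counter()  # keyed by controller
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults[(match.group(1), match.group(2))] += 1
            continue
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
    return faults, resets


def summarize_file(path):
    """Hypothetical helper: run the summary over a syslog file on disk."""
    with open(path, errors="replace") as handle:
        return summarize_hba_faults(handle)
```

Calling `summarize_file("syslog.txt")` (substitute wherever your syslog was saved) returns two counters. A reset count that keeps climbing during a parity check would be consistent with the HBA repeatedly crashing, though it does not say whether the card, the slot, or cooling is to blame.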
August 24, 2023
Author

1 hour ago, JorgeB said:
Aug 22 13:34:41 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
Aug 22 13:34:41 Tower kernel: mpt2sas_cm0: fault_state(0x7e23)!
Aug 22 13:34:41 Tower kernel: mpt2sas_cm0: sending diag reset !!
The HBA keeps crashing, make sure it's well seated and sufficiently cooled, you can also try a different PCIe slot if available.

Jorge,

Thanks, that's what I feared. I ordered an LSI 9211-8i to swap in. I ordered my current one from eBay, so I'm a little worried it may not be as new as advertised. I currently have the panel off my 3U case and a box fan blowing directly onto the HBA, and it still crashes. I'm going to try another PCIe slot and see if that works. It just sucks that a full parity check takes 3 days to complete and it usually crashes after a day and a half.

Appreciate it. Did you by chance look at the log dated 20230823? I ran that after the second crash, after moving the expander and SAS cables. I'm sure it's the HBA in that one also.