Posted May 23, 2023 Unraid 6.11.5 I keep having disks go into an error state during a parity check. It's ticking me off, and I'm trying to zero in on the culprit. This has happened 3 times: usually 10-15 hours into the parity check, it stops and the server reports a disk going into an error state. Drives 1 and 4 (20TB) were the ones reported. I may not have made my post clear, so I apologize for that and will try to clarify anything in follow-up questions. I plan on checking the hardware connections tomorrow. syslog.1.txt.zip tower-smart-20230523-0643.zip tower-diagnostics-20230523-0648.zip Edited May 23, 2023 by Haroldkidd Added Complete diagnostics
May 23, 2023 Author Hope that is the information you wanted and that it helps. I'm kind of unfamiliar with all of this. Thank you for your assistance.
May 23, 2023 Community Expert Solution
May 22 21:35:25 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
May 22 21:35:25 Tower kernel: mpt2sas_cm0: sending diag reset !!
May 22 21:35:26 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
The HBA keeps faulting and resetting. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available.
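For anyone who wants to check their own syslog for the same pattern, here is a minimal Python sketch that pulls out the `fault_state` lines shown above. It is only an illustration: the function names (`find_hba_faults`, `summarize`, `scan_file`) are made up for this example, and the regular expression assumes the syslog lines look exactly like the ones quoted in this post (timestamp, hostname, `kernel:`, then the `mpt2sas_cm0`/`mpt3sas_cm0` prefix). Other driver versions may log differently.

```python
import re
from collections import Counter

# Assumed line format, taken from the excerpt quoted in this thread:
#   May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
FAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ctrl>mpt[23]sas_cm\d+):\s+fault_state\((?P<code>0x[0-9a-fA-F]+)\)"
)


def find_hba_faults(lines):
    """Return (timestamp, controller, fault_code) for every fault_state line."""
    events = []
    for line in lines:
        match = FAULT_RE.match(line)
        if match:
            events.append(
                (match.group("stamp"), match.group("ctrl"), match.group("code"))
            )
    return events


def summarize(events):
    """Count how often each (controller, fault_code) pair appears."""
    return Counter((ctrl, code) for _, ctrl, code in events)


def scan_file(path):
    """Hypothetical helper: scan a syslog file on disk and return its fault events."""
    with open(path, errors="replace") as handle:
        return find_hba_faults(handle)
```

To use it, call something like `scan_file("syslog.txt")` on a syslog extracted from the diagnostics zip and compare the timestamps against when the parity check stopped. A fault that shows up many hours into each check would fit the cooling or seating explanation, but the script itself cannot tell you the cause.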
May 23, 2023 Author JorgeB, thanks, I'll look into that. I just expanded my disk storage and replaced the original Marvell controller with an LSI 9207-8i HBA and an SFF-8088 to SFF-8087 adapter. My case is a 3U rack, so it may be getting hot. I will check the HBA seating and connections. What's a good way of cooling it? Maybe it's time to upgrade: my Tower dates back to 2014, when I purchased it from Limetech while they were selling preconfigured servers. Also, if I do manage to get through a complete parity check, what do I do about the high number of sync errors? Edited May 23, 2023 by Haroldkidd
May 23, 2023 Community Expert
1 minute ago, Haroldkidd said: What's a good way of cooling it?
These controllers are made for server cases and require some airflow around them. If your case doesn't have good airflow in that zone, you can add a fan close to or on top of the card.
2 minutes ago, Haroldkidd said: Also, if I do manage to get through a complete parity check, what do I do about the high number of sync errors?
First step would be a correcting check.
May 23, 2023 Author Thank you, I figured that would be the solution. I just wish there was a way to see what was being corrected, since I just spent days upgrading quality and adding missing episodes and movies. Thank you for your reply. Edited May 23, 2023 by Haroldkidd
May 23, 2023 Author
4 hours ago, JorgeB said:
May 22 21:35:25 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
May 22 21:35:25 Tower kernel: mpt2sas_cm0: sending diag reset !!
May 22 21:35:26 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
The HBA keeps faulting and resetting. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available.
I checked all the cables and reseated both the HBA and the expander. I also left the panel off the tower and put a fan blowing on it, so we will see. I'm just curious why this only happens during a parity check and not when transferring new data or downloading to the share folders. Maybe it overheats because all the drives are spun up during a parity check.
May 23, 2023 Community Expert Activity on all disks for a long period will cause more stress on the HBA.
May 23, 2023 Author
44 minutes ago, JorgeB said: Activity on all disks for a long period will cause more stress on the HBA.
Makes me not want to do a parity check. Good lord, it takes 3 days to do a check. Thanks, I appreciate your help. Edited May 23, 2023 by Haroldkidd
June 24, 2023 Author So this happened again. I started a parity check before going to bed because I haven't done one since May 23, 2023, and I have added a lot more data since then. I woke up this morning with the parity check paused and drive 1 disabled. Ugh, this is frustrating: the HBA and expander are all new, and I have the case open with a 26" box fan constantly blowing on the card. Do I need to replace the cables or the HBA again? I'd hate to have to replace the cables, as I suck at cable management and it will be ugly. Last time I had to remove the HBA, reseat it, and disconnect and reconnect the cables before I could actually do a clean parity check. I will try that again. Could it be the Intel RES2SV240 24-Port Expander? It didn't come with the bracket to screw it to the case, and it wobbles back and forth if you physically move it. Thanks for any support. tower-diagnostics-20230624-0705.zip syslog.1.txt syslog.txt Edited June 24, 2023 by Haroldkidd Added info
June 25, 2023 Community Expert
20 hours ago, Haroldkidd said: the HBA and expander are all new
Do you mean it's new compared with the one from the previous diags? It's still logged as what looks like an HBA problem, though it could also be some issue between the board and the HBA.
June 25, 2023 Author JorgeB, this is the same expander and HBA from the last post. I was just explaining that they were newly purchased and installed back in May 2023. I was able to start a parity check, and it is currently sitting at 22% with 1 sync error. I had to pull the Intel RES2SV240 24-Port Expander out, reinsert it, and make sure all the data cables were secure. I think it's not sitting right in the PCI slot because there's no bracket to secure it, and any movement or vibration causes it to wobble, which may be the reason why it's resetting itself a lot. After this parity check I'm going to find an old bracket and see if that fixes the problem.