Haroldkidd Posted May 23, 2023 (edited)
Unraid 6.11.5
I keep having disks go into an error state during a parity check. It's ticking me off, and I'm trying to zero in on the culprit. This has happened 3 times during a parity check: usually 10-15 hours in, the check stops and the server reports a disk going into an error state. Drives 1 and 4 (20TB) were the ones reported. I'm sure my post may not be entirely clear; I apologize for that and will try to clarify anything in follow-up questions. I plan on checking the hardware connections tomorrow.
Attachments: syslog.1.txt.zip, tower-smart-20230523-0643.zip, tower-diagnostics-20230523-0648.zip
Edited May 23, 2023 by Haroldkidd: added complete diagnostics
JorgeB Posted May 23, 2023
Please post the complete diagnostics.
Haroldkidd (Author) Posted May 23, 2023
I hope that is the information you wanted and that it helps; I'm kind of unfamiliar with all of this. Thank you for your assistance.
JorgeB Posted May 23, 2023 (Solution)
May 22 21:35:25 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
May 22 21:35:25 Tower kernel: mpt2sas_cm0: sending diag reset !!
May 22 21:35:26 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
The HBA keeps faulting and resetting. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available.
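To see how often the HBA faults over the course of a saved syslog, lines like the ones quoted above can be counted with a short script. This is a minimal sketch in Python, not an Unraid tool: the names `find_hba_faults` and `summarize` are made up for illustration, and the regular expression assumes the syslog line format shown in this thread.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this thread:
#   May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
FAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:\s+"
    r"(?P<ctrl>mpt[23]sas_cm\d+):\s+fault_state\((?P<code>0x[0-9a-fA-F]+)\)"
)


def find_hba_faults(lines):
    """Return a (timestamp, controller, fault code) tuple per fault_state line."""
    faults = []
    for line in lines:
        match = FAULT_RE.match(line.strip())
        if match:
            faults.append(
                (match.group("stamp"), match.group("ctrl"), match.group("code"))
            )
    return faults


def summarize(faults):
    """Count faults per (controller, fault code) pair."""
    return Counter((ctrl, code) for _, ctrl, code in faults)


# The four lines quoted in the post above; only one is a fault_state line.
SAMPLE = [
    "May 22 21:35:25 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready",
    "May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!",
    "May 22 21:35:25 Tower kernel: mpt2sas_cm0: sending diag reset !!",
    "May 22 21:35:26 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS",
]

for (ctrl, code), count in summarize(find_hba_faults(SAMPLE)).items():
    print(f"{ctrl} fault {code}: {count} time(s)")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `find_hba_faults(open("syslog.txt", errors="replace"))`; the file name is only an example. Many faults clustered inside the parity-check window would be consistent with the heat or seating explanation, though the script itself cannot tell the two apart.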
Haroldkidd (Author) Posted May 23, 2023 (edited)
JorgeB, thanks, I'll look into that. I just expanded my disk storage and replaced the original Marvell controller with an LSI 9207-8i HBA and an SFF-8088 to SFF-8087 adapter. My case is a 3U rack, so it may be getting hot. I will check the HBA seating and connections. What's a good way of cooling it? Maybe it's time to upgrade: my Tower dates back to 2014, when I purchased it from Limetech while they were selling preconfigured servers. Also, if I do manage to get through a complete parity check, what do I do about the high number of sync errors?
Edited May 23, 2023 by Haroldkidd
JorgeB Posted May 23, 2023
1 minute ago, Haroldkidd said: "What's a good way of cooling it?"
These controllers are made for server cases and require some airflow around them. If your case doesn't have good airflow in that zone, you can add a fan close to it or on top of it.
2 minutes ago, Haroldkidd said: "Also, if I do manage to get through a complete parity check, what do I do about the high number of sync errors?"
The first step would be a correcting check.
Haroldkidd (Author) Posted May 23, 2023 (edited)
Thank you, I figured that would be the solution. I just wish there was a way to see what was being corrected. I just spent days updating quality and adding missing episodes and movies. Thank you for your reply.
Edited May 23, 2023 by Haroldkidd
Haroldkidd (Author) Posted May 23, 2023
4 hours ago, JorgeB said:
May 22 21:35:25 Tower kernel: mpt2sas_cm0 fault info from func: mpt3sas_base_make_ioc_ready
May 22 21:35:25 Tower kernel: mpt2sas_cm0: fault_state(0x7e21)!
May 22 21:35:25 Tower kernel: mpt2sas_cm0: sending diag reset !!
May 22 21:35:26 Tower kernel: mpt2sas_cm0: diag reset: SUCCESS
"The HBA keeps faulting and resetting. Make sure it's well seated and sufficiently cooled; you can also try a different PCIe slot if available."
I checked all the cables and reset both the HBA and the expander. I left the panel off the tower and put a fan blowing on it, so we will see. I'm just curious why this only happens when I do a parity check and not when transferring new data or downloading to the share folders. Maybe it overheats because all the drives are spun up during a parity check.
JorgeB Posted May 23, 2023
Activity on all disks for a long period will cause more stress on the HBA.
Haroldkidd (Author) Posted May 23, 2023 (edited)
44 minutes ago, JorgeB said: "Activity on all disks for a long period will cause more stress on the HBA."
Makes me not want to do a parity check. Good lord, it takes 3 days to do a check. Thanks, I appreciate your help.
Edited May 23, 2023 by Haroldkidd
Haroldkidd (Author) Posted June 24, 2023 (edited)
So this happened again. I started a parity check before going to bed because I haven't done one since May 23, 2023, and I have added a lot more data since then. I woke up this morning with the parity check paused and drive 1 disabled. Ugh, this is frustrating: the HBA and expander are all new, and I have the case open with a 26" box fan constantly blowing on the card. Do I need to replace the wires or the HBA again? I'd hate to have to replace the wires, as I suck at cable management and it will be ugly. Thanks for any support.
Last time I had to remove the HBA and reset it, and disconnect and reconnect the cables, before I could actually do a clean parity check. I will try to do that again. Could it be the Intel RES2SV240 24-Port Expander? It didn't come with the bracket to screw it to the case, and it wobbles back and forth if you physically move it.
Attachments: tower-diagnostics-20230624-0705.zip, syslog.1.txt, syslog.txt
Edited June 24, 2023 by Haroldkidd: added info
JorgeB Posted June 25, 2023
20 hours ago, Haroldkidd said: "the HBA and expander are all new"
Do you mean it's new vs. the one from the previous diags? It's still logged as what looks like an HBA problem; it could also be some issue between the board and the HBA.
Haroldkidd (Author) Posted June 25, 2023
JorgeB, this is the same expander and HBA from the last post; I was just explaining that they were newly purchased and installed back in May 2023. I was able to start a parity check, and it is currently sitting at 22% with 1 sync error. I had to pull the Intel RES2SV240 24-Port Expander out, reinsert it, and make sure all the data cables were secure. I think it's not sitting right in the PCI slot because there's no bracket to secure it down, and any movement or vibration causes it to wobble, which may be the reason why it's resetting itself a lot. After this parity check I'm going to find an old bracket and see if that fixes the problem.
JorgeB Posted June 25, 2023
Or try a different PCIe slot if possible.