wirenut, posted May 1, 2018 (edited): Upgraded a few days back to 6.5.1. The update went fine and the server was working correctly. Expanded the array 2 days ago with an additional 4 TB drive, no issues detected. The monthly parity check started last night, and this morning I see email notifications of one disk disabled and 8 with read errors. The parity check is still running but sync errors just keep increasing: 391748176 and counting. Something very strange and wrong here. Should I stop the parity check at this point? What steps can I take to troubleshoot? Attachment: tower-diagnostics-20180501-0552.zip (Edited August 27, 2018 by wirenut: problem solved)
itimpi, posted May 1, 2018: No point in continuing with that many disks reporting errors! I would suggest powering down, checking that the controller is firmly seated and that the power and SATA cabling is OK, then powering up and trying again.
JorgeB, posted May 1, 2018: One of your SASLP controllers crashed, so there is no point in continuing. A reboot might fix the problem for now, until it happens again. It's a known issue with these controllers and you should replace them with LSI HBAs.
wirenut, posted May 1, 2018: Thank you for the help. OK, I'll look into replacing the controller. Been lucky to date I guess, as this hasn't happened before. After checking, I booted the server and it came back on with a notification that the array had turned good. The array has 0 disks with read errors. However, disk 1 is disabled. Do I replace it, rebuild it, or something else?
JorgeB, posted May 1, 2018: If SMART looks fine (the disk was offline, so you'll need to post new diags if you want us to check it) and no data was changed since the errors started, I would do a new config instead, then a parity check, preferably without those controllers. If you want to rebuild, then use a spare and keep the old disk intact in case something goes wrong. If SMART is bad it's a different story.
wirenut, posted May 1, 2018 (edited): Short SMART test completed without errors. Running the long test now. When done I will post with new diagnostics. I have one spare, so replace and rebuild is an option if needed. No data changed after the errors started. Thanks. (Edited May 1, 2018 by wirenut)
wirenut, posted May 1, 2018: That took a while longer than I remember. SMART test passed without error; attached the report along with new diagnostics. New config the way to go then? I've never tried this. I've gathered from what I've read that I just reassign all disks to their original assignments, confirm parity is valid, and start the array, then do a parity check? As changing out the controllers is not immediately an option, I suppose the same thing could happen, and I am aware of that. Attachments: tower-smart-20180501-1520.zip, tower-diagnostics-20180501-1524.zip
wirenut, posted May 1, 2018: Now that I'm home from work with more time to research, I think the rebuild with the spare disk is the best option for my current situation, as it appears the controller is the point of mistrust and my backups are not where they should be. If you have any links to steer me in the direction of LSI controllers that would work with my current board's PCIe 2.0 slots, I'd be grateful. Thanks for the help and advice.
John_M, posted May 2, 2018: Dell PERC H310 cards pulled from decommissioned servers are usually available on eBay for very reasonable prices. They need an x8 PCIe 2.0 slot and you'll need to re-flash the firmware.
JorgeB, posted May 2, 2018: 10 hours ago, wirenut said: "That took a while longer than I remember. SMART test passed without error; attached the report along with new diagnostics." Disk looks fine. 10 hours ago, wirenut said: "New config the way to go then? I've never tried this. I've gathered from what I've read that I just reassign all disks to their original assignments, confirm parity is valid, and start the array?" Correct. As for the controllers, any LSI with a SAS2008/2308/3008 chipset in IT mode will do, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., as well as clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.
gtroyp, posted July 10, 2018: I just added a 9211-8i to my system, and everything was great until I tried to do a drive replacement (didn't run a parity check first), and I got read errors like the OP. Troubleshooting tips?
JorgeB, posted July 10, 2018: 6 hours ago, gtroyp said: "Troubleshooting tips?" Start your own thread and post your diagnostics: Tools -> Diagnostics
gtroyp, posted July 12, 2018: On 7/10/2018 at 1:38 AM, johnnie.black said: "Start your own thread and post your diagnostics: Tools -> Diagnostics" Wasn't needed. It was a bad cable. Fixed THAT issue...
wirenut, posted August 27, 2018: Finally changed out the Marvell cards for a couple of LSI 9211-8i SAS cards flashed to IT mode and all seems well. It also cut my parity check times by almost half. This problem is solved. Thanks again for the help.