ridley Posted April 21, 2017 I have had a few problems recently with the server failing, i.e. I cannot access it from the network but the console is responsive. However, a reboot or shutdown will not work: it says the system is going into shutdown but never completes. I have been trying to diagnose this, but it happened again today and I had to hard reset, after which the parity check started. However, 7 drives are now showing errors and the check has stopped. I have been running "Fix Common Problems" in troubleshooting mode and I attach the diagnostics. Could someone please advise what my next step is, without losing everything? tower-diagnostics-20170421-1808.zip
ridley Posted April 22, 2017 Bump. Any ideas? After reseating the cables, I restarted the server and started a read check. It is showing millions of errors on 7 drives (maybe 8 if parity is included). The number of errors on each drive is nearly identical, so I cannot see it being all of the drives. Could it be the RAID card?
JorgeB Posted April 22, 2017 System unresponsive: the most common fix is converting all reiserfs disks to xfs. The disk errors appear to be SASLP related; reboot, then grab and post new diags.
ridley Posted April 22, 2017 Thanks for that. I have two SASLPs in the system; could that be a problem? New diagnostics attached. tower-diagnostics-20170422-1323.zip
JorgeB Posted April 22, 2017 All disks but one look fine. Disk5 shows a single pending sector, possibly a false positive and unrelated to the latest errors, but you should run an extended SMART test to confirm.
Device Model: WDC WD20EARS-00MVWB0 Serial Number: WD-WCAZA2084939
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1
Parity getting disabled and the multiple disk errors were the result of one of the SASLPs crashing (the one with 8 disks connected). This is a somewhat common issue (with both the SASLP and the SAS2LP), although it only affects a small number of users. Some things that can help with that are disabling VT-d if not needed, updating the board BIOS, or using the controller in a different slot if available. You can cancel the read check you're doing and might as well start a parity sync.
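For readers who want to check the pending-sector attribute programmatically, here is a minimal Python sketch. It assumes the attribute row has the same ten-column layout as the row quoted in the post above; the function and class names are hypothetical and only for illustration. The extended test itself is normally started with `smartctl -t long` against the affected device.

```python
from typing import NamedTuple


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int
    worst: int
    threshold: int
    raw_value: int


def parse_smart_attribute(line: str) -> SmartAttribute:
    """Parse one attribute row shaped like the one quoted in the post.

    Assumed column order (taken from the quoted row): ID, name, flag,
    value, worst, threshold, type, updated, when_failed, raw value.
    """
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(fields)}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        raw_value=int(fields[9]),
    )


def has_pending_sectors(attr: SmartAttribute) -> bool:
    # Attribute 197 counts sectors waiting to be remapped; any non-zero raw
    # value is worth confirming with an extended self-test.
    return attr.attr_id == 197 and attr.raw_value > 0


# The row below is copied from the post; nothing else is measured data.
row = "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1"
attr = parse_smart_attribute(row)
print(attr.name, attr.raw_value, has_pending_sectors(attr))
# prints: Current_Pending_Sector 1 True
```

A raw value that returns to 0 after the extended test completes would be consistent with the false positive suggested above.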
JorgeB Posted April 22, 2017 Don't forget that the system unresponsive issue is unrelated to this, and the best bet to fix it would be to convert all disks to xfs.
ridley Posted April 22, 2017 12 minutes ago, johnnie.black said: "Don't forget that the system unresponsive issue is unrelated to this, and the best bet to fix it would be to convert all disks to xfs." How do I convert to XFS?
ridley Posted April 22, 2017 21 minutes ago, johnnie.black said: "vt-d" What's this?
JorgeB Posted April 22, 2017 It's a BIOS setting that enables hardware passthrough for VMs; if it's enabled and you don't need it, disable it.
ridley Posted April 22, 2017 BIOS of the SAS2LP presumably?
ridley Posted April 22, 2017 19 minutes ago, johnnie.black said: Wow, that could take a while. Are you sure this is necessary?
JorgeB Posted April 22, 2017 47 minutes ago, ridley said: "Are you sure this is necessary?" Can't be sure it will fix your problem, but it's without a doubt the #1 cause of an unresponsive server on v6. Also, reiserfs is on the way out: it's badly supported by the maintainers and there have been a few serious issues lately. Besides, performance can be terrible in certain situations, so I would recommend converting even if you didn't have problems.
JonathanM Posted April 22, 2017 50 minutes ago, ridley said: "BIOS of the SAS2LP presumably?" No, motherboard.
ridley Posted April 22, 2017 1 minute ago, johnnie.black said: "Can't be sure it will fix your problem, but it's without a doubt the #1 cause of an unresponsive server on v6. Also, reiserfs is on the way out: it's badly supported by the maintainers and there have been a few serious issues lately. Besides, performance can be terrible in certain situations, so I would recommend converting even if you didn't have problems." The strange thing is that the server doesn't become unresponsive; you can still use the console or telnet in. It just will not shut down or reboot. I have a syslog from the last time it became unresponsive; I can upload that if it helps.
JorgeB Posted April 22, 2017 Just now, ridley said: "The strange thing is that the server doesn't become unresponsive; you can still use the console or telnet in. It just will not shut down or reboot." That's the usual symptom for this issue, sometimes together with SHFS using 100% CPU.
ridley Posted April 22, 2017 25 minutes ago, johnnie.black said: "That's the usual symptom for this issue, sometimes together with SHFS using 100% CPU." OK, this could take some time. Any recommendations re transferring the data? Using TeraCopy could be a tad slow.
JorgeB Posted April 22, 2017 The thread I linked has info on the multiple ways of doing it; I personally use rsync.
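As a rough illustration of the rsync approach mentioned above, here is a Python sketch that builds a disk-to-disk copy command. The post does not give the exact options used, so the flags chosen here (`-a`, `-v`, `--progress`) and the `/mnt/diskN` mount points are assumptions, and the helper names are hypothetical. It defaults to a dry run that only prints the command.

```python
import shlex
import subprocess
from typing import List


def build_rsync_command(source_disk: int, target_disk: int) -> List[str]:
    """Build an rsync command copying one array disk onto another.

    The /mnt/diskN mount points are an assumption about the server layout.
    """
    if source_disk == target_disk:
        raise ValueError("source and target must be different disks")
    return [
        "rsync",
        "-a",          # archive mode: recurse, keep permissions, times, owners
        "-v",          # list files as they are transferred
        "--progress",  # show per-file progress
        f"/mnt/disk{source_disk}/",  # trailing slash: copy contents, not the directory itself
        f"/mnt/disk{target_disk}/",
    ]


def copy_disk(source_disk: int, target_disk: int, dry_run: bool = True) -> str:
    """Return the command line; run it only when dry_run is False."""
    cmd = build_rsync_command(source_disk, target_disk)
    if not dry_run:
        # check=True raises CalledProcessError if rsync exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


print(copy_disk(1, 2))
# prints: rsync -a -v --progress /mnt/disk1/ /mnt/disk2/
```

Keeping the dry run as the default makes it harder to start a long copy against the wrong disk by accident; the target disk would need to be formatted as xfs beforehand.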
ridley Posted April 23, 2017 On 4/22/2017 at 2:10 PM, johnnie.black said: "All disks but one look fine. Disk5 shows a single pending sector, possibly a false positive and unrelated to the latest errors, but you should run an extended SMART test to confirm. Device Model: WDC WD20EARS-00MVWB0 Serial Number: WD-WCAZA2084939 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 1 Parity getting disabled and the multiple disk errors were the result of one of the SASLPs crashing (the one with 8 disks connected). This is a somewhat common issue (with both the SASLP and the SAS2LP), although it only affects a small number of users. Some things that can help with that are disabling VT-d if not needed, updating the board BIOS, or using the controller in a different slot if available. You can cancel the read check you're doing and might as well start a parity sync." OK. I checked the motherboard BIOS and there is no VT-d option. I have checked the BIOSes on the SASLPs: they both have INT13 disabled and are running 3.1.0.15n. I have no more PCI-e slots, so I could only swap them around. What to do now? It appears that the SASLP card keeps crashing and I lose connection to multiple (8?) drives. Suggestions please.
JorgeB Posted April 23, 2017 4 minutes ago, ridley said: "OK Checked motherboard BIOS and there is no VT-D option." Pretty sure it has to exist, check the manual, though it may be disabled by default. You can check the status on unRAID's main page: click on Info at the top right and check whether IOMMU is enabled or disabled. 5 minutes ago, ridley said: "What to do now?" Get another controller; at the moment LSI are the ones that work best.
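The IOMMU status mentioned above can also be checked from the console. The Python sketch below relies on the Linux convention that `/sys/kernel/iommu_groups` contains one entry per group when an IOMMU is active; whether this matches what the unRAID Info panel reports is an assumption, and the function name is hypothetical.

```python
from pathlib import Path


def iommu_active(groups_dir: str = "/sys/kernel/iommu_groups") -> bool:
    """Return True if the kernel exposes at least one IOMMU group.

    An absent or empty directory is treated as "disabled or unavailable".
    """
    path = Path(groups_dir)
    if not path.is_dir():
        return False
    # any() stops at the first entry, so large systems are not fully listed.
    return any(path.iterdir())


if __name__ == "__main__":
    print("IOMMU enabled" if iommu_active() else "IOMMU disabled or unavailable")
```

If this reports the IOMMU as disabled, VT-d is presumably already off and changing it would not affect the controller crashes.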
ridley Posted April 23, 2017 1 minute ago, johnnie.black said: "Pretty sure it has to exist, check the manual, though it may be disabled by default. You can check the status on unRAID's main page: click on Info at the top right and check whether IOMMU is enabled or disabled. Get another controller; at the moment LSI are the ones that work best." LSI model number?
JorgeB Posted April 23, 2017 Most popular are the 9210-8i and 9211-8i, or clones like the Dell H200/H310 and IBM M1015. These latter ones need to be crossflashed to IT mode to work with unRAID but are generally a bit cheaper.
ridley Posted April 23, 2017 1 minute ago, johnnie.black said: "Most popular are the 9210-8i and 9211-8i, or clones like the Dell H200/H310 and IBM M1015. These latter ones need to be crossflashed to IT mode to work with unRAID but are generally a bit cheaper." Not the SAS 8308ELP then?
JorgeB Posted April 23, 2017 Just now, ridley said: "Not the SAS 8308ELP then?" Don't know that model, but it looks like an older one that probably doesn't even support > 2TB. Stick with the models I listed.
ridley Posted April 23, 2017 What does upgrading the SASLP BIOS improve/solve?
Archived
This topic is now archived and is closed to further replies.