madmanx Posted January 6, 2022

Hi all,

I have been trying to identify the cause of constant crashes when carrying out parity checks. The check gets to around 1%, then the system hangs completely and requires a power cycle. It has been left on for a whole night without the system ever responding, so I am confident it isn't just slow performance.

I have carried out disk checks. They reported no errors, but I ran a repair anyway because I am hearing some strange things. In addition, I have noticed my SMART reports showing errors while my drives show as healthy. I have also found that the disks, with the exception of one, refuse to run the extended test: I click on it, it runs for about 1 second, then stops. They also don't display drive capabilities. These are all the same brand; the 2 other disks are a different brand.

Some background on the system:
5820K
X99 motherboard
8 SAS drives with SAS controller
6x Toshiba 3TB
2x Hitachi 3TB (these are the ones which run the SMART test)

I am a noob, so I may be missing something; your help is much appreciated. I can see others have had something similar, but I can't see that anyone has given an actual solution.

Diagnostic file attached: titan-diagnostics-20220106-1309.zip

madmanx Posted January 6, 2022

Hi, thank you for the reply. I forgot to say that I have already done that and sent it to flash, as it wouldn't log anything when sent to the array; I assume the array is down until restart. Attached: syslog

madmanx Posted January 7, 2022

19 hours ago, trurl said:
setup syslog server

Hi, thank you for the reply. I forgot to say that I have already done that and sent it to flash, as it wouldn't log anything when sent to the array; I assume the array is down until restart. Attached: syslog

I'm sure I replied to this using the quotes.

trurl Posted January 7, 2022

Was the crash between these 2 entries?

Jan 6 08:42:45 Titan ool www[31028]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Jan 6 12:51:48 Titan rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="6885" x-info="https://www.rsyslog.com"] start

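A gap like the one between those two entries can also be located mechanically. The following is a minimal Python sketch, not anything Unraid provides: the function name `find_gaps`, the 30-minute threshold, and the hard-coded year are all assumptions made for illustration, and it assumes classic syslog timestamps of the form `Jan 6 08:42:45`.

```python
from datetime import datetime, timedelta


def find_gaps(lines, min_gap=timedelta(minutes=30), year=2022):
    """Return (previous_line, next_line, gap) for consecutive log entries
    whose timestamps are more than min_gap apart.

    Hypothetical helper: classic syslog timestamps omit the year, so it is
    passed in as an assumption.
    """
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        stamp_text = " ".join(line.split()[:3])  # e.g. "Jan 6 08:42:45"
        try:
            stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_time is not None and stamp - prev_time > min_gap:
            gaps.append((prev_line, line, stamp - prev_time))
        prev_time, prev_line = stamp, line
    return gaps


# The two entries quoted in the post above.
sample = [
    "Jan 6 08:42:45 Titan ool www[31028]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config",
    'Jan 6 12:51:48 Titan rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="6885" x-info="https://www.rsyslog.com"] start',
]

gaps = find_gaps(sample)
for before, after, gap in gaps:
    print(f"No entries for {gap} after: {before}")
```

A long silence followed by an rsyslogd "start" line is only a hint that the machine was down in between; the log may also simply have been quiet.
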
madmanx Posted January 7, 2022

13 minutes ago, trurl said:
Was the crash between these 2 entries?
Jan 6 08:42:45 Titan ool www[31028]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Jan 6 12:51:48 Titan rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="6885" x-info="https://www.rsyslog.com"] start

Yes, that was one of them. I'm going to try it again, but this time boot into Safe Mode. I also forgot to say that I have put the memory through memtest: 2 passes with 0 errors. I'm going to drop in some new RAM and see what happens.

trurl Posted January 7, 2022

Power might be another consideration, though you don't have that many disks, so if the PSU isn't actually failing I would guess it is adequate.

madmanx Posted January 7, 2022

Well, I did think about this, but it ran fine after changing the motherboard (previously it was a Ryzen 1600X). Could it be the SAS RAID controller?

trurl Posted January 7, 2022

1 minute ago, madmanx said:
could it be the SAS raid controller?

I guess it could be. The thing about a parity check is that it accesses all the array disks simultaneously. That is one of the reasons power is sometimes mentioned in these situations. A normal desktop PC with multiple drives may not have them all spin up at the same time. Then again, they may never spin down after boot in that environment either.

I would think that 1% into the check everything would be spinning already, but I don't know how accurate that "1%" really is.

I guess a controller could react badly to multiple simultaneous disk access too. Maybe overheating?

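To illustrate why a parity check has to touch every array disk at once, here is a simplified Python sketch that assumes plain XOR parity across equal-sized blocks. It is an illustration only, not Unraid's actual implementation; the names `parity_block` and `check_stripe` and the sample bytes are hypothetical.

```python
from functools import reduce


def parity_block(data_blocks):
    """XOR equal-length blocks, one from each data disk, into a parity block."""
    return bytes(
        reduce(lambda a, b: a ^ b, column)  # XOR the bytes at the same offset
        for column in zip(*data_blocks)
    )


def check_stripe(data_blocks, stored_parity):
    """A stripe passes only if the parity recomputed from ALL data disks
    matches what is stored on the parity disk."""
    return parity_block(data_blocks) == stored_parity


# Hypothetical two-byte blocks read from three data disks at the same offset.
disks = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = parity_block(disks)

stripe_ok = check_stripe(disks, parity)
stripe_bad = check_stripe([b"\x0e\xf0", b"\x33\x33", b"\x55\xaa"], parity)
print(stripe_ok, stripe_bad)
```

Because every stripe needs a block from every disk, the check keeps all drives, the controller, and the memory that buffers the reads busy at the same time, which is the kind of sustained load a normal desktop workload rarely produces.
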
madmanx Posted January 7, 2022

That makes sense. I am watching it quite closely and all the drives are running. I am a little confused by the log: it shows that I have carried out parity checks. Does it only log the parity check when it has finished?

madmanx Posted January 7, 2022

It just happened again at 8.4%, although last night it clearly ran for at least 3 hours.

trurl Posted January 7, 2022

2 minutes ago, madmanx said:
confused by the log

If you mean the logs from the syslog server, sometimes they help and sometimes they don't. The syslog server doesn't really get going until after most of the boot process has completed, so those early entries don't appear, but it is the entries leading up to the crash that are sometimes useful. Sometimes nothing obvious is logged, because the system crashes before anything can be written.

madmanx Posted January 7, 2022

I've just reseated the SAS card. I'll try the new RAM, and if none of that works I'll just set the whole thing alight and start again.

madmanx Posted January 9, 2022

Found the solution: it was the RAM. For some reason the RAM was running at a higher speed than it was rated for. After reducing the speed, it has been working perfectly, parity checks included. All good.

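For anyone wanting to check for the same condition, one possible approach is to compare the rated and configured speeds that `dmidecode --type memory` reports for each module. The Python sketch below only parses text in that general shape; the sample output is made up for illustration, the function name `overclocked_modules` is hypothetical, and the exact field labels ("Configured Memory Speed" on newer dmidecode versions, "Configured Clock Speed" on older ones) may differ on a given system.

```python
import re


def overclocked_modules(dmidecode_text):
    """Return (rated, configured) speed pairs for modules whose configured
    speed is higher than the rated speed reported by the firmware."""
    flagged = []
    # Each populated or empty slot appears as its own "Memory Device" section.
    for section in dmidecode_text.split("Memory Device")[1:]:
        rated = re.search(r"^\s*Speed:\s*(\d+)", section, re.MULTILINE)
        configured = re.search(
            r"^\s*Configured (?:Memory|Clock) Speed:\s*(\d+)", section, re.MULTILINE
        )
        # Empty slots report "Unknown" and are skipped because nothing matches.
        if rated and configured and int(configured.group(1)) > int(rated.group(1)):
            flagged.append((int(rated.group(1)), int(configured.group(1))))
    return flagged


# Made-up sample in the general shape of `dmidecode --type memory` output.
sample_output = """
Memory Device
\tSize: 8 GB
\tType: DDR4
\tSpeed: 2133 MT/s
\tConfigured Memory Speed: 2666 MT/s

Memory Device
\tSize: 8 GB
\tType: DDR4
\tSpeed: 2133 MT/s
\tConfigured Memory Speed: 2133 MT/s

Memory Device
\tSize: No Module Installed
\tSpeed: Unknown
\tConfigured Memory Speed: Unknown
"""

flagged = overclocked_modules(sample_output)
print(flagged)
```

What the firmware reports as "Speed" is not always the module's true rating, so a result from a check like this is a prompt to look at the BIOS memory settings rather than proof of a problem. Note also that, as earlier in this thread, memtest can pass even when the configured speed turns out to be the cause.
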