April 28, 2011 Machine boots into Unraid and starts a parity check. After about 4% and 7 errors (!), it crashes. I can't reach the machine over the network after the crash. The machine runs memtest fine for 24 hours. Version is 4.6 Pro. I've attached a syslog from before the crash and the output of /proc/cpuinfo and /proc/meminfo. This is the crash, in photographs: http://dl.dropbox.com/u/1146700/unraid%20crash/Unraid%20crash.html Please help - Unraid was great until this happened :-) Attachments: syslog.zip, cpu_and_memory_info.txt
April 28, 2011 Good pics, those should help. When the machine crashes, are you able to use the system console (keyboard and monitor hooked up to the server directly) or does that freeze up too?
April 28, 2011 What happened before the parity check? I mean the first time: was there a power failure or something similar? Parity checks are where things can heat up and stumble: power supplies, drives, etc. Have you checked the SMART reports?
May 1, 2011 In addition to the above (very good, by the way), I'll add another possible suspect - the mvsas and scst modules, used by some of your drives. In examining some earlier syslogs, similar in vintage to yours, I was rather unimpressed by the stability and maturity of those 2 sub-systems, and in fact, Tom has dramatically changed the SAS support in the v5 series. Failing any better solution, you may want to consider upgrading to the latest 5.0beta. I don't know if Brian will see this (Bjp999), but if he does, I hope he will comment on whether he believes the SAS support seems more stable (I *think* he is using the latest beta). In addition, this could be a one-off anomaly, power spike or something. It would be useful to repeat the crash, several times perhaps, looking for what is common to each crash. It won't be quite the same, after your first crash, since those 7 parity errors should have been corrected, but could still be useful.
May 2, 2011 (Author) Thanks all for the replies. @Rajahal - yes, the console is still working after the crash. The terminal hangs if I try to use the Unraid volume. (The individual drives seem to work though.) For some reason the network is dead - I am going to look into this today. @cyrnel - I thought about hardware failure. It is likely, as the parts are all several years old and have had past lives. But Memtest passes, and the failure is always at the same point in the parity check (4%). If it was a hardware fault, I'd expect either 1) Memtest to also fail after the same amount of time and/or 2) the timing to be random. What do you think? @RobJ - Thanks for the idea. I had been holding back on trying 5.0 because replacing something tried and tested with something that's relatively untested didn't seem wise. But if there are known (or at least suspected) problems, I'm all for trying it. Plus I'd get AFP support :-) I'm going to tinker today and will report back.
May 2, 2011 (Author) I've just upgraded to 5.0. Things are looking pretty good :-) The parity check status currently reads: Total size: 2 TB; Current position: 288.07 GB (14%); Estimated speed: 91.53 MB/sec; Estimated finish: 312 minutes; Sync errors: 2. (Previously the parity check completely stopped at 4% after 7 sync errors.) I can't log in to the shares (my fault for leaving *passwd files around?) but on the shell /mnt/user looks fine. So kudos to Tom for 5.0 - it looks great so far.
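Editor's note: the status figures quoted above are internally consistent, which can be checked with a short calculation. The sketch below is only an illustration: the function name `parity_check_eta` is made up for this note, it assumes decimal units (1 TB = 1000 GB, 1 GB = 1000 MB), and it assumes the check continues at the reported speed.

```python
def parity_check_eta(total_gb, position_gb, speed_mb_per_s):
    """Return (percent_done, minutes_remaining) for a linear scan.

    Assumes decimal units (1 GB = 1000 MB) and a constant speed.
    """
    remaining_mb = (total_gb - position_gb) * 1000
    seconds_remaining = remaining_mb / speed_mb_per_s
    percent_done = position_gb / total_gb * 100
    return percent_done, seconds_remaining / 60


# Figures from the post: 2 TB total, 288.07 GB done, 91.53 MB/sec.
percent, minutes = parity_check_eta(2000, 288.07, 91.53)
print(f"{percent:.0f}% done, about {minutes:.0f} minutes left")
# prints "14% done, about 312 minutes left"
```

Under those assumptions the result matches the reported 14% and 312 minutes. Real drives tend to slow down toward the inner tracks, so the actual finish time would likely be somewhat later.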
May 3, 2011 (Author) This seems to have been fixed by upgrading to 5.0 :-) Parity check did find 66 errors, which is concerning, but things seem to be working OK. Thanks for the advice everyone.
May 3, 2011 The parity check in 5.0 is a non-correcting check; you need to run a correcting check.
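Editor's note: the difference between a non-correcting and a correcting check can be shown with a toy model of single XOR parity. This is a minimal sketch and not Unraid's actual code: the function names and the in-memory "disks" are invented for illustration, and whether a given 5.0 beta defaults to a non-correcting check is the poster's claim, not something verified here.

```python
from functools import reduce


def compute_parity(data_disks):
    """XOR the bytes at each offset across all data disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*data_disks))


def parity_check(data_disks, parity, correct=False):
    """Count offsets where the stored parity disagrees with the data.

    With correct=True, the stale parity bytes are also rewritten.
    """
    expected = compute_parity(data_disks)
    errors = 0
    for i, (stored, want) in enumerate(zip(parity, expected)):
        if stored != want:
            errors += 1
            if correct:
                parity[i] = want
    return errors


disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]
parity = bytearray(compute_parity(disks))
parity[2] ^= 0xFF  # simulate one stale parity byte

print(parity_check(disks, parity))                # 1 (reported, not fixed)
print(parity_check(disks, parity))                # 1 (same error again)
print(parity_check(disks, parity, correct=True))  # 1 (reported and fixed)
print(parity_check(disks, parity))                # 0
```

In this model a non-correcting check reports the same sync errors on every run, so the 66 errors would be expected to reappear until a correcting check rewrites parity.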