miccos Posted May 27, 2016 Hi guys, Hoping to get some assistance. I have been running unRAID 5 since its stable release and have had no issues except a drive failure (easy swap), but lately I am not sure what is going on. When I turn my server on, it runs like normal. I check SAB and sickbeard, and then 2 minutes later it's gone. I cannot get to /tower, sickbeard, SAB or unmenu. If I get in quick enough and stop the array, things seem to be fine, but the syslog shows me nothing that I can understand, and by the time I have hard rebooted, the syslog from when it failed is gone. So I connected a monitor and keyboard to get the syslog that way, but I cannot type anything. It does not show the usual Tower User# prompt (whatever it is when you log in); all it showed was a list of different [<C------------>] BLAH BLAH BLAH lines (I had no way of capturing this, sorry). Attached is what I could get, but I am not sure it is going to help. Your advice is appreciated. Attachment: syslog.txt
RobJ Posted May 27, 2016 I suspect what you saw was a Call Trace, possibly a kernel 'OOPS'. That plus 2 segfaults in the syslog, neither associated with dependency issues, points to a memory fault as the most likely cause. Reboot and run the Memtest from the boot menu, and let it run for multiple passes or until errors appear.
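Editor's note: because the syslog is lost on a hard reboot, one option is to copy it to the flash drive while the server is still responsive and then search the copy for the signs RobJ mentions. The following is a minimal sketch, assuming the log is at /var/log/syslog and the flash drive is mounted at /boot. The function name `scan_syslog`, the copy's file name, and the search patterns are illustrative assumptions, not an official unRAID tool.

```shell
# Count lines in a saved syslog that hint at a kernel fault.
# The patterns below are assumptions about typical kernel log wording.
scan_syslog() {
    grep -c -i -E 'segfault|call trace|oops' "$1"
}

# Example use on the server (paths are assumptions, adjust as needed):
#   cp /var/log/syslog /boot/syslog-copy.txt
#   scan_syslog /boot/syslog-copy.txt
```

A non-zero count only says where to start reading; the matching lines still need to be inspected by hand.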
miccos (Author) Posted May 27, 2016 Thanks RobJ, Memtest is running now: 3 passes and no errors so far. I'll let it run for a few more hours and see what happens. Any other thoughts? This same problem has happened a few times. Would upgrading to v6 give any benefit, i.e. different tools or finer troubleshooting? The only reason I haven't is the work of transferring all my sickbeard and SAB settings. Again, the help is appreciated.
miccos (Author) Posted May 28, 2016 So I ran the memory check out to 10 passes with no errors. I thought I would see how things go: everything started, and I jumped onto SAB and sickbeard to see what I had missed. And then it was gone. I took a photo of what the screen shows when connected to the server. Hope that gives some more ideas. Thanks
JonathanM Posted May 28, 2016 Since reiserfs is mentioned in the crash screen, I'd do a file system check on all your drives.
miccos (Author) Posted May 28, 2016 Thanks, I will run some SMART checks. I tried to do it via the console last night but couldn't seem to get it to work.
remotevisitor Posted May 28, 2016 SMART checks will report hardware issues detected by the disks themselves, and running them is a good idea. But the suggestion was to run a file system check, which is a different procedure. For reiserfs file systems, a file system check can take a while.
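Editor's note: the difference between the two checks can be sketched as two separate commands. `smartctl -a` reads the health report kept by the physical disk, while `reiserfsck --check` reads the file system stored on it. The device names below are placeholders, and the helper `show_checks` is hypothetical; it only prints the commands so they can be reviewed before being run.

```shell
# Print (do not run) the two different checks for one array disk.
#   $1 = physical device, used for the SMART report (placeholder: /dev/sdg)
#   $2 = unRAID array device, used for the file system check (placeholder: /dev/md4)
show_checks() {
    echo "smartctl -a $1"
    echo "reiserfsck --check $2"
}

show_checks /dev/sdg /dev/md4
```

On unRAID the file system check is normally pointed at the array device (/dev/mdN) with the array started in maintenance mode, so substitute the numbers for your own disks.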
miccos (Author) Posted May 28, 2016 Thanks, So I ran the file system check across my drives, and one came back with this: "Comparing bitmaps..Bad nodes were found, Semantic pass skipped 10 found corruptions can be fixed only when running with --rebuild-tree". So I looked at doing the rebuild, but I read that it has its risks. What are your thoughts? And can someone explain why this may only cause crashes when the array is running, but be perfectly stable when stopped or in maintenance mode? Cheers. Attachment: SDG_Disk4_Error.txt
RobJ Posted May 28, 2016 Yes, you have a badly corrupted file system on that disk, with multiple corruptions dictating the need for the --rebuild-tree option. I assume you've been reading Check Disk File systems? Go ahead and proceed with that option, but be aware there's a good chance of some data loss, and even what it recovers may require some handwork to restore the right file name and put it in the right folder. It's rare, but there have been other occasions when certain forms of corruption in the Reiser file system could actually crash the system. The crashes could only happen, of course, when you were actually using the corrupt file system on that drive.
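Editor's note: a minimal sketch of the repair step, assuming the affected disk is array disk 4 (a guess based on the attachment name) and the array is started in maintenance mode. The helper `rebuild_cmd` is hypothetical. It only prints the command, and it refuses anything that is not an array device, because repairing through /dev/mdN is what keeps parity in step with the changes.

```shell
# Print (do not run) the rebuild command, but only for an unRAID array device.
# $1 = device to repair; /dev/md4 is an assumed example, not taken from the logs.
rebuild_cmd() {
    case "$1" in
        /dev/md[0-9]*)
            echo "reiserfsck --rebuild-tree $1"
            ;;
        *)
            echo "refusing: expected an array device such as /dev/md4" >&2
            return 1
            ;;
    esac
}

rebuild_cmd /dev/md4
```

reiserfsck asks for confirmation before it rebuilds, and recovered files whose names or folders could not be restored end up in lost+found on that disk.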
miccos (Author) Posted May 29, 2016 OK, thanks all for your advice. So far so good, but it has only been 10 minutes. I have ended up with only 7 files in my lost+found folder, so it should be easy to sort. I can't seem to get access to it, though. Any advice on this? I'll see how the server goes for a week, and if I don't have any more issues we can call this solved.
itimpi Posted May 29, 2016 The lost+found folder will not have the correct permissions to allow network access. This can be corrected by running the 'newperms' command against that folder from a telnet session.
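Editor's note: a sketch of what that command might look like, assuming the repaired disk is array disk 4 and is mounted at the usual /mnt/diskN location. Both are assumptions, and the helper `lost_found_fix` is hypothetical; it only builds the command line so the path can be checked first.

```shell
# Build the newperms command for the lost+found folder of one array disk.
# $1 = disk number; /mnt/diskN as the mount point is an assumption.
lost_found_fix() {
    echo "newperms /mnt/disk$1/lost+found"
}

lost_found_fix 4
```

Once the printed path matches the folder on your server, the command can be run as-is from the telnet session.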
miccos (Author) Posted May 29, 2016 Thanks itimpi, I found a way to get the files and just deleted the folders. There are only 7 files, and I will sort them out later. Everything is still looking good, thanks folks.
miccos (Author) Posted June 1, 2016 Thanks everyone for your assistance. The server has been running nicely for a few days now.
This topic is now archived and is closed to further replies.