TedatTNT -- Posted March 9, 2016

To clarify, these are the issues that I have had THROUGH the last two software updates.

Every time I use the UNRAID heavily (moving/saving/accessing large files or LOTS of small files), I lose the ability to communicate with my UNRAID server via Telnet or via the web GUI. I can still use a keyboard/mouse/monitor.

Every time I perform a parity check (or one is performed automatically), I lose the ability to communicate with my UNRAID server via Telnet or via the web GUI. I can still use a keyboard/mouse/monitor.

This has been an issue for more than a year! I have posted and asked for help numerous times, but even though there are some very helpful folks here, we haven't found a solution and I am about a month away from giving up and buying a Drobo.

Here are things that we've tried:

- posting the diagnostic report (attached) - although I can only get it after a reboot, so it isn't real helpful
- moving the Ethernet cable to the other NIC on my MB -- both NICs use chipsets recommended on the HCL
- Using a PCI-based, add-on NIC
- Replacing one drive that reported some SMART issues
- Running memtest, swapping memory, running again, etc.
- Replacement of the power supply
- Replacement/reloading the USB flash drive

I'm at my wit's end -- I'm hoping somebody has some insight that will help me regain the stability that I had four/five years ago, but haven't had in over a year.

Attachment: unraid-diagnostics-20160309-0811.zip

ashman70 -- Posted March 9, 2016

Can you tell us about your hardware? When you say you had stability four or five years ago, was that on the same hardware you are using today? What changed over a year ago when you began to have these problems? Did you do a hardware update then?

JorgeB -- Posted March 9, 2016

There have been various reports of solving this type of issue on V6 by converting all disks from ReiserFS to XFS. Unfortunately it’s an involved process and it will take some time, but if you’re willing to do it, see here: http://lime-technology.com/forum/index.php?topic=37490.0

Helmonder -- Posted March 9, 2016

> To clarify, these are the issues that I have had THROUGH the last two software updates.
>
> Every time I use the UNRAID heavily (moving/saving/accessing large files or LOTS of small files), I lose the ability to communicate with my UNRAID server via Telnet or via the web GUI. I can still use a keyboard/mouse/monitor.
>
> Every time I perform a parity check (or one is performed automatically), I lose the ability to communicate with my UNRAID server via Telnet or via the web GUI. I can still use a keyboard/mouse/monitor.

Can you explain what you mean by the above? Does "I lose the ability to communicate with my unRAID server via Telnet or the webgui" mean that you cannot access it any more? Does it come back after a time or do you need to reboot? Do your shares still function? Does the move/copy/parity check complete, and what are the results?

I do not understand what you mean by "I can still use a keyboard/mouse/monitor". Do you mean that your PC still functions when your unRAID has crashed? Why would you think it might not? These things do not seem related.

> This has been an issue for more than a year! I have posted and asked for help numerous times, but even though there are some very helpful folks here, we haven't found a solution and I am about a month away from giving up and buying a Drobo.
>
> Here are things that we've tried:
> - posting the diagnostic report (attached) - although I can only get it after a reboot, so it isn't real helpful

First thing to do is get a syslog of the issue at hand. Since you say you can reproduce the issue (just start a copy or a parity check, right?), this is not difficult, and it is described in the troubleshooting FAQ:

"If you are running headless (no monitor and possibly no keyboard or graphics card), then you can try directing the output of the command above to your flash drive or a data drive. For example, the following will output the last syslog lines to syslogtail.txt on your flash drive. This should allow you to obtain the very last message that the system was able to log."

    tail -f --lines=100 /var/log/syslog >/boot/syslogtail.txt

Post the results in this thread.

> - moving the Ethernet cable to the other NIC on my MB -- both NICs use chipsets recommended on the HCL
> - Using a PCI-based, add-on NIC
> - Replacing one drive that reported some SMART issues
> - Running memtest, swapping memory, running again, etc.
> - Replacement of the power supply
> - Replacement/reloading the USB flash drive

All of the above were done with the same result? I see you got advice on changing the power supply, which means someone suspected power issues. Starting a big copy and starting a parity check both spin up a lot of your drives, if not all of them. What happens when you do just that, just spin up all your drives? Does that crash your system? If it does not, do you run cache_dirs? If so, turn it off, spin up all your drives, and start browsing around your drives. Does that crash your system?

> I'm at my wit's end -- I'm hoping somebody has some insight that will help me regain the stability that I had four/five years ago, but haven't had in over a year.

It might help if you point us to the threads with all the questions that were already asked and that you answered. It saves asking them again.

TedatTNT (Author) -- Posted March 9, 2016

> Can you tell us about your hardware? When you say you had stability four or five years ago, was that on the same hardware you are using today? What changed over a year ago when you began to have these problems? Did you do a hardware update then?

I upgraded the motherboard a couple of years ago from an ASRock MB that was short on PCIe to an ASUS P5W-DH Deluxe board that was retired from being a HTPC. I also added PCIe for additional SATA ports. It seemed to work beautifully for weeks. I also updated the software at that point. A couple of months later, I began to notice problems.

TedatTNT (Author) -- Posted March 9, 2016

> There have been various reports of solving this type of issue on V6 by converting all disks from ReiserFS to XFS. Unfortunately it’s an involved process and it will take some time, but if you’re willing to do it, see here: http://lime-technology.com/forum/index.php?topic=37490.0

Thanks -- I will look into this. My fear, though, is that the system would 'hang' during the process and I'd have a hell of a time getting it completed. Still, I'm very tempted to proceed.

isvein -- Posted March 9, 2016

How are the temperatures on the drives and CPU?

Helmonder -- Posted March 9, 2016

>> There have been various reports of solving this type of issue on V6 by converting all disks from ReiserFS to XFS. Unfortunately it’s an involved process and it will take some time, but if you’re willing to do it, see here: http://lime-technology.com/forum/index.php?topic=37490.0
>
> Thanks -- I will look into this. My fear, though, is that the system would 'hang' during the process and I'd have a hell of a time getting it completed. Still, I'm very tempted to proceed.

It's not that difficult. The easiest way (though it costs something) is to add a drive to the system and format it as XFS, then move the contents of one ReiserFS drive to the XFS drive, and then repeat for the next drive.

But I would try to get some more certainty that this is the issue. It -is- a lengthy process, and if it doesn't work that would be a bit of a disappointment.

Have you made sure to remove ALL of your plugins so that you are running stock unRAID only? See if you can get it to break then.

But first of all: get a syslog of the system failing. That should be easy to do and will probably do a lot towards solving the issue.

TedatTNT (Author) -- Posted March 9, 2016

> Does "I lose the ability to communicate with my unRAID server via Telnet or the webgui" mean that you cannot access it any more? Does it come back after a time or do you need to reboot? Do your shares still function? Does the move/copy/parity check complete, and what are the results?
>
> I do not understand what you mean by "I can still use a keyboard/mouse/monitor". Do you mean that your PC still functions when your unRAID has crashed? Why would you think it might not? These things do not seem related.

When this happens, I can no longer access the UNRAID server via any means by which one would access it over a network. No web GUI, no telnet, no shares, nothing.

Yes, after a reboot (and starting the array), everything works until the next time that it doesn't.

Any reading/writing via the network fails. The parity check completes successfully.

Keyboard/mouse -- I have a keyboard/mouse/monitor attached to my UNRAID server. These still work and do not appear to have suffered an issue. I can still type in commands directly and they still work.

> First thing to do is get a syslog of the issue at hand. Since you say you can reproduce the issue (just start a copy or a parity check, right?), this is not difficult, and it is described in the troubleshooting FAQ:
>
> "If you are running headless (no monitor and possibly no keyboard or graphics card), then you can try directing the output of the command above to your flash drive or a data drive. For example, the following will output the last syslog lines to syslogtail.txt on your flash drive. This should allow you to obtain the very last message that the system was able to log."
>
>     tail -f --lines=100 /var/log/syslog >/boot/syslogtail.txt
>
> Post the results in this thread.

I'm not running headless -- is there a different method that I should employ?

> All of the above were done with the same result? I see you got advice on changing the power supply, which means someone suspected power issues. Starting a big copy and starting a parity check both spin up a lot of your drives, if not all of them. What happens when you do just that, just spin up all your drives? Does that crash your system? If it does not, do you run cache_dirs? If so, turn it off, spin up all your drives, and start browsing around your drives. Does that crash your system?

I was able to spin up all drives and it worked fine. I could browse files / open files on various drives. I don't know what cache_dirs is.

TedatTNT (Author) -- Posted March 9, 2016

> How are the temperatures on the drives and CPU?

Drives are 34 - 39 degrees. Not sure how to find the CPU temp.

Helmonder -- Posted March 9, 2016

> When this happens, I can no longer access the UNRAID server via any means by which one would access it over a network. No web GUI, no telnet, no shares, nothing.
>
> Yes, after a reboot (and starting the array), everything works until the next time that it doesn't.
>
> Any reading/writing via the network fails. The parity check completes successfully.
>
> Keyboard/mouse -- I have a keyboard/mouse/monitor attached to my UNRAID server. These still work and do not appear to have suffered an issue. I can still type in commands directly and they still work.
>
> I'm not running headless -- is there a different method that I should employ?
>
> I was able to spin up all drives and it worked fine. I could browse files / open files on various drives. I don't know what cache_dirs is.

Okay. That suggests that your array itself continues to run but your network fails. That is why you were advised to change the network card. Got it.

If you have a monitor attached you can still use the procedure to capture the syslog. You could also just run:

    tail -f /var/log/syslog

This will show your syslog building up on your monitor. Then bring your system down by starting a parity check, take a picture of the syslog data on your screen with your camera, and post that.

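The capture step described above can be sketched as a pair of small shell helpers. This is only a minimal sketch: the function names snapshot_log and follow_log are made up for illustration, and the paths assume the syslog lives at /var/log/syslog and the flash drive is mounted at /boot, as in the FAQ snippet quoted earlier in the thread.

```shell
#!/bin/bash
# Sketch only: both paths are assumptions taken from the FAQ snippet.
# LOG is the live syslog; OUT is a file on the flash drive (mounted at /boot).
LOG="${LOG:-/var/log/syslog}"
OUT="${OUT:-/boot/syslogtail.txt}"

# snapshot_log SRC DEST [N]  (hypothetical helper)
# Copy the last N lines (default 100) of SRC to DEST, then flush pending
# writes so the copy is on the flash drive before any reset.
snapshot_log() {
    tail -n "${3:-100}" "$1" > "$2" && sync
}

# follow_log SRC DEST  (hypothetical helper)
# Show new syslog lines on the console and write the same lines to DEST,
# so nothing depends on photographing the screen. Stop it with Ctrl-C.
follow_log() {
    tail -n 100 -f "$1" | tee "$2"
}
```

From the console, something like `follow_log "$LOG" "$OUT"` before starting the parity check would leave the last logged lines in syslogtail.txt on the flash drive, and `snapshot_log "$LOG" "$OUT"` can be run once the network has already dropped.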
TedatTNT (Author) -- Posted March 9, 2016

> It's not that difficult. The easiest way (though it costs something) is to add a drive to the system and format it as XFS, then move the contents of one ReiserFS drive to the XFS drive, and then repeat for the next drive.
>
> But I would try to get some more certainty that this is the issue. It -is- a lengthy process, and if it doesn't work that would be a bit of a disappointment.
>
> Have you made sure to remove ALL of your plugins so that you are running stock unRAID only? See if you can get it to break then.
>
> But first of all: get a syslog of the system failing. That should be easy to do and will probably do a lot towards solving the issue.

I have a new 3TB just sitting here and an empty slot - could definitely start with this.

The only plugin (other than the included UNRAID Server OS and Dynamix webGUI) is ProFTPd -- I share files with others, occasionally. To test without it, can it be temporarily disabled, or would I have to uninstall it completely and reconfigure it again later?

TedatTNT (Author) -- Posted March 9, 2016

> If you have a monitor attached you can still use the procedure to capture the syslog. You could also just run:
>
>     tail -f /var/log/syslog
>
> This will show your syslog building up on your monitor. Then bring your system down by starting a parity check, take a picture of the syslog data on your screen with your camera, and post that.

Okay - I'm on it!!

ashman70 -- Posted March 9, 2016

Do you have a backup of your data or is it only on your unRAID server?

TedatTNT (Author) -- Posted March 9, 2016

>     tail -f /var/log/syslog
>
> This will show your syslog building up on your monitor. Then bring your system down by starting a parity check, take a picture of the syslog data on your screen with your camera, and post that.

When I went to take a photo, the lines were changing pretty quick, so I missed a bunch of them -- should I do it again -- maybe shoot video?

Ted

TedatTNT (Author) -- Posted March 9, 2016

> Do you have a backup of your data or is it only on your unRAID server?

Only on my UNRAID.

JorgeB -- Posted March 9, 2016

> Keyboard/mouse -- I have a keyboard/mouse/monitor attached to my UNRAID server. These still work and do not appear to have suffered an issue. I can still type in commands directly and they still work.

I hadn’t noticed this part before. So your server continues to work, except that you can no longer use the network? In that case, when it happens again, type diagnostics on the console, then attach the zip created on your flash drive.

Also, in this case, converting to XFS may not help. I’d still recommend doing it, but only after solving this issue.

Helmonder -- Posted March 9, 2016

>>     tail -f /var/log/syslog
>>
>> This will show your syslog building up on your monitor. Then bring your system down by starting a parity check, take a picture of the syslog data on your screen with your camera, and post that.
>
> When I went to take a photo, the lines were changing pretty quick, so I missed a bunch of them -- should I do it again -- maybe shoot video?
>
> Ted

As far as I can tell, that at least confirms it is something at network level that fails. New interfaces are being created, which must be because they became unavailable somewhere earlier in the log.

See what johnie.black posted below. This will get you a complete diagnostics set. I was not aware that could be done from the console. It should make things clear: at least we will then know what the issue is, and a little bit of googling usually finds a solution from there.

TedatTNT (Author) -- Posted March 9, 2016

>>     tail -f /var/log/syslog
>>
>> This will show your syslog building up on your monitor. Then bring your system down by starting a parity check, take a picture of the syslog data on your screen with your camera, and post that.
>
> When I went to take a photo, the lines were changing pretty quick, so I missed a bunch of them -- should I do it again -- maybe shoot video?
>
> Ted

So, very strange...

It only took a moment to make it freeze before -- I had just accessed a few files and hadn't even begun the Parity Check.

This time, with a video camera recording the screen, I have transferred dozens of GBs of data and been running a Parity Check concurrently, and it hasn't failed. I even tried accessing identical files.

It seems that the fix is to keep a video camera recording the screen -- that was easier than I thought.

Helmonder -- Posted March 9, 2016

Happens with me at the dentist every time...

itimpi -- Posted March 9, 2016

I originally thought you were getting a complete system lockup where even the console was not working. These problems can be particularly hard to resolve because, at the point of failure, you can no longer get access to diagnostic information. However, if the console remains working then the problem is elsewhere.

If you have a working console session then you should be able to use the diagnostics command to get the full diagnostic ZIP file put into the 'logs' folder on the USB stick. This includes the full syslog, amongst other information useful in diagnosing problems.

As long as that command completes OK, it should mean that the unRAID server is basically functioning, and the problem then probably comes down to working out why you are losing access via the network.

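Once the diagnostics command has been run on the console, finding the ZIP it produced can be sketched as below. This is a rough sketch under assumptions: latest_diagnostics is a made-up helper name, /boot/logs is assumed to be the 'logs' folder on the USB stick, and the file name pattern is guessed from the attachment name in the first post (unraid-diagnostics-20160309-0811.zip).

```shell
#!/bin/bash
# latest_diagnostics [DIR]  (hypothetical helper)
# Print the most recently modified diagnostics ZIP in DIR.
# DIR defaults to /boot/logs, which is an assumption about where the
# 'logs' folder on the flash drive is mounted.
latest_diagnostics() {
    dir="${1:-/boot/logs}"
    newest=""
    for f in "$dir"/*-diagnostics-*.zip; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    # Print the result; return non-zero if no ZIP was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Running `latest_diagnostics` right after the network drops gives the path of the file to copy off the flash drive and attach to the thread.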
TedatTNT (Author) -- Posted March 9, 2016

Okay, I've collected some video -- I'm hoping that this helps. I've edited it down to a 4-minute file and added a few notes, and I'm hoping that for somebody, it will be clear what is going on.

I can't attach a video here, so this is a link to the video on Dropbox: https://dl.dropboxusercontent.com/u/53914472/UNRAID-001.mp4

Thanks in advance...

Ted

Helmonder -- Posted March 9, 2016

Please use the diagnostics command as mentioned; it makes things a lot easier.

Helmonder -- Posted March 9, 2016

Are you using a Marvell 88E8053 gigabit Ethernet interface? There is a known issue that (as far as I can understand) matches what you are experiencing. It is a pretty old issue, though, and probably not the same one, but it does point to the network interface card. Someone more knowledgeable about the kernel should take over from here.

https://bugzilla.redhat.com/show_bug.cgi?id=216799

TedatTNT (Author) -- Posted March 9, 2016

> Are you using a Marvell 88E8053 gigabit Ethernet interface?

Yes, the dual NICs are the Marvell Yukon 88E8053s. I have the same issues no matter which one is being used.

Is there a HIGHLY recommended GB LAN card that I could purchase and test with? I'm asking because the Marvell Yukons are on the hardware compatibility list.

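To see whether the NIC is what drops out, the saved syslog can be filtered for network-related lines. The helper below is only a sketch under assumptions: nic_events is a made-up name, the onboard Marvell Yukon ports are assumed to be handled by the Linux sky2 driver, and the interface is assumed to be called eth0. Both should be checked on the actual server (for example with `ethtool -i eth0`, if ethtool is available).

```shell
#!/bin/bash
# nic_events [LOG] [PATTERN]  (hypothetical helper)
# Print syslog lines that mention the NIC driver, the interface, or a link
# state change. The default pattern assumes the sky2 driver and eth0; pass
# a different extended regex as the second argument if those do not apply.
nic_events() {
    log="${1:-/var/log/syslog}"
    pattern="${2:-sky2|eth0|link (up|down)}"
    grep -E -i -- "$pattern" "$log"
}
```

Calling `nic_events /boot/syslogtail.txt` on a captured log, or `nic_events` on the live syslog from the console, narrows a fast-scrolling log down to the lines worth posting.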