[SOLVED] Unraid is Crashing/Hanging when writing files to the array



Dear Experts,

 

I have been happily running Unraid for about 6 months now with no problems at all.  The issue I now face is that when I write new files to it over the network, the transfers time out with nothing being written and the whole system hangs.  I then can't connect to it via the web interface or log in with PuTTY, and have to hard reset the server to get it back up again.

 

Because the system stops responding, I can't retrieve any log files afterwards: I have to hard reset the box, which wipes the logs.  I have attached my current log file, taken after a hard reset this morning.  Parity is OK.

 

I also upgraded from RC5 to RC8a thinking this might help, but the same behavior occurs.

 

I am wondering whether there is something fundamentally wrong with the install on the flash drive, and whether I should reformat it and start with a fresh copy of unRAID.  If so, how can I transfer my current settings (array name etc.) across?  I can always install the plugins again later, I guess?

 

I can't find any mention of this type of problem on the forums so would appreciate any advice from the people with more knowledge and experience than I.

 

Many thanks in advance!

syslog-2012-11-26.txt

Link to comment

I had a similar issue some time ago, turned out it was due to the onboard nic sharing an IRQ with the secondary SATA controller.

 

Few questions - have you made any hardware changes recently?

Are you using the onboard nic?  If so, what chipset is it? (mine was Realtek)

Are you using the onboard SATA connectors?

What size are the files you're attempting to copy?

 

Some tests:

1) Can you successfully copy 1+ GB of data from disk to disk using Midnight Commander (taking the network out of the equation)?

2) Can you successfully copy small files (2-5 MB) across the network?

3) Can you successfully copy a 1+ GB file across the network?
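For anyone who prefers the console to Midnight Commander, test 1 can be approximated with a timed local write. This is only a rough sketch: the function name `local_write_test` is made up for illustration, and the target directory (for example /mnt/disk1) and the size are placeholders to adjust for your own array. It assumes GNU dd.

```shell
#!/bin/bash
# Write a test file locally, report how long it took, then delete it.
# Assumption: the target directory and size are supplied by the caller;
# nothing here is specific to any one unRAID release.
local_write_test() {
    local dir="$1"
    local size_mb="${2:-1024}"
    local file="$dir/write_test.$$"

    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    local start end
    start=$(date +%s)
    # conv=fsync makes dd flush the data to disk before exiting, so the
    # timing reflects the real write rather than just the page cache.
    dd if=/dev/zero of="$file" bs=1M count="$size_mb" conv=fsync 2>/dev/null || return 1
    end=$(date +%s)

    echo "wrote ${size_mb} MB to $dir in $((end - start)) s"
    rm -f "$file"
}

# Example (placeholder path):
#   local_write_test /mnt/disk1 1024
```

If this hangs or crawls with no network involved, the NIC is unlikely to be the only culprit.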

 

If test 1 works and you are using the onboard nic, try adding a dedicated nic (most recommend Intel) and see if that makes any difference.
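A quick way to look for the kind of IRQ sharing described above is to read /proc/interrupts, where devices that share an interrupt line are listed comma-separated at the end of the row. The helper below is a minimal sketch under that assumption; the name `shared_irqs` is hypothetical, and the driver names you see will depend on your hardware.

```shell
#!/bin/bash
# Print the IRQ rows that are shared by more than one device.
# Assumption: the usual Linux /proc/interrupts layout, in which devices
# sharing an IRQ appear as a comma-separated list at the end of the row.
shared_irqs() {
    local file="${1:-/proc/interrupts}"
    # Keep only numbered IRQ rows, then only those listing several devices.
    grep -E '^ *[0-9]+:' "$file" | grep ','
}

# Example, run on the server console:
#   shared_irqs
```

If the network driver and a SATA driver turn up on the same row, that matches the situation I ran into. No output means no numbered IRQ is shared.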

 

Link to comment

No hardware changes recently other than replacing a 2TB parity drive with a 3TB one, but I was still writing files to the system for a couple of weeks after that.

 

The log file shows the network card to be:

Tower kernel: eth0: Identified chip type is 'RTL8168E/8111E'. (Network)

 

I am using a Foxconn A88GMV AMD 880G (Socket AM3) Motherboard and no extra SATA cards at the moment just the 6 onboard SATA connectors. I have also tried fitting some new SATA cables as well but to no avail.

 

I have also just tried a fresh vanilla install of unRAID RC8a with only the unmenu plugin, and just this morning tried to copy a 1GB file. Again the network drive stopped being visible and the webgui could no longer be accessed, so I assume it crashed again.

 

The motherboard BIOS dates from 2010, but I'm not sure whether upgrading it would make any difference. I will perform the tests you mentioned and report back.

Link to comment

OK I have tried all the network tests you suggested.

 

1. Yes, no problem, but it only ran at less than 9MB/s, which seems slow (copied an 8GB file between disks)

2. Yes, they seem to copy, including ones that were 50-60MB

3. As soon as I tried to copy large video files, it seemed to crash the system

 

Streaming from unRAID is faultless and has never crashed when watching movies etc.; it's ONLY when writing files to the array.  Do people think it's still a network problem please?

 

Is there a way to capture the log files onto the USB stick, so that after the reboot I can see what errors are occurring? At the moment they get wiped when I reboot the box.

 

Thanks!

Link to comment

I am having this exact same problem, except I am new to Unraid.  I set up my array with 2 data drives and copied all of my media files (1TB worth) over the network with no problems at 50+MB/sec.  I then installed my 3rd disk, configured it as parity, ran a parity sync which took about 25 hours, then ran an additional parity check once the sync was finished to double check everything.  Now, however, when I try to copy any large files over the network, I get 15MB/sec at best, and while transferring the first file I get a network error, my array becomes completely unresponsive and I have to reboot the system.  Reading works fine, so just like the original poster, it is only writing to the array that fails.

Link to comment

Try this:

 

      http://lime-technology.com/forum/index.php?topic=24271.msg211992#msg211992

 

It is a bit of a long shot as you had problems prior to rc8 but several people have had Samba problems in rc8 that were fixed by upgrading from 3.6.7 to 3.6.8

 

Thanks for the advice. I did upgrade to Samba 3.6.8 but the problem still seems to be happening.  I did manage to capture the log file from the server while the network connection and webgui stopped working, using the command tail -f --lines=100 /var/log/syslog >/boot/syslogtail.txt.
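For anyone wanting to do the same, the capture can be left running in the background so it survives the console session closing. This is a minimal sketch built around the command above, assuming GNU tail; the function name `capture_log` and the `CAPTURE_PID` variable are made up for illustration, and the default paths are simply the ones from this post.

```shell
#!/bin/bash
# Follow a log file and append every new line to a copy on storage that
# survives a reboot (the flash drive is mounted at /boot on unRAID).
# Assumption: both paths are the ones used in this post; pass your own
# as arguments if they differ.
capture_log() {
    local src="${1:-/var/log/syslog}"
    local dest="${2:-/boot/syslogtail.txt}"
    # nohup keeps tail alive after the session that started it ends.
    # ">>" appends, so an earlier capture is not overwritten.
    nohup tail -f --lines=100 "$src" >> "$dest" 2>/dev/null &
    CAPTURE_PID=$!   # remember the PID so the capture can be stopped
}

# Example:
#   capture_log
#   ...reproduce the hang, hard reset, then read /boot/syslogtail.txt...
# To stop it earlier:
#   kill "$CAPTURE_PID"
```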

 

The file doesn't look easy to read but if anyone has ANY ideas of how I can fix this I would be so grateful as it can't be good for the system to have to hard reboot the box all the time.

 

Thanks in advance...

syslogtail2.txt

Link to comment

You have a Realtek nic (same as I did) and you have the lockup when writing larger files (same as I did).  I also never had an issue reading files, only writes.  I'd try a dedicated nic.

Link to comment

Run checkdisk on the flash in a PC or Mac.

Run reiserfsck --check on all of the data drives. See Check File Systems in my sig.

 

Thanks, that looks like it was a good idea.  I get this error below.  I understand from the Wiki that I need to be careful about using the --rebuild-tree.  Any advice on this please?

 

Checking internal tree..

bad_path: The left delimiting key [11512 11513 0x1448b001 IND (1)] of the node (401319054) must be equal to the first element's key [11510 11511 0xbf89d001 IND (1)] within the node.

block 430735515: The level of the node (0) is not correct, (2) expected

the problem in the internal node occured (430735515)

finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

2 found corruptions can be fixed only when running with --rebuild-tree
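The last line of that report is what decides the next step. As a hedged sketch, assuming the report names the required option the way the output above does, a saved report can be scanned like this. The function name `next_step` is hypothetical, /dev/md1 in the usage comment is only an example device, and the wiki procedure remains the authority on when --rebuild-tree is safe to run.

```shell
#!/bin/bash
# Suggest the next reiserfsck step from the text of a saved --check report.
# Assumption: the report mentions the needed option by name, as in the
# output quoted in this post.
next_step() {
    local report="$1"
    if grep -q -e '--rebuild-tree' "$report"; then
        echo "rebuild-tree"   # the heavier repair; read the wiki cautions first
    elif grep -q -e '--fix-fixable' "$report"; then
        echo "fix-fixable"    # the lighter repair
    else
        echo "none"           # no repair option was named in the report
    fi
}

# Example (example device only):
#   reiserfsck --check /dev/md1 2>&1 | tee /boot/reiserfsck-md1.txt
#   next_step /boot/reiserfsck-md1.txt
```

Saving the report to the flash drive also gives you something to post here before committing to a repair.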

 

Link to comment

Thanks to everyone for their input.

 

In the end, the system was dropping off the network and file transfers were timing out because of file system problems. Once I ran reiserfsck --check and the subsequent fix commands on 2 drives, the problem went away and I am back to writing files without problems, so it wasn't anything to do with the NIC (Realtek).

 

There was no indication in the log files of a file system problem, so I assume there is no way to know in advance that the drives need fixing?  I assume it would be good practice to run reiserfsck periodically on all drives.  Are there any plugins to automate this or report on it at all?

Link to comment

Those write speeds are normal with a parity drive (15MB/sec may be a touch low, but normal-ish). The reason is that parity is calculated on the fly during writes.

 

Search around for the benefits of a cache drive.

Link to comment
