
Copying data from XP caused Unraid to hang (Resolved=Bad NIC)


I feel like my Unraid 4.4.2 server is like my Windows machine: I have to restart it about once a week. Somehow, copying data from my XP computer causes Unraid to hang. When it happens, the server is still powered on and the hard drive lights are green, but it doesn't respond to telnet or to the keyboard attached to the server. I have hit this problem 4 times already, and each time I had to power it off manually and turn it back on. Every time I turn it on, it starts a parity check (not sure that's the proper way, but all 4 times it returned 0 errors). I have also followed the Check Disk Filesystems steps, and no corruption was found on any drive.

 

This problem seems to happen randomly; copying the same file again doesn't reproduce it. The last time it happened was 2 weeks ago, and I thought the problem had gone away after I installed the Supermicro 8-port SATA card. Here's the syslog that I got from Unraid, but I don't know how useful it is since a new syslog is generated every time you start the server.

 

Here are the things that you might want to know.

1. It's random. It can happen during the first 30 mins or 6 hrs into copying.

2. Copying 1 TB of data from a USB drive mounted directly on the server works perfectly fine. I think it ran for about 20 hrs.

3. When it hangs, pinging, telnet, and the server's keyboard all get no response.

4. It has happened with 2 different XP computers while copying USB hard drive data to the Unraid server.

 

System info:

P4 3.2ghz

Abit IC7-G mobo (4 SATA - using 3)

2GB GSkill memory

Supermicro AOC-SAT2-MV8 (8 SATA - using 3)

2 WD 2TB hard drives

4 WD 1TB hard drives

Norco 4220

Sony USB Pro License key from Tom

Router DLink DIR-655

 

 

Here's the link to the other thread that I thought was related, but I guess I should have started a new topic, since someone else may or may not have encountered this before. http://lime-technology.com/forum/index.php?topic=4106.msg36479#msg36479

 

BTW, I do have another Pro License USB drive. Should I give that a try? If I do replace the USB drive, is it as simple as that, or do I have to set up Unraid all over again?

 

Thanks in advance,

~joy

I have not reviewed the syslog, however a locked system is usually a hardware issue.

 

I've had this happen with a couple MSI boards and PCI cards.

I used some boot parameters and it alleviated the issue.

YMMV.

 

I think I used noapic or irqpoll and it worked.

 

 

Example in syslinux.cfg (just an example).

append initrd=bzroot rootdelay=10 acpi=off nolapic noapic irqpoll

 

 

Here are other options.

 

    * nolapic

          o may be required to get some motherboards to work

    * noapic

          o may be required to get some motherboards to work

          o may be required for boards based on the nForce 5 or higher chipsets, until unRAID v4.4 final

    * acpi=off

          o may be required to get some motherboards to work

    * acpi=force

          o may be required to get some motherboards to work (Asus P4SDX may need this)

    * irqpoll

          o wastes cpu cycles, but is sometimes needed for some motherboards

    * pci=routeirq

          o may be required to get some motherboards to work

    * pci=noacpi

          o may be required to get some motherboards to work

    * pci=nomsi

          o may be required to get some motherboards to work

    * swncq=0

          o only if needed, for boards based on the nForce 5 or higher chipsets, and possibly only for unRAID v4.4-beta2
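
After editing syslinux.cfg and rebooting, it can help to confirm that the kernel actually received the options. What follows is only a minimal sketch for a POSIX shell: `has_boot_param` is a hypothetical helper name, not an Unraid command. It reads `/proc/cmdline`, which on Linux holds the command line the running kernel was booted with.

```shell
# has_boot_param PARAM [CMDLINE]
# Succeeds if PARAM appears as a whole word in the kernel command line.
# CMDLINE defaults to the running kernel's /proc/cmdline; passing it
# explicitly is only meant for checking a line by hand.
has_boot_param() {
  want=$1
  cmdline=${2-$(cat /proc/cmdline)}
  # Pad both sides with spaces so "acpi" does not match "acpi=off".
  case " $cmdline " in
    *" $want "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `has_boot_param noapic && echo "noapic is active"` run on the server after a reboot would show whether the edited append line took effect.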

 

  • Author

I just ran a 6 hrs memtest and it passed.

 

I just want to confirm this before making this type of change. Is the following correct? Does it matter if I use Word or vi to make the changes?

=================================

default menu.c32

menu title Lime Technology LLC

prompt 0

timeout 50

label unRAID OS

  menu default

  kernel bzimage

  append initrd=bzroot rootdelay=10 acpi=off nolapic noapic irqpoll

label Memtest86+

  kernel memtest

====================================

Also, is there any way to capture the syslog before it locks up? Will the tail command work?  tail -f syslog > syslog.backup

 

One last question. Is there a way to monitor the CPU temp while Unraid is running? I noticed my CPU temp was hitting 58C when I was checking some BIOS settings.

 

thanks,

~joy

 

 

 

 

  • Author

This is bad. After I posted my previous reply, I got another lockup about 5 minutes into copying data over to the server. Unfortunately, I didn't capture the log; I didn't realize the log directory is wiped when the server starts up. So I am now copying it to my USB drive with this command: tail -f syslog > /boot/mylog/syslog.backup
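
A possible alternative to a redirected `tail -f` is to snapshot the whole syslog to the flash drive at intervals and flush it each time. Data written to a file can sit in the kernel's write cache for a while, so the last lines may never reach the flash drive if the machine hard-locks. The sketch below assumes a POSIX shell; `snapshot_syslog`, `SRC` and `DEST_DIR` are hypothetical names, with defaults chosen to match the paths used in this thread.

```shell
# Hypothetical helper: copy the in-RAM syslog to the flash drive and flush it.
# SRC and DEST_DIR are assumptions; adjust them to your own paths.
SRC=${SRC:-/var/log/syslog}
DEST_DIR=${DEST_DIR:-/boot/mylog}

snapshot_syslog() {
  mkdir -p "$DEST_DIR" || return 1
  # Copy, then force buffered writes out to the device so the copy
  # survives a hard lockup that happens after this point.
  cp "$SRC" "$DEST_DIR/syslog.backup" && sync
}
```

It could be left running in the background with something like `while snapshot_syslog; do sleep 30; done &`. Anything logged after the last snapshot is still lost, and frequent writes add wear to the flash drive, so this is a debugging aid rather than something to leave on permanently.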

 

I was tailing it over telnet when it happened, but nothing new showed up when it locked up. Telnet just kept repeating these 2 lines.

 

Aug 19 00:00:15 Tower kernel: ACPI: Transitioning device [FAN] to D0

Aug 19 00:00:15 Tower kernel: ACPI: Unable to turn cooling device [f78165a0] 'on'

 

Does anyone know what this is doing? My motherboard has 3 fan headers: 1 CPU, 1 NB and 1 System. The only one without a fan plugged in is System.

 

Btw, this is without WeeboTech's changes. I want to confirm my previous post before committing this change.

 

Update: I was able to reproduce it again :-(

 

Attached is the log. Strangely enough, my tail log didn't capture the last two lines that my telnet session was showing. Unraid locked up at the moment these 2 lines showed up.

 

Aug 19 00:31:52 Tower kernel: ACPI: Transitioning device [FAN] to D0

Aug 19 00:31:52 Tower kernel: ACPI: Unable to turn cooling device [f78165a0] 'on'

 

thanks,

~joy

 

If the fans are working, it's an ACPI bug (which a lot of motherboards have). Definitely try the no-ACPI boot codes.

  • Author

I think I am losing my head on this one. I bought a new CPU cooler to replace my stock one, and the temp dropped from 58C to 40C. I have not seen those ACPI complaints anymore but, as always, new problems show up.

 

Without the ACPI changes:

1. Copying data from my computer to the server still loses the connection. I can't telnet to it or ping it. Obviously, I can't see the tower/main page, but I can access the server from its own keyboard. Is there a command line to reboot Unraid gracefully, like the button on the tower/main page?

 

I tried the following, hoping I could see tower/main again, but no luck.

/etc/rc.d/rc.inet1  stop

/etc/rc.d/rc.inet1  start

 

 

With the ACPI changes:

1. Same as before: copying data to the server locks it up.

 

thanks,

~joy

 

 

 

Is there a command line to reboot Unraid gracefully like the button from tower/main page?

 

You can reboot gracefully by going through a series of commands.

You can "try" to take the array off-line cleanly by typing the following series of commands:

cd

cp /var/log/syslog /boot/syslog.txt

killall smbd nmbd

sync

for disk in /mnt/disk* /mnt/cache

do

  umount $disk

done

 

mdcmd stop

 

Then you can power down by typing:

poweroff

 

or reboot with

reboot

 

When you have connectivity, attach the copy of the syslog you made to the flash drive to your next post.  It might just have the clues needed to figure out what else is happening.

 

If all you want to get to is the web-management page, you might be able to do that by

killall emhttp

nohup /usr/local/sbin/emhttp &

 

If you can get to it, it might save you from typing all the other commands above.

 

Joe L.
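
The manual sequence above can be collected into one function so it is less error-prone to type at the console. This is only a sketch assuming a POSIX shell: the commands are the ones Joe L. listed, while `stop_array`, `run`, `DRY_RUN` and `MNT_ROOT` are hypothetical names added for illustration. With `DRY_RUN=1` it only prints what it would do.

```shell
# Sketch of the manual array shutdown sequence wrapped in one function.
# DRY_RUN and MNT_ROOT are hypothetical knobs added for illustration.
DRY_RUN=${DRY_RUN:-0}
MNT_ROOT=${MNT_ROOT:-/mnt}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

stop_array() {
  # Leave any directory on the array so it does not keep a disk busy.
  cd / || return 1
  run cp /var/log/syslog /boot/syslog.txt   # keep a copy of the log on flash
  run killall smbd nmbd                     # stop Samba so shares are released
  run sync                                  # flush pending writes
  for disk in "$MNT_ROOT"/disk* "$MNT_ROOT"/cache; do
    [ -d "$disk" ] || continue              # skip unmatched globs / missing cache
    run umount "$disk"
  done
  run mdcmd stop                            # stop the array
}
```

Running `DRY_RUN=1` first and reading the printed commands is a cheap way to check the disk list before doing it for real, after which `poweroff` or `reboot` can be typed as described above. Like the manual steps, the function keeps going if one command fails, so the output still needs to be read.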

  • Author

This is great. I think this should have been on the Wiki.

 

I am planning to use a spare router and some different cables just to make sure it's not a DLink 655 router issue.

 

 

thanks,

~joy

This is great. I think this should have been on the Wiki.

...

 

So have you added it to the wiki yet then :)

 

This is great. I think this should have been on the Wiki.

...

 

So have you added it to the wiki yet then :)

 

 

lol  :D

  • Author

:'( A different router and cables are still no go.  Copying data still locks up the server.

 

Here's the change in syslinux.cfg

append initrd=bzroot rootdelay=10 acpi=off nolapic

 

Here's the log. This is strange: the copy started around 00:30 and the last file it attempted to copy over was at 02:00, but there's nothing in the log.

 

Aug 21 00:03:19 Tower emhttp: shcmd (76): /etc/rc.d/rc.samba stop >/dev/null

Aug 21 00:03:19 Tower emhttp: shcmd (77): /etc/rc.d/rc.nfsd stop >/dev/null

Aug 21 00:03:20 Tower emhttp: shcmd (78): hostname Tower

Aug 21 00:03:20 Tower emhttp: shcmd (79): echo '# Generated' >/etc/hosts

Aug 21 00:03:20 Tower emhttp: shcmd (80): echo '127.0.0.1 Tower localhost' >>/etc/hosts

Aug 21 00:03:20 Tower emhttp: shcmd (81): cp /etc/exports- /etc/exports

Aug 21 00:03:21 Tower emhttp: shcmd (82): /etc/rc.d/rc.samba start >/dev/null

Aug 21 00:03:21 Tower emhttp: shcmd (83): /etc/rc.d/rc.nfsd start >/dev/null

Aug 21 00:07:52 Tower in.telnetd[1858]: connect from 192.168.1.100 (192.168.1.100)

Aug 21 00:08:07 Tower login[1859]: ROOT LOGIN  on `pts/0' from `192.168.1.100'

Aug 21 00:09:10 Tower in.telnetd[1871]: connect from 192.168.1.100 (192.168.1.100)

Aug 21 00:09:24 Tower login[1872]: ROOT LOGIN  on `pts/1' from `192.168.1.100'

 

I still have another USB Pro License. Can I just swap it in for my current one to see if it works? If that still doesn't work, I guess I will have to change my mobo and CPU. Most likely the Supermicro C2SEA + E5200 or E8400.

 

thanks,

~joy

 

 

  • Author

I caught something while tailing the syslog. Surprisingly, this time the server didn't lock up and was able to retain the connection. Of course, the copying stopped. The only difference this time was that I had disabled User Share.

 

 

Does anyone know what this error means, other than a lost/regained connection?

 

Aug 22 01:22:22 Tower kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Aug 22 01:22:22 Tower kernel:   Tx Queue             <0>

Aug 22 01:22:22 Tower kernel:   TDH                  <22>

Aug 22 01:22:22 Tower kernel:   TDT                  <3e>

Aug 22 01:22:22 Tower kernel:   next_to_use          <3e>

Aug 22 01:22:22 Tower kernel:   next_to_clean        <22>

Aug 22 01:22:22 Tower kernel: buffer_info[next_to_clean]

Aug 22 01:22:22 Tower kernel:   time_stamp           <5e97>

Aug 22 01:22:22 Tower kernel:   next_to_watch        <22>

Aug 22 01:22:22 Tower kernel:   jiffies              <5f50>

Aug 22 01:22:22 Tower kernel:   next_to_watch.status <0>

Aug 22 01:22:24 Tower kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Aug 22 01:22:24 Tower kernel:   Tx Queue             <0>

Aug 22 01:22:24 Tower kernel:   TDH                  <22>

Aug 22 01:22:24 Tower kernel:   TDT                  <3e>

Aug 22 01:22:24 Tower kernel:   next_to_use          <3e>

Aug 22 01:22:24 Tower kernel:   next_to_clean        <22>

Aug 22 01:22:24 Tower kernel: buffer_info[next_to_clean]

Aug 22 01:22:24 Tower kernel:   time_stamp           <5e97>

Aug 22 01:22:24 Tower kernel:   next_to_watch        <22>

Aug 22 01:22:24 Tower kernel:   jiffies              <6018>

Aug 22 01:22:24 Tower kernel:   next_to_watch.status <0>

Aug 22 01:22:26 Tower kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Aug 22 01:22:26 Tower kernel:   Tx Queue             <0>

Aug 22 01:22:26 Tower kernel:   TDH                  <22>

Aug 22 01:22:26 Tower kernel:   TDT                  <3e>

Aug 22 01:22:26 Tower kernel:   next_to_use          <3e>

Aug 22 01:22:26 Tower kernel:   next_to_clean        <22>

Aug 22 01:22:26 Tower kernel: buffer_info[next_to_clean]

Aug 22 01:22:26 Tower kernel:   time_stamp           <5e97>

Aug 22 01:22:26 Tower kernel:   next_to_watch        <22>

Aug 22 01:22:26 Tower kernel:   jiffies              <60e0>

Aug 22 01:22:26 Tower kernel:   next_to_watch.status <0>

Aug 22 01:22:27 Tower kernel: ------------[ cut here ]------------

Aug 22 01:22:27 Tower kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xf0/0x16d()

Aug 22 01:22:27 Tower kernel: NETDEV WATCHDOG: eth0 (e1000): transmit timed out

Aug 22 01:22:27 Tower kernel: Modules linked in: md_mod ata_piix piix ide_core sata_mv sata_sil libata e1000

Aug 22 01:22:27 Tower kernel: Pid: 0, comm: swapper Not tainted 2.6.27.7-unRAID #3

Aug 22 01:22:27 Tower kernel:  [<c011cc08>] warn_slowpath+0x61/0x86

Aug 22 01:22:27 Tower kernel:  [<c011545b>] enqueue_task+0xa/0x14

Aug 22 01:22:27 Tower kernel:  [<c01154eb>] activate_task+0x16/0x1b

Aug 22 01:22:27 Tower kernel:  [<c011816c>] try_to_wake_up+0x11c/0x125

Aug 22 01:22:27 Tower kernel:  [<c01157c0>] __wake_up_common+0x34/0x58

Aug 22 01:22:27 Tower kernel:  [<c0115fc8>] complete+0x28/0x36

Aug 22 01:22:27 Tower kernel:  [<c015185a>] dma_pool_free+0xde/0x128

Aug 22 01:22:27 Tower kernel:  [<c0130a7b>] clocksource_get_next+0x39/0x3f

Aug 22 01:22:27 Tower kernel:  [<c012fafa>] update_wall_time+0x584/0x71d

Aug 22 01:22:27 Tower kernel:  [<c020e218>] strlcpy+0x14/0x41

Aug 22 01:22:27 Tower kernel:  [<c02c2491>] dev_watchdog+0xf0/0x16d

Aug 22 01:22:27 Tower kernel:  [<c012ece6>] sched_clock_cpu+0x13e/0x149

Aug 22 01:22:27 Tower kernel:  [<c011ab49>] scheduler_tick+0xa0/0xc9

Aug 22 01:22:27 Tower kernel:  [<c012dd86>] hrtimer_run_pending+0x1a/0x78

Aug 22 01:22:27 Tower kernel:  [<c02c23a1>] dev_watchdog+0x0/0x16d

Aug 22 01:22:27 Tower kernel:  [<c01239be>] run_timer_softirq+0x107/0x15a

Aug 22 01:22:27 Tower kernel:  [<c01204b9>] __do_softirq+0x6c/0xcf

Aug 22 01:22:27 Tower kernel:  [<c012054e>] do_softirq+0x32/0x36

Aug 22 01:22:27 Tower kernel:  [<c0105179>] do_IRQ+0x54/0x67

Aug 22 01:22:27 Tower kernel:  [<c01035a3>] common_interrupt+0x23/0x28

Aug 22 01:22:27 Tower kernel:  [<c0107daa>] default_idle+0x2a/0x3d

Aug 22 01:22:27 Tower kernel:  [<c01019e1>] cpu_idle+0xbd/0xd5

Aug 22 01:22:27 Tower kernel:  =======================

Aug 22 01:22:27 Tower kernel: ---[ end trace 1c5a44a222f55607 ]---

Aug 22 01:22:27 Tower ifplugd(eth0)[976]: Link beat lost.

Aug 22 01:22:30 Tower kernel: e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX

Aug 22 01:22:30 Tower ifplugd(eth0)[976]: Link beat detected.

 

This is good!  You finally got the defective part to reveal itself.  It looks to me like you have a bad network card or chipset.  Try disabling it in the BIOS setup, and installing a good network card, and test again.
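
For anyone checking their own syslog for the same symptom, the two signatures in the log above can be counted with grep. This is a rough sketch assuming a POSIX shell; `count_tx_hangs` is a hypothetical helper name, and the search strings are copied from the messages in this thread, so other drivers will word them differently.

```shell
# count_tx_hangs [FILE]
# Prints how many e1000 "Tx Unit Hang" reports and how many watchdog
# transmit timeouts appear in FILE (default: /var/log/syslog).
count_tx_hangs() {
  log=${1:-/var/log/syslog}
  # grep -c prints the number of matching lines (0 if there are none).
  hangs=$(grep -c 'Detected Tx Unit Hang' "$log")
  timeouts=$(grep -c 'NETDEV WATCHDOG: .*transmit timed out' "$log")
  echo "tx_hangs=$hangs watchdog_timeouts=$timeouts"
}
```

A non-zero count does not prove the NIC itself is defective, but it does point at the network path rather than the disks or the array.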

  • Author

This is good!  You finally got the defective part to reveal itself.  It looks to me like you have a bad network card or chipset.  Try disabling it in the BIOS setup, and installing a good network card, and test again.

 

I hope so because I just finished copying 70GB of data over the network with an old cheapo TrendNet Gig Card.  :-)

 

I am going to try copying 500GB over tonight and see what happens. (crossing fingers)

 

thanks,

~joy

  • Author

Yup, that's it. It was a NIC issue. The 500GB copy over the network went very well.

 

I want to say thank you to all who contributed. At times, I think my choice of words may have been a bit demanding. Sorry, it's my bad QA habit of expecting everything to be fixed or working. I would think Unraid should have better error handling than this. Personally, I feel a bad NIC shouldn't lock up your system.

 

 

thanks,

~joy

 

 

 

This same scenario happened to me too.  I wound up replacing EVERYTHING except the NIC.  It took me over a year, but I finally tried a new NIC and all was good.  I saw that it happened to someone else too.  At least I was able to suggest a new NIC to him, and he saved troubleshooting time.  NICs are cheap.  I'd suggest that anyone having a lock-up problem while transferring large files or large sets of files try replacing the NIC first.

  • 2 weeks later...

I would think Unraid should have better error handling than this. Personally, I feel a bad NIC shouldn't lock up your system.

 

I'm glad you were finally able to resolve this.  You are certainly not the first to have NIC issues that required replacement, much more common than it should be.

 

Perhaps just a technicality, but in this case it was a hardware issue that was crashing the Linux OS, not really the fault of unRAID itself.  unRAID runs on top of Linux, and can't generally be faulted if the OS and its hardware support is crashing.  As you say, it would be nice if there were better handling of bad NIC's in Linux ...

Archived

This topic is now archived and is closed to further replies.
