Help with lockups and mounting problems

Hi, I have been having a lot of trouble lately. Actually, it just started happening today. I was copying stuff onto my unRAID server and after a while it just stopped! I didn't know what was wrong, so I checked the PC and found that it had gone off. I thought that maybe someone had turned it off by mistake, so I turned it back on, went back to my own PC and viewed the unRAID server through the web page. I have three 500GB disks and 3 green lights, but the bottom one, under Free, said "mounting". I waited to see whether it would finish mounting, but after about 10 minutes the server crashed again. Now that's all that happens: the third hard drive says "mounting" but never mounts, and after 10 minutes the whole server crashes and I have to turn it off.

 

Does anyone have any ideas? I upgraded to version 4.2.4 from 4.2.1, but still hangs at the same spot.

If it says mounting for a long time, then it's usually recovering a journal or something like that.

It could also be a hard drive timing out.

 

Perhaps boot the server, then telnet in as root.

Try copying /var/log/syslog to your /boot flash so that you can inspect it:

 

cp /var/log/syslog /boot

 

Or you can do a tail -f /var/log/syslog and see what the system is reporting.

 

I'm sure the admins are going to want to see the latest lines in your syslog to see what is happening.

 

 

 

 

 

 

  • Author

thanks,

 

Do I do the tail command first, then copy the syslog on the next reboot?

 

I can't type anything into the console when it hangs. I have to reboot, and the syslog seems to be just from the last reboot, it doesn't have anything in it relating to the crash.

 

I mean, is there any way it can save the syslog at the moment of the crash?

My advice was not very good, since you can't do it if you haven't finished booting.  The syslog needs to be copied (cp /var/log/syslog /boot), but the only thing you want running when it hangs is the 'tail -f /var/log/syslog' command, because it will show the very last log entries up to the moment of 'hanging', perhaps even after.
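Since the syslog starts fresh on every boot, one possible workaround for keeping a copy from before a hang is to copy it to the flash drive at regular intervals. This is only a minimal sketch, assuming the paths already mentioned in this thread (/var/log/syslog and /boot). The function name snapshot_log, the file name syslog.txt and the 30-second interval are made up for illustration, and anything logged after the last copy would still be lost.

```shell
# snapshot_log SRC DEST
# Copy the log file SRC to DEST, then flush pending writes so the copy
# has the best chance of reaching the flash drive before a hang.
snapshot_log() {
    cp "$1" "$2" && sync
}

# Hypothetical usage at the console (runs until interrupted with Ctrl-C):
#   while true; do snapshot_log /var/log/syslog /boot/syslog.txt; sleep 30; done
```

Writing to the flash drive this often adds wear, so it is something to run only while chasing the problem.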

 

If you are using User Shares and you have access to the Web Management page, I would temporarily turn User Shares off, and see if that makes a difference, for the following reboot.

 

If you "tail -f" the syslog, the contents will be on the system console.  Any error messages prior to the crash will be on the screen.  You won't have them in a file, but at least you can transcribe them/take a photo (I did that once, and it worked well)

 

Shutdowns are frequently power supply related.  Nothing I know of will shut the server down except the powerdown button.  Check your air filters...

As far as the "mounting" message, unless I press the "refresh" button, my server will say "starting" forever... 

 

Yes, do a file-system check.  You might have corruption as a result of the shutdown.  I'd do a full parity check too.

 

Joe L.

  • Author

Ok guys, thanks for the help!! I will try those solutions out and get back to you  :)

  • Author

Sorry guys, I was trying the check disk command as RobJ said, but it tells me that there is no md1 or md2 or anything.

When I type in the command

 

reiserfsck /dev/md2

 

it prompts for a yes or no, and when I type yes it just goes back to the root prompt. It looks like it did nothing!

 

How can I list what devices are there, so I can point the program to the right place?

That wiki page should probably be updated with other ways to run the command.  Your md1 and md2 devices are not available until the disks have mounted successfully, so you need to specify the partition on the underlying drive instead.  First determine the device labels of the data disks, in any of these ways: from the Devices tab on the Web Management pages; from the Disk Inventory section near the bottom of the syslog; or by running 'tail -f --lines=99 /var/log/syslog' and using Shift-PgUp to scroll back to the Disk Inventory section.  The labels should be something like hda, hdb, sdc, sdd etc.  Add a 1 to the label to indicate the first partition, and the command becomes something like 'reiserfsck /dev/hda1' or 'reiserfsck /dev/sdb1'.  Also note that the response to the yes-or-no prompt must be exactly Yes: a capital Y and lowercase e and s.

 

If reiserfsck reports the need to rerun with another parameter, do so, exactly as it says.
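The device-name rule above can be sketched as a small helper. This is only an illustration: the function name data_partition is made up, the labels hda and sdb are examples, and the loop prints the commands instead of running them so that each device can be double-checked first.

```shell
# data_partition DRIVE
# Turn a drive label from the Disk Inventory (e.g. hda, sdb) into the
# device path of its first partition (e.g. /dev/hda1, /dev/sdb1).
data_partition() {
    printf '/dev/%s1\n' "$1"
}

# Print (not run) the check command for each data disk label.
# Data disks only: the parity drive has no file system to check.
# Example labels; substitute the ones from your own Disk Inventory.
for drive in hda sdb; do
    echo "reiserfsck $(data_partition "$drive")"
done
```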

 

  • Author

thanks robj.

  • Author

Well, the drives were called

 

sda1 = parity drive

sdb1 = disk1

sdc1 = disk2

sdd1 = spare

 

I ran the reiser check on the first 3 disks. With disk1 and disk2 it ran fine, no corruption. With the parity disk it said something about not having a superblock, and to rerun the reiser check with the sb option.

 

Both drives are now mounting, but it is hanging after about 10 minutes on rebuilding the parity.

 

The server will stay up as long as I don't try refreshing the parity.

 

I tried to use the spare disk as the parity drive, in case there was a major fault with the parity drive. So in Devices I swapped sda1 for sdd1 and then let it try to rebuild again. But the same thing happened: it crashed after 10 minutes.

 

Could some badly corrupted files be causing this, so that any time the system tries to access the area with those files it crashes?

  • Author

some more info,

 

I disabled the usershares, but still crashed.

 

And here are the last few entries from the tail -f command. They are not exactly as written. There are a few lines saying (after the date, time, machine name and "kernel:")

 

can't shrink filesystem on-line

 

then it says,

 

emhttp[1303]: shcmd (30) : killall -HUP smbd

 

Last thing is

 

MD: using 1152k window over a total of 4888386552 blocks

 

That's it, then it stopped. There are some more entries before these, like

 

reiserFS: md1: checking transaction log (md1)

reiserFS: md1: using r5 hash to sort names

reiserFS: md2: using r5 hash to sort names

 

All of those log entries are completely normal, no problem indicated at all.

 

It is my fault for not mentioning that you should NOT run reiserfsck on the parity drive, as it does NOT have a file system on it, to be checked or corrected.  It only has parity bits on it.  So if you did repeat the command with that extra option on the parity drive, then that may have corrupted one or more of the early parity blocks.  I'm sorry.  If you were able to run a parity check, it would show one or more early 'parity incorrect' errors.

 

If reiserfsck found no errors on drives sdb and sdc, then you don't have any corrupt files.

 

If you un-assign the parity drive, does the system run fine, and files look good on Disk 1 and Disk 2, for at least an hour?

 

  • Author

Well Rob, I didn't run the reiserfsck on the parity drive. If in doubt, I generally don't do things :-)

 

I have left it on all last night and all of today without the parity drive enabled. It works perfectly. I copied lots of things back and forth between different computers and the server. I left it copying over 600GB of stuff last night and it didn't have any problems.

 

So it seems to be just when I enable the parity drive that it hangs. I have tried a different drive for parity, but it still crashed, so I can rule out faults with the drive itself.

 

Is there anything I can do? Would the "restore" option that is in the web interface do anything for me? Or has anyone any more ideas?

I have to admit I am a bit stumped.  You have given us a fair amount of info, seems enough for a diagnosis.  Were you able to obtain a syslog earlier, with 3 drives assigned?  It might be useful, can't promise.  And indicate which of your drives was the one that couldn't seem to finish mounting.

 

You might also try testing your disks with smartmontools.  See this thread for smartctl.zip:  http://lime-technology.com/forum/index.php?topic=1521.  And see this post from WeeboTech for the commands to run the short or long test on each:  http://lime-technology.com/forum/index.php?topic=1302.msg10611#msg10611.  I would test the 2 data disks first, then the others if no problems have shown up.  These are time-consuming tests, 'short' is probably much shorter.
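For reference, a rough sketch of the kind of smartctl commands involved. The linked post has the exact commands recommended for unRAID; the ones below are standard smartmontools options, and /dev/sdb is only an example device. The helper name smart_cmds is made up, and it only prints the commands so they can be reviewed before running. Whether an extra '-d ata' option is needed for SATA drives on this version is an assumption to check against the linked post.

```shell
# smart_cmds DEVICE
# Print the smartctl commands for testing one drive, without running them.
# Start either the short or the long test, not both at once.
smart_cmds() {
    echo "smartctl -t short $1"      # start the short self-test
    echo "smartctl -t long $1"       # start the long (extended) self-test
    echo "smartctl -l selftest $1"   # show the log of self-test results
    echo "smartctl -a $1"            # all SMART info, incl. self-test status
}

# Example device only; use the labels from your own Disk Inventory.
smart_cmds /dev/sdb
```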

 

You do not want to use the Restore option, except when you are about to rebuild parity.

 

Have you tried

A different cable?

A different Controller Port?

 

 

  • Author

Yes I tried a different cable and a different port.

 

Rob, sorry, I forgot to say that both data disks are mounting now, so everything is working fine, but I have no parity drive. If I use a completely different drive for parity, attached with a different cable and plugged into a different port, it still hangs. I have tried three different disks now for the parity and none of them work. It hangs each and every time. I can post the current syslog if you guys want to have a look.

  • Author

Here is the syslog.txt from yesterday evening.

 

Oh, and how do I know when it has finished the self test? Can I check it or watch its progress?

I don't see anything wrong in the syslog, but you do have my most hated of motherboard chipsets, the nForce4.  I don't know how much importance to put on this, as many people have no problems with nForce4-based boards, particularly the Asus boards.  But there are very long threads online about 3 major kinds of problems with these boards, and since you have a Maxtor IDE drive, you are 'eligible' for all 3: Maxtor compatibility, IDE detection, and data corruption.  I had to give up on mine for unRAID, but I did not have the same symptoms as you, so my problems may not be relevant.  In your case, you are not assigning the Maxtor or any IDE drives, so I don't believe there is a connection with what you are seeing.  I now have an Epox nForce570 board, very fast, lots of ports, with none of the problems of previous nForce series boards.  If you decide to build a larger unRAID array, I would recommend a different board.

 

I don't have any more ideas for your current situation.  You could try the latest beta, 4.3 beta 5, for its more recent kernel, but that's a long shot, probably won't give any improvement.

 

  • Author

Thanks Rob. They are actually Western Digital hard drives that I use in my array. The Maxtor IDE drive is just in the system because I was originally going to put a Windows server on it.

 

It's just funny how it worked so well for 3 months, then suddenly did this.

 

Thanks for all your help.

I should add that you did have network connectivity issues, twice a loss of 'link beat' in just the few seconds showing in the syslog.  I don't see how any network difficulties could affect unRAID operations, but it would make the Web management page display sometimes problematic, and the drives could disappear from the view of other computers.  The console would still be fine on the unRAID server though.  You might keep a tail running on the console, so you will know when the network is up, 'Link is up '... should show.  At the console, type:  tail -f /var/log/syslog
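To see only the link changes instead of the whole log, the syslog can be filtered. A minimal sketch, assuming the driver logs lines containing 'Link is up' and 'Link is down'. Only the 'Link is up' wording is mentioned above; the 'down' wording and the function name link_events are assumptions.

```shell
# link_events FILE
# Print only the lines of FILE that report a network link change.
# Matching is case-insensitive, so "Link is up" and "link is down" both count.
link_events() {
    grep -i 'link is' "$1"
}

# Hypothetical usage at the console:
#   link_events /var/log/syslog
# or, to follow new events as they arrive:
#   tail -f /var/log/syslog | grep -i 'link is'
```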

 

  • Author

Well, I tried to use the 4.3 beta, but I had no network at all. The new version probably does not support the Marvell LAN card :-(

 

I went back to 4.2.4 and the network is working again.

 

I hadn't noticed any network issues before, but, when it hangs, I can't even get into the console.

 

I made a few changes to the BIOS, just trying anything at the moment to see if I can get it working properly again!

 

If I were to buy a new board for the unRAID server, would you recommend the nForce570? Or maybe one built on the ATI Xpress chipset? (I have a socket 939 motherboard built on the ATI Xpress 200 chipset; I could use this if it was better!) And would you recommend getting a network card, or using the onboard one?

That is odd, possibly indicative of something, not sure what yet.  You are currently using the skge driver (version 1.11), and I suppose it is possible that Tom inadvertently dropped skge from v4.3.  Did you possibly capture a 4.3 syslog?  Could be useful in seeing what Linux thinks of your networking chipsets.  Some useful console commands:

  lsmod      (shows what modules are needed for your system, look for skge among them)

  ifconfig    (lots of info about network setup, especially for eth0 the network device that unRAID will use)

  ethtool eth0  (more info about eth0)

  ethtool -i eth0    (the network driver and its version)

 

You can also find the driver by searching the syslog for lines with eth0.
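The lsmod check can be scripted so that it gives a plain yes or no. A minimal sketch, assuming the usual lsmod layout of one header line followed by one module per line with the name in the first column. The function name has_module is made up for illustration.

```shell
# has_module NAME
# Read `lsmod` output on stdin and succeed only if a module called NAME
# appears in the first column (the header line is skipped).
has_module() {
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Hypothetical usage at the console:
#   lsmod | has_module skge && echo "skge is loaded" || echo "skge is NOT loaded"
```

Running this under both 4.2.4 and the 4.3 beta would show whether the driver is missing in the newer version.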

 

I can recommend nForce 5 series and higher, such as 570, 590, 680i, 680a, and the new 790's.  I can also recommend the Intel based chipsets.  The wiki has a hardware compatibility list, though usually out of date, and the forums have been discussing different boards lately.  What I can't recommend so far are any of the ATI based chipsets.  Linux support for them is very poor.  There is an ECS board being discussed that may be supported, is being tested now.

 

A different network card is a good idea, even if slow, to help eliminate another suspect piece of hardware.

 

Here's an idea. Have you checked the power supply?

 

Whenever I see lockups, I always suspect cables and/or controllers that need to be reseated.

I also reseat memory just to be sure..

 

But something that has been running for months, then all of a sudden stops/hangs when you add a parity drive on other ports, cables and such .....

Seems to be some sort of threshold.

 

After reading the thread, there's some report of lan interruption...

 

Could something have aged in the power supply? Is it overheating?

 

Just some thoughts.

 

What is the inventory of your system?

CPU, Memory, Hard drives, Power Supply.

 

 

  • Author

OK, the problem happened while copying data. I had two drives and a parity drive all working perfectly, then it just stopped. The only way I can get it to work now is to disable the parity drive. I didn't just add a parity drive; I have had one installed all along. The controller is an onboard controller by NVIDIA with 4 ports. I have one spare drive in the system that I tried to use as the parity drive, but it hung with that one as well. I tried swapping the cables for others I have, but got the same problem.

 

The CPU is an Opteron 170.

Motherboard - DFI Lanparty UT NF4 SLI-DR

memory is OCZ PC4000 Gold GX XTC 2GB (2*1GB sticks)

Hard drives - Western Digital Caviar SE16 500GB SATA-II 16MB Cache. There is also one Maxtor 200GB IDE drive, but I have disconnected it to try and solve the problems.

Power Supply - FSP Epsilon FX600-GLN  600W Active PFC

Antec P180 case.

All water cooled, it used to be my main game PC.

Archived

This topic is now archived and is closed to further replies.
