Kernel Panic / unRAID not working / Please Help


All,

I have an original unRAID server (PATA drives) that I have had for several years, still running the initial unRAID version (3.xxx?).  It has about 5 TB or so in it and has worked flawlessly for the last several years.  I have since purchased 2 more systems with SATA drives (1 more for me and 1 for a friend).  My PATA version, built by Lime Technology, no longer works reliably.  I can get it to reboot and run for a brief period of time, but then I receive the following error:

 

CPU 0: Machine Check Exception: 0000000000000004 (there may be 1 more digit in this string)

Bank 0: b20000001040080f

Kernel panic: CPU Context Corrupt

 

What does this mean, and how can it be resolved?  Of course, my biggest concern is the data.  I am adding space to my other system, but I can't just swap in the drives, as they are all PATA, not SATA.  My first hope is to get it working again, but if that fails I need a strategy to move the data off the disks (which hopefully are fine, and seem fine, as they are accessible when the system boots up and works for a brief period).

 

Thank you for your help!

 

Sincerely,

Scott

Link to comment

bjp999,

Thank you for your response.  I am not familiar with Linux/Unix and I searched both the wiki and Forums re: how to run a memory test but was not successful in locating any instructions.  Can you provide guidance and instructions on how to run a memory test as you have advised?  Thanks again for the help!

Link to comment

I can reassure you that it is extremely unlikely that there is anything wrong with your data or drives; this is just an 'aging' problem with the system hardware.  There are a number of possibilities: chip failure, corrosion of leads, a loose connection, a leaking capacitor, etc.  The first thing I would do is open it up (perhaps outside?) and get the dust completely blown out.  Then, one by one, disconnect and reconnect every single cable connection several times, both data and power cables.  Locate any socketed chips (EEPROMs) on the motherboard, but NOT the CPU, and carefully lift each end a little and reseat, a couple of times.  Loosen the retaining screws on any add-on cards, and carefully lift and reseat each card.  What you are trying to do is renew a good electrical connection between the pins and leads on all chips and connections, by sliding them in and out a few times.  Often, over time, a little corrosion (practically invisible) builds up between these electrical contacts.

 

Look for capacitors (round cylinders with flat tops jutting out from the surface of the motherboard), and examine them for damaged tops (compare with the others) or liquid on the tops or around the bottoms.  A damaged or leaking capacitor is unfortunately essentially fatal for the motherboard.

 

Now you can reboot and check for normal operation.

 

If it still is not working, then you will need to test memory.  Most of us can hardly even remember an unRAID version that did not come with the memory test built into the boot menu.  At some point, you will need to upgrade to the latest unRAID version, see the Upgrading from unRAID version 3 wiki page.  You can't do that now though, but once the system is working correctly again, I strongly recommend it.

 

I've attached the current copy of the memtest program in memtest.zip, but I don't know if it will work for you or not, as you have a much older kernel.  You can try it by extracting it to your flash drive, opening a Telnet session, making it executable by "chmod +x /boot/memtest", then running memtest.
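
The steps above can be sketched as a few shell commands. This is only a sketch, assuming the flash drive is mounted at /boot (as the chmod command above implies) and that the zip extracts to a file named memtest at the top level of the flash. The helper name prepare_memtest is made up for illustration. Note that chmod needs both the mode (+x) and the path to the file; it cannot be run with the mode alone.

```shell
#!/bin/sh
# Sketch only: prepare_memtest is a hypothetical helper name.
# Assumes the flash drive is mounted at /boot and memtest.zip was
# extracted to its top level as a file named "memtest".
prepare_memtest() {
    dir="${1:-/boot}"          # directory to look in; /boot by default
    file="$dir/memtest"

    # Confirm the file really is where we expect before touching it.
    if [ ! -f "$file" ]; then
        echo "not found: $file"
        return 1
    fi

    # chmod takes two arguments here: the mode change and the file.
    chmod +x "$file" || return 1

    echo "ready: $file"
}

# Usage on the server (from a Telnet session):
#   prepare_memtest && /boot/memtest
```

If the helper reports "not found", listing the flash with "ls -l /boot" should show whether the file landed in a subfolder or under a different name.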

 

Another thing you can try is removing all but one memory card and testing each one, one at a time.  That may help identify a memory card that has failed.  After that, you may have to start removing add-on cards, until you can get the system to boot and continue running.

Link to comment

RobJ,

Thank you for your prompt response and thorough answer.  I will attempt your advice, although it will likely take me some time.  On a related note, if I am unable to get it running long enough to transfer data from it, what would be the best method (actually, make that the simplest method) for moving data to a new system?  The easiest for me would be to remove the drives, insert them in an external USB drive case, hook that to another system (Windows XP based) and copy via the network to another unRAID system.  Is this possible?  If so, can I do it by booting Windows XP and mounting the Reiser volume somehow?  Thanks!

Link to comment

I took the unRAID apart, vacuumed it, blew out the dust, reseated each connection, etc.  It booted but isn't stable: it worked for about 30 minutes while I copied some data off it and then crashed again.  The next step will be to test the memory (hopefully the attached program will work).  After that I will proceed with removing each memory stick, etc.

 

Again, I hope this works, but if not, I would appreciate some guidance or instructions on removing each drive, putting it in an external USB case, hooking it up to a Windows XP box and moving the data off that way.

Link to comment

I copied the memtest file to my flash drive but when I execute the commands (chmod...) I receive the error "Too few arguments".  I have used the ls command and don't see the memtest file anywhere.  It only shows "Samba", "Stop" and "Cmd".  Likely my lack of Linux knowledge.  I have researched but can't come up with the right answer.  Please help!

Link to comment

I can't even get the system to boot now.  My initial thoughts/questions are:

 

1) Perhaps it is my USB Key/Drive, as I know some of the initial ones failed and Tom replaced them.  I have plenty of them but need to know how to transfer data from the existing key to the new key.  Is it simply a drag and drop?

 

2) I have tried several methods/programs to put the old drives in an external case and access with Windows but no luck.  This is critical for me.  I am beginning to panic a bit more as I get that the data is intact but I have no way to access it. 

 

I appreciate all of the help thus far.  I was an early adopter of unRAID and am now concerned about how to handle the equipment as it nears the end of its life.  There needs to be a simple way to get the data off the drives and onto another system when the server (not the drives) fails.  I hope someone can help me (Tom?).

 

Thanks again.

Link to comment

It certainly "feels" like you are having some type of hardware problem.  It could be a bad memory chip, a bad USB stick, even the battery on the MB could be bad.  In order to help diagnose, you will need to be specific about the symptoms you are seeing.  "I can't get the system to boot now" could mean anything from the system won't power on to it is hanging while processing the bzroot or bzimage files to hanging in the boot process.  Were you ever able to run the memory test?  "Old age" is not a reason for the system to fail, but it could certainly be a bad motherboard.  The problem is that the only way to diagnose a bad motherboard is to eliminate all other possibilities - and if the problem persists - it must be the MB.

 

The IDE format is slowly dying, and you hint that you may be looking to upgrade the entire system to a more modern motherboard and SATA drives.  If that is what you want to do (as opposed to trying to figure out what's gone wrong with the old system), the first steps would be to decide on, procure, and set up the new system.  Not sure if you'd prefer to DIY or buy another preconfigured system from Tom.  Either way, you would need to get that set up.  You likely will not be using any of your IDE drives in that system, so you would need at least as much space as you are using in your existing array.  The act of copying data off of your old disks onto the new array will not be difficult, especially if the new system has one IDE port (as many do).  But you have a bit of work in front of you before being ready for that step.

 

Tom does not monitor the forums regularly, and appeals for his help here normally go unanswered.  If you need Tom's help, I'd recommend sending him an email.

 

One final comment.  Version 3 is old.  I started using unRAID at v 4.2.1.  I have zero experience with the 3 series and am hesitant to advise you.  I did not even know that memtest was not a boot option (it is on all versions I have used).  Sorry to cop out, but you likely need help from Joe L., RobJ, Tom, or some other longtime user to try and figure out what is going on with the old system.  I would NOT suggest upgrading until you get the old version working.

Link to comment

I can't even get the system to boot now.  My initial thoughts/questions are:

Can you provide more detail?  I can think of a number of situations that would be called a failure to boot, but are actually quite different.  Does the machine turn on?  Does it turn on then turn off?  Do you get video?  What is the last error message on the screen?  Have you tried removing memory cards?  You want to determine if it is a problem with a particular memory card, or the motherboard, or a failing power supply, or failing cooling, such as a fan that has failed, particularly the CPU fan.

 

Edit:  Brian beat me, with some of the same points.

 

1) Perhaps it is my USB Key/Drive, as I know some of the initial ones failed and Tom replaced them.  I have plenty of them but need to know how to transfer data from the existing key to the new key.  Is it simply a drag and drop?

Considering the error messages you reported in the first message, it is extremely unlikely that it has anything to do with the flash drive.  But we need more detail as to what you see when it does not boot.
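
For anyone curious how those error messages can be read: on x86 CPUs, the "Bank" value of a machine check is a 64-bit status register whose top bits are flags. The sketch below decodes just those flag bits from the value in the first post. It assumes the value was transcribed correctly as exactly 16 hex digits; the helper name decode_mci_flags is made up for illustration, and the bit positions are the standard x86 machine-check status layout rather than anything specific to unRAID.

```shell
#!/bin/sh
# Sketch only: decode_mci_flags is a hypothetical helper name.
# Assumes the "Bank 0" value is a 64-bit x86 machine-check status
# register written as exactly 16 hex digits, so that its first two
# digits hold bits 63..56 (the flag bits).
decode_mci_flags() {
    status="$1"

    # Reject anything that is not exactly 16 hex digits.
    case "$status" in
        *[!0-9a-fA-F]*) echo "expected 16 hex digits"; return 1 ;;
    esac
    if [ "${#status}" -ne 16 ]; then
        echo "expected 16 hex digits"
        return 1
    fi

    top=$(printf '%.2s' "$status")   # first two hex digits
    byte=$(( 0x$top ))               # bits 63..56 as a number 0..255

    flags=""
    [ $(( byte & 0x80 )) -ne 0 ] && flags="$flags VAL"     # bit 63: register valid
    [ $(( byte & 0x40 )) -ne 0 ] && flags="$flags OVER"    # bit 62: error overflow
    [ $(( byte & 0x20 )) -ne 0 ] && flags="$flags UC"      # bit 61: uncorrected error
    [ $(( byte & 0x10 )) -ne 0 ] && flags="$flags EN"      # bit 60: reporting enabled
    [ $(( byte & 0x08 )) -ne 0 ] && flags="$flags MISCV"   # bit 59: misc register valid
    [ $(( byte & 0x04 )) -ne 0 ] && flags="$flags ADDRV"   # bit 58: address register valid
    [ $(( byte & 0x02 )) -ne 0 ] && flags="$flags PCC"     # bit 57: processor context corrupt

    echo "${flags# }"
}

decode_mci_flags b20000001040080f   # prints: VAL UC EN PCC
```

The PCC flag lines up with the "CPU Context Corrupt" text of the panic, and UC marks an error the hardware could not correct. Both are reported by the CPU itself, which is consistent with a hardware fault rather than a problem with the flash drive.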

 

2) I have tried several methods/programs to put the old drives in an external case and access with Windows but no luck.  This is critical for me.  I am beginning to panic a bit more as I get that the data is intact but I have no way to access it.

There needs to be a simple way to get the data off the drives and onto another system when the server (not the drives) fails.

Did you try to use the Windows Reiser driver mentioned in the link I gave?  What didn't work?  There is no need to panic, as the data IS safe, and there are multiple ways to access it, especially since you mentioned you have another unRAID system.  Plus, you can easily create another temporary unRAID server using the free version of unRAID.  You can boot any working station temporarily as an unRAID server from any unRAID flash drive, so long as it can boot from USB.

 

My own preference would be to either access the drives in that USB enclosure you mentioned from your other unRAID system using UnMENU, or prepare another bootable flash drive with the latest unRAID on it, and install one or 2 of your drives that you want to access into any working machine.  Then assign one or both drives (don't assign anything to the parity slot), give it a unique machine name and optionally a static IP, and you can now access these drives across the network.
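
For the first option, reading one old drive by hand from a Linux prompt on the other server looks roughly like the sketch below. It is a sketch under assumptions: the device name /dev/sdX and the destination path are placeholders, the data is assumed to be on the drive's first partition in ReiserFS format, and the helper name recovery_commands is made up. It is not how UnMENU does it internally; UnMENU offers this kind of mounting through its web page. The helper only prints the commands so they can be checked before anything is run.

```shell
#!/bin/sh
# Sketch only: recovery_commands is a hypothetical helper that just
# PRINTS the commands, so nothing is mounted or copied by accident.
# Assumptions: the old data disk shows up as /dev/sdX (placeholder),
# its data is on the first partition, formatted ReiserFS, and the
# destination path is wherever you want the files to land.
recovery_commands() {
    disk="$1"      # e.g. /dev/sdX (placeholder device name)
    dest="$2"      # e.g. /mnt/disk1/recovered (placeholder path)
    mnt="/mnt/olddisk"

    echo "mkdir -p $mnt"
    # -o ro mounts the filesystem read-only.
    echo "mount -t reiserfs -o ro ${disk}1 $mnt"
    # cp -a copies recursively and keeps timestamps and permissions.
    echo "cp -a $mnt/. $dest/"
    echo "umount $mnt"
}

recovery_commands /dev/sdX /mnt/disk1/recovered
```

Printing instead of executing is a deliberate choice here: the real device name differs on every machine, and picking the wrong one is the main risk in a manual recovery.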

 

The FAQ entry I gave ("How do I recover data from an unRAID disk?") mentions 3 ways to access your data.  If this was not helpful, and we are continually trying to make it more helpful, could you indicate what kind of changes would have made it more useful to you?  The How-To's wiki page has additional 'How To' pages that may be helpful in accessing drives, whether attached to a USB port or temporarily installed in another machine.

 

I appreciate all of the help thus far.  I was an early adopter of UnRaid and am now concerned about how to handle the equipment as it nears the end of its life.

Before you can make any decisions, you need to determine what has failed first.  Then you will know whether you are buying replacement memory cards or a power supply, or need to begin looking for a replacement motherboard, either the same one from Lime Technology or eBay, or a new one with a new CPU and RAM.  After replacing the failed component, whatever it is, you will have the same array working again, still intact.

Link to comment

All,

Sorry about the delayed responses, as work took over this week.  I had several issues when trying to boot up.  First, I removed a memory stick and tried to boot.  The first time, an error came up that kept repeating on the screen regarding some "alloc" or "allocation" error.  I removed that stick and replaced it with the other stick; when it booted, it mounted all the drives and then remounted the drives, but got stuck while trying to "remount" the second drive.  I left it churning for over an hour but never could access the web GUI from a different PC.  I then replaced both sticks and tried to boot up, and didn't receive any video output on the attached monitor, although the system powered up and lights were flashing on the drives, etc.  I left it for a day, and now I get video on the screen: it boots and mounts the drives, proceeds to remount the drives (not sure why the remounting is occurring), and then I receive an error similar to what was originally reported ("Bank 0:" followed by a bunch of numbers).  I can't seem to ever get a full successful boot of the array anymore (can't access the web GUI or drives).  Again, I have another unRAID which is SATA-based, and I don't mind getting extra disk space to move the data over.  Alternatively, if the issue is RAM, I don't mind buying some new RAM and installing it, although I need to figure out what I need to purchase (my system is the MD1200, and I am sure the specs were somewhere in the original AVS thread).  I don't want to deal with a new motherboard, though.

 

I haven't yet tried to boot a different PC with an unRAID OS and use unmenu to transfer files off each of my 11 drives to the new server.  Is this the easiest method?  Also, should I download and use a trial version on a new USB key when doing so, as I don't want to mess up my current one (although I'm not sure that makes sense or matters, since it already is messed up!)?

 

As always, I appreciate the time of everyone that responds to this thread.

 

Sincerely,

Scott

Link to comment
I then replaced both sticks ...

... if the issue is RAM, I don't mind buying some new RAM and installing it (although I need to figure out what I need to purchase ...

Sorry, you appeared to identify a defective stick, then replaced both, but reported later that you weren't sure what to buy to replace them.  I wasn't sure what you meant when you replaced the sticks the first time.

 

It really does sound like bad memory, and even very slightly flaky memory MUST be replaced.  Nothing about the system can be trusted, if you have any bad memory at all installed.

 

Also, should I download and use a trial version on a new USB key when doing so as I don't want to mess up my current ...

Creating different USB unRAID flashes won't mess up anything.  The entire unRAID configuration is kept on the flash drive, not anywhere on the server itself.  You actually COULD keep multiple arrays on a server, by using multiple flash drives, although you would need to be very careful.  For example, you could assign drives A through J to a pro system, and drives K, L, and M to a basic system, and one or the other array would be available, depending on which flash you booted with.  You could also create a read-only array of 2 drives, without a parity drive, of drives already assigned to a different array, but you would want to be very careful not to ever write to those 2 drives, or you would be causing problems to the parity info of the larger array.  But at least it would give you full read-only network access to these 2 drives, and you could, on the fly, change which 2 drives were currently assigned, by stopping the array, adjusting the drive assignments, then re-starting the array.

 

That idea of course does not apply to you, as you first need a working system.  But you can plug the working and temporary flash drive into any working system to try it out, test for compatibility, run the memtest, etc.  Just do NOT assign any drives to an unRAID array, unless they are already unRAID drives (or are meant to be).  Don't even test or play with adding other systems' drives to an unRAID array!

Link to comment

I got YAReG to work with rfstool which is enabling me to move the contents via an external USB drive and a Windows PC to my new unRAID SATA Machine.  It was locking up frequently but I uninstalled some other software that I was trying before YAReG which seemed to solve that issue.  It certainly is going to be slow to move the data over to the new server (over 5 TB) but it will be easier in the end.  Thanks for your help!

Link to comment
  • 4 years later...

Hello guys,

 

I really need help, and the problems listed above seem quite similar. I am using version 5.0, and everything has been working just fine for quite a while now.

 

The system still boots up, but if I copy or write any information to the NAS, it suddenly hangs in the midst of the transfer until a hard reset. I've taken the computer out and blown out all the dust, since it has been with me for quite a while. I've also checked that the PSU has enough power for the system.

 

One last possible cause: I was clearing out the NAS a few days ago, saw some hidden files, figured they were what the Apple computers left behind, and did a quick select and delete without thinking. Is there anything placed on the hard drives that, if deleted, would cause a problem like this?

 

Thanks!

Link to comment
