May 8, 2008
Hi,
I just added another drive to my MD1500, using the eSATA port at the back of the machine. After I added the drive on the Devices page, the GUI came up with this error. Several restarts later, nothing has changed. I need help urgently. Many thanks in advance.
Harald
May 8, 2008
It looks to me as if unRAID thinks one of your drives was changed. (It is showing you both the old model/serial number and the new one for that slot.) I don't see any difference, but it thinks there is one.
So, un-assign the new drive (the one in the last slot is the new one, right?), then try starting the array. It should start. It might want to rebuild the disk it thinks has changed.
Once it is back on-line and your array has all green lights, first do a file-system check on each of your data drives. If they are all good, great. Then upgrade to the newest 4.3-beta release. (You only need to copy two files, bzroot and bzimage, from the distribution you download and unzip to the flash drive.)
You are probably fighting an old bug that has since been fixed. You should have a much easier time upgrading in 4.3.
Joe L.
May 8, 2008
One tip for capturing a web screen: temporarily drag the right browser border in to make a much more compact picture and a smaller file. There is no need to squeeze the picture too much, but there is usually a lot of wasted horizontal space that can be squeezed out, and the result is easier to manage for users with browsers on small screens. Then do your screen grab (Alt-PrntScrn), and drag the border back out to where you like it.
May 8, 2008
Hi, thanks for your answers.
1.) Yes, the last drive was the one I added.
2.) After removing the drive and restarting the machine, everything was back again and green.
3.) After re-attaching the drive and watching the BIOS messages scroll by, I noticed the new drive printed in red in the JMicron list. I do not understand why the drive is marked red but still recognized by unRAID. Hmm, as I removed the drive again, I think I need to reboot once more to capture the logs.
4.) Yes, the attached picture was too big. I was shaking after seeing the red dot, and I couldn't install my favorite ImageTool because it was stored on the failing unRAID array 8-(
Thanks.
Harald
May 8, 2008
Un-assigning the new drive, once it has been assigned and is known to the array, does not remove it from the array. It is exactly (logically) the same as if it had failed and was no longer available. If unRAID has a red indicator on the new drive, it thinks the new drive has failed.
If the only drive with a red indicator is the new drive, and all others are working and have green indicators, you can un-assign it and use the "Restore" button to save a new configuration and generate new parity with the currently assigned and working drives.
Never use the "Restore" button if a drive has failed and you want to re-construct its data on a replacement drive. It will instead throw away the data and calculate parity without the failed drive. The only time I can think of that you should use the "Restore" button is to remove a drive from the configuration and have unRAID forget it ever existed.
If you still have questions, it is time to attach a syslog to your next post. Instructions on how to capture one are in the wiki.
If your array is all green, please upgrade to the latest version of unRAID. I know Tom re-wrote some of the logic you are dealing with to make parity-swap work properly somewhere between 4.2 and 4.3.
Joe L.
May 9, 2008
Hi, thanks for your help so far. I just collected additional info:
1.) Shut down the system, with the new drive not attached to the eSATA port.
2.) Upgraded to 4.3.beta-6.
3.) Restarted: everything is green.
4.) Shut down again and re-attached the new drive to the back port of the machine.
5.) After the reboot I see the screen in the first attached picture. You will notice that this time a different existing drive went red. The picture in my first post of this thread looks similar, but there a different drive went red. Yes, the blue one is the new drive.
6.) Shut down from the console, removed the new drive, restarted, and everything is back to green again.
I will attach the syslog to this post.
Many thanks in advance
Harald
May 9, 2008
If everything is back to green again on the management console, that is good news.
It does appear as if the status is still somewhat confused when I look at the syslog you attached. Assuming the syslog is from the most recent re-start of the server, it still thinks you have a disk you have swapped out for another, and a second that is new: "md3 wrong" and "md15 new disk".

May 9 07:56:23 Tower emhttp: shcmd (26): /usr/sbin/nmbd -D
May 9 07:56:23 Tower emhttp: shcmd (27): /usr/sbin/smbd -D
May 9 07:56:25 Tower kernel: e1000: eth0: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
May 9 07:56:25 Tower ifplugd(eth0)[1945]: Link beat detected.
May 9 07:56:26 Tower ifplugd(eth0)[1945]: Executing '/etc/ifplugd/ifplugd.action eth0 up'.
May 9 07:56:26 Tower ifplugd(eth0)[1945]: Program executed successfully.
May 9 07:57:26 Tower kernel: md3: wrong
May 9 07:57:26 Tower kernel: md15: new disk
May 9 08:00:53 Tower kernel: md3: wrong
May 9 08:00:53 Tower kernel: md15: new disk
May 9 08:02:56 Tower login[2051]: ROOT LOGIN on `tty1'

The only way to get things back in sync is to use the button I said you should never use... the "Restore" button. In this situation, we want the unRAID server to save a new configuration, with the disks as currently assigned. It will immediately rebuild parity, based on the currently assigned disks. This is why you should never use the "Restore" button if a disk has failed: it destroys the parity data you would need to recover a failed drive.
Just to be sure there are no other issues, you might also want to perform a file-system check before you stop the array and use the "Restore" button. The procedure is described here: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
When you run "reiserfsck" you must respond to its prompt with "Yes" (capital "Y", lower case "es") for it to proceed.
You only need to run reiserfsck on your data disks. The parity drive does not have a file-system on it. It just has parity bits. Don't run reiserfsck on the parity drive. You do want to run reiserfsck on the data disks, /dev/md1 through /dev/md15 (although I'm not sure if disk 15 was ever formatted yet). If it has never been formatted, don't expect it to pass the file-system check.
Once you have run the checks, stop the array and use the "Restore" button. It should get you to where no disks are reported as bad or wrong in the syslog.
Joe L.
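For anyone who wants to check their own syslog for the same symptom, here is a minimal sketch in Python. It assumes the kernel lines look exactly like the "md3: wrong" and "md15: new disk" lines quoted in the post above; the function name `md_status_lines` and the sample text are only illustrative and are not part of unRAID.

```python
import re
from collections import defaultdict

# Matches kernel lines such as "May 9 07:57:26 Tower kernel: md3: wrong".
# The pattern is an assumption based only on the lines quoted in this thread.
MD_LINE = re.compile(r"kernel: (md\d+): (.+)$")


def md_status_lines(syslog_text):
    """Return {device: set of status strings} for every 'kernel: mdN: ...' line."""
    statuses = defaultdict(set)
    for line in syslog_text.splitlines():
        match = MD_LINE.search(line)
        if match:
            statuses[match.group(1)].add(match.group(2).strip())
    return dict(statuses)


# Sample text copied from the syslog excerpt in this thread.
sample = """\
May 9 07:56:26 Tower ifplugd(eth0)[1945]: Program executed successfully.
May 9 07:57:26 Tower kernel: md3: wrong
May 9 07:57:26 Tower kernel: md15: new disk
May 9 08:00:53 Tower kernel: md3: wrong
May 9 08:00:53 Tower kernel: md15: new disk
"""

report = md_status_lines(sample)
for device in sorted(report):
    print(device, sorted(report[device]))
```

To run it against a real log, read the file's contents into a string and pass that to `md_status_lines` instead of `sample`. Repeated messages for the same device are collapsed, so each device shows each distinct status once.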
May 9, 2008
I note that md3 is sdq, the very last drive identified, and also the very first drive in the Device Inventory of all hard drives. Also, it is md15 that is being added, the very last possible unRAID array data drive, and the 16th disk in the Device Inventory. Since the syslog shows no apparent reason for calling md3 wrong, and the 2 drives are at the beginning or end of 3 different lists, I suspect that a loop test is off by one.
A side note: this is a perfect example of a hardware change that modifies all of the other device IDs. The new drive, added as Disk 15, has been assigned sda, which has probably shifted every single device ID.
May 9, 2008
I'll bet you are right...
RobJ, you might drop Tom a PM with a link to this thread, describing your suspicions. The syslog really helps. Tom can probably address the issue quickly.
In the interim, the "Restore" button will throw away the old disk configuration and rebuild parity, and should get hawihoney back up and running again.
Joe L.
May 11, 2008
RobJ & Joe L., many thanks for your support.
Because of the last two posts in my thread, I am afraid to do what you wrote. If there is a loop error and the array (without parity) might recognize only 14 drives instead of 15, the Restore button wouldn't help me. In that case I would end up with parity that is missing a drive containing real data. Should I use Restore or not? I'm not sure whether the two of you have different opinions about this ...
@JoeL: Copying the syslog was the last step I did after the last reboot (with the new drive attached), so this is the syslog that must contain the error.
It would help me a lot if the author of unRAID would give his opinion ;-)
Regards
Harald
May 11, 2008
Harald - With so many drives in your array, your configuration is a bit more complex than most. That is not a bad thing, but with the huge variety in motherboards and add-on controllers, you are more likely to run into small problems than a person with 6 drives.
That being said, I don't see any problem with doing as Joe/Rob are suggesting and hitting the restore button in your particular situation. Everyone should know, however, that the restore button is poorly labeled. Pressing it resets your parity data, so if you have a failed drive, pressing it will take you from a recoverable situation into a nonrecoverable situation in an instant. I don't think you are having any disk failures, so this is not a huge concern for you. (Although failures can happen at any time, so no guarantees.) For that reason, I always run a full parity check before dismantling my box to make a change that is going to require hitting restore (like removing a disk from the array). This shortens the window for a disk failure to only a few hours, which makes a failure in that window highly improbable.
I believe that the "loop error" being referred to is a potential small bug in unRAID, not a data problem at all. If you are able to boot unRAID, configure all your drives using the devices page, and see everything on the main page, I think you'd be okay to hit restore and build parity. unRAID should not omit any drives from the parity calculation if you can get to that point. Normal precautions apply - if anything looks unusual on the main page, like drives showing as unformatted that you think are formatted, DO NOT START THE ARRAY.
Good luck!
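To illustrate why recalculating parity with a failed drive left out is unrecoverable, here is a toy sketch in Python. It assumes a simple single-parity XOR scheme and uses three made-up 3-byte "disks"; the helper names `xor_parity` and `rebuild_missing` are hypothetical and this is not unRAID's actual code.

```python
from functools import reduce


def xor_parity(disks):
    """Byte-wise XOR across equally sized disk images (toy single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild_missing(surviving_disks, parity):
    """XOR of the survivors and the parity yields the one missing disk."""
    return xor_parity(surviving_disks + [parity])


# Three made-up data disks, three bytes each.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0f\x0e\x0d"
disk3 = b"\xaa\xbb\xcc"

# Parity computed while all three disks are assigned.
parity = xor_parity([disk1, disk2, disk3])

# disk3 fails: the old parity plus the survivors still reproduces it.
recovered = rebuild_missing([disk1, disk2], parity)

# Analogue of recalculating parity from only the currently assigned disks:
# the new parity no longer carries any information about disk3.
new_parity = xor_parity([disk1, disk2])
lost = rebuild_missing([disk1, disk2], new_parity)

print(recovered == disk3)  # the old parity recovers disk3
print(lost == disk3)       # the recomputed parity does not
```

In this model, the old parity is the only place where the missing disk's contents are still encoded, which is why checking parity first and keeping all drives healthy before a reset matters.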
May 11, 2008
Harald,
One more request... to give Tom some more clues as to what is happening, before you press the "Restore" button, can you type the following at a telnet prompt:

/root/mdcmd status >/boot/mdcmd.txt

Then zip up the resulting file and attach it to this thread. It is how the internals of unRAID see your current configuration.
Joe L.
May 17, 2008
It took some time for me to come back, because after these error messages I first started to save all my important data. This bug made me really nervous.
Here we go: attached you'll find a fresh set of images and data that I took just a minute ago. I stopped my array, attached a new drive to the eSATA port of my Lime machine and restarted. The new drive is the last one in the list. As you can see, my drives are scrambled, and at one position two drives are shown. The zip contains the syslog and the mdcmd output you asked for.
Many thanks in advance.
Harald
May 17, 2008
11.75 TB array (+1 TB parity). Nice!
If it were me, I'd try this ...
With the array stopped and the new drive in place, go to the devices page and unassign disk3. Then go back to main (slot 3 should be empty). (Do not start the array!) Then go back to the devices page and reassign disk3 to the same drive. Then go back to main. Hopefully disk3 will be green.
I am hoping that this unassign and reassign will make unRAID realize that the disk assigned to slot 3 is the right one. That seems to be the problem. My guess is that when you added the final disk, the BIOS scrambled the drive assignments and confused unRAID. If this works, unRAID will then allow you to proceed with the expansion.
The other option is to hit restore and rebuild parity (as described above). Parity will be destroyed the second you hit the restore button (don't let the name fool anyone). But then you should be able to format your new disk, and unRAID will rebuild parity from scratch with all your drives.
May 20, 2008
Hi, thanks for your answer.
After I unassigned drive 3 (the red one), unRAID reported a missing drive. I then re-assigned the drive, and unRAID showed the same picture as shown earlier in this thread (drive 3 red, new drive blue). It didn't help.
Another try: I un-assigned the new drive. This one was really weird. The existing drive 3 went blue and unRAID offered an "upgrading disk". I went to the Lime tower and did a hard shutdown. After the reboot everything was green again, so I started a parity check to be on the safe side. It is still running.
Currently it looks to me as if the MD1500 tower from Lime can't add the drive at the eSATA port to the array if all internal drives are in use. I'm in contact with Lime - perhaps they will find a solution.
Thanks
Harald
June 23, 2008
"Currently it looks to me as if the MD1500 tower from Lime can't add the drive at the eSATA port to the array if all internal drives are in use."
Finally found the cause of this problem, and the fix is in 4.3.2.